Source author record

Vasant Honavar

Vasant Honavar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Formal Languages and Automata Theory

Catalog footprint

What is connected

8works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Projection-Free Transformers via Gaussian Kernel Attention

Self-attention in Transformers is typically implemented as $\mathrm{softmax}(QK^\top/\sqrt{d})V$, where $Q=XW_Q$, $K=XW_K$, and $V=XW_V$ are learned linear projections of the input $X$. We ask whether these learned projections are necessary, or whether they can be replaced by a simpler similarity-based diffusion operator. We introduce \textbf{Gaussian Kernel Attention} (GKA), a drop-in replacement for dot-product attention that computes token affinities directly using a Gaussian radial basis function (RBF) kernel applied to per-head token features. Each head learns only a bandwidth parameter $σ_h$, while a single output projection $W_O$ preserves compatibility with the standard Transformer interface. GKA can be interpreted as normalized kernel regression over tokens, linking modern Transformer architectures to classical non-local filtering and kernel smoothing methods. We evaluate GKA in both vision and language modeling settings. For autoregressive language modeling within the \texttt{nanochat} framework, we implement causal masking and sliding-window constraints by masking and renormalizing the Gaussian kernel. At depth 20, a GKA model with $0.42\times$ the parameters and $0.49\times$ the total training FLOPs of a standard attention baseline trains stably, exhibits a near-zero train-validation gap, and demonstrates competitive behavior on standard benchmarks, albeit with higher bits-per-byte (BPB) at this compute scale. Overall, GKA provides a minimal, interpretable attention mechanism with an explicit locality scale, offering a dimension in the accuracy-efficiency trade-off for Transformer design.

preprint2020arXiv

A Causal Lens for Peeking into Black Box Predictive Models: Predictive Model Interpretation via Causal Attribution

With the increasing adoption of predictive models trained using machine learning across a wide range of high-stakes applications, e.g., health care, security, criminal justice, finance, and education, there is a growing need for effective techniques for explaining such models and their predictions. We aim to address this problem in settings where the predictive model is a black box; That is, we can only observe the response of the model to various inputs, but have no knowledge about the internal structure of the predictive model, its parameters, the objective function, and the algorithm used to optimize the model. We reduce the problem of interpreting a black box predictive model to that of estimating the causal effects of each of the model inputs on the model output, from observations of the model inputs and the corresponding outputs. We estimate the causal effects of model inputs on model output using variants of the Rubin Neyman potential outcomes framework for estimating causal effects from observational data. We show how the resulting causal attribution of responsibility for model output to the different model inputs can be used to interpret the predictive model and to explain its predictions. We present results of experiments that demonstrate the effectiveness of our approach to the interpretation of black box predictive models via causal attribution in the case of deep neural network models trained on one synthetic data set (where the input variables that impact the output variable are known by design) and two real-world data sets: Handwritten digit classification, and Parkinson's disease severity prediction. Because our approach does not require knowledge about the predictive model algorithm and is free of assumptions regarding the black box predictive model except that its input-output responses be observable, it can be applied, in principle, to any black box predictive model.

preprint2015arXiv

CRISNER: A Practically Efficient Reasoner for Qualitative Preferences

We present CRISNER (Conditional & Relative Importance Statement Network PrEference Reasoner), a tool that provides practically efficient as well as exact reasoning about qualitative preferences in popular ceteris paribus preference languages such as CP-nets, TCP-nets, CP-theories, etc. The tool uses a model checking engine to translate preference specifications and queries into appropriate Kripke models and verifiable properties over them respectively. The distinguishing features of the tool are: (1) exact and provably correct query answering for testing dominance, consistency with respect to a preference specification, and testing equivalence and subsumption of two sets of preferences; (2) automatic generation of proofs evidencing the correctness of answer produced by CRISNER to any of the above queries; (3) XML inputs and outputs that make it portable and pluggable into other applications. We also describe the extensible architecture of CRISNER, which can be extended to new reference formalisms based on ceteris paribus semantics that may be developed in the future.

preprint2015arXiv

Lifted Representation of Relational Causal Models Revisited: Implications for Reasoning and Structure Learning

Maier et al. (2010) introduced the relational causal model (RCM) for representing and inferring causal relationships in relational data. A lifted representation, called abstract ground graph (AGG), plays a central role in reasoning with and learning of RCM. The correctness of the algorithm proposed by Maier et al. (2013a) for learning RCM from data relies on the soundness and completeness of AGG for relational d-separation to reduce the learning of an RCM to learning of an AGG. We revisit the definition of AGG and show that AGG, as defined in Maier et al. (2013b), does not correctly abstract all ground graphs. We revise the definition of AGG to ensure that it correctly abstracts all ground graphs. We further show that AGG representation is not complete for relational d-separation, that is, there can exist conditional independence relations in an RCM that are not entailed by AGG. A careful examination of the relationship between the lack of completeness of AGG for relational d-separation and faithfulness conditions suggests that weaker notions of completeness, namely adjacency faithfulness and orientation faithfulness between an RCM and its AGG, can be used to learn an RCM from data.

preprint2014arXiv

Efficient Markov Network Structure Discovery Using Independence Tests

We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be NP-hard for Markov networks due to the difficulty of estimating the parameters of the network, needed for the computation of the data likelihood. The independence-based approach does not require the computation of the likelihood, and thus both GSMN* and GSIMN can compute the structure efficiently (as shown in our experiments). GSMN* is an adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN* by additionally exploiting Pearls well-known properties of the conditional independence relation to infer novel independences from known ones, thus avoiding the performance of statistical tests to estimate them. To accomplish this efficiently GSIMN uses the Triangle theorem, also introduced in this work, which is a simplified version of the set of Markov axioms. Experimental comparisons on artificial and real-world data sets show GSIMN can yield significant savings with respect to GSMN*, while generating a Markov network with comparable or in some cases improved quality. We also compare GSIMN to a forward-chaining implementation, called GSIMN-FCH, that produces all possible conditional independences resulting from repeatedly applying Pearls theorems on the known conditional independence tests. The results of this comparison show that GSIMN, by the sole use of the Triangle theorem, is nearly optimal in terms of the set of independences tests that it infers.

preprint2014arXiv

Representing and Reasoning with Qualitative Preferences for Compositional Systems

Many applications, e.g., Web service composition, complex system design, team formation, etc., rely on methods for identifying collections of objects or entities satisfying some functional requirement. Among the collections that satisfy the functional requirement, it is often necessary to identify one or more collections that are optimal with respect to user preferences over a set of attributes that describe the non-functional properties of the collection. We develop a formalism that lets users express the relative importance among attributes and qualitative preferences over the valuations of each attribute. We define a dominance relation that allows us to compare collections of objects in terms of preferences over attributes of the objects that make up the collection. We establish some key properties of the dominance relation. In particular, we show that the dominance relation is a strict partial order when the intra-attribute preference relations are strict partial orders and the relative importance preference relation is an interval order. We provide algorithms that use this dominance relation to identify the set of most preferred collections. We show that under certain conditions, the algorithms are guaranteed to return only (sound), all (complete), or at least one (weakly complete) of the most preferred collections. We present results of simulation experiments comparing the proposed algorithms with respect to (a) the quality of solutions (number of most preferred solutions) produced by the algorithms, and (b) their performance and efficiency. We also explore some interesting conjectures suggested by the results of our experiments that relate the properties of the user preferences, the dominance relation, and the algorithms.

preprint2013arXiv

Causal Transportability of Experiments on Controllable Subsets of Variables: z-Transportability

We introduce z-transportability, the problem of estimating the causal effect of a set of variables X on another set of variables Y in a target domain from experiments on any subset of controllable variables Z where Z is an arbitrary subset of observable variables V in a source domain. z-Transportability generalizes z-identifiability, the problem of estimating in a given domain the causal effect of X on Y from surrogate experiments on a set of variables Z such that Z is disjoint from X;. z-Transportability also generalizes transportability which requires that the causal effect of X on Y in the target domain be estimable from experiments on any subset of all observable variables in the source domain. We first generalize z-identifiability to allow cases where Z is not necessarily disjoint from X. Then, we establish a necessary and sufficient condition for z-transportability in terms of generalized z-identifiability and transportability. We provide a correct and complete algorithm that determines whether a causal effect is z-transportable; and if it is, produces a transport formula, that is, a recipe for estimating the causal effect of X on Y in the target domain using information elicited from the results of experimental manipulations of Z in the source domain and observational data from the target domain. Our results also show that do-calculus is complete for z-transportability.

preprint2011arXiv

Abstraction Super-structuring Normal Forms: Towards a Theory of Structural Induction

Induction is the process by which we obtain predictive laws or theories or models of the world. We consider the structural aspect of induction. We answer the question as to whether we can find a finite and minmalistic set of operations on structural elements in terms of which any theory can be expressed. We identify abstraction (grouping similar entities) and super-structuring (combining topologically e.g., spatio-temporally close entities) as the essential structural operations in the induction process. We show that only two more structural operations, namely, reverse abstraction and reverse super-structuring (the duals of abstraction and super-structuring respectively) suffice in order to exploit the full power of Turing-equivalent generative grammars in induction. We explore the implications of this theorem with respect to the nature of hidden variables, radical positivism and the 2-century old claim of David Hume about the principles of connexion among ideas.