Source author record

Parag Singla

Parag Singla appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Computation and Language Computer Vision Information Retrieval Logic in Computer Science physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

12works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

A Solver-Free Framework for Scalable Learning in Neural ILP Architectures

There is a recent focus on designing architectures that have an Integer Linear Programming (ILP) layer within a neural model (referred to as Neural ILP in this paper). Neural ILP architectures are suitable for pure reasoning tasks that require data-driven constraint learning or for tasks requiring both perception (neural) and reasoning (ILP). A recent SOTA approach for end-to-end training of Neural ILP explicitly defines gradients through the ILP black box (Paulus et al. 2021) - this trains extremely slowly, owing to a call to the underlying ILP solver for every training data point in a minibatch. In response, we present an alternative training strategy that is solver-free, i.e., does not call the ILP solver at all at training time. Neural ILP has a set of trainable hyperplanes (for cost and constraints in ILP), together representing a polyhedron. Our key idea is that the training loss should impose that the final polyhedron separates the positives (all constraints satisfied) from the negatives (at least one violated constraint or a suboptimal cost value), via a soft-margin formulation. While positive example(s) are provided as part of the training data, we devise novel techniques for generating negative samples. Our solution is flexible enough to handle equality as well as inequality constraints. Experiments on several problems, both perceptual as well as symbolic, which require learning the constraints of an ILP, show that our approach has superior performance and scales much better compared to purely neural baselines and other state-of-the-art models that require solver-based training. In particular, we are able to obtain excellent performance in 9 x 9 symbolic and visual sudoku, to which the other Neural ILP solver is not able to scale.

preprint2022arXiv

CEAR: Cross-Entity Aware Reranker for Knowledge Base Completion

Pre-trained language models (LMs) like BERT have shown to store factual knowledge about the world. This knowledge can be used to augment the information present in Knowledge Bases, which tend to be incomplete. However, prior attempts at using BERT for task of Knowledge Base Completion (KBC) resulted in performance worse than embedding based techniques that rely only on the graph structure. In this work we develop a novel model, Cross-Entity Aware Reranker (CEAR), that uses BERT to re-rank the output of existing KBC models with cross-entity attention. Unlike prior work that scores each entity independently, CEAR uses BERT to score the entities together, which is effective for exploiting its factual knowledge. CEAR achieves a new state of art for the OLPBench dataset.

preprint2022arXiv

Neural Models for Output-Space Invariance in Combinatorial Problems

Recently many neural models have been proposed to solve combinatorial puzzles by implicitly learning underlying constraints using their solved instances, such as sudoku or graph coloring (GCP). One drawback of the proposed architectures, which are often based on Graph Neural Networks (GNN), is that they cannot generalize across the size of the output space from which variables are assigned a value, for example, set of colors in a GCP, or board-size in sudoku. We call the output space for the variables as 'value-set'. While many works have demonstrated generalization of GNNs across graph size, there has been no study on how to design a GNN for achieving value-set invariance for problems that come from the same domain. For example, learning to solve 16 x 16 sudoku after being trained on only 9 x 9 sudokus. In this work, we propose novel methods to extend GNN based architectures to achieve value-set invariance. Specifically, our model builds on recently proposed Recurrent Relational Networks. Our first approach exploits the graph-size invariance of GNNs by converting a multi-class node classification problem into a binary node classification problem. Our second approach works directly with multiple classes by adding multiple nodes corresponding to the values in the value-set, and then connecting variable nodes to value nodes depending on the problem initialization. Our experimental evaluation on three different combinatorial problems demonstrates that both our models perform well on our novel problem, compared to a generic neural reasoner. Between two of our models, we observe an inherent trade-off: while the binarized model gives better performance when trained on smaller value-sets, multi-valued model is much more memory efficient, resulting in improved performance when trained on larger value-sets, where binarized model fails to train.

preprint2022arXiv

PARE: A Simple and Strong Baseline for Monolingual and Multilingual Distantly Supervised Relation Extraction

Neural models for distantly supervised relation extraction (DS-RE) encode each sentence in an entity-pair bag separately. These are then aggregated for bag-level relation prediction. Since, at encoding time, these approaches do not allow information to flow from other sentences in the bag, we believe that they do not utilize the available bag data to the fullest. In response, we explore a simple baseline approach (PARE) in which all sentences of a bag are concatenated into a passage of sentences, and encoded jointly using BERT. The contextual embeddings of tokens are aggregated using attention with the candidate relation as query -- this summary of whole passage predicts the candidate relation. We find that our simple baseline solution outperforms existing state-of-the-art DS-RE models in both monolingual and multilingual DS-RE datasets.

preprint2020arXiv

Advances in Symmetry Breaking for SAT Modulo Theories

Symmetry breaking is a popular technique to reduce the search space for SAT solving by exploiting the underlying symmetry over variables and clauses in a formula. The key idea is to first identify sets of assignments which fall in the same symmetry class, and then impose ordering constraints, called Symmetry Breaking Predicates (SBPs), such that only one (or a small subset) of these assignments is allowed to be a solution of the original SAT formula. While this technique has been exploited extensively in the SAT literature, there is little work on using symmetry breaking for SAT Modulo Theories (SMT). In SMT, logical constraints in SAT theories are combined with another set of theory operations defined over non-Boolean variables such as integers, reals, etc. SMT solvers typically use a combination of SAT solving techniques augmented with calls to the theory solver. In this work, we take up the advances in SAT symmetry breaking and apply them to the domain of SMT. Our key technical contribution is the formulation of symmetry breaking over the Boolean skeleton variables, which are placeholders for actual theory operations in SMT solving. These SBPs are then applied over the SAT solving part of the SMT solver. We implement our SBP ideas on top of CVC4, which is a state-of-the-art SMT solver. Our approach can result in significantly faster solutions on several benchmark problems compared to the state-of-the-art. Our final solver is a hybrid of the original CVC4 solver, and an SBP based solver, and can solve up to 3.8% and 3.1% more problems in the QF_NIA category of 2018 and 2019 SMT benchmarks, respectively, compared to CVC4, the top performer in this category.

preprint2020arXiv

Large Scale Question Answering using Tourism Data

We introduce the novel task of answering entity-seeking recommendation questions using a collection of reviews that describe candidate answer entities. We harvest a QA dataset that contains 47,124 paragraph-sized real user questions from travelers seeking recommendations for hotels, attractions and restaurants. Each question can have thousands of candidate answers to choose from and each candidate is associated with a collection of unstructured reviews. This dataset is especially challenging because commonly used neural architectures for reasoning and QA are prohibitively expensive for a task of this scale. As a solution, we design a scalable cluster-select-rerank approach. It first clusters text for each entity to identify exemplar sentences describing an entity. It then uses a scalable neural information retrieval (IR) module to select a set of potential entities from the large candidate set. A reranker uses a deeper attention-based architecture to pick the best answers from the selected entities. This strategy performs better than a pure IR or a pure attention-based reasoning approach yielding nearly 25% relative improvement in Accuracy@3 over both approaches.

preprint2020arXiv

MaskAAE: Latent space optimization for Adversarial Auto-Encoders

The field of neural generative models is dominated by the highly successful Generative Adversarial Networks (GANs) despite their challenges, such as training instability and mode collapse. Auto-Encoders (AE) with regularized latent space provide an alternative framework for generative models, albeit their performance levels have not reached that of GANs. In this work, we hypothesise that the dimensionality of the AE model's latent space has a critical effect on the quality of generated data. Under the assumption that nature generates data by sampling from a "true" generative latent space followed by a deterministic function, we show that the optimal performance is obtained when the dimensionality of the latent space of the AE-model matches with that of the "true" generative latent space. Further, we propose an algorithm called the Mask Adversarial Auto-Encoder (MaskAAE), in which the dimensionality of the latent space of an adversarial auto encoder is brought closer to that of the "true" generative latent space, via a procedure to mask the spurious latent dimensions. We demonstrate through experiments on synthetic and several real-world datasets that the proposed formulation yields betterment in the generation quality.

preprint2020arXiv

To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs

Regularized Auto-Encoders (RAEs) form a rich class of neural generative models. They effectively model the joint-distribution between the data and the latent space using an Encoder-Decoder combination, with regularization imposed in terms of a prior over the latent space. Despite their advantages, such as stability in training, the performance of AE based models has not reached the superior standards as that of the other generative models such as Generative Adversarial Networks (GANs). Motivated by this, we examine the effect of the latent prior on the generation quality of deterministic AE models in this paper. Specifically, we consider the class of RAEs with deterministic Encoder-Decoder pairs, Wasserstein Auto-Encoders (WAE), and show that having a fixed prior distribution, \textit{a priori}, oblivious to the dimensionality of the `true' latent space, will lead to the infeasibility of the optimization problem considered. Further, we show that, in the finite data regime, despite knowing the correct latent dimensionality, there exists a bias-variance trade-off with any arbitrary prior imposition. As a remedy to both the issues mentioned above, we introduce an additional state space in the form of flexibly learnable latent priors, in the optimization objective of the WAEs. We implicitly learn the distribution of the latent prior jointly with the AE training, which not only makes the learning objective feasible but also facilitates operation on different points of the bias-variance curve. We show the efficacy of our model, called FlexAE, through several experiments on multiple datasets, and demonstrate that it is the new state-of-the-art for the AE based generative models.

preprint2016arXiv

Contextual Symmetries in Probabilistic Graphical Models

An important approach for efficient inference in probabilistic graphical models exploits symmetries among objects in the domain. Symmetric variables (states) are collapsed into meta-variables (meta-states) and inference algorithms are run over the lifted graphical model instead of the flat one. Our paper extends existing definitions of symmetry by introducing the novel notion of contextual symmetry. Two states that are not globally symmetric, can be contextually symmetric under some specific assignment to a subset of variables, referred to as the context variables. Contextual symmetry subsumes previous symmetry definitions and can rep resent a large class of symmetries not representable earlier. We show how to compute contextual symmetries by reducing it to the problem of graph isomorphism. We extend previous work on exploiting symmetries in the MCMC framework to the case of contextual symmetries. Our experiments on several domains of interest demonstrate that exploiting contextual symmetries can result in significant computational gains.

preprint2016arXiv

Lifted Region-Based Belief Propagation

Due to the intractable nature of exact lifted inference, research has recently focused on the discovery of accurate and efficient approximate inference algorithms in Statistical Relational Models (SRMs), such as Lifted First-Order Belief Propagation. FOBP simulates propositional factor graph belief propagation without constructing the ground factor graph by identifying and lifting over redundant message computations. In this work, we propose a generalization of FOBP called Lifted Generalized Belief Propagation, in which both the region structure and the message structure can be lifted. This approach allows more of the inference to be performed intra-region (in the exact inference step of BP), thereby allowing simulation of propagation on a graph structure with larger region scopes and fewer edges, while still maintaining tractability. We demonstrate that the resulting algorithm converges in fewer iterations to more accurate results on a variety of SRMs.

preprint2015arXiv

On the Role of Conductance, Geography and Topology in Predicting Hashtag Virality

We focus on three aspects of the early spread of a hashtag in order to predict whether it will go viral: the network properties of the subset of users tweeting the hashtag, its geographical properties, and, most importantly, its conductance-related properties. One of our significant contributions is to discover the critical role played by the conductance based features for the successful prediction of virality. More specifically, we show that the first derivative of the conductance gives an early indication of whether the hashtag is going to go viral or not. We present a detailed experimental evaluation of the effect of our various categories of features on the virality prediction task. When compared to the baselines and the state of the art techniques proposed in the literature our feature set is able to achieve significantly better accuracy on a large dataset of 7.7 million users and all their tweets over a period of month, as well as on existing datasets.

preprint2012arXiv

Markov Logic in Infinite Domains

Combining first-order logic and probability has long been a goal of AI. Markov logic (Richardson & Domingos, 2006) accomplishes this by attaching weights to first-order formulas and viewing them as templates for features of Markov networks. Unfortunately, it does not have the full power of first-order logic, because it is only defined for finite domains. This paper extends Markov logic to infinite domains, by casting it in the framework of Gibbs measures (Georgii, 1988). We show that a Markov logic network (MLN) admits a Gibbs measure as long as each ground atom has a finite number of neighbors. Many interesting cases fall in this category. We also show that an MLN admits a unique measure if the weights of its non-unit clauses are small enough. We then examine the structure of the set of consistent measures in the non-unique case. Many important phenomena, including systems with phase transitions, are represented by MLNs with non-unique measures. We relate the problem of satisfiability in first-order logic to the properties of MLN measures, and discuss how Markov logic relates to previous infinite models.

Parag Singla

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

A Solver-Free Framework for Scalable Learning in Neural ILP Architectures

CEAR: Cross-Entity Aware Reranker for Knowledge Base Completion

Neural Models for Output-Space Invariance in Combinatorial Problems

PARE: A Simple and Strong Baseline for Monolingual and Multilingual Distantly Supervised Relation Extraction

Advances in Symmetry Breaking for SAT Modulo Theories

Large Scale Question Answering using Tourism Data

MaskAAE: Latent space optimization for Adversarial Auto-Encoders

To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs

Contextual Symmetries in Probabilistic Graphical Models

Lifted Region-Based Belief Propagation

On the Role of Conductance, Geography and Topology in Predicting Hashtag Virality

Markov Logic in Infinite Domains