Researcher profile

Parag Singla

Parag Singla contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2023arXiv

A Solver-Free Framework for Scalable Learning in Neural ILP Architectures

There is a recent focus on designing architectures that have an Integer Linear Programming (ILP) layer within a neural model (referred to as Neural ILP in this paper). Neural ILP architectures are suitable for pure reasoning tasks that require data-driven constraint learning or for tasks requiring both perception (neural) and reasoning (ILP). A recent SOTA approach for end-to-end training of Neural ILP explicitly defines gradients through the ILP black box (Paulus et al. 2021) - this trains extremely slowly, owing to a call to the underlying ILP solver for every training data point in a minibatch. In response, we present an alternative training strategy that is solver-free, i.e., does not call the ILP solver at all at training time. Neural ILP has a set of trainable hyperplanes (for cost and constraints in ILP), together representing a polyhedron. Our key idea is that the training loss should impose that the final polyhedron separates the positives (all constraints satisfied) from the negatives (at least one violated constraint or a suboptimal cost value), via a soft-margin formulation. While positive example(s) are provided as part of the training data, we devise novel techniques for generating negative samples. Our solution is flexible enough to handle equality as well as inequality constraints. Experiments on several problems, both perceptual as well as symbolic, which require learning the constraints of an ILP, show that our approach has superior performance and scales much better compared to purely neural baselines and other state-of-the-art models that require solver-based training. In particular, we are able to obtain excellent performance in 9 x 9 symbolic and visual sudoku, to which the other Neural ILP solver is not able to scale.

preprint2022arXiv

CEAR: Cross-Entity Aware Reranker for Knowledge Base Completion

Pre-trained language models (LMs) like BERT have shown to store factual knowledge about the world. This knowledge can be used to augment the information present in Knowledge Bases, which tend to be incomplete. However, prior attempts at using BERT for task of Knowledge Base Completion (KBC) resulted in performance worse than embedding based techniques that rely only on the graph structure. In this work we develop a novel model, Cross-Entity Aware Reranker (CEAR), that uses BERT to re-rank the output of existing KBC models with cross-entity attention. Unlike prior work that scores each entity independently, CEAR uses BERT to score the entities together, which is effective for exploiting its factual knowledge. CEAR achieves a new state of art for the OLPBench dataset.

preprint2022arXiv

Neural Models for Output-Space Invariance in Combinatorial Problems

Recently many neural models have been proposed to solve combinatorial puzzles by implicitly learning underlying constraints using their solved instances, such as sudoku or graph coloring (GCP). One drawback of the proposed architectures, which are often based on Graph Neural Networks (GNN), is that they cannot generalize across the size of the output space from which variables are assigned a value, for example, set of colors in a GCP, or board-size in sudoku. We call the output space for the variables as 'value-set'. While many works have demonstrated generalization of GNNs across graph size, there has been no study on how to design a GNN for achieving value-set invariance for problems that come from the same domain. For example, learning to solve 16 x 16 sudoku after being trained on only 9 x 9 sudokus. In this work, we propose novel methods to extend GNN based architectures to achieve value-set invariance. Specifically, our model builds on recently proposed Recurrent Relational Networks. Our first approach exploits the graph-size invariance of GNNs by converting a multi-class node classification problem into a binary node classification problem. Our second approach works directly with multiple classes by adding multiple nodes corresponding to the values in the value-set, and then connecting variable nodes to value nodes depending on the problem initialization. Our experimental evaluation on three different combinatorial problems demonstrates that both our models perform well on our novel problem, compared to a generic neural reasoner. Between two of our models, we observe an inherent trade-off: while the binarized model gives better performance when trained on smaller value-sets, multi-valued model is much more memory efficient, resulting in improved performance when trained on larger value-sets, where binarized model fails to train.

preprint2022arXiv

PARE: A Simple and Strong Baseline for Monolingual and Multilingual Distantly Supervised Relation Extraction

Neural models for distantly supervised relation extraction (DS-RE) encode each sentence in an entity-pair bag separately. These are then aggregated for bag-level relation prediction. Since, at encoding time, these approaches do not allow information to flow from other sentences in the bag, we believe that they do not utilize the available bag data to the fullest. In response, we explore a simple baseline approach (PARE) in which all sentences of a bag are concatenated into a passage of sentences, and encoded jointly using BERT. The contextual embeddings of tokens are aggregated using attention with the candidate relation as query -- this summary of whole passage predicts the candidate relation. We find that our simple baseline solution outperforms existing state-of-the-art DS-RE models in both monolingual and multilingual DS-RE datasets.

preprint2020arXiv

Advances in Symmetry Breaking for SAT Modulo Theories

Symmetry breaking is a popular technique to reduce the search space for SAT solving by exploiting the underlying symmetry over variables and clauses in a formula. The key idea is to first identify sets of assignments which fall in the same symmetry class, and then impose ordering constraints, called Symmetry Breaking Predicates (SBPs), such that only one (or a small subset) of these assignments is allowed to be a solution of the original SAT formula. While this technique has been exploited extensively in the SAT literature, there is little work on using symmetry breaking for SAT Modulo Theories (SMT). In SMT, logical constraints in SAT theories are combined with another set of theory operations defined over non-Boolean variables such as integers, reals, etc. SMT solvers typically use a combination of SAT solving techniques augmented with calls to the theory solver. In this work, we take up the advances in SAT symmetry breaking and apply them to the domain of SMT. Our key technical contribution is the formulation of symmetry breaking over the Boolean skeleton variables, which are placeholders for actual theory operations in SMT solving. These SBPs are then applied over the SAT solving part of the SMT solver. We implement our SBP ideas on top of CVC4, which is a state-of-the-art SMT solver. Our approach can result in significantly faster solutions on several benchmark problems compared to the state-of-the-art. Our final solver is a hybrid of the original CVC4 solver, and an SBP based solver, and can solve up to 3.8% and 3.1% more problems in the QF_NIA category of 2018 and 2019 SMT benchmarks, respectively, compared to CVC4, the top performer in this category.

preprint2020arXiv

Large Scale Question Answering using Tourism Data

We introduce the novel task of answering entity-seeking recommendation questions using a collection of reviews that describe candidate answer entities. We harvest a QA dataset that contains 47,124 paragraph-sized real user questions from travelers seeking recommendations for hotels, attractions and restaurants. Each question can have thousands of candidate answers to choose from and each candidate is associated with a collection of unstructured reviews. This dataset is especially challenging because commonly used neural architectures for reasoning and QA are prohibitively expensive for a task of this scale. As a solution, we design a scalable cluster-select-rerank approach. It first clusters text for each entity to identify exemplar sentences describing an entity. It then uses a scalable neural information retrieval (IR) module to select a set of potential entities from the large candidate set. A reranker uses a deeper attention-based architecture to pick the best answers from the selected entities. This strategy performs better than a pure IR or a pure attention-based reasoning approach yielding nearly 25% relative improvement in Accuracy@3 over both approaches.

preprint2020arXiv

MaskAAE: Latent space optimization for Adversarial Auto-Encoders

The field of neural generative models is dominated by the highly successful Generative Adversarial Networks (GANs) despite their challenges, such as training instability and mode collapse. Auto-Encoders (AE) with regularized latent space provide an alternative framework for generative models, albeit their performance levels have not reached that of GANs. In this work, we hypothesise that the dimensionality of the AE model's latent space has a critical effect on the quality of generated data. Under the assumption that nature generates data by sampling from a "true" generative latent space followed by a deterministic function, we show that the optimal performance is obtained when the dimensionality of the latent space of the AE-model matches with that of the "true" generative latent space. Further, we propose an algorithm called the Mask Adversarial Auto-Encoder (MaskAAE), in which the dimensionality of the latent space of an adversarial auto encoder is brought closer to that of the "true" generative latent space, via a procedure to mask the spurious latent dimensions. We demonstrate through experiments on synthetic and several real-world datasets that the proposed formulation yields betterment in the generation quality.

preprint2020arXiv

To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs

Regularized Auto-Encoders (RAEs) form a rich class of neural generative models. They effectively model the joint-distribution between the data and the latent space using an Encoder-Decoder combination, with regularization imposed in terms of a prior over the latent space. Despite their advantages, such as stability in training, the performance of AE based models has not reached the superior standards as that of the other generative models such as Generative Adversarial Networks (GANs). Motivated by this, we examine the effect of the latent prior on the generation quality of deterministic AE models in this paper. Specifically, we consider the class of RAEs with deterministic Encoder-Decoder pairs, Wasserstein Auto-Encoders (WAE), and show that having a fixed prior distribution, \textit{a priori}, oblivious to the dimensionality of the `true' latent space, will lead to the infeasibility of the optimization problem considered. Further, we show that, in the finite data regime, despite knowing the correct latent dimensionality, there exists a bias-variance trade-off with any arbitrary prior imposition. As a remedy to both the issues mentioned above, we introduce an additional state space in the form of flexibly learnable latent priors, in the optimization objective of the WAEs. We implicitly learn the distribution of the latent prior jointly with the AE training, which not only makes the learning objective feasible but also facilitates operation on different points of the bias-variance curve. We show the efficacy of our model, called FlexAE, through several experiments on multiple datasets, and demonstrate that it is the new state-of-the-art for the AE based generative models.