Source author record

Gaurav Pandey

Gaurav Pandey appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Robotics Computation and Language Artificial Intelligence Genomics Information Theory math.IT Computational Engineering, Finance, and Science math.ST Molecular Networks Multiagent Systems Populations and Evolution Statistics Theory

Catalog footprint

What is connected

22works

14topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Mix-and-Match: Scalable Dialog Response Retrieval using Gaussian Mixture Embeddings

Embedding-based approaches for dialog response retrieval embed the context-response pairs as points in the embedding space. These approaches are scalable, but fail to account for the complex, many-to-many relationships that exist between context-response pairs. On the other end of the spectrum, there are approaches that feed the context-response pairs jointly through multiple layers of neural networks. These approaches can model the complex relationships between context-response pairs, but fail to scale when the set of responses is moderately large (>100). In this paper, we combine the best of both worlds by proposing a scalable model that can learn complex relationships between context-response pairs. Specifically, the model maps the contexts as well as responses to probability distributions over the embedding space. We train the models by optimizing the Kullback-Leibler divergence between the distributions induced by context-response pairs in the training data. We show that the resultant model achieves better performance as compared to other embedding-based approaches on publicly available conversation data.

preprint2022arXiv

Real-time Full-stack Traffic Scene Perception for Autonomous Driving with Roadside Cameras

We propose a novel and pragmatic framework for traffic scene perception with roadside cameras. The proposed framework covers a full-stack of roadside perception pipeline for infrastructure-assisted autonomous driving, including object detection, object localization, object tracking, and multi-camera information fusion. Unlike previous vision-based perception frameworks rely upon depth offset or 3D annotation at training, we adopt a modular decoupling design and introduce a landmark-based 3D localization method, where the detection and localization can be well decoupled so that the model can be easily trained based on only 2D annotations. The proposed framework applies to either optical or thermal cameras with pinhole or fish-eye lenses. Our framework is deployed at a two-lane roundabout located at Ellsworth Rd. and State St., Ann Arbor, MI, USA, providing 7x24 real-time traffic flow monitoring and high-precision vehicle trajectory extraction. The whole system runs efficiently on a low-power edge computing device with all-component end-to-end delay of less than 20ms.

preprint2022arXiv

Variational Learning for Unsupervised Knowledge Grounded Dialogs

Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document. These methods do not require the exact document to be known during training and rely on the use of a retrieval system to fetch relevant documents from a large index. The documents used to generate the responses are modeled as latent variables whose prior probabilities need to be estimated. Models such as RAG and REALM, marginalize the document probabilities over the documents retrieved from the index to define the log likelihood loss function which is optimized end-to-end. In this paper, we develop a variational approach to the above technique wherein, we instead maximize the Evidence Lower bound (ELBO). Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, that has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.

preprint2021arXiv

Developing parsimonious ensembles using ensemble diversity within a reinforcement learning framework

Heterogeneous ensembles built from the predictions of a wide variety and large number of diverse base predictors represent a potent approach to building predictive models for problems where the ideal base/individual predictor may not be obvious. Ensemble selection is an especially promising approach here, not only for improving prediction performance, but also because of its ability to select a collectively predictive subset, often a relatively small one, of the base predictors. In this paper, we present a set of algorithms that explicitly incorporate ensemble diversity, a known factor influencing predictive performance of ensembles, into a reinforcement learning framework for ensemble selection. We rigorously tested these approaches on several challenging problems and associated data sets, yielding that several of them produced more accurate ensembles than those that don't explicitly consider diversity. More importantly, these diversity-incorporating ensembles were much smaller in size, i.e., more parsimonious, than the latter types of ensembles. This can eventually aid the interpretation or reverse engineering of predictive models assimilated into the resultant ensemble(s).

preprint2021arXiv

Developing parsimonious ensembles using predictor diversity within a reinforcement learning framework

Heterogeneous ensembles that can aggregate an unrestricted number and variety of base predictors can effectively address challenging prediction problems. In particular, accurate ensembles that are also parsimonious, i.e., consist of as few base predictors as possible, can help reveal potentially useful knowledge about the target problem domain. Although ensemble selection offers a potential approach to achieving these goals, the currently available algorithms are limited in their abilities. In this paper, we present several algorithms that incorporate ensemble diversity into a reinforcement learning (RL)-based ensemble selection framework to build accurate and parsimonious ensembles. These algorithms, as well as several baselines, are rigorously evaluated on datasets from diverse domains in terms of the predictive performance and parsimony of their ensembles. This evaluation demonstrates that our diversity-incorporated RL-based algorithms perform better than the others for constructing simultaneously accurate and parsimonious ensembles. These algorithms can eventually aid the interpretation or reverse engineering of predictive models assimilated into effective ensembles. To enable such a translation, an implementation of these algorithms, as well the experimental setup they are evaluated in, has been made available at https://github.com/GauravPandeyLab/lens-learning-ensembles-using-reinforcement-learning.

preprint2021arXiv

Real Time Incremental Foveal Texture Mapping for Autonomous Vehicles

We propose an end-to-end real time framework to generate high resolution graphics grade textured 3D map of urban environment. The generated detailed map finds its application in the precise localization and navigation of autonomous vehicles. It can also serve as a virtual test bed for various vision and planning algorithms as well as a background map in the computer games. In this paper, we focus on two important issues: (i) incrementally generating a map with coherent 3D surface, in real time and (ii) preserving the quality of color texture. To handle the above issues, firstly, we perform a pose-refinement procedure which leverages camera image information, Delaunay triangulation and existing scan matching techniques to produce high resolution 3D map from the sparse input LIDAR scan. This 3D map is then texturized and accumulated by using a novel technique of ray-filtering which handles occlusion and inconsistencies in pose-refinement. Further, inspired by human fovea, we introduce foveal-processing which significantly reduces the computation time and also assists ray-filtering to maintain consistency in color texture and coherency in 3D surface of the output map. Moreover, we also introduce texture error (TE) and mean texture mapping error (MTME), which provides quantitative measure of texturing and overall quality of the textured maps.

preprint2020arXiv

Aerial Imagery based LIDAR Localization for Autonomous Vehicles

This paper presents a localization technique using aerial imagery maps and LIDAR based ground reflectivity for autonomous vehicles in urban environments. Traditional localization techniques using LIDAR reflectivity rely on high definition reflectivity maps generated from a mapping vehicle. The cost and effort required to maintain such prior maps are generally very high because it requires a fleet of expensive mapping vehicles. In this work we propose a localization technique where the vehicle localizes using aerial/satellite imagery, eradicating the need to develop and maintain complex high-definition maps. The proposed technique has been tested on a real world dataset collected from a test track in Ann Arbor, Michigan. This research concludes that aerial imagery based maps provides real-time localization performance similar to state-of-the-art LIDAR based maps for autonomous vehicles in urban environments at reduced costs.

preprint2020arXiv

Deep-Geometric 6 DoF Localization from a Single Image in Topo-metric Maps

We describe a Deep-Geometric Localizer that is able to estimate the full 6 Degree of Freedom (DoF) global pose of the camera from a single image in a previously mapped environment. Our map is a topo-metric one, with discrete topological nodes whose 6 DoF poses are known. Each topo-node in our map also comprises of a set of points, whose 2D features and 3D locations are stored as part of the mapping process. For the mapping phase, we utilise a stereo camera and a regular stereo visual SLAM pipeline. During the localization phase, we take a single camera image, localize it to a topological node using Deep Learning, and use a geometric algorithm (PnP) on the matched 2D features (and their 3D positions in the topo map) to determine the full 6 DoF globally consistent pose of the camera. Our method divorces the mapping and the localization algorithms and sensors (stereo and mono), and allows accurate 6 DoF pose estimation in a previously mapped environment using a single camera. With potential VR/AR and localization applications in single camera devices such as mobile phones and drones, our hybrid algorithm compares favourably with the fully Deep-Learning based Pose-Net that regresses pose from a single image in simulated as well as real environments.

preprint2020arXiv

Experimental Evaluation of 3D-LIDAR Camera Extrinsic Calibration

In this paper we perform an experimental comparison of three different target based 3D-LIDAR camera calibration algorithms. We briefly elucidate the mathematical background behind each method and provide insights into practical aspects like ease of data collection for all of them. We extensively evaluate these algorithms on a sensor suite which consists multiple cameras and LIDARs by assessing their robustness to random initialization and by using metrics like Mean Line Re-projection Error (MLRE) and Factory Stereo Calibration Error. We also show the effect of noisy sensor on the calibration result from all the algorithms and conclude with a note on which calibration algorithm should be used under what circumstances.

preprint2020arXiv

Extrinsic Calibration of a 3D-LIDAR and a Camera

This work presents an extrinsic parameter estimation algorithm between a 3D LIDAR and a Projective Camera using a marker-less planar target, by exploiting Planar Surface Point to Plane and Planar Edge Point to back-projected Plane geometric constraints. The proposed method uses the data collected by placing the planar board at different poses in the common field of view of the LIDAR and the Camera. The steps include, detection of the target and the edges of the target in LIDAR and Camera frames, matching the detected planes and lines across both the sensing modalities and finally solving a cost function formed by the aforementioned geometric constraints that link the features detected in both the LIDAR and the Camera using non-linear least squares. We have extensively validated our algorithm using two Basler Cameras, Velodyne VLP-32 and Ouster OS1 LIDARs.

preprint2020arXiv

Ford Multi-AV Seasonal Dataset

This paper presents a challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles at different days and times during 2017-18. The vehicles traversed an average route of 66 km in Michigan that included a mix of driving scenarios such as the Detroit Airport, freeways, city-centers, university campus and suburban neighbourhoods, etc. Each vehicle used in this data collection is a Ford Fusion outfitted with an Applanix POS-LV GNSS system, four HDL-32E Velodyne 3D-lidar scanners, 6 Point Grey 1.3 MP Cameras arranged on the rooftop for 360-degree coverage and 1 Pointgrey 5 MP camera mounted behind the windshield for the forward field of view. We present the seasonal variation in weather, lighting, construction and traffic conditions experienced in dynamic urban environments. This dataset can help design robust algorithms for autonomous vehicles and multi-agent systems. Each log in the dataset is time-stamped and contains raw data from all the sensors, calibration values, pose trajectory, ground truth pose, and 3D maps. All data is available in Rosbag format that can be visualized, modified and applied using the open-source Robot Operating System (ROS). We also provide the output of state-of-the-art reflectivity-based localization for bench-marking purposes. The dataset can be freely downloaded at our website.

preprint2020arXiv

Mask & Focus: Conversation Modelling by Learning Concepts

Sequence to sequence models attempt to capture the correlation between all the words in the input and output sequences. While this is quite useful for machine translation where the correlation among the words is indeed quite strong, it becomes problematic for conversation modelling where the correlation is often at a much abstract level. In contrast, humans tend to focus on the essential concepts discussed in the conversation context and generate responses accordingly. In this paper, we attempt to mimic this response generating mechanism by learning the essential concepts in the context and response in an unsupervised manner. The proposed model, referred to as Mask \& Focus maps the input context to a sequence of concepts which are then used to generate the response concepts. Together, the context and the response concepts generate the final response. In order to learn context concepts from the training data automatically, we \emph{mask} words in the input and observe the effect of masking on response generation. We train our model to learn those response concepts that have high mutual information with respect to the context concepts, thereby guiding the model to \emph{focus} on the context concepts. Mask \& Focus achieves significant improvement over the existing baselines in several established metrics for dialogues.

preprint2020arXiv

Motion Guided LIDAR-camera Self-calibration and Accelerated Depth Upsampling for Autonomous Vehicles

This work proposes a novel motion guided method for target-less self-calibration of a LiDAR and camera and use the re-projection of LiDAR points onto the image reference frame for real-time depth upsampling. The calibration parameters are estimated by optimizing an objective function that penalizes distances between 2D and re-projected 3D motion vectors obtained from time-synchronized image and point cloud sequences. For upsampling, a simple, yet effective and time efficient formulation that minimizes depth gradients subject to an equality constraint involving the LiDAR measurements is proposed. Validation is performed on recorded real data from urban environments and demonstrations that our two methods are effective and suitable to mobile robotics and autonomous vehicle applications imposing real-time requirements is shown.

preprint2020arXiv

SEIR and Regression Model based COVID-19 outbreak predictions in India

COVID-19 pandemic has become a major threat to the country. Till date, well tested medication or antidote is not available to cure this disease. According to WHO reports, COVID-19 is a severe acute respiratory syndrome which is transmitted through respiratory droplets and contact routes. Analysis of this disease requires major attention by the Government to take necessary steps in reducing the effect of this global pandemic. In this study, outbreak of this disease has been analysed for India till 30th March 2020 and predictions have been made for the number of cases for the next 2 weeks. SEIR model and Regression model have been used for predictions based on the data collected from John Hopkins University repository in the time period of 30th January 2020 to 30th March 2020. The performance of the models was evaluated using RMSLE and achieved 1.52 for SEIR model and 1.75 for the regression model. The RMSLE error rate between SEIR model and Regression model was found to be 2.01. Also, the value of R0 which is the spread of the disease was calculated to be 2.02. Expected cases may rise between 5000-6000 in the next two weeks of time. This study will help the Government and doctors in preparing their plans for the next two weeks. Based on the predictions for short-term interval, these models can be tuned for forecasting in long-term intervals.

preprint2020arXiv

Unravelling the Architecture of Membrane Proteins with Conditional Random Fields

In this paper, we will show that the recently introduced graphical model: Conditional Random Fields (CRF) provides a template to integrate micro-level information about biological entities into a mathematical model to understand their macro-level behavior. More specifically, we will apply the CRF model to an important classification problem in protein science, namely the secondary structure prediction of proteins based on the observed primary structure. A comparison on benchmark data sets against twenty-eight other methods shows that not only does the CRF model lead to extremely accurate predictions but the modular nature of the model and the freedom to integrate disparate, overlapping and non-independent sources of information, makes the model an extremely versatile tool to potentially solve many other problems in bioinformatics.

preprint2016arXiv

On collapsed representation of hierarchical Completely Random Measures

The aim of the paper is to provide an exact approach for generating a Poisson process sampled from a hierarchical CRM, without having to instantiate the infinitely many atoms of the random measures. We use completely random measures~(CRM) and hierarchical CRM to define a prior for Poisson processes. We derive the marginal distribution of the resultant point process, when the underlying CRM is marginalized out. Using well known properties unique to Poisson processes, we were able to derive an exact approach for instantiating a Poisson process with a hierarchical CRM prior. Furthermore, we derive Gibbs sampling strategies for hierarchical CRM models based on Chinese restaurant franchise sampling scheme. As an example, we present the sum of generalized gamma process (SGGP), and show its application in topic-modelling. We show that one can determine the power-law behaviour of the topics and words in a Bayesian fashion, by defining a prior on the parameters of SGGP.

preprint2016arXiv

Variational methods for Conditional Multimodal Deep Learning

In this paper, we address the problem of conditional modality learning, whereby one is interested in generating one modality given the other. While it is straightforward to learn a joint distribution over multiple modalities using a deep multimodal architecture, we observe that such models aren't very effective at conditional generation. Hence, we address the problem by learning conditional distributions between the modalities. We use variational methods for maximizing the corresponding conditional log-likelihood. The resultant deep model, which we refer to as conditional multimodal autoencoder (CMMA), forces the latent representation obtained from a single modality alone to be `close' to the joint representation obtained from multiple modalities. We use the proposed model to generate faces from attributes. We show that the faces generated from attributes using the proposed model, are qualitatively and quantitatively more representative of the attributes from which they were generated, than those obtained by other deep generative models. We also propose a secondary task, whereby the existing faces are modified by modifying the corresponding attributes. We observe that the modifications in face introduced by the proposed model are representative of the corresponding modifications in attributes.

preprint2014arXiv

To go deep or wide in learning?

To achieve acceptable performance for AI tasks, one can either use sophisticated feature extraction methods as the first layer in a two-layered supervised learning model, or learn the features directly using a deep (multi-layered) model. While the first approach is very problem-specific, the second approach has computational overheads in learning multiple layers and fine-tuning of the model. In this paper, we propose an approach called wide learning based on arc-cosine kernels, that learns a single layer of infinite width. We propose exact and inexact learning strategies for wide learning and show that wide learning with single layer outperforms single layer as well as deep architectures of finite width for some benchmark datasets.

preprint2013arXiv

A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance.

preprint2013arXiv

Generative Maximum Entropy Learning for Multiclass Classification

Maximum entropy approach to classification is very well studied in applied statistics and machine learning and almost all the methods that exists in literature are discriminative in nature. In this paper, we introduce a maximum entropy classification method with feature selection for large dimensional data such as text datasets that is generative in nature. To tackle the curse of dimensionality of large data sets, we employ conditional independence assumption (Naive Bayes) and we perform feature selection simultaneously, by enforcing a `maximum discrimination' between estimated class conditional densities. For two class problems, in the proposed method, we use Jeffreys ($J$) divergence to discriminate the class conditional densities. To extend our method to the multi-class case, we propose a completely new approach by considering a multi-distribution divergence: we replace Jeffreys divergence by Jensen-Shannon ($JS$) divergence to discriminate conditional densities of multiple classes. In order to reduce computational complexity, we employ a modified Jensen-Shannon divergence ($JS_{GM}$), based on AM-GM inequality. We show that the resulting divergence is a natural generalization of Jeffreys divergence to a multiple distributions case. As far as the theoretical justifications are concerned we show that when one intends to select the best features in a generative maximum entropy approach, maximum discrimination using $J-$divergence emerges naturally in binary classification. Performance and comparative study of the proposed algorithms have been demonstrated on large dimensional text and gene expression datasets that show our methods scale up very well with large dimensional datasets.

preprint2013arXiv

Minimum Description Length Principle for Maximum Entropy Model Selection

Model selection is central to statistics, and many learning problems can be formulated as model selection problems. In this paper, we treat the problem of selecting a maximum entropy model given various feature subsets and their moments, as a model selection problem, and present a minimum description length (MDL) formulation to solve this problem. For this, we derive normalized maximum likelihood (NML) codelength for these models. Furthermore, we prove that the minimax entropy principle is a special case of maximum entropy model selection, where one assumes that complexity of all the models are equal. We apply our approach to gene selection problem and present simulation results.

preprint2012arXiv

Enhancing the functional content of protein interaction networks

Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in these networks, they face important data quality challenges of noise and incompleteness that adversely affect the results obtained from their analysis. Here, we explore the use of the concept of common neighborhood similarity (CNS), which is a form of local structure in networks, to address these issues. Although several CNS measures have been proposed in the literature, an understanding of their relative efficacies for the analysis of interaction networks has been lacking. We follow the framework of graph transformation to convert the given interaction network into a transformed network corresponding to a variety of CNS measures evaluated. The effectiveness of each measure is then estimated by comparing the quality of protein function predictions obtained from its corresponding transformed network with those from the original network. Using a large set of S. cerevisiae interactions, and a set of 136 GO terms, we find that several of the transformed networks produce more accurate predictions than those obtained from the original network. In particular, the $HC.cont$ measure proposed here performs particularly well for this task. Further investigation reveals that the two major factors contributing to this improvement are the abilities of CNS measures, especially $HC.cont$, to prune out noisy edges and introduce new links between functionally related proteins.

Gaurav Pandey

What is connected

Connect this record

See the researcher in context

Building this map preview

22 published item(s)

Mix-and-Match: Scalable Dialog Response Retrieval using Gaussian Mixture Embeddings

Real-time Full-stack Traffic Scene Perception for Autonomous Driving with Roadside Cameras

Variational Learning for Unsupervised Knowledge Grounded Dialogs

Developing parsimonious ensembles using ensemble diversity within a reinforcement learning framework

Developing parsimonious ensembles using predictor diversity within a reinforcement learning framework

Real Time Incremental Foveal Texture Mapping for Autonomous Vehicles

Aerial Imagery based LIDAR Localization for Autonomous Vehicles

Deep-Geometric 6 DoF Localization from a Single Image in Topo-metric Maps

Experimental Evaluation of 3D-LIDAR Camera Extrinsic Calibration

Extrinsic Calibration of a 3D-LIDAR and a Camera

Ford Multi-AV Seasonal Dataset

Mask & Focus: Conversation Modelling by Learning Concepts

Motion Guided LIDAR-camera Self-calibration and Accelerated Depth Upsampling for Autonomous Vehicles

SEIR and Regression Model based COVID-19 outbreak predictions in India

Unravelling the Architecture of Membrane Proteins with Conditional Random Fields

On collapsed representation of hierarchical Completely Random Measures

Variational methods for Conditional Multimodal Deep Learning

To go deep or wide in learning?

A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

Generative Maximum Entropy Learning for Multiclass Classification

Minimum Description Length Principle for Maximum Entropy Model Selection

Enhancing the functional content of protein interaction networks