Source author record

Ambuj K. Singh

Ambuj K. Singh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Databases, Social and Information Networks, Information Retrieval, physics.soc-ph, Artificial Intelligence, Computer Vision, Machine Learning, Multimedia, Neurons and Cognition

Catalog footprint

What is connected

14 works

9 topics

4 close collaborators


preprint, 2020, arXiv

DANR: Discrepancy-aware Network Regularization

Network regularization is an effective tool for incorporating structural prior knowledge to learn coherent models over networks, and has yielded provably accurate estimates in applications ranging from spatial economics to neuroimaging studies. Recently, there has been an increasing interest in extending network regularization to the spatio-temporal case to accommodate the evolution of networks. However, in both static and spatio-temporal cases, missing or corrupted edge weights can compromise the ability of network regularization to discover desired solutions. To address these gaps, we propose a novel approach, discrepancy-aware network regularization (DANR), that is robust to inadequate regularizations and effectively captures model evolution and structural changes over spatio-temporal networks. We develop a distributed and scalable algorithm based on the alternating direction method of multipliers (ADMM) to solve the proposed problem with guaranteed convergence to globally optimal solutions. Experimental results on both synthetic and real-world networks demonstrate that our approach achieves improved performance on various tasks, and enables interpretation of model changes in evolving networks.
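To make the role of edge weights concrete, here is a minimal sketch of a generic network-regularized objective: a per-node loss plus a weighted penalty on disagreement between neighboring node models. This is not the DANR formulation, which the abstract does not spell out. The scalar node models, the squared loss, the absolute-difference penalty and the function name are all assumptions chosen for illustration.

```python
def network_regularized_objective(x, targets, edges, lam):
    """Per-node squared loss plus a weighted penalty on neighbor disagreement.

    x, targets: dict mapping node -> float (hypothetical scalar models and data).
    edges: dict mapping (node_i, node_j) -> edge weight.
    lam: regularization strength.
    """
    local = sum((x[i] - targets[i]) ** 2 for i in x)
    penalty = sum(w * abs(x[i] - x[j]) for (i, j), w in edges.items())
    return local + lam * penalty


# Hypothetical three-node chain.
x = {"a": 1.0, "b": 1.0, "c": 3.0}
targets = {"a": 1.0, "b": 2.0, "c": 3.0}
edges = {("a", "b"): 1.0, ("b", "c"): 0.5}
clean = network_regularized_objective(x, targets, edges, lam=2.0)

# A corrupted weight on one edge inflates the penalty for the same models.
corrupted_edges = {("a", "b"): 1.0, ("b", "c"): 5.0}
corrupted = network_regularized_objective(x, targets, corrupted_edges, lam=2.0)
```

In this toy setting the same node models score very differently once a single weight is wrong, which is the kind of sensitivity the abstract says DANR is designed to withstand.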

preprint, 2020, arXiv

The 1995-2018 Global Evolution of the Network of Amicable and Hostile Relations Among Nation-States

We draw on the data collected by the Integrated Crisis Early Warning System on millions of international and regional public news stories, and this system's indicators of the orientation toward a specific nation-state. We construct the networks of international amicable and hostile relations among nation-states that occur in specific time-periods in order to study the global evolution of the network of such international appraisals. Our analysis presents evidence of an evolution of the structure of this network and a model of the probabilistic micro-dynamics of the alterations of international appraisals during the 1995-2018 span of the available data. Our research provides empirical findings on long-standing debates in the interdisciplinary field of work on Structural Balance Theory. Also remarkably, we find that the trajectory of the Frobenius norm of sequential transition probabilities, which govern the evolution of international appraisals among nations, dramatically stabilizes.
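As a small illustration of the quantity named in the abstract, the sketch below computes the Frobenius norm of each matrix in a time-ordered sequence of transition-probability matrices. The matrices, the two-state layout and the function names are invented for illustration. Reading "trajectory of the Frobenius norm" as one norm per period is an assumption, and the paper may define the trajectory differently.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def norm_trajectory(matrices):
    """Frobenius norm of each matrix in a time-ordered sequence."""
    return [frobenius_norm(m) for m in matrices]


# Hypothetical 2x2 transition matrices over the states (amicable, hostile).
periods = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.9, 0.1], [0.2, 0.8]],
]
trajectory = norm_trajectory(periods)
```

A trajectory whose later values stop changing, as the last two entries do here, is what a stabilizing norm would look like in this simplified reading.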

preprint, 2016, arXiv

Prediction-based Online Trajectory Compression

Recent spatio-temporal data applications, such as car-sharing and smart cities, impose new challenges regarding the scalability and timeliness of data processing systems. Trajectory compression is a promising approach for scaling up spatio-temporal databases. However, existing techniques fail to address the online setting, in which a compressed version of a trajectory stream has to be maintained over time. In this paper, we introduce ONTRAC, a new framework for map-matched online trajectory compression. ONTRAC learns prediction models for suppressing updates to a trajectory database using training data. Two prediction schemes are proposed, one for road segments via a Markov model and another for travel-times by combining Quadratic Programming and Expectation Maximization. Experiments show that ONTRAC outperforms the state-of-the-art offline technique even when long update delays (4 minutes) are allowed and achieves up to 21 times higher compression ratio for travel-times. Moreover, our approach increases database scalability by up to one order of magnitude.
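The idea of suppressing updates with a prediction model can be sketched as follows, assuming a first-order Markov model that predicts the most frequent successor of each road segment. This is an illustrative simplification, not ONTRAC itself: the function names, the segment labels and the choice of a first-order model are assumptions, and travel-time prediction is left out.

```python
from collections import Counter, defaultdict


def train_markov(trajectories):
    """Map each segment to its most frequent successor in the training data."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    return {seg: c.most_common(1)[0][0] for seg, c in counts.items()}


def compress(trajectory, model):
    """Keep the first segment plus every segment the model fails to predict."""
    if not trajectory:
        return []
    kept = [(0, trajectory[0])]
    for i in range(1, len(trajectory)):
        if model.get(trajectory[i - 1]) != trajectory[i]:
            kept.append((i, trajectory[i]))
    return kept


def decompress(kept, length, model):
    """Rebuild the trajectory, predicting wherever no update was stored."""
    updates = dict(kept)
    out = []
    for i in range(length):
        out.append(updates[i] if i in updates else model[out[-1]])
    return out


# Hypothetical map-matched trajectories over road segments a..e.
training = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "c", "d"]]
model = train_markov(training)
stream = ["a", "b", "c", "e"]
kept = compress(stream, model)  # only the start and the mispredicted "e" are stored
restored = decompress(kept, len(stream), model)
```

Only positions where the prediction fails are written to the database, so a trajectory that follows common routes costs very few updates.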

preprint, 2015, arXiv

Behavior Query Discovery in System-Generated Temporal Graphs

Computer system monitoring generates huge amounts of logs that record the interaction of system entities. How to query such data to better understand system behaviors and identify potential system risks and malicious behaviors becomes a challenging task for system administrators due to the dynamics and heterogeneity of the data. System monitoring data are essentially heterogeneous temporal graphs with nodes being system entities and edges being their interactions over time. Given the complexity of such graphs, it becomes time-consuming for system administrators to manually formulate useful queries in order to examine abnormal activities, attacks, and vulnerabilities in computer systems. In this work, we investigate how to query temporal graphs and treat query formulation as a discriminative temporal graph pattern mining problem. We introduce TGMiner to mine discriminative patterns from system logs, and these patterns can be taken as templates for building more complex queries. TGMiner leverages temporal information in graphs to prune graph patterns that share similar growth trends without compromising pattern quality. Experimental results on real system data show that TGMiner is 6-32 times faster than baseline methods. The discovered patterns were verified by system experts; they achieved high precision (97%) and recall (91%).

preprint, 2015, arXiv

Discriminative Subnetworks with Regularized Spectral Learning for Global-state Network Data

Data mining practitioners are facing challenges from data with network structure. In this paper, we address a specific class of global-state networks, which comprises a set of network instances sharing a similar structure yet having different values at local nodes. Each instance is associated with a global state which indicates the occurrence of an event. The objective is to uncover a small set of discriminative subnetworks that can optimally classify global network values. Unlike most existing studies, which explore an exponential subnetwork space, we address this difficult problem by adopting a space transformation approach. Specifically, we present an algorithm that optimizes a constrained dual-objective function to learn a low-dimensional subspace that is capable of discriminating networks labelled by different global states, while reconciling with the common network topology shared across instances. Our algorithm takes an appealing approach from spectral graph learning and we show that the globally optimum solution can be achieved via matrix eigen-decomposition.

preprint, 2014, arXiv

Learning about Learning: Human Brain Sub-Network Biomarkers in fMRI Data

It has become increasingly popular to study the brain as a network due to the realization that functionality cannot be explained exclusively by independent activation of specialized regions. Instead, across a large spectrum of behaviors, function arises due to the dynamic interactions between brain regions. The existing literature on functional brain networks focuses mainly on a battery of network properties characterizing the "resting state" using for example the modularity, clustering, or path length among regions. In contrast, we seek to uncover subgraphs of functional connectivity that predict or drive individual differences in sensorimotor learning across subjects. We employ a principled approach for the discovery of significant subgraphs of functional connectivity, induced by brain activity (measured via fMRI imaging) while subjects perform a motor learning task. Our aim is to uncover patterns of functional connectivity that discriminate between high and low rates of learning among subjects. The discovery of such significant discriminative subgraphs promises a better data-driven understanding of the dynamic brain processes associated with brain plasticity.

preprint, 2014, arXiv

Nearest Keyword Set Search in Multi-dimensional Datasets

Keyword-based search in text-rich multi-dimensional datasets facilitates many novel applications and tools. In this paper, we consider objects that are tagged with keywords and are embedded in a vector space. For these datasets, we study queries that ask for the tightest groups of points satisfying a given set of keywords. We propose a novel method called ProMiSH (Projection and Multi Scale Hashing) that uses random projection and hash-based index structures, and achieves high scalability and speedup. We present an exact and an approximate version of the algorithm. Our empirical studies, both on real and synthetic datasets, show that ProMiSH has a speedup of more than four orders of magnitude over state-of-the-art tree-based techniques. Our scalability tests on datasets of sizes up to 10 million and dimensions up to 100 for queries having up to 9 keywords show that ProMiSH scales linearly with the dataset size, the dataset dimension, the query size, and the result size.
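The query itself can be stated as a brute-force baseline, which helps show what an index such as ProMiSH has to accelerate. The sketch below is not ProMiSH: it uses no projection or hashing and enumerates every combination. Measuring tightness by the group's diameter (largest pairwise Euclidean distance) is an assumption, as are the function names and the sample data.

```python
import math
from itertools import combinations, product


def diameter(points):
    """Largest pairwise Euclidean distance; 0.0 for a single point."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def nearest_keyword_set(objects, query):
    """Exhaustively find the tightest group of objects covering all query keywords.

    objects: list of (point, set_of_keywords).
    Returns (diameter, sorted object indices), or None if a keyword is absent.
    """
    candidates = []
    for keyword in query:
        ids = [i for i, (_, kws) in enumerate(objects) if keyword in kws]
        if not ids:
            return None
        candidates.append(ids)
    best = None
    # Pick one object per keyword; the same object may cover several keywords.
    for combo in product(*candidates):
        group = sorted(set(combo))
        d = diameter([objects[i][0] for i in group])
        if best is None or d < best[0]:
            best = (d, group)
    return best


# Hypothetical tagged points in the plane.
objects = [
    ((0, 0), {"cafe"}),
    ((0, 1), {"wifi"}),
    ((5, 5), {"cafe", "wifi"}),
    ((9, 9), {"park"}),
]
result = nearest_keyword_set(objects, ["cafe", "park"])
```

The number of combinations grows with the product of the per-keyword candidate lists, which is why the exhaustive form is only practical on tiny inputs.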

preprint, 2013, arXiv

The Social Media Genome: Modeling Individual Topic-Specific Behavior in Social Media

Information propagation in social media depends not only on the static follower structure but also on the topic-specific user behavior. Hence novel models incorporating dynamic user behavior are needed. To this end, we propose a model for individual social media users, termed a genotype. The genotype is a per-topic summary of a user's interest, activity and susceptibility to adopt new information. We demonstrate that user genotypes remain invariant within a topic by adopting them for classification of new information spread in large-scale real networks. Furthermore, we extract topic-specific influence backbone structures based on information adoption and show that they differ significantly from the static follower network. When employed for influence prediction of new content spread, our genotype model and influence backbones enable more than 20% improvement, compared to purely structural features. We also demonstrate that knowledge of user genotypes and influence backbones allows for the design of effective strategies for latency minimization of topic-specific information spread.

preprint, 2012, arXiv

Inferring the Underlying Structure of Information Cascades

In social networks, information and influence diffuse among users as cascades. While the importance of studying cascades has been recognized in various applications, it is difficult to observe the complete structure of cascades in practice. Moreover, much less is known on how to infer cascades based on partial observations. In this paper we study the cascade inference problem following the independent cascade model, and provide a full treatment from complexity to algorithms: (a) We propose the idea of consistent trees as the inferred structures for cascades; these trees connect source nodes and observed nodes with paths satisfying the constraints from the observed temporal information. (b) We introduce metrics to measure the likelihood of consistent trees as inferred cascades, as well as several optimization problems for finding them. (c) We show that the decision problems for consistent trees are in general NP-complete, and that the optimization problems are hard to approximate. (d) We provide approximation algorithms with performance guarantees on the quality of the inferred cascades, as well as heuristics. We experimentally verify the efficiency and effectiveness of our inference algorithms, using real and synthetic data.

preprint, 2011, arXiv

Answering Top-k Queries Over a Mixture of Attractive and Repulsive Dimensions

In this paper, we formulate a top-k query that compares objects in a database to a user-provided query object on a novel scoring function. The proposed scoring function combines the idea of attractive and repulsive dimensions into a general framework to overcome the weakness of traditional distance or similarity measures. We study the properties of the proposed class of scoring functions and develop efficient and scalable index structures that index the isolines of the function. We demonstrate various scenarios where the query finds application. Empirical evaluation demonstrates a performance gain of one to two orders of magnitude on querying time over existing state-of-the-art top-k techniques. Further, a qualitative analysis is performed on a real dataset to highlight the potential of the proposed query in discovering hidden data characteristics.

preprint, 2011, arXiv

Indexing the Earth Mover's Distance Using Normal Distributions

Querying uncertain data sets (represented as probability distributions) presents many challenges due to the large amount of data involved and the difficulties comparing uncertainty between distributions. The Earth Mover's Distance (EMD) has increasingly been employed to compare uncertain data due to its ability to effectively capture the differences between two distributions. Computing the EMD entails finding a solution to the transportation problem, which is computationally intensive. In this paper, we propose a new lower bound to the EMD and an index structure to significantly improve the performance of EMD based K-nearest neighbor (K-NN) queries on uncertain databases. We propose a new lower bound to the EMD that approximates the EMD on a projection vector. Each distribution is projected onto a vector and approximated by a normal distribution, as well as an accompanying error term. We then represent each normal as a point in a Hough transformed space. We then use the concept of stochastic dominance to implement an efficient index structure in the transformed space. We show that our method significantly decreases K-NN query time on uncertain databases. The index structure also scales well with database cardinality. It is well suited for heterogeneous data sets, helping to keep EMD based queries tractable as uncertain data sets become larger and more complex.
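Part of why a one-dimensional projection is attractive is that the transportation problem has a closed form in one dimension: for two histograms of equal total mass on the same unit-spaced bins, the EMD equals the sum of absolute differences between their cumulative sums. The sketch below shows only that one-dimensional computation. It is not the paper's lower bound, normal approximation or index structure, and the function name and unit bin spacing are assumptions.

```python
def emd_1d(p, q):
    """EMD between two equal-mass histograms on the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    total = 0.0
    carry = 0.0  # mass that still has to be moved past the current bin
    for a, b in zip(p, q):
        carry += a - b
        total += abs(carry)
    return total


# Moving all mass two bins to the right costs 2.0.
shift_two = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

This runs in time linear in the number of bins, in contrast to solving the general transportation problem.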

preprint, 2010, arXiv

Finding top-k similar pairs of objects annotated with terms from an ontology

With the growing focus on semantic searches and interpretations, an increasing number of standardized vocabularies and ontologies are being designed and used to describe data. We investigate the querying of objects described by a tree-structured ontology. Specifically, we consider the case of finding the top-k best pairs of objects that have been annotated with terms from such an ontology when the object descriptions are available only at runtime. We consider three distance measures. The first one defines the object distance as the minimum pairwise distance between the sets of terms describing them, and the second one defines the distance as the average pairwise term distance. The third and most useful distance measure, earth mover's distance, finds the best way of matching the terms and computes the distance corresponding to this best matching. We develop lower bounds that can be aggregated progressively and utilize them to speed up the search for top-k object pairs when the earth mover's distance is used. For the minimum pairwise distance, we devise an algorithm that runs in O(D + Tk log k) time, where D is the total information size and T is the total number of terms in the ontology. We also develop a novel best-first search strategy for the average pairwise distance that utilizes lower bounds generated in an ordered manner. Experiments on real and synthetic datasets demonstrate the practicality and scalability of our algorithms.
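The first two distance measures described above can be sketched directly. The example assumes that the distance between two terms is the number of edges on the path between them in the ontology tree, which the abstract does not specify. The parent-pointer representation, the function names and the toy ontology are likewise illustrative, and the earth mover's distance variant and the top-k search are not shown.

```python
def ancestors(term, parent):
    """Path from a term up to the root, inclusive."""
    path = [term]
    while term in parent:
        term = parent[term]
        path.append(term)
    return path


def term_distance(a, b, parent):
    """Number of edges between two terms in the ontology tree (assumed metric)."""
    depth_from_a = {t: d for d, t in enumerate(ancestors(a, parent))}
    for d, t in enumerate(ancestors(b, parent)):
        if t in depth_from_a:
            return d + depth_from_a[t]
    raise ValueError("terms are not in the same tree")


def min_pairwise(x, y, parent):
    """Minimum term distance over all pairs drawn from the two term sets."""
    return min(term_distance(a, b, parent) for a in x for b in y)


def avg_pairwise(x, y, parent):
    """Average term distance over all pairs drawn from the two term sets."""
    total = sum(term_distance(a, b, parent) for a in x for b in y)
    return total / (len(x) * len(y))


# Hypothetical tree-structured ontology given as child -> parent.
parent = {
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "eagle": "bird",
}
closest = min_pairwise({"dog"}, {"cat", "eagle"}, parent)
average = avg_pairwise({"dog"}, {"cat", "eagle"}, parent)
```

The minimum rewards a single close pair of terms, while the average is pulled up by every distant pair, so the two measures can rank the same object pairs differently.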

preprint, 2010, arXiv

Profile Based Sub-Image Search in Image Databases

Sub-image search with high accuracy in natural images remains a challenging problem. This paper proposes a new feature vector called profile for a keypoint in a bag of visual words model of an image. The profile of a keypoint captures the spatial geometry of all the other keypoints in an image with respect to itself, and is very effective in discriminating true matches from false matches. Sub-image search using profiles is a single-phase process requiring no geometric validation, yields high precision on natural images, and works well with a small visual codebook. The proposed search technique differs from traditional methods that first generate a set of candidates disregarding spatial information and then verify them geometrically. Conventional methods also use large codebooks. We achieve a precision of 81% on a combined data set of synthetic and real natural images using a codebook size of 500 for top-10 queries; that is 31% higher than the conventional candidate generation approach.

preprint, 2009, arXiv

Finding Significant Subregions in Large Image Databases

Images have become an important data source in many scientific and commercial domains. Analysis and exploration of image collections often requires the retrieval of the best subregions matching a given query. The support of such content-based retrieval requires not only the formulation of an appropriate scoring function for defining relevant subregions but also the design of new access methods that can scale to large databases. In this paper, we propose a solution to this problem of querying significant image subregions. We design a scoring scheme to measure the similarity of subregions. Our similarity measure extends to any image descriptor. All the images are tiled and each alignment of the query and a database image produces a tile score matrix. We show that the problem of finding the best connected subregion from this matrix is NP-hard and develop a dynamic programming heuristic. With this heuristic, we develop two index based scalable search strategies, TARS and SPARS, to query patterns in a large image repository. These strategies are general enough to work with other scoring schemes and heuristics. Experimental results on real image datasets show that TARS saves more than 87% query time on small queries, and SPARS saves up to 52% query time on large queries as compared to linear search. Qualitative tests on synthetic and real datasets achieve precision of more than 80%.
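To give a feel for searching a tile score matrix, the sketch below solves a deliberately simpler problem: finding the axis-aligned rectangle of tiles with the highest total score, using Kadane's scan over collapsed rows. This is a swapped-in technique, not the paper's method. The paper targets arbitrary connected subregions, which it shows to be NP-hard, and its dynamic programming heuristic is not described in the abstract. The function name and the sample scores are assumptions.

```python
def best_rectangle(scores):
    """Return (best_sum, (top, left, bottom, right)) over all non-empty rectangles."""
    rows, cols = len(scores), len(scores[0])
    best = None
    for top in range(rows):
        col_sums = [0.0] * cols
        for bottom in range(top, rows):
            # Collapse rows top..bottom into one row of column sums.
            for c in range(cols):
                col_sums[c] += scores[bottom][c]
            # Kadane's scan: best contiguous run of columns for this row band.
            run, start = 0.0, 0
            for c in range(cols):
                if run <= 0:
                    run, start = col_sums[c], c
                else:
                    run += col_sums[c]
                if best is None or run > best[0]:
                    best = (run, (top, start, bottom, c))
    return best


# Hypothetical tile scores for one alignment of a query and a database image.
tile_scores = [
    [1, -2, 0],
    [-1, 3, 4],
    [0, 2, -5],
]
best = best_rectangle(tile_scores)
```

Restricting the shape to rectangles is what makes an exact polynomial-time answer possible here; a connected region of arbitrary shape could score higher.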
