Source author record

Venkatesh Saligrama

Venkatesh Saligrama appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; Computer Vision; Information Theory; math.IT; Computation and Language; math.OC; math.ST; Statistics Theory; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Information Retrieval; Discrete Mathematics; Human-Computer Interaction; Networking and Internet Architecture; Neural and Evolutionary Computing; Social and Information Networks; Systems and Control

Catalog footprint

What is connected

63 works

17 topics

4 close collaborators

preprint 2026 arXiv

Data Deletion Can Help in Adaptive RL

Deploying reinforcement learning policies in the real world requires adapting to time-varying environments. We study this problem in the contextual Markov Decision Process (cMDP) framework, where a family of environments is indexed by a low-dimensional context unknown at test time. The standard approach decomposes the problem: train a so-called "universal policy" which assumes knowledge of the true context, then pair it with a context estimator which approximates context using the observed trajectory. We identify a simple, counterintuitive trick that substantially improves the estimator: randomly delete a fraction of the training buffer after each round. This works because data is collected across multiple rounds using progressively better policies, and older trajectories come from a different distribution than what the estimator will face at deployment time; random deletion creates an implicit exponential decay on older data while preserving diversity, without requiring any explicit identification of which samples are stale. This reduces the robustness gap by 30% for MLPs and by 6% on average for recurrent networks. Strikingly, it allows a narrow MLP with 5x fewer parameters to outperform a wide MLP trained without deletion. To understand when and why deletion helps, we analyze regularized empirical risk minimization with a mismatch between the train distribution and the distribution at deployment; in this idealized setting, we prove that removing a single uniformly random training point decreases expected test loss in expectation under mild conditions. For ridge regression we make this quantitative: deletion helps when the regularization coefficient is moderate and the signal-to-noise ratio (SNR) is sufficiently low, and, crucially, this SNR threshold gives a direct measure of how large the distribution mismatch between training and deployment must be for deletion to be beneficial.
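The deletion step can be illustrated with a short sketch. This is not the authors' implementation: the buffer layout, the helper names `delete_fraction`, `expected_survival` and `collect_round`, and the deletion fraction `p = 0.5` are all assumptions chosen for the example. It only shows how deleting a uniformly random fraction after every round leaves geometrically fewer samples from older rounds.

```python
import random

def delete_fraction(buffer, p, rng):
    """Return a copy of `buffer` with a uniformly random fraction `p` removed."""
    keep = len(buffer) - int(p * len(buffer))
    # rng.sample draws without replacement, so every sample is equally likely to survive.
    return rng.sample(buffer, keep)

def expected_survival(p, age):
    """Expected fraction of a round's data still present after `age` deletion passes."""
    return (1.0 - p) ** age

def collect_round(round_id, n):
    """Hypothetical stand-in for trajectory collection: tag each sample with its round."""
    return [(round_id, i) for i in range(n)]

rng = random.Random(0)
buffer = []
for round_id in range(5):
    buffer.extend(collect_round(round_id, 1000))
    buffer = delete_fraction(buffer, p=0.5, rng=rng)

# Number of surviving samples per collection round (oldest round first).
counts = [sum(1 for r, _ in buffer if r == k) for k in range(5)]
```

No sample is ever inspected to decide whether it is stale; the decay on older rounds comes only from the fact that they have been through more deletion passes.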

preprint 2022 arXiv

Faster Algorithms for Learning Convex Functions

The task of approximating an arbitrary convex function arises in several learning problems such as convex regression, learning with a difference of convex (DC) functions, and learning Bregman or $f$-divergences. In this paper, we develop and analyze an approach for solving a broad range of convex function learning problems that is faster than state-of-the-art approaches. Our approach is based on a 2-block ADMM method where each block can be computed in closed form. For the task of convex Lipschitz regression, we establish that our proposed algorithm converges with iteration complexity of $O(n\sqrt{d}/ε)$ for a dataset $\bm{X} \in \mathbb{R}^{n\times d}$ and $ε > 0$. Combined with per-iteration computation complexity, our method converges with the rate $O(n^3 d^{1.5}/ε+n^2 d^{2.5}/ε+n d^3/ε)$. This new rate improves the state-of-the-art rate of $O(n^5d^2/ε)$ if $d = o(n^4)$. Further, we provide similar solvers for DC regression and Bregman divergence learning. Unlike previous approaches, our method is amenable to the use of GPUs. We demonstrate on regression and metric learning experiments that our approach is over 100 times faster than existing approaches on some data sets, and produces results that are comparable to the state of the art.
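The condition on $d$ can be checked by dividing each term of the new rate by the previous rate. The worked comparison below uses only the two rates quoted in the abstract; the common factor $1/ε$ cancels, and limits are taken as $n \to \infty$.

```latex
\begin{align*}
\frac{n^3 d^{1.5}}{n^5 d^{2}} &= \frac{1}{n^{2} d^{0.5}} \to 0 \quad \text{for every } d \ge 1,\\
\frac{n^2 d^{2.5}}{n^5 d^{2}} &= \frac{d^{0.5}}{n^{3}} \to 0 \iff d = o(n^{6}),\\
\frac{n\, d^{3}}{n^5 d^{2}} &= \frac{d}{n^{4}} \to 0 \iff d = o(n^{4}).
\end{align*}
```

The third term is the binding one, which is why the improvement is stated for $d = o(n^4)$.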

preprint 2022 arXiv

FedHeN: Federated Learning in Heterogeneous Networks

We propose a novel training recipe for federated learning with heterogeneous networks where each device can have different architectures. We introduce training with a side objective to the devices of higher complexities to jointly train different architectures in a federated setting. We empirically show that our approach improves the performance of different architectures and leads to high communication savings compared to the state-of-the-art methods.

preprint 2022 arXiv

Learning Compositional Representations for Effective Low-Shot Generalization

We propose Recognition as Part Composition (RPC), an image encoding approach inspired by human cognition. It is based on the cognitive theory that humans recognize complex objects by components, and that they build a small compact vocabulary of concepts with which to represent each instance. RPC encodes images by first decomposing them into salient parts, and then encoding each part as a mixture of a small number of prototypes, each representing a certain concept. We find that this type of learning inspired by human cognition can overcome hurdles faced by deep convolutional networks in low-shot generalization tasks, like zero-shot learning, few-shot learning and unsupervised domain adaptation. Furthermore, we find that a classifier using an RPC image encoder is fairly robust to adversarial attacks, to which deep neural networks are known to be prone. Given that our image encoding principle is based on human cognition, one would expect the encodings to be interpretable by humans, which we find to be the case via crowd-sourcing experiments. Finally, we propose an application of these interpretable encodings in the form of generating synthetic attribute annotations for evaluating zero-shot learning methods on new datasets.

preprint 2022 arXiv

Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. Each arm is associated with an unknown law on safety risks and rewards, and the learner's goal is to maximise reward whilst not playing unsafe arms, as determined by a given threshold on the mean risk. We formulate a pseudo-regret for this setting that enforces this safety constraint in a per-round way by softly penalising any violation, regardless of the gain in reward due to the same. This has practical relevance to scenarios such as clinical trials, where one must maintain safety for each round rather than in an aggregated sense. We describe doubly optimistic strategies for this scenario, which maintain optimistic indices for both safety risk and reward. We show that schema based on both frequentist and Bayesian indices satisfy tight gap-dependent logarithmic regret bounds, and further that these play unsafe arms only logarithmically many times in total. This theoretical analysis is complemented by simulation studies demonstrating the effectiveness of the proposed schema, and probing the domains in which their use is appropriate.

preprint 2022 arXiv

Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data

Pre-training models on Imagenet or other massive datasets of real images has led to major advances in computer vision, albeit accompanied by shortcomings related to curation cost, privacy, usage rights, and ethical issues. In this paper, for the first time, we study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks from very different domains. In using such synthetic data for pre-training, we find that downstream performance on different tasks is favored by different configurations of simulation parameters (e.g. lighting, object pose, backgrounds, etc.), and that there is no one-size-fits-all solution. It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance. We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters to generate synthetic pre-training data for them. Task2Sim learns this mapping by training to find the set of best parameters on a set of "seen" tasks. Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot, without requiring additional training. Given a budget in number of images per class, our extensive experiments with 20 diverse downstream tasks show Task2Sim's task-adaptive pre-training data results in significantly better downstream performance than non-adaptively choosing simulation parameters on both seen and unseen tasks. It is even competitive with pre-training on real images from Imagenet.

preprint 2020 arXiv

Budget Learning via Bracketing

Conventional machine learning applications in the mobile/IoT setting transmit data to a cloud-server for predictions. Due to cost considerations (power, latency, monetary), it is desirable to minimise device-to-server transmissions. The budget learning (BL) problem poses the learner's goal as minimising use of the cloud while suffering no discernible loss in accuracy, under the constraint that the methods employed be edge-implementable. We propose a new formulation for the BL problem via the concept of bracketings. Concretely, we propose to sandwich the cloud's prediction, $g$, via functions $h^-, h^+$ from a "simple" class so that $h^- \le g \le h^+$ nearly always. On an instance $x$, if $h^+(x)=h^-(x)$, we leverage local processing, and bypass the cloud. We explore theoretical aspects of this formulation, providing PAC-style learnability definitions; associating the notion of budget learnability to approximability via brackets; and giving VC-theoretic analyses of their properties. We empirically validate our theory on real-world datasets, demonstrating improved performance over prior gating based methods.
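The bracketing rule lends itself to a small sketch. Everything concrete below is an assumption made for illustration: the one-dimensional threshold models standing in for $g$, $h^-$ and $h^+$, the helper name `bracketed_predict`, and the toy inputs. The sketch only shows the decision rule "answer locally when $h^+(x)=h^-(x)$, otherwise query the cloud".

```python
def bracketed_predict(x, h_minus, h_plus, cloud):
    """Predict locally when the brackets agree; otherwise fall back to the cloud model.

    Returns (prediction, used_cloud).
    """
    lo, hi = h_minus(x), h_plus(x)
    if lo == hi:
        # h_minus <= g <= h_plus together with lo == hi pins down g(x) locally.
        return lo, False
    return cloud(x), True

# Hypothetical 1-D example: the cloud model g thresholds at 0.5,
# the simple brackets threshold at 0.7 (lower) and 0.3 (upper).
g = lambda x: int(x >= 0.5)
h_minus = lambda x: int(x >= 0.7)  # h_minus(x) <= g(x) for all x
h_plus = lambda x: int(x >= 0.3)   # h_plus(x) >= g(x) for all x

inputs = [0.1, 0.4, 0.6, 0.9]
results = [bracketed_predict(x, h_minus, h_plus, g) for x in inputs]
cloud_usage = sum(used for _, used in results) / len(inputs)
```

In this toy setting the cloud is queried only for inputs between the two thresholds, and every local answer coincides with what the cloud model would have returned.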

preprint 2020 arXiv

Dont Even Look Once: Synthesizing Features for Zero-Shot Detection

Zero-shot detection, namely localizing both seen and unseen objects, is increasingly important for large-scale applications with a large number of object classes, since collecting sufficient annotated data with ground truth bounding boxes is simply not scalable. While vanilla deep neural networks deliver high performance for objects available during training, unseen object detection degrades significantly. At a fundamental level, while vanilla detectors are capable of proposing bounding boxes, which include unseen objects, they are often incapable of assigning high confidence to unseen objects, due to the inherent precision/recall tradeoffs that require rejecting background objects. We propose a novel detection algorithm, Dont Even Look Once (DELO), that synthesizes visual features for unseen objects and augments existing training algorithms to incorporate unseen object detection. Our proposed scheme is evaluated on Pascal VOC and MSCOCO, and we demonstrate significant improvements in test accuracy over vanilla and other state-of-the-art zero-shot detectors.

preprint 2020 arXiv

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers

We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) does not have a bipartite connected component. We then propose a projected gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets.

preprint 2016 arXiv

Clustering and Community Detection with Imbalanced Clusters

Spectral clustering methods, which are frequently used in clustering and community detection applications, are sensitive to the specific graph constructions, particularly when imbalanced clusters are present. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection.

preprint 2016 arXiv

Efficient Training of Very Deep Neural Networks for Supervised Hashing

In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively "shallow" networks limited by the issues arising in back propagation (e.g. vanishing gradients) as well as computational efficiency. We propose a novel and efficient training algorithm inspired by the alternating direction method of multipliers (ADMM) that overcomes some of these limitations. Our method decomposes the training process into independent layer-wise local updates through auxiliary variables. Empirically we observe that our training algorithm always converges and its computational complexity is linearly proportional to the number of edges in the networks. Empirically we manage to train DNNs with 64 hidden layers and 1024 nodes per layer for supervised hashing in about 3 hours using a single GPU. Our proposed very deep supervised hashing (VDSH) method significantly outperforms the state-of-the-art on several benchmark datasets.

preprint 2016 arXiv

Learning Joint Feature Adaptation for Zero-Shot Recognition

Zero-shot recognition (ZSR) aims to recognize target-domain data instances of unseen classes based on the models learned from associated pairs of seen-class source and target domain data. One of the key challenges in ZSR is the relative scarcity of source-domain features (e.g. one feature vector per class), which do not fully account for wide variability in target-domain instances. In this paper we propose a novel framework of learning data-dependent feature transforms for scoring similarity between an arbitrary pair of source and target data instances to account for the wide variability in the target domain. Our proposed approach is based on optimizing over a parameterized family of local feature displacements that maximize the source-target adaptive similarity functions. Accordingly we propose formulating zero-shot learning (ZSL) using latent structural SVMs to learn our similarity functions from training data. As a demonstration we design a specific algorithm under the proposed framework involving bilinear similarity functions and regularized least squares as penalties for feature displacement. We test our approach on several benchmark datasets for ZSR and show significant improvement over the state-of-the-art. For instance, on the aP&Y dataset we can achieve 80.89% in terms of recognition accuracy, outperforming the state-of-the-art by 11.15%.

preprint 2016 arXiv

Learning Minimum Volume Sets and Anomaly Detectors from KNN Graphs

We propose a non-parametric anomaly detection algorithm for high dimensional data. We first rank scores derived from nearest neighbor graphs on $n$-point nominal training data. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at $α$-false alarm level if the predicted score is in the $α$-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate $α$, its decision region converges to the $α$-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing $K$-NN based anomaly detection algorithms, with significant computational savings.

preprint 2016 arXiv

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
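The geometric idea, that bias is captured by a direction which can be removed from gender-neutral words, can be sketched in a few lines. This is a simplified illustration rather than the paper's algorithm: the direction is estimated here as the normalized mean of difference vectors over definitional pairs (a simplification swapped in for the example), and the three-dimensional vectors and the helper names `gender_direction` and `neutralize` are made up for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def gender_direction(pairs):
    """Crude direction estimate: normalized mean difference over definitional pairs."""
    dim = len(pairs[0][0])
    mean = [sum(a[i] - b[i] for a, b in pairs) / len(pairs) for i in range(dim)]
    return normalize(mean)

def neutralize(w, g):
    """Remove the component of w along the unit vector g, then renormalize."""
    proj = dot(w, g)
    return normalize([a - proj * b for a, b in zip(w, g)])

# Toy 3-D embedding (hypothetical values).
she, he = [1.0, 0.0, 0.2], [-1.0, 0.0, 0.2]
receptionist = [0.6, 0.8, 0.0]

g = gender_direction([(she, he)])
debiased = neutralize(receptionist, g)
```

After neutralization the word vector has no component along the estimated direction, while definitional words such as `she` and `he` are left untouched because `neutralize` is applied only to words chosen as gender neutral.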

preprint 2016 arXiv

On the Non-Existence of Unbiased Estimators in Constrained Estimation Problems

We address the problem of existence of unbiased constrained parameter estimators. We show that if the constrained set of parameters is compact and the hypothesized distributions are absolutely continuous with respect to one another, then there exists no unbiased estimator. Weaker conditions for the absence of unbiased constrained estimators are also specified. We provide several examples which demonstrate the utility of these conditions.

preprint 2016 arXiv

Optimally Pruning Decision Tree Ensembles With Feature Cost

We consider the problem of learning decision rules for prediction under a feature budget constraint. In particular, we are interested in pruning an ensemble of decision trees to reduce expected feature cost while maintaining high prediction accuracy for any test example. We propose a novel 0-1 integer program formulation for ensemble pruning. Our pruning formulation is general: it takes any ensemble of decision trees as input. By explicitly accounting for feature-sharing across trees together with the accuracy/cost trade-off, our method is able to significantly reduce feature cost by pruning subtrees that introduce more loss in terms of feature cost than benefit in terms of prediction accuracy gain. Theoretically, we prove that a linear programming relaxation produces the exact solution of the original integer program. This allows us to use efficient convex optimization tools to obtain an optimally pruned ensemble for any given budget. Empirically, we see that our pruning algorithm significantly improves the performance of the state-of-the-art ensemble method BudgetRF.

preprint 2016 arXiv

Pruning Random Forests for Prediction on a Budget

We propose to prune a random forest (RF) for resource-constrained prediction. We first construct a RF and then prune it to optimize expected feature cost & accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm, scalable to large datasets. In contrast to our bottom-up approach, which benefits from good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms.

preprint 2016 arXiv

Quantifying and Reducing Stereotypes in Word Embeddings

Machine learning algorithms are optimized to model statistical properties of the training data. If the input data reflects stereotypes and biases of the broader society, then the output of the learning algorithm also captures these stereotypes. In this paper, we initiate the study of gender stereotypes in word embedding, a popular framework to represent text data. As their use becomes increasingly common, applications can inadvertently amplify unwanted stereotypes. We show across multiple datasets that the embeddings contain significant gender stereotypes, especially with regard to professions. We created a novel gender analogy task and combined it with crowdsourcing to systematically quantify the gender bias in a given embedding. We developed an efficient algorithm that reduces gender stereotype using just a handful of training examples while preserving the useful geometric properties of the embedding. We evaluated our algorithm on several metrics. While we focus on male/female stereotypes, our framework may be applicable to other types of embedding biases.

preprint 2016 arXiv

Resource Constrained Structured Prediction

We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features during test-time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing and show strong performance in reduction of the feature costs without degrading accuracy.

preprint 2016 arXiv

Sequential Learning without Feedback

In many security and healthcare systems a sequence of features/sensors/tests is used for detection and diagnosis. Each test outputs a prediction of the latent state, and carries with it inherent costs. Our objective is to learn strategies for selecting tests to optimize accuracy & costs. Unfortunately it is often impossible to acquire in-situ ground truth annotations and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of the stochastic partial monitoring problem with an unusual reward structure (even noisy annotations are unavailable). Unsurprisingly, no learner can achieve sublinear regret without further assumptions. To this end we propose the notion of weak dominance. This is a condition on the joint probability distribution of test outputs and latent state and says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well. We empirically verify that weak dominance holds on real datasets and prove that it is a maximal condition for achieving sublinear regret. We reduce USS to a special case of the multi-armed bandit problem with side information and develop polynomial time algorithms that achieve sublinear regret.

preprint 2016 arXiv

Zero-Shot Learning via Joint Latent Similarity Embedding

Zero-shot recognition (ZSR) deals with the problem of predicting class labels for target domain instances based on source domain side information (e.g. attributes) of unseen classes. We formulate ZSR as a binary prediction problem. Our resulting classifier is class-independent. It takes an arbitrary pair of source and target domain instances as input and predicts whether or not they come from the same class, i.e. whether there is a match. We model the posterior probability of a match since it is a sufficient statistic and propose a latent probabilistic model in this context. We develop a joint discriminative learning framework based on dictionary learning to jointly learn the parameters of our model for both domains, which ultimately leads to our class-independent classifier. Many of the existing embedding methods can be viewed as special cases of our probabilistic model. On ZSR our method shows 4.90% improvement over the state-of-the-art in accuracy averaged across four benchmark datasets. We also adapt our ZSR method for zero-shot retrieval and show 22.45% improvement accordingly in mean average precision (mAP).

preprint 2015 arXiv

A Topic Modeling Approach to Ranking

We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. We develop a new generative model for pairwise comparisons that accounts for multiple shared latent rankings that are prevalent in a population of users. This new model also captures inconsistent user behavior in a natural way. We show how the estimation of latent rankings in the new generative model can be formally reduced to the estimation of topics in a statistically equivalent topic modeling problem. We leverage recent advances in the topic modeling literature to develop an algorithm that can learn shared latent rankings with provable consistency as well as sample and computational complexity guarantees. We demonstrate that the new approach is empirically competitive with the current state-of-the-art approaches in predicting preferences on some semi-synthetic and real world datasets.

preprint 2015 arXiv

Algorithms for Linear Bandits on Polyhedral Sets

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for the expected regret that scales as $Ω(N\log T)$. We then provide a nearly optimal algorithm and show that its expected regret scales as $O(N\log^{1+ε}(T))$ for an arbitrarily small $ε>0$. The algorithm alternates between exploration and exploitation intervals sequentially, where a deterministic set of arms is played in the exploration intervals and a greedily selected arm is played in the exploitation intervals. We also develop an algorithm that achieves the optimal regret when the sub-Gaussianity parameter of the noise term is known. Our key insight is that for a polyhedron the optimal arm is robust to small perturbations in the reward function. Consequently, a greedily selected arm is guaranteed to be optimal when the estimation error falls below some suitable threshold. Our solution resolves a question posed by Rusmevichientong and Tsitsiklis (2011) that left open the possibility of efficient algorithms with asymptotic logarithmic regret bounds. We also show that the regret upper bounds hold with probability $1$. Our numerical investigations show that while the theoretical results are asymptotic, the performance of our algorithms compares favorably to state-of-the-art algorithms in finite time as well.

preprint 2015 arXiv

Cheap Bandits

We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of the actions to observe represents some (geographical) area. The importance of this setting is that in these applications, it is actually cheaper to observe the average reward of a group of actions rather than the reward of a single action. We show that when the reward is smooth over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while minimizing the sensing cost. In this paper we propose CheapUCB, an algorithm that matches the regret guarantees of the known algorithms for this setting and at the same time guarantees a linear cost gain over them. As a by-product of our analysis, we establish a $Ω(\sqrt{dT})$ lower bound on the cumulative regret of spectral bandits for a class of graphs with effective dimension $d$.

preprint 2015 arXiv

Efficient Learning by Directed Acyclic Graph For Resource Constrained Prediction

We study the problem of reducing test-time acquisition costs in classification systems. Our goal is to learn decision rules that adaptively select sensors for each example as necessary to make a confident prediction. We model our system as a directed acyclic graph (DAG) where internal nodes correspond to sensor subsets and decision functions at each node choose whether to acquire a new sensor or classify using the available measurements. This problem can be naturally posed as an empirical risk minimization over training data. Rather than jointly optimizing such a highly coupled and non-convex problem over all decision nodes, we propose an efficient algorithm motivated by dynamic programming. We learn node policies in the DAG by reducing the global objective to a series of cost sensitive learning problems. Our approach is computationally efficient and has proven guarantees of convergence to the optimal system for a fixed architecture. In addition, we present an extension to map other budgeted learning problems with large number of sensors to our DAG architecture and demonstrate empirical performance exceeding state-of-the-art algorithms for data composed of both few and many sensors.

preprint 2015 arXiv

Feature-Budgeted Random Forest

We seek decision rules for prediction-time cost reduction, where complete data is available for training, but during prediction-time, each feature can only be acquired for an additional cost. We propose a novel random forest algorithm to minimize prediction error for a user-specified average feature acquisition budget. While random forests yield strong generalization performance, they do not explicitly account for feature costs and furthermore require low correlation among trees, which amplifies costs. Our random forest grows trees with low acquisition cost and high strength based on greedy minimax cost-weighted-impurity splits. Theoretically, we establish near-optimal acquisition cost guarantees for our algorithm. Empirically, on a number of benchmark datasets we demonstrate superior accuracy-cost curves against state-of-the-art prediction-time algorithms.

preprint2015arXiv

Group Membership Prediction

The group membership prediction (GMP) problem involves predicting whether or not a collection of instances share a certain semantic property. For instance, in kinship verification, given a collection of images, the goal is to predict whether or not they share a familial relationship. In this context we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our model posits that data from each view is independent conditioned on the shared variables. This postulate leads to a parametric probability model that decomposes group membership likelihood into a tensor product of data-independent parameters and data-dependent factors. We propose learning the data-independent parameters in a discriminative way with bilinear classifiers, and test our prediction algorithm on challenging visual recognition tasks such as multi-camera person re-identification and kinship verification. On most benchmark datasets, our method can significantly outperform the current state-of-the-art.

preprint2015arXiv

Learning Efficient Anomaly Detectors from $K$-NN Graphs

We propose a non-parametric anomaly detection algorithm for high dimensional data. We score each datapoint by its average $K$-NN distance, and rank them accordingly. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at $α$-false alarm level if the predicted score is in the $α$-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate $α$, its decision region converges to the $α$-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing $K$-NN based anomaly detection algorithms, with significant computational savings.
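The scoring and thresholding steps described above can be illustrated with a minimal sketch. It covers only the average $K$-NN distance score and an empirical false-alarm threshold; the learning-to-rank step that the abstract uses to imitate these scores is omitted. The function names, the `skip_self` flag and the quantile convention are assumptions made for illustration, not taken from the paper.

```python
import math

def avg_knn_distance(points, query, k, skip_self=False):
    """Average Euclidean distance from `query` to its k nearest points."""
    dists = sorted(math.dist(query, p) for p in points)
    if skip_self:
        dists = dists[1:]  # drop the zero distance of a point to itself
    return sum(dists[:k]) / k

def knn_anomaly_flags(nominal, tests, k=3, alpha=0.1):
    """Flag test points whose score exceeds the empirical (1 - alpha)
    quantile of the scores of the nominal (training) points."""
    nominal_scores = sorted(
        avg_knn_distance(nominal, p, k, skip_self=True) for p in nominal
    )
    n = len(nominal_scores)
    idx = min(n - 1, math.ceil((1 - alpha) * n) - 1)
    threshold = nominal_scores[idx]
    return [avg_knn_distance(nominal, t, k) > threshold for t in tests]

# Nominal data on a 5x5 grid; one test point inside, one far away.
grid = [(x, y) for x in range(5) for y in range(5)]
flags = knn_anomaly_flags(grid, [(2.0, 2.0), (10.0, 10.0)], k=3, alpha=0.1)
```

Here the point inside the grid is not flagged while the distant point is, because its average distance to the nominal data is far above every nominal score.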

preprint2015arXiv

Learning Immune-Defectives Graph through Group Tests

This paper deals with an abstraction of a unified problem of drug discovery and pathogen identification. Pathogen identification involves identification of disease-causing biomolecules. Drug discovery involves finding chemical compounds, called lead compounds, that bind to pathogenic proteins and eventually inhibit the function of the protein. In this paper, the lead compounds are abstracted as inhibitors, pathogenic proteins as defectives, and the mixture of "ineffective" chemical compounds and non-pathogenic proteins as normal items. A defective could be immune to the presence of an inhibitor in a test. So, a test containing a defective is positive iff it does not contain its "associated" inhibitor. The goal of this paper is to identify the defectives, inhibitors, and their "associations" with high probability, or in other words, learn the Immune Defectives Graph (IDG) efficiently through group tests. We propose a probabilistic non-adaptive pooling design, a probabilistic two-stage adaptive pooling design and decoding algorithms for learning the IDG. For the two-stage adaptive-pooling design, we show that the sample complexity of the number of tests required to guarantee recovery of the inhibitors, defectives, and their associations with high probability, i.e., the upper bound, exceeds the proposed lower bound by a logarithmic multiplicative factor in the number of items. For the non-adaptive pooling design too, we show that the upper bound exceeds the proposed lower bound by at most a logarithmic multiplicative factor in the number of items.

preprint2015arXiv

Learning Mixed Membership Mallows Models from Pairwise Comparisons

We propose a novel parameterized family of Mixed Membership Mallows Models (M4) to account for variability in pairwise comparisons generated by a heterogeneous population of noisy and inconsistent users. M4 models individual preferences as a user-specific probabilistic mixture of shared latent Mallows components. Our key algorithmic insight for estimation is to establish a statistical connection between M4 and topic models by viewing pairwise comparisons as words, and users as documents. This key insight leads us to explore Mallows components with a separable structure and leverage recent advances in separable topic discovery. While separability appears to be overly restrictive, we nevertheless show that it is an inevitable outcome of a relatively small number of latent Mallows components in a world with a large number of items. We then develop an algorithm, based on robust extreme-point identification of convex polygons, that learns the reference rankings and is provably consistent with polynomial sample complexity guarantees. We demonstrate that our new model is empirically competitive with the current state-of-the-art approaches in predicting real-world preferences.

preprint2015arXiv

Max-Cost Discrete Function Evaluation Problem under a Budget

We propose novel methods for the max-cost Discrete Function Evaluation Problem (DFEP) under budget constraints. We are motivated by applications such as clinical diagnosis where a patient is subjected to a sequence of (possibly expensive) tests before a decision is made. Our goal is to develop strategies for minimizing max-costs. The problem is known to be NP-hard, and greedy methods based on specialized impurity functions have been proposed. We develop a broad class of admissible impurity functions that admit monomials, classes of polynomials, and hinge-loss functions that allow for flexible impurity design with provably optimal approximation bounds. This flexibility is important for datasets where max-cost can be overly sensitive to "outliers." Outliers bias max-cost to a few examples that require a large number of tests for classification. We design admissible functions that allow for accuracy-cost trade-off and result in $O(\log n)$ guarantees of the optimal cost among trees with corresponding classification accuracy levels.

preprint2015arXiv

Minimax Optimal Sparse Signal Recovery with Poisson Statistics

We are motivated by problems that arise in a number of applications such as Online Marketing and Explosives detection, where the observations are usually modeled using Poisson statistics. We model each observation as a Poisson random variable whose mean is a sparse linear superposition of known patterns. Unlike many conventional problems, observations here are not identically distributed since they are associated with different sensing modalities. We analyze the performance of a Maximum Likelihood (ML) decoder, which for our Poisson setting involves a non-linear optimization yet is computationally tractable. We derive fundamental sample complexity bounds for sparse recovery when the measurements are contaminated with Poisson noise. In contrast to the least-squares linear regression setting with Gaussian noise, we observe that in addition to sparsity, the scale of the parameters also fundamentally impacts $\ell_2$ error in the Poisson setting. We show tightness of our upper bounds both theoretically and experimentally. In particular, we derive a minimax matching lower bound on the mean-squared error and show that our constrained ML decoder is minimax optimal for this regime.

preprint2015arXiv

Necessary and Sufficient Conditions and a Provably Efficient Algorithm for Separable Topic Discovery

We develop necessary and sufficient conditions and a novel provably consistent and efficient algorithm for discovering topics (latent factors) from observations (documents) that are realized from a probabilistic mixture of shared latent factors that have certain properties. Our focus is on the class of topic models in which each shared latent factor contains a novel word that is unique to that factor, a property that has come to be known as separability. Our algorithm is based on the key insight that the novel words correspond to the extreme points of the convex hull formed by the row-vectors of a suitably normalized word co-occurrence matrix. We leverage this geometric insight to establish polynomial computation and sample complexity bounds based on a few isotropic random projections of the rows of the normalized word co-occurrence matrix. Our proposed random-projections-based algorithm is naturally amenable to an efficient distributed implementation and is attractive for modern web-scale distributed data mining applications.
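The geometric idea above, that extreme points of a convex hull can be found by maximizing random linear projections, can be sketched in a few lines. This is an illustration under simplifying assumptions: the rows are given directly rather than computed from a normalized word co-occurrence matrix, and the function name and Gaussian choice of directions are illustrative rather than the paper's exact procedure.

```python
import random

def extreme_points_by_projection(rows, n_projections, rng):
    """Return the indices of rows that maximize at least one random
    linear projection. A linear functional over a finite point set is
    maximized at an extreme point of the set's convex hull."""
    dim = len(rows[0])
    found = set()
    for _ in range(n_projections):
        # Isotropic Gaussian direction (an illustrative choice).
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scores = [sum(d * x for d, x in zip(direction, row)) for row in rows]
        found.add(max(range(len(rows)), key=scores.__getitem__))
    return found

# Three simplex vertices (stand-ins for "novel word" rows) followed by
# rows that are convex combinations of them.
rows = [
    (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (0.5, 0.5, 0.0), (1 / 3, 1 / 3, 1 / 3), (0.2, 0.3, 0.5),
]
found = extreme_points_by_projection(rows, 200, random.Random(0))
```

Only the first three indices can be returned, since every other row is a mixture of the vertices and so never strictly wins a projection.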

preprint2015arXiv

PRISM: Person Re-Identification via Structured Matching

Person re-identification (re-id), an emerging problem in visual surveillance, deals with maintaining entities of individuals whilst they traverse various locations surveilled by a camera network. From a visual perspective re-id is challenging due to significant changes in visual appearance of individuals in cameras with different pose, illumination and calibration. Globally the challenge arises from the need to maintain structurally consistent matches among all the individual entities across different camera views. We propose PRISM, a structured matching method to jointly account for these challenges. We view the global problem as a weighted graph matching problem and estimate edge weights by learning to predict them based on the co-occurrences of visual patterns in the training examples. These co-occurrence based scores in turn account for appearance changes by inferring likely and unlikely visual co-occurrences appearing in training instances. We implement PRISM on single shot and multi-shot scenarios. PRISM uniformly outperforms state-of-the-art in terms of matching rate while being computationally efficient.

preprint2015arXiv

Sensor Selection by Linear Programming

We learn sensor trees from training data to minimize sensor acquisition costs during test time. Our system adaptively selects sensors at each stage if necessary to make a confident classification. We pose the problem as empirical risk minimization over the choice of trees and node decision rules. We decompose the problem, which is known to be intractable, into combinatorial (tree structures) and continuous parts (node decision rules) and propose to solve them separately. Using training data we greedily solve for the combinatorial tree structures and for the continuous part, which is a non-convex multilinear objective function, we derive convex surrogate loss functions that are piecewise linear. The resulting problem can be cast as a linear program and has the advantage of guaranteed convergence, global optimality, repeatability and computational efficiency. We show that our proposed approach outperforms the state-of-art on a number of benchmark datasets.

preprint2015arXiv

Zero-Shot Learning via Semantic Similarity Embedding

In this paper we consider a version of the zero-shot learning problem where seen class source and target domain data are provided. The goal during test-time is to accurately predict the class label of an unseen target domain instance based on revealed source domain side information (e.g., attributes) for unseen classes. Our method is based on viewing each source or target data as a mixture of seen class proportions and we postulate that the mixture patterns have to be similar if the two instances belong to the same unseen class. This perspective leads us to learning source/target embedding functions that map an arbitrary source/target domain data into the same semantic space where similarity can be readily measured. We develop a max-margin framework to learn these similarity functions and jointly optimize parameters by means of cross validation. Our test results are compelling, leading to significant improvement in terms of accuracy on most benchmark datasets for zero-shot recognition.

preprint2014arXiv

A Novel Visual Word Co-occurrence Model for Person Re-identification

Person re-identification aims to maintain the identity of an individual in diverse locations through different non-overlapping camera views. The problem is fundamentally challenging due to appearance variations resulting from differing poses, illumination and configurations of camera views. To deal with these difficulties, we propose a novel visual word co-occurrence model. We first map each pixel of an image to a visual word using a codebook, which is learned in an unsupervised manner. The appearance transformation between camera views is encoded by a co-occurrence matrix of visual word joint distributions in probe and gallery images. Our appearance model naturally accounts for spatial similarities and variations caused by pose, illumination & configuration change across camera views. Linear SVMs are then trained as classifiers using these co-occurrence descriptors. On the VIPeR and CUHK Campus benchmark datasets, our method achieves 83.86% and 85.49% at rank-15 on the Cumulative Match Characteristic (CMC) curves, and beats the state-of-the-art results by 10.44% and 22.27%.

preprint2014arXiv

A Rank-SVM Approach to Anomaly Detection

We propose a novel non-parametric adaptive anomaly detection algorithm for high dimensional data based on rank-SVM. Data points are first ranked based on scores derived from nearest neighbor graphs on n-point nominal data. We then train a rank-SVM using this ranked data. A test-point is declared as an anomaly at alpha-false alarm level if the predicted score is in the alpha-percentile. The resulting anomaly detector is shown to be asymptotically optimal and adaptive in that for any false alarm rate alpha, its decision region converges to the alpha-percentile level set of the unknown underlying density. In addition we illustrate through a number of synthetic and real-data experiments both the statistical performance and computational efficiency of our anomaly detector.

preprint2014arXiv

Efficient Minimax Signal Detection on Graphs

Several problems such as network intrusion, community detection, and disease outbreak can be described by observations attributed to nodes or edges of a graph. In these applications presence of intrusion, community or disease outbreak is characterized by novel observations on some unknown connected subgraph. These problems can be formulated in terms of optimization of suitable objectives on connected subgraphs, a problem which is generally computationally difficult. We overcome the combinatorics of connectivity by embedding connected subgraphs into linear matrix inequalities (LMI). Computationally efficient tests are then realized by optimizing convex objective functions subject to these LMI constraints. We prove, by means of a novel Euclidean embedding argument, that our tests are minimax optimal for exponential family of distributions on 1-D and 2-D lattices. We show that internal conductance of the connected subgraph family plays a fundamental role in characterizing detectability.

preprint2014arXiv

Information-Theoretic Bounds for Adaptive Sparse Recovery

We derive an information-theoretic lower bound for sample complexity in sparse recovery problems where inputs can be chosen sequentially and adaptively. This lower bound is in terms of a simple mutual information expression and unifies many different linear and nonlinear observation models. Using this formula we derive bounds for adaptive compressive sensing (CS), group testing and 1-bit CS problems. We show that adaptivity cannot decrease sample complexity in group testing, 1-bit CS and CS with linear sparsity. In contrast, we show there might be mild performance gains for CS in the sublinear regime. Our unified analysis also allows characterization of gains due to adaptivity from a wider perspective on sparse problems.

preprint2014arXiv

Non-Adaptive Group Testing with Inhibitors

Group testing with inhibitors (GTI), introduced by Farach et al., is studied in this paper. There are three types of items, $d$ defectives, $r$ inhibitors and $n-d-r$ normal items in a population of $n$ items. The presence of any inhibitor in a test can prevent the expression of a defective. For this model, we propose a probabilistic non-adaptive pooling design with a low complexity decoding algorithm. We show that the sample complexity of the number of tests required for guaranteed recovery with vanishing error probability using the proposed algorithm scales as $T=O(d \log n)$ and $T=O(\frac{r^2}{d}\log n)$ in the regimes $r=O(d)$ and $d=o(r)$ respectively. In the former regime, the number of tests meets the lower bound order while in the latter regime, the number of tests is shown to exceed the lower bound order by a $\log \frac{r}{d}$ multiplicative factor. When only upper bounds on the number of defectives $D$ and the number of inhibitors $R$ are given instead of their exact values, the sample complexity of the number of tests using the proposed algorithm scales as $T=O(D \log n)$ and $T=O(R^2 \log n)$ in the regimes $R^2=O(D)$ and $D=o(R^2)$ respectively. In the former regime, the number of tests meets the lower bound order while in the latter regime, the number of tests exceeds the lower bound order by a $\log R$ multiplicative factor. The time complexity of the proposed decoding algorithms scales as $O(nT)$.
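As a small illustration of the test model described above, the outcome of a single pooled test can be written as a function. This sketch assumes the simplest reading of the model, in which any inhibitor in the pool cancels every defective; the function name and set-based representation are illustrative only.

```python
def gti_outcome(pool, defectives, inhibitors):
    """A pool tests positive iff it contains at least one defective
    and no inhibitor (any inhibitor masks all defectives)."""
    return bool(pool & defectives) and not (pool & inhibitors)

defectives, inhibitors = {1}, {5}
outcomes = [
    gti_outcome({0, 1}, defectives, inhibitors),     # defective, no inhibitor
    gti_outcome({0, 1, 5}, defectives, inhibitors),  # defective masked by inhibitor
    gti_outcome({0, 2}, defectives, inhibitors),     # no defective present
]
```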

preprint2014arXiv

Non-adaptive Group Testing: Explicit bounds and novel algorithms

We consider some computationally efficient and provably correct algorithms with near-optimal sample-complexity for the problem of noisy non-adaptive group testing. Group testing involves grouping arbitrary subsets of items into pools. Each pool is then tested to identify the defective items, which are usually assumed to be "sparse". We consider non-adaptive random pooling measurements, where pools are selected randomly and independently of the test outcomes. We also consider a model where noisy measurements allow for both some false negative and some false positive test outcomes (and also allow for asymmetric noise, and activation noise). We consider three classes of algorithms for the group testing problem, which we call the "Coupon Collector Algorithm", the "Column Matching Algorithms", and the "LP Decoding Algorithms". The last two classes (versions of some of which had been considered before in the literature) were inspired by corresponding algorithms in the Compressive Sensing literature. The second and third of these algorithms have several flavours, dealing separately with the noiseless and noisy measurement scenarios. Our contribution is novel analysis to derive explicit sample-complexity bounds -- with all constants expressly computed -- for these algorithms as a function of the desired error probability; the noise parameters; the number of items; and the size of the defective set (or an upper bound on it). We also compare the bounds to information-theoretic lower bounds for sample complexity based on Fano's inequality and show that the upper and lower bounds are equal up to an explicitly computable universal constant factor (independent of problem parameters).
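The random pooling setup described above can be sketched for the noiseless case. The decoder shown here is a simple elimination rule (every item seen in a negative pool is cleared), used purely for illustration; it is not claimed to be any of the three algorithm classes named in the abstract, and all function names and the inclusion probability `p` are assumptions.

```python
import random

def random_pools(n_items, n_tests, p, rng):
    """Non-adaptive design: each item joins each pool independently
    with probability p, before any outcome is observed."""
    return [{i for i in range(n_items) if rng.random() < p}
            for _ in range(n_tests)]

def run_tests(pools, defectives):
    """Noiseless outcome: positive iff the pool holds a defective."""
    return [bool(pool & defectives) for pool in pools]

def decode_by_elimination(n_items, pools, outcomes):
    """Keep as candidates the items that never appear in a negative pool."""
    candidates = set(range(n_items))
    for pool, positive in zip(pools, outcomes):
        if not positive:
            candidates -= pool
    return candidates

# Hand-built example: item 1 is the only defective.
pools = [{0, 1}, {2, 3}, {1, 2}]
outcomes = run_tests(pools, {1})
decoded = decode_by_elimination(4, pools, outcomes)  # item 0 is never cleared

# Randomized example: without noise, the decoder never misses a defective.
rng = random.Random(1)
big_pools = random_pools(50, 40, 0.2, rng)
big_decoded = decode_by_elimination(50, big_pools, run_tests(big_pools, {3, 17}))
```

In the hand-built example the decoder returns a superset of the true defective set, which shows why enough tests are needed before the candidates shrink to exactly the defectives.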

preprint2014arXiv

RAPID: Rapidly Accelerated Proximal Gradient Algorithms for Convex Minimization

In this paper, we propose a new algorithm to speed up the convergence of accelerated proximal gradient (APG) methods. In order to minimize a convex function $f(\mathbf{x})$, our algorithm introduces a simple line search step after each proximal gradient step in APG so that a biconvex function $f(θ\mathbf{x})$ is minimized over scalar variable $θ>0$ while fixing variable $\mathbf{x}$. We propose two new ways of constructing the auxiliary variables in APG based on the intermediate solutions of the proximal gradient and the line search steps. We prove that at an arbitrary iteration step $t$ $(t\geq1)$, our algorithm can achieve a smaller upper bound for the gap between the current and optimal objective values than those in traditional APG methods such as FISTA, making it converge faster in practice. In fact, our algorithm can potentially be applied to many important convex optimization problems, such as sparse linear regression and kernel SVMs. Our experimental results clearly demonstrate that our algorithm converges faster than APG in all of the applications above, and is even comparable to some sophisticated solvers.

preprint2014arXiv

Retrieval in Long Surveillance Videos using User Described Motion and Object Attributes

We present a content-based retrieval method for long surveillance videos, both for wide-area (Airborne) and near-field (CCTV) imagery. Our goal is to retrieve video segments, with a focus on detecting objects moving on routes, that match user-defined events of interest. The sheer size of surveillance videos and the remote locations where they are acquired necessitate highly compressed representations that are also meaningful for supporting user-defined queries. To address these challenges we archive long surveillance video through lightweight processing based on low-level local spatio-temporal extraction of motion and object features. These are then hashed into an inverted index using locality-sensitive hashing (LSH). This local approach allows for query flexibility and leads to significant gains in compression. Our second task is to extract partial matches to the user-created query and assemble them into full matches using Dynamic Programming (DP). DP exploits causality to assemble the indexed low-level features into a video segment which matches the query route. We examine CCTV and Airborne footage, whose low contrast makes motion extraction more difficult. We generate robust motion estimates for Airborne data using a tracklets generation algorithm, while we use the Horn and Schunck approach to generate motion estimates for CCTV. Our approach handles long routes, low contrasts and occlusion. We derive bounds on the rate of false positives and demonstrate the effectiveness of the approach for counting, motion pattern recognition and abandoned object applications.

preprint2014arXiv

Sensing-Aware Kernel SVM

We propose a novel approach for designing kernels for support vector machines (SVMs) when the class label is linked to the observation through a latent state and the likelihood function of the observation given the state (the sensing model) is available. We show that the Bayes-optimum decision boundary is a hyperplane under a mapping defined by the likelihood function. Combining this with the maximum margin principle yields kernels for SVMs that leverage knowledge of the sensing model in an optimal way. We derive the optimum kernel for the bag-of-words (BoWs) sensing model and demonstrate its superior performance over other kernels in document and image classification tasks. These results indicate that such optimum sensing-aware kernel SVMs can match the performance of rather sophisticated state-of-the-art approaches.

preprint2014arXiv

Sparse Recovery with Linear and Nonlinear Observations: Dependent and Noisy Data

We formulate sparse support recovery as a salient set identification problem and use information-theoretic analyses to characterize the recovery performance and sample complexity. We consider a very general model where we are not restricted to linear models or specific distributions. We state non-asymptotic bounds on recovery probability and a tight mutual information formula for sample complexity. We evaluate our bounds for applications such as sparse linear regression and explicitly characterize effects of correlation or noisy features on recovery performance. We show improvements upon previous work and identify gaps between the performance of recovery algorithms and fundamental information.

preprint2013arXiv

A New Geometric Approach to Latent Topic Modeling and Discovery

A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.

preprint2013arXiv

An Impossibility Result for High Dimensional Supervised Learning

We study high-dimensional asymptotic performance limits of binary supervised classification problems where the class conditional densities are Gaussian with unknown means and covariances and the number of signal dimensions scales faster than the number of labeled training samples. We show that the Bayes error, namely the minimum attainable error probability with complete distributional knowledge and equally likely classes, can be arbitrarily close to zero and yet the limiting minimax error probability of every supervised learning algorithm is no better than a random coin toss. In contrast to related studies where the classification difficulty (Bayes error) is made to vanish, we hold it constant when taking high-dimensional limits. In contrast to VC-dimension based minimax lower bounds that consider the worst case error probability over all distributions that have a fixed Bayes error, our worst case is over the family of Gaussian distributions with constant Bayes error. We also show that a nontrivial asymptotic minimax error probability can only be attained for parametric subsets of zero measure (in a suitable measure space). These results expose the fundamental importance of prior knowledge and suggest that unless we impose strong structural constraints, such as sparsity, on the parametric space, supervised learning may be ineffective in high dimensional small sample settings.

preprint2013arXiv

Boolean Compressed Sensing and Noisy Group Testing

The fundamental task of group testing is to recover a small distinguished subset of items from a large population while efficiently reducing the total number of tests (measurements). The key contribution of this paper is in adopting a new information-theoretic perspective on group testing problems. We formulate the group testing problem as a channel coding/decoding problem and derive a single-letter characterization for the total number of tests used to identify the defective set. Although the focus of this paper is primarily on group testing, our main result is generally applicable to other compressive sensing models. The single letter characterization is shown to be order-wise tight for many interesting noisy group testing scenarios. Specifically, we consider an additive Bernoulli($q$) noise model where we show that, for $N$ items and $K$ defectives, the number of tests $T$ is $O(\frac{K\log N}{1-q})$ for arbitrarily small average error probability and $O(\frac{K^2\log N}{1-q})$ for a worst case error criterion. We also consider dilution effects whereby a defective item in a positive pool might get diluted with probability $u$ and potentially missed. In this case, it is shown that $T$ is $O(\frac{K\log N}{(1-u)^2})$ and $O(\frac{K^2\log N}{(1-u)^2})$ for the average and the worst case error criteria, respectively. Furthermore, our bounds allow us to verify existing known bounds for noiseless group testing including the deterministic noise-free case and approximate reconstruction with bounded distortion. Our proof of achievability is based on random coding and the analysis of a Maximum Likelihood Detector, and our information theoretic lower bound is based on Fano's inequality.

preprint2013arXiv

Multi-Stage Classifier Design

In many classification systems, sensing modalities have different acquisition costs. It is often unnecessary to use every modality to classify a majority of examples. We study a multi-stage system in a prediction time cost reduction setting, where the full data is available for training, but for a test example, measurements in a new modality can be acquired at each stage for an additional cost. We seek decision rules to reduce the average measurement acquisition cost. We formulate an empirical risk minimization problem (ERM) for a multi-stage reject classifier, wherein the stage $k$ classifier either classifies a sample using only the measurements acquired so far or rejects it to the next stage where more attributes can be acquired for a cost. To solve the ERM problem, we show that the optimal reject classifier at each stage is a combination of two binary classifiers, one biased towards positive examples and the other biased towards negative examples. We use this parameterization to construct stage-by-stage global surrogate risk, develop an iterative algorithm in the boosting framework and present convergence and generalization results. We test our work on synthetic, medical and explosives detection datasets. Our results demonstrate that substantial cost reduction without a significant sacrifice in accuracy is achievable.
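One way to picture a reject classifier built from two biased binary classifiers is sketched below. It assumes, for illustration only, that both classifiers threshold a shared real-valued score with opposite offsets and that a sample is rejected to the next stage exactly when the two disagree. The names, the bias values and the fallback rule at the last stage are hypothetical and not taken from the paper.

```python
def stage_decision(score, pos_bias=0.5, neg_bias=0.5):
    """Combine a positive-leaning and a negative-leaning classifier.
    Returns +1 or -1 when they agree and None (reject) otherwise."""
    pos_leaning = score + pos_bias > 0   # biased towards predicting +1
    neg_leaning = score - neg_bias > 0   # biased towards predicting -1
    if pos_leaning == neg_leaning:
        return 1 if pos_leaning else -1
    return None

def cascade_predict(stage_scores, stage_costs, bias=0.5):
    """Walk through the stages, paying each stage's acquisition cost,
    until a stage classifies. The last stage is never allowed to reject."""
    total = 0.0
    for k, (score, cost) in enumerate(zip(stage_scores, stage_costs)):
        total += cost
        decision = stage_decision(score, bias, bias)
        if decision is None and k == len(stage_scores) - 1:
            decision = 1 if score > 0 else -1
        if decision is not None:
            return decision, total

easy = cascade_predict([2.0, -0.9], [1.0, 5.0])   # confident at stage 1
hard = cascade_predict([0.2, -0.9], [1.0, 5.0])   # rejected to stage 2
```

The easy example stops after the cheap first stage, while the ambiguous one pays for the second modality before it is classified.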

preprint2013arXiv

Near-Optimal Stochastic Threshold Group Testing

We formulate and analyze a stochastic threshold group testing problem motivated by biological applications. Here a set of $n$ items contains a subset of $d \ll n$ defective items. Subsets (pools) of the $n$ items are tested -- the test outcomes are negative, positive, or stochastic (negative or positive with certain probabilities that might depend on the number of defectives being tested in the pool), depending on whether the number of defective items in the pool being tested is fewer than the lower threshold $l$, greater than the upper threshold $u$, or in between. The goal of a stochastic threshold group testing scheme is to identify the set of $d$ defective items via a "small" number of such tests. In the regime that $l = o(d)$ we present schemes that are computationally feasible to design and implement, and require a near-optimal number of tests (significantly improving on existing schemes). Our schemes are robust to a variety of models for probabilistic threshold group testing.

preprint2013arXiv

Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models

The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical while the practical ones need stronger conditions. In this paper, we demonstrate, for the first time, that the simplicial condition is a fundamental, algorithm-independent, information-theoretic necessary condition for consistent separable topic estimation. Furthermore, under solely the simplicial condition, we present a practical quadratic-complexity algorithm based on random projections which consistently detects all novel words of all topics using only up to second-order empirical word moments. This algorithm is amenable to distributed implementation making it attractive for 'big-data' scenarios involving a network of large distributed databases.

preprint2013arXiv

Spectral Clustering with Imbalanced Data

Spectral clustering is sensitive to how graphs are constructed from data particularly when proximal and imbalanced clusters are present. We show that Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced data since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced data. Our approach parameterizes a family of graphs, by adaptively modulating node degrees on a fixed node set, to yield a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach. We demonstrate the superiority of our method through unsupervised and semi-supervised experiments on synthetic and real data sets.

preprint 2013 arXiv

Spectral Clustering with Unbalanced Data

Spectral clustering (SC) and graph-based semi-supervised learning (SSL) algorithms are sensitive to how graphs are constructed from data. In particular, if the data has proximal and unbalanced clusters, these algorithms can perform poorly on well-known graphs such as $k$-NN, full-RBF, and $\epsilon$-graphs. This is because objectives such as Ratio-Cut (RCut) or normalized cut (NCut) attempt to trade off cut values against cluster sizes and are not tailored to unbalanced data. We propose a novel graph partitioning framework, which parameterizes a family of graphs by adaptively modulating node degrees in a $k$-NN graph. We then propose a model selection scheme to choose sizable clusters that are separated by the smallest cut values. Our framework is able to adapt to varying levels of unbalancedness of data and can be naturally used for small cluster detection. We theoretically justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.

preprint 2013 arXiv

Topic Discovery through Data Dependent and Random Projections

We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We also discuss the statistical guarantees of the data-dependent projections method based on two mild assumptions on the prior density of the topic-document matrix. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate qualitative and quantitative merits of our scheme.
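The projection insight can be illustrated on an idealized, noise-free toy corpus. The sketch below is not the authors' algorithm: it assumes an exact word-by-document frequency matrix, and the names `make_separable_corpus` and `find_novel_words` are hypothetical. After row normalization, each word's frequency pattern is a convex combination of the novel words' patterns, so the maximum and minimum of a random projection should fall on novel words.

```python
import numpy as np

def make_separable_corpus(num_words=12, num_topics=3, num_docs=200, seed=0):
    """Toy expected word-frequency matrix; words 0..K-1 are novel by construction."""
    rng = np.random.default_rng(seed)
    beta = rng.uniform(0.1, 1.0, size=(num_words, num_topics))
    beta[:num_topics] = np.eye(num_topics)    # word k occurs only in topic k
    beta /= beta.sum(axis=0, keepdims=True)   # each topic is a distribution over words
    theta = rng.dirichlet(np.ones(num_topics), size=num_docs).T  # topic weights per document
    return beta @ theta                       # shape (num_words, num_docs)

def find_novel_words(X, num_projections=50, seed=1):
    """Collect the words attaining the max or min of random projections."""
    rng = np.random.default_rng(seed)
    Xbar = X / X.sum(axis=1, keepdims=True)   # row-normalized frequency patterns
    found = set()
    for _ in range(num_projections):
        direction = rng.standard_normal(Xbar.shape[1])
        projected = Xbar @ direction
        found.add(int(np.argmax(projected)))
        found.add(int(np.argmin(projected)))
    return found

novel = find_novel_words(make_separable_corpus())
print(sorted(novel))  # indices drawn from the novel words 0, 1, 2
```

With finite documents the empirical frequencies are noisy, so a practical method would need the statistical safeguards the abstract refers to; this toy version omits them.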

preprint 2012 arXiv

Graph-based Learning with Unbalanced Clusters

Graph construction is a crucial step in spectral clustering (SC) and graph-based semi-supervised learning (SSL). Spectral methods applied on standard graphs such as full-RBF, $\epsilon$-graphs and $k$-NN graphs can lead to poor performance in the presence of proximal and unbalanced data. This is because spectral methods based on minimizing RatioCut or normalized cut on these graphs tend to put more importance on balancing cluster sizes than on reducing cut values. We propose a novel graph construction technique and show that the RatioCut solution on this new graph is able to handle proximal and unbalanced data. Our method is based on adaptively modulating the neighborhood degrees in a $k$-NN graph, which tends to sparsify neighborhoods in low-density regions. Our method adapts to data with varying levels of unbalancedness and can be naturally used for small cluster detection. We justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.
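The preference for balanced partitions can be seen in a small worked example. This is a hypothetical toy, not an experiment from the paper: a 100-node path graph with one weak edge, scored with the standard two-way definition RatioCut(A, B) = cut(A, B) (1/|A| + 1/|B|). The function name `ratio_cut` and all weights are assumptions made for illustration.

```python
import numpy as np

def ratio_cut(W, in_A):
    """Return (cut value, RatioCut) for the partition given by boolean mask in_A."""
    in_A = np.asarray(in_A, dtype=bool)
    cut = W[np.ix_(in_A, ~in_A)].sum()
    return cut, cut * (1.0 / in_A.sum() + 1.0 / (~in_A).sum())

# Path graph: every edge has weight 5 except a weak edge between nodes 9 and 10.
n = 100
W = np.zeros((n, n))
for i in range(n - 1):
    w = 2.0 if i == 9 else 5.0
    W[i, i + 1] = W[i + 1, i] = w

# Score every split of the path into a prefix of size t and the remainder.
sizes = np.arange(1, n)
scores = [ratio_cut(W, np.arange(n) < t) for t in sizes]
cuts = np.array([c for c, _ in scores])
rcuts = np.array([r for _, r in scores])

best_cut_size = int(sizes[np.argmin(cuts)])    # split with the smallest cut value
best_rcut_size = int(sizes[np.argmin(rcuts)])  # split preferred by RatioCut
print(best_cut_size, best_rcut_size)
```

Here the smallest cut separates the 10-node group, while RatioCut scores the even 50/50 split lower (0.2 versus roughly 0.222), which is the imbalance effect the abstract describes.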

preprint 2011 arXiv

A Token Based Algorithm to Distributed Computation in Sensor Networks

We consider distributed algorithms for data aggregation and function computation in sensor networks. The algorithms perform pairwise computations along edges of an underlying communication graph. A token, which acts as a transmission permit, is associated with each sensor node. Nodes with active tokens have transmission permits; they generate messages at a constant rate and send each message to a randomly selected neighbor. By using different strategies to control the transmission permits we can obtain tradeoffs between message and time complexity. Gossip corresponds to the case when all nodes have permits all the time. We study algorithms where permits are revoked after transmission and restored upon reception. Examples of such algorithms include Simple-Random Walk (SRW), Coalescent-Random-Walk (CRW) and Controlled Flooding (CFLD) and their hybrid variants. SRW has a single permit, which is passed from node to node in the network. CRW initially has a permit for each node, but these permits are revoked gradually. The final result for SRW and CRW resides at a single (or a few) random node(s), making a direct comparison with GOSSIP difficult. A hybrid two-phase algorithm switching from CRW to CFLD at a suitable pre-determined time can be employed to achieve consensus. We show that such hybrid variants achieve significant gains in both message and time complexity. The per-node message complexity for $n$-node graphs, such as 2D meshes, tori, and random geometric graphs, scales as $O(\mathrm{polylog}(n))$ and the corresponding time complexity scales as $O(n)$. The reduced per-node message complexity leads to reduced energy utilization in sensor networks.
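The permit mechanism behind CRW can be sketched as a small simulation. This is a simplified, hypothetical model rather than the paper's protocol: it replaces constant-rate message generation with one randomly chosen sender per step, computes a sum, and uses a 2D torus; `torus_neighbors` and `coalescing_random_walk` are illustrative names.

```python
import random

def torus_neighbors(side):
    """Adjacency lists for a side x side 2D torus."""
    nbrs = {}
    for r in range(side):
        for c in range(side):
            nbrs[(r, c)] = [((r + 1) % side, c), ((r - 1) % side, c),
                            (r, (c + 1) % side), (r, (c - 1) % side)]
    return nbrs

def coalescing_random_walk(values, nbrs, seed=0):
    """Aggregate the sum of `values` until a single permit remains."""
    rng = random.Random(seed)
    partial = dict(values)   # partial sum currently stored at each node
    active = set(values)     # nodes that currently hold a transmission permit
    messages = 0
    while len(active) > 1:
        sender = rng.choice(sorted(active))
        receiver = rng.choice(nbrs[sender])
        partial[receiver] += partial[sender]  # receiver merges the sender's partial sum
        partial[sender] = 0
        active.discard(sender)                # permit revoked after transmission
        active.add(receiver)                  # permit restored (or merged) on reception
        messages += 1
    (holder,) = active
    return holder, partial[holder], messages

nbrs = torus_neighbors(4)
values = {node: i + 1 for i, node in enumerate(sorted(nbrs))}
holder, total, messages = coalescing_random_walk(values, nbrs)
print(holder, total, messages)
```

The total ends up at a single random node, which is why the abstract pairs CRW with a flooding phase when every node needs the result.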

preprint 2011 arXiv

Graph Construction for Learning with Unbalanced Data

Unbalanced data arises in many learning tasks such as clustering of multi-class data, hierarchical divisive clustering and semi-supervised learning. Graph-based approaches are popular tools for these problems. Graph construction is an important aspect of graph-based learning. We show that graph-based algorithms can fail for unbalanced data for many popular graphs such as k-NN, ε-neighborhood and full-RBF graphs. We propose a novel graph construction technique that encodes global statistical information into node degrees through a ranking scheme. The rank of a data sample is an estimate of its p-value and is proportional to the total number of data samples with smaller density. This ranking scheme serves as a surrogate for density, can be reliably estimated, and indicates whether a data sample is close to valleys or modes. This rank-modulated degree (RMD) scheme is able to significantly sparsify the graph near valleys and provides an adaptive way to cope with unbalanced data. We then theoretically justify our method through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.
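A minimal sketch of a rank-modulated degree graph follows, under stated assumptions. The abstract does not give the density estimator or the exact degree rule, so both are illustrative choices here: the distance to the k-th nearest neighbor stands in for inverse density, and the degree is set to `k * (lam + 2 * (1 - lam) * rank)`. The function name `rank_modulated_knn` and the parameter `lam` are hypothetical.

```python
import numpy as np

def rank_modulated_knn(X, k=5, lam=0.5):
    """Return (rank, degrees, neighbor lists) for a rank-modulated k-NN graph."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)            # column 0 is the point itself
    knn_dist = dist[np.arange(n), order[:, k]]  # distance to the k-th neighbor
    # Assumption: a larger k-NN distance means lower density, so the rank of a
    # sample is the fraction of samples whose k-NN distance is larger than its own.
    rank = (knn_dist[None, :] > knn_dist[:, None]).sum(axis=1) / n
    # Assumed degree rule: low-rank (low-density) samples get fewer neighbors.
    raw = np.round(k * (lam + 2 * (1 - lam) * rank))
    degrees = np.clip(raw, 1, n - 1).astype(int)
    neighbors = [order[i, 1:degrees[i] + 1] for i in range(n)]
    return rank, degrees, neighbors

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(100, 2))    # large, tight cluster
sparse = rng.normal(20.0, 3.0, size=(20, 2))   # small, diffuse cluster
rank, degrees, neighbors = rank_modulated_knn(np.vstack([dense, sparse]))
print(degrees[:100].mean(), degrees[100:].mean())
```

In this toy data the diffuse cluster receives smaller degrees, which is the sparsification in low-density regions that the abstract describes.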

preprint 2011 arXiv

Graph-Constrained Group Testing

Non-adaptive group testing involves grouping arbitrary subsets of $n$ items into different pools. Each pool is then tested and defective items are identified. A fundamental question involves minimizing the number of pools required to identify at most $d$ defective items. Motivated by applications in network tomography, sensor networks and infection propagation, a variation of group testing problems on graphs is formulated. Unlike conventional group testing problems, each group here must conform to the constraints imposed by a graph. For instance, items can be associated with vertices and each pool is any set of nodes that must be path connected. In this paper, a test is associated with a random walk. In this context, conventional group testing corresponds to the special case of a complete graph on $n$ vertices. For interesting classes of graphs a rather surprising result is obtained, namely, that the number of tests required to identify $d$ defective items is substantially similar to what is required in conventional group testing problems, where no such constraints on pooling are imposed. Specifically, if $T(n)$ corresponds to the mixing time of the graph $G$, it is shown that with $m=O(d^2T^2(n)\log(n/d))$ non-adaptive tests, one can identify the defective items. Consequently, for the Erdős-Rényi random graph $G(n,p)$, as well as expander graphs with constant spectral gap, it follows that $m=O(d^2\log^3n)$ non-adaptive tests are sufficient to identify $d$ defective items. Next, a specific scenario is considered that arises in network tomography, for which it is shown that $m=O(d^3\log^3n)$ non-adaptive tests are sufficient to identify $d$ defective items. Noisy counterparts of the graph-constrained group testing problem are considered, for which parallel results are developed. We also briefly discuss extensions to compressive sensing on graphs.
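The idea of pools formed by random walks can be sketched as follows. This is an illustration under assumptions, not the paper's construction: the graph is a small circulant graph, the walk length is fixed, and the decoder is a simple elimination rule (any item seen in a negative pool is cleared), which the abstract does not specify. All function names are hypothetical.

```python
import random

def circulant_graph(n, jumps=(1, 5)):
    """Adjacency lists for a circulant graph: node i links to i +/- j for each jump j."""
    return {i: [(i + j) % n for j in jumps] + [(i - j) % n for j in jumps]
            for i in range(n)}

def random_walk_pool(nbrs, length, rng):
    """A pool is the set of vertices visited by one random walk, so it is path connected."""
    node = rng.choice(sorted(nbrs))
    pool = {node}
    for _ in range(length):
        node = rng.choice(nbrs[node])
        pool.add(node)
    return pool

def run_tests(nbrs, defectives, num_tests, walk_length, seed=0):
    """Noiseless tests: a pool is positive if it contains at least one defective item."""
    rng = random.Random(seed)
    pools = [random_walk_pool(nbrs, walk_length, rng) for _ in range(num_tests)]
    outcomes = [bool(pool & defectives) for pool in pools]
    return pools, outcomes

def decode(nbrs, pools, outcomes):
    """Elimination decoder: clear every item that appears in a negative pool."""
    candidates = set(nbrs)
    for pool, positive in zip(pools, outcomes):
        if not positive:
            candidates -= pool
    return candidates

nbrs = circulant_graph(30)
defectives = {3, 17}
pools, outcomes = run_tests(nbrs, defectives, num_tests=60, walk_length=4)
print(sorted(decode(nbrs, pools, outcomes)))
```

In the noiseless setting this decoder never clears a defective item, so its output always contains the true defective set; how many tests are needed for it to return exactly that set depends on the graph, in line with the mixing-time dependence stated in the abstract.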

preprint 2011 arXiv

Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms

We consider the problem of detecting a small subset of defective items from a large set via non-adaptive "random pooling" group tests. We consider both the case when the measurements are noiseless, and the case when the measurements are noisy (the outcome of each group test may be independently faulty with probability q). Order-optimal results for these scenarios are known in the literature. We give information-theoretic lower bounds on the query complexity of these problems, and provide corresponding computationally efficient algorithms that match the lower bounds up to a constant factor. To the best of our knowledge this work is the first to explicitly estimate such a constant that characterizes the gap between the upper and lower bounds for these problems.

preprint 2011 arXiv

Structural Similarity and Distance in Learning

We propose a novel method of introducing structure into existing machine learning techniques by developing structure-based similarity and distance measures. To learn structural information, the low-dimensional structure of the data is captured by solving a non-linear, low-rank representation problem. We show that this low-rank representation can be kernelized, has a closed-form solution, allows for separation of independent manifolds, and is robust to noise. From this representation, similarity between observations based on non-linear structure is computed and can be incorporated into existing feature transformations, dimensionality reduction techniques, and machine learning methods. Experimental results on both synthetic and real data sets show performance improvements for clustering and anomaly detection through the use of structural similarity.

preprint 2010 arXiv

Information theoretic bounds for Compressed Sensing

In this paper we derive information-theoretic performance bounds for sensing and reconstruction of sparse phenomena from noisy projections. We consider two settings: output noise models where the noise enters after the projection and input noise models where the noise enters before the projection. We consider two types of distortion for reconstruction: support errors and mean-squared errors. Our goal is to relate the number of measurements, $m$, and the signal-to-noise ratio, $\mathrm{SNR}$, to signal sparsity, $k$, distortion level, $d$, and signal dimension, $n$. We consider support errors in a worst-case setting. We employ different variations of Fano's inequality to derive necessary conditions on the number of measurements and $\mathrm{SNR}$ required for exact reconstruction. To derive sufficient conditions we develop new insights on max-likelihood analysis based on a novel superposition property. In particular this property implies that small support errors are the dominant error events. Consequently, our ML analysis does not suffer the conservatism of the union bound and leads to a tighter analysis of max-likelihood. These results provide order-wise tight bounds. For output noise models we show that asymptotically an $\mathrm{SNR}$ of $\Theta(\log(n))$ together with $\Theta(k \log(n/k))$ measurements is necessary and sufficient for exact support recovery. Furthermore, if a small fraction of support errors can be tolerated, a constant $\mathrm{SNR}$ turns out to be sufficient in the linear sparsity regime. In contrast, for input noise models we show that support recovery fails if the number of measurements scales as $o(n\log(n)/\mathrm{SNR})$, implying poor compression performance for such cases. We also consider a Bayesian setup and characterize tradeoffs between mean-squared distortion and the number of measurements using rate-distortion theory.

preprint 2008 arXiv

Distributed Detection in Sensor Networks with Limited Range Sensors

We consider a multi-object detection problem over a sensor network (SNET) with limited range sensors. This problem complements the widely considered decentralized detection problem where all sensors observe the same object. While the necessity for global collaboration is clear in the decentralized detection problem, the benefits of collaboration with limited range sensors are unclear and have not been widely explored. In this paper we develop a distributed detection approach based on recent developments in false discovery rate (FDR) control. We first extend the FDR procedure and develop a transformation that exploits complete or partial knowledge of either the observed distributions at each sensor or the ensemble (mixture) distribution across all sensors. We then show that this transformation applies to multi-dimensional observations, thus extending FDR to multi-dimensional settings. We also extend FDR theory to cases where distributions under both null and positive hypotheses are uncertain. We then propose a robust distributed algorithm to perform detection. We further demonstrate scalability to large SNETs by showing that the upper bound on the communication complexity scales linearly with the number of sensors that are in the vicinity of objects and is independent of the total number of sensors. Finally, we deal with situations where the sensing model may be uncertain and establish robustness of our techniques to such uncertainties.
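For readers unfamiliar with FDR control, the classical Benjamini-Hochberg step-up procedure is sketched below as background. It is the standard centralized baseline, not the extended or distributed procedure developed in the paper; the function name and the example p-values (one per hypothetical sensor) are assumptions.

```python
import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Return a boolean mask of rejected null hypotheses at FDR level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices of p-values, smallest first
    thresholds = q * np.arange(1, m + 1) / m  # the i-th smallest p-value is compared to i*q/m
    passed = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        # Step-up rule: reject everything up to the largest index that passes.
        reject[order[: passed[-1] + 1]] = True
    return reject

# One p-value per sensor; small values suggest an object is nearby.
sensor_pvalues = [0.2, 0.001, 0.6, 0.012, 0.035]
print(benjamini_hochberg(sensor_pvalues))  # sensors 1 and 3 are flagged
```

The thresholds grow with the rank of the p-value, so the procedure adapts to how many sensors show evidence of an object instead of applying one fixed cutoff to all of them.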

63 published item(s)

Data Deletion Can Help in Adaptive RL

Faster Algorithms for Learning Convex Functions

FedHeN: Federated Learning in Heterogeneous Networks

Learning Compositional Representations for Effective Low-Shot Generalization

Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data

Budget Learning via Bracketing

Don't Even Look Once: Synthesizing Features for Zero-Shot Detection

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers

Clustering and Community Detection with Imbalanced Clusters

Efficient Training of Very Deep Neural Networks for Supervised Hashing

Learning Joint Feature Adaptation for Zero-Shot Recognition

Learning Minimum Volume Sets and Anomaly Detectors from KNN Graphs

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

On the Non-Existence of Unbiased Estimators in Constrained Estimation Problems

Optimally Pruning Decision Tree Ensembles With Feature Cost

Pruning Random Forests for Prediction on a Budget

Quantifying and Reducing Stereotypes in Word Embeddings

Resource Constrained Structured Prediction

Sequential Learning without Feedback

Zero-Shot Learning via Joint Latent Similarity Embedding

A Topic Modeling Approach to Ranking

Algorithms for Linear Bandits on Polyhedral Sets

Cheap Bandits

Efficient Learning by Directed Acyclic Graph For Resource Constrained Prediction

Feature-Budgeted Random Forest

Group Membership Prediction

Learning Efficient Anomaly Detectors from $K$-NN Graphs

Learning Immune-Defectives Graph through Group Tests

Learning Mixed Membership Mallows Models from Pairwise Comparisons

Max-Cost Discrete Function Evaluation Problem under a Budget

Minimax Optimal Sparse Signal Recovery with Poisson Statistics

Necessary and Sufficient Conditions and a Provably Efficient Algorithm for Separable Topic Discovery

PRISM: Person Re-Identification via Structured Matching

Sensor Selection by Linear Programming

Zero-Shot Learning via Semantic Similarity Embedding

A Novel Visual Word Co-occurrence Model for Person Re-identification

A Rank-SVM Approach to Anomaly Detection

Efficient Minimax Signal Detection on Graphs

Information-Theoretic Bounds for Adaptive Sparse Recovery

Non-Adaptive Group Testing with Inhibitors

Non-adaptive Group Testing: Explicit bounds and novel algorithms

RAPID: Rapidly Accelerated Proximal Gradient Algorithms for Convex Minimization

Retrieval in Long Surveillance Videos using User Described Motion and Object Attributes

Sensing-Aware Kernel SVM

Sparse Recovery with Linear and Nonlinear Observations: Dependent and Noisy Data

A New Geometric Approach to Latent Topic Modeling and Discovery

An Impossibility Result for High Dimensional Supervised Learning

Boolean Compressed Sensing and Noisy Group Testing

Multi-Stage Classifier Design

Near-Optimal Stochastic Threshold Group Testing

Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models

Spectral Clustering with Imbalanced Data

Spectral Clustering with Unbalanced Data

Topic Discovery through Data Dependent and Random Projections

Graph-based Learning with Unbalanced Clusters

A Token Based Algorithm to Distributed Computation in Sensor Networks

Graph Construction for Learning with Unbalanced Data

Graph-Constrained Group Testing

Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms

Structural Similarity and Distance in Learning

Information theoretic bounds for Compressed Sensing

Distributed Detection in Sensor Networks with Limited Range Sensors