Source author record

Arindam Banerjee

Arindam Banerjee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Machine Learning; math.AC; math.CO; math.OC; math.ST; Statistics Theory; Applications; Artificial Intelligence; physics.flu-dyn; Computational Engineering, Finance, and Science; Cryptography and Security; Discrete Mathematics; math.AT

Catalog footprint

What is connected

38 works

13 topics

4 close collaborators

preprint 2026 arXiv

Complexity and speed of semi-algebraic multi-persistence

Let $\mathrm{R}$ be a real closed field, $S \subset \mathrm{R}^n$ a closed and bounded semi-algebraic set, and $\mathbf{f}=(f_1,\ldots,f_p):S \rightarrow \mathrm{R}^p$ a continuous semi-algebraic map inducing a $p$-parameter semi-algebraic filtration by sublevel sets. We introduce a barcode invariant for such filtrations that directly extends the classical ($p=1$) barcode. After scaling of the parameter space, in each homological degree $\ell$ the invariant is encoded by a $\mathbb{Z}_{\ge 0}$-valued function \[ \mu_\ell(S,\mathbf{f}):\ \Big(({-}1,1)^p\times(({-}1,1)^p \cup\{(1,\ldots,1)\}) \Big)\ \cap\ \{(\mathbf a,\mathbf b)\mid \mathbf a\preceq \mathbf b\} \ \longrightarrow\ \mathbb{Z}_{\ge 0}, \] where $\preceq$ denotes the product order on $\mathrm{R}^p$. We prove that $\mu_\ell(S,\mathbf{f})$ is semi-algebraically constructible and establish a singly exponential upper bound on its description complexity. Moreover, we give a singly exponential-time algorithm to compute $\mu_\ell(S,\mathbf{f})$, extending to arbitrary $p$ the corresponding result for $p=1$ by Basu and Karisani. Finally, for semi-algebraic filtrations of bounded description complexity we bound the number of equivalence classes of finite poset modules realizable in this way, yielding a tight analogue of "speed" bounds for algebraically defined graph classes.

preprint 2026 arXiv

Plan Before You Trade: Inference-Time Optimization for RL Trading Agents

Reinforcement learning agents for portfolio management are typically trained and deployed as static policies, with no mechanism for using price forecasts at inference time. We propose $\text{FPILOT}$ (**Fin**ancial **P**lugin **I**nference-time **L**earning for **O**ptimal **T**rading), a plugin inference-time optimization framework inspired by Model Predictive Control (MPC). Our key structural insight is that future prices mostly do not depend on one agent's portfolio allocation, so a suitable predictive model can produce a multi-step price trajectory without iterative action-conditioned rollouts as in typical reinforcement learning. At each decision step, we use the forecaster's predicted price trajectory to construct an allocation-based imagined return objective, and optimize the policy at inference time before executing one step of the trade. Our framework is compatible with any pre-trained agent and adapts the policy to the forecaster's predictions without any retraining. Evaluated across five policy learning algorithms on the TradeMaster DJ30 benchmark, $\text{FPILOT}$ produces consistent improvements in total return and return-based risk-adjusted metrics (Sharpe, Sortino, Calmar), with stochastic policies benefiting more than deterministic ones. Further, using synthetic forecasts at calibrated quality levels, we show that gains consistently improve with forecaster quality, suggesting that performance will improve with advances in financial forecasting.

preprint 2023 arXiv

Improved Algorithms for Neural Active Learning

We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. In particular, we introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work. Then, the proposed algorithm leverages the powerful representation of NNs for both exploitation and exploration, has the query decision-maker tailored for $k$-class classification problems with the performance guarantee, utilizes the full feedback, and updates parameters in a more practical and efficient manner. These careful designs lead to an instance-dependent regret upper bound, roughly improving by a multiplicative factor $O(\log T)$ and removing the curse of input dimensionality. Furthermore, we show that the algorithm can achieve the same performance as the Bayes-optimal classifier in the long run under the hard-margin setting in classification problems. In the end, we use extensive experiments to evaluate the proposed algorithm and SOTA baselines, to show the improved empirical performance.

preprint 2022 arXiv

Bounds for the regularity of product of edge ideals

Let $I$ and $J$ be edge ideals in a polynomial ring $R = \mathbb{K}[x_1,\ldots,x_n]$ with $I \subseteq J$. In this paper, we obtain general upper and lower bounds for the Castelnuovo-Mumford regularity of $IJ$ in terms of certain invariants associated with $I$ and $J$. Using these results, we explicitly compute the regularity of $IJ$ for several classes of edge ideals. Let $J_1,\ldots,J_d$ be edge ideals in a polynomial ring $R$ with $J_1 \subseteq \cdots \subseteq J_d$. Finally, we compute the precise expression for the regularity of $J_1 J_2\cdots J_d$ when $d \in \{3,4\}$ and $J_d$ is the edge ideal of a complete graph.

preprint 2022 arXiv

EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits

In this paper, we propose a novel neural exploration strategy in contextual bandits, EE-Net, distinct from the standard UCB-based and TS-based approaches. Contextual multi-armed bandits have been studied for decades with various applications. To solve the exploitation-exploration tradeoff in bandits, there are three main techniques: epsilon-greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB). In recent literature, linear contextual bandits have adopted ridge regression to estimate the reward function and combine it with TS or UCB strategies for exploration. However, this line of works explicitly assumes the reward is based on a linear function of arm vectors, which may not be true in real-world datasets. To overcome this challenge, a series of neural bandit algorithms have been proposed, where a neural network is used to learn the underlying reward function and TS or UCB are adapted for exploration. Instead of calculating a large-deviation based statistical bound for exploration like previous methods, we propose "EE-Net", a novel neural-based exploration strategy. In addition to using a neural network (Exploitation network) to learn the reward function, EE-Net uses another neural network (Exploration network) to adaptively learn potential gains compared to the currently estimated reward for exploration. Then, a decision-maker is constructed to combine the outputs from the Exploitation and Exploration networks. We prove that EE-Net can achieve $\mathcal{O}(\sqrt{T\log T})$ regret and show that EE-Net outperforms existing linear and neural contextual bandit baselines on real-world datasets.
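For orientation, the UCB strategy named above can be sketched in its simplest, non-contextual form. This is the classical UCB1 rule on Bernoulli arms, not EE-Net and not a neural or linear contextual method; the function name `ucb1`, the arm means, and the horizon are hypothetical choices made only for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run classical UCB1 on Bernoulli arms and return per-arm pull counts.

    arm_means are hypothetical success probabilities used only to simulate
    rewards; the learner never reads them directly.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Exploitation term (empirical mean) plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


pull_counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
print(pull_counts)  # the arm with mean 0.8 should dominate
```

The exploration bonus here is a fixed confidence-width formula; the abstract describes replacing that kind of statistical bound with a second, learned network.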

preprint 2022 arXiv

On the Hilbert-Samuel coefficients of Frobenius powers of an ideal

We provide suitable conditions under which the asymptotic limit of the Hilbert-Samuel coefficients of the Frobenius powers of an $\mathfrak{m}$-primary ideal exists in a Noetherian local ring $(R,\mathfrak{m})$ with prime characteristic $p>0.$ This, in turn, gives an expression of the Hilbert-Kunz multiplicity of powers of the ideal. We also prove that for a face ring $R$ of a simplicial complex and an ideal $J$ generated by pure powers of the variables, the generalized Hilbert-Kunz function $\ell(R/(J^{[q]})^k)$ is a polynomial for all $q,k$ and also give an expression of the generalized Hilbert-Kunz multiplicity of powers of $J$ in terms of Hilbert-Samuel multiplicity of $J.$ We conclude by giving a counter-example to a conjecture proposed by I. Smirnov which connects the stability of an ideal with the asymptotic limit of the first Hilbert coefficient of the Frobenius power of the ideal.

preprint 2021 arXiv

Experiments with Rich Regime Training for Deep Learning

In spite of advances in understanding lazy training, recent work attributes the practical success of deep learning to the rich regime with complex inductive bias. In this paper, we study rich regime training empirically with benchmark datasets, and find that while most parameters are lazy, there is always a small number of active parameters which change quite a bit during training. We show that re-initializing (resetting to their initial random values) the active parameters leads to worse generalization. Further, we show that most of the active parameters are in the bottom layers, close to the input, especially as the networks become wider. Based on such observations, we study static Layer-Wise Sparse (LWS) SGD, which only updates some subsets of layers. We find that only updating the top and bottom layers yields good generalization and, as expected, only updating the top layers yields a fast algorithm. Inspired by this, we investigate probabilistic LWS-SGD, which mostly updates the top layers and occasionally updates the full network. We show that probabilistic LWS-SGD matches the generalization performance of vanilla SGD and the back-propagation time can be 2-5 times more efficient.

preprint 2021 arXiv

Generalized Hilbert-Kunz function of the Rees algebra of the face ring of a simplicial complex

Let $R$ be the face ring of a simplicial complex of dimension $d-1$ and ${\mathcal R}(\mathfrak{n})$ be the Rees algebra of the maximal homogeneous ideal $\mathfrak{n}$ of $R.$ We show that the generalized Hilbert-Kunz function $HK(s)=\ell({\mathcal R}(\mathfrak n)/(\mathfrak n, \mathfrak n t)^{[s]})$ is given by a polynomial for all large $s.$ We calculate it in many examples and also provide a Macaulay2 code for computing $HK(s).$

preprint 2021 arXiv

Packing properties of cubic squarefree monomial ideals

The symbolic powers of an ideal are, in general, not equal to its ordinary powers. An interesting question is therefore: for which classes of ideals do the ordinary and symbolic powers coincide? For squarefree monomial ideals, the answer may be the packing property. In this paper, we classify all cubic path ideals for which the symbolic and ordinary powers coincide.

preprint 2020 arXiv

Atwood and Reynolds numbers effects on the evolution of buoyancy-driven homogeneous variable-density turbulence

The evolution of buoyancy-driven homogeneous variable-density turbulence (HVDT) at Atwood numbers up to 0.75 and large Reynolds numbers is studied by using high-resolution Direct Numerical Simulations. To help understand the highly non-equilibrium nature of buoyancy-driven HVDT, the flow evolution is divided into four different regimes based on the behavior of turbulent kinetic energy derivatives. The results show that each regime has a unique type of dependency on both Atwood and Reynolds numbers. It is found that the local statistics of the flow based on the flow composition are more sensitive to Atwood and Reynolds numbers compared to those based on the entire flow. It is also observed that at higher Atwood numbers, different flow features reach their asymptotic Reynolds number behavior at different times. The energy spectrum defined based on the Favre fluctuations momentum has less large scale contamination from viscous effects for variable density flows with constant properties, compared to other forms used previously. The evolution of the energy spectrum highlights distinct dynamical features of the four flow regimes. Thus, the slope of the energy spectrum at intermediate to large scales evolves from -7/3 to -1, as a function of the production to dissipation ratio. The classical Kolmogorov spectrum emerges at intermediate to high scales at the highest Reynolds numbers examined, after the turbulence starts to decay. Finally, the similarities and differences between buoyancy-driven HVDT and the more conventional stationary turbulence are discussed and new strategies and tools for analysis are proposed.

preprint 2020 arXiv

Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds

We study differentially private (DP) algorithms for stochastic non-convex optimization. In this problem, the goal is to minimize the population loss over a $p$-dimensional space given $n$ i.i.d. samples drawn from a distribution. We improve upon the population gradient bound of ${\sqrt{p}}/{\sqrt{n}}$ from prior work and obtain a sharper rate of $\sqrt[4]{p}/\sqrt{n}$. We obtain this rate by providing the first analyses on a collection of private gradient-based methods, including adaptive algorithms DP RMSProp and DP Adam. Our proof technique leverages the connection between differential privacy and adaptive data analysis to bound gradient estimation error at every iterate, which circumvents the worse generalization bound from the standard uniform convergence argument. Finally, we evaluate the proposed algorithms on two popular deep learning tasks and demonstrate the empirical advantages of DP adaptive gradient methods over standard DP SGD.
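As background for the DP SGD baseline mentioned above, the following is a minimal sketch of one privatized gradient estimate using the widely used clip-then-add-Gaussian-noise construction. It is an assumption that this matches the paper's setup; it does not implement DP RMSProp or DP Adam, and it omits privacy accounting. The names `clip_by_l2_norm`, `private_mean_gradient`, `clip_norm`, and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip_by_l2_norm(vec, max_norm):
    """Scale vec down so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * clip_norm per
    coordinate; choosing noise_multiplier for a target privacy level is
    outside the scope of this sketch.
    """
    clipped = [clip_by_l2_norm(g, clip_norm) for g in per_example_grads]
    n = len(clipped)
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * clip_norm
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]


# Toy per-example gradients (made up for illustration).
grads = [[3.0, 4.0], [0.0, 0.5], [-1.0, 0.2]]
noisy = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1,
                              rng=random.Random(0))
print(noisy)
```

Clipping bounds how much any single sample can move the sum, which is what lets a fixed noise scale mask each individual's contribution.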

preprint 2020 arXiv

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter $\theta^*$ has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- and multi-parameter (i.e., different parameter for each context) settings and provide a unified regret analysis for $\theta^*$ with any assumed structure. The regret bounds are expressed in terms of geometric quantities such as Gaussian widths associated with the structure of $\theta^*$. We also obtain sharper regret bounds compared to earlier work for the unstructured $\theta^*$ setting as a consequence of our improved analysis. We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.

preprint 2020 arXiv

Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances

Sub-seasonal climate forecasting (SSF) focuses on predicting key climate variables such as temperature and precipitation in the 2-week to 2-month time scales. Skillful SSF would have immense societal value, in areas such as agricultural productivity, water resource management, transportation and aviation systems, and emergency planning for extreme weather events. However, SSF is considered more challenging than either weather prediction or even seasonal prediction. In this paper, we carefully study a variety of machine learning (ML) approaches for SSF over the US mainland. While atmosphere-land-ocean couplings and the limited amount of good quality data make it hard to apply black-box ML naively, we show that with carefully constructed feature representations, even linear regression models, e.g., Lasso, can be made to perform well. Among a broad suite of 10 ML approaches considered, gradient boosting performs the best, and deep learning (DL) methods show some promise with careful architecture choices. Overall, suitable ML methods are able to outperform the climatological baseline, i.e., predictions based on the 30-year average at a given location and time. Further, based on studying feature importance, ocean (especially indices based on climatic oscillations such as El Nino) and land (soil moisture) covariates are found to be predictive, whereas atmospheric covariates are not considered helpful.

preprint 2020 arXiv

Variable-density buoyancy-driven turbulence with asymmetric initial density distribution

The effects of different initial density distributions on the evolution of buoyancy-driven homogeneous variable-density turbulence (HVDT) at low (0.05) and high (0.75) Atwood numbers are studied by using high-resolution direct numerical simulations. HVDT aims to mimic the acceleration-driven Rayleigh-Taylor and shock-driven Richtmyer-Meshkov instabilities and reveals new physics that arise from variable-density effects on the turbulent mixing. Here, the initial amounts of pure light and pure heavy flows are altered primarily to mimic the variable-density turbulence at the different locations of the Rayleigh-Taylor and Richtmyer-Meshkov instabilities' mixing layers where the amounts of the mixing fluids are not equal. It is found that for the low Atwood number cases, the asymmetric initial density distribution has limited effects on both global and local flow evolution for HVDT. However, at high Atwood number, both global flow evolution and the local flow structures are strongly affected by the initial composition ratio. The flow composed of more light fluid reaches higher turbulent levels and the local statistics reach their fully-developed behavior earlier in the time evolution. During the late time decay, where most of the flow is well-mixed, all parameters become independent of the initial composition ratio for both low and high Atwood number cases.

preprint 2016 arXiv

A Spectral Algorithm for Inference in Hidden Semi-Markov Models

Hidden semi-Markov models (HSMMs) are latent variable models which allow latent state persistence and can be viewed as a generalization of the popular hidden Markov models (HMMs). In this paper, we introduce a novel spectral algorithm to perform inference in HSMMs. Unlike expectation maximization (EM), our approach correctly estimates the probability of a given observation sequence based on a set of training sequences. Our approach is based on estimating moments from the sample, whose number of dimensions depends only logarithmically on the maximum length of the hidden state persistence. Moreover, the algorithm requires only a few matrix inversions and is therefore computationally efficient. Empirical evaluations on synthetic and real data demonstrate the advantage of the algorithm over EM in terms of speed and accuracy, especially for large datasets.

preprint 2016 arXiv

Alternating Estimation for Structured High-Dimensional Multi-Response Models

We consider learning high-dimensional multi-response linear models with structured parameters. By exploiting the noise correlations among responses, we propose an alternating estimation (AltEst) procedure to estimate the model parameters based on the generalized Dantzig selector. Under suitable sample size and resampling assumptions, we show that the error of the estimates generated by AltEst, with high probability, converges linearly to a certain minimum achievable level, which can be tersely expressed by a few geometric measures, such as the Gaussian width of sets related to the parameter structure. To the best of our knowledge, this is the first non-asymptotic statistical guarantee for such an AltEst-type algorithm applied to estimation problems with general structures.

preprint 2016 arXiv

Estimating Structured Vector Autoregressive Model

While considerable advances have been made in estimating high-dimensional structured models from independent data using Lasso-type models, limited progress has been made for settings when the samples are dependent. We consider estimating structured VAR (vector auto-regressive models), where the structure can be captured by any suitable norm, e.g., Lasso, group Lasso, order weighted Lasso, sparse group Lasso, etc. In VAR setting with correlated noise, although there is strong dependence over time and covariates, we establish bounds on the non-asymptotic estimation error of structured VAR parameters. Surprisingly, the estimation error is of the same order as that of the corresponding Lasso-type estimator with independent samples, and the analysis holds for any norm. Our analysis relies on results in generic chaining, sub-exponential martingales, and spectral representation of VAR models. Experimental results on synthetic data with a variety of structures as well as real aviation data are presented, validating theoretical results.

preprint 2016 arXiv

Generalized Direct Change Estimation in Ising Model Structure

We consider the problem of estimating change in the dependency structure between two $p$-dimensional Ising models, based on respectively $n_1$ and $n_2$ samples drawn from the models. The change is assumed to be structured, e.g., sparse, block sparse, node-perturbed sparse, etc., such that it can be characterized by a suitable (atomic) norm. We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual Ising models. The estimator can work with any norm, and can be generalized to other graphical models under mild assumptions. We show that only one set of samples, say $n_2$, needs to satisfy the sample complexity requirement for the estimator to work, and the estimation error decreases as $\frac{c}{\sqrt{\min(n_1,n_2)}}$, where $c$ depends on the Gaussian width of the unit norm ball. For example, for $\ell_1$ norm applied to $s$-sparse change, the change can be accurately estimated with $\min(n_1,n_2)=O(s \log p)$ which is sharper than an existing result $n_1= O(s^2 \log p)$ and $n_2 = O(n_1^2)$. Experimental results illustrating the effectiveness of the proposed estimator are presented.

preprint 2016 arXiv

Graph Connectivity and Binomial Edge Ideals

We relate homological properties of a binomial edge ideal $\mathcal{J}_G$ to invariants that measure the connectivity of a simple graph $G$. Specifically, we show that if $R/\mathcal{J}_G$ is a Cohen-Macaulay ring, then the graph toughness of $G$ is exactly $\frac{1}{2}$. We also give an inequality between the depth of $R/\mathcal{J}_G$ and the vertex-connectivity of $G$. In addition, we study the Hilbert-Samuel multiplicity and the Hilbert-Kunz multiplicity of $R/\mathcal{J}_G$.

preprint 2016 arXiv

Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems

In this work we consider the problem of anomaly detection in heterogeneous, multivariate, variable-length time series datasets. Our focus is on the aviation safety domain, where data objects are flights and time series are sensor readings and pilot switches. In this context the goal is to detect anomalous flight segments, due to mechanical, environmental, or human factors, in order to identify operationally significant events, provide insights into flight operations, and highlight otherwise unavailable potential safety risks and precursors to accidents. For this purpose, we propose a framework which represents each flight using a semi-Markov switching vector autoregressive (SMS-VAR) model. Detection of anomalies is then based on measuring dissimilarities between the model's prediction and data observation. The framework is scalable, due to the inherent parallel nature of most computations, and can be used to perform online anomaly detection. Extensive experimental results on simulated and real datasets illustrate that the framework can detect various types of anomalies along with the key parameters involved.

preprint 2016 arXiv

Structured Matrix Recovery via the Generalized Dantzig Selector

In recent years, structured matrix recovery problems have gained considerable attention for their real-world applications, such as recommender systems and computer vision. Much of the existing work has focused on matrices with low-rank structure, and limited progress has been made on matrices with other types of structure. In this paper we present non-asymptotic analysis for estimation of generally structured matrices via the generalized Dantzig selector under generic sub-Gaussian measurements. We show that the estimation error can always be succinctly expressed in terms of a few geometric measures of suitable sets which only depend on the structure of the underlying true matrix. In addition, we derive general bounds on these geometric measures for structures characterized by unitarily invariant norms, which form a large family covering most matrix norms of practical interest. Examples are provided to illustrate the utility of our theoretical development.

preprint 2016 arXiv

Structured Stochastic Linear Bandits

The stochastic linear bandit problem proceeds in rounds where at each round the algorithm selects a vector from a decision set after which it receives a noisy linear loss parameterized by an unknown vector. The goal in such a problem is to minimize the (pseudo) regret which is the difference between the total expected loss of the algorithm and the total expected loss of the best fixed vector in hindsight. In this paper, we consider settings where the unknown parameter has structure, e.g., sparse, group sparse, low-rank, which can be captured by a norm, e.g., $L_1$, $L_{(1,2)}$, nuclear norm. We focus on constructing confidence ellipsoids which contain the unknown parameter across all rounds with high probability. We show the radius of such ellipsoids depends on the Gaussian width of sets associated with the norm capturing the structure. Such characterization leads to tighter confidence ellipsoids and, therefore, sharper regret bounds compared to bounds in the existing literature which are based on the ambient dimensionality.

preprint 2016 arXiv

The Matrix Generalized Inverse Gaussian Distribution: Properties and Applications

While the Matrix Generalized Inverse Gaussian ($\mathcal{MGIG}$) distribution arises naturally in some settings as a distribution over symmetric positive semi-definite matrices, certain key properties of the distribution and effective ways of sampling from the distribution have not been carefully studied. In this paper, we show that the $\mathcal{MGIG}$ is unimodal, and the mode can be obtained by solving an Algebraic Riccati Equation (ARE) [7]. Based on this property, we propose an importance sampling method for the $\mathcal{MGIG}$ where the mode of the proposal distribution matches that of the target. The proposed sampling method is more efficient than existing approaches [32, 33], which use proposal distributions that may have the mode far from the $\mathcal{MGIG}$'s mode. Further, we illustrate that the posterior distribution in latent factor models, such as probabilistic matrix factorization (PMF) [25], when marginalized over one latent factor has the $\mathcal{MGIG}$ distribution. The characterization leads to a novel Collapsed Monte Carlo (CMC) inference algorithm for such latent factor models. We illustrate that CMC has a lower log loss or perplexity than MCMC, and needs fewer samples.

preprint 2015 arXiv

Enumerating all maximal biclusters in numerical datasets

Biclustering has proved to be a powerful data analysis technique due to its wide success in various application domains. However, the existing literature presents efficient solutions only for enumerating maximal biclusters with constant values, or heuristic-based approaches which cannot find all biclusters or even guarantee the maximality of the obtained biclusters. Here, we present a general family of biclustering algorithms for enumerating all maximal biclusters with (i) constant values on rows, (ii) constant values on columns, or (iii) coherent values. Versions for perfect and for perturbed biclusters are provided. Our algorithms have four key properties (only the algorithm for perturbed biclusters with coherent values fails to exhibit the first property): they are (1) efficient (take polynomial time per pattern), (2) complete (find all maximal biclusters), (3) correct (all biclusters satisfy the user-defined measure of similarity), and (4) non-redundant (all the obtained biclusters are maximal and the same bicluster is not enumerated twice). They are based on a generalization of an efficient formal concept analysis algorithm called In-Close2. Experimental results point to the necessity of having efficient enumerative biclustering algorithms and provide a valuable insight into the scalability of our family of algorithms and its sensitivity to user-defined parameters.

preprint 2015 arXiv

Estimation with Norm Regularization

Analysis of non-asymptotic estimation error and structured statistical recovery based on norm regularized regression, such as Lasso, needs to consider four aspects: the norm, the loss function, the design matrix, and the noise model. This paper presents generalizations of such estimation error analysis on all four aspects compared to the existing literature. We characterize the restricted error set where the estimation error vector lies, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to any norm. Precise characterizations of the bound are presented for isotropic as well as anisotropic subGaussian design matrices, subGaussian noise models, and convex loss functions, including least squares and generalized linear models. Generic chaining and associated results play an important role in the analysis. A key result from the analysis is that the sample complexity of all such estimators depends on the Gaussian width of a spherical cap corresponding to the restricted error set. Further, once the number of samples $n$ crosses the required sample complexity, the estimation error decreases as $\frac{c}{\sqrt{n}}$, where $c$ depends on the Gaussian width of the unit norm ball.
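For readers unfamiliar with the quantities named above, the standard textbook definitions can be written out as follows. The notation is an assumption made here for illustration ($R$ for the norm, $\mathcal{L}$ for the loss, $\lambda_n$ for the regularization weight, $w(\cdot)$ for Gaussian width) and is not taken from the abstract.

```latex
% Norm-regularized estimator: loss plus a norm penalty.
\hat{\theta}_{\lambda_n} \;=\; \operatorname*{argmin}_{\theta \in \mathbb{R}^p}\;
  \mathcal{L}(\theta; Z^n) \;+\; \lambda_n R(\theta)

% Gaussian width of a set A \subseteq \mathbb{R}^p, with g a standard Gaussian vector.
w(A) \;=\; \mathbb{E}_{g \sim N(0, I_p)}\Big[\, \sup_{u \in A} \langle g, u \rangle \Big]

% Lasso is the special case of squared loss with the \ell_1 norm.
\mathcal{L}(\theta; Z^n) = \tfrac{1}{2n}\,\| y - X\theta \|_2^2, \qquad R(\theta) = \|\theta\|_1
```

In this notation, the abstract's rate $\frac{c}{\sqrt{n}}$ has $c$ depending on $w(\{u : R(u) \le 1\})$, the Gaussian width of the unit norm ball.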

preprint 2015 arXiv

Generalized Dantzig Selector: Application to the k-support norm

We propose a Generalized Dantzig Selector (GDS) for linear models, in which any norm encoding the parameter structure can be leveraged for estimation. We investigate both computational and statistical aspects of the GDS. Based on the conjugate proximal operator, a flexible inexact ADMM framework is designed for solving GDS, and non-asymptotic high-probability bounds are established on the estimation error, which rely on the Gaussian widths of the unit norm ball and of a suitable set encompassing the estimation error. Further, we consider a non-trivial example of the GDS using the $k$-support norm. We derive an efficient method to compute the proximal operator for the $k$-support norm since existing methods are inapplicable in this setting. For statistical analysis, we provide upper bounds for the Gaussian widths needed in the GDS analysis, yielding the first statistical recovery guarantee for estimation with the $k$-support norm. The experimental results confirm our theoretical analysis.

preprint 2014 arXiv

Bregman Alternating Direction Method of Multipliers

The mirror descent algorithm (MDA) generalizes gradient descent by using a Bregman divergence to replace squared Euclidean distance. In this paper, we similarly generalize the alternating direction method of multipliers (ADMM) to Bregman ADMM (BADMM), which allows the choice of different Bregman divergences to exploit the structure of problems. BADMM provides a unified framework for ADMM and its variants, including generalized ADMM, inexact ADMM and Bethe ADMM. We establish the global convergence and the $O(1/T)$ iteration complexity for BADMM. In some cases, BADMM can be faster than ADMM by a factor of $O(n/\log(n))$. In solving the linear program of mass transportation problem, BADMM leads to massive parallelism and can easily run on GPU. BADMM is several times faster than highly optimized commercial software Gurobi.
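The opening claim, that mirror descent swaps the squared Euclidean distance for a Bregman divergence, can be made concrete with a small sketch. Choosing the KL divergence on the probability simplex turns the update into a multiplicative (exponentiated gradient) step. This illustrates plain mirror descent only, not BADMM; the function name, the linear objective, and the step size are hypothetical.

```python
import math


def mirror_descent_simplex(grad, x0, step, iters):
    """Entropic mirror descent on the probability simplex.

    With the KL divergence as the Bregman divergence, the proximal step has
    the closed form x_i <- x_i * exp(-step * g_i), followed by normalisation.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        weights = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(weights)
        x = [w / total for w in weights]
    return x


# Hypothetical linear objective f(x) = <costs, x>; over the simplex its
# minimiser puts all mass on the coordinate with the smallest cost.
costs = [3.0, 1.0, 2.0]
x_final = mirror_descent_simplex(lambda x: costs, [1 / 3, 1 / 3, 1 / 3],
                                 step=0.5, iters=200)
print(x_final)
```

No explicit projection onto the simplex is needed: the KL geometry keeps iterates positive and the normalisation keeps them summing to one, which is the kind of structure exploitation the abstract refers to.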

preprint 2014 arXiv

Parallel Direction Method of Multipliers

We consider the problem of minimizing block-separable convex functions subject to linear constraints. While the Alternating Direction Method of Multipliers (ADMM) for two-block linear constraints has been intensively studied both theoretically and empirically, in spite of some preliminary work, effective generalizations of ADMM to multiple blocks are still unclear. In this paper, we propose a randomized block coordinate method named Parallel Direction Method of Multipliers (PDMM) to solve optimization problems with multi-block linear constraints. PDMM randomly updates some primal and dual blocks in parallel, behaving like parallel randomized block coordinate descent. We establish the global convergence and the iteration complexity for PDMM with constant step size. We also show that PDMM can do randomized block coordinate descent on overlapping blocks. Experimental results show that PDMM performs better than state-of-the-art methods in two applications, robust principal component analysis and overlapping group lasso.

preprint2014arXiv

Powers of edge ideals of regularity three bipartite graphs

In this paper we prove that if $I(G)$ is a bipartite edge ideal with regularity three then for all $s\geq 2$ the regularity of $I(G)^s$ is exactly $2s+1$.

preprint2014arXiv

Properties of Lyubeznik numbers under localization and polarization

We exhibit a global bound for the Lyubeznik numbers of a ring of prime characteristic. In addition, we show that for a monomial ideal, the Lyubeznik numbers of the quotient rings of its radical and its polarization are the same. Furthermore, we present examples that show striking behavior of the Lyubeznik numbers under localization. We also show related results for generalizations of the Lyubeznik numbers.

preprint2014arXiv

Randomized Block Coordinate Descent for Online and Stochastic Optimization

Two types of low cost-per-iteration gradient descent methods have been extensively studied in parallel. One is online or stochastic gradient descent (OGD/SGD), and the other is randomized block coordinate descent (RBCD). In this paper, we combine the two types of methods and propose online randomized block coordinate descent (ORBCD). At each iteration, ORBCD only computes the partial gradient of one block coordinate on one mini-batch of samples. ORBCD is well suited for the composite minimization problem where one function is the average of the losses of a large number of samples and the other is a simple regularizer defined on high dimensional variables. We show that the iteration complexity of ORBCD has the same order as OGD or SGD. For strongly convex functions, by reducing the variance of stochastic gradients, we show that ORBCD can converge at a geometric rate in expectation, matching the convergence rate of SGD with variance reduction and RBCD.
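The per-iteration idea described above (one random block, one random mini-batch, then a proximal step for a simple regularizer) can be sketched as follows. This is a minimal illustration on an $\ell_1$-regularized least-squares problem, assuming a $1/\sqrt{t}$ step-size decay and uniform block sampling; the function names, the synthetic data and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def orbcd_lasso(A, b, lam, n_blocks, batch_size, step, n_iters, seed=0):
    """Sketch of online randomized block coordinate descent for
    0.5 * mean((A x - b)^2) + lam * ||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    blocks = np.array_split(np.arange(d), n_blocks)
    x = np.zeros(d)
    for t in range(1, n_iters + 1):
        rows = rng.choice(n, size=batch_size, replace=False)  # mini-batch
        blk = blocks[rng.integers(n_blocks)]                  # random block
        resid = A[rows] @ x - b[rows]
        # Partial gradient of the smooth loss w.r.t. the chosen block only.
        g = A[rows][:, blk].T @ resid / batch_size
        eta = step / np.sqrt(t)                               # assumed decay
        x[blk] = soft_threshold(x[blk] - eta * g, eta * lam)
    return x

def objective(A, b, x, lam):
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

# Synthetic sparse regression problem (assumed, for illustration only).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 1.5]
b = A @ x_true + 0.1 * rng.standard_normal(200)

lam = 0.01
x_hat = orbcd_lasso(A, b, lam, n_blocks=4, batch_size=20,
                    step=0.2, n_iters=3000)
obj_start = objective(A, b, np.zeros(20), lam)
obj_end = objective(A, b, x_hat, lam)
```

Each iteration touches only `batch_size` rows and one block of columns, which is where the low cost per iteration comes from.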

preprint2014arXiv

Regularity of Path Ideals of Gap Free Graphs

In this paper we study the Castelnuovo-Mumford regularity of the path ideals of finite simple graphs. We find new upper bounds for various path ideals of gap free graphs. In particular we prove that the path ideals of gap free and claw graphs have linear minimal free resolutions.

preprint2014arXiv

The Regularity of Powers of Edge Ideals

In this paper we prove the existence of a special order on the set of minimal monomial generators of powers of edge ideals of arbitrary graphs. Using this order we find new upper bounds on the regularity of powers of edge ideals of graphs whose complement does not have any induced four cycle.

preprint2013arXiv

Bethe-ADMM for Tree Decomposition based Parallel MAP Inference

We consider the problem of maximum a posteriori (MAP) inference in discrete graphical models. We present a parallel MAP inference algorithm called Bethe-ADMM based on two ideas: tree-decomposition of the graph and the alternating direction method of multipliers (ADMM). However, unlike the standard ADMM, we use an inexact ADMM augmented with a Bethe-divergence based proximal function, which makes each subproblem in ADMM easy to solve in parallel using the sum-product algorithm. We rigorously prove global convergence of Bethe-ADMM. The proposed algorithm is extensively evaluated on both synthetic and real datasets to illustrate its effectiveness. Further, the parallel Bethe-ADMM is shown to scale almost linearly with increasing number of cores.

preprint2013arXiv

Online Alternating Direction Method (longer version)

Online optimization has emerged as a powerful tool in large scale optimization. In this paper, we introduce efficient online optimization algorithms based on the alternating direction method (ADM), which can solve online convex optimization under linear constraints where the objective could be non-smooth. We introduce new proof techniques for ADM in the batch setting, which yield an O(1/T) convergence rate for ADM and form the basis for regret analysis in the online setting. We consider two scenarios in the online setting, based on whether an additional Bregman divergence is needed or not. In both settings, we establish regret bounds for both the objective function and the constraint violation for general and strongly convex functions. We also consider inexact ADM updates where certain terms are linearized to yield efficient updates and show the stochastic convergence rates. In addition, we briefly discuss how online ADM can be used as a projection-free online learning algorithm in some scenarios. Preliminary results are presented to illustrate the performance of the proposed algorithms.
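For readers unfamiliar with the batch method that the online variants build on, here is a minimal sketch of the standard alternating direction iteration applied to a lasso problem, written as $\min_{x,z} \tfrac12\|Ax-b\|^2 + \lambda\|z\|_1$ subject to $x - z = 0$. It illustrates only the textbook batch scheme with a non-smooth term and a linear constraint, not the online algorithms or the analysis in the paper; the function names, the penalty parameter `rho` and the test data are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iters=200):
    """Batch ADM/ADMM sketch for
    min 0.5 * ||A x - b||^2 + lam * ||z||_1  subject to  x - z = 0,
    using the scaled dual variable u."""
    d = A.shape[1]
    x = np.zeros(d)
    z = np.zeros(d)
    u = np.zeros(d)
    M = A.T @ A + rho * np.eye(d)   # system matrix of the x-update
    Atb = A.T @ b
    for _ in range(n_iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))  # smooth subproblem
        z = soft_threshold(x + u, lam / rho)         # non-smooth subproblem
        u = u + x - z                                # dual (multiplier) update
    return x, z

# With A = I the minimizer is soft_threshold(b, lam), which gives an
# easy sanity check (data values are assumed).
A = np.eye(3)
b = np.array([3.0, 0.5, -2.0])
x_admm, z_admm = admm_lasso(A, b, lam=1.0)
```

The linear constraint is what couples the two subproblems; the dual update accumulates the constraint violation `x - z`, the quantity for which the abstract reports regret bounds in the online setting.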

preprint2012arXiv

Gap Filling in the Plant Kingdom---Trait Prediction Using Hierarchical Probabilistic Matrix Factorization

Plant traits are a key to understanding and predicting the adaptation of ecosystems to environmental changes, which motivates the TRY project aiming at constructing a global database for plant traits and becoming a standard resource for the ecological community. Despite its unprecedented coverage, a large percentage of missing data substantially constrains joint trait analysis. Meanwhile, the trait data is characterized by the hierarchical phylogenetic structure of the plant kingdom. While factorization based matrix completion techniques have been widely used to address the missing data problem, traditional matrix factorization methods are unable to leverage the phylogenetic structure. We propose hierarchical probabilistic matrix factorization (HPMF), which effectively uses hierarchical phylogenetic information for trait prediction. We demonstrate HPMF's high accuracy, effectiveness of incorporating hierarchical structure and ability to capture trait correlation through experiments.

preprint2012arXiv

Gaussian Process Topic Models

We introduce Gaussian Process Topic Models (GPTMs), a new family of topic models which can leverage a kernel among documents while extracting correlated topics. GPTMs can be considered a systematic generalization of the Correlated Topic Models (CTMs) using ideas from Gaussian Process (GP) based embedding. Since GPTMs work with both a topic covariance matrix and a document kernel matrix, learning GPTMs involves a novel component: solving a suitable Sylvester equation capturing both topic and document dependencies. The efficacy of GPTMs is demonstrated with experiments evaluating the quality of both topic modeling and embedding.
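The abstract does not state the specific Sylvester equation used in GPTMs, so the following is only a generic sketch of how an equation of the form $AX + XB = C$ can be solved for small matrices by vectorization, using the identity $\mathrm{vec}(AX + XB) = (I \otimes A + B^{\top} \otimes I)\,\mathrm{vec}(X)$. The function name and the example matrices are hypothetical, and the dense Kronecker approach is chosen for clarity rather than efficiency.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A X + X B = C for X via Kronecker vectorization.

    A is (m, m), B is (n, n), C is (m, n). A unique solution exists when
    no eigenvalue of A is the negative of an eigenvalue of B.
    """
    m, n = C.shape
    # Column-major vec: vec(A X) = (I_n kron A) vec(X),
    #                   vec(X B) = (B^T kron I_m) vec(X).
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((m, n), order="F")

# Small symmetric positive definite examples (assumed values), standing in
# for a covariance-like matrix A and a kernel-like matrix B.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
B = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
C = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

X = solve_sylvester_kron(A, B, C)
residual = A @ X + X @ B - C
```

The Kronecker system has size $mn \times mn$, so this direct approach is only practical for small problems; larger instances are normally handled with specialized solvers.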

preprint2012arXiv

Online Alternating Direction Method

Online optimization has emerged as a powerful tool in large scale optimization. In this paper, we introduce efficient online algorithms based on the alternating direction method (ADM). We introduce a new proof technique for ADM in the batch setting, which yields the O(1/T) convergence rate of ADM and forms the basis of regret analysis in the online setting. We consider two scenarios in the online setting, based on whether the solution needs to lie in the feasible set or not. In both settings, we establish regret bounds for both the objective function and the constraint violation for general and strongly convex functions. Preliminary results are presented to illustrate the performance of the proposed algorithms.
