Source author record

Sharan Vaswani

Sharan Vaswani appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Social and Information Networks, math.OC, Artificial Intelligence, Computer Vision, physics.soc-ph

Catalog footprint

What is connected

12 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Augmented Lagrangian Method for Last-Iterate Convergence for Constrained MDPs

We study policy optimization for infinite-horizon, discounted constrained Markov decision processes (CMDPs). While existing theoretical guarantees typically hold for the mixture policy, deploying such a policy is computationally and memory intensive. This leads to a practical mismatch where a single (last-iterate) policy must be deployed. Recent theoretical works have thus focused on proving last-iterate convergence, but are largely limited to the tabular setting or to algorithmic variants that are rarely used in practice. To address this, we use the classic inexact augmented Lagrangian ($\texttt{AL}$) method from constrained optimization, and propose a general framework with provable last-iterate convergence for CMDPs. We first focus on the tabular setting and propose to solve the $\texttt{AL}$ sub-problem with projected Q-ascent ($\texttt{PQA}$). Combining the theoretical guarantees of $\texttt{PQA}$ and the standard $\texttt{AL}$ analysis enables us to establish global last-iterate convergence. We generalize these results to handle log-linear policies, and demonstrate that an efficient, projected variant of $\texttt{PQA}$ can achieve last-iterate convergence with guarantees comparable to those of prior work. Finally, we demonstrate that our framework scales to complex non-linear policies, and evaluate it on continuous control tasks.
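
To make the augmented Lagrangian idea concrete, the sketch below applies the classic method to a one-dimensional toy problem rather than to a CMDP. The problem (minimize $(x - 2)^2$ subject to $x - 1 \le 0$), the function name `augmented_lagrangian`, and all parameter values are assumptions chosen for illustration. The inner loop uses plain gradient descent where the abstract uses projected Q-ascent.

```python
def augmented_lagrangian(rho=1.0, outer_iters=50, inner_iters=200, step=0.1):
    """Toy example (assumed): minimize (x - 2)^2 subject to x - 1 <= 0."""
    x, lam = 0.0, 0.0
    for _ in range(outer_iters):
        # Inexactly minimize the augmented Lagrangian in x by gradient descent,
        # warm-starting from the previous outer iterate.
        for _ in range(inner_iters):
            shifted = max(0.0, lam + rho * (x - 1.0))  # shifted multiplier
            grad = 2.0 * (x - 2.0) + shifted           # objective + penalty gradient
            x -= step * grad
        # Multiplier update, kept non-negative for the inequality constraint.
        lam = max(0.0, lam + rho * (x - 1.0))
    return x, lam


x_last, lam_last = augmented_lagrangian()
print(x_last, lam_last)  # the last iterate approaches x = 1 with multiplier 2
```

In this toy problem the last primal iterate itself approaches the constrained solution, so no averaging over earlier iterates is needed.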

preprint, 2022, arXiv

Improved Policy Optimization for Online Imitation Learning

We consider online imitation learning (OIL), where the task is to find a policy that imitates the behavior of an expert via active interaction with the environment. We aim to bridge the gap between the theory and practice of policy optimization algorithms for OIL by analyzing one of the most popular OIL algorithms, DAGGER. Specifically, if the class of policies is sufficiently expressive to contain the expert policy, we prove that DAGGER achieves constant regret. Unlike previous bounds that require the losses to be strongly-convex, our result only requires the weaker assumption that the losses be strongly-convex with respect to the policy's sufficient statistics (not its parameterization). In order to ensure convergence for a wider class of policies and losses, we augment DAGGER with an additional regularization term. In particular, we propose a variant of Follow-the-Regularized-Leader (FTRL) and its adaptive variant for OIL and develop a memory-efficient implementation, which matches the memory requirements of FTL. Assuming that the loss functions are smooth and convex with respect to the parameters of the policy, we also prove that FTRL achieves constant regret for any sufficiently expressive policy class, while retaining $O(\sqrt{T})$ regret in the worst case. We demonstrate the effectiveness of these algorithms with experiments on synthetic and high-dimensional control tasks.

preprint, 2022, arXiv

Towards Painless Policy Optimization for Constrained MDPs

We study policy optimization in an infinite horizon, $γ$-discounted constrained Markov decision process (CMDP). Our objective is to return a policy that achieves large expected reward with a small constraint violation. We consider the online setting with linear function approximation and assume global access to the corresponding features. We propose a generic primal-dual framework that allows us to bound the reward sub-optimality and constraint violation for arbitrary algorithms in terms of their primal and dual regret on online linear optimization problems. We instantiate this framework to use coin-betting algorithms and propose the Coin Betting Politex (CBP) algorithm. Assuming that the action-value functions are $\varepsilon_b$-close to the span of the $d$-dimensional state-action features and no sampling errors, we prove that $T$ iterations of CBP result in an $O\left(\frac{1}{(1 - γ)^3 \sqrt{T}} + \frac{\varepsilon_b\sqrt{d}}{(1 - γ)^2} \right)$ reward sub-optimality and an $O\left(\frac{1}{(1 - γ)^2 \sqrt{T}} + \frac{\varepsilon_b \sqrt{d}}{1 - γ} \right)$ constraint violation. Importantly, unlike gradient descent-ascent and other recent methods, CBP does not require extensive hyperparameter tuning. Via experiments on synthetic and Cartpole environments, we demonstrate the effectiveness and robustness of CBP.

preprint, 2021, arXiv

Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)

Adaptive gradient methods are typically used for training over-parameterized models. To better understand their behaviour, we study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data. In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate. When interpolation is only approximately satisfied, constant step-size AMSGrad converges to a neighbourhood of the solution at the same rate, while AdaGrad is robust to the violation of interpolation. However, even for simple convex problems satisfying interpolation, the empirical performance of both methods heavily depends on the step-size and requires tuning, questioning their adaptivity. We alleviate this problem by automatically determining the step-size using stochastic line-search or Polyak step-sizes. With these techniques, we prove that both AdaGrad and AMSGrad retain their convergence guarantees, without needing to know problem-dependent constants. Empirically, we demonstrate that these techniques improve the convergence and generalization of adaptive gradient methods across tasks, from binary classification with kernel mappings to multi-class classification with deep networks.
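
As an illustration of a Polyak step-size under interpolation, the sketch below runs plain SGD (not AdaGrad or AMSGrad) on a small, consistent least-squares problem. The function name `sgd_polyak`, the constant `c`, and the data are assumptions for illustration. Because the model can fit every example exactly, the minimum of each per-example loss is taken to be zero, so the step-size needs no problem-dependent constants.

```python
import random


def sgd_polyak(data, dim, iters=2000, c=0.5, seed=0):
    """SGD with a stochastic Polyak step-size on 0.5 * (x.w - y)^2 losses."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(iters):
        x, y = rng.choice(data)
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss = 0.5 * residual ** 2
        grad = [residual * xi for xi in x]
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:
            continue  # this example is already fit exactly
        # Polyak step: (loss - minimum loss) / (c * ||grad||^2), minimum loss = 0.
        eta = loss / (c * grad_sq)
        w = [wi - eta * g for wi, g in zip(w, grad)]
    return w


w_true = [1.0, -2.0]
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
data = [(x, sum(a * b for a, b in zip(w_true, x))) for x in points]
w_hat = sgd_polyak(data, dim=2)
print(w_hat)  # close to w_true, since the data can be interpolated
```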

preprint, 2020, arXiv

Combining Bayesian Optimization and Lipschitz Optimization

Bayesian optimization and Lipschitz optimization have developed alternative techniques for optimizing black-box functions. They each exploit a different form of prior about the function. In this work, we explore strategies to combine these techniques for better global optimization. In particular, we propose ways to use the Lipschitz continuity assumption within traditional BO algorithms, which we call Lipschitz Bayesian optimization (LBO). This approach does not increase the asymptotic runtime and in some cases drastically improves the performance (while in the worst case the performance is similar). Indeed, in a particular setting, we prove that using the Lipschitz information yields the same or a better bound on the regret compared to using Bayesian optimization on its own. Moreover, we propose a simple heuristic to estimate the Lipschitz constant, and prove that a growing estimate of the Lipschitz constant is in some sense "harmless". Our experiments on 15 datasets with 4 acquisition functions show that in the worst case LBO performs similarly to the underlying BO method while in some cases it performs substantially better. Thompson sampling in particular typically saw drastic improvements (as the Lipschitz information corrected for its well-known "over-exploration" phenomenon) and its LBO variant often outperformed other acquisition functions.
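
The Lipschitz assumption yields hard bounds on the unknown function from the points evaluated so far: if $f$ is $L$-Lipschitz, then $f(x_i) - L|x - x_i| \le f(x) \le f(x_i) + L|x - x_i|$ for every observation $x_i$. The one-dimensional sketch below computes these bounds. Clipping a surrogate prediction to them is shown only as one plausible way to use the information, not necessarily the combination rule used in the paper. The names `lipschitz_bounds` and `clip_prediction` are hypothetical.

```python
def lipschitz_bounds(x, observations, lipschitz_constant):
    """Tightest lower/upper bounds on f(x) implied by (x_i, f(x_i)) pairs."""
    lower = max(fx - lipschitz_constant * abs(x - xi) for xi, fx in observations)
    upper = min(fx + lipschitz_constant * abs(x - xi) for xi, fx in observations)
    return lower, upper


def clip_prediction(value, lower, upper):
    """Restrict a surrogate prediction to the Lipschitz-feasible interval."""
    return min(max(value, lower), upper)


observations = [(0.0, 1.0), (2.0, 3.0)]
lower, upper = lipschitz_bounds(1.0, observations, lipschitz_constant=2.0)
print(lower, upper)                        # 1.0 3.0
print(clip_prediction(4.5, lower, upper))  # 3.0
```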

preprint, 2020, arXiv

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.

preprint, 2020, arXiv

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation. In the $K$-armed bandit setting, we show that there are infinitely many variants of $\tt RandUCB$, all of which achieve the minimax-optimal $\widetilde{O}(\sqrt{K T})$ regret after $T$ rounds. Moreover, for a specific multi-armed bandit setting, we show that both UCB and TS can be recovered as special cases of $\tt RandUCB$. For structured bandits, where each arm is associated with a $d$-dimensional feature vector and rewards are distributed according to a linear or generalized linear model, we prove that $\tt RandUCB$ achieves the minimax-optimal $\widetilde{O}(d \sqrt{T})$ regret even in the case of infinitely many arms. Through experiments in both the multi-armed and structured bandit settings, we demonstrate that $\tt RandUCB$ matches or outperforms TS and other randomized exploration strategies. Our theoretical and empirical results together imply that $\tt RandUCB$ achieves the best of both worlds.
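
A rough sketch of the idea for Bernoulli arms follows: keep the UCB-style index (empirical mean plus a confidence width) but multiply the width by a random variable drawn once per round. The sampling distribution (Gaussian-shaped weights on a grid over `[0, upper]`), the width `sqrt(1 / n)`, and every parameter value here are assumptions for illustration, so this is not the exact algorithm or the constants analyzed in the paper.

```python
import math
import random


def rand_ucb(arm_means, rounds=2000, upper=2.0, num_points=20, sigma=0.5, seed=0):
    """Randomized-UCB sketch; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    # Discrete support for the random multiplier Z and its (assumed) weights.
    support = [upper * j / (num_points - 1) for j in range(num_points)]
    weights = [math.exp(-z * z / (2.0 * sigma ** 2)) for z in support]
    counts = [0] * k
    sums = [0.0] * k
    for t in range(rounds):
        if t < k:
            arm = t  # pull every arm once to initialize the estimates
        else:
            z = rng.choices(support, weights=weights)[0]  # one draw per round
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + z * math.sqrt(1.0 / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = rand_ucb([0.2, 0.8])
print(counts)  # the arm with mean 0.8 receives most of the pulls
```

Fixing the multiplier at a constant value gives back a deterministic UCB-style rule, which is one way to read the abstract's remark that UCB is a special case.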

preprint, 2020, arXiv

To Each Optimizer a Norm, To Each Norm its Generalization

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parameterized and over-parameterized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoning, we prove that for over-parameterized linear regression, projections onto linear spans can be used to move between different interpolating solutions. For under-parameterized linear classification, we prove that for any linear classifier separating the data, there exists a family of quadratic norms $\|\cdot\|_P$ such that the classifier's direction is the same as that of the maximum $P$-margin solution. For linear classification, we argue that analyzing convergence to the standard maximum $\ell_2$-margin is arbitrary and show that minimizing the norm induced by the data results in better generalization. Furthermore, for over-parameterized linear classification, projections onto the data-span enable us to use techniques from the under-parameterized setting. On the empirical side, we propose techniques to bias optimizers towards better generalizing solutions, improving their test performance. We validate our theoretical results via synthetic experiments, and use the neural tangent kernel to handle non-linear models.

preprint, 2016, arXiv

Adaptive Influence Maximization in Social Networks: Why Commit when You can Adapt?

Most previous work on influence maximization in social networks is limited to the non-adaptive setting in which the marketer is supposed to select all of the seed users, to give free samples or discounts to, up front. A disadvantage of this setting is that the marketer is forced to select all the seeds based solely on a diffusion model. If some of the selected seeds do not perform well, there is no opportunity to course-correct. A more practical setting is the adaptive setting in which the marketer initially selects a batch of users and observes how well seeding those users leads to a diffusion of product adoptions. Based on this market feedback, she formulates a policy for choosing the remaining seeds. In this paper, we study adaptive offline strategies for two problems: (a) MAXSPREAD -- given a budget on the number of seeds and a time horizon, maximize the spread of influence and (b) MINTSS -- given a time horizon and an expected number of target users to be influenced, minimize the number of seeds that will be required. In particular, we present theoretical bounds and empirical results for an adaptive strategy and quantify its practical benefit over the non-adaptive strategy. We evaluate adaptive and non-adaptive policies on three real data sets. We conclude that while the benefit of going adaptive for the MAXSPREAD problem is modest, adaptive policies lead to significant savings for the MINTSS problem.

preprint, 2016, arXiv

Influence Maximization with Bandits

We consider the problem of influence maximization, the problem of maximizing the number of people that become aware of a product by finding the "best" set of "seed" users to expose the product to. Most prior work on this topic assumes that we know the probability of each user influencing each other user, or we have data that lets us estimate these influences. However, this information is typically not initially available or is difficult to obtain. To avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm that estimates the influence probabilities as we sequentially try different seed sets. We establish bounds on the performance of this procedure under the existing edge-level feedback as well as a novel and more realistic node-level feedback. Beyond our theoretical results, we describe a practical implementation and experimentally demonstrate its efficiency and effectiveness on four real datasets.

preprint, 2014, arXiv

Modeling Non-Progressive Phenomena for Influence Propagation

Recent work on modeling influence propagation focuses on progressive models, i.e., once a node is influenced (active) the node stays in that state and cannot become inactive. However, this assumption is unrealistic in many settings where nodes can transition between active and inactive states. For instance, a user of a social network may stop using an app and become inactive, but again activate when instigated by a friend, or when the app adds a new feature or releases a new version. In this work, we study such non-progressive phenomena and propose an efficient model of influence propagation. Specifically, we model influence propagation as a continuous-time Markov process with 2 states: active and inactive. Such a model is highly scalable (we evaluated on graphs with over 2 million nodes), 17-20 times faster, and more accurate for estimating the spread of influence, as compared with state-of-the-art progressive models for several applications where nodes may switch states.
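
For intuition about a two-state continuous-time Markov process, consider a single node with constant activation and deactivation rates, a simplifying assumption made only for this sketch. The abstract does not specify the paper's rates, which may differ (for example, by depending on a node's neighbors). With activation rate $a$ and deactivation rate $b$, the probability of being active at time $t$ is $\pi + (p_0 - \pi)e^{-(a+b)t}$ with $\pi = a/(a+b)$. The function name `prob_active` is hypothetical.

```python
import math


def prob_active(t, activation_rate, deactivation_rate, p0=0.0):
    """Probability that a two-state CTMC is in the active state at time t."""
    total = activation_rate + deactivation_rate
    stationary = activation_rate / total  # long-run fraction of time active
    return stationary + (p0 - stationary) * math.exp(-total * t)


# A node that starts inactive, activates at rate 2 and deactivates at rate 1.
for t in (0.0, 0.5, 2.0, 50.0):
    print(t, prob_active(t, activation_rate=2.0, deactivation_rate=1.0))
```

Unlike a progressive model, the active probability here settles at $a/(a+b)$ rather than growing toward 1.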

preprint, 2013, arXiv

Fast 3D Salient Region Detection in Medical Images using GPUs

Automated detection of visually salient regions is an active area of research in computer vision. Salient regions can serve as inputs for object detectors as well as inputs for region-based registration algorithms. In this paper we consider the problem of speeding up computationally intensive bottom-up salient region detection in 3D medical volumes. The method uses the Kadir Brady formulation of saliency. We show that in the vicinity of a salient region, entropy is a monotonically increasing function of the degree of overlap of a candidate window with the salient region. This allows us to initialize a sparse seed-point grid as the set of tentative salient region centers and iteratively converge to the local entropy maxima, thereby reducing the computational complexity compared to the Kadir Brady approach of performing this computation at every point in the image. We propose two different approaches for achieving this. The first approach involves evaluating entropy in the four quadrants around the seed point and iteratively moving in the direction that increases entropy. The second approach makes use of the mean shift tracking framework to effect entropy-maximizing moves. Specifically, we propose the use of a uniform pmf as the target distribution to seek high-entropy regions. We demonstrate the use of our algorithm on medical volumes for left ventricle detection in PET images and tumor localization in brain MR sequences.
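
The quantity being maximized is the Shannon entropy of the intensity distribution inside a candidate window. The sketch below computes it for a flat list of discrete intensity values. The helper name `window_entropy` and the sample windows are assumptions, and the scale-selection part of the Kadir Brady formulation is not shown.

```python
import math
from collections import Counter


def window_entropy(values):
    """Shannon entropy (in bits) of the empirical intensity distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


flat_window = [7, 7, 7, 7]    # homogeneous region: no uncertainty
mixed_window = [0, 1, 2, 3]   # four equally likely intensities
print(window_entropy(flat_window), window_entropy(mixed_window))
```

A uniform distribution over the observed intensities gives the largest entropy, which fits the abstract's choice of a uniform pmf as the target when seeking high-entropy regions.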
