Source author record

András György

András György appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence math.OC Computation and Language Information Theory math.IT Systems and Control

Catalog footprint

What is connected

16works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Optimistic Meta-Gradients

We study the connection between gradient-based meta-learning and convex op-timisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for meta-learning in the single task setting. While a meta-learned update rule can yield faster convergence up to constant factor, it is not sufficient for acceleration. Instead, some form of optimism is required. We show that optimism in meta-learning can be captured through Bootstrapped Meta-Gradients (Flennerhag et al., 2022), providing deeper insight into its underlying mechanics.

preprint2022arXiv

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies. We propose a new method to compute a lower bound on the value of an arbitrary target policy given some logged data in contextual bandits for a desired coverage. The lower bound is built around the so-called Self-normalized Importance Weighting (SN) estimator. It combines the use of a semi-empirical Efron-Stein tail inequality to control the concentration and a new multiplicative (rather than additive) control of the bias. The new approach is evaluated on a number of synthetic and real datasets and is found to be superior to its main competitors, both in terms of tightness of the confidence intervals and the quality of the policies chosen.

preprint2022arXiv

Mutual Information Constraints for Monte-Carlo Objectives

A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless. Two contributing factors, the underspecification of the model and the looseness of the variational lower bound, have been studied separately in the literature. We weave these two strands of research together, specifically the tighter bounds of Monte-Carlo objectives and constraints on the mutual information between the observable and the latent variables. Estimating the mutual information as the average Kullback-Leibler divergence between the easily available variational posterior $q(z|x)$ and the prior does not work with Monte-Carlo objectives because $q(z|x)$ is no longer a direct approximation to the model's true posterior $p(z|x)$. Hence, we construct estimators of the Kullback-Leibler divergence of the true posterior from the prior by recycling samples used in the objective, with which we train models of continuous and discrete latents at much improved rate-distortion and no posterior collapse. While alleviated, the tradeoff between modelling the data and using the latents still remains, and we urge for evaluating inference methods across a range of mutual information values.

preprint2022arXiv

On the Role of Neural Collapse in Transfer Learning

We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. In this paper we provide an explanation for this behavior based on the recently observed phenomenon that the features learned by overparameterized classification networks show an interesting clustering property, called neural collapse. We demonstrate both theoretically and empirically that neural collapse generalizes to new samples from the training classes, and -- more importantly -- to new classes as well, allowing foundation models to provide feature maps that work well in transfer learning and, specifically, in the few-shot setting.

preprint2022arXiv

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^\star$ or (ii) the optimal action-value function $q^\star$ lie in the linear span of some features; or (iii) both $v^\star$ and $q^\star$ lie in the linear span when restricted to the states reachable from the starting state. Recently, Weisz et al. (2021b) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\min(d,H)$. On the other hand, for the setting (i), Weisz et al. (2021a) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question whether polynomial query complexity is possible when $A$ is subexponential in $\min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=Ω(\min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In particular, this implies a perhaps surprising exponential separation of query complexity compared to the work of Du et al. (2021) who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).

preprint2020arXiv

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients. Depending on the properties of the function to be optimized and the nature of ``noise'' in the bandit feedback, the bias and variance of gradient estimates exhibit various tradeoffs. In this paper we propose a novel framework that replaces the specific gradient estimation methods with an abstract oracle. With the help of the new framework we unify previous works, reproducing their results in a clean and concise fashion, while, perhaps more importantly, the framework also allows us to formally show that to achieve the optimal root-$n$ rate either the algorithms that use existing gradient estimators, or the proof techniques used to analyze them have to go beyond what exists today.

preprint2016arXiv

Chaining Bounds for Empirical Risk Minimization

This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded. The bound applies to many loss functions besides the squared loss, and scales only with the sub-Gaussian or subexponential parameters without further statistical assumptions such as the bounded kurtosis condition over the hypothesis class. A detailed analysis is provided for slope constrained and penalized linear least squares regression with a sub-Gaussian setting, which often proves tight sample complexity bounds up to logartihmic factors.

preprint2016arXiv

Max-affine estimators for convex stochastic programming

In this paper, we consider two sequential decision making problems with a convexity structure, namely an energy storage optimization task and a multi-product assembly example. We formulate these problems in the stochastic programming framework and discuss an approximate dynamic programming technique for their solutions. As the cost-to-go functions are convex in these cases, we use max-affine estimates for their approximations. To train such a max-affine estimate, we provide a new convex regression algorithm, and evaluate it empirically for these planning scenarios.

preprint2016arXiv

SDP Relaxation with Randomized Rounding for Energy Disaggregation

We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each appliance over time based on the total energy-consumption signal of a household. The current state of the art is to model the problem as inference in factorial HMMs, and use quadratic programming to find an approximate solution to the resulting quadratic integer program. Here we take a more principled approach, better suited to integer programming problems, and find an approximate optimum by combining convex semidefinite relaxations randomized rounding, as well as a scalable ADMM method that exploits the special structure of the resulting semidefinite program. Simulation results both in synthetic and real-world datasets demonstrate the superiority of our method.

preprint2015arXiv

Fast Cross-Validation for Incremental Learning

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing CV estimate is to run a learning algorithm separately for each CV fold, a computationally expensive process. In this paper, we propose a new approach to reduce the computational burden of CV-based performance estimation. As opposed to all previous attempts, which are specific to a particular learning model or problem domain, we propose a general method applicable to a large class of incremental learning algorithms, which are uniquely fitted to big data problems. In particular, our method applies to a wide range of supervised and unsupervised learning tasks with different performance criteria, as long as the base learning algorithm is incremental. We show that the running time of the algorithm scales logarithmically, rather than linearly, in the number of CV folds. Furthermore, the algorithm has favorable properties for parallel and distributed implementation. Experiments with state-of-the-art incremental learning algorithms confirm the practicality of the proposed method.

preprint2015arXiv

Online Learning with Gaussian Payoffs and Side Observations

We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair $(i,j)$ (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).

preprint2014arXiv

Adaptive Monte Carlo via Bandit Allocation

We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate. By reducing this task to a stochastic multi-armed bandit problem, we show that well developed allocation strategies can be used to achieve an MSE that approaches that of the best estimator chosen in retrospect. We then extend these developments to a scenario where alternative estimators have different, possibly stochastic costs. The outcome is a new set of adaptive Monte Carlo strategies that provide stronger guarantees than previous approaches while offering practical advantages.

preprint2014arXiv

Efficient Multi-Start Strategies for Local Search Algorithms

Local search algorithms applied to optimization problems often suffer from getting trapped in a local optimum. The common solution for this deficiency is to restart the algorithm when no progress is observed. Alternatively, one can start multiple instances of a local search algorithm, and allocate computational resources (in particular, processing time) to the instances depending on their behavior. Hence, a multi-start strategy has to decide (dynamically) when to allocate additional resources to a particular instance and when to start new instances. In this paper we propose multi-start strategies motivated by works on multi-armed bandit problems and Lipschitz optimization with an unknown constant. The strategies continuously estimate the potential performance of each algorithm instance by supposing a convergence rate of the local search algorithm up to an unknown constant, and in every phase allocate resources to those instances that could converge to the optimum for a particular range of the constant. Asymptotic bounds are given on the performance of the strategies. In particular, we prove that at most a quadratic increase in the number of times the target function is evaluated is needed to achieve the performance of a local search algorithm started from the attraction region of the optimum. Experiments are provided using SPSA (Simultaneous Perturbation Stochastic Approximation) and k-means as local search algorithms, and the results indicate that the proposed strategies work well in practice, and, in all cases studied, need only logarithmically more evaluations of the target function as opposed to the theoretically suggested quadratic increase.

preprint2013arXiv

A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning

We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels $d$ to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in $d$ are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group $p$-norm penalized empirical risk. The key to achieve the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show the surprising result that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows the implementation of sampling from this distribution to run in $O(\log(d))$ time, making the total computational cost of the method to achieve an $ε$-optimal solution to be $O(\log(d)/ε^2)$, thereby allowing our method to operate for very large values of $d$. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.

preprint2013arXiv

Online Learning under Delayed Feedback

Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret in a multiplicative way in adversarial problems, and in an additive way in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle the presence of delays in the feedback loop. Modifications of the well-known UCB algorithm are also developed for the bandit problem with delayed feedback, with the advantage over the meta-algorithms that they can be implemented with lower complexity.

preprint2012arXiv

Partition Tree Weighting

This paper introduces the Partition Tree Weighting technique, an efficient meta-algorithm for piecewise stationary sources. The technique works by performing Bayesian model averaging over a large class of possible partitions of the data into locally stationary segments. It uses a prior, closely related to the Context Tree Weighting technique of Willems, that is well suited to data compression applications. Our technique can be applied to any coding distribution at an additional time and space cost only logarithmic in the sequence length. We provide a competitive analysis of the redundancy of our method, and explore its application in a variety of settings. The order of the redundancy and the complexity of our algorithm matches those of the best competitors available in the literature, and the new algorithm exhibits a superior complexity-performance trade-off in our experiments.

András György

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Optimistic Meta-Gradients

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

Mutual Information Constraints for Monte-Carlo Objectives

On the Role of Neural Collapse in Transfer Learning

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

Chaining Bounds for Empirical Risk Minimization

Max-affine estimators for convex stochastic programming

SDP Relaxation with Randomized Rounding for Energy Disaggregation

Fast Cross-Validation for Incremental Learning

Online Learning with Gaussian Payoffs and Side Observations

Adaptive Monte Carlo via Bandit Allocation

Efficient Multi-Start Strategies for Local Search Algorithms

A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning

Online Learning under Delayed Feedback

Partition Tree Weighting