Source author record

Rasul Tutunov

Rasul Tutunov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning; math.OC; Distributed, Parallel, and Cluster Computing; Artificial Intelligence; math.PR; Multiagent Systems; physics.soc-ph; quant-ph; Social and Information Networks

Catalog footprint

What is connected

14 works

9 topics

4 close collaborators


preprint, 2026, arXiv

The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

A recurring pattern in "reasoning without training" is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at inference time. Power sampling provides a principled way to bias decoding toward such modes by targeting p_theta(x)^alpha with alpha > 1, but practical approximations must account for future-dependent correction factors that determine which prefixes remain promising. We introduce Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm for approximating the sequence-level power target with a bounded population of partial solutions. APPS propagates hypotheses in parallel using proposal-corrected power reweighting and refines their survival through future-value-guided selection at resampling boundaries. This redistributes finite compute across competing prefixes rather than committing to a single unfolding path, while providing a direct scaling knob in the particle count and predictable peak memory. We instantiate the future-value signal with short-horizon rollouts and also study an amortized variant that replaces rollouts with a lightweight learned selection head. Across reasoning benchmarks, APPS improves the accuracy-runtime trade-off of training-free decoding and suggests that part of the gap to post-trained systems can be recovered through more faithful inference-time power approximation.
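As a toy illustration of the power target $p_θ(x)^α$ mentioned in this abstract, the sketch below sharpens a small categorical distribution and approximates the sharpened target by sampling from the base distribution, weighting each draw by $p^{α-1}$, and resampling. It is not APPS: the function names, the three-outcome distribution, and the single resampling step are assumptions made for illustration, and the sketch has no blocks, rollouts, or future-value-guided selection.

```python
import random


def power_target(probs, alpha):
    """Return the normalised distribution proportional to probs ** alpha."""
    powered = [p ** alpha for p in probs]
    total = sum(powered)
    return [w / total for w in powered]


def resample_with_power_weights(probs, alpha, n_particles, rng):
    """Approximate the power target with one importance-resampling step.

    Particles are proposed from the base distribution `probs`. The
    importance weight of outcome i is p_i ** alpha / p_i = p_i ** (alpha - 1).
    Resampling in proportion to these weights yields draws that
    approximately follow the power target (hypothetical toy setup).
    """
    outcomes = list(range(len(probs)))
    particles = rng.choices(outcomes, weights=probs, k=n_particles)
    weights = [probs[i] ** (alpha - 1.0) for i in particles]
    return rng.choices(particles, weights=weights, k=n_particles)


if __name__ == "__main__":
    base = [0.5, 0.3, 0.2]
    exact = power_target(base, alpha=2.0)
    draws = resample_with_power_weights(base, 2.0, 20000, random.Random(0))
    empirical = [draws.count(i) / len(draws) for i in range(len(base))]
    print("exact power target:", exact)
    print("particle estimate: ", empirical)
```

With alpha > 1 the most probable outcome gains mass relative to the base distribution, which is the mode-seeking bias the abstract refers to. The particle count plays the role of the compute knob: more particles give a closer estimate.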

preprint, 2022, arXiv

HEBO: Pushing The Limits of Sample-Efficient Hyperparameter Optimisation

In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO's empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyperparameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multi-objective acquisition ensembles with Pareto front solutions improve queried configurations, and robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation. All code is made available at https://github.com/huawei-noah/HEBO.

preprint, 2022, arXiv

Sample-Efficient Optimisation with Probabilistic Transformer Surrogates

Faced with problems of increasing complexity, recent research in Bayesian Optimisation (BO) has focused on adapting deep probabilistic models as flexible alternatives to Gaussian Processes (GPs). In a similar vein, this paper investigates the feasibility of employing state-of-the-art probabilistic transformers in BO. Upon further investigation, we observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation. First, we notice that these models are trained on uniformly distributed inputs, which impairs predictive accuracy on non-uniform data - a setting arising from any typical BO loop due to exploration-exploitation trade-offs. Second, we realise that training losses (e.g., cross-entropy) only asymptotically guarantee accurate posterior approximations, i.e., after arriving at the global optimum, which generally cannot be ensured. At the stationary points of the loss function, however, we observe a degradation in predictive performance especially in exploratory regions of the input space. To tackle these shortcomings we introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance. In a large panel of experiments, we demonstrate, for the first time, that one transformer pre-trained on data sampled from random GP priors produces competitive results on 16 benchmark black-boxes compared to GP-based BO. Since our model is only pre-trained once and used in all tasks without any retraining and/or fine-tuning, we report an order of magnitude time-reduction, while matching and sometimes outperforming GPs.

preprint, 2022, arXiv

Self-consistent Gradient-like Eigen Decomposition in Solving Schrödinger Equations

The Schrödinger equation is at the heart of modern quantum mechanics. Since exact solutions of the ground state are typically intractable, standard approaches approximate the Schrödinger equation as forms of nonlinear generalized eigenvalue problems $F(V)V = SVΛ$ in which $F(V)$, the matrix to be decomposed, is a function of its own top-$k$ smallest eigenvectors $V$, leading to a "self-consistency problem". Traditional iterative methods rely heavily on high-quality initial guesses of $V$ generated via domain-specific heuristic methods based on quantum mechanics. In this work, we eliminate the need for such domain-specific heuristics by presenting a novel framework, Self-consistent Gradient-like Eigen Decomposition (SCGLED), that regards $F(V)$ as a special "online data generator", thus allowing gradient-like eigendecomposition methods in streaming $k$-PCA to approach the self-consistency of the equation from scratch in an iterative way similar to online learning. With several critical numerical improvements, SCGLED is robust to initial guesses, free of quantum-mechanics-based heuristic designs, and neat in implementation. Our experiments show that it not only can simply replace traditional heuristics-based initial guess methods with a large performance advantage (on average 25x more precise than the best baseline in similar wall time), but is also capable of finding highly precise solutions independently without any traditional iterative methods.

preprint, 2021, arXiv

Efficient Semi-Implicit Variational Inference

In this paper, we propose CI-VI, an efficient and scalable solver for semi-implicit variational inference (SIVI). Our method first maps SIVI's evidence lower bound (ELBO) to a form involving a nonlinear functional nesting of expected values and then develops a rigorous optimiser capable of correctly handling bias inherent to nonlinear nested expectations using an extrapolation-smoothing mechanism coupled with gradient sketching. Our theoretical results demonstrate convergence to a stationary point of the ELBO in general non-convex settings typically arising when using deep network models and an order of $O(t^{-\frac{4}{5}})$ gradient-bias-vanishing rate. We believe these results generalise beyond the specific nesting arising from SIVI to other forms. Finally, in a set of experiments, we demonstrate the effectiveness of our algorithm in approximating complex posteriors on various data-sets including those from natural language processing.

preprint, 2020, arXiv

$α^α$-Rank: Practically Scaling $α$-Rank through Stochastic Optimisation

Recently, $α$-Rank, a graph-based algorithm, has been proposed as a solution to ranking joint policy profiles in large scale multi-agent systems. $α$-Rank claimed tractability through a polynomial time implementation with respect to the total number of pure strategy profiles. Here, we note that inputs to the algorithm were not clearly specified in the original presentation; as such, we deem complexity claims as not grounded, and conjecture solving $α$-Rank is NP-hard. The authors of $α$-Rank suggested that the input to $α$-Rank can be an exponentially-sized payoff matrix; a claim promised to be clarified in subsequent manuscripts. Even though $α$-Rank exhibits a polynomial-time solution with respect to such an input, we further identify additional critical problems. We demonstrate that due to the need to construct an exponentially large Markov chain, $α$-Rank is infeasible beyond a small finite number of agents. We ground these claims by adopting the amount of dollars spent as a non-refutable evaluation metric. Realising this scalability issue, we present a stochastic implementation of $α$-Rank with a double oracle mechanism allowing for reductions in joint strategy spaces. Our method, $α^α$-Rank, does not need to save an exponentially large transition matrix, and can terminate early under required precision. Although theoretically our method exhibits similar worst-case complexity guarantees compared to $α$-Rank, it allows us, for the first time, to practically conduct large-scale multi-agent evaluations. On $10^4 \times 10^4$ random matrices, we achieve a $1000\times$ speed-up. Furthermore, we also show successful results on large joint strategy profiles with a maximum size in the order of $\mathcal{O}(2^{25})$ ($\approx 33$ million joint strategies) -- a setting not evaluable using $α$-Rank with reasonable computational budget.

preprint, 2020, arXiv

Compositional ADAM: An Adaptive Compositional Solver

In this paper, we present C-ADAM, the first adaptive solver for compositional problems involving a non-linear functional nesting of expected values. We prove that C-ADAM converges to a stationary point in $\mathcal{O}(δ^{-2.25})$ with $δ$ being a precision parameter. Moreover, we demonstrate the importance of our results by bridging, for the first time, model-agnostic meta-learning (MAML) and compositional optimisation showing fastest known rates for deep network adaptation to-date. Finally, we validate our findings in a set of experiments from portfolio optimisation and meta-learning. Our results manifest significant sample complexity reductions compared to both standard and compositional solvers.
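For readers unfamiliar with the phrase "non-linear functional nesting of expected values", the generic form usually meant in compositional optimisation can be written as below. The symbols $x$, $f_ν$, $g_ω$ and $J$ are assumptions chosen for illustration and are not taken from the abstract.

```latex
% Generic compositional objective: an outer expectation of a nonlinear
% function applied to an inner expectation (notation assumed).
\min_{x \in \mathbb{R}^d} \; J(x)
  = \mathbb{E}_{\nu}\!\left[ f_{\nu}\!\left( \mathbb{E}_{\omega}\!\left[ g_{\omega}(x) \right] \right) \right]

% Chain rule: Jacobian of the inner map, transposed, times the outer gradient.
\nabla J(x)
  = \left( \mathbb{E}_{\omega}\!\left[ \nabla g_{\omega}(x) \right] \right)^{\top}
    \mathbb{E}_{\nu}\!\left[ \nabla f_{\nu}\!\left( \mathbb{E}_{\omega}\!\left[ g_{\omega}(x) \right] \right) \right]
```

Because $f_ν$ is nonlinear, replacing the inner expectation with a single sample $g_ω(x)$ generally gives a biased gradient estimate: $\mathbb{E}_ω[\nabla f_ν(g_ω(x))]$ is not equal to $\nabla f_ν(\mathbb{E}_ω[g_ω(x)])$ in general. This bias is why such problems need dedicated solvers.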

preprint, 2016, arXiv

A Distributed Newton Method for Large Scale Consensus Optimization

In this paper, we propose a distributed Newton method for consensus optimization. Our approach outperforms state-of-the-art methods, including ADMM. The key idea is to exploit the sparsity of the dual Hessian and recast the computation of the Newton step as one of efficiently solving symmetric diagonally dominant linear equations. We validate our algorithm both theoretically and empirically. On the theory side, we demonstrate that our algorithm exhibits superlinear convergence within a neighborhood of optimality. Empirically, we show the superiority of this new method on a variety of machine learning problems. The proposed approach is scalable to very large problems and has a low communication overhead.

preprint, 2016, arXiv

An Exact Distributed Newton Method for Reinforcement Learning

In this paper, we propose a distributed second-order method for reinforcement learning. Our approach is the fastest in the literature so far, as it outperforms state-of-the-art methods, including ADMM, by significant margins. We achieve this by exploiting the sparsity pattern of the dual Hessian and transforming the problem of computing the Newton direction to one of solving a sequence of symmetric diagonally dominant systems of equations. We validate the above claim both theoretically and empirically. On the theoretical side, we prove that similar to exact Newton, our algorithm exhibits super-linear convergence within a neighborhood of the optimal solution. Empirically, we demonstrate the superiority of this new method on a set of benchmark reinforcement learning tasks.

preprint, 2015, arXiv

A Fast Distributed Solver for Symmetric Diagonally Dominant Linear Equations

In this paper, we propose a fast distributed solver for linear equations given by symmetric diagonally dominant M-Matrices. Our approach is based on a distributed implementation of the parallel solver of Spielman and Peng by considering a specific approximated inverse chain which can be computed efficiently in a distributed fashion. Representing the system of equations by a graph $\mathbb{G}$, the proposed distributed algorithm is capable of attaining $ε$-close solutions (for arbitrary $ε$) in time proportional to $n^{3}$ (number of nodes in $\mathbb{G}$), $α$ (upper bound on the size of the R-Hop neighborhood), and $\frac{{W}_{max}}{{W}_{min}}$ (maximum and minimum weight of edges in $\mathbb{G}$).
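To make the problem class concrete, the sketch below solves a small symmetric, strictly diagonally dominant system with non-positive off-diagonal entries using plain Jacobi iteration. This is a swapped-in textbook method, not the Spielman and Peng inverse-chain solver or the distributed algorithm from the abstract. It is shown only because each Jacobi update for row i reads just the entries of x belonging to that row's graph neighbours, which hints at why such systems suit distributed computation. The matrix, right-hand side, and function name are assumptions.

```python
def jacobi_solve(A, b, iterations=200):
    """Solve A x = b by Jacobi iteration.

    Converges when A is strictly diagonally dominant. Each component
    update uses only the nonzero off-diagonal entries of its own row,
    i.e. the values held by that node's neighbours in the graph of A.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Build the new iterate from the previous one (simultaneous update).
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    # Hypothetical 3-node path graph: symmetric, diagonally dominant,
    # non-positive off-diagonals.
    A = [
        [3.0, -1.0, 0.0],
        [-1.0, 3.0, -1.0],
        [0.0, -1.0, 3.0],
    ]
    b = [2.0, 1.0, 2.0]
    print(jacobi_solve(A, b))
```

For this example the exact solution is x = (1, 1, 1), which can be checked by substituting into each row.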

preprint, 2015, arXiv

Distributed SDDM Solvers: Theory & Applications

In this paper, we propose distributed solvers for systems of linear equations given by symmetric diagonally dominant M-matrices based on the parallel solver of Spielman and Peng. We propose two versions of the solvers, where in the first, full communication in the network is required, while in the second communication is restricted to the R-Hop neighborhood between nodes for some $R \geq 1$. We rigorously analyze the convergence and convergence rates of our solvers, showing that our methods are capable of outperforming state-of-the-art techniques. Having developed such solvers, we then contribute by proposing an accurate distributed Newton method for network flow optimization. Exploiting the sparsity pattern of the dual Hessian, we propose a Newton method for network flow optimization that is both faster and more accurate than state-of-the-art techniques. Our method utilizes the distributed SDDM solvers for determining the Newton direction up to any arbitrary precision $ε>0$. We analyze the properties of our algorithm and show superlinear convergence within a neighborhood of the optimal. Finally, in a set of experiments conducted on randomly generated and barbell networks, we demonstrate that our approach is capable of significantly outperforming state-of-the-art techniques.

preprint, 2015, arXiv

Fast, Accurate Second Order Methods for Network Optimization

Dual descent methods are commonly used to solve network flow optimization problems, since their implementation can be distributed over the network. These algorithms, however, often exhibit slow convergence rates. Approximate Newton methods which compute descent directions locally have been proposed as alternatives to accelerate the convergence rates of conventional dual descent. The effectiveness of these methods is limited by the accuracy of such approximations. In this paper, we propose an efficient and accurate distributed second order method for network flow problems. The proposed approach utilizes the sparsity pattern of the dual Hessian to approximate the Newton direction using a novel distributed solver for symmetric diagonally dominant linear equations. Our solver is based on a distributed implementation of a recent parallel solver of Spielman and Peng (2014). We analyze the properties of the proposed algorithm and show that, similar to conventional Newton methods, superlinear convergence within a neighborhood of the optimal value is attained. We finally demonstrate the effectiveness of the approach in a set of experiments on randomly generated networks.

preprint, 2015, arXiv

Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret

Lifelong reinforcement learning provides a promising framework for developing versatile agents that can accumulate knowledge over a lifetime of experience and rapidly learn new tasks by building upon prior knowledge. However, current lifelong learning methods exhibit non-vanishing regret as the amount of experience increases and include limitations that can lead to suboptimal or unsafe control policies. To address these issues, we develop a lifelong policy gradient learner that operates in an adversarial setting to learn multiple tasks online while enforcing safety constraints on the learned policies. We demonstrate, for the first time, sublinear regret for lifelong policy search, and validate our algorithm on several benchmark dynamical systems and an application to quadrotor control.

preprint, 2014, arXiv

On the Degree Distribution of Pólya Urn Graph Processes

This paper presents a tighter bound on the degree distribution of arbitrary Pólya urn graph processes, proving that the proportion of vertices with degree $d$ obeys a power-law distribution $P(d) \propto d^{-γ}$ for $d \leq n^{\frac{1}{6}-ε}$ for any $ε> 0$, where $n$ represents the number of vertices in the network. Previous work by Bollobás et al. formalized the well-known preferential attachment model of Barabási and Albert, and showed that the power-law distribution held for $d \leq n^{\frac{1}{15}}$ with $γ= 3$. Our revised bound represents a significant improvement over existing models of degree distribution in scale-free networks, where its tightness is restricted by the Azuma-Hoeffding concentration inequality for martingales. We achieve this tighter bound through a careful analysis of the first set of vertices in the network generation process, and show that the newly acquired bound is at the edge of exhausting the Bollobás model in the sense that the degree expectation breaks down for other powers.
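The preferential attachment mechanism behind the power-law claim can be sketched with a short simulation. The version below is a simplified tree variant (one edge per new vertex, no self-loops), so it is not the exact Bollobás et al. model or the Pólya urn process analysed in the paper. The function name and the two-vertex seed graph are assumptions.

```python
import random
from collections import Counter


def preferential_attachment_degrees(n_vertices, rng):
    """Grow a tree by preferential attachment and return all vertex degrees.

    `endpoints` lists every edge endpoint, so a vertex of degree k appears
    k times. Picking a uniform element of this list therefore selects an
    existing vertex with probability proportional to its degree.
    """
    endpoints = [0, 1]  # seed graph: a single edge between vertices 0 and 1
    for new_vertex in range(2, n_vertices):
        target = rng.choice(endpoints)
        endpoints.extend([new_vertex, target])
    degree = Counter(endpoints)
    return [degree[v] for v in range(n_vertices)]


if __name__ == "__main__":
    degrees = preferential_attachment_degrees(10000, random.Random(0))
    histogram = Counter(degrees)
    for d in sorted(histogram)[:8]:
        # Fraction of vertices with degree d; a heavy tail is expected.
        print(d, histogram[d] / len(degrees))
```

Plotting the printed fractions against d on log-log axes is the usual way to inspect the tail. The abstract's result concerns how far into the tail (up to which power of n) the power law provably holds.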
