Source author record

Borjan Geshkovski

Borjan Geshkovski appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.AP, math.OC, Machine Learning, eess.SY, math.PR, Systems and Control

Catalog footprint

5 works, 6 topics, 4 close collaborators


Preprint, 2026, arXiv

Kinetic theory for Transformers and the lost-in-the-middle phenomenon

We study causal self-attention dynamics -- a toy model for decoder Transformers -- which we interpret as a non-exchangeable interacting particle system. Adapting cumulant expansions to the triangular causal dependency structure of the model, and appealing to non-hierarchical methods to estimate correlations using Glauber calculus, we prove a quantitative mean-field limit result and a next-order characterization of correlations. For iid uniformly distributed tokens, the limiting correlation equation can be solved in closed form and we obtain a rigorous explanation of the empirically observed \emph{lost-in-the-middle} phenomenon: the token retrieval profile, as a function of the source position in the prompt, is $\mathsf{U}$-shaped, with primacy, recency, and a unique interior minimum under an explicit smallness condition.

Preprint, 2022, arXiv

Control of the Stefan problem in a periodic box

In this paper we consider the one-phase Stefan problem with surface tension, set in a two-dimensional strip-like geometry, with periodic boundary conditions with respect to the horizontal direction $x_1\in\mathbb{T}$. We prove that the system is locally null-controllable in any positive time, by means of a control supported within an arbitrary open and non-empty subset. We proceed by a linear test and duality, but quickly find that the linearized system is not symmetric and the adjoint has a dynamic coupling between the two states through the (fixed) boundary. Hence, motivated by a Fourier decomposition with respect to $x_1$, we consider a family of one-dimensional systems and prove observability results which are uniform with respect to the Fourier frequency parameter. The latter results are also novel, as we compute the full spectrum of the underlying operator for the non-zero Fourier modes. The zeroth mode system, on the other hand, is seen as a controllability problem for the linear heat equation with a finite-dimensional constraint. The complete observability of the adjoint is derived by using a Lebeau-Robbiano strategy, and the local controllability of the nonlinear system is then shown by combining an adaptation of the source term method introduced in \cite{tucsnak_burgers} and a Banach fixed point argument. Numerical experiments motivate several challenging open problems, foraying even beyond the specific setting we deal with herein.

Preprint, 2022, arXiv

Sparsity in long-time control of neural ODEs

We consider the neural ODE and optimal control perspective of supervised learning, with $\ell^1$-control penalties, where rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon. We prove that any optimal control (for this cost) vanishes beyond some positive stopping time. When seen in the discrete-time context, this result entails an \emph{ordered} sparsity pattern for the parameters of the associated residual neural network: ordered in the sense that these parameters are all $0$ beyond a certain layer. Furthermore, we provide a polynomial stability estimate for the empirical risk with respect to the time horizon. This can be seen as a \emph{turnpike property}, for nonsmooth dynamics and functionals with $\ell^1$-penalties, and without any smallness assumptions on the data, both of which are new in the literature.
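The ordered sparsity pattern described above can be illustrated with a minimal sketch. It is not the paper's construction: the scalar dynamics $\dot{x} = w(t)\tanh(a(t)x + b(t))$, the forward-Euler step `h`, and the names `resnet_forward`, `active` and `inactive` are all assumptions chosen for illustration. The sketch only shows that once every parameter is zero beyond a certain layer, the remaining residual layers act as the identity.

```python
import math


def resnet_forward(x0, params, h):
    """Forward-Euler discretization of a scalar neural ODE (illustrative model).

    Each layer applies x <- x + h * w * tanh(a * x + b), where `params`
    is a list of hypothetical (w, a, b) tuples, one per layer.
    Returns the list of states [x_0, x_1, ..., x_K].
    """
    states = [x0]
    x = x0
    for w, a, b in params:
        x = x + h * w * math.tanh(a * x + b)
        states.append(x)
    return states


# Ordered sparsity: nonzero parameters in the first 3 layers, zeros afterwards.
active = [(1.0, 0.5, 0.1), (-0.8, 1.2, 0.0), (0.3, -0.7, 0.2)]
inactive = [(0.0, 0.0, 0.0)] * 5
states = resnet_forward(1.0, active + inactive, h=0.1)

# Once the parameters vanish, every later layer leaves the state unchanged.
frozen = states[len(active):]
print(all(s == frozen[0] for s in frozen))
```

In this toy setting the layers after the stopping index could be dropped without changing the output, which is one way to read the sparsity statement in discrete time.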

Preprint, 2022, arXiv

Turnpike in Lipschitz-nonlinear optimal control

We present a new proof of the turnpike property for nonlinear optimal control problems, when the running target is a steady control-state pair of the underlying system. Our strategy combines the construction of quasi-turnpike controls via controllability, and a bootstrap argument, and does not rely on analyzing the optimality system or linearization techniques. This in turn allows us to address several optimal control problems for finite-dimensional, control-affine systems with globally Lipschitz (possibly nonsmooth) nonlinearities, without any smallness conditions on the initial data or the running target. These results are motivated by applications in machine learning through deep residual neural networks, which may be fit within our setting. We show that our methodology is applicable to controlled PDEs as well, such as the semilinear wave and heat equation with a globally Lipschitz nonlinearity, once again without any smallness assumptions.

Preprint, 2022, arXiv

Turnpike in optimal control of PDEs, ResNets, and beyond

The \emph{turnpike property} in contemporary macroeconomics asserts that if an economic planner seeks to move an economy from one level of capital to another, then the most efficient path, as long as the planner has enough time, is to rapidly move stock to a level close to the optimal stationary or constant path, then allow for capital to develop along that path until the desired term is nearly reached, at which point the stock ought to be moved to the final target. Motivated in part by its nature as a resource allocation strategy, over the past decade, the turnpike property has also been shown to hold for several classes of partial differential equations arising in mechanics. When formalized mathematically, the turnpike theory corroborates the insights from economics: for an optimal control problem set in a finite-time horizon, optimal controls and corresponding states are close (often exponentially), during most of the time, except near the initial and final time, to the optimal control and corresponding state for the associated stationary optimal control problem. In particular, the former are mostly constant over time. This fact provides a rigorous meaning to the asymptotic simplification that some optimal control problems appear to enjoy over long time intervals, allowing the consideration of the corresponding stationary problem for computing and applications. We review a slice of the theory developed over the past decade (the controllability of the underlying system is an important ingredient, and can even be used to devise simple turnpike-like strategies which are nearly optimal), and present several novel applications, including, among many others, the characterization of Hamilton-Jacobi-Bellman asymptotics, and stability estimates in deep learning via residual neural networks.
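For orientation, the exponential closeness described in this abstract is commonly written as an estimate of the following shape. The notation is an assumption, not taken from the abstract: $(u_T, y_T)$ denotes the optimal control-state pair on $[0,T]$, $(\bar{u}, \bar{y})$ the optimal pair of the stationary problem, and the norms depend on the setting.

```latex
\| y_T(t) - \bar{y} \| + \| u_T(t) - \bar{u} \|
  \le C \left( e^{-\mu t} + e^{-\mu (T - t)} \right),
  \qquad t \in [0, T],
```

Here $C > 0$ and $\mu > 0$ are constants independent of $T$. The right-hand side is small except near $t = 0$ and $t = T$, which matches the description of trajectories that stay near the stationary optimum during most of the time horizon.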
