Researcher profile

Pontus Giselsson

Pontus Giselsson contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
16 works
0 followers
8 topics
4 close collaborators

Published work

16 published items

preprint, 2026, arXiv

HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling

Visual Autoregressive (VAR) models have recently demonstrated impressive image generation quality while maintaining low latency. However, they suffer from severe KV-cache memory constraints, often requiring gigabytes of memory per generated image. We introduce HeatKV, a novel compression method that adapts cache allocation in each head based on its attention to previously generated scales. Using a small offline calibration set, the attention heads are ranked according to their attention scores over prior scales. Based on this ranking, we construct a static pruning schedule tailored to a given memory budget. Applied to the Infinity-2B model, HeatKV achieves $2 \times$ higher compression ratio in memory allocation for KV cache compared to existing methods, while maintaining similar or better image fidelity, prompt alignment and human perception score. Our method achieves a new state-of-the-art (SOTA) for VAR model KV-cache compression, showcasing the effectiveness of fine-grained, head-specific cache allocation.

preprint, 2022, arXiv

DWIFOB: A Dynamically Weighted Inertial Forward-Backward Algorithm for Monotone Inclusions

We propose a novel dynamically weighted inertial forward-backward algorithm (DWIFOB) for solving structured monotone inclusion problems. The scheme exploits the globally convergent forward-backward algorithm with deviations in [26] as the basis and combines it with the extrapolation technique used in Anderson acceleration to improve local convergence. We also present a globally convergent primal-dual variant of DWIFOB and numerically compare its performance to the primal-dual method of Chambolle-Pock and a Tikhonov regularized version of Anderson acceleration applied to the same problem. In all our numerical evaluations, the primal-dual variant of DWIFOB outperforms the Chambolle-Pock algorithm. Moreover, our numerical experiments suggest that our proposed method is much more robust than the regularized Anderson acceleration, which can fail to converge and be sensitive to algorithm parameters. These numerical experiments highlight that our method performs very well while still being robust and reliable.

preprint, 2022, arXiv

Forward--Backward Splitting with Deviations for Monotone Inclusions

We propose and study a weakly convergent variant of the forward--backward algorithm for solving structured monotone inclusion problems. Our algorithm features a per-iteration deviation vector which provides additional degrees of freedom. The only requirement on the deviation vector to guarantee convergence is that its norm is bounded by a quantity that can be computed online. This approach provides great flexibility and opens up for the design of new and improved forward--backward-based algorithms, while retaining global convergence guarantees. These guarantees include linear convergence of our method under a metric subregularity assumption without the need to adapt the algorithm parameters. Choosing suitable monotone operators allows for incorporating deviations into other algorithms, such as Chambolle--Pock and Krasnoselsky--Mann iterations. We propose a novel inertial primal--dual algorithm by selecting the deviations along a momentum direction and deciding their size using the norm condition. Numerical experiments demonstrate our convergence claims and show that even this simple choice of deviation vector can improve the performance, compared, e.g., to the standard Chambolle--Pock algorithm.

preprint, 2021, arXiv

Nonlinear Forward-Backward Splitting with Projection Correction

We propose and analyze a versatile and general algorithm called nonlinear forward-backward splitting (NOFOB). The algorithm consists of two steps: first, an evaluation of a nonlinear forward-backward map, followed by a relaxed projection onto the separating hyperplane it constructs. The key to the method is the nonlinearity in the forward-backward step, where the backward part is based on a nonlinear resolvent construction that allows for the kernel in the resolvent to be a nonlinear single-valued maximal monotone operator. This generalizes the standard resolvent as well as the Bregman resolvent, whose resolvent kernels are gradients of convex functions. This construction opens up a new understanding of many existing operator splitting methods and paves the way for devising new algorithms. In particular, we present a four-operator splitting method as a special case of NOFOB that relies on nonlinearity and nonsymmetry in the forward-backward kernel. We show that forward-backward-forward splitting (FBF), forward-backward-half-forward splitting (FBHF), asymmetric forward-backward-adjoint splitting (AFBA) with its many special cases, as well as synchronous projective splitting are special cases of the four-operator splitting method and hence of NOFOB. We also show that standard formulations of FB(H)F use smaller relaxations in the projections than allowed in NOFOB. Besides proving convergence for NOFOB, we show linear convergence under a metric subregularity assumption, which in a unified manner shows (in some cases new) linear convergence results for its special cases.

preprint, 2021, arXiv

Two Applications of Deep Learning in the Physical Layer of Communication Systems

Deep learning has proved itself to be a powerful tool to develop data-driven signal processing algorithms for challenging engineering problems. By learning the key features and characteristics of the input signals, instead of requiring a human to first identify and model them, learned algorithms can beat many man-made algorithms. In particular, deep neural networks are capable of learning the complicated features in nature-made signals, such as photos and audio recordings, and use them for classification and decision making. The situation is rather different in communication systems, where the information signals are man-made, the propagation channels are relatively easy to model, and we know how to operate close to the Shannon capacity limits. Does this mean that there is no role for deep learning in the development of future communication systems?

preprint, 2020, arXiv

Operator Splitting Performance Estimation: Tight contraction factors and optimal parameter selection

We propose a methodology for studying the performance of common splitting methods through semidefinite programming. We prove tightness of the methodology and demonstrate its value by presenting two applications of it. First, we use the methodology as a tool for computer-assisted proofs to prove tight analytical contraction factors for Douglas--Rachford splitting that are likely too complicated for a human to find bare-handed. Second, we use the methodology as an algorithmic tool to computationally select the optimal splitting method parameters by solving a series of semidefinite programs.

preprint, 2019, arXiv

On compositions of special cases of Lipschitz continuous operators

Many iterative optimization algorithms involve compositions of special cases of Lipschitz continuous operators, namely firmly nonexpansive, averaged and nonexpansive operators. The structure and properties of the compositions are of particular importance in the proofs of convergence of such algorithms. In this paper, we systematically study the compositions of further special cases of Lipschitz continuous operators. Applications of our results include compositions of scaled conically nonexpansive mappings, as well as the Douglas--Rachford and forward-backward operators, when applied to solve certain structured monotone inclusion and optimization problems. Several examples illustrate and tighten our conclusions.

preprint, 2018, arXiv

Efficient Proximal Mapping Computation for Unitarily Invariant Low-Rank Inducing Norms

Low-rank inducing unitarily invariant norms have been introduced to convexify problems with low-rank/sparsity constraints. They are the convex envelope of a unitarily invariant norm and the indicator function of an upper bounding rank constraint. The most well-known member of this family is the so-called nuclear norm. To solve optimization problems involving such norms with proximal splitting methods, efficient ways of evaluating the proximal mapping of the low-rank inducing norms are needed. This is known for the nuclear norm, but not for most other members of the low-rank inducing family. This work supplies a framework that reduces the proximal mapping evaluation to a nested binary search, in which each iteration requires the solution of a much simpler problem. This simpler problem can often be solved analytically, as is demonstrated for the so-called low-rank inducing Frobenius and spectral norms. Moreover, the framework allows computing the proximal mapping of compositions of these norms with increasing convex functions and the projections onto their epigraphs. This has the additional advantage that we can also deal with compositions of increasing convex functions and low-rank inducing norms in proximal splitting methods.

preprint, 2016, arXiv

Line Search for Averaged Operator Iteration

Many popular first order algorithms for convex optimization, such as forward-backward splitting, Douglas-Rachford splitting, and the alternating direction method of multipliers (ADMM), can be formulated as averaged iteration of a nonexpansive mapping. In this paper we propose a line search for averaged iteration that preserves the theoretical convergence guarantee, while often accelerating practical convergence. We discuss several general cases in which the additional computational cost of the line search is modest compared to the savings obtained.
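
As an illustration of the averaged iteration this abstract refers to, the sketch below runs x_{k+1} = (1 - alpha) x_k + alpha T(x_k) for a nonexpansive map T. The choice of T (a 90-degree rotation of the plane, whose only fixed point is the origin), the function names, and the value alpha = 0.5 are assumptions made for this example. The line search proposed in the paper is not reproduced here.

```python
import math


def rotate90(x):
    """Nonexpansive map T: rotate a 2-D point by 90 degrees (fixed point: origin)."""
    return (-x[1], x[0])


def averaged_iteration(T, x0, alpha=0.5, iters=60):
    """Run x <- (1 - alpha) * x + alpha * T(x) for a fixed number of steps."""
    x = x0
    for _ in range(iters):
        tx = T(x)
        x = tuple((1 - alpha) * xi + alpha * ti for xi, ti in zip(x, tx))
    return x


def norm(x):
    """Euclidean norm of a 2-D point."""
    return math.hypot(x[0], x[1])
```

With alpha = 1 the iteration reduces to repeated application of T and only cycles around the unit circle, whereas alpha = 0.5 shrinks the norm by a factor of 1/sqrt(2) per step for this particular T.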

preprint, 2016, arXiv

Line Search For Generalized Alternating Projections

This paper is about line search for the generalized alternating projections (GAP) method. This method is a generalization of the von Neumann alternating projections method, where instead of performing alternating projections, relaxed projections are alternated. The method can be interpreted as an averaged iteration of a nonexpansive mapping. Therefore, a recently proposed line search method for such algorithms is applicable to GAP. We evaluate this line search and show situations when the line search can be performed with little additional cost. We also present a variation of the basic line search for GAP - the projected line search. We prove its convergence and show that the line search condition is convex in the step length parameter. We show that almost all convex optimization problems can be solved using this approach and numerical results show superior performance with both the standard and the projected line search, sometimes by several orders of magnitude, compared to the nominal method.
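
A minimal sketch of alternating relaxed projections follows, assuming the common formulation x_{k+1} = (1 - alpha) x_k + alpha P2^{a2} P1^{a1} x_k, where a relaxed projection is P^{a} = (1 - a) I + a P. The two sets (the horizontal axis and the diagonal line y = x in the plane), the parameter values, and the function names are illustrative assumptions. The line searches studied in the paper are not included.

```python
import math


def project_axis(p):
    """Projection onto the set C1 = {(x, y) : y = 0}."""
    return (p[0], 0.0)


def project_diagonal(p):
    """Projection onto the set C2 = {(x, y) : y = x}."""
    m = 0.5 * (p[0] + p[1])
    return (m, m)


def relax(project, p, a):
    """Relaxed projection (1 - a) * p + a * project(p)."""
    q = project(p)
    return tuple((1 - a) * pi + a * qi for pi, qi in zip(p, q))


def gap(p0, a1=1.0, a2=1.0, alpha=1.0, iters=100):
    """Alternate relaxed projections onto C1 and C2, then average with p."""
    p = p0
    for _ in range(iters):
        t = relax(project_diagonal, relax(project_axis, p, a1), a2)
        p = tuple((1 - alpha) * pi + alpha * ti for pi, ti in zip(p, t))
    return p
```

With a1 = a2 = alpha = 1 this is the plain von Neumann alternating projections method. For these two lines the iterates converge to their intersection, the origin.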

preprint, 2016, arXiv

Linear Convergence and Metric Selection for Douglas-Rachford Splitting and ADMM

Recently, several convergence rate results for Douglas-Rachford splitting and the alternating direction method of multipliers (ADMM) have been presented in the literature. In this paper, we show global linear convergence rate bounds for Douglas-Rachford splitting and ADMM under strong convexity and smoothness assumptions. We further show that the rate bounds are tight for the class of problems under consideration for all feasible algorithm parameters. For problems that satisfy the assumptions, we show how to select step-size and metric for the algorithm that optimize the derived convergence rate bounds. For problems with a similar structure that do not satisfy the assumptions, we present heuristic step-size and metric selection methods.

preprint, 2015, arXiv

Tight Linear Convergence Rate Bounds for Douglas-Rachford Splitting and ADMM

Douglas-Rachford splitting and the alternating direction method of multipliers (ADMM) can be used to solve convex optimization problems that consist of a sum of two functions. Convergence rate estimates for these algorithms have received much attention lately. In particular, linear convergence rates have been shown by several authors under various assumptions. One such set of assumptions is strong convexity and smoothness of one of the functions in the minimization problem. The authors recently provided a linear convergence rate bound for such problems. In this paper, we show that this rate bound is tight for many algorithm parameter choices.
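
For concreteness, here is a minimal sketch of the Douglas-Rachford iteration for minimizing a sum f(x) + g(x) in one dimension. The problem instance (f(x) = 0.5 (x - 3)^2 and g the indicator function of the interval [0, 1], so that the minimizer is x = 1), the step size gamma = 1, and all function names are assumptions chosen for illustration. They are not taken from the paper.

```python
def prox_f(z, gamma):
    """Proximal operator of f(x) = 0.5 * (x - 3)**2 with step size gamma."""
    return (z + 3.0 * gamma) / (1.0 + gamma)


def prox_g(z):
    """Proximal operator of the indicator of [0, 1], i.e. clipping to [0, 1]."""
    return min(1.0, max(0.0, z))


def douglas_rachford(z0=0.0, gamma=1.0, iters=100):
    """Iterate x = prox_f(z), y = prox_g(2x - z), z <- z + y - x."""
    z = z0
    x = prox_f(z, gamma)
    y = prox_g(2.0 * x - z)
    for _ in range(iters):
        x = prox_f(z, gamma)
        y = prox_g(2.0 * x - z)
        z = z + y - x
    return x, y
```

For this instance the update on z is affine while the clipping stays active, z <- z/2 - 1/2, so the error halves at every step. This is an example of the linear convergence that the abstract discusses, though the rate here is specific to the toy problem.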

preprint, 2014, arXiv

Improving Fast Dual Ascent for MPC - Part I: The Distributed Case

In dual decomposition, the dual to an optimization problem with a specific structure is solved in a distributed fashion using (sub)gradient and, more recently, fast gradient methods. Traditional dual decomposition suffers from two main shortcomings. The first is that the convergence is often slow, although fast gradient methods have significantly improved the situation. The second is that computation of the optimal step-size requires centralized computations, which hinders a fully distributed implementation of the algorithm. In this paper, the first issue is addressed by providing a tighter characterization of the dual function than what has previously been reported in the literature. Then a distributed and a parallel algorithm are presented in which the provided dual function approximation is minimized in each step. Since the approximation is more accurate than the approximation used in standard and fast dual decomposition, the convergence properties are improved. For the second issue, we extend a recent result to allow for a fully distributed parameter selection in the algorithm. Further, we show how to apply the proposed algorithms to optimization problems arising in distributed model predictive control (DMPC) and show that the proposed distributed algorithm enjoys distributed reconfiguration, i.e. plug-and-play, in the DMPC context.

preprint, 2014, arXiv

Improving Fast Dual Ascent for MPC - Part II: The Embedded Case

Recently, several authors have suggested the use of first order methods, such as fast dual ascent and the alternating direction method of multipliers, for embedded model predictive control. The main reason is that they can be implemented using simple arithmetic operations only. However, a known limitation of gradient-based methods is that they are sensitive to ill-conditioning of the problem data. In this paper, we present a fast dual gradient method for which the sensitivity to ill-conditioning is greatly reduced. This is achieved by approximating the negative dual function with a quadratic upper bound with different curvature in different directions in the algorithm, as opposed to having the same curvature in all directions as in standard fast gradient methods. The main contribution of this paper is a characterization of the set of matrices that can be used to form such a quadratic upper bound to the negative dual function. We also describe how to choose a matrix from this set to get an improved approximation of the dual function, especially if it is ill-conditioned, compared to the approximation used in standard fast dual gradient methods. This can give a significantly improved performance as illustrated by a numerical evaluation on an ill-conditioned AFTI-16 aircraft model.

preprint, 2013, arXiv

A distributed accelerated gradient algorithm for distributed model predictive control of a hydro power valley

A distributed model predictive control (DMPC) approach based on distributed optimization is applied to the power reference tracking problem of a hydro power valley (HPV) system. The applied optimization algorithm is based on accelerated gradient methods and achieves a convergence rate of O(1/k^2), where k is the iteration number. Major challenges in the control of the HPV include a nonlinear and large-scale model, nonsmoothness in the power-production functions, and a globally coupled cost function that prevents distributed schemes from being applied directly. We propose a linearization and approximation approach that accommodates the proposed DMPC framework and provides very similar performance compared to a centralized solution in simulations. The provided numerical studies also suggest that for the sparsely interconnected system at hand, the distributed algorithm we propose is faster than a centralized state-of-the-art solver such as CPLEX.

preprint, 2013, arXiv

On feasibility, stability and performance in distributed model predictive control

In distributed model predictive control (DMPC), where a centralized optimization problem is solved in a distributed fashion using dual decomposition, it is important to keep the number of iterations in the solution algorithm, i.e. the amount of communication between subsystems, as small as possible. At the same time, the number of iterations must be enough to give a feasible solution to the optimization problem and to guarantee stability of the closed-loop system. In this paper, a stopping condition for the distributed optimization algorithm that guarantees these properties is presented. The stopping condition is based on two theoretical contributions. First, since the optimization problem is solved using dual decomposition, standard techniques to prove stability in model predictive control (MPC), i.e. with a terminal cost and a terminal constraint set that involve all state variables, do not apply. For the case without a terminal cost or a terminal constraint set, we present a new method to quantify the control horizon needed to ensure stability and a prespecified performance. Second, the stopping condition is based on a novel adaptive constraint tightening approach. Using this adaptive constraint tightening approach, we guarantee that a primal feasible solution to the optimization problem is found and that closed-loop stability and performance are obtained. Numerical examples show that the number of iterations needed to guarantee feasibility of the optimization problem, stability and a prespecified performance of the closed-loop system can be reduced significantly using the proposed stopping condition.