Source author record

Sebastian Pokutta

Sebastian Pokutta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

44 works · 21 topics · 4 close collaborators


Preprint · 2026 · arXiv

Agentic MIP Research: Accelerated Constraint Handler Generation

Mixed-integer programming (MIP) research is both mathematically sophisticated and engineering-intensive: testing an algorithmic hypothesis within a branch-and-cut solver requires substantial implementation, debugging, tuning, and large-scale benchmarking. We propose an agentic MIP research framework that shortens this feedback loop by embedding LLM agents into a solver-aware harness for generating, verifying, and evaluating plugins for the open-source solver SCIP. Propagation methods play a central role in accelerating MIP solving by exploiting global constraints. We instantiate our framework on the semantic lifting of MIP formulations into global constraints and the automatic construction of propagation-only SCIP constraint handlers. On the MIPLIB 2017 benchmark set, the framework successfully recovers global constraint structures from constraint programming and generates executable constraint detectors and propagation-only constraint handlers. Furthermore, the framework naturally extends to in-context learning within a sandboxed environment, enabling agents not only to tune and debug generated constraint handlers on real instances, but also to explore global constraint patterns in MIP problems and discover novel propagation strategies not yet implemented in SCIP. This framework allows us to systematically distinguish meaningful algorithmic improvements from low-value or overly costly candidates: the novel propagation methods successfully solved five additional instances within the explored benchmark. Overall, this framework demonstrates that LLM agents can autonomously navigate the complex MIP research loop, paving the way for a more automated solver development process.
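The framework above generates propagation-only constraint handlers for SCIP. As a rough, solver-independent illustration of what a propagation routine does, the Python sketch below performs textbook activity-based bound tightening for a single linear constraint $\sum_j a_j x_j \le b$. This is a swapped-in standard technique, not SCIP's plugin API and not one of the global-constraint propagators produced by the framework; the function name and argument layout are hypothetical.

```python
import math
from fractions import Fraction


def propagate_linear(coefs, rhs, lower, upper, integral):
    """Tighten variable bounds implied by sum(coefs[j] * x[j]) <= rhs.

    Hypothetical helper for illustration. Returns (new_lower, new_upper),
    or None if the constraint cannot be satisfied within the given bounds.
    """
    # Minimum activity: every term sits at the bound that makes it smallest.
    min_act = sum(a * (lo if a > 0 else up)
                  for a, lo, up in zip(coefs, lower, upper))
    if min_act > rhs:
        return None  # even the smallest possible activity violates the constraint
    slack = rhs - min_act
    new_lo, new_up = list(lower), list(upper)
    for j, a in enumerate(coefs):
        if a > 0:
            # a * x_j <= rhs - (min_act - a * lower_j)  =>  upper bound on x_j
            bound = lower[j] + Fraction(slack) / a
            if integral[j]:
                bound = math.floor(bound)
            new_up[j] = min(new_up[j], bound)
        elif a < 0:
            # Dividing by a negative coefficient turns this into a lower bound.
            bound = upper[j] + Fraction(slack) / a
            if integral[j]:
                bound = math.ceil(bound)
            new_lo[j] = max(new_lo[j], bound)
    return new_lo, new_up
```

For example, with $3x + 2y \le 6$ and integer $x, y \in [0, 5]$, the sketch tightens the upper bounds to $x \le 2$ and $y \le 3$.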

Preprint · 2026 · arXiv

Global Optimization for Combinatorial Geometry Problems Revisited in the Era of LLMs

Recent progress in LLM-driven algorithm discovery, exemplified by DeepMind's AlphaEvolve, has produced new best-known solutions for a range of hard geometric and combinatorial problems. This raises a natural question: to what extent can modern off-the-shelf global optimization solvers match such results when the problems are formulated directly as nonlinear optimization problems (NLPs)? We revisit a subset of problems from the AlphaEvolve benchmark suite and evaluate straightforward NLP formulations with two state-of-the-art solvers, the commercial FICO Xpress and the open-source SCIP. Without any solver modifications, both solvers reproduce, and in several cases improve upon, the best solutions previously reported in the literature, including the recent LLM-driven discoveries. Our results not only highlight the maturity of generic NLP technology and its ability to tackle nonlinear mathematical problems that were out of reach for general-purpose solvers only a decade ago, but also position global NLP solvers as powerful tools that may be exploited within LLM-driven algorithm discovery.

Preprint · 2026 · arXiv

What Do Evolutionary Coding Agents Evolve?

Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, yet a fundamental question remains: what do they actually evolve? Progress is typically summarized by the best score a run reaches under a task-specific evaluator, but that score can reflect several different mechanisms: new algorithmic structure, re-tuning an existing strategy, recombining ideas already in the model's internal knowledge, or overfitting to the evaluator. Distinguishing these mechanisms requires inspecting the search process itself, not only its final outcome. We introduce EvoTrace, a dataset of evolutionary coding traces spanning four evolutionary frameworks, reasoning and non-reasoning models, and 16 tasks across mathematics and algorithm design. To analyze these traces, we develop EvoReplay, a replay-based methodology that reconstructs the local search states behind high-scoring solutions and tests controlled interventions, including adjusting constants, removing program components and substituting models or prompting contexts. We annotate every code edit in EvoTrace with one of nine recurring edit types using an LLM-as-judge pipeline validated against blind human re-annotation. Across EvoTrace, most score gains come from a small subset of these edit types. We further find a deterministic cycling pattern: about 30% of code lines added during search are byte-identical re-introductions of previously-deleted lines, present throughout nearly every run. These results show that benchmark gains in evolutionary coding agents can arise from qualitatively different mechanisms, only some of which correspond to new algorithmic structure. EvoTrace enables more diagnostic evaluation of evolutionary coding agents beyond final benchmark scores.
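To make the cycling statistic concrete, here is a minimal sketch of one way to measure the share of added lines that are byte-identical re-introductions of previously deleted lines. It assumes (hypothetically) that a run is available as a list of program-text snapshots and uses Python's `difflib` to extract line additions and deletions between consecutive snapshots. EvoTrace's actual data format and the paper's exact counting rules are not given in the abstract, so this is only an approximation.

```python
import difflib


def reintroduction_rate(versions):
    """Fraction of added lines that exactly match a line deleted earlier in the run.

    `versions` is a hypothetical trace format: a list of program texts, one per step.
    """
    deleted_so_far = set()
    added_total = 0
    reintroduced = 0
    for old, new in zip(versions, versions[1:]):
        old_lines, new_lines = old.splitlines(), new.splitlines()
        matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        deleted_now = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("insert", "replace"):
                for line in new_lines[j1:j2]:
                    added_total += 1
                    if line in deleted_so_far:  # byte-identical match
                        reintroduced += 1
            if tag in ("delete", "replace"):
                deleted_now.extend(old_lines[i1:i2])
        # Lines deleted in this step only count as "previously deleted" later on.
        deleted_so_far.update(deleted_now)
    return reintroduced / added_total if added_total else 0.0
```

For instance, a trace that deletes a line and later adds the exact same line back scores 1.0 for that step's addition.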

Preprint · 2023 · arXiv

Accelerated Riemannian Optimization: Handling Constraints with a Prox to Bound Geometric Penalties

We propose a globally accelerated, first-order method for the optimization of smooth and (strongly or not) geodesically convex functions in a wide class of Hadamard manifolds. We achieve the same convergence rates as Nesterov's accelerated gradient descent, up to a multiplicative geometric penalty and log factors. Crucially, we can force our method to stay within a compact set we define. Prior fully accelerated works resort to assuming that the iterates of their algorithms stay in some pre-specified compact set, except for two previous methods of limited applicability. For our manifolds, this solves the open question in [KY22] about obtaining global general acceleration without assuming that the iterates stay in the feasible set. In our solution, we design an accelerated Riemannian inexact proximal point algorithm, a result that was unknown even with exact access to the proximal operator and that is of independent interest. For smooth functions, we show that the prox step can be implemented inexactly with first-order methods in Riemannian balls of a certain diameter, which suffices for global accelerated optimization.

Preprint · 2023 · arXiv

Low-rank tensor decompositions of quantum circuits

Quantum computing is arguably one of the most revolutionary and disruptive technologies of this century. Due to the ever-increasing number of potential applications as well as the continuing rise in complexity, the development, simulation, optimization, and physical realization of quantum circuits are of utmost importance for designing novel algorithms. We show how matrix product states (MPSs) and matrix product operators (MPOs) can be used to express certain quantum states, quantum gates, and entire quantum circuits as low-rank tensors. This enables the analysis and simulation of complex quantum circuits on classical computers and provides insight into the underlying structure of the system. We present different examples to demonstrate the advantages of MPO formulations and show that they are more efficient than conventional techniques if the bond dimensions of the wave function representation can be kept small throughout the simulation.

Preprint · 2022 · arXiv

Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. Reformulating the Rate-Distortion Explanations (RDE) method for relevance attribution as a constrained optimization problem provides precise control over the sparsity of relevance maps. This enables a novel multi-rate as well as a relevance-ordering variant of RDE that both empirically outperform standard RDE and other baseline methods in a well-established comparison test. We showcase several deterministic and stochastic variants of the Frank-Wolfe algorithm and their effectiveness for RDE.

Preprint · 2022 · arXiv

Minimizing a low-dimensional convex function over a high-dimensional cube

For a matrix $W \in \mathbb{Z}^{m \times n}$, $m \leq n$, and a convex function $g: \mathbb{R}^m \rightarrow \mathbb{R}$, we are interested in minimizing $f(x) = g(Wx)$ over the set $\{0,1\}^n$. We study separable convex functions and sharp convex functions $g$. Moreover, the matrix $W$ is unknown to us: only the number of rows $m \leq n$ and $\|W\|_{\infty}$ are revealed, and the composite function $f(x)$ is accessible only through zeroth- and first-order oracles. Our main result is a proximity theorem ensuring that, for separable convex and sharp convex functions, an integral minimum and a continuous minimum are always close to each other. This is a key ingredient in developing an algorithm for detecting an integer minimum that achieves a running time of roughly $(m \|W\|_{\infty})^{\mathcal{O}(m^3)} \cdot \text{poly}(n)$. In the special case where (i) $W$ is given explicitly and (ii) $g$ is separable convex, one can also adapt an algorithm of Hochbaum and Shanthikumar; the running time of this adapted algorithm matches that of our general algorithm.

Preprint · 2022 · arXiv

Principled Deep Neural Network Training through Linear Programming

Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident in recent years. In this work, using a unified framework, we show that there exists a polyhedron which encodes simultaneously all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample-size. Notably, the size of the polyhedral representation depends only linearly on the sample-size, and a better dependency on several other network parameters is unlikely (assuming $P\neq NP$). Additionally, we use our polyhedral representation to obtain new and better computational complexity results for training problems of well-known neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal a strong structure arising from these problems.

Preprint · 2022 · arXiv

Sparser Kernel Herding with Pairwise Conditional Gradients without Swap Steps

The Pairwise Conditional Gradients (PCG) algorithm is a powerful extension of the Frank-Wolfe algorithm leading to particularly sparse solutions, which makes PCG very appealing for problems such as sparse signal recovery, sparse regression, and kernel herding. Unfortunately, PCG exhibits so-called swap steps that might not provide sufficient primal progress. The number of these bad steps is bounded by a function of the dimension, and as such the known guarantees do not generalize to the infinite-dimensional case, which would be needed for kernel herding. We propose a new variant of PCG, the so-called Blended Pairwise Conditional Gradients (BPCG). This new algorithm does not exhibit any swap steps, is very easy to implement, and does not require any internal gradient alignment procedures. The convergence rate of BPCG is essentially that of PCG if no drop steps occurred; as such, it is no worse than PCG, and in many cases it improves on PCG and provides new rates. Moreover, we observe in the numerical experiments that BPCG's solutions are much sparser than those of PCG. We apply BPCG to the kernel herding setting, where we derive nice quadrature rules and provide numerical results demonstrating the performance of our method.

Preprint · 2022 · arXiv

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. In this context, ML4CO aims to improve state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was anonymized for the contestants.

Preprint · 2022 · arXiv

Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four

One of the goals of Explainable AI (XAI) is to determine which input components were relevant for a classifier decision. This is commonly known as saliency attribution. Characteristic functions (from cooperative game theory) are able to evaluate partial inputs and form the basis for theoretically "fair" attribution methods like Shapley values. Given only a standard classifier function, it is unclear how partial input should be realised. Instead, most XAI-methods for black-box classifiers like neural networks consider counterfactual inputs that generally lie off-manifold. This makes them hard to evaluate and easy to manipulate. We propose a setup to directly train characteristic functions in the form of neural networks to play simple two-player games. We apply this to the game of Connect Four by randomly hiding colour information from our agents during training. This has three advantages for comparing XAI-methods: it alleviates the ambiguity about how to realise partial input, makes off-manifold evaluation unnecessary and allows us to compare the methods by letting them play against each other.
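Shapley values are the attribution computed from a characteristic function. For a small number of players they can be evaluated exactly from the subset formula $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,(v(S \cup \{i\}) - v(S))$; the sketch below does this with exact fractions. The toy characteristic function in the example is hypothetical and merely stands in for the trained neural networks described above.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1.

    v maps a frozenset of players (e.g. the revealed input components) to a
    number. The cost is exponential in n, so this is only for small examples.
    """
    phi = []
    for i in range(n):
        others = [p for p in range(n) if p != i]
        total = Fraction(0)
        for k in range(n):  # size of the coalition S that i joins
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))  # marginal contribution of i
        phi.append(total)
    return phi
```

With $v(S) = 1$ exactly when players 0 and 1 are both revealed, the two players split the value equally and player 2 receives nothing, as the symmetry and null-player properties require.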

Preprint · 2021 · arXiv

Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training

We revisit the concept of "adversary" in online learning, motivated by solving robust optimization and adversarial training using online learning methods. While one of the classical setups in online learning deals with the "adversarial" setup, it appears that this concept is used less rigorously, causing confusion in applying results and insights from online learning. Specifically, there are two fundamentally different types of adversaries, depending on whether the "adversary" is able to anticipate the exogenous randomness of the online learning algorithms. This is particularly relevant to robust optimization and adversarial training because the adversarial sequences are often anticipative, and many online learning algorithms do not achieve diminishing regret in such a case. We then apply this to solving robust optimization problems or (equivalently) adversarial training problems via online learning and establish a general approach for a large variety of problem classes using imaginary play. Here two players play against each other, the primal player playing the decisions and the dual player playing realizations of uncertain data. When the game terminates, the primal player has obtained an approximately robust solution. This meta-game allows for solving a large variety of robust optimization and multi-objective optimization problems and generalizes the approach of arXiv:1402.6361.

Preprint · 2021 · arXiv

Dual Prices for Frank-Wolfe Algorithms

In this note we observe that for constrained convex minimization problems $\min_{x \in P} f(x)$ over a polytope $P$, dual prices for the linear program $\min_{z \in P} \nabla f(x)^\top z$ obtained from linearization at approximately optimal solutions $x$ have a similar interpretation of rate of change in optimal value as in linear programming, providing a convex form of sensitivity analysis. This is of particular interest for Frank-Wolfe algorithms (also called conditional gradients), an important class of first-order methods whose basic building block is the linear minimization of gradients of $f$ over $P$; most implementations of this step already compute the dual prices as a by-product.
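The linear-minimization building block can be illustrated with a minimal vanilla Frank-Wolfe loop over the probability simplex, where the linear program $\min_{z \in P} \nabla f(x)^\top z$ is solved by picking the vertex with the smallest gradient coordinate. The sketch also returns the Frank-Wolfe gap, which upper-bounds $f(x) - \min f$ for convex $f$. It does not compute dual prices: over the simplex the oracle is a plain argmin, and extracting dual prices for a general polytope would require an LP solver. The function names are hypothetical.

```python
def simplex_lmo(g):
    """Linear minimization over the simplex: index i of the vertex e_i minimizing <g, e_i>."""
    return min(range(len(g)), key=g.__getitem__)


def frank_wolfe_simplex(grad, x0, iterations=200):
    """Vanilla Frank-Wolfe over the probability simplex with step size 2 / (t + 2).

    grad(x) returns the gradient of f at x as a list. Returns the final
    iterate and its Frank-Wolfe gap <grad f(x), x - v>, with v the LMO vertex.
    """
    x = list(x0)
    for t in range(iterations):
        g = grad(x)
        i = simplex_lmo(g)
        step = 2.0 / (t + 2)  # standard open-loop step size
        # Convex combination of x and the vertex e_i keeps x in the simplex.
        x = [(1 - step) * xj for xj in x]
        x[i] += step
    g = grad(x)
    gap = sum(gi * xi for gi, xi in zip(g, x)) - g[simplex_lmo(g)]
    return x, gap
```

For $f(x) = \|x - y\|^2$ with $y$ in the simplex, the iterates approach $y$ at the usual $\mathcal{O}(1/t)$ rate in function value.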

Preprint · 2021 · arXiv

Local and Global Uniform Convexity Conditions

We review various characterizations of uniform convexity and smoothness on norm balls in finite-dimensional spaces and connect results stemming from the geometry of Banach spaces with \textit{scaling inequalities} used in analysing the convergence of optimization methods. In particular, we establish local versions of these conditions to provide sharper insights on a recent body of complexity results in learning theory, online learning, or offline optimization, which rely on the strong convexity of the feasible set. While they have a significant impact on complexity, these strong convexity or uniform convexity properties of feasible sets are not exploited as thoroughly as their functional counterparts, and this work is an effort to correct this imbalance. We conclude with some practical examples in optimization and machine learning where leveraging these conditions and localized assumptions leads to new complexity results.

Preprint · 2021 · arXiv

Projection-Free Adaptive Gradients for Large-Scale Optimization

The complexity in large-scale optimization can lie in both handling the objective function and handling the constraint set. In this respect, stochastic Frank-Wolfe algorithms occupy a unique position as they alleviate both computational burdens, by querying only approximate first-order information from the objective and by maintaining feasibility of the iterates without using projections. In this paper, we improve the quality of their first-order information by blending in adaptive gradients. We derive convergence rates and demonstrate the computational advantage of our method over the state-of-the-art stochastic Frank-Wolfe algorithms on both convex and nonconvex objectives. The experiments further show that our method can improve the performance of adaptive gradient algorithms for constrained optimization.

Preprint · 2020 · arXiv

An Online-Learning Approach to Inverse Optimization

In this paper, we demonstrate how to learn the objective function of a decision-maker while only observing the problem input data and the decision-maker's corresponding decisions over multiple rounds. We present exact algorithms for this online version of inverse optimization which converge at a rate of $\mathcal{O}(1/\sqrt{T})$ in the number of observations $T$ and compare their further properties. In particular, they all allow taking decisions that are essentially as good as those of the observed decision-maker after relatively few iterations, but each is best suited to a different setting. Our approach is based on online learning and works for linear objectives over arbitrary feasible sets for which we have a linear optimization oracle. As such, it generalizes previous approaches based on KKT-system decomposition and dualization. We also introduce several generalizations, such as the approximate learning of non-linear objective functions, dynamically changing as well as parameterized objectives and the case of suboptimal observed decisions. When applied to the stochastic offline case, our algorithms are able to give guarantees on the quality of the learned objectives in expectation. Finally, we show the effectiveness and possible applications of our methods in indicative computational experiments.
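As a rough illustration of the online-learning idea, the sketch below implements one natural instantiation: online gradient descent on the loss $c_t^\top(\hat{x}_t - x_t)$, where $\hat{x}_t$ is the decision induced by the current guess $c_t$ and $x_t$ is the observed decision, with the objective projected back onto the unit Euclidean ball. It assumes a decision-maker maximizing a linear objective over a finite feasible set; the finite sets, the step size `eta` and the projection are assumptions made for the example, and the paper's own algorithms may use different update rules.

```python
import math


def argmax_decision(c, feasible):
    """Best decision in a finite feasible set for linear objective c (ties: first listed)."""
    return max(feasible, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))


def online_inverse_opt(rounds, dim, eta=0.1):
    """Learn an objective from (feasible set, observed decision) pairs.

    Each round: predict with the current guess c, record the loss
    c . (x_hat - x_obs) >= 0, take the gradient step c += eta * (x_obs - x_hat),
    and project back onto the unit Euclidean ball.
    """
    c = [0.0] * dim
    losses = []
    for feasible, x_obs in rounds:
        x_hat = argmax_decision(c, feasible)
        losses.append(sum(ci * (a - b) for ci, a, b in zip(c, x_hat, x_obs)))
        c = [ci + eta * (a - b) for ci, a, b in zip(c, x_obs, x_hat)]
        norm = math.sqrt(sum(ci * ci for ci in c))
        if norm > 1.0:
            c = [ci / norm for ci in c]
    return c, losses
```

In a toy run over the vertices of the unit square, where the hidden objective $(1, -1)$ always selects $(1, 0)$, one update suffices for the learned objective to reproduce the observed decision.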

Preprint · 2020 · arXiv

Boosting Frank-Wolfe by Chasing Gradients

The Frank-Wolfe algorithm has become a popular first-order optimization algorithm because it is simple and projection-free, and it has been successfully applied to a variety of real-world problems. Its main drawback, however, lies in its convergence rate, which can be excessively slow due to naive descent directions. We propose to speed up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient via a subroutine. This subroutine chases the negative gradient direction in a matching-pursuit style while still preserving the projection-free property. Although the approach is reasonably natural, it produces very significant results. We derive convergence rates ranging from $\mathcal{O}(1/t)$ to $\mathcal{O}(e^{-\omega t})$ for our method and demonstrate its competitive advantage both per iteration and in CPU time over the state-of-the-art in a series of computational experiments.

Preprint · 2020 · arXiv

IPBoost -- Non-Convex Boosting via Integer Programming

Recently, non-convex optimization approaches for solving machine learning problems have gained significant attention. In this paper we explore non-convex boosting in classification by means of integer programming and demonstrate the real-world practicability of the approach while circumventing shortcomings of convex boosting approaches. We report results that are comparable to or better than the current state-of-the-art.

Preprint · 2020 · arXiv

On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness

Submodular maximization has been widely studied over the past decades, mostly because of its numerous applications in real-world problems. It is well known that the standard greedy algorithm guarantees a worst-case approximation factor of $1-1/e$ when maximizing a monotone submodular function under a cardinality constraint. However, empirical studies show that its performance is substantially better in practice. This raises the natural question of how to explain this improved performance of the greedy algorithm. In this work, we define sharpness for submodular functions as a candidate explanation for this phenomenon. The sharpness criterion is inspired by the concept of strong convexity in convex optimization. We show that the greedy algorithm provably performs better as the sharpness of the submodular function increases. This improvement ties in closely with the faster convergence rates of first-order methods for sharp functions in convex optimization. Finally, we perform a computational study to empirically support our theoretical results and show that sharpness explains the greedy performance better than other justifications in the literature.
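For reference, the standard greedy algorithm discussed above repeatedly adds the element with the largest marginal gain until $k$ elements are chosen. The sketch below uses a coverage function as an example of a monotone submodular function; the function names are illustrative, and ties are broken by the order of the ground set.

```python
def greedy_max(f, ground_set, k):
    """Standard greedy for maximizing a monotone submodular f subject to |S| <= k."""
    chosen = []
    for _ in range(k):
        remaining = [e for e in ground_set if e not in chosen]
        if not remaining:
            break
        base = f(chosen)
        # Pick the element with the largest marginal gain f(S + e) - f(S).
        best = max(remaining, key=lambda e: f(chosen + [e]) - base)
        chosen.append(best)
    return chosen


def coverage(sets):
    """Coverage function: number of distinct elements covered by the chosen sets."""
    return lambda chosen: len(set().union(*(sets[i] for i in chosen)))
```

With sets $\{1,2,3\}$, $\{3,4\}$, $\{4,5,6\}$, $\{1\}$ and $k = 2$, greedy picks the first and third sets and covers all six elements.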

Preprint · 2020 · arXiv

Projection-Free Optimization on Uniformly Convex Sets

The Frank-Wolfe method solves smooth constrained convex optimization problems at a generic sublinear rate of $\mathcal{O}(1/T)$, and it (or its variants) enjoys accelerated convergence rates for two fundamental classes of constraints: polytopes and strongly convex sets. Uniformly convex sets non-trivially subsume strongly convex sets and form a large variety of \textit{curved} convex sets commonly encountered in machine learning and signal processing. For instance, the $\ell_p$-balls are uniformly convex for all $p > 1$, but strongly convex for $p \in (1,2]$ only. We show that these sets systematically induce accelerated convergence rates for the original Frank-Wolfe algorithm, which continuously interpolate between known rates. Our accelerated convergence rates emphasize that it is the curvature of the constraint sets, not just their strong convexity, that leads to accelerated convergence rates. These results also importantly highlight that the Frank-Wolfe algorithm is adaptive to much more generic constraint set structures, thus explaining faster empirical convergence. Finally, we also show accelerated convergence rates when the set is only locally uniformly convex and provide similar results in online linear optimization.
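The original Frank-Wolfe algorithm accesses the constraint set only through a linear minimization oracle. For $\ell_p$-balls with $1 < p < \infty$ this oracle has a closed form via Hölder's inequality: with $q = p/(p-1)$, the minimizer of $\langle g, v \rangle$ over $\|v\|_p \le r$ is $v_i = -r\,\mathrm{sign}(g_i)\,|g_i|^{q-1} / \|g\|_q^{q-1}$, attaining the value $-r\|g\|_q$. A minimal Python sketch (the function name is hypothetical):

```python
import math


def lp_ball_lmo(g, p, radius=1.0):
    """argmin of <g, v> over the l_p ball of the given radius, for 1 < p < inf."""
    q = p / (p - 1.0)  # dual exponent: 1/p + 1/q = 1
    norm_q = sum(abs(gi) ** q for gi in g) ** (1.0 / q)
    if norm_q == 0.0:
        return [0.0] * len(g)  # zero gradient: every feasible point is optimal
    # Equality case of Hoelder's inequality, scaled to the boundary of the ball.
    return [-radius * math.copysign(abs(gi) ** (q - 1.0), gi) / norm_q ** (q - 1.0)
            for gi in g]
```

For $p = 2$ this reduces to $-r\,g/\|g\|_2$; for example, $g = (3, -4)$ gives $v = (-0.6, 0.8)$.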

Preprint · 2020 · arXiv

Restarting Algorithms: Sometimes there is Free Lunch

In this overview article we will consider the deliberate restarting of algorithms, a meta technique, in order to improve the algorithm's performance, e.g., convergence rates or approximation guarantees. One of the major advantages is that restarts are relatively black box, not requiring any (significant) changes to the base algorithm that is restarted or the underlying argument, while leading to potentially significant improvements, e.g., from sublinear to linear rates of convergence. Restarts are widely used in different fields and have become a powerful tool to leverage additional information that has not been directly incorporated in the base algorithm or argument. We will review restarts in various settings from continuous optimization, discrete optimization, and submodular function maximization where they have delivered impressive results.

Preprint · 2016 · arXiv

Affine reductions for LPs and SDPs

We define a reduction mechanism for LP and SDP formulations that degrades approximation factors in a controlled fashion. Our reduction mechanism is a minor restriction of classical reductions establishing inapproximability in the context of PCP theorems. As a consequence we establish strong linear programming inapproximability (for LPs with a polynomial number of constraints) for many problems. In particular we obtain a $3/2-\varepsilon$ inapproximability for VertexCover answering an open question in [arXiv:1309.0563] and we answer a weak version of our sparse graph conjecture posed in [arXiv:1311.4001] showing an inapproximability factor of $1/2+\varepsilon$ for bounded degree IndependentSet. In the case of SDPs, we obtain inapproximability results for these problems relative to the SDP-inapproximability of MaxCUT. Moreover, using our reduction framework we are able to reproduce various results for CSPs from [arXiv:1309.0563] via simple reductions from Max-2-XOR.

Preprint · 2016 · arXiv

Aggregation-based cutting-planes for packing and covering integer programs

In this paper, we study the strength of Chvátal-Gomory (CG) cuts and more generally aggregation cuts for packing and covering integer programs (IPs). Aggregation cuts are obtained as follows: Given an IP formulation, we first generate a single implied inequality using aggregation of the original constraints, then obtain the integer hull of the set defined by this single inequality with variable bounds, and finally use the inequalities describing the integer hull as cutting-planes. Our first main result is to show that for packing and covering IPs, the CG and aggregation closures can be 2-approximated by simply generating the respective closures for each of the original formulation constraints, without using any aggregations. On the other hand, we use computational experiments to show that aggregation cuts can be arbitrarily stronger than cuts from individual constraints for general IPs. The proof of the above results for the case of covering IPs with bounds requires the development of some new structural results, which may be of independent interest. Finally, we examine the strength of cuts based on k different aggregation inequalities simultaneously, the so-called multi-row cuts, and show that every packing or covering IP with a large integrality gap also has a large k-aggregation closure rank. In particular, this rank is always at least of the order of the logarithm of the integrality gap.
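The simplest aggregation cut is the Chvátal-Gomory cut: combine the rows of $Ax \le b$ with nonnegative multipliers $u$ and round down, giving $\lfloor u^\top A \rfloor x \le \lfloor u^\top b \rfloor$, which is valid for all nonnegative integer $x$. The sketch below computes such a cut with exact arithmetic. It assumes $x \ge 0$ and does not compute the integer hull of the aggregated inequality, which the more general aggregation cuts above use; the function name is hypothetical.

```python
import math
from fractions import Fraction


def cg_cut(A, b, u):
    """Chvátal-Gomory cut from rows A x <= b (x >= 0 integer) and multipliers u >= 0.

    Aggregates u^T A x <= u^T b and rounds coefficients and right-hand side down.
    """
    if any(ui < 0 for ui in u):
        raise ValueError("multipliers must be nonnegative")
    n = len(A[0])
    agg = [sum(Fraction(ui) * row[j] for ui, row in zip(u, A)) for j in range(n)]
    rhs = sum(Fraction(ui) * bi for ui, bi in zip(u, b))
    # floor(u^T A) x <= u^T A x <= u^T b for x >= 0, and the left side is integral.
    return [math.floor(a) for a in agg], math.floor(rhs)
```

For the packing constraints $x_1 + x_2 \le 1$, $x_2 + x_3 \le 1$, $x_1 + x_3 \le 1$ with $u = (1/2, 1/2, 1/2)$, this yields the odd-cycle inequality $x_1 + x_2 + x_3 \le 1$.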

Preprint · 2016 · arXiv

An efficient high-probability algorithm for Linear Bandits

For the linear bandit problem, we extend the analysis of algorithm CombEXP from [R. Combes, M. S. Talebi Mazraeh Shahi, A. Proutiere, and M. Lelarge. Combinatorial bandits revisited. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2116--2124. Curran Associates, Inc., 2015. URL http://papers.nips.cc/paper/5831-combinatorial-bandits-revisited.pdf] to the high-probability case against adaptive adversaries, allowing actions to come from an arbitrary polytope. We prove a high-probability regret of $O(T^{2/3})$ for time horizon $T$. While this bound is weaker than the optimal $O(\sqrt{T})$ bound achieved by GeometricHedge in [P. L. Bartlett, V. Dani, T. Hayes, S. Kakade, A. Rakhlin, and A. Tewari. High-probability regret bounds for bandit online linear optimization. In 21st Annual Conference on Learning Theory (COLT 2008), July 2008. http://eprints.qut.edu.au/45706/1/30-Bartlett.pdf], CombEXP is computationally efficient, requiring only an efficient linear optimization oracle over the convex hull of the actions.

Preprint · 2016 · arXiv

Average case polyhedral complexity of the maximum stable set problem

We study the minimum number of constraints needed to formulate random instances of the maximum stable set problem via linear programs (LPs), in two distinct models. In the uniform model, the constraints of the LP are not allowed to depend on the input graph, which should be encoded solely in the objective function. There we prove a $2^{\Omega(n/\log n)}$ lower bound with probability at least $1 - 2^{-2^n}$ for every LP that is exact for a randomly selected set of instances, where each graph on at most $n$ vertices is selected independently with probability $p \geq 2^{-\binom{n/4}{2}+n}$. In the non-uniform model, the constraints of the LP may depend on the input graph, but we allow weights on the vertices. The input graph is sampled according to the $G(n, p)$ model. There we obtain upper and lower bounds holding with high probability for various ranges of $p$. We obtain a super-polynomial lower bound all the way from $p = \Omega(\log^{6+\varepsilon} n / n)$ to $p = o(1/\log n)$. Our upper bound is close to this, as there is only an essentially quadratic gap in the exponent, which currently also exists in the worst-case model. Finally, we state a conjecture that would close this gap in both the average-case and worst-case models.

Preprint · 2016 · arXiv

Hierarchical Clustering via Spreading Metrics

We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical clustering of cost at most $O(\alpha_n \log n)$ times the cost of the optimal hierarchical clustering, where $\alpha_n$ is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top-down algorithm returns a hierarchical clustering of cost at most $O(\log^{3/2} n)$ times the cost of the optimal solution. We improve this by giving an $O(\log n)$-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of sphere growing, which has been extensively used in the context of graph partitioning. We also prove that our algorithm returns an $O(\log n)$-approximate hierarchical clustering for a generalization of this cost function also studied in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the ILP formulation as well as our rounding algorithm often have better projections into flat clusters than the standard linkage-based algorithms. We also give constant-factor inapproximability results for this problem.
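The cost function from [arXiv:1510.05043] charges each pair of points its similarity weight times the number of leaves below their lowest common ancestor in the hierarchy, $\mathrm{cost}(T) = \sum_{i<j} w_{ij}\,|\mathrm{leaves}(T[i \vee j])|$. A small sketch that evaluates this cost, with the hierarchy written as nested tuples (a representation chosen only for the example):

```python
from itertools import combinations


def leaves(tree):
    """Leaves of a hierarchy given as nested tuples (any non-tuple is a leaf)."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def hierarchy_cost(tree, weight):
    """Sum of weight(i, j) * |leaves(lca(i, j))| over all pairs of leaves."""
    if not isinstance(tree, tuple):
        return 0
    size = len(leaves(tree))
    child_leaves = [leaves(child) for child in tree]
    cost = 0
    # Pairs whose leaves lie in different children have this node as their lca.
    for left, right in combinations(child_leaves, 2):
        for i in left:
            for j in right:
                cost += weight(i, j) * size
    return cost + sum(hierarchy_cost(child, weight) for child in tree)
```

On the path graph 0-1-2-3 with unit edge weights, the balanced hierarchy ((0, 1), (2, 3)) costs 8, while the caterpillar (((0, 1), 2), 3) costs 9, so the cost prefers splitting along the sparse middle edge first.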

Preprint · 2016 · arXiv

The matching problem has no small symmetric SDP

Yannakakis showed that the matching problem does not have a small symmetric linear program. Rothvoß recently proved that any, not necessarily symmetric, linear program also has exponential size. It is natural to ask whether the matching problem can be expressed compactly in a framework such as semidefinite programming (SDP) that is more powerful than linear programming but still allows efficient optimization. We answer this question negatively for symmetric SDPs: any symmetric SDP for the matching problem has exponential size. We also show that an $O(k)$-round Lasserre SDP relaxation for the metric traveling salesperson problem yields at least as good an approximation as any symmetric SDP relaxation of size $n^k$. The key technical ingredient underlying both these results is an upper bound on the degree needed to derive polynomial identities that hold over the space of matchings or traveling salesperson tours.

preprint2016arXiv

Toward a Science of Autonomy for Physical Systems: Transportation

Transportation systems are currently being transformed by advances in information and communication technologies. The development of autonomous transportation holds the promise of providing revolutionary improvements in speed, efficiency, safety and reliability along with concomitant benefits for society and the economy. It is anticipated that these changes will soon affect household activity patterns, public safety, supply chains and logistics, manufacturing, and quality of life in general.

preprint2015arXiv

An information diffusion Fano inequality

In this note, we present an information diffusion inequality derived from an elementary argument, which gives rise to a very general Fano-type inequality. The latter unifies and generalizes the distance-based Fano inequality and the continuous Fano inequality established in [Corollary 1, Propositions 1 and 2, arXiv:1311.2669v2], as well as the generalized Fano inequality in [Equation following (10); T. S. Han and S. Verdú. Generalizing the Fano inequality. IEEE Transactions on Information Theory, 40(4):1247-1251, July 1994].
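For orientation, the classical Fano inequality that these results generalize states that if $X$ takes $M \ge 2$ values and is estimated from $Y$ with error probability $P_e$, then $H(X \mid Y) \le h(P_e) + P_e \log_2(M-1)$, where $h$ is the binary entropy. The Python sketch below (function names are illustrative) inverts this bound numerically to obtain the smallest error probability it allows; it covers only the classical inequality, not the information diffusion or generalized versions from the note.

```python
import math


def binary_entropy(p):
    """h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def fano_rhs(p, m):
    """Right-hand side h(p) + p * log2(m - 1) of the classical Fano inequality."""
    return binary_entropy(p) + p * math.log2(m - 1)


def fano_error_lower_bound(cond_entropy, m, tol=1e-12):
    """Smallest P_e allowed by H(X|Y) <= h(P_e) + P_e * log2(m - 1), for m >= 2.

    fano_rhs is increasing on [0, (m - 1) / m], so bisection applies there.
    """
    lo, hi = 0.0, (m - 1) / m
    if cond_entropy <= 0.0:
        return 0.0
    if cond_entropy >= fano_rhs(hi, m):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_rhs(mid, m) < cond_entropy:
            lo = mid
        else:
            hi = mid
    return hi


# Four symbols and no information about X (H(X|Y) = 2 bits): P_e >= 3/4.
print(fano_error_lower_bound(2.0, 4))  # approximately 0.75
```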

preprint2015arXiv

Border bases and order ideals: a polyhedral characterization

Border bases arise as a canonical generalization of Gröbner bases. We provide a polyhedral characterization of all order ideals (and hence border bases) that are supported by a zero-dimensional ideal: order ideals that support a border basis correspond one-to-one to integral points of the order ideal polytope. In particular, we establish a crucial connection between the ideal and its combinatorial structure. Based on this characterization we adapt the classical border basis algorithm to allow for computing border bases for arbitrary order ideals, which are independent of term orderings. We also show that finding a maximum weight order ideal that supports a border basis is NP-hard, and that the convex hull of admissible order ideals has no polynomial polyhedral description.
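As a reminder of the objects involved: an order ideal is a finite set of monomials closed under taking divisors, and its border consists of the monomials $x_i t$ with $t$ in the order ideal that lie outside it; a border basis contains one polynomial per border term. The Python sketch below represents monomials by exponent tuples (an illustrative encoding), checks the order-ideal property, and computes the border. It does not decide whether an order ideal supports a border basis of a given ideal, which is what the polyhedral characterization above addresses.

```python
from itertools import product


def divisors(exp):
    """All exponent vectors d <= exp componentwise, i.e. divisors of x^exp."""
    return set(product(*(range(e + 1) for e in exp)))


def is_order_ideal(monomials):
    """True if the set of exponent tuples is closed under taking divisors."""
    s = set(monomials)
    return all(divisors(t) <= s for t in s)


def border(order_ideal):
    """Border of O: all x_i * t with t in O that are not themselves in O."""
    s = set(order_ideal)
    result = set()
    for t in s:
        for i in range(len(t)):
            u = t[:i] + (t[i] + 1,) + t[i + 1:]  # multiply t by x_i
            if u not in s:
                result.add(u)
    return result


# O = {1, x, y} in two variables, exponents written as (deg_x, deg_y).
O = {(0, 0), (1, 0), (0, 1)}
print(is_order_ideal(O))                 # True
print(sorted(border(O)))                 # [(0, 2), (1, 1), (2, 0)]
print(is_order_ideal({(0, 0), (2, 0)}))  # False: x divides x^2 but is missing
```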

preprint2015arXiv

Exponential Lower Bounds for Polytopes in Combinatorial Optimization

We solve a 20-year-old problem posed by Yannakakis and prove that there exists no polynomial-size linear program (LP) whose associated polytope projects to the traveling salesman polytope, even if the LP is not required to be symmetric. Moreover, we prove that this also holds for the cut polytope and the stable set polytope. These results were discovered through a new connection that we make between one-way quantum communication protocols and semidefinite programming reformulations of LPs.

preprint2015arXiv

No Small Linear Program Approximates Vertex Cover within a Factor $2 - ε$

The vertex cover problem is one of the most important and intensively studied combinatorial optimization problems. Khot and Regev (2003) proved that the problem is NP-hard to approximate within a factor $2 - ε$, assuming the Unique Games Conjecture (UGC). This is tight because the problem has an easy 2-approximation algorithm. Without resorting to the UGC, the best inapproximability result for the problem is due to Dinur and Safra (2002): vertex cover is NP-hard to approximate within a factor 1.3606. We prove the following unconditional result about linear programming (LP) relaxations of the problem: every LP relaxation that approximates vertex cover within a factor $2-ε$ has super-polynomially many inequalities. As a direct consequence of our methods, we also establish that LP relaxations (as well as SDP relaxations) that approximate the independent set problem within any constant factor have super-polynomial size.
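The easy 2-approximation mentioned above can be obtained, for example, from any maximal matching: every vertex cover must contain an endpoint of each matched edge, and matched edges are pairwise disjoint, so taking both endpoints of all matched edges gives a cover of size at most twice the optimum. A minimal Python sketch of this standard argument (the function name is illustrative, and this is not the LP-based construction studied in the paper):

```python
def vertex_cover_2approx(edges):
    """Return a vertex cover of size at most 2 * OPT via a greedy maximal matching."""
    cover = set()
    for u, v in edges:
        # If neither endpoint is covered yet, (u, v) joins the matching.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


path = [(0, 1), (1, 2), (2, 3)]
print(sorted(vertex_cover_2approx(path)))  # [0, 1, 2, 3]; an optimal cover is {1, 2}
```

The path example attains the factor 2 exactly, which illustrates why this simple bound cannot be improved by the same argument.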

preprint2015arXiv

Sequential Information Guided Sensing

We study the value of information in sequential compressed sensing by characterizing the performance of sequential information guided sensing in practical scenarios where information is inaccurate. In particular, we assume the signal distribution is parameterized through Gaussian or Gaussian mixtures with estimated mean and covariance matrices, and we can measure compressively through a noisy linear projection or using one-sparse vectors, i.e., observing one entry of the signal each time. We establish a set of performance bounds on the bias and variance of the posterior-mean signal estimator, capturing the conditional entropy (which is also related to the size of the uncertainty), and on the additional power that inaccurate information requires to reach a desired precision. Based on this, we further study how to estimate the covariance from direct samples or via covariance sketching. Numerical examples also demonstrate the superior performance of Info-Greedy Sensing algorithms compared with their random and non-adaptive counterparts.
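For the one-sparse case, a minimal sketch under a single Gaussian model: observing entry $i$ with noise variance $\sigma^2$ reduces the entropy by $\tfrac12\log(1 + \Sigma_{ii}/\sigma^2)$, so a greedy rule picks the entry with the largest posterior variance and then conditions the Gaussian on the observation. The function names and the `observe` callback below are hypothetical; in the setting of the paper, `mu` and `cov` would be the estimated (possibly inaccurate) mean and covariance.

```python
def condition_on_entry(mu, cov, i, y, noise_var):
    """Gaussian posterior after observing y = x[i] + w with w ~ N(0, noise_var)."""
    n = len(mu)
    s = cov[i][i] + noise_var
    gain = [cov[k][i] / s for k in range(n)]  # Kalman-style gain vector
    new_mu = [mu[k] + gain[k] * (y - mu[i]) for k in range(n)]
    new_cov = [[cov[k][m] - gain[k] * cov[i][m] for m in range(n)] for k in range(n)]
    return new_mu, new_cov


def info_greedy_one_sparse(mu, cov, observe, noise_var, steps):
    """Repeatedly measure the entry with the largest posterior variance."""
    for _ in range(steps):
        i = max(range(len(mu)), key=lambda k: cov[k][k])
        mu, cov = condition_on_entry(mu, cov, i, observe(i), noise_var)
    return mu, cov


# Prior N(0, diag(4, 1)), unit noise, and a stand-in observation of 2.0.
mu, cov = info_greedy_one_sparse([0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]],
                                 lambda i: 2.0, noise_var=1.0, steps=1)
print(mu, cov)  # entry 0 is measured; its variance drops from 4 to about 0.8
```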

preprint2015arXiv

Sequential Sensing with Model Mismatch

We characterize the performance of sequential information guided sensing, Info-Greedy Sensing, when there is a mismatch between the true signal model and the assumed model, which may be a sample estimate. In particular, we consider a setup where the signal is low-rank Gaussian and the measurements are taken in the directions of eigenvectors of the covariance matrix in decreasing order of eigenvalues. We establish a set of performance bounds when a mismatched covariance matrix is used, in terms of the gap in signal posterior entropy, as well as the additional amount of power required to achieve the same signal recovery precision. Based on this, we further study how to choose an initialization for Info-Greedy Sensing using the sample covariance matrix, or using an efficient covariance sketching scheme.
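A minimal numpy sketch of the measurement scheme described above, assuming Gaussian conditioning with noise variance $\sigma^2$ and a fixed squared norm `power` per measurement vector: the directions are the top eigenvectors of an assumed covariance, while the posterior is computed under the true covariance, so the entropy gap to using the true eigenvectors quantifies the mismatch. The function names are hypothetical, and for simplicity the example uses full-rank covariances so that log-determinants are finite; it does not reproduce the paper's bounds.

```python
import numpy as np


def top_eigvecs(cov, k):
    """Top-k eigenvectors of a symmetric matrix, by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:k]]


def posterior_cov(true_cov, directions, noise_var, power):
    """Posterior covariance under the true model after measuring
    y = a^T x + w along each unit column of `directions`, scaled by sqrt(power)."""
    post = np.array(true_cov, dtype=float)
    for j in range(directions.shape[1]):
        a = np.sqrt(power) * directions[:, j]
        s = a @ post @ a + noise_var
        post = post - np.outer(post @ a, a @ post) / s
    return post


def entropy_gap(true_cov, assumed_cov, noise_var, power, k):
    """Extra posterior entropy (nats) caused by using assumed_cov's eigenvectors."""
    _, logdet_ok = np.linalg.slogdet(
        posterior_cov(true_cov, top_eigvecs(true_cov, k), noise_var, power))
    _, logdet_mis = np.linalg.slogdet(
        posterior_cov(true_cov, top_eigvecs(assumed_cov, k), noise_var, power))
    # Gaussian entropy is 0.5 * logdet(2*pi*e*cov); the constants cancel.
    return 0.5 * (logdet_mis - logdet_ok)


true_cov = np.diag([4.0, 1.0, 0.25])
assumed_cov = np.diag([0.25, 1.0, 4.0])  # eigenvalue order reversed
print(entropy_gap(true_cov, assumed_cov, noise_var=1.0, power=1.0, k=1))  # about log(2)
```

In this toy case the mismatched model spends the single measurement on the weakest true direction, which costs $\log 2$ nats of posterior entropy relative to measuring the strongest one.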

preprint2015arXiv

Solving MIPs via Scaling-based Augmentation

Augmentation methods for mixed-integer (linear) programs are a class of primal solution approaches in which a current iterate is augmented to a better solution or proved optimal. It is well known that the performance of these methods, i.e., the number of iterations needed, can theoretically be improved by scaling methods. We extend these results with an improved and extended convergence analysis, which shows that bit scaling and geometric scaling theoretically perform similarly well in the worst case for 0/1 polytopes, and that in some cases geometric scaling can arbitrarily outperform bit scaling. We also investigate the performance of implementations of these methods, where the augmentation directions are computed by a MIP solver. It turns out that the number of iterations is usually low. While scaling methods usually do not improve the performance for easier problems, in the case of hard mixed-integer optimization problems they make it possible to compute solutions of very good quality and are often superior.
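A minimal Python sketch of bit scaling for maximization over 0/1 points, assuming an augmentation oracle that either returns an improving feasible point for a given objective or reports that none exists. The oracle below simply enumerates an explicit list of feasible points and is purely illustrative (in the paper the augmentation directions are computed by a MIP solver); geometric scaling is not shown.

```python
from itertools import product


def make_enumeration_oracle(feasible):
    """Hypothetical oracle: return the first feasible y with w.y > w.x, else None."""
    def oracle(x, w):
        def value(z):
            return sum(wi * zi for wi, zi in zip(w, z))
        return next((y for y in feasible if value(y) > value(x)), None)
    return oracle


def bit_scaling(c, x0, oracle):
    """Maximize c.x by solving the problems with truncated objectives
    floor(c / 2**k) for k = K, ..., 0, warm-starting each phase at the last point."""
    K = max(abs(ci) for ci in c).bit_length()
    x, oracle_calls = x0, 0
    for k in range(K, -1, -1):
        w = [ci >> k for ci in c]  # floor(c_i / 2**k), also for negative c_i
        while True:
            oracle_calls += 1
            y = oracle(x, w)
            if y is None:  # x is optimal for the current truncated objective
                break
            x = y
    return x, oracle_calls


# Toy instance: choose at most two of three items with values 5, 3, 6.
feasible = [z for z in product((0, 1), repeat=3) if sum(z) <= 2]
best, calls = bit_scaling([5, 3, 6], (0, 0, 0), make_enumeration_oracle(feasible))
print(best)  # (1, 0, 1)
```

The last phase uses the full objective $c$, so the returned point is optimal; the earlier, coarser phases only serve to reach a good starting point with few augmentations.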

preprint2015arXiv

Strict linear prices in non-convex European day-ahead electricity markets

The European power grid can be divided into several market areas where the price of electricity is determined in a day-ahead auction. Market participants can provide continuous hourly bid curves and combinatorial bids with associated quantities given the prices. The goal of our auction is to maximize the economic surplus of all participants subject to quantity constraints and price constraints. The price constraints ensure that no one incurs a loss. Only traders who submitted a combinatorial bid may forgo a profit they could have realized. The resulting problem is a large-scale mathematical program with equilibrium constraints (MPEC) and binary variables that cannot be solved efficiently by standard solvers. We present an exact algorithm and a fast heuristic for this type of problem. Both algorithms decompose the MPEC into a master problem (a MIQP) and pricing subproblems (LPs). The modeling technique and the algorithms are applicable to a wide variety of combinatorial auctions that are based on mixed integer programs.

preprint2015arXiv

The matching polytope does not admit fully-polynomial size relaxation schemes

The groundbreaking work of Rothvoß [arxiv:1311.2369] established that every linear program expressing the matching polytope has an exponential number of inequalities (formally, the matching polytope has exponential extension complexity). We generalize this result by deriving strong bounds on the polyhedral inapproximability of the matching polytope: for fixed $0 < \varepsilon < 1$, every polyhedral $(1 + \varepsilon / n)$-approximation requires an exponential number of inequalities, where $n$ is the number of vertices. This is sharp given the well-known $\rho$-approximation of size $O(\binom{n}{\rho/(\rho-1)})$ provided by the odd-sets of size up to $\rho/(\rho-1)$. Thus matching is the first problem in $P$ whose natural linear encoding does not admit a fully polynomial-size relaxation scheme (the polyhedral equivalent of an FPTAS), which provides a sharp separation from the polynomial-size relaxation scheme obtained, e.g., via constant-sized odd-sets mentioned above. Our approach reuses ideas from Rothvoß [arxiv:1311.2369]; however, the main lower-bounding technique is different. While the original proof is based on the hyperplane separation bound (also called the rectangle corruption bound), we employ the information-theoretic notion of common information as introduced in Braun and Pokutta [http://eccc.hpi-web.de/report/2013/056/], which allows one to analyze perturbations of slack matrices. It turns out that the high extension complexity of the matching polytope stems from the same source of hardness as for the correlation polytope: a direct sum structure.
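For a graph on vertex set $V$, the matching polytope is described by $x \ge 0$, the degree constraints $x(\delta(v)) \le 1$, and the odd-set inequalities $x(E(S)) \le (|S|-1)/2$ for odd $S \subseteq V$; keeping only odd sets of bounded size gives the kind of polynomial-size relaxation mentioned above. The brute-force Python check below (function name and edge encoding are illustrative assumptions) lists the odd sets of size at most `max_size` whose inequality a given point violates.

```python
from itertools import combinations


def violated_odd_sets(n, x, max_size, tol=1e-9):
    """Odd vertex sets S, 3 <= |S| <= max_size, with x(E(S)) > (|S| - 1) / 2.

    `x` maps frozenset({u, v}) to the value of edge uv; vertices are 0..n-1.
    """
    violated = []
    for size in range(3, max_size + 1, 2):
        for subset in combinations(range(n), size):
            s = set(subset)
            inside = sum(val for edge, val in x.items() if edge <= s)
            if inside > (size - 1) / 2 + tol:
                violated.append(subset)
    return violated


# x = 1/2 on the edges of a triangle meets every degree constraint with equality
# but violates the odd-set inequality for S = {0, 1, 2}.
triangle = {frozenset((0, 1)): 0.5, frozenset((1, 2)): 0.5, frozenset((0, 2)): 0.5}
print(violated_odd_sets(3, triangle, max_size=3))  # [(0, 1, 2)]
```

The enumeration is exponential in `max_size`, which mirrors the $O(\binom{n}{\rho/(\rho-1)})$ size of the relaxation built from small odd sets.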

preprint2014arXiv

A short proof for the polyhedrality of the Chvátal-Gomory closure of a compact convex set

Recently, Schrijver's open problem of whether the Chvátal--Gomory closure of an irrational polytope is polyhedral was answered independently in the affirmative by Dadush, Dey, and Vielma (even for arbitrary compact convex sets) as well as by Dunkel and Schulz. We present a very short, easily accessible proof that the Chvátal--Gomory closure of a compact convex set is a polytope.
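Recall that for a compact convex set $K$ and an integer vector $a$, the Chvátal-Gomory cut is $a^\top x \le \lfloor \max_{y \in K} a^\top y \rfloor$, and the CG closure is the intersection of all such cuts. The Python sketch below assumes $K$ is given as the convex hull of finitely many rational points (so the maximum is attained at one of them) and checks a point against the cuts for all integer $a$ in a bounded box. Restricting $a$ to a box yields only a relaxation of the closure, and the function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product


def cg_cut(points, a):
    """CG cut a.x <= floor(max over conv(points) of a.y) for an integer vector a."""
    rhs = max(sum(ai * pi for ai, pi in zip(a, p)) for p in points)
    return a, math.floor(rhs)


def satisfies_cg_cuts(points, x, bound):
    """True if x satisfies every CG cut with integer a in [-bound, bound]^n."""
    for a in product(range(-bound, bound + 1), repeat=len(x)):
        _, rhs = cg_cut(points, a)
        if sum(ai * xi for ai, xi in zip(a, x)) > rhs:
            return False
    return True


# K = conv{(0, 0), (1, 0), (1/2, 3/4)}: the cut for a = (0, 1) is y <= 0.
K = [(0, 0), (1, 0), (Fraction(1, 2), Fraction(3, 4))]
print(cg_cut(K, (0, 1)))                                           # ((0, 1), 0)
print(satisfies_cg_cuts(K, (Fraction(1, 2), Fraction(3, 4)), 2))  # False
print(satisfies_cg_cuts(K, (1, 0), 2))                             # True
```

Exact `Fraction` arithmetic avoids rounding errors in the floor; integer points of $K$ always pass, since $a^\top z$ is then an integer bounded by the right-hand side.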

preprint2014arXiv

Approximation Limits of Linear Programs (Beyond Hierarchies)

We develop a framework for approximation limits of polynomial-size linear programs from lower bounds on the nonnegative ranks of suitably defined matrices. This framework yields unconditional impossibility results that are applicable to any linear program as opposed to only programs generated by hierarchies. Using our framework, we prove that $O(n^{1/2-\varepsilon})$-approximations for CLIQUE require linear programs of size $2^{n^{\Omega(\varepsilon)}}$. (This lower bound applies to linear programs using a certain encoding of CLIQUE as a linear optimization problem.) Moreover, we establish a similar result for approximations of semidefinite programs by linear programs. Our main ingredient is a quantitative improvement of Razborov's rectangle corruption lemma for the high error regime, which gives strong lower bounds on the nonnegative rank of certain perturbations of the unique disjointness matrix.

preprint2013arXiv

On the existence of 0/1 polytopes with high semidefinite extension complexity

Rothvoß showed that there exists a 0/1 polytope (a polytope whose vertices are in $\{0,1\}^{n}$) such that any higher-dimensional polytope projecting to it must have $2^{\Omega(n)}$ facets, i.e., its linear extension complexity is exponential. The question of whether there exists a 0/1 polytope with high PSD extension complexity was left open. We answer this question in the affirmative by showing that there is a 0/1 polytope such that any spectrahedron projecting to it must be the intersection of a semidefinite cone of dimension $2^{\Omega(n)}$ and an affine space. Our proof relies on a new technique to rescale semidefinite factorizations.

preprint2012arXiv

An algebraic approach to symmetric extended formulations

Extended formulations are an important tool to obtain small (even compact) formulations of polytopes by representing them as projections of higher dimensional ones. It is an important question whether a polytope admits a small extended formulation, i.e., one involving only a polynomial number of inequalities in its dimension. For the case of symmetric extended formulations (i.e., preserving the symmetries of the polytope) Yannakakis established a powerful technique to derive lower bounds and rule out small formulations. We rephrase the technique of Yannakakis in a group-theoretic framework. This provides a different perspective on symmetric extensions and considerably simplifies several lower bound constructions.

preprint2011arXiv

Reconstructing biochemical cluster networks

Motivated by fundamental problems in chemistry and biology, we study cluster graphs arising from a set of initial states $S\subseteq\mathbb{Z}^n_+$ and a set of transitions/reactions $M\subseteq\mathbb{Z}^n_+\times\mathbb{Z}^n_+$. The clusters are formed out of states that can be mutually transformed into each other by a sequence of reversible transitions. We provide a solution method from computational commutative algebra that allows for deciding whether two given states belong to the same cluster as well as for the reconstruction of the full cluster graph. Using the cluster graph approach, we provide solutions to two fundamental questions: 1) deciding whether two states are connected, i.e., whether the initial state can be turned into the final state by a sequence of transitions, and 2) listing concisely all reaction processes that can accomplish that. As a computational example, we apply the framework to the permanganate/oxalic acid reaction.
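The paper's method is algebraic; as a simple illustration of what a cluster is, the Python sketch below instead explores a cluster by breadth-first search over states, treating each transition $(l, r) \in M$ as reversible. This brute-force approach (function names are hypothetical) only terminates when the cluster is finite, for example when every transition conserves some strictly positive weighting of the species, and it is not the computational commutative algebra method described above.

```python
from collections import deque


def fire(state, lhs, rhs):
    """Apply lhs -> rhs if every species count suffices, else return None."""
    if any(have < need for have, need in zip(state, lhs)):
        return None
    return tuple(have - need + gain for have, need, gain in zip(state, lhs, rhs))


def cluster(start, transitions, max_states=100_000):
    """All states reachable from `start` when each (lhs, rhs) may fire both ways."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for lhs, rhs in transitions:
            for a, b in ((lhs, rhs), (rhs, lhs)):  # reversible transition
                nxt = fire(state, a, b)
                if nxt is not None and nxt not in seen:
                    if len(seen) >= max_states:
                        raise RuntimeError("cluster exceeds max_states; it may be infinite")
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def same_cluster(s1, s2, transitions):
    """Decide whether two states lie in the same cluster (brute force)."""
    return s2 in cluster(s1, transitions)


# Species (A, B, C) with the single reversible transition 2A + B <-> C.
M = [((2, 1, 0), (0, 0, 1))]
print(sorted(cluster((4, 2, 0), M)))          # [(0, 0, 2), (2, 1, 1), (4, 2, 0)]
print(same_cluster((4, 2, 0), (3, 2, 0), M))  # False
```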

preprint2011arXiv

Rigid abelian groups and the probabilistic method

The construction of torsion-free abelian groups with prescribed endomorphism rings, starting with Corner's seminal work, is a well-studied subject in the theory of abelian groups. Usually these constructions work by adding elements from a (topological) completion in order to get rid of (kill) unwanted homomorphisms. The critical part is to actually prove that every unwanted homomorphism can be killed by adding a suitable element. We will demonstrate that some of those constructions can be significantly simplified by choosing the elements at random. As a result, the endomorphism ring will be almost surely prescribed, i.e., with probability one.

preprint2010arXiv

A polyhedral approach to computing border bases

Border bases can be considered the natural extension of Gröbner bases and have several advantages. Unfortunately, to date the classical border basis algorithm relies on (degree-compatible) term orderings and implicitly on reduced Gröbner bases. We adapt the classical border basis algorithm to allow for calculating border bases for arbitrary degree-compatible order ideals, which is \emph{independent} of term orderings. Moreover, the algorithm also supports calculating degree-compatible order ideals with \emph{preference} on contained elements, even though finding a preferred order ideal is NP-hard. Effectively, we retain degree-compatibility only to extend our computation successively, degree by degree. The adaptation is based on our polyhedral characterization: order ideals that support a border basis correspond one-to-one to integral points of the order ideal polytope. This establishes a crucial connection between the ideal and the combinatorial structure of the associated factor spaces.


44 published item(s)

Agentic MIP Research: Accelerated Constraint Handler Generation

Global Optimization for Combinatorial Geometry Problems Revisited in the Era of LLMs

What Do Evolutionary Coding Agents Evolve?

Accelerated Riemannian Optimization: Handling Constraints with a Prox to Bound Geometric Penalties

Low-rank tensor decompositions of quantum circuits

Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

Minimizing a low-dimensional convex function over a high-dimensional cube

Principled Deep Neural Network Training through Linear Programming

Sparser Kernel Herding with Pairwise Conditional Gradients without Swap Steps

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four

Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training

Dual Prices for Frank--Wolfe Algorithms

Local and Global Uniform Convexity Conditions

Projection-Free Adaptive Gradients for Large-Scale Optimization

An Online-Learning Approach to Inverse Optimization

Boosting Frank-Wolfe by Chasing Gradients

IPBoost -- Non-Convex Boosting via Integer Programming

On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness

Projection-Free Optimization on Uniformly Convex Sets

Restarting Algorithms: Sometimes there is Free Lunch

Affine reductions for LPs and SDPs

Aggregation-based cutting-planes for packing and covering integer programs

An efficient high-probability algorithm for Linear Bandits

Average case polyhedral complexity of the maximum stable set problem

Hierarchical Clustering via Spreading Metrics

The matching problem has no small symmetric SDP

Toward a Science of Autonomy for Physical Systems: Transportation

An information diffusion Fano inequality

Border bases and order ideals: a polyhedral characterization

Exponential Lower Bounds for Polytopes in Combinatorial Optimization

No Small Linear Program Approximates Vertex Cover within a Factor $2 - ε$

Sequential Information Guided Sensing

Sequential Sensing with Model Mismatch

Solving MIPs via Scaling-based Augmentation

Strict linear prices in non-convex European day-ahead electricity markets

The matching polytope does not admit fully-polynomial size relaxation schemes

A short proof for the polyhedrality of the Chvátal-Gomory closure of a compact convex set

Approximation Limits of Linear Programs (Beyond Hierarchies)

On the existence of 0/1 polytopes with high semidefinite extension complexity

An algebraic approach to symmetric extended formulations

Reconstructing biochemical cluster networks

Rigid abelian groups and the probabilistic method

A polyhedral approach to computing border bases