Source author record

Anant Sahai

Anant Sahai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, math.OC, Machine Learning, Computational Complexity, Systems and Control, Computer Science and Game Theory, Computer Vision, cond-mat.stat-mech, eess.SP

Catalog footprint

What is connected

17 works

10 topics

4 close collaborators

preprint, 2026, arXiv

The Thermodynamic Costs of Simple Linear Regression

The construction of models from data is a significant contributor to the energetic costs of computation. Because of this, understanding how foundational thermodynamic bounds apply to modeling algorithms will be increasingly important. Here, we study the thermodynamic costs of a basic and fundamental modeling algorithm: simple linear regression. Following Landauer, we approximate the thermodynamic lower bound on irreversibly performing both exact linear regression and linear regression via stochastic gradient descent as implemented on floating-point numbers. From this, we derive energy-cost-aware scaling laws for the optimal dataset size for training a linear regression model given a generalization-error-dependent demand for inference. Additionally, we discuss a method to lower bound the entropy production from the mismatch cost for algorithms with continuous input variables.
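For orientation, the textbook Landauer limit that this abstract builds on can be sketched in a few lines. This is not the paper's derivation: it only evaluates the standard bound of k_B T ln 2 joules per irreversibly erased bit. The function name, the default temperature of 300 K, and the choice of a 64-bit value in the usage line are illustrative assumptions, not figures from the abstract.

```python
import math

# Boltzmann constant in joules per kelvin (exact value in the 2019 SI).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_bound_joules(bits_erased: float, temperature_k: float = 300.0) -> float:
    """Minimum heat dissipated when erasing `bits_erased` bits at `temperature_k`.

    Hypothetical helper: applies the textbook bound k_B * T * ln(2) per bit.
    """
    if bits_erased < 0:
        raise ValueError("bits_erased must be non-negative")
    if temperature_k <= 0:
        raise ValueError("temperature_k must be positive")
    return bits_erased * BOLTZMANN_J_PER_K * temperature_k * math.log(2)


# Assumed example: overwriting one 64-bit floating-point value at room temperature.
energy_per_float = landauer_bound_joules(64)
```

At 300 K the bound is about 2.87e-21 J per bit. How many bits a regression step actually erases is the question the paper analyzes and is not modeled here.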

preprint, 2022, arXiv

Generalization for multiclass classification with overparameterized linear models

Via an overparameterized linear model with Gaussian features, we provide conditions for good generalization for multiclass classification of minimum-norm interpolating solutions in an asymptotic setting where both the number of underlying features and the number of classes scale with the number of training points. The survival/contamination analysis framework for understanding the behavior of overparameterized learning problems is adapted to this setting, revealing that multiclass classification qualitatively behaves like binary classification in that, as long as there are not too many classes (made precise in the paper), it is possible to generalize well even in some settings where the corresponding regression tasks would not generalize. Besides various technical challenges, it turns out that the key difference from the binary classification setting is that there are relatively fewer positive training examples of each class in the multiclass setting as the number of classes increases, making the multiclass problem "harder" than the binary one.

preprint, 2022, arXiv

On the Impossibility of Convergence of Mixed Strategies with No Regret Learning

We study the limiting behavior of the mixed strategies that result from optimal no-regret learning strategies in a repeated game setting where the stage game is any 2 by 2 competitive game. We consider optimal no-regret algorithms that are mean-based and monotonic in their argument. We show that for any such algorithm, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium. This negative result is also shown to hold under a broad relaxation of these assumptions, including popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes. Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture. Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence in outcomes between using the opponent's mixtures and realizations to make updates.

preprint, 2020, arXiv

Blind interactive learning of modulation schemes: Multi-agent cooperation without co-design

We examine the problem of learning to cooperate in the context of wireless communication. In our setting, two agents must learn modulation schemes that enable them to communicate across a power-constrained additive white Gaussian noise channel. We investigate whether learning is possible under different levels of information sharing between distributed agents which are not necessarily co-designed. We employ the "Echo" protocol, a "blind" interactive learning protocol where an agent hears, understands, and repeats (echoes) back the message received from another agent, simultaneously training itself to communicate. To capture the idea of cooperation between "not necessarily co-designed" agents we use two different populations of function approximators - neural networks and polynomials. We also include interactions between learning agents and non-learning agents with fixed modulation protocols such as QPSK and 16QAM. We verify the universality of the Echo learning approach, showing it succeeds independent of the inner workings of the agents. In addition to matching the communication expectations of others, we show that two learning agents can collaboratively invent a successful communication approach from independent random initializations. We complement our simulations with an implementation of the Echo protocol in software-defined radios. To explore the continuum of co-design, we study how learning is impacted by different levels of information sharing between agents, including sharing training symbols, losses, and full gradients. We find that co-design (increased information sharing) accelerates learning. Learning higher order modulation schemes is a more difficult task, and the beneficial effect of co-design becomes more pronounced as the task becomes harder.

preprint, 2013, arXiv

A universal, operational theory of unicast multi-user communication with fidelity criteria

This is a three part paper. Optimality of source-channel separation for communication with a fidelity criterion when the channel is compound as defined by Csiszar and Korner in their book and general as defined by Verdu and Han, is proved in Part I. It is assumed that random codes are permitted. The word "universal" in the title of this paper refers to the fact that the channel model is compound. The proof uses a layered black-box or a layered input-output view-point. In particular, only the end-to-end description of the channel as being capable of communicating a source to within a certain distortion level is used when proving separation. This implies that the channel model does not play any role for separation to hold as long as there is a source model. Further implications of the layered black-box view-point are discussed. Optimality of source-medium separation for multi-user communication with fidelity criteria over a general, compound medium in the unicast setting is proved in Part II, thus generalizing Part I to the unicast, multi-user setting. Part III gets to an understanding of the question, "Why is a channel which is capable of communicating a source to within a certain distortion level, also capable of communicating bits at any rate less than the infimum of the rates needed to code the source to within the distortion level": this lies at the heart of why optimality of separation for communication with a fidelity criterion holds. The perspective taken to get to this understanding is a randomized covering-packing perspective, and the proof is operational.

preprint, 2013, arXiv

An approximate solution to the decentralized two-controller infinite-horizon scalar LQG problem: Part I- fast dynamics

We consider scalar decentralized average-cost infinite-horizon LQG problems with two controllers, focusing on the fast dynamics case when the (scalar) eigenvalue of the system is large. It is shown that the best linear controllers' performance can be an arbitrary factor worse than the optimal performance. We propose a set of finite-dimensional nonlinear controllers, and prove that the proposed set contains an easy-to-find approximately optimal solution that achieves within a constant ratio of the optimal quadratic cost. The insight for nonlinear strategies comes from revealing the relationship between information flow in control and wireless information flow. More precisely, we discuss a close relationship between the high-SNR limit in wireless communication and fast-dynamics case in decentralized control, and justify how the proposed nonlinear control strategy can be understood as exploiting the generalized degree-of-freedom gain in wireless communication theory. For a rigorous justification of this argument, we develop new mathematical tools and ideas. To reveal the relationship between infinite-horizon problems and generalized MIMO Witsenhausen's counterexamples, we introduce the idea of geometric slicing. To analyze the nonlinear strategy performance, we introduce an approximate-comb-lattice model for the relevant random variables.

preprint, 2013, arXiv

An approximate solution to the decentralized two-controller infinite-horizon scalar LQG problem: Part II- slow dynamics

Continuing the first part of the paper, we consider scalar decentralized average-cost infinite-horizon LQG problems with two controllers. This paper focuses on the slow dynamics case, when the eigenvalue of the system is small, and proves that the single-controller optimal strategies (linear strategies) are constant-ratio optimal among all distributed control strategies.

preprint, 2013, arXiv

Information embedding and the triple role of control

We consider the problem of information embedding where the encoder modifies a white Gaussian host signal in a power-constrained manner to encode a message, and the decoder recovers both the embedded message and the modified host signal. This partially extends the recent work of Sumszyk and Steinberg to the continuous-alphabet Gaussian setting. Through a control-theoretic lens, we observe that the problem is a minimalist example of what is called the "triple role" of control actions. We show that a dirty-paper-coding strategy achieves the optimal rate for perfect recovery of the modified host and the message for any message rate. For imperfect recovery of the modified host, by deriving bounds on the minimum mean-square error (MMSE) in recovering the modified host signal, we show that DPC-based strategies are guaranteed to attain within a uniform constant factor of 16 of the optimal weighted sum of power required in host signal modification and the MMSE in the modified host signal reconstruction for all weights and all message rates. When specialized to the zero-rate case, our results provide the tightest known lower bounds on the asymptotic costs for the vector version of a famous open problem in decentralized control: the Witsenhausen counterexample. Numerically, this tighter bound helps us characterize the asymptotically optimal costs for the vector Witsenhausen problem to within a factor of 1.3 for all problem parameters, improving on the earlier best known bound of 2.

preprint, 2013, arXiv

Intermittent Kalman Filtering: Eigenvalue Cycles and Nonuniform Sampling

We consider Kalman filtering problems when the observations are intermittently erased or lost. It was known that the estimates are mean-square unstable when the erasure probability is larger than a certain critical value, and stable otherwise. But the characterization of the critical erasure probability has been open for years. We introduce a new concept of eigenvalue cycles which captures periodicity of systems, and characterize the critical erasure probability based on this. It is also proved that eigenvalue cycles can be easily broken if the original physical system is considered to be continuous-time: randomly dithered nonuniform sampling of the observations makes the critical erasure probability almost surely $\frac{1}{|\lambda_{\max}|^2}$.
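The closed-form threshold quoted at the end of this abstract can be evaluated directly. The sketch below is only an illustration of that formula: it assumes the caller supplies the system eigenvalues, and it applies only under the randomly dithered sampling condition stated in the abstract (with eigenvalue cycles present, the paper's characterization differs and is not reproduced here). The function name and the capping at 1 are illustrative choices.

```python
from typing import Iterable, Union

Number = Union[int, float, complex]


def critical_erasure_probability(eigenvalues: Iterable[Number]) -> float:
    """Evaluate 1 / |lambda_max|^2 for the given system eigenvalues.

    Hypothetical helper. Erasure probabilities above the returned value are
    the ones the abstract describes as mean-square unstable.
    """
    magnitudes = [abs(complex(ev)) for ev in eigenvalues]
    if not magnitudes:
        raise ValueError("at least one eigenvalue is required")
    lam_max = max(magnitudes)
    if lam_max <= 1.0:
        # The formula would give a value of at least 1, so cap it:
        # every erasure probability below 1 falls under the threshold.
        return 1.0
    return 1.0 / lam_max ** 2


# Assumed example: largest eigenvalue magnitude is 2, so the threshold is 1/4.
threshold = critical_erasure_probability([2.0, 0.5])
```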

preprint, 2013, arXiv

Network Coding meets Decentralized Control: Network Linearization and Capacity-Stabilizablilty Equivalence

We take a unified view of network coding and decentralized control. Precisely speaking, we consider both as linear time-invariant systems by appropriately restricting channels and coding schemes of network coding to be linear time-invariant, and the plant and controllers of decentralized control to be linear time-invariant as well. First, we apply linear system theory to network coding. This gives a novel way of converting an arbitrary relay network to an equivalent acyclic single-hop relay network, which we call Network Linearization. Based on network linearization, we prove that the fundamental design limit, mincut, is achievable by a linear time-invariant network-coding scheme regardless of the network topology. Then, we use the network-coding to view decentralized linear systems. We argue that linear time-invariant controllers in a decentralized linear system "communicate" via linear network coding to stabilize the plant. To justify this argument, we give an algorithm to "externalize" the implicit communication between the controllers that we believe must be occurring to stabilize the plant. Based on this, we show that the stabilizability condition for decentralized linear systems comes from an underlying communication limit, which can be described by the algebraic mincut-maxflow theorem. With this re-interpretation in hand, we also consider stabilizability over LTI networks to emphasize the connection with network coding. In particular, in broadcast and unicast problems, unintended messages at the receivers will be modeled as secrecy constraints.

preprint, 2011, arXiv

Towards a communication-theoretic understanding of system-level power consumption

Traditional communication theory focuses on minimizing transmit power. However, communication links are increasingly operating at shorter ranges where transmit power can be significantly smaller than the power consumed in decoding. This paper models the required decoding power and investigates the minimization of total system power from two complementary perspectives. First, an isolated point-to-point link is considered. Using new lower bounds on the complexity of message-passing decoding, lower bounds are derived on decoding power. These bounds show that 1) there is a fundamental tradeoff between transmit and decoding power; 2) unlike the implications of the traditional "waterfall" curve which focuses on transmit power, the total power must diverge to infinity as error probability goes to zero; 3) Regular LDPCs, and not their known capacity-achieving irregular counterparts, can be shown to be power order optimal in some cases; and 4) the optimizing transmit power is bounded away from the Shannon limit. Second, we consider a collection of links. When systems both generate and face interference, coding allows a system to support a higher density of transmitter-receiver pairs (assuming interference is treated as noise). However, at low densities, uncoded transmission may be more power-efficient in some cases.

preprint, 2010, arXiv

Implicit and explicit communication in decentralized control

There has been substantial progress recently in understanding toy problems of purely implicit signaling. These are problems where the source and the channel are implicit -- the message is generated endogenously by the system, and the plant itself is used as a channel. In this paper, we explore how implicit and explicit communication can be used synergistically to reduce control costs. The setting is an extension of Witsenhausen's counterexample where a rate-limited external channel connects the two controllers. Using a semi-deterministic version of the problem, we arrive at a binning-based strategy that can outperform the best known strategies by an arbitrarily large factor. We also show that our binning-based strategy attains within a constant factor of the optimal cost for an asymptotically infinite-length version of the problem uniformly over all problem parameters and all rates on the external channel. For the scalar case, although our results yield approximate optimality for each fixed rate, we are unable to prove approximate optimality uniformly over all rates.

preprint, 2010, arXiv

Information embedding meets distributed control

We consider the problem of information embedding where the encoder modifies a white Gaussian host signal in a power-constrained manner to encode the message, and the decoder recovers both the embedded message and the modified host signal. This extends the recent work of Sumszyk and Steinberg to the continuous-alphabet Gaussian setting. We show that a dirty-paper-coding based strategy achieves the optimal rate for perfect recovery of the modified host and the message. We also provide bounds for the extension wherein the modified host signal is recovered only to within a specified distortion. When specialized to the zero-rate case, our results provide the tightest known lower bounds on the asymptotic costs for the vector version of a famous open problem in distributed control -- the Witsenhausen counterexample. Using this bound, we characterize the asymptotically optimal costs for the vector Witsenhausen problem numerically to within a factor of 1.3 for all problem parameters, improving on the earlier best known bound of 2.

preprint, 2010, arXiv

Is Witsenhausen's counterexample a relevant toy?

This paper answers a question raised by Doyle on the relevance of the Witsenhausen counterexample as a toy decentralized control problem. The question has two sides, the first of which focuses on the lack of an external channel in the counterexample. Using existing results, we argue that the core difficulty in the counterexample is retained even in the presence of such a channel. The second side questions the LQG formulation of the counterexample. We consider alternative formulations and show that the understanding developed for the LQG case guides the investigation for these other cases as well. Specifically, we consider 1) a variation on the original counterexample with general, but bounded, noise distributions, and 2) an adversarial extension with bounded disturbance and quadratic costs. For each of these formulations, we show that quantization-based nonlinear strategies outperform linear strategies by an arbitrarily large factor. Further, these nonlinear strategies also perform within a constant factor of the optimal, uniformly over all possible parameter choices (for fixed noise distributions in the Bayesian case). Fortuitously, the assumption of bounded noise results in a significant simplification of proofs as compared to those for the LQG formulation. Therefore, the results in this paper are also of pedagogical interest.

preprint, 2010, arXiv

The finite-dimensional Witsenhausen counterexample

Recently, a vector version of Witsenhausen's counterexample was considered and it was shown that in the limit of infinite vector length, certain quantization-based control strategies are provably within a constant factor of the optimal cost for all possible problem parameters. In this paper, finite vector lengths are considered with the dimension being viewed as an additional problem parameter. By applying a large-deviation "sphere-packing" philosophy, a lower bound to the optimal cost for the finite dimensional case is derived that uses appropriate shadows of the infinite-length bound. Using the new lower bound, we show that good lattice-based control strategies achieve within a constant factor of the optimal cost uniformly over all possible problem parameters, including the vector length. For Witsenhausen's original problem -- the scalar case -- the gap between regular lattice-based strategies and the lower bound is numerically never more than a factor of 8.
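To make the scalar case concrete, a rough sketch can compare a regular lattice strategy against the best linear strategy. It assumes the standard scalar setup (x0 ~ N(0, sigma0^2), x1 = x0 + u1, y = x1 + unit-variance Gaussian noise, cost k^2 u1^2 + (x1 - u2)^2). The parameters k = 0.2 and sigma0 = 5, the lattice spacing, and all function names are illustrative assumptions. The second controller here decodes to the nearest lattice point, which is a simplification, and the sketch does not reproduce the paper's lower bound.

```python
import random


def best_linear_cost(k: float, sigma0: float, grid: int = 2001) -> float:
    """Grid search over linear first-stage maps x1 = lam * x0.

    The second controller uses the linear MMSE estimate of x1 from y,
    so both stage costs have closed forms.
    """
    best = float("inf")
    for i in range(grid):
        lam = 2.0 * i / (grid - 1)                  # lam ranges over [0, 2]
        var_x1 = (lam * sigma0) ** 2
        stage1 = (k * (lam - 1.0) * sigma0) ** 2    # k^2 * E[u1^2]
        stage2 = var_x1 / (var_x1 + 1.0)            # MMSE with unit noise variance
        best = min(best, stage1 + stage2)
    return best


def lattice_cost(k: float, sigma0: float, spacing: float,
                 samples: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the cost of a regular lattice strategy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x0 = rng.gauss(0.0, sigma0)
        x1 = spacing * round(x0 / spacing)   # first controller snaps the state to the lattice
        y = x1 + rng.gauss(0.0, 1.0)         # noisy observation of x1
        u2 = spacing * round(y / spacing)    # second controller picks the nearest lattice point
        total += k * k * (x1 - x0) ** 2 + (x1 - u2) ** 2
    return total / samples


linear = best_linear_cost(k=0.2, sigma0=5.0)
lattice = lattice_cost(k=0.2, sigma0=5.0, spacing=6.5)
```

With these assumed parameters the lattice estimate comes out well below the linear cost, which is the qualitative effect the abstract describes. The size of the gap depends on the parameters and the spacing chosen.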

preprint, 2009, arXiv

Zero-rate feedback can achieve the empirical capacity

The utility of limited feedback for coding over an individual sequence of DMCs is investigated. This study complements recent results showing how limited or noisy feedback can boost the reliability of communication. A strategy with fixed input distribution $P$ is given that asymptotically achieves rates arbitrarily close to the mutual information induced by $P$ and the state-averaged channel. When the capacity achieving input distribution is the same over all channel states, this achieves rates at least as large as the capacity of the state averaged channel, sometimes called the empirical capacity.

preprint, 2007, arXiv

The source coding game with a cheating switcher

Motivated by the lossy compression of an active-vision video stream, we consider the problem of finding the rate-distortion function of an arbitrarily varying source (AVS) composed of a finite number of subsources with known distributions. Berger's paper "The Source Coding Game" (IEEE Trans. Inform. Theory, 1971) solves this problem under the condition that the adversary is allowed only strictly causal access to the subsource realizations. We consider the case when the adversary has access to the subsource realizations non-causally. Using the type-covering lemma, this new rate-distortion function is determined to be the maximum of the IID rate-distortion function over a set of source distributions attainable by the adversary. We then extend the results to allow for partial or noisy observations of subsource realizations. We further explore the model by attempting to find the rate-distortion function when the adversary is actually helpful. Finally, a bound is developed on the uniform continuity of the IID rate-distortion function for finite-alphabet sources. The bound is used to give a sufficient number of distributions that need to be sampled to compute the rate-distortion function of an AVS to within a certain accuracy. The bound is also used to give a rate of convergence for the estimate of the rate-distortion function for an unknown IID finite-alphabet source.
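The "maximum of the IID rate-distortion function over a set of source distributions" can be illustrated for the simplest case. The sketch below assumes binary sources under Hamming distortion, where the IID rate-distortion function is the standard R(D) = h(p) - h(D) for D < min(p, 1 - p) and zero otherwise. The set of distributions attainable by the adversary is taken as a caller-supplied list of sampled Bernoulli parameters, which is an assumption: the paper's characterization of that set is not reproduced, and the function names are hypothetical.

```python
import math
from typing import Iterable


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bernoulli_rate_distortion(p: float, distortion: float) -> float:
    """R(D) for an IID Bernoulli(p) source under Hamming distortion."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if distortion < 0.0:
        raise ValueError("distortion must be non-negative")
    if distortion >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(distortion)


def worst_case_rate(candidate_ps: Iterable[float], distortion: float) -> float:
    """Maximum of R(D) over a sampled set of candidate source distributions."""
    return max(bernoulli_rate_distortion(p, distortion) for p in candidate_ps)


# Assumed example: three sampled Bernoulli parameters, target distortion 0.1.
rate = worst_case_rate([0.1, 0.3, 0.5], distortion=0.1)
```

Sampling a finite list of candidates mirrors the abstract's remark that a sufficient number of sampled distributions gives the AVS rate-distortion function to within a certain accuracy. How many samples suffice is determined by the paper's continuity bound, not by this sketch.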
