Source author record

Abbas El Gamal

Abbas El Gamal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Information Theory, math.IT, Machine Learning, math.ST, Computer Science and Game Theory, Cryptography and Security, eess.SP, math.OC, Networking and Internet Architecture, Statistics Theory, Systems and Control

Catalog footprint: 24 works, 11 topics, 4 close collaborators


preprint · 2026 · arXiv

Information Theory and Statistical Learning

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. The table of contents of the new edition (EIT-3 ToC) is available at https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing. For feedback, please contact abbas@ee.stanford.edu. Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), $f$-divergences, and the Fisher divergence. In particular, its treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
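Since the chapter is organized around divergence measures, a small numerical illustration may help. The sketch below is not taken from the chapter. It evaluates an $f$-divergence $D_f(P\|Q) = \sum_x q(x)\, f(p(x)/q(x))$ between two discrete distributions and shows that relative entropy, total variation, and the chi-squared divergence are the instances $f(t) = t\log t$, $f(t) = |t-1|/2$, and $f(t) = (t-1)^2$. It assumes natural logarithms (nats) and $q(x) > 0$ for every $x$. All function names are hypothetical.

```python
import math


def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x) / q(x)) for discrete P, Q.

    Assumes q(x) > 0 for every outcome (absolute continuity of P w.r.t. Q)."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def kl_generator(t):
    # f(t) = t log t yields relative entropy D(P||Q), with 0 log 0 = 0
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t):
    # f(t) = |t - 1| / 2 yields total variation distance
    return abs(t - 1) / 2


def chi2_generator(t):
    # f(t) = (t - 1)^2 yields the Pearson chi-squared divergence
    return (t - 1) ** 2


def kl_direct(p, q):
    # Relative entropy computed from its usual definition, for cross-checking
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.25, 0.75]
    for name, gen in [("KL", kl_generator), ("TV", tv_generator), ("chi2", chi2_generator)]:
        print(name, f_divergence(p, q, gen))
```

Writing every divergence through one generic function makes the shared structure explicit. The same pattern extends to other convex generators with $f(1) = 0$.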

preprint · 2026 · arXiv

Information-theoretic Limits of Learning and Estimation

Information theory plays a central role in establishing fundamental limits on what any learning or estimation algorithm can and cannot achieve, regardless of computational power. In this chapter, we provide an introduction to these connections. End-of-chapter exercises make the material suitable for both classroom use and self-study. We begin by introducing concentration inequalities along with the notions of covering and packing in metric spaces, and the associated concept of metric entropy. These tools are essential for our analysis. We then introduce the learning-theoretic framework and derive upper bounds on generalization error in terms of metric entropy, Rademacher complexity, and the VC dimension, as well as mutual information and relative entropy. Finally, we discuss the minimax estimation framework and establish lower bounds on minimax risk using Fano's inequality, yielding bounds in terms of relative entropy and covering and packing numbers. This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. It would follow the chapter posted at arXiv:2605.02989. The table of contents of the new edition is available at https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing. For feedback, please contact abbas@ee.stanford.edu.
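Fano's inequality is the step that converts an information bound into a lower bound on minimax risk. The sketch below assumes the common textbook form for $M$ equiprobable hypotheses, $P_e \ge 1 - (I(V;X) + \log 2)/\log M$. It also assumes the standard reduction in which a $2\delta$-separated set of $M$ parameters gives minimax risk at least $\delta P_e$ when the loss equals the distance. Bounding $I(V;X^n)$ by $n$ times the largest pairwise relative entropy is one common choice. The chapter's exact statement and constants may differ, and the function names are hypothetical.

```python
import math


def fano_error_lower_bound(mutual_info_nats, num_hypotheses):
    """Fano lower bound on the error probability of any test among M equiprobable
    hypotheses: P_e >= 1 - (I(V; X) + log 2) / log M, natural logarithms."""
    if num_hypotheses < 2:
        raise ValueError("need at least two hypotheses")
    bound = 1.0 - (mutual_info_nats + math.log(2)) / math.log(num_hypotheses)
    return max(0.0, bound)  # a probability bound below zero is vacuous


def fano_minimax_risk_lower_bound(delta, max_pairwise_kl, n, num_hypotheses):
    """Minimax risk lower bound via Fano's method (loss = distance, assumed).

    Assumes the M parameters are pairwise 2*delta-separated and that, with n
    i.i.d. samples, I(V; X^n) <= n * max_{j,k} D(P_j || P_k)."""
    mutual_info_bound = n * max_pairwise_kl
    return delta * fano_error_lower_bound(mutual_info_bound, num_hypotheses)


if __name__ == "__main__":
    # More hypotheses that remain hard to distinguish give a larger error bound.
    for m in (4, 64, 1024):
        print(m, fano_error_lower_bound(1.0, m))
```

In typical applications $\delta$ is then tuned against $n$ so that the mutual-information term stays below a fixed fraction of $\log M$.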

preprint · 2022 · arXiv

A Strengthened Cutset Upper Bound on the Capacity of the Relay Channel and Applications

We develop a new upper bound on the capacity of the relay channel that is tighter than previously known upper bounds. This upper bound is proved using traditional weak converse techniques involving mutual information inequalities and Gallager-type explicit identification of auxiliary random variables. We show that the new upper bound is strictly tighter than all previous bounds for the Gaussian relay channel with non-zero channel gains. When specialized to the relay channel with orthogonal receiver components, the bound resolves a conjecture by Kim on a class of deterministic relay channels. When further specialized to the class of product-form relay channels with orthogonal receiver components, the bound resolves a generalized version of Cover's relay channel problem, recovers the recent upper bound for the Gaussian case by Wu et al., and improves upon the recent bounds for the binary symmetric case by Wu et al. and Barnes et al., which were obtained using non-traditional geometric proof techniques. For the relay channel with orthogonal receiver components, we develop another upper bound on the capacity that utilizes an auxiliary receiver and show that it is strictly tighter than the bound by Tandon and Ulukus. Finally, we show through the Gaussian relay channel with an i.i.d. relay output sequence that the bound with the auxiliary receiver can be strictly tighter than our main bound.
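For reference, the classical cutset bound, which earlier work improved upon, can be evaluated numerically for the Gaussian relay channel. The sketch assumes the standard model $Y_2 = g_{21}X_1 + Z_2$, $Y_3 = g_{31}X_1 + g_{32}X_2 + Z_3$ with unit-variance noise, power $P$ at each sender, and received SNRs $S_{jk} = g_{jk}^2 P$. Under this model the cutset bound is $\max_{0\le\rho\le1}\min\{C((1-\rho^2)(S_{21}+S_{31})),\ C(S_{31}+S_{32}+2\rho\sqrt{S_{31}S_{32}})\}$ with $C(x) = \tfrac12\log_2(1+x)$. It does not implement the paper's strengthened bound. The grid search over $\rho$ is an implementation choice, and the function names are hypothetical.

```python
import math


def C(x):
    # Gaussian capacity function C(x) = (1/2) log2(1 + x), bits per transmission
    return 0.5 * math.log2(1 + x)


def cutset_bound(s21, s31, s32, grid=10001):
    """Classical cutset upper bound on Gaussian relay channel capacity.

    s21: source-to-relay SNR, s31: source-to-destination SNR,
    s32: relay-to-destination SNR. Maximizes over the source-relay
    correlation rho in [0, 1] by a uniform grid search."""
    best = 0.0
    for k in range(grid):
        rho = k / (grid - 1)
        # Broadcast cut: I(X1; Y2, Y3 | X2)
        broadcast_cut = C((1 - rho**2) * (s21 + s31))
        # Multiple-access cut: I(X1, X2; Y3)
        mac_cut = C(s31 + s32 + 2 * rho * math.sqrt(s31 * s32))
        best = max(best, min(broadcast_cut, mac_cut))
    return best


if __name__ == "__main__":
    # Compare with the direct-link rate C(S31) that needs no relay at all.
    s21, s31, s32 = 100.0, 1.0, 10.0
    print("direct link:", C(s31), "cutset bound:", cutset_bound(s21, s31, s32))
```

Because the minimum of a decreasing and an increasing function of $\rho$ peaks where the two cuts cross, a bisection on $\rho$ would also work. The grid search keeps the sketch short.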

preprint · 2020 · arXiv

Network Information Theoretic Security

Shannon showed that to achieve perfect secrecy in point-to-point communication, the message rate cannot exceed the shared secret key rate, giving rise to the simple one-time pad encryption scheme. In this paper, we extend this work from point-to-point communication to networks. We consider a connected network with pairwise communication between the nodes. We assume that each node is provided with a certain amount of secret bits before communication commences. An eavesdropper with unlimited computing power has access to all communication and can hack a subset of the nodes that is unknown to the other nodes. We investigate the limits on information-theoretically secure communication for this network. We establish a tradeoff between the secure channel rate (for a node pair) and the secure network rate (sum over all node pair rates) and show that perfect secrecy can be achieved if and only if the sum rate of any subset of unhacked channels does not exceed the shared unhacked-secret-bit rate of these channels. We also propose two practical and efficient schemes that achieve a good balance of network and channel rates with a perfect secrecy guarantee. This work has a wide range of potential applications in which perfect secrecy is desired, such as cyber-physical systems, distributed-control systems, and ad-hoc networks.
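Shannon's point-to-point result is easiest to see in the one-time pad itself. Each message bit consumes one fresh key bit, so the message rate cannot exceed the key rate. Below is a minimal Python sketch of that point-to-point scheme, not of the paper's network schemes. It assumes the key is uniformly random, at least as long as the message, and never reused. Here `secrets.token_bytes` stands in for the pre-shared secret bits, and the function names are hypothetical.

```python
import secrets


def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """One-time pad: XOR each message byte with a key byte.

    Perfect secrecy requires a uniformly random key, at least as long as
    the message, used only once."""
    if len(key) < len(message):
        raise ValueError("key must be at least as long as the message")
    # zip stops at the shorter input, so any extra key bytes are ignored
    return bytes(m ^ k for m, k in zip(message, key))


# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt


if __name__ == "__main__":
    msg = b"attack at dawn"
    key = secrets.token_bytes(len(msg))  # one key byte per message byte
    ciphertext = otp_encrypt(msg, key)
    print(otp_decrypt(ciphertext, key))
```

The length check is where the rate constraint appears. Sending more message bits than there are shared key bits would force key reuse, which breaks perfect secrecy.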

preprint · 2018 · arXiv

Minimax Learning for Remote Prediction

The classical problem of supervised learning is to infer an accurate predictor of a target variable $Y$ from a measured variable $X$ by using a finite number of labeled training samples. Motivated by the increasingly distributed nature of data and decision making, in this paper we consider a variation of this classical problem in which the prediction is performed remotely based on a rate-constrained description $M$ of $X$. Upon receiving $M$, the remote node computes an estimate $\hat Y$ of $Y$. We follow the recent minimax approach to study this learning problem and show that it corresponds to a one-shot minimax noisy source coding problem. We then establish information-theoretic bounds on the risk-rate Lagrangian cost and a general method to design a near-optimal descriptor-estimator pair, which can be viewed as a rate-constrained analog to the maximum conditional entropy principle used in the classical minimax learning problem. Our results show that a naive estimate-compress scheme for rate-constrained prediction is not optimal in general.

preprint · 2016 · arXiv

An Efficient Feedback Coding Scheme with Low Error Probability for Discrete Memoryless Channels

Existing fixed-length feedback communication schemes are either specialized to particular channels (Schalkwijk-Kailath, Horstein) or apply to general channels but either have high coding complexity (block feedback schemes) or are difficult to analyze (posterior matching). This paper introduces a new fixed-length feedback coding scheme that achieves the capacity of every discrete memoryless channel, has an error exponent that approaches the sphere packing bound as the rate approaches capacity, and has $O(n\log n)$ coding complexity. These benefits are achieved by judiciously combining features of previous schemes with a new randomization technique and a new encoding/decoding rule. These new features also make the error probability of the new scheme easier to analyze than that of posterior matching.

preprint · 2015 · arXiv

Capacity Approximations for Gaussian Relay Networks

Consider a Gaussian relay network where a source node communicates to a destination node with the help of several layers of relays. Recent work has shown that compress-and-forward based strategies can achieve the capacity of this network within an additive gap. Here, the relays quantize their received signals at the noise level and map them to random Gaussian codebooks. The resulting gap to capacity is independent of the channel SNRs and the network topology but is linear in the total number of nodes. In this paper, we provide an improved lower bound on the rate achieved by compress-and-forward based strategies (noisy network coding in particular) in arbitrary Gaussian relay networks, whose gap to capacity depends on the network not only through the total number of nodes but also through the degrees of freedom of the min cut of the network. We illustrate that for many networks, this refined lower bound can lead to a better approximation of the capacity. In particular, we demonstrate that it leads to a logarithmic rather than linear capacity gap in the total number of nodes for certain classes of layered networks. The improvement comes from quantizing the received signals of the relays at a resolution that decreases with the total number of nodes in the network. This suggests that the rule of thumb in the literature of quantizing the received signals at the noise level can be highly suboptimal.

preprint · 2015 · arXiv

Capacity Theorems for Broadcast Channels with Two Channel State Components Known at the Receivers

We establish the capacity region of several classes of broadcast channels with random state in which the channel to each user is selected from two possible channel state components and the state is known only at the receivers. When the channel components are deterministic, we show that the capacity region is achieved via Marton coding. This channel model does not belong to any class of broadcast channels for which the capacity region was previously known and is useful in studying wireless communication channels when the fading state is known only at the receivers. We then establish the capacity region when the channel components are ordered, e.g., degraded. In particular, we show that the capacity region for the broadcast channel with degraded Gaussian vector channel components is attained via a Gaussian input distribution. Finally, we extend the results on ordered channels to two broadcast channel examples with more than two channel components, but show that these extensions do not hold in general.

preprint · 2015 · arXiv

Superposition Coding is Almost Always Optimal for the Poisson Broadcast Channel

This paper shows that the capacity region of the continuous-time Poisson broadcast channel is achieved via superposition coding for most channel parameter values. Interestingly, for some subset of these parameter values the channel does not belong to any of the existing classes of broadcast channels for which superposition coding is optimal (e.g., degraded, less noisy, more capable). In particular, we introduce the notion of an effectively less noisy broadcast channel and show that it implies less noisy but is not in general implied by more capable. For the remaining channel parameter values, we show that there is a gap between Marton's inner bound and the UV outer bound.

preprint · 2014 · arXiv

A Note on Broadcast Channels with Stale State Information at the Transmitter

This paper shows that the Maddah-Ali--Tse (MAT) scheme which establishes the symmetric capacity of two example broadcast channels with strictly causal state information at the transmitter is a simple special case of the Shayevitz--Wigger scheme for the broadcast channel with generalized feedback, which involves block Markov coding, compression, superposition coding, Marton coding, and coded time sharing. Focusing on the class of symmetric broadcast channels with state, we derive an expression for the maximum achievable symmetric rate using the Shayevitz--Wigger scheme. We show that the MAT results can be recovered by evaluating this expression for the special case in which superposition coding and Marton coding are not used. We then introduce a new broadcast channel example that shares many features of the MAT examples. We show that another special case of our maximum symmetric rate expression in which superposition coding is also used attains a higher symmetric rate than the MAT scheme. The symmetric capacity of this example is not known, however.

preprint · 2014 · arXiv

Capacity Region of the Broadcast Channel with Two Deterministic Channel State Components

This paper establishes the capacity region of a class of broadcast channels with random state in which each channel component is selected from two possible functions and each receiver knows its state sequence. This channel model does not fit into any class of broadcast channels for which the capacity region was previously known and is useful in studying wireless communication channels when the fading state is known only at the receivers. The capacity region is shown to coincide with the UV outer bound and is achieved via Marton coding.

preprint · 2014 · arXiv

Compensating Demand Response Participants Via Their Shapley Values

Designing fair compensation mechanisms for demand response (DR) is challenging. This paper models the problem in a game theoretic setting and designs a payment distribution mechanism based on the Shapley Value. As exact computation of the Shapley Value is in general intractable, we propose estimating it using a reinforcement learning algorithm that approximates optimal stratified sampling. We apply this algorithm to two DR programs that utilize the Shapley Value for payments and quantify the accuracy of the resulting estimates.
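The Shapley value pays each participant its marginal contribution, averaged over all orderings in which participants could join the coalition. The sketch below computes it exactly by enumerating orderings and estimates it by plain Monte Carlo sampling of uniformly random orderings. This uniform sampler is a simpler stand-in for the paper's reinforcement-learning approximation of optimal stratified sampling, not an implementation of it. The three participants, their load reductions, and the 4 MW target are hypothetical.

```python
import itertools
import random

# Hypothetical DR program: participants' load reductions (MW); the program
# only values total reduction up to a 4 MW target.
REDUCTIONS = {"a": 3.0, "b": 2.0, "c": 1.0}
TARGET = 4.0


def dr_value(coalition):
    """Hypothetical characteristic function v(S) for a coalition S."""
    return min(sum(REDUCTIONS[p] for p in coalition), TARGET)


def _add_marginals(order, value, phi):
    # Walk one ordering, crediting each player its marginal contribution.
    coalition = frozenset()
    for p in order:
        phi[p] += value(coalition | {p}) - value(coalition)
        coalition = coalition | {p}


def exact_shapley(players, value):
    """Exact Shapley values: average marginal contribution over all n! orderings."""
    players = list(players)
    phi = {p: 0.0 for p in players}
    perms = list(itertools.permutations(players))
    for order in perms:
        _add_marginals(order, value, phi)
    return {p: v / len(perms) for p, v in phi.items()}


def sampled_shapley(players, value, num_samples, rng):
    """Unbiased Monte Carlo estimate from uniformly random orderings."""
    players = list(players)
    phi = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = players[:]
        rng.shuffle(order)
        _add_marginals(order, value, phi)
    return {p: v / num_samples for p, v in phi.items()}


if __name__ == "__main__":
    print(exact_shapley(REDUCTIONS, dr_value))
    print(sampled_shapley(REDUCTIONS, dr_value, 20000, random.Random(0)))
```

Exact enumeration costs $n!$ orderings, which is why sampling is needed for realistic numbers of participants. Because each ordering's marginal contributions telescope to $v(N) - v(\emptyset)$, the sampled payments always sum to the total value.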

preprint · 2014 · arXiv

Exact Common Information

This paper introduces the notion of exact common information, which is the minimum description length of the common randomness needed for the exact distributed generation of two correlated random variables $(X,Y)$. We introduce the quantity $G(X;Y)=\min_{X\to W \to Y} H(W)$ as a natural bound on the exact common information and study its properties and computation. We then introduce the exact common information rate, which is the minimum description rate of the common randomness for the exact generation of a 2-DMS $(X,Y)$. We give a multiletter characterization for it as the limit $\bar{G}(X;Y)=\lim_{n\to \infty}(1/n)G(X^n;Y^n)$. While in general $\bar{G}(X;Y)$ is greater than or equal to the Wyner common information, we show that they are equal for the Symmetric Binary Erasure Source. We do not know, however, if the exact common information rate has a single letter characterization in general.

preprint · 2012 · arXiv

Limits on the Benefits of Energy Storage for Renewable Integration

The high variability of renewable energy resources presents significant challenges to the operation of the electric power grid. Conventional generators can be used to mitigate this variability but are costly to operate and produce carbon emissions. Energy storage provides a more environmentally friendly alternative, but is costly to deploy in large amounts. This paper studies the limits on the benefits of energy storage for renewable energy: How effective is storage at mitigating the adverse effects of renewable energy variability? How much storage is needed? What are the optimal control policies for operating storage? To answer these questions, we first formulate the power flow in a single-bus power system with storage as an infinite horizon stochastic program. We find the optimal policies for an arbitrary net renewable generation process when the cost function is the average conventional generation (environmental cost) and when it is the average loss of load probability (reliability cost). We obtain more refined results by considering the multi-timescale operation of the power system. We view the power flow in each timescale as the superposition of a predicted (deterministic) component and a prediction error (residual) component and formulate the residual power flow problem as an infinite horizon dynamic program. Assuming that the net generation prediction error is an IID process, we quantify the asymptotic benefits of storage. With the additional assumption of a Laplace-distributed prediction error, we obtain closed-form expressions for the stationary distribution of storage and conventional generation. Finally, we propose a two-threshold policy that trades off conventional generation savings against loss of load probability. We illustrate our results and corroborate the IID and Laplace assumptions numerically using datasets from CAISO and NREL.
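To make the storage questions concrete, the sketch below simulates a single storage buffer driven by an IID Laplace residual (prediction error) process, the distributional assumption the abstract mentions. The policy is a hypothetical greedy rule that charges on surplus up to capacity and discharges on deficit. It is neither the paper's optimal policy nor its two-threshold policy. Loss of load is counted whenever the buffer cannot cover a deficit. Capacity and the Laplace scale are in arbitrary energy units, and the function name is hypothetical.

```python
import random


def simulate_greedy_storage(capacity, scale, steps, seed=0):
    """Hypothetical greedy storage policy under IID Laplace(0, scale) residuals.

    Returns (fraction of steps with loss of load, average spilled energy per step)."""
    rng = random.Random(seed)
    level = 0.0          # current stored energy
    loss_events = 0      # steps where a deficit could not be fully covered
    spilled = 0.0        # surplus energy that did not fit in storage
    for _ in range(steps):
        # Laplace(0, scale) sample as the difference of two Exp(1/scale) draws
        residual = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        level += residual
        if level > capacity:
            spilled += level - capacity
            level = capacity
        elif level < 0:
            loss_events += 1
            level = 0.0
    return loss_events / steps, spilled / steps


if __name__ == "__main__":
    for cap in (0.0, 1.0, 5.0, 20.0):
        print(cap, simulate_greedy_storage(cap, scale=1.0, steps=100000))
```

Running the sketch for several capacities gives a rough empirical picture of how the loss-of-load frequency falls as storage grows. The paper's closed-form results under the Laplace assumption are not reproduced here.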

preprint · 2012 · arXiv

Optimal Achievable Rates for Interference Networks with Random Codes

The optimal rate region for interference networks is characterized when encoding is restricted to random code ensembles with superposition coding and time sharing. A simple simultaneous nonunique decoding rule, under which each receiver decodes for the intended message as well as the interfering messages, is shown to achieve this optimal rate region regardless of the relative strengths of signal, interference, and noise. This result implies that the Han-Kobayashi bound, the best known inner bound on the capacity region of the two-user-pair interference channel, cannot be improved merely by using the optimal maximum likelihood decoder.

preprint · 2011 · arXiv

3-Receiver Broadcast Channels with Common and Confidential Messages

This paper establishes inner bounds on the secrecy capacity regions for the general 3-receiver broadcast channel with one common message set and one confidential message set. We consider two setups. The first is when the confidential message is to be sent to two receivers and kept secret from the third receiver. Achievability is established using indirect decoding, Wyner wiretap channel coding, and the new idea of generating secrecy from a publicly available superposition codebook. The inner bound is shown to be tight for a class of reversely degraded broadcast channels and when both legitimate receivers are less noisy than the third receiver. The second setup investigated in this paper is when the confidential message is to be sent to one receiver and kept secret from the other two receivers. Achievability in this case follows from Wyner wiretap channel coding and indirect decoding. This inner bound is also shown to be tight for several special cases.

preprint · 2011 · arXiv

Communication with Disturbance Constraints

Motivated by the broadcast view of the interference channel, the new problem of communication with disturbance constraints is formulated. The rate-disturbance region is established for the single-constraint case, and the optimal encoding scheme turns out to be the same as the Han-Kobayashi scheme for the two-user-pair interference channel. This result is extended to the Gaussian vector (MIMO) case. For the case of communication with two disturbance constraints, inner and outer bounds on the rate-disturbance region for a deterministic model are established. The inner bound is achieved by an encoding scheme that involves rate splitting, Marton coding, and superposition coding, and is shown to be optimal in several nontrivial cases. This encoding scheme can be readily applied to discrete memoryless interference channels and motivates a natural extension of the Han-Kobayashi scheme to more than two user pairs.

preprint · 2011 · arXiv

Interference Decoding for Deterministic Channels

An inner bound to the capacity region of a class of deterministic interference channels with three user pairs is presented. The key idea is to simultaneously decode the combined interference signal and the intended message at each receiver. It is shown that this interference-decoding inner bound is tight under certain strong interference conditions. The inner bound is also shown to strictly contain the inner bound obtained by treating interference as noise, which includes interference alignment for deterministic channels. The gain comes from judicious analysis of the number of combined interference sequences in different regimes of input distributions and message rates. Finally, the inner bound is generalized to the case where each channel output is observed through a noisy channel.

preprint · 2011 · arXiv

Lecture Notes on Network Information Theory

These lecture notes have been converted to a book titled Network Information Theory published recently by Cambridge University Press. This book provides a significantly expanded exposition of the material in the lecture notes as well as problems and bibliographic notes at the end of each chapter. The authors are currently preparing a set of slides based on the book that will be posted in the second half of 2012. More information about the book can be found at http://www.cambridge.org/9781107008731/. The previous (and obsolete) version of the lecture notes can be found at http://arxiv.org/abs/1001.3404v4/.

preprint · 2011 · arXiv

On Marton's Inner Bound for the General Broadcast Channel

We establish several new results on Marton's coding scheme and its corresponding inner bound on the capacity region of the general broadcast channel. We show that unlike the Gaussian case, Marton's coding scheme without superposition coding is not optimal in general even for a degraded broadcast channel with no common message. We then establish properties of Marton's inner bound that help restrict the search space for computing the sum-rate. Next, we show that the inner bound is optimal along certain directions. Finally, we propose a coding scheme that may lead to a larger inner bound.

preprint · 2010 · arXiv

An Achievability Scheme for the Compound Channel with State Noncausally Available at the Encoder

A new achievability scheme for the compound channel with discrete memoryless (DM) state noncausally available at the encoder is established. Achievability is proved using superposition coding, Marton coding, joint typicality encoding, and indirect decoding. The scheme is shown to achieve strictly higher rate than the straightforward extension of the Gelfand-Pinsker coding scheme for a single DMC with DM state, and is optimal for some classes of channels.

preprint · 2010 · arXiv

Distributed Lossy Averaging

An information-theoretic formulation of the distributed averaging problem previously studied in computer science and control is presented. We assume a network of $m$ nodes, each observing a WGN source. The nodes communicate and perform local processing with the goal of computing the average of the sources to within a prescribed mean squared error distortion. The network rate distortion function $R^*(D)$ for a 2-node network with correlated Gaussian sources is established. A general cutset lower bound on $R^*(D)$ is established and shown to be achievable to within a factor of 2 via a centralized protocol over a star network. A lower bound on the network rate distortion function for distributed weighted-sum protocols is established; it is larger in order than the cutset bound by a factor of $\log m$. An upper bound on the network rate distortion function for gossip-based weighted-sum protocols is also established; it is only $\log\log m$ larger in order than the lower bound for a complete graph network. The results suggest that using distributed protocols results in a factor of $\log m$ increase in order relative to centralized protocols.
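Gossip-based weighted-sum protocols build on randomized pairwise averaging. The sketch below runs that primitive on a complete graph whose $m$ nodes hold real-valued Gaussian samples. In each round, two distinct nodes chosen uniformly at random replace their values with their average. This keeps the network sum fixed and drives every node toward the mean. The sketch exchanges exact real numbers, so it ignores the rate constraint that the network rate distortion function $R^*(D)$ is about. The function names are hypothetical.

```python
import random


def randomized_gossip(values, rounds, seed=0):
    """Pairwise randomized gossip on a complete graph.

    Each round, two distinct nodes chosen uniformly at random both replace
    their values with their average; the sum (hence the mean) is preserved."""
    rng = random.Random(seed)
    x = list(values)
    m = len(x)
    for _ in range(rounds):
        i, j = rng.sample(range(m), 2)
        avg = (x[i] + x[j]) / 2
        x[i] = x[j] = avg
    return x


def mean_squared_error(x, target):
    # Average squared distance of the node estimates from the true average
    return sum((xi - target) ** 2 for xi in x) / len(x)


if __name__ == "__main__":
    rng = random.Random(42)
    sources = [rng.gauss(0.0, 1.0) for _ in range(20)]  # m = 20 WGN samples
    true_avg = sum(sources) / len(sources)
    for rounds in (0, 50, 200, 1000):
        x = randomized_gossip(sources, rounds)
        print(rounds, mean_squared_error(x, true_avg))
```

On a complete graph, each round shrinks the expected sum of squared deviations by a factor of $1 - 1/(m-1)$, so the error decays geometrically in the number of exchanges. Quantizing each exchanged value is where the paper's rate-distortion tradeoff would enter.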

preprint · 2010 · arXiv

Noisy Network Coding

A noisy network coding scheme for sending multiple sources over a general noisy network is presented. For multi-source multicast networks, the scheme naturally extends both network coding over noiseless networks by Ahlswede, Cai, Li, and Yeung, and compress-forward coding for the relay channel by Cover and El Gamal to general discrete memoryless and Gaussian networks. The scheme also recovers as special cases the results on coding for wireless relay networks and deterministic networks by Avestimehr, Diggavi, and Tse, and coding for wireless erasure networks by Dana, Gowaikar, Palanki, Hassibi, and Effros. The scheme involves message repetition coding, relay signal compression, and simultaneous decoding. Unlike previous compress-forward schemes, where independent messages are sent over multiple blocks, the same message is sent multiple times using independent codebooks as in the network coding scheme for cyclic networks. Furthermore, the relays do not use Wyner-Ziv binning as in previous compress-forward schemes, and each decoder performs simultaneous joint typicality decoding on the received signals from all the blocks without explicitly decoding the compression indices. A consequence of this new scheme is that achievability is proved simply and more generally without resorting to time expansion to extend results for acyclic networks to networks with cycles. The noisy network coding scheme is then extended to general multi-source networks by combining it with decoding techniques for interference channels. For the Gaussian multicast network, noisy network coding improves the previously established gap to the cutset bound. We also demonstrate through two popular AWGN network examples that noisy network coding can outperform conventional compress-forward, amplify-forward, and hash-forward schemes.

preprint · 2009 · arXiv

On the Sum Capacity of A Class of Cyclically Symmetric Deterministic Interference Channels

Certain deterministic interference channels have been shown to accurately model Gaussian interference channels in the asymptotic low-noise regime. Motivated by this correspondence, we investigate a K user-pair, cyclically symmetric, deterministic interference channel in which each receiver experiences interference only from its neighboring transmitters (Wyner model). We establish the sum capacity for a large set of channel parameters, thus generalizing previous results for the 2-pair case.
