Source author record

Rüdiger Urbanke

Rüdiger Urbanke appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Information Theory, math.IT, Machine Learning, Cryptography and Security, math.ST, Neural and Evolutionary Computing, Statistics Theory

Catalog footprint

What is connected

15 works

7 topics

4 close collaborators


preprint, 2026, arXiv

The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits

Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain wide open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice of the number of folds in $k$-fold CV. Our results consist of a novel decomposition of the mean-squared error of cross-validation for risk estimation, which explicitly captures the correlations of error estimates across overlapping folds and includes a novel algorithmic stability notion, squared loss stability, that is considerably weaker than the hypothesis stability typically required in comparable works. Furthermore, we prove:

1. For any learning algorithm that minimizes empirical risk, the mean-squared error of the $k$-fold cross-validation estimator $\widehat{L}_{\mathrm{CV}}^{(k)}$ of the population risk $L_{D}$ satisfies the following minimax lower bound: \[ \min_{k \mid n} \max_{D} \mathbb{E}\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{D}\big)^{2}\right]=Ω\big(\sqrt{k^*}/n\big), \] where $n$ is the sample size, $k$ the number of folds, and $k^*$ denotes the number of folds attaining the minimax optimum. This shows that even under idealized conditions, for large values of $k$, CV cannot attain the optimum of order $1/n$ achievable by a validation set of size $n$, reflecting an inherent penalty caused by dependence between folds.

2. Complementing this, we exhibit learning rules for which \[ \max_{D}\mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{D}\big)^{2}\right]=Ω(k/n), \] matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$.

Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes.
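For orientation, the estimator $\widehat{L}_{\mathrm{CV}}^{(k)}$ discussed in this abstract can be sketched as follows. This is a minimal illustration of plain $k$-fold CV with contiguous folds, assuming $k$ divides $n$ as in the abstract's $k \mid n$ condition. The function names, the mean predictor and the squared loss are hypothetical choices made for the example and are not taken from the paper.

```python
from statistics import fmean
from typing import Callable, Sequence


def kfold_cv_risk(
    data: Sequence[float],
    k: int,
    fit: Callable[[Sequence[float]], float],
    loss: Callable[[float, float], float],
) -> float:
    """Average held-out loss over k contiguous folds of equal size."""
    n = len(data)
    if k < 2 or n % k != 0:
        raise ValueError("k must be at least 2 and divide n")
    fold_size = n // k
    fold_errors = []
    for j in range(k):
        start, stop = j * fold_size, (j + 1) * fold_size
        held_out = data[start:stop]
        # Train on the other k - 1 folds; these training sets overlap
        # across folds, which is the source of dependence between folds.
        training = list(data[:start]) + list(data[stop:])
        model = fit(training)
        fold_errors.append(fmean(loss(model, z) for z in held_out))
    return fmean(fold_errors)


# Hypothetical learner and loss, used only for illustration.
def fit_mean(sample: Sequence[float]) -> float:
    return fmean(sample)


def squared_loss(model: float, z: float) -> float:
    return (model - z) ** 2
```

With `data = [0.0, 2.0]` and `k = 2`, each fold trains on the single remaining point, so both held-out losses are 4 and the estimate is 4.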

preprint, 2022, arXiv

Polar Codes Do Not Have Many Affine Automorphisms

Polar coding solutions demonstrate excellent performance under list decoding, which is challenging to implement in hardware because of its path-sorting operations. As a potential solution to this problem, permutation decoding has recently become a hot research topic. However, it imposes more constraints on the code structure. In this paper, we study the structural properties of Arikan's polar codes. It is known that they are invariant under lower-triangular affine permutations, among others. However, those permutations are not useful in the context of permutation decoding. We show that, unfortunately, the group of affine automorphisms of Arikan's polar codes asymptotically cannot be much bigger than the group of lower-triangular permutations.

preprint, 2021, arXiv

Adversarial Robustness: What fools you makes you stronger

We prove an exponential separation for the sample complexity between the standard PAC-learning model and a version of the Equivalence-Query-learning model. We then show that this separation has interesting implications for adversarial robustness. We explore a vision of designing an adaptive defense that, in the presence of an attacker, computes a model that is provably robust. In particular, we show how to realize this vision in a simplified setting. In order to do so, we introduce a notion of a strong adversary: he is not limited by the type of perturbations he can apply but, when presented with a classifier, can repeatedly generate different adversarial examples. We explain why this notion is interesting to study and use it to prove the following. There exists an efficient adversarial-learning-like scheme such that for every strong adversary $\mathbf{A}$ it outputs a classifier that (a) cannot be strongly attacked by $\mathbf{A}$, or (b) has error at most $ε$. In both cases our scheme uses exponentially (in $ε$) fewer samples than what the PAC bound requires.

preprint, 2021, arXiv

Partially symmetric monomial codes

A framework of monomial codes is considered, which includes linear codes generated by the evaluation of certain monomials. Polar and Reed-Muller codes are the two best-known representatives of such codes and can be considered as two extreme cases. Reed-Muller codes have a large automorphism group, but their low-complexity maximum likelihood decoding remains an open problem. On the other hand, polar codes have far fewer symmetries but admit efficient near-ML decoding. We study the dependency between the code symmetries and the decoding efficiency. We introduce a new family of codes, partially symmetric monomial codes. These codes have a smaller group of symmetries than the Reed-Muller codes and are in this sense "between" RM and polar codes. A lower bound on their parameters is introduced along with an explicit construction that achieves it. Structural properties of these codes are demonstrated and it is shown that they often have a recursive structure.

preprint, 2021, arXiv

Query complexity of adversarial attacks

There are two main attack models considered in the adversarial robustness literature: black-box and white-box. We consider these threat models as two ends of a fine-grained spectrum, indexed by the number of queries the adversary can ask. Using this point of view we investigate how many queries the adversary needs to make to design an attack that is comparable to the best possible attack in the white-box model. We give a lower bound on that number of queries in terms of entropy of decision boundaries of the classifier. Using this result we analyze two classical learning algorithms on two synthetic tasks for which we prove meaningful security guarantees. The obtained bounds suggest that some learning algorithms are inherently more robust against query-bounded adversaries than others.

preprint, 2016, arXiv

Comparing the Bit-MAP and Block-MAP Decoding Thresholds of Reed-Muller Codes on BMS Channels

The question whether RM codes are capacity-achieving is a long-standing open problem in coding theory that was recently answered in the affirmative for transmission over erasure channels [1], [2]. Remarkably, the proof does not rely on specific properties of RM codes, apart from their symmetry. Indeed, the main technical result consists in showing that any sequence of linear codes, with doubly-transitive permutation groups, achieves capacity on the memoryless erasure channel under bit-MAP decoding. Thus, a natural question is what happens under block-MAP decoding. In [1], [2], by exploiting further symmetries of the code, the bit-MAP threshold was shown to be sharp enough so that the block erasure probability also converges to 0. However, this technique relies heavily on the fact that the transmission is over an erasure channel. We present an alternative approach to strengthen results regarding the bit-MAP threshold to block-MAP thresholds. This approach is based on a careful analysis of the weight distribution of RM codes. In particular, the flavor of the main result is the following: assume that the bit-MAP error probability decays as $N^{-δ}$, for some $δ>0$. Then, the block-MAP error probability also converges to 0. This technique applies to transmission over any binary memoryless symmetric channel. Thus, it can be thought of as a first step in extending the proof that RM codes are capacity-achieving to the general case.

preprint, 2016, arXiv

Reed-Muller Codes Achieve Capacity on Erasure Channels

We introduce a new approach to proving that a sequence of deterministic linear codes achieves capacity on an erasure channel under maximum a posteriori decoding. Rather than relying on the precise structure of the codes our method exploits code symmetry. In particular, the technique applies to any sequence of linear codes where the blocklengths are strictly increasing, the code rates converge, and the permutation group of each code is doubly transitive. In other words, we show that symmetry alone implies near-optimal performance. An important consequence of this result is that a sequence of Reed-Muller codes with increasing blocklength and converging rate achieves capacity. This possibility has been suggested previously in the literature but it has only been proven for cases where the limiting code rate is 0 or 1. Moreover, these results extend naturally to all affine-invariant codes and, thus, to extended primitive narrow-sense BCH codes. This also resolves, in the affirmative, the existence question for capacity-achieving sequences of binary cyclic codes. The primary tools used in the proof are the sharp threshold property for symmetric monotone boolean functions and the area theorem for extrinsic information transfer functions.

preprint, 2016, arXiv

Unified Scaling of Polar Codes: Error Exponent, Scaling Exponent, Moderate Deviations, and Error Floors

Consider the transmission of a polar code of block length $N$ and rate $R$ over a binary memoryless symmetric channel $W$ and let $P_e$ be the block error probability under successive cancellation decoding. In this paper, we develop new bounds that characterize the relationship of the parameters $R$, $N$, $P_e$, and the quality of the channel $W$ quantified by its capacity $I(W)$ and its Bhattacharyya parameter $Z(W)$. In previous work, two main regimes were studied. In the error exponent regime, the channel $W$ and the rate $R<I(W)$ are fixed, and it was proved that the error probability $P_e$ scales roughly as $2^{-\sqrt{N}}$. In the scaling exponent approach, the channel $W$ and the error probability $P_e$ are fixed and it was proved that the gap to capacity $I(W)-R$ scales as $N^{-1/μ}$. Here, $μ$ is called scaling exponent and this scaling exponent depends on the channel $W$. A heuristic computation for the binary erasure channel (BEC) gives $μ=3.627$ and it was shown that, for any channel $W$, $3.579 \le μ\le 5.702$. Our contributions are as follows. First, we provide the tighter upper bound $μ\le 4.714$ valid for any $W$. With the same technique, we obtain $μ\le 3.639$ for the case of the BEC, which approaches very closely its heuristically derived value. Second, we develop a trade-off between the gap to capacity $I(W)-R$ and the error probability $P_e$ as functions of the block length $N$. In other words, we consider a moderate deviations regime in which we study how fast both quantities, as functions of the block length $N$, simultaneously go to $0$. Third, we prove that polar codes are not affected by error floors. To do so, we fix a polar code of block length $N$ and rate $R$. Then, we vary the channel $W$ and we show that the error probability $P_e$ scales as the Bhattacharyya parameter $Z(W)$ raised to a power that scales roughly like $\sqrt{N}$.
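The Bhattacharyya parameters in this abstract can be tracked exactly on the BEC, where $Z(W)$ equals the erasure probability and one polarization step maps $Z$ to the pair $2Z - Z^2$ and $Z^2$. The sketch below is an illustration under that assumption only, not the bounding technique of the paper. It computes the parameters of the $N = 2^n$ synthetic channels and the standard union bound on $P_e$ under successive cancellation decoding, namely the sum of the Bhattacharyya parameters of the selected channels. The function names and the rule of keeping the $\lfloor RN \rfloor$ most reliable channels are hypothetical choices for the example.

```python
from typing import List


def bec_bhattacharyya(eps: float, n: int) -> List[float]:
    """Bhattacharyya parameters of the 2**n synthetic channels of a BEC(eps)."""
    zs = [eps]
    for _ in range(n):
        nxt = []
        for z in zs:
            # One polarization step on the BEC: a worse channel with
            # parameter 2z - z^2 and a better channel with parameter z^2.
            nxt.extend((2 * z - z * z, z * z))
        zs = nxt
    return zs


def sc_union_bound(eps: float, n: int, rate: float) -> float:
    """Union bound on the SC block error probability at the given rate.

    Assumption for this sketch: the information set consists of the
    floor(rate * N) synthetic channels with the smallest parameters.
    """
    zs = sorted(bec_bhattacharyya(eps, n))
    num_info = int(rate * len(zs))
    return sum(zs[:num_info])
```

Because the two branches average back to $Z$, the mean of the returned parameters stays equal to `eps` at every level, which is a quick sanity check on the recursion.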

preprint, 2015, arXiv

A Scaling Law to Predict the Finite-Length Performance of Spatially-Coupled LDPC Codes

Spatially-coupled LDPC codes are known to have excellent asymptotic properties. Much less is known regarding their finite-length performance. We propose a scaling law to predict the error probability of finite-length spatially-coupled ensembles when transmission takes place over the binary erasure channel. We discuss how the parameters of the scaling law are connected to fundamental quantities appearing in the asymptotic analysis of these ensembles and we verify that the predictions of the scaling law fit well to the data derived from simulations over a wide range of parameters. The ultimate goal of this line of research is to develop analytic tools for the design of spatially-coupled LDPC codes under practical constraints.

preprint, 2015, arXiv

Reed-Muller Codes Achieve Capacity on the Binary Erasure Channel under MAP Decoding

We show that Reed-Muller codes achieve capacity under maximum a posteriori bit decoding for transmission over the binary erasure channel for all rates $0 < R < 1$. The proof is generic and applies to other codes with a sufficient amount of symmetry as well. The main idea is to combine the following observations: (i) monotone functions experience a sharp threshold behavior, (ii) the extrinsic information transfer (EXIT) functions are monotone, (iii) Reed-Muller codes are 2-transitive and thus the EXIT functions associated with their codeword bits are all equal, and (iv) therefore the Area Theorem for the average EXIT functions implies that RM codes' threshold is at channel capacity.

preprint, 2015, arXiv

Spatial Coupling as a Proof Technique

The aim of this paper is to show that spatial coupling can be viewed not only as a means to build better graphical models, but also as a tool to better understand uncoupled models. The starting point is the observation that some asymptotic properties of graphical models are easier to prove in the case of spatial coupling. In such cases, one can then use the so-called interpolation method to transfer known results for the spatially coupled case to the uncoupled one. Our main use of this framework is for LDPC codes, where we use interpolation to show that the average entropy of the codeword conditioned on the observation is asymptotically the same for spatially coupled as for uncoupled ensembles. We give three applications of this result for a large class of LDPC ensembles. The first one is a proof of the so-called Maxwell construction stating that the MAP threshold is equal to the Area threshold of the BP GEXIT curve. The second is a proof of the equality between the BP and MAP GEXIT curves above the MAP threshold. The third application is the intimately related fact that the replica symmetric formula for the conditional entropy in the infinite block length limit is exact.

preprint, 2014, arXiv

Achieving Marton's Region for Broadcast Channels Using Polar Codes

This paper presents polar coding schemes for the 2-user discrete memoryless broadcast channel (DM-BC) which achieve Marton's region with both common and private messages. This is the best achievable rate region known to date, and it is tight for all classes of 2-user DM-BCs whose capacity regions are known. To accomplish this task, we first construct polar codes for both the superposition and the binning strategies. By combining these two schemes, we obtain Marton's region with private messages only. Finally, we show how to handle the case of common information. The proposed coding schemes possess the usual advantages of polar codes, i.e., they have low encoding and decoding complexity and a super-polynomial decay rate of the error probability. We follow the lead of Goela, Abbe, and Gastpar, who recently introduced polar codes emulating the superposition and binning schemes. In order to align the polar indices, for both schemes, their solution involves some degradedness constraints that are assumed to hold between the auxiliary random variables and the channel outputs. To remove these constraints, we consider the transmission of $k$ blocks and employ a chaining construction that guarantees the proper alignment of the polarized indices. The techniques described in this work are quite general, and they can be adapted to many other multi-terminal scenarios whenever polar indices need to be aligned.

preprint, 2014, arXiv

From Polar to Reed-Muller Codes: a Technique to Improve the Finite-Length Performance

We explore the relationship between polar and RM codes and we describe a coding scheme which improves upon the performance of the standard polar code at practical block lengths. Our starting point is the experimental observation that RM codes have a smaller error probability than polar codes under MAP decoding. This motivates us to introduce a family of codes that "interpolates" between RM and polar codes, which we call ${\mathcal C}_{\rm inter} = \{C_α : α\in [0, 1]\}$, where $C_α \big |_{α= 1}$ is the original polar code, and $C_α \big |_{α= 0}$ is an RM code. Based on numerical observations, we remark that the error probability under MAP decoding is an increasing function of $α$. MAP decoding has in general exponential complexity, but empirically the performance of polar codes at finite block lengths is boosted by moving along the family ${\mathcal C}_{\rm inter}$ even under low-complexity decoding schemes such as, for instance, belief propagation or successive cancellation list decoding. We demonstrate the performance gain via numerical simulations for transmission over the erasure channel as well as the Gaussian channel.
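As background for the relationship between the two code families, the sketch below uses the standard fact that both polar and RM codes of length $2^m$ are spanned by rows of the $m$-fold Kronecker power of $F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, and that RM$(r, m)$ keeps the rows whose index has binary Hamming weight at least $m - r$. It does not construct the interpolating family ${\mathcal C}_{\rm inter}$ from the abstract. All function names are hypothetical, and the minimum distance is found by brute force, so the example is only practical for small codes.

```python
from itertools import product
from typing import List


def kronecker_power_rows(m: int) -> List[List[int]]:
    """Rows of the m-fold Kronecker power of F = [[1, 0], [1, 1]] over GF(2)."""
    rows = [[1]]
    for _ in range(m):
        # F (x) A has the block form [[A, 0], [A, A]].
        rows = [r + [0] * len(r) for r in rows] + [r + r for r in rows]
    return rows


def rm_generator(r: int, m: int) -> List[List[int]]:
    """Generator rows of RM(r, m): indices with at least m - r ones in binary."""
    rows = kronecker_power_rows(m)
    return [row for i, row in enumerate(rows) if bin(i).count("1") >= m - r]


def min_distance(gen: List[List[int]]) -> int:
    """Minimum nonzero codeword weight, by enumerating all 2**k messages."""
    length = len(gen[0])
    best = length
    for coeffs in product((0, 1), repeat=len(gen)):
        if not any(coeffs):
            continue  # skip the all-zero codeword
        word = [0] * length
        for c, row in zip(coeffs, gen):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        best = min(best, sum(word))
    return best
```

For example, `rm_generator(1, 3)` selects four rows of length 8, and `min_distance` confirms the expected minimum distance $2^{m-r} = 4$. A polar code would instead select rows according to the reliability of the corresponding synthetic channels.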

preprint, 2014, arXiv

Scaling Exponent of List Decoders with Applications to Polar Codes

Motivated by the significant performance gains which polar codes experience under successive cancellation list decoding, their scaling exponent is studied as a function of the list size. In particular, the error probability is fixed and the trade-off between block length and back-off from capacity is analyzed. A lower bound is provided on the error probability under $\rm MAP$ decoding with list size $L$ for any binary-input memoryless output-symmetric channel and for any class of linear codes such that their minimum distance is unbounded as the block length grows large. Then, it is shown that under $\rm MAP$ decoding, although the introduction of a list can significantly improve the involved constants, the scaling exponent itself, i.e., the speed at which capacity is approached, stays unaffected for any finite list size. In particular, this result applies to polar codes, since their minimum distance tends to infinity as the block length increases. A similar result is proved for genie-aided successive cancellation decoding when transmission takes place over the binary erasure channel, namely, the scaling exponent remains constant for any fixed number of helps from the genie. Note that since genie-aided successive cancellation decoding might be strictly worse than successive cancellation list decoding, the problem of establishing the scaling exponent of the latter remains open.

preprint, 2011, arXiv

Scaling Behavior of Convolutional LDPC Ensembles over the BEC

We study the scaling behavior of coupled sparse graph codes over the binary erasure channel. In particular, let 2L+1 be the length of the coupled chain, let M be the number of variables in each of the 2L + 1 local copies, let l be the number of iterations, let Pb denote the bit error probability, and let ε denote the channel parameter. We are interested in how these quantities scale when we let the blocklength (2L + 1)M tend to infinity. Based on empirical evidence we show that the threshold saturation phenomenon is rather stable with respect to the scaling of the various parameters and we formulate some general rules of thumb which can serve as a guide for the design of coding systems based on coupled graphs.
