Source author record

Tsachy Weissman

Tsachy Weissman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, eess.IV, Machine Learning, eess.SP, math.ST, Statistics Theory, Data Structures and Algorithms, Genomics, Methodology, Multimedia, Quantitative Methods

Catalog footprint

What is connected

48 works

12 topics

4 close collaborators

preprint, 2026, arXiv

An Information-Theoretic Perspective on LLM Tokenizers

Large language model (LLM) tokenizers act as structured compressors: by mapping text to discrete token sequences, they determine token count (and thus compute and context usage) and the statistical structure seen by downstream models. Despite their central role in LLM pipelines, the link between tokenization, compression efficiency and induced structure is not well understood. We empirically demonstrate that tokenizer training scale redistributes entropy: as training data grows, the token stream becomes more diverse in aggregate (higher unigram entropy) yet markedly more predictable in-context (lower higher-order conditional entropies), indicating that tokenization absorbs substantial short-range regularity although these gains degrade under train-test domain mismatch. To ground these observations, we first benchmark i) pretrained GPT-family tokenizers as black-box compressors across various domains, and ii) learned tokenizers across configurations spanning vocabulary size, training scale, and domain. Next, we study tokenization as a transform for universal compression and introduce a compression-aware BPE variant. Finally, we adopt a channel lens and introduce capacity-utilization metrics to analyze tokenizer behaviour and outline implications for downstream modeling. Put together, our results expose various trade-offs between compression, induced structure, and robustness under domain shift, and motivate principled, compression-aware tokenizer design.
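
The contrast drawn above between aggregate diversity (unigram entropy) and in-context predictability (conditional entropy) can be illustrated with simple plug-in estimates. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the estimators are plain empirical (plug-in) ones rather than whatever the paper uses, and plug-in conditional entropies are known to be biased downward when the context order is large relative to the sample size.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Plug-in entropy (bits per token) of the empirical token frequencies."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(tokens, order=1):
    """Plug-in estimate (bits) of H(X_t | previous `order` tokens)."""
    ctx_counts = Counter()
    joint_counts = Counter()
    for i in range(order, len(tokens)):
        ctx = tuple(tokens[i - order:i])
        ctx_counts[ctx] += 1
        joint_counts[(ctx, tokens[i])] += 1
    n = len(tokens) - order
    h = 0.0
    for (ctx, _tok), c in joint_counts.items():
        # c / ctx_counts[ctx] is the empirical P(token | context)
        h -= c / n * math.log2(c / ctx_counts[ctx])
    return h


# Toy stream (an assumption for illustration): diverse in aggregate,
# fully predictable given the previous token.
tokens = "a b a b a b a b".split()
print(unigram_entropy(tokens), conditional_entropy(tokens, order=1))
```

On this toy stream the unigram entropy is 1 bit while the order-1 conditional entropy is 0 bits, which is the kind of gap the abstract describes at a much larger scale.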

preprint, 2022, arXiv

An Information-Theoretic Justification for Model Pruning

We study the neural network (NN) compression problem, viewing the tension between the compression ratio and NN performance through the lens of rate-distortion theory. We choose a distortion metric that reflects the effect of NN compression on the model output and derive the tradeoff between rate (compression) and distortion. In addition to characterizing theoretical limits of NN compression, this formulation shows that pruning, implicitly or explicitly, must be a part of a good compression algorithm. This observation bridges a gap between parts of the literature pertaining to NN and data compression, respectively, providing insight into the empirical success of model pruning. Finally, we propose a novel pruning strategy derived from our information-theoretic formulation and show that it outperforms the relevant baselines on CIFAR-10 and ImageNet datasets.

preprint, 2022, arXiv

An Interactive Annotation Tool for Perceptual Video Compression

Human perception is at the core of lossy video compression and yet, it is challenging to collect data that is sufficiently dense to drive compression. In perceptual quality assessment, human feedback is typically collected as a single scalar quality score indicating preference of one distorted video over another. In reality, some videos may be better in some parts but not in others. We propose an approach to collecting finer-grained feedback by asking users to use an interactive tool to directly optimize for perceptual quality given a fixed bitrate. To this end, we built a novel web-tool which allows users to paint these spatio-temporal importance maps over videos. The tool allows for interactive successive refinement: we iteratively re-encode the original video according to the painted importance maps, while maintaining the same bitrate, thus allowing the user to visually see the trade-off of assigning higher importance to one spatio-temporal part of the video at the cost of others. We use this tool to collect data in-the-wild (10 videos, 17 users) and utilize the obtained importance maps in the context of x264 coding to demonstrate that the tool can indeed be used to generate videos which, at the same bitrate, look perceptually better through a subjective study - and are 1.9 times more likely to be preferred by viewers. The code for the tool and dataset can be found at https://github.com/jenyap/video-annotation-tool.git

preprint, 2022, arXiv

Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (~100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning based voice cloning and lip syncing models. Our generative pipeline achieves two to three orders of magnitude reduction in the bitrate as compared to the standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n = 242) in an online study. The Txt2Vid framework opens up the potential for creating novel applications such as enabling audio-video communication during poor internet connectivity, or in remote terrains with limited bandwidth. The code for this work is available at https://github.com/tpulkit/txt2vid.git.

preprint, 2021, arXiv

Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

We study the minimax estimation of $α$-divergences between discrete distributions for integer $α\ge 1$, which include the Kullback--Leibler divergence and the $χ^2$-divergences as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials. The estimator uses a hybrid approach which solves a problem-independent linear program based on moment matching in the non-smooth regime, and applies a problem-dependent bias-corrected plug-in estimator in the smooth regime, with a soft decision boundary between these regimes.

preprint, 2021, arXiv

Reducing latency and bandwidth for video streaming using keypoint extraction and digital puppetry

COVID-19 has made video communication one of the most important modes of information exchange. While extensive research has been conducted on the optimization of the video streaming pipeline, in particular the development of novel video codecs, further improvement in the video quality and latency is required, especially under poor network conditions. This paper proposes an alternative to the conventional codec through the implementation of a keypoint-centric encoder relying on the transmission of keypoint information from within a video feed. The decoder uses the streamed keypoints to generate a reconstruction preserving the semantic features in the input feed. Focusing on video calling applications, we detect and transmit the body pose and face mesh information through the network, which are displayed at the receiver in the form of animated puppets. Using efficient pose and face mesh detection in conjunction with skeleton-based animation, we demonstrate a prototype requiring lower than 35 kbps bandwidth, an order of magnitude reduction over typical video calling systems. The added computational latency due to the mesh extraction and animation is below 120ms on a standard laptop, showcasing the potential of this framework for real-time applications. The code for this work is available at https://github.com/shubhamchandak94/digital-puppetry/.

preprint, 2020, arXiv

LFZip: Lossy compression of multivariate floating-point time series data via improved prediction

Time series data compression is emerging as an important problem with the growth in IoT devices and sensors. Due to the presence of noise in these datasets, lossy compression can often provide significant compression gains without impacting the performance of downstream applications. In this work, we propose an error-bounded lossy compressor, LFZip, for multivariate floating-point time series data that provides guaranteed reconstruction up to user-specified maximum absolute error. The compressor is based on the prediction-quantization-entropy coder framework and benefits from improved prediction using linear models and neural networks. We evaluate the compressor on several time series datasets where it outperforms the existing state-of-the-art error-bounded lossy compressors. The code and data are available at https://github.com/shubhamchandak94/LFZip
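
The prediction-quantization part of such a framework can be sketched in a few lines. This is a minimal single-variable illustration, not LFZip itself: the function names and the trivial `last_value` predictor are assumptions, the entropy-coding stage is omitted, and LFZip's actual predictors (linear models and neural networks) are replaced by a placeholder. The key point is that the predictor only sees past reconstructions, so the decoder can repeat the same predictions, and uniform quantization of the residual with step `2 * max_error` keeps each reconstructed sample within `max_error` of the original (up to floating-point rounding).

```python
def last_value(history):
    """Placeholder predictor (assumption): repeat the last reconstructed sample."""
    return history[-1] if history else 0.0


def compress(series, max_error, predict=last_value):
    """Return integer residual codes; an entropy coder would compress these further."""
    step = 2.0 * max_error
    recon, codes = [], []
    for x in series:
        pred = predict(recon)            # prediction from past reconstructions only
        q = round((x - pred) / step)     # uniform quantization of the residual
        codes.append(q)
        recon.append(pred + q * step)    # track what the decoder will see
    return codes


def decompress(codes, max_error, predict=last_value):
    """Rebuild the series by repeating the encoder's predictions."""
    step = 2.0 * max_error
    recon = []
    for q in codes:
        recon.append(predict(recon) + q * step)
    return recon


series = [0.0, 0.11, 0.25, 0.2, 0.9]
codes = compress(series, max_error=0.05)
restored = decompress(codes, max_error=0.05)
print(codes, restored)
```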

preprint, 2016, arXiv

Rateless Lossy Compression via the Extremes

We begin by presenting a simple lossy compressor operating at near-zero rate: The encoder merely describes the indices of the few maximal source components, while the decoder's reconstruction is a natural estimate of the source components based on this information. This scheme turns out to be near-optimal for the memoryless Gaussian source in the sense of achieving the zero-rate slope of its distortion-rate function. Motivated by this finding, we then propose a scheme comprised of iterating the above lossy compressor on an appropriately transformed version of the difference between the source and its reconstruction from the previous iteration. The proposed scheme achieves the rate distortion function of the Gaussian memoryless source (under squared error distortion) when employed on any finite-variance ergodic source. It further possesses desirable properties we respectively refer to as infinitesimal successive refinability, ratelessness, and complete separability. Its storage and computation requirements are of order no more than $\frac{n^2}{\log^β n}$ per source symbol for $β>0$ at both the encoder and decoder. Though the details of its derivation, construction, and analysis differ considerably, we discuss similarities between the proposed scheme and the recently introduced Sparse Regression Codes (SPARC) of Venkataramanan et al.

preprint, 2016, arXiv

Strong Successive Refinability and Rate-Distortion-Complexity Tradeoff

We investigate the second order asymptotics (source dispersion) of the successive refinement problem. Similarly to the classical definition of a successively refinable source, we say that a source is strongly successively refinable if successive refinement coding can achieve the second order optimum rate (including the dispersion terms) at both decoders. We establish a sufficient condition for strong successive refinability. We show that any discrete source under Hamming distortion and the Gaussian source under quadratic distortion are strongly successively refinable. We also demonstrate how successive refinement ideas can be used in point-to-point lossy compression problems in order to reduce complexity. We give two examples, the binary-Hamming and Gaussian-quadratic cases, in which a layered code construction results in a low complexity scheme that attains optimal performance. For example, when the number of layers grows with the block length $n$, we show how to design an $O(n^{\log(n)})$ algorithm that asymptotically achieves the rate-distortion bound.

preprint, 2016, arXiv

When is Noisy State Information at the Encoder as Useless as No Information or as Good as Noise-Free State?

For any binary-input channel with perfect state information at the decoder, if the mutual information between the noisy state observation at the encoder and the true channel state is below a positive threshold determined solely by the state distribution, then the capacity is the same as that with no encoder side information. A complementary phenomenon is revealed for the generalized probing capacity. Extensions beyond binary-input channels are developed.

preprint, 2015, arXiv

Distortion-Rate Function of Sub-Nyquist Sampled Gaussian Sources

The amount of information lost in sub-Nyquist sampling of a continuous-time Gaussian stationary process is quantified. We consider a combined source coding and sub-Nyquist reconstruction problem in which the input to the encoder is a noisy sub-Nyquist sampled version of the analog source. We first derive an expression for the mean squared error in the reconstruction of the process from a noisy and information rate-limited version of its samples. This expression is a function of the sampling frequency and the average number of bits describing each sample. It is given as the sum of two terms: Minimum mean square error in estimating the source from its noisy but otherwise fully observed sub-Nyquist samples, and a second term obtained by reverse waterfilling over an average of spectral densities associated with the polyphase components of the source. We extend this result to multi-branch uniform sampling, where the samples are available through a set of parallel channels with a uniform sampler and a pre-sampling filter in each branch. Further optimization to reduce distortion is then performed over the pre-sampling filters, and an optimal set of pre-sampling filters associated with the statistics of the input signal and the sampling frequency is found. This results in an expression for the minimal possible distortion achievable under any analog to digital conversion scheme involving uniform sampling and linear filtering. These results thus unify the Shannon-Whittaker-Kotelnikov sampling theorem and Shannon rate-distortion theory for Gaussian sources.
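
Reverse waterfilling is easiest to see in its textbook discrete form, for a finite set of independent Gaussian components under squared-error distortion. The sketch below is that classical version only, not the spectral-density version derived in the paper; the function name and the bisection search are illustrative assumptions. Each component with variance $\sigma_i^2$ receives distortion $D_i = \min(\theta, \sigma_i^2)$, the water level $\theta$ is chosen so that $\sum_i D_i = D$, and the rate is $\sum_i \frac{1}{2}\log_2(\sigma_i^2 / D_i)$.

```python
import math


def reverse_waterfill(variances, total_distortion):
    """Return (theta, per-component distortions, rate in bits) for positive variances."""
    if not 0 < total_distortion <= sum(variances):
        raise ValueError("total_distortion must lie in (0, sum(variances)]")
    lo, hi = 0.0, max(variances)
    for _ in range(200):  # bisection on the water level theta
        theta = (lo + hi) / 2
        if sum(min(theta, v) for v in variances) < total_distortion:
            lo = theta
        else:
            hi = theta
    theta = (lo + hi) / 2
    distortions = [min(theta, v) for v in variances]
    # Components with variance below theta get distortion == variance, hence zero rate.
    rate = sum(0.5 * math.log2(v / d) for v, d in zip(variances, distortions))
    return theta, distortions, rate


theta, distortions, rate = reverse_waterfill([4.0, 1.0, 0.25], total_distortion=1.5)
print(theta, distortions, rate)
```

In this example the weakest component (variance 0.25) lies below the water level 0.625 and is not described at all, while the two stronger components share the remaining distortion budget equally.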

preprint, 2015, arXiv

Justification of Logarithmic Loss via the Benefit of Side Information

We consider a natural measure of relevance: the reduction in optimal prediction risk in the presence of side information. For any given loss function, this relevance measure captures the benefit of side information for performing inference on a random variable under this loss function. When such a measure satisfies a natural data processing property, and the random variable of interest has alphabet size greater than two, we show that it is uniquely characterized by the mutual information, and the corresponding loss function coincides with logarithmic loss. In doing so, our work provides a new characterization of mutual information, and justifies its use as a measure of relevance. When the alphabet is binary, we characterize the only admissible forms the measure of relevance can assume while obeying the specified data processing property. Our results naturally extend to measuring causal influence between stochastic processes, where we unify different causal-inference measures in the literature as instantiations of directed information.

preprint, 2015, arXiv

Minimax Estimation of Discrete Distributions under $\ell_1$ Loss

We analyze the problem of discrete distribution estimation under $\ell_1$ loss. We provide non-asymptotic upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and the minimax risk in regimes where the alphabet size $S$ may grow with the number of observations $n$. We show that among distributions with bounded entropy $H$, the asymptotic maximum risk for the empirical distribution is $2H/\ln n$, while the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a hard-thresholding estimator oblivious to the unknown upper bound $H$ is asymptotically minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again $2H/\ln n$. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance ($\ell_1$ divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.

preprint, 2015, arXiv

Minimax Estimation of Functionals of Discrete Distributions

We propose a general methodology for the construction and analysis of minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the alphabet size $S$ is unknown and may be comparable with the number of observations $n$. We treat the respective regions where the functional is "nonsmooth" and "smooth" separately. In the "nonsmooth" regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the "smooth" regime, we apply a bias-corrected Maximum Likelihood Estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing two important cases: the entropy $H(P) = \sum_{i = 1}^S -p_i \ln p_i$ and $F_α(P) = \sum_{i = 1}^S p_i^α,α>0$. We obtain the minimax $L_2$ rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity $n \asymp S/\ln S$ for entropy estimation. We also show that the sample complexity for estimating $F_α(P),0<α<1$ is $n\asymp S^{1/α}/ \ln S$, which can be achieved by our estimator but not the MLE. For $1<α<3/2$, we show the minimax $L_2$ rate for estimating $F_α(P)$ is $(n\ln n)^{-2(α-1)}$ regardless of the alphabet size, while the $L_2$ rate for the MLE is $n^{-2(α-1)}$. For all the above cases, the behavior of the minimax rate-optimal estimators with $n$ samples is essentially that of the MLE with $n\ln n$ samples. We highlight the practical advantages of our schemes for entropy and mutual information estimation. We demonstrate that our approach reduces running time and boosts accuracy compared to various existing approaches. Moreover, we show that the mutual information estimator induced by our methodology leads to significant performance boosts over the Chow--Liu algorithm in learning graphical models.
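
For orientation, the MLE baseline for $H(P)$ and one classical first-order bias correction can be written down directly. The sketch below is not the estimator proposed in the paper: it shows the plug-in (MLE) entropy and the well-known Miller-Madow correction, which adds $(\hat{S} - 1)/(2n)$ nats where $\hat{S}$ is the number of distinct symbols observed. The polynomial-approximation step for the "nonsmooth" regime is not attempted here, and the function names are assumptions.

```python
import math
from collections import Counter


def entropy_mle(samples):
    """Plug-in (maximum likelihood) entropy estimate in nats."""
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in Counter(samples).values())


def entropy_miller_madow(samples):
    """Plug-in estimate plus the classical Miller-Madow bias correction."""
    n = len(samples)
    observed = len(set(samples))  # number of distinct symbols seen
    return entropy_mle(samples) + (observed - 1) / (2 * n)


samples = list("aabb")
print(entropy_mle(samples), entropy_miller_madow(samples))
```

The plug-in estimate tends to underestimate entropy when $n$ is not much larger than $S$, which is the regime the abstract targets; the correction above only removes the leading bias term and does not achieve the $n \asymp S/\ln S$ sample complexity quoted there.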

preprint, 2014, arXiv

Beyond Maximum Likelihood: from Theory to Practice

Maximum likelihood is the most widely used statistical estimation technique. Recent work by the authors introduced a general methodology for the construction of estimators for functionals in parametric models, and demonstrated improvements - both in theory and in practice - over the maximum likelihood estimator (MLE), particularly in high dimensional scenarios involving parameter dimension comparable to or larger than the number of samples. This approach to estimation, building on results from approximation theory, is shown to yield minimax rate-optimal estimators for a wide class of functionals, implementable with modest computational requirements. In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with $n$ samples is comparable to that of the MLE with $n \ln n$ samples. In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow--Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow--Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets.
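
The replacement step described above fits naturally into a Chow-Liu implementation that takes the mutual information estimator as a parameter. The following is a minimal sketch of the classical algorithm (a maximum-weight spanning tree over pairwise mutual information, built here with Kruskal's method and a union-find); `empirical_mi` is the plug-in estimator the classical algorithm uses, and the `mi` argument is a hypothetical hook where an improved estimator could be substituted. The authors' estimator itself is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def empirical_mi(xs, ys):
    """Plug-in mutual information (nats) between two discrete sample lists."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def chow_liu_tree(columns, mi=empirical_mi):
    """columns: one equal-length sample list per variable. Returns tree edges (i, j)."""
    d = len(columns)
    edges = sorted(
        ((mi(columns[i], columns[j]), i, j) for i, j in combinations(range(d), 2)),
        reverse=True,  # heaviest (most informative) pairs first
    )
    parent = list(range(d))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


a = [0, 0, 1, 1, 0, 0, 1, 1]
b = list(a)                      # identical to a, so maximally informative about it
c = [0, 1, 0, 1, 0, 1, 0, 1]     # empirically independent of a
tree = chow_liu_tree([a, b, c])
print(tree)
```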

preprint, 2014, arXiv

Comparison of the Achievable Rates in OFDM and Single Carrier Modulation with I.I.D. Inputs

We compare the maximum achievable rates in single-carrier and OFDM modulation schemes, under the practical assumptions of i.i.d. finite alphabet inputs and linear ISI with additive Gaussian noise. We show that the Shamai-Laroia approximation serves as a bridge between the two rates: while it is well known that this approximation is often a lower bound on the single-carrier achievable rate, it is revealed to also essentially upper bound the OFDM achievable rate. We apply Information-Estimation relations in order to rigorously establish this result for both general input distributions and to sharpen it for commonly used PAM and QAM constellations. To this end, novel bounds on MMSE estimation of PAM inputs to a scalar Gaussian channel are derived, which may be of general interest. Our results show that, under reasonable assumptions, optimal single-carrier schemes may offer spectral efficiency significantly superior to that of OFDM, motivating further research of such systems.

preprint, 2014, arXiv

Compression for Quadratic Similarity Queries: Finite Blocklength and Practical Schemes

We study the problem of compression for the purpose of similarity identification, where similarity is measured by the mean square Euclidean distance between vectors. While the asymptotical fundamental limits of the problem - the minimal compression rate and the error exponent - were found in a previous work, in this paper we focus on the nonasymptotic domain and on practical, implementable schemes. We first present a finite blocklength achievability bound based on shape-gain quantization: The gain (amplitude) of the vector is compressed via scalar quantization and the shape (the projection on the unit sphere) is quantized using a spherical code. The results are numerically evaluated and they converge to the asymptotic values as predicted by the error exponent. We then give a nonasymptotic lower bound on the performance of any compression scheme, and compare to the upper (achievability) bound. For a practical implementation of such a scheme, we use wrapped spherical codes, studied by Hamkins and Zeger, and use the Leech lattice as an example for an underlying lattice. As a side result, we obtain a bound on the covering angle of any wrapped spherical code, as a function of the covering radius of the underlying lattice.
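
Shape-gain quantization, as described above, separates a vector into its norm and its direction and quantizes the two independently. The sketch below illustrates only that decomposition: the toy four-point codebook on the unit circle and the function names are assumptions made for illustration, and stand in for the wrapped spherical codes built on the Leech lattice that the paper uses.

```python
import math


def shape_gain_quantize(x, gain_step, codebook):
    """Return (quantized gain, index of the nearest unit-norm codeword)."""
    gain = math.sqrt(sum(v * v for v in x))
    q_gain = gain_step * round(gain / gain_step)  # scalar quantization of the gain
    if gain == 0.0:
        return 0.0, 0  # direction is undefined for the zero vector; pick any index
    shape = [v / gain for v in x]                 # projection onto the unit sphere
    # For unit-norm codewords, the nearest one maximizes the inner product.
    idx = max(
        range(len(codebook)),
        key=lambda i: sum(s * c for s, c in zip(shape, codebook[i])),
    )
    return q_gain, idx


def reconstruct(q_gain, idx, codebook):
    """Scale the chosen codeword by the quantized gain."""
    return [q_gain * c for c in codebook[idx]]


# Toy spherical codebook in two dimensions (assumption for illustration).
codebook = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
q_gain, idx = shape_gain_quantize([3.0, 0.4], gain_step=0.5, codebook=codebook)
print(q_gain, idx, reconstruct(q_gain, idx, codebook))
```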

preprint, 2014, arXiv

Information Measures: the Curious Case of the Binary Alphabet

Four problems related to information divergence measures defined on finite alphabets are considered. In three of the cases we consider, we illustrate a contrast which arises between the binary-alphabet and larger-alphabet settings. This is surprising in some instances, since characterizations for the larger-alphabet settings do not generalize their binary-alphabet counterparts. Specifically, we show that $f$-divergences are not the unique decomposable divergences on binary alphabets that satisfy the data processing inequality, thereby clarifying claims that have previously appeared in the literature. We also show that KL divergence is the unique Bregman divergence which is also an $f$-divergence for any alphabet size. We show that KL divergence is the unique Bregman divergence which is invariant to statistically sufficient transformations of the data, even when non-decomposable divergences are considered. Like some of the problems we consider, this result holds only when the alphabet size is at least three.

preprint, 2014, arXiv

Minimax Filtering via Relations between Information and Estimation

We investigate the problem of continuous-time causal estimation under a minimax criterion. Let $X^T = \{X_t,0\leq t\leq T\}$ be governed by the probability law $P_θ$ from a class of possible laws indexed by $θ\in Λ$, and $Y^T$ be the noise corrupted observations of $X^T$ available to the estimator. We characterize the estimator minimizing the worst case regret, where regret is the difference between the causal estimation loss of the estimator and that of the optimum estimator. One of the main contributions of this paper is characterizing the minimax estimator, showing that it is in fact a Bayesian estimator. We then relate minimax regret to the channel capacity when the channel is either Gaussian or Poisson. In this case, we characterize the minimax regret and the minimax estimator more explicitly. If we further assume that the uncertainty set consists of deterministic signals, the worst case regret is exactly equal to the corresponding channel capacity, namely the maximal mutual information attainable across the channel among all possible distributions on the uncertainty set of signals. The corresponding minimax estimator is the Bayesian estimator assuming the capacity-achieving prior. Using this relation, we also show that the capacity achieving prior coincides with the least favorable input. Moreover, we show that this minimax estimator is not only minimizing the worst case regret but also essentially minimizing regret for "most" of the other sources in the uncertainty set. We present a couple of examples for the construction of a minimax filter via an approximation of the associated capacity achieving distribution.

preprint, 2013, arXiv

Capacity of a POST Channel with and without Feedback

We consider finite state channels where the state of the channel is its previous output. We refer to these as POST (Previous Output is the STate) channels. We first focus on POST($α$) channels. These channels have binary inputs and outputs, where the state determines if the channel behaves as a $Z$ or an $S$ channel, both with parameter $α$. We show that the non-feedback capacity of the POST($α$) channel equals its feedback capacity, despite the memory of the channel. The proof of this surprising result is based on showing that the induced output distribution, when maximizing the directed information in the presence of feedback, can also be achieved by an input distribution that does not utilize the feedback. We show that this is a sufficient condition for the feedback capacity to equal the non-feedback capacity for any finite state channel. We show that the result carries over from the POST($α$) channel to a binary POST channel where the previous output determines whether the current channel will be binary with parameters $(a,b)$ or $(b,a)$. Finally, we show that, in general, feedback may increase the capacity of a POST channel.

preprint, 2013, arXiv

Compression for Quadratic Similarity Queries

The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on compressed data. For a Gaussian source, we show that queries can be answered reliably if and only if the compression rate exceeds a given threshold - the identification rate - which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.

preprint, 2013, arXiv

Information, Estimation, and Lookahead in the Gaussian channel

We consider mean squared estimation with lookahead of a continuous-time signal corrupted by additive white Gaussian noise. We show that the mutual information rate function, i.e., the mutual information rate as function of the signal-to-noise ratio (SNR), does not, in general, determine the minimum mean squared error (MMSE) with fixed finite lookahead, in contrast to the special cases with 0 and infinite lookahead (filtering and smoothing errors), respectively, which were previously established in the literature. We also establish a new expectation identity under a generalized observation model where the Gaussian channel has an SNR jump at $t=0$, capturing the tradeoff between lookahead and SNR. Further, we study the class of continuous-time stationary Gauss-Markov processes (Ornstein-Uhlenbeck processes) as channel inputs, and explicitly characterize the behavior of the minimum mean squared error (MMSE) with finite lookahead and signal-to-noise ratio (SNR). The MMSE with lookahead is shown to converge exponentially rapidly to the non-causal error, with the exponent being the reciprocal of the non-causal error. We extend our results to mixtures of Ornstein-Uhlenbeck processes, and use the insight gained to present lower and upper bounds on the MMSE with lookahead for a class of stationary Gaussian input processes, whose spectrum can be expressed as a mixture of Ornstein-Uhlenbeck spectra.

preprint, 2013, arXiv

Network Compression: Worst-Case Analysis

We study the problem of communicating a distributed correlated memoryless source over a memoryless network, from source nodes to destination nodes, under quadratic distortion constraints. We establish the following two complementary results: (a) for an arbitrary memoryless network, among all distributed memoryless sources of a given correlation, Gaussian sources are least compressible, that is, they admit the smallest set of achievable distortion tuples, and (b) for any memoryless source to be communicated over a memoryless additive-noise network, among all noise processes of a given correlation, Gaussian noise admits the smallest achievable set of distortion tuples. We establish these results constructively by showing how schemes for the corresponding Gaussian problems can be applied to achieve similar performance for (source or noise) distributions that are not necessarily Gaussian but have the same covariance.

preprint, 2013, arXiv

Secure Source Coding with a Public Helper

We consider secure multi-terminal source coding problems in the presence of a public helper. Two main scenarios are studied: 1) source coding with a helper where the coded side information from the helper is eavesdropped by an external eavesdropper; 2) triangular source coding with a helper where the helper is considered as a public terminal. We are interested in how the helper can support the source transmission subject to a constraint on the amount of information leaked due to its public nature. We characterize the tradeoff between transmission rate, incurred distortion, and information leakage rate at the helper/eavesdropper in the form of a rate-distortion-leakage region for various classes of problems.

preprint, 2013, arXiv

Universal Estimation of Directed Information

Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments. The first one is a Shannon--McMillan--Breiman type estimator, similar to those used by Verdú (2005) and Cai, Kulkarni, and Verdú (2006) for estimation of other information measures. We show the almost sure and $L_1$ convergence properties of the estimator for any underlying universal probability assignment. The other three estimators map universal probability assignments to different functionals, each exhibiting relative merits such as smoothness, nonnegativity, and boundedness. We establish the consistency of these estimators in almost sure and $L_1$ senses, and derive near-optimal rates of convergence in the minimax sense under mild conditions. These estimators carry over directly to estimating other information measures of stationary ergodic finite-alphabet processes, such as entropy rate and mutual information rate, with near-optimal performance and provide alternatives to classical approaches in the existing literature. Guided by these theoretical results, the proposed estimators are implemented using the context-tree weighting algorithm as the universal probability assignment. Experiments on synthetic and real data are presented, demonstrating the potential of the proposed schemes in practice and the utility of directed information estimation in detecting and measuring causal influence and delay.

preprint2012arXiv

Achievable Error Exponents in the Gaussian Channel with Rate-Limited Feedback

We investigate the achievable error probability in communication over an AWGN discrete time memoryless channel with noiseless delay-less rate-limited feedback. For the case where the feedback rate R_FB is lower than the data rate R transmitted over the forward channel, we show that the decay of the probability of error is at most exponential in blocklength, and obtain an upper bound for the increase in the error exponent due to feedback. Furthermore, we show that the use of feedback in this case results in an error exponent that is at least R_FB higher than the error exponent in the absence of feedback. For the case where the feedback rate exceeds the forward rate (R_FB \geq R), we propose a simple iterative scheme that achieves a probability of error that decays doubly exponentially with the codeword blocklength n. More generally, for some positive integer L, we show that an L-th order exponential error decay is achievable if R_FB \geq (L-1)R. We prove that the above results hold whether the feedback constraint is expressed in terms of the average feedback rate or per channel use feedback rate. Our results show that the error exponent as a function of R_FB has a strong discontinuity at R, where it jumps from a finite value to infinity.

preprint2012arXiv

Compression with Actions

We consider the setting where actions can be used to modify a state sequence before compression. The minimum rate needed to losslessly describe the optimal modified sequence is characterized when the state sequence is either non-causally or causally available at the action encoder. The achievability is closely related to the optimal channel coding strategy for channels with states. We also extend the analysis to the lossy case.

preprint2012arXiv

Directed Information, Causal Estimation, and Communication in Continuous Time

A notion of directed information between two continuous-time processes is proposed. A key component in the definition is taking an infimum over all possible partitions of the time interval, which plays a role no less significant than the supremum over "space" partitions inherent in the definition of mutual information. Properties and operational interpretations in estimation and communication are then established for the proposed notion of directed information. For the continuous-time additive white Gaussian noise channel, it is shown that Duncan's classical relationship between causal estimation and information continues to hold in the presence of feedback upon replacing mutual information by directed information. A parallel result is established for the Poisson channel. The utility of this relationship is then demonstrated in computing the directed information rate between the input and output processes of a continuous-time Poisson channel with feedback, where the channel input process is constrained to be constant between events at the channel output. Finally, the capacity of a wide class of continuous-time channels with feedback is established via directed information, characterizing the fundamental limit on reliable communication.

preprint2012arXiv

Estimation with a helper who knows the interference

We consider the problem of estimating a signal corrupted by independent interference with the assistance of a cost-constrained helper who knows the interference causally or noncausally. When the interference is known causally, we characterize the minimum distortion incurred in estimating the desired signal. In the noncausal case, we present a general achievable scheme for discrete memoryless systems and novel lower bounds on the distortion for the binary and Gaussian settings. Our Gaussian setting coincides with that of assisted interference suppression introduced by Grover and Sahai. Our lower bound for this setting is based on the relation recently established by Verdú between divergence and minimum mean squared error. We illustrate with a few examples that this lower bound can improve on those previously developed. Our bounds also allow us to characterize the optimal distortion in several interesting regimes. Moreover, we show that causal and noncausal estimation are not equivalent for this problem. Finally, we consider the case where the desired signal is also available at the helper. We develop new lower bounds for this setting that improve on those previously developed, and characterize the optimal distortion up to a constant multiplicative factor for some regimes of interest.

preprint2012arXiv

Lossy Compression of Quality Values via Rate Distortion Theory

Motivation: Next Generation Sequencing technologies revolutionized many fields in biology by enabling the fast and cheap sequencing of large amounts of genomic data. The ever increasing sequencing capacities enabled by current sequencing machines hold a lot of promise for future applications of these technologies, but also create increasing computational challenges related to the analysis and storage of these data. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, prohibitively large for many users. Raw sequencing data consists of both the DNA sequences (reads) and per-base quality values that indicate the level of confidence in the readout of these sequences. Quality values account for about half of the required disk space in the commonly used FASTQ format and therefore their compression can significantly reduce storage requirements and speed up analysis and transmission of these data. Results: In this paper we present a framework for the lossy compression of the quality value sequences of genomic read files. Numerical experiments with reference based alignment using these quality values suggest that we can achieve significant compression with little compromise in performance for several downstream applications of interest, as is consistent with our theoretical analysis. Our framework also allows compression in a regime, below one bit per quality value, for which there are no existing compressors.

preprint2012arXiv

Multiterminal Source Coding under Logarithmic Loss

We consider the classical two-encoder multiterminal source coding problem where distortion is measured under logarithmic loss. We provide a single-letter characterization of the achievable rate distortion region for arbitrarily correlated sources with finite alphabets. In doing so, we also give the rate distortion region for the $m$-encoder CEO problem (also under logarithmic loss). Several applications and examples are given.

preprint2012arXiv

Pointwise Relations between Information and Estimation in Gaussian Noise

Many of the classical and recent relations between information and estimation in the presence of Gaussian noise can be viewed as identities between expectations of random quantities. These include the I-MMSE relationship of Guo et al.; the relative entropy and mismatched estimation relationship of Verdú; the relationship between causal estimation and mutual information of Duncan, and its extension to the presence of feedback by Kadota et al.; the relationship between causal and non-causal estimation of Guo et al., and its mismatched version of Weissman. We dispense with the expectations and explore the nature of the pointwise relations between the respective random quantities. The pointwise relations that we find are as succinctly stated as, and give considerable insight into, the original expectation identities. As an illustration of our results, consider Duncan's 1970 discovery that the mutual information is equal to the causal MMSE in the AWGN channel, which can equivalently be expressed by saying that the difference between the input-output information density and half the causal estimation error is a zero mean random variable (regardless of the distribution of the channel input). We characterize this random variable explicitly, rather than merely its expectation. Classical estimation and information theoretic quantities emerge with new and surprising roles. For example, the variance of this random variable turns out to be given by the causal MMSE (which, in turn, is equal to the mutual information by Duncan's result).

preprint2012arXiv

Reference Based Genome Compression

DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target genome, and then compresses this mapping with an entropy coder. As an illustration of the performance: applying our algorithm to James Watson's genome with hg18 as a reference, we are able to reduce the 2991 megabyte (MB) genome down to 6.99 MB, while Gzip compresses it to 834.8 MB.
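The figures quoted in this abstract can be turned into compression ratios with a few lines of arithmetic. The sketch below uses only the three sizes stated above; the function and constant names are illustrative and do not come from the paper.

```python
# File sizes quoted in the abstract, in megabytes.
ORIGINAL_MB = 2991.0
REFERENCE_BASED_MB = 6.99
GZIP_MB = 834.8


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Return how many times smaller the compressed file is than the original."""
    if compressed_mb <= 0:
        raise ValueError("compressed size must be positive")
    return original_mb / compressed_mb


ratio_reference = compression_ratio(ORIGINAL_MB, REFERENCE_BASED_MB)
ratio_gzip = compression_ratio(ORIGINAL_MB, GZIP_MB)

print(f"reference-based: {ratio_reference:.1f}x smaller")
print(f"gzip:            {ratio_gzip:.2f}x smaller")
```

On these numbers the reference-based scheme shrinks the genome by a factor of roughly 428, against a factor of about 3.6 for Gzip.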

preprint2012arXiv

Successive Refinement with Decoder Cooperation and its Channel Coding Duals

We study cooperation in multi terminal source coding models involving successive refinement. Specifically, we study the case of a single encoder and two decoders, where the encoder provides a common description to both the decoders and a private description to only one of the decoders. The decoders cooperate via cribbing, i.e., the decoder with access only to the common description is allowed to observe, in addition, a deterministic function of the reconstruction symbols produced by the other. We characterize the fundamental performance limits in the respective settings of non-causal, strictly-causal and causal cribbing. We use a new coding scheme, referred to as Forward Encoding and Block Markov Decoding, which is a variant of one recently used by Cuff and Zhao for coordination via implicit communication. Finally, we use the insight gained to introduce and solve some dual channel coding scenarios involving Multiple Access Channels with cribbing.

preprint2012arXiv

The Porosity of Additive Noise Sequences

Consider a binary additive noise channel with noiseless feedback. When the noise is a stationary and ergodic process $\mathbf{Z}$, the capacity is $1-\mathbb{H}(\mathbf{Z})$ ($\mathbb{H}(\cdot)$ denoting the entropy rate). It is shown analogously that when the noise is a deterministic sequence $z^\infty$, the capacity under finite-state encoding and decoding is $1-\bar{ρ}(z^\infty)$, where $\bar{ρ}(\cdot)$ is Lempel and Ziv's finite-state compressibility. This quantity is termed the \emph{porosity} $\underline{σ}(\cdot)$ of an individual noise sequence. A sequence of schemes is presented that universally achieve porosity for any noise sequence. These converse and achievability results may be interpreted both as a channel-coding counterpart to Ziv and Lempel's work in universal source coding and as an extension of the work by Lomnitz and Feder and Shayevitz and Feder on communication across modulo-additive channels. Additionally, a slightly more practical architecture is suggested that draws a connection with finite-state predictability, as introduced by Feder, Gutman, and Merhav.
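For the stochastic case, the capacity formula is easy to evaluate in the simplest special case. The sketch below assumes i.i.d. Bernoulli(p) noise, for which the entropy rate reduces to the binary entropy function; this special case and the function names are illustrative assumptions, and the individual-sequence porosity from the abstract is not computed here.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits; h(0) = h(1) = 0 by convention."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def capacity_iid_noise(p: float) -> float:
    """Capacity 1 - H(Z) when Z is i.i.d. Bernoulli(p), so H(Z) = h(p)."""
    return 1.0 - binary_entropy(p)


for p in (0.0, 0.11, 0.5):
    print(f"p = {p:.2f}: capacity = {capacity_iid_noise(p):.4f} bits per channel use")
```

Noise that is a fair coin flip leaves no capacity, while noiseless transmission gives one bit per channel use.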

preprint2012arXiv

To Feed or Not to Feed Back

We study the communication over Finite State Channels (FSCs), where the encoder and the decoder can control the availability or the quality of the noise-free feedback. Specifically, the instantaneous feedback is a function of an action taken by the encoder, an action taken by the decoder, and the channel output. Encoder and decoder actions take values in finite alphabets, and may be subject to average cost constraints. We prove capacity results for such a setting by constructing a sequence of achievable rates, using a simple scheme based on 'code tree' generation, that generates channel input symbols along with encoder and decoder actions. We prove that the limit of this sequence exists. For a given block length and probability of error, we give an upper bound on the maximum achievable rate. Our upper and lower bounds coincide and hence yield the capacity for the case where the probability of initial state is positive for all states. Further, for stationary indecomposable channels without intersymbol interference (ISI), the capacity is given as the limit of normalized directed information between the input and output sequence, maximized over an appropriate set of causally conditioned distributions. As an important special case, we consider the framework of 'to feed or not to feed back' where either the encoder or the decoder takes binary actions, which determine whether current channel output will be fed back to the encoder, with a constraint on the fraction of channel outputs that are fed back. As another special case of our framework, we characterize the capacity of 'coding on the backward link' in FSCs, i.e. when the decoder sends limited-rate instantaneous coded noise-free feedback on the backward link. Finally, we propose an extension of the Blahut-Arimoto algorithm for evaluating the capacity when actions can be cost constrained, and demonstrate its application on a few examples.

preprint2012arXiv

Worst-Case Source for Distributed Compression with Quadratic Distortion

We consider the k-encoder source coding problem with a quadratic distortion measure. We show that among all source distributions with a given covariance matrix K, the jointly Gaussian source requires the highest rates in order to meet a given set of distortion constraints.

preprint2011arXiv

An MCMC Approach to Universal Lossy Compression of Analog Sources

Motivated by the Markov chain Monte Carlo (MCMC) approach to the compression of discrete sources developed by Jalali and Weissman, we propose a lossy compression algorithm for analog sources that relies on a finite reproduction alphabet, which grows with the input length. The algorithm achieves, in an appropriate asymptotic sense, the optimum Shannon theoretic tradeoff between rate and distortion, universally for stationary ergodic continuous amplitude sources. We further propose an MCMC-based algorithm that resorts to a reduced reproduction alphabet when such reduction does not prevent achieving the Shannon limit. The latter algorithm is advantageous due to its reduced complexity and improved rates of convergence when employed on sources with a finite and small optimum reproduction alphabet.

preprint2011arXiv

Multi-Terminal Source Coding With Action Dependent Side Information

We consider multi-terminal source coding with a single encoder and multiple decoders where either the encoder or the decoders can take cost constrained actions which affect the quality of the side information present at the decoders. For the scenario where decoders take actions, we characterize the rate-cost trade-off region for lossless source coding, and give an achievability scheme for lossy source coding for two decoders which is optimum for a variety of special cases of interest. For the case where the encoder takes actions, we characterize the rate-cost trade-off for a class of lossless source coding scenarios with multiple decoders. Finally, we also consider extensions to other multi-terminal source coding settings with actions, and characterize the rate-distortion-cost tradeoff for a case of successive refinement with actions.

preprint2011arXiv

On Real Time Coding with Limited Lookahead

A real time coding system with lookahead consists of a memoryless source, a memoryless channel, an encoder, which encodes the source symbols sequentially with knowledge of future source symbols up to a fixed finite lookahead, d, with or without feedback of the past channel output symbols, and a decoder, which sequentially constructs the source symbols using the channel output. The objective is to minimize the expected per-symbol distortion. For a fixed finite lookahead d>=1 we invoke the theory of controlled Markov chains to obtain an average cost optimality equation (ACOE), the solution of which, denoted by D(d), is the minimum expected per-symbol distortion. With increasing d, D(d) bridges the gap between causal encoding, d=0, where symbol by symbol encoding-decoding is optimal and the infinite lookahead case, d=\infty, where Shannon Theoretic arguments show that separation is optimal. We extend the analysis to a system with finite state decoders, with or without noise-free feedback. For a Bernoulli source and binary symmetric channel, under Hamming loss, we compute the optimal distortion for various source and channel parameters, and thus obtain computable bounds on D(d). We also identify regions of source and channel parameters where symbol by symbol encoding-decoding is suboptimal. Finally, we demonstrate the wide applicability of our approach by applying it in additional coding scenarios, such as the case where the sequential decoder can take cost constrained actions affecting the quality or availability of side information about the source.

preprint2010arXiv

Cascade and Triangular Source Coding with Side Information at the First Two Nodes

We consider the cascade and triangular rate-distortion problem where side information is known to the source encoder and to the first user but not to the second user. We characterize the rate-distortion region for these problems. For the quadratic Gaussian case, we show that it is sufficient to consider jointly Gaussian distributions, a fact that leads to an explicit solution.

preprint2010arXiv

Cascade, Triangular and Two Way Source Coding with degraded side information at the second user

We consider the Cascade and Triangular rate-distortion problems where the same side information is available at the source node and User 1, and the side information available at User 2 is a degraded version of the side information at the source node and User 1. We characterize the rate-distortion region for these problems. For the Cascade setup, we show that, at User 1, decoding and re-binning the codeword sent by the source node for User 2 is optimum. We then extend our results to the Two Way Cascade and Triangular setting, where the source node is interested in lossy reconstruction of the side information at User 2 via a rate limited link from User 2 to the source node. We characterize the rate-distortion regions for these settings. Complete explicit characterizations for all settings are also given in the Quadratic Gaussian case. We conclude with two further extensions: a triangular source coding problem with a helper, and an extension of our Two Way Cascade setting in the Quadratic Gaussian case.

preprint2010arXiv

Discrete denoising of heterogenous two-dimensional data

We consider discrete denoising of two-dimensional data with characteristics that may be varying abruptly between regions. Using a quadtree decomposition technique and space-filling curves, we extend the recently developed S-DUDE (Shifting Discrete Universal DEnoiser), which was tailored to one-dimensional data, to the two-dimensional case. Our scheme competes with a genie that has access, in addition to the noisy data, also to the underlying noiseless data, and can employ $m$ different two-dimensional sliding window denoisers along $m$ distinct regions obtained by a quadtree decomposition with $m$ leaves, in a way that minimizes the overall loss. We show that, regardless of what the underlying noiseless data may be, the two-dimensional S-DUDE performs essentially as well as this genie, provided that the number of distinct regions satisfies $m=o(n)$, where $n$ is the total size of the data. The resulting algorithm complexity is still linear in both $n$ and $m$, as in the one-dimensional case. Our experimental results show that the two-dimensional S-DUDE can be effective when the characteristics of the underlying clean image vary across different regions in the data.

preprint2010arXiv

Lossy compression of discrete sources via Viterbi algorithm

We present a new lossy compressor for discrete-valued sources. For coding a sequence $x^n$, the encoder starts by assigning a certain cost to each possible reconstruction sequence. It then finds the one that minimizes this cost and describes it losslessly to the decoder via a universal lossless compressor. The cost of each sequence is a linear combination of its distance from the sequence $x^n$ and a linear function of its $k^{\rm th}$ order empirical distribution. The structure of the cost function allows the encoder to employ the Viterbi algorithm to recover the minimizer of the cost. We identify a choice of the coefficients comprising the linear function of the empirical distribution used in the cost function which ensures that the algorithm universally achieves the optimum rate-distortion performance of any stationary ergodic source in the limit of large $n$, provided that $k$ diverges as $o(\log n)$. Iterative techniques for approximating the coefficients, which alleviate the computational burden of finding the optimal coefficients, are proposed and studied.

preprint2010arXiv

Mutual Information, Relative Entropy, and Estimation in the Poisson Channel

Let $X$ be a non-negative random variable and let the conditional distribution of a random variable $Y$, given $X$, be ${Poisson}(γ\cdot X)$, for a parameter $γ\geq 0$. We identify a natural loss function such that: 1) The derivative of the mutual information between $X$ and $Y$ with respect to $γ$ is equal to the \emph{minimum} mean loss in estimating $X$ based on $Y$, regardless of the distribution of $X$. 2) When $X \sim P$ is estimated based on $Y$ by a mismatched estimator that would have minimized the expected loss had $X \sim Q$, the integral over all values of $γ$ of the excess mean loss is equal to the relative entropy between $P$ and $Q$. For a continuous time setting where $X^T = \{X_t, 0 \leq t \leq T \}$ is a non-negative stochastic process and the conditional law of $Y^T=\{Y_t, 0\le t\le T\}$, given $X^T$, is that of a non-homogeneous Poisson process with intensity function $γ\cdot X^T$, under the same loss function: 1) The minimum mean loss in \emph{causal} filtering when $γ= γ_0$ is equal to the expected value of the minimum mean loss in \emph{non-causal} filtering (smoothing) achieved with a channel whose parameter $γ$ is uniformly distributed between 0 and $γ_0$. Bridging the two quantities is the mutual information between $X^T$ and $Y^T$. 2) This relationship between the mean losses in causal and non-causal filtering holds also in the case where the filters employed are mismatched, i.e., optimized assuming a law on $X^T$ which is not the true one. Bridging the two quantities in this case is the sum of the mutual information and the relative entropy between the true and the mismatched distribution of $Y^T$. Thus, relative entropy quantifies the excess estimation loss due to mismatch in this setting. These results parallel those recently found for the Gaussian channel.

preprint2010arXiv

Probing Capacity

We consider the problem of optimal probing of states of a channel by transmitter and receiver to maximize the rate of reliable communication. The channel is discrete memoryless (DMC) with i.i.d. states. The encoder takes probing actions dependent on the message. It then uses the state information obtained from probing causally or non-causally to generate channel input symbols. The decoder may also take channel probing actions as a function of the observed channel output and use the channel state information thus acquired, along with the channel output, to estimate the message. We refer to the maximum achievable rate for reliable communication for such systems as the 'Probing Capacity'. We characterize this capacity when the encoder and decoder actions are cost constrained. To motivate the problem, we begin by characterizing the trade-off between the capacity and the fraction of channel states the encoder is allowed to observe, while the decoder is aware of channel states. In this setting of 'to observe or not to observe' state at the encoder, we compute certain numerical examples and note a pleasing phenomenon, where the encoder can observe a relatively small fraction of states and yet communicate at maximum rate, i.e. the rate when observing states at the encoder is not cost constrained.

preprint2010arXiv

Rate-Distortion via Markov Chain Monte Carlo

We propose an approach to lossy source coding, utilizing ideas from Gibbs sampling, simulated annealing, and Markov Chain Monte Carlo (MCMC). The idea is to sample a reconstruction sequence from a Boltzmann distribution associated with an energy function that incorporates the distortion between the source and reconstruction, the compressibility of the reconstruction, and the point sought on the rate-distortion curve. To sample from this distribution, we use a 'heat bath algorithm': Starting from an initial candidate reconstruction (say the original source sequence), at every iteration, an index i is chosen and the i-th sequence component is replaced by drawing from the conditional probability distribution for that component given all the rest. At the end of this process, the encoder conveys the reconstruction to the decoder using universal lossless compression. The complexity of each iteration is independent of the sequence length and only linearly dependent on a certain context parameter (which grows sub-logarithmically with the sequence length). We show that the proposed algorithms achieve optimum rate-distortion performance in the limit of a large number of iterations and large sequence length, when employed on any stationary ergodic source. Experimentation shows promising initial results. Employing our lossy compressors on noisy data, with appropriately chosen distortion measure and level, followed by a simple de-randomization operation, results in a family of denoisers that compares favorably (both theoretically and in practice) with other MCMC-based schemes, and with the Discrete Universal Denoiser (DUDE).
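The heat bath step described in this abstract can be sketched roughly as follows. Several choices here are assumptions made for illustration rather than details taken from the abstract: the energy is taken to be n times the sum of the empirical k-th order conditional entropy of the reconstruction and a slope times the Hamming distortion, the inverse temperature `beta` is held fixed instead of annealed, and contexts wrap around cyclically. This naive version also recomputes the full energy for every candidate symbol, so it does not reproduce the per-iteration complexity claimed in the abstract.

```python
import math
import random
from collections import Counter


def conditional_entropy(y, k):
    """Empirical k-th order conditional entropy of y in bits per symbol (cyclic contexts)."""
    n = len(y)
    ctx_counts = Counter()
    joint_counts = Counter()
    for i in range(n):
        # The k symbols preceding position i, wrapping around the sequence.
        ctx = tuple(y[(i - j) % n] for j in range(k, 0, -1))
        ctx_counts[ctx] += 1
        joint_counts[(ctx, y[i])] += 1
    h = 0.0
    for (ctx, _), c in joint_counts.items():
        h -= c * math.log2(c / ctx_counts[ctx])
    return h / n


def energy(x, y, k, slope):
    """Assumed energy: n * (compressibility of y + slope * Hamming distortion)."""
    n = len(x)
    distortion = sum(a != b for a, b in zip(x, y)) / n
    return n * (conditional_entropy(y, k) + slope * distortion)


def heat_bath(x, alphabet, k=2, slope=2.0, beta=1.0, sweeps=5, seed=0):
    """Resample one randomly chosen coordinate at a time from its Boltzmann conditional."""
    rng = random.Random(seed)
    y = list(x)  # start from the source sequence itself
    n = len(y)
    for _ in range(sweeps * n):
        i = rng.randrange(n)
        energies = []
        for s in alphabet:
            y[i] = s
            energies.append(energy(x, y, k, slope))
        e_min = min(energies)  # subtract the minimum for numerical stability
        weights = [math.exp(-beta * (e - e_min)) for e in energies]
        y[i] = rng.choices(alphabet, weights=weights)[0]
    return y


source = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
reconstruction = heat_bath(source, alphabet=[0, 1])
print(reconstruction)
```

A larger `slope` favors low distortion and a smaller one favors a more compressible reconstruction, which is one way to read "the point sought on the rate-distortion curve" in the abstract.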

preprint2007arXiv

Discrete Denoising with Shifts

We introduce S-DUDE, a new algorithm for denoising DMC-corrupted data. The algorithm, which generalizes the recently introduced DUDE (Discrete Universal DEnoiser) of Weissman et al., aims to compete with a genie that has access, in addition to the noisy data, also to the underlying clean data, and can choose to switch, up to $m$ times, between sliding window denoisers in a way that minimizes the overall loss. When the underlying data form an individual sequence, we show that the S-DUDE performs essentially as well as this genie, provided that $m$ is sub-linear in the size of the data. When the clean data is emitted by a piecewise stationary process, we show that the S-DUDE achieves the optimum distribution-dependent performance, provided that the same sub-linearity condition is imposed on the number of switches. To further substantiate the universal optimality of the S-DUDE, we show that when the number of switches is allowed to grow linearly with the size of the data, \emph{any} (sequence of) scheme(s) fails to compete in the above senses. Using dynamic programming, we derive an efficient implementation of the S-DUDE, which has complexity (time and memory) growing only linearly with the data size and the number of switches $m$. Preliminary experimental results are presented, suggesting that S-DUDE has the capacity to significantly improve on the performance attained by the original DUDE in applications where the nature of the data abruptly changes in time (or space), as is often the case in practice.

48 published item(s)

An Information-Theoretic Perspective on LLM Tokenizers

An Information-Theoretic Justification for Model Pruning

An Interactive Annotation Tool for Perceptual Video Compression

Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

Reducing latency and bandwidth for video streaming using keypoint extraction and digital puppetry

LFZip: Lossy compression of multivariate floating-point time series data via improved prediction

Rateless Lossy Compression via the Extremes

Strong Successive Refinability and Rate-Distortion-Complexity Tradeoff

When is Noisy State Information at the Encoder as Useless as No Information or as Good as Noise-Free State?

Distortion-Rate Function of Sub-Nyquist Sampled Gaussian Sources

Justification of Logarithmic Loss via the Benefit of Side Information

Minimax Estimation of Discrete Distributions under $\ell_1$ Loss

Minimax Estimation of Functionals of Discrete Distributions

Beyond Maximum Likelihood: from Theory to Practice

Comparison of the Achievable Rates in OFDM and Single Carrier Modulation with I.I.D. Inputs

Compression for Quadratic Similarity Queries: Finite Blocklength and Practical Schemes

Information Measures: the Curious Case of the Binary Alphabet

Minimax Filtering via Relations between Information and Estimation

Capacity of a POST Channel with and without Feedback

Compression for Quadratic Similarity Queries

Information, Estimation, and Lookahead in the Gaussian channel

Network Compression: Worst-Case Analysis

Secure Source Coding with a Public Helper

Universal Estimation of Directed Information

Achievable Error Exponents in the Gaussian Channel with Rate-Limited Feedback

Compression with Actions

Directed Information, Causal Estimation, and Communication in Continuous Time

Estimation with a helper who knows the interference

Lossy Compression of Quality Values via Rate Distortion Theory

Multiterminal Source Coding under Logarithmic Loss

Pointwise Relations between Information and Estimation in Gaussian Noise

Reference Based Genome Compression

Successive Refinement with Decoder Cooperation and its Channel Coding Duals

The Porosity of Additive Noise Sequences

To Feed or Not to Feed Back

Worst-Case Source for Distributed Compression with Quadratic Distortion

An MCMC Approach to Universal Lossy Compression of Analog Sources

Multi-Terminal Source Coding With Action Dependent Side Information

On Real Time Coding with Limited Lookahead

Cascade and Triangular Source Coding with Side Information at the First Two Nodes

Cascade, Triangular and Two Way Source Coding with degraded side information at the second user

Discrete denoising of heterogenous two-dimensional data

Lossy compression of discrete sources via Viterbi algorithm

Mutual Information, Relative Entropy, and Estimation in the Poisson Channel

Probing Capacity

Rate-Distortion via Markov Chain Monte Carlo

Discrete Denoising with Shifts