Source author record

Nir Weinberger

Nir Weinberger appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Information Theory math.IT Machine Learning Computer Vision Data Structures and Algorithms math.OC

Catalog footprint

What is connected

14 works

6 topics

4 close collaborators


preprint, 2026, arXiv

Learning-Augmented Scalable Linear Assignment Problem Optimization via Neural Dual Warm-Starts

The Linear Assignment Problem (LAP) is a fundamental combinatorial optimization task with applications ranging from computer vision to logistics. Classical exact solvers such as the Hungarian and Jonker-Volgenant (LAPJV) algorithms guarantee optimality, but their cubic time complexity $\mathcal{O}(N^{3})$ becomes a bottleneck for large-scale instances. Recent learning-based approaches aim to replace these solvers with neural models, often sacrificing exactness or failing to scale due to memory constraints. We propose a learning-augmented framework that accelerates exact assignment solvers while maintaining optimality and worst-case guarantees. Our method predicts dual variables to warm-start a classical solver, with a fallback that prevents asymptotic runtime degradation when the learned advice is unreliable. We introduce RowDualNet, a lightweight row-independent architecture that avoids the $\mathcal{O}(N^{2})$ memory bottleneck of graph-based models, enabling neural warm-starting at large scale ($N=16{,}384$). Feasibility is ensured via a constructive mechanism based on LP duality (namely, the Min-Trick), eliminating costly iterative projection. Empirically, our approach reduces the search effort of LAPJV and achieves over $2{\times}$ speedups on challenging synthetic distributions, in addition to improving over $1.25{\times}$ and $1.5{\times}$ on real-world tracking (MOT) and transportation (LPT) datasets, respectively, while strictly maintaining full optimality, effectively yielding a robust zero-shot generalization to real-world tasks.
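
The feasibility step mentioned in the abstract can be illustrated with a small sketch. It assumes the Min-Trick means completing predicted row duals `u` with column duals `v[j] = min_i (cost[i][j] - u[i])`, which is the standard LP-duality construction; the record itself only names the trick, so that reading is an assumption. The cost matrix, `u_guess`, and all function names are hypothetical.

```python
from itertools import permutations

def min_trick(cost, u):
    """Complete row duals u with column duals v so that (u, v) is dual feasible.

    Choosing v[j] = min_i (cost[i][j] - u[i]) guarantees
    u[i] + v[j] <= cost[i][j] for every pair (i, j).
    """
    n = len(cost)
    return [min(cost[i][j] - u[i] for i in range(n)) for j in range(n)]

def brute_force_optimum(cost):
    """Exact assignment cost by enumeration (tiny instances only)."""
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

cost = [[4.0, 1.0, 3.0],
        [2.0, 0.0, 5.0],
        [3.0, 2.0, 2.0]]
u_guess = [1.5, 0.5, 2.5]  # hypothetical stand-in for predicted row duals

v = min_trick(cost, u_guess)
is_feasible = all(u_guess[i] + v[j] <= cost[i][j] + 1e-12
                  for i in range(3) for j in range(3))
lower_bound = sum(u_guess) + sum(v)   # dual objective, by weak duality
optimum = brute_force_optimum(cost)
print(is_feasible, lower_bound, optimum)
```

By weak duality, any feasible pair gives a lower bound on the optimal assignment cost, so a poor prediction can weaken the warm start but cannot make the duals infeasible.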

preprint, 2026, arXiv

On the Capacity of Noisy Frequency-based Channels

We investigate the capacity of noisy frequency-based channels, motivated by DNA data storage in the short-molecule regime, where information is encoded in the frequency of item types rather than their order. The channel output is a histogram formed by random sampling of items, followed by noisy item identification. While the capacity of the noiseless frequency-based channel has been previously addressed, the effect of identification noise has not been fully characterized. We present a converse bound on the channel capacity that follows from stochastic degradation and the data processing inequality. We then establish an achievable bound, which is based on a Poissonization of the multinomial sampling process, and an analysis of the resulting vector Poisson channel with inter-symbol interference. This analysis refines concentration inequalities for the information density used in the Feinstein bound, and explicitly characterizes an additive loss in the mutual information due to identification noise. We apply our results to a DNA storage channel in the short-molecule regime, and quantify the resulting loss in the scaling of the total number of reliably stored bits.
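
The channel described above (random sampling of items, then noisy identification, then a histogram) can be simulated in a few lines. This is only an illustrative sketch: the frequency vector, the confusion matrix, and the function name are hypothetical, and it does not reproduce the Poissonization or the capacity bounds of the paper.

```python
import random

def noisy_histogram(freqs, confusion, n_samples, rng):
    """Sample item types from freqs, misidentify them via confusion, count."""
    k = len(freqs)
    hist = [0] * k
    for _ in range(n_samples):
        item = rng.choices(range(k), weights=freqs)[0]            # sampling step
        seen = rng.choices(range(k), weights=confusion[item])[0]  # noisy identification
        hist[seen] += 1
    return hist

rng = random.Random(0)
freqs = [0.5, 0.3, 0.2]              # input: frequencies of item types
confusion = [[0.90, 0.05, 0.05],     # row i: P(identified as j | true type i)
             [0.05, 0.90, 0.05],
             [0.05, 0.05, 0.90]]

hist = noisy_histogram(freqs, confusion, 1000, rng)

# Mean of the normalized histogram: the input frequencies pushed through the noise.
expected = [sum(freqs[i] * confusion[i][j] for i in range(3)) for j in range(3)]
print(hist, expected)
```

The identification noise blurs the input frequencies toward each other, which is the effect whose cost in mutual information the abstract says is characterized.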

preprint, 2022, arXiv

Error Probability Bounds for Coded-Index DNA Storage

The DNA storage channel is considered, in which a codeword is comprised of $M$ unordered DNA molecules. At reading time, $N$ molecules are sampled with replacement, and then each molecule is sequenced. A coded-index concatenated-coding scheme is considered, in which the $m$th molecule of the codeword is restricted to a subset of all possible molecules (an inner code), which is unique for each $m$. The decoder has low complexity, and is based on first decoding each molecule separately (the inner code), and then decoding the sequence of molecules (an outer code). Only mild assumptions are made on the sequencing channel, in the form of the existence of an inner code and decoder with vanishing error. The error probability of a random code as well as an expurgated code is analyzed and shown to decay exponentially with $N$. This establishes the importance of increasing the coverage depth $N/M$ in order to obtain low error probability.
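
A quick simulation gives some intuition for why the coverage depth $N/M$ matters. When $N$ molecules are drawn with replacement out of $M$, the expected fraction never read is $(1 - 1/M)^N \approx e^{-N/M}$. The sketch below checks this numerically; it illustrates the sampling step only, not the error-probability analysis of the paper, and the function name and parameter values are hypothetical.

```python
import math
import random

def unsampled_fraction(m, n, rng):
    """Fraction of m molecules that never appear in n draws with replacement."""
    seen = {rng.randrange(m) for _ in range(n)}
    return 1 - len(seen) / m

rng = random.Random(1)
m = 10_000
for depth in (1, 2, 5):                      # coverage depth N/M
    frac = unsampled_fraction(m, depth * m, rng)
    print(depth, round(frac, 4), round(math.exp(-depth), 4))
```

Molecules that are never sampled must be recovered by the outer code, so a larger coverage depth leaves fewer erasures to correct.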

preprint, 2022, arXiv

On Information Bottleneck for Gaussian Processes

The information bottleneck (IB) problem of jointly stationary Gaussian sources is considered. A water-filling solution for the IB rate is given in terms of its SNR spectrum, and this rate is attained via a frequency-domain test-channel realization. A time-domain realization of the IB rate, based on linear prediction, is also proposed, which lends itself to an efficient implementation of the corresponding remote source-coding problem. A compound version of the problem is addressed, in which the joint distribution of the source is not precisely specified, but is instead constrained by a lower bound on the guaranteed mutual information. It is proved that a white SNR spectrum is optimal for this setting.

preprint, 2022, arXiv

Robust Linear Regression for General Feature Distribution

We investigate robust linear regression where data may be contaminated by an oblivious adversary, i.e., an adversary that may know the data distribution but is otherwise oblivious to the realizations of the data samples. This model has been previously analyzed under strong assumptions. Concretely, $\textbf{(i)}$ all previous works assume that the covariance matrix of the features is positive definite; and $\textbf{(ii)}$ most of them assume that the features are centered (i.e., zero mean). Additionally, all previous works make further restrictive assumptions, e.g., assuming that the features are Gaussian or that the corruptions are symmetrically distributed. In this work we go beyond these assumptions and investigate robust regression under a more general set of assumptions: $\textbf{(i)}$ we allow the covariance matrix to be either positive definite or positive semi-definite, $\textbf{(ii)}$ we do not necessarily assume that the features are centered, $\textbf{(iii)}$ we make no further assumption beyond boundedness (sub-Gaussianity) of features and measurement noise. Under these assumptions we analyze a natural SGD variant for this problem and show that it enjoys a fast convergence rate when the covariance matrix is positive definite. In the positive semi-definite case we show that there are two regimes: if the features are centered we can obtain a standard convergence rate; otherwise the adversary can cause any learner to fail arbitrarily.

preprint, 2022, arXiv

The Compound Information Bottleneck Outlook

We formulate and analyze the compound information bottleneck programming. In this problem, a Markov chain $ \mathsf{X} \rightarrow \mathsf{Y} \rightarrow \mathsf{Z} $ is assumed with fixed marginal distributions $\mathsf{P}_{\mathsf{X}}$ and $\mathsf{P}_{\mathsf{Y}}$, and the mutual information between $ \mathsf{X} $ and $ \mathsf{Z} $ is sought to be maximized over the choice of conditional probability of $\mathsf{Z}$ given $\mathsf{Y}$ from a given class, under the \textit{worst choice} of the joint probability of the pair $(\mathsf{X},\mathsf{Y})$ from a different class. We consider several classes based on extremes of: mutual information; minimal correlation; total variation; and the relative entropy class. We provide values, bounds, and various characterizations for specific instances of this problem: the binary symmetric case, the scalar Gaussian case, the vector Gaussian case and the symmetric modulo-additive case. Finally, for the general case, we propose a Blahut-Arimoto type of alternating iterations algorithm to find a consistent solution to this problem.

preprint, 2022, arXiv

The DNA Storage Channel: Capacity and Error Probability

The DNA storage channel is considered, in which the $M$ Deoxyribonucleic acid (DNA) molecules comprising each codeword are stored without order, sampled $N$ times with replacement, and then sequenced over a discrete memoryless channel. For a constant coverage depth $M/N$ and molecule length scaling $\Theta(\log M)$, lower (achievability) and upper (converse) bounds on the capacity of the channel, as well as a lower (achievability) bound on the reliability function of the channel are provided. Both the lower and upper bounds on the capacity generalize a bound which was previously known to hold only for the binary symmetric sequencing channel, and only under certain restrictions on the molecule length scaling and the crossover probability parameters. When specialized to the binary symmetric sequencing channel, these restrictions are completely removed for the lower bound and are significantly relaxed for the upper bound in the high-noise regime. The lower bound on the reliability function is achieved under a universal decoder, and reveals that the dominant error event is that of outage -- the event in which the capacity of the channel induced by the DNA molecule sampling operation does not support the target rate.

preprint, 2018, arXiv

Guessing with a Bit of Help

What is the value of a single bit to a guesser? We study this problem in a setup where Alice wishes to guess an i.i.d. random vector, and can procure one bit of information from Bob, who observes this vector through a memoryless channel. We are interested in the guessing efficiency, which we define as the best possible multiplicative reduction in Alice's guessing-moments obtainable by observing Bob's bit. For the case of a uniform binary vector observed through a binary symmetric channel, we provide two lower bounds on the guessing efficiency by analyzing the performance of the Dictator and Majority functions, and two upper bounds via maximum entropy and Fourier-analytic / hypercontractivity arguments. We then extend our maximum entropy argument to give a lower bound on the guessing efficiency for a general channel with a binary uniform input, via the strong data-processing inequality constant of the reverse channel. We compute this bound for the binary erasure channel, and conjecture that Greedy Dictator functions achieve the guessing efficiency.
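
For small block lengths, the setup above can be evaluated by brute force. The sketch below computes, for a given Boolean function of Bob's observation, the ratio between Alice's guessing moment with the bit and without it, for a uniform binary vector seen through a binary symmetric channel. The function names, the choice $n = 3$, and the crossover values are hypothetical; the guessing efficiency in the abstract is the best such ratio over all functions, which this sketch does not search for.

```python
from itertools import product

def guessing_moment_ratio(n, delta, func, rho=1):
    """Ratio E[G^rho | one bit of help] / E[G^rho | no help] for a uniform X^n.

    Bob sees Y^n = X^n through a BSC(delta) and sends the bit func(Y^n).
    Alice guesses candidates in decreasing order of posterior probability.
    """
    xs = list(product((0, 1), repeat=n))
    joint = {0: {x: 0.0 for x in xs}, 1: {x: 0.0 for x in xs}}  # P(x, bit)
    for x in xs:
        for y in xs:
            flips = sum(a != b for a, b in zip(x, y))
            p = (delta ** flips) * ((1 - delta) ** (n - flips)) / len(xs)
            joint[func(y)][x] += p
    helped = 0.0
    for b in (0, 1):
        ranked = sorted(joint[b].values(), reverse=True)
        helped += sum((k ** rho) * p for k, p in enumerate(ranked, start=1))
    blind = sum(k ** rho for k in range(1, len(xs) + 1)) / len(xs)
    return helped / blind

def dictator(y):
    return y[0]

def majority(y):
    return int(sum(y) > len(y) / 2)

for delta in (0.0, 0.1, 0.3):
    print(delta,
          guessing_moment_ratio(3, delta, dictator),
          guessing_moment_ratio(3, delta, majority))
```

A ratio of 1 means the bit is worthless, which is what happens at crossover probability 0.5, where Bob's observation is independent of Alice's vector.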

preprint, 2016, arXiv

Lower Bounds on Parameter Modulation-Estimation Under Bandwidth Constraints

We consider the problem of modulating the value of a parameter onto a band-limited signal to be transmitted over a continuous-time, additive white Gaussian noise (AWGN) channel, and estimating this parameter at the receiver. The performance is measured by the mean power-$\alpha$ error (MP$\alpha$E), which is defined as the worst-case $\alpha$-th order moment of the absolute estimation error. The optimal exponential decay rate of the MP$\alpha$E as a function of the transmission time is investigated. Two upper (converse) bounds on the MP$\alpha$E exponent are derived, on the basis of known bounds for the AWGN channel of inputs with unlimited bandwidth. The bounds are computed for typical values of the error moment and the signal-to-noise ratio (SNR), and the SNR asymptotics of the different bounds are analyzed. The new bounds are compared to known converse and achievability bounds, which were derived from channel coding considerations.

preprint, 2016, arXiv

On the Optimal Boolean Function for Prediction under Quadratic Loss

Suppose $Y^{n}$ is obtained by observing a uniform Bernoulli random vector $X^{n}$ through a binary symmetric channel. Courtade and Kumar asked how large the mutual information between $Y^{n}$ and a Boolean function $\mathsf{b}(X^{n})$ could be, and conjectured that the maximum is attained by a dictator function. An equivalent formulation of this conjecture is that dictator minimizes the prediction cost in a sequential prediction of $Y^{n}$ under logarithmic loss, given $\mathsf{b}(X^{n})$. In this paper, we study the question of minimizing the sequential prediction cost under a different (proper) loss function - the quadratic loss. In the noiseless case, we show that majority asymptotically minimizes this prediction cost among all Boolean functions. We further show that for weak noise, majority is better than dictator, and that for strong noise dictator outperforms majority. We conjecture that for quadratic loss, there is no single sequence of Boolean functions that is simultaneously (asymptotically) optimal at all noise levels.

preprint, 2015, arXiv

A Large Deviations Approach to Secure Lossy Compression

We consider a Shannon cipher system for memoryless sources, in which distortion is allowed at the legitimate decoder. The source is compressed using a rate distortion code secured by a shared key, which satisfies a constraint on the compression rate, as well as a constraint on the exponential rate of the excess-distortion probability at the legitimate decoder. Secrecy is measured by the exponential rate of the exiguous-distortion probability at the eavesdropper, rather than by the traditional measure of equivocation. We define the perfect secrecy exponent as the maximal exiguous-distortion exponent achievable when the key rate is unlimited. Under limited key rate, we prove that the maximal achievable exiguous-distortion exponent is equal to the minimum between the average key rate and the perfect secrecy exponent, for a fairly general class of variable key rate codes.

preprint, 2015, arXiv

Channel Detection in Coded Communication

We consider the problem of block-coded communication, where in each block, the channel law belongs to one of two disjoint sets. The decoder is aimed to decode only messages that have undergone a channel from one of the sets, and thus has to detect the set which contains the prevailing channel. We begin with the simplified case where each of the sets is a singleton. For any given code, we derive the optimum detection/decoding rule in the sense of the best trade-off among the probabilities of decoding error, false alarm, and misdetection, and also introduce sub-optimal detection/decoding rules which are simpler to implement. Then, various achievable bounds on the error exponents are derived, including the exact single-letter characterization of the random coding exponents for the optimal detector/decoder. We then extend the random coding analysis to general sets of channels, and show that there exists a universal detector/decoder which performs asymptotically as well as the optimal detector/decoder, when tuned to detect a channel from a specific pair of channels. The case of a pair of binary symmetric channels is discussed in detail.

preprint, 2014, arXiv

Optimum Trade-offs Between the Error Exponent and the Excess-Rate Exponent of Variable-Rate Slepian-Wolf Coding

We analyze the optimal trade-off between the error exponent and the excess-rate exponent for variable-rate Slepian-Wolf codes. In particular, we first derive upper (converse) bounds on the optimal error and excess-rate exponents, and then lower (achievable) bounds, via a simple class of variable-rate codes which assign the same rate to all source blocks of the same type class. Then, using the exponent bounds, we derive bounds on the optimal rate functions, namely, the minimal rate assigned to each type class, needed in order to achieve a given target error exponent. The resulting excess-rate exponent is then evaluated. Iterative algorithms are provided for the computation of both bounds on the optimal rate functions and their excess-rate exponents. The resulting Slepian-Wolf codes bridge between the two extremes of fixed-rate coding, which has minimal error exponent and maximal excess-rate exponent, and average-rate coding, which has maximal error exponent and minimal excess-rate exponent.

preprint, 2014, arXiv

Simplified Erasure/List Decoding

We consider the problem of erasure/list decoding using certain classes of simplified decoders. Specifically, we assume a class of erasure/list decoders, such that a codeword is in the list if its likelihood is larger than a threshold. This class of decoders both approximates the optimal decoder of Forney, and also includes the following simplified subclasses of decoding rules: The first is a function of the output vector only, but not the codebook (which is most suitable for high rates), and the second is a scaled version of the maximum likelihood decoder (which is most suitable for low rates). We provide single-letter expressions for the exact random coding exponents of any decoder in these classes, operating over a discrete memoryless channel. For each class of decoders, we find the optimal decoder within the class, in the sense that it maximizes the erasure/list exponent, under a given constraint on the error exponent. We establish the optimality of the simplified decoders of the first and second kind for low and high rates, respectively.
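
The decoding rule described above, where a codeword enters the list if its likelihood exceeds a threshold, can be sketched as follows. The example assumes a binary symmetric channel as a simple instance of a discrete memoryless channel, together with a hypothetical three-word codebook and a fixed, hand-picked threshold. The paper's subclasses choose the threshold as a function of the output vector or of the maximum likelihood, which this sketch does not implement.

```python
def bsc_likelihood(codeword, received, p):
    """P(received | codeword) over a binary symmetric channel with crossover p."""
    d = sum(a != b for a, b in zip(codeword, received))
    return (p ** d) * ((1 - p) ** (len(codeword) - d))

def threshold_list_decode(codebook, received, p, threshold):
    """Return every codeword whose likelihood exceeds the threshold.

    An empty list is an erasure; more than one entry is a list output.
    """
    return [c for c in codebook if bsc_likelihood(c, received, p) > threshold]

codebook = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1), (1, 1, 0, 0, 0)]  # hypothetical
p, threshold = 0.1, 0.05                                        # hypothetical

for received in [(0, 0, 0, 0, 0), (1, 0, 0, 0, 0), (0, 0, 1, 1, 0)]:
    print(received, threshold_list_decode(codebook, received, p, threshold))
```

Raising the threshold trades list outputs for erasures, which mirrors the trade-off between the erasure/list exponent and the error exponent that the abstract describes.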
