Researcher profile

Or Ordentlich

Or Ordentlich contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

High-Rate Quantized Matrix Multiplication II

This is the second part of the work investigating quantized matrix multiplication (MatMul). In part I we considered the case of calibration-free quantization, whereas here we discuss the setting where covariance matrix $Σ_X$ of the columns of the second factor is available. This setting arises in the ubiquitous task of weight-only post-training quantization of LLMs. Weight-only quantization is related to the problem of weighted mean squared error (WMSE) source coding, whose classical (reverse) waterfilling solution dictates how one should distribute rate between coordinates of the vector. We show how waterfilling can be used to improve practical LLM quantization algorithms (GPTQ), which at present allocate rate equally. A recent scheme (known as ``WaterSIC'') that only uses scalar INT quantizers is analyzed and its high-rate performance is shown to be (a) basis free (i.e., characterized by the determinant of $Σ_X$ and, thus, unlike existing schemes, is immune to applying random rotations); and (b) within a multiplicative factor of $\frac{2πe}{12}$ (or 0.25 bit/entry) of the information-theoretic distortion limit. GPTQ's performance, in turn, is affected by the choice of basis, but for a random rotation and actual $Σ_X$ from Llama-3-8B we find it to be within 0.1 bit (depending on the layer type) of WaterSIC, suggesting that GPTQ with random rotation is also near optimal, at least in the high-rate regime.

preprint2022arXiv

Deterministic Finite-Memory Bias Estimation

In this paper we consider the problem of estimating a Bernoulli parameter using finite memory. Let $X_1,X_2,\ldots$ be a sequence of independent identically distributed Bernoulli random variables with expectation $θ$, where $θ\in [0,1]$. Consider a finite-memory deterministic machine with $S$ states, that updates its state $M_n \in \{1,2,\ldots,S\}$ at each time according to the rule $M_n = f(M_{n-1},X_n)$, where $f$ is a deterministic time-invariant function. Assume that the machine outputs an estimate at each time point according to some fixed mapping from the state space to the unit interval. The quality of the estimation procedure is measured by the asymptotic risk, which is the long-term average of the instantaneous quadratic risk. The main contribution of this paper is an upper bound on the smallest worst-case asymptotic risk any such machine can attain. This bound coincides with a lower bound derived by Leighton and Rivest, to imply that $Θ(1/S)$ is the minimax asymptotic risk for deterministic $S$-state machines. In particular, our result disproves a longstanding $Θ(\log S/S)$ conjecture for this quantity, also posed by Leighton and Rivest.

preprint2022arXiv

On The Memory Complexity of Uniformity Testing

In this paper we consider the problem of uniformity testing with limited memory. We observe a sequence of independent identically distributed random variables drawn from a distribution $p$ over $[n]$, which is either uniform or is $\varepsilon$-far from uniform under the total variation distance, and our goal is to determine the correct hypothesis. At each time point we are allowed to update the state of a finite-memory machine with $S$ states, where each state of the machine is assigned one of the hypotheses, and we are interested in obtaining an asymptotic probability of error at most $0<δ<1/2$ uniformly under both hypotheses. The main contribution of this paper is deriving upper and lower bounds on the number of states $S$ needed in order to achieve a constant error probability $δ$, as a function of $n$ and $\varepsilon$, where our upper bound is $O(\frac{n\log n}{\varepsilon})$ and our lower bound is $Ω(n+\frac{1}{\varepsilon})$. Prior works in the field have almost exclusively used collision counting for upper bounds, and the Paninski mixture for lower bounds. Somewhat surprisingly, in the limited memory with unlimited samples setup, the optimal solution does not involve counting collisions, and the Paninski prior is not hard. Thus, different proof techniques are needed in order to attain our bounds.

preprint2022arXiv

Spiked Covariance Estimation from Modulo-Reduced Measurements

Consider the rank-1 spiked model: $\bf{X}=\sqrtνξ\bf{u}+ \bf{Z}$, where $ν$ is the spike intensity, $\bf{u}\in\mathbb{S}^{k-1}$ is an unknown direction and $ξ\sim \mathcal{N}(0,1),\bf{Z}\sim \mathcal{N}(\bf{0},\bf{I})$. Motivated by recent advances in analog-to-digital conversion, we study the problem of recovering $\bf{u}\in \mathbb{S}^{k-1}$ from $n$ i.i.d. modulo-reduced measurements $\bf{Y}=[\bf{X}]\mod Δ$, focusing on the high-dimensional regime ($k\gg 1$). We develop and analyze an algorithm that, for most directions $\bf{u}$ and $ν=\mathrm{poly}(k)$, estimates $\bf{u}$ to high accuracy using $n=\mathrm{poly}(k)$ measurements, provided that $Δ\gtrsim \sqrt{\log k}$. Up to constants, our algorithm accurately estimates $\bf{u}$ at the smallest possible $Δ$ that allows (in an information-theoretic sense) to recover $\bf{X}$ from $\bf{Y}$. A key step in our analysis involves estimating the probability that a line segment of length $\approx\sqrtν$ in a random direction $\bf{u}$ passes near a point in the lattice $Δ\mathbb{Z}^k$. Numerical experiments show that the developed algorithm performs well even in a non-asymptotic setting.

preprint2020arXiv

A Note on the Probability of Rectangles for Correlated Binary Strings

Consider two sequences of $n$ independent and identically distributed fair coin tosses, $X=(X_1,\ldots,X_n)$ and $Y=(Y_1,\ldots,Y_n)$, which are $ρ$-correlated for each $j$, i.e. $\mathbb{P}[X_j=Y_j] = {1+ρ\over 2}$. We study the question of how large (small) the probability $\mathbb{P}[X \in A, Y\in B]$ can be among all sets $A,B\subset\{0,1\}^n$ of a given cardinality. For sets $|A|,|B| = Θ(2^n)$ it is well known that the largest (smallest) probability is approximately attained by concentric (anti-concentric) Hamming balls, and this can be proved via the hypercontractive inequality (reverse hypercontractivity). Here we consider the case of $|A|,|B| = 2^{Θ(n)}$. By applying a recent extension of the hypercontractive inequality of Polyanskiy-Samorodnitsky (J. Functional Analysis, 2019), we show that Hamming balls of the same size approximately maximize $\mathbb{P}[X \in A, Y\in B]$ in the regime of $ρ\to 1$. We also prove a similar tight lower bound, i.e. show that for $ρ\to 0$ the pair of opposite Hamming balls approximately minimizes the probability $\mathbb{P}[X \in A, Y\in B]$.

preprint2020arXiv

An Information-Theoretic Proof of the Streaming Switching Lemma for Symmetric Encryption

Motivated by a fundamental paradigm in cryptography, we consider a recent variant of the classic problem of bounding the distinguishing advantage between a random function and a random permutation. Specifically, we consider the problem of deciding whether a sequence of $q$ values was sampled uniformly with or without replacement from $[N]$, where the decision is made by a streaming algorithm restricted to using at most $s$ bits of internal memory. In this work, the distinguishing advantage of such an algorithm is measured by the KL divergence between the distributions of its output as induced under the two cases. We show that for any $s=Ω(\log N)$ the distinguishing advantage is upper bounded by $O(q \cdot s / N)$, and even by $O(q \cdot s / N \log N)$ when $q \leq N^{1 - ε}$ for any constant $ε> 0$ where it is nearly tight with respect to the KL divergence.

preprint2020arXiv

Binary Hypothesis Testing with Deterministic Finite-Memory Decision Rules

In this paper we consider the problem of binary hypothesis testing with finite memory systems. Let $X_1,X_2,\ldots$ be a sequence of independent identically distributed Bernoulli random variables, with expectation $p$ under $\mathcal{H}_0$ and $q$ under $\mathcal{H}_1$. Consider a finite-memory deterministic machine with $S$ states that updates its state $M_n \in \{1,2,\ldots,S\}$ at each time according to the rule $M_n = f(M_{n-1},X_n)$, where $f$ is a deterministic time-invariant function. Assume that we let the process run for a very long time ($n\rightarrow \infty)$, and then make our decision according to some mapping from the state space to the hypothesis space. The main contribution of this paper is a lower bound on the Bayes error probability $P_e$ of any such machine. In particular, our findings show that the ratio between the maximal exponential decay rate of $P_e$ with $S$ for a deterministic machine and for a randomized one, can become unbounded, complementing a result by Hellman.

preprint2020arXiv

New bounds on the density of lattice coverings

We obtain new upper bounds on the minimal density of lattice coverings of Euclidean space by dilates of a convex body K. We also obtain bounds on the probability (with respect to the natural Haar-Siegel measure on the space of lattices) that a randomly chosen lattice L satisfies that L+K is all of space. As a step in the proof, we utilize and strengthen results on the discrete Kakeya problem.