Researcher profile

Ofer Shayevitz

Ofer Shayevitz contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2022arXiv

Deterministic Finite-Memory Bias Estimation

In this paper we consider the problem of estimating a Bernoulli parameter using finite memory. Let $X_1,X_2,\ldots$ be a sequence of independent identically distributed Bernoulli random variables with expectation $θ$, where $θ\in [0,1]$. Consider a finite-memory deterministic machine with $S$ states, that updates its state $M_n \in \{1,2,\ldots,S\}$ at each time according to the rule $M_n = f(M_{n-1},X_n)$, where $f$ is a deterministic time-invariant function. Assume that the machine outputs an estimate at each time point according to some fixed mapping from the state space to the unit interval. The quality of the estimation procedure is measured by the asymptotic risk, which is the long-term average of the instantaneous quadratic risk. The main contribution of this paper is an upper bound on the smallest worst-case asymptotic risk any such machine can attain. This bound coincides with a lower bound derived by Leighton and Rivest, to imply that $Θ(1/S)$ is the minimax asymptotic risk for deterministic $S$-state machines. In particular, our result disproves a longstanding $Θ(\log S/S)$ conjecture for this quantity, also posed by Leighton and Rivest.

preprint2022arXiv

On The Memory Complexity of Uniformity Testing

In this paper we consider the problem of uniformity testing with limited memory. We observe a sequence of independent identically distributed random variables drawn from a distribution $p$ over $[n]$, which is either uniform or is $\varepsilon$-far from uniform under the total variation distance, and our goal is to determine the correct hypothesis. At each time point we are allowed to update the state of a finite-memory machine with $S$ states, where each state of the machine is assigned one of the hypotheses, and we are interested in obtaining an asymptotic probability of error at most $0<δ<1/2$ uniformly under both hypotheses. The main contribution of this paper is deriving upper and lower bounds on the number of states $S$ needed in order to achieve a constant error probability $δ$, as a function of $n$ and $\varepsilon$, where our upper bound is $O(\frac{n\log n}{\varepsilon})$ and our lower bound is $Ω(n+\frac{1}{\varepsilon})$. Prior works in the field have almost exclusively used collision counting for upper bounds, and the Paninski mixture for lower bounds. Somewhat surprisingly, in the limited memory with unlimited samples setup, the optimal solution does not involve counting collisions, and the Paninski prior is not hard. Thus, different proof techniques are needed in order to attain our bounds.

preprint2021arXiv

Learning User Preferences in Non-Stationary Environments

Recommendation systems often use online collaborative filtering (CF) algorithms to identify items a given user likes over time, based on ratings that this user and a large number of other users have provided in the past. This problem has been studied extensively when users&#39; preferences do not change over time (static case); an assumption that is often violated in practical settings. In this paper, we introduce a novel model for online non-stationary recommendation systems which allows for temporal uncertainties in the users&#39; preferences. For this model, we propose a user-based CF algorithm, and provide a theoretical analysis of its achievable reward. Compared to related non-stationary multi-armed bandit literature, the main fundamental difficulty in our model lies in the fact that variations in the preferences of a certain user may affect the recommendations for other users severely. We also test our algorithm over real-world datasets, showing its effectiveness in real-world applications. One of the main surprising observations in our experiments is the fact our algorithm outperforms other static algorithms even when preferences do not change over time. This hints toward the general conclusion that in practice, dynamic algorithms, such as the one we propose, might be beneficial even in stationary environments.

preprint2020arXiv

A Note on the Probability of Rectangles for Correlated Binary Strings

Consider two sequences of $n$ independent and identically distributed fair coin tosses, $X=(X_1,\ldots,X_n)$ and $Y=(Y_1,\ldots,Y_n)$, which are $ρ$-correlated for each $j$, i.e. $\mathbb{P}[X_j=Y_j] = {1+ρ\over 2}$. We study the question of how large (small) the probability $\mathbb{P}[X \in A, Y\in B]$ can be among all sets $A,B\subset\{0,1\}^n$ of a given cardinality. For sets $|A|,|B| = Θ(2^n)$ it is well known that the largest (smallest) probability is approximately attained by concentric (anti-concentric) Hamming balls, and this can be proved via the hypercontractive inequality (reverse hypercontractivity). Here we consider the case of $|A|,|B| = 2^{Θ(n)}$. By applying a recent extension of the hypercontractive inequality of Polyanskiy-Samorodnitsky (J. Functional Analysis, 2019), we show that Hamming balls of the same size approximately maximize $\mathbb{P}[X \in A, Y\in B]$ in the regime of $ρ\to 1$. We also prove a similar tight lower bound, i.e. show that for $ρ\to 0$ the pair of opposite Hamming balls approximately minimizes the probability $\mathbb{P}[X \in A, Y\in B]$.

preprint2020arXiv

Binary Hypothesis Testing with Deterministic Finite-Memory Decision Rules

In this paper we consider the problem of binary hypothesis testing with finite memory systems. Let $X_1,X_2,\ldots$ be a sequence of independent identically distributed Bernoulli random variables, with expectation $p$ under $\mathcal{H}_0$ and $q$ under $\mathcal{H}_1$. Consider a finite-memory deterministic machine with $S$ states that updates its state $M_n \in \{1,2,\ldots,S\}$ at each time according to the rule $M_n = f(M_{n-1},X_n)$, where $f$ is a deterministic time-invariant function. Assume that we let the process run for a very long time ($n\rightarrow \infty)$, and then make our decision according to some mapping from the state space to the hypothesis space. The main contribution of this paper is a lower bound on the Bayes error probability $P_e$ of any such machine. In particular, our findings show that the ratio between the maximal exponential decay rate of $P_e$ with $S$ for a deterministic machine and for a randomized one, can become unbounded, complementing a result by Hellman.

preprint2020arXiv

Sharp Thresholds of the Information Cascade Fragility Under a Mismatched Model

We analyze a sequential decision making model in which decision makers (or, players) take their decisions based on their own private information as well as the actions of previous decision makers. Such decision making processes often lead to what is known as the \emph{information cascade} or \emph{herding} phenomenon. Specifically, a cascade develops when it seems rational for some players to abandon their own private information and imitate the actions of earlier players. The risk, however, is that if the initial decisions were wrong, then the whole cascade will be wrong. Nonetheless, information cascade are known to be fragile: there exists a sequence of \emph{revealing} probabilities $\{p_{\ell}\}_{\ell\geq1}$, such that if with probability $p_{\ell}$ player $\ell$ ignores the decisions of previous players, and rely on his private information only, then wrong cascades can be avoided. Previous related papers which study the fragility of information cascades always assume that the revealing probabilities are known to all players perfectly, which might be unrealistic in practice. Accordingly, in this paper we study a mismatch model where players believe that the revealing probabilities are $\{q_\ell\}_{\ell\in\mathbb{N}}$ when they truly are $\{p_\ell\}_{\ell\in\mathbb{N}}$, and study the effect of this mismatch on information cascades. We consider both adversarial and probabilistic sequential decision making models, and derive closed-form expressions for the optimal learning rates at which the error probability associated with a certain decision maker goes to zero. We prove several novel phase transitions in the behaviour of the asymptotic learning rate.

preprint2020arXiv

Simple Modulo can Significantly Outperform Deep Learning-based Deepcode

Deepcode (H.Kim et al.2018) is a recently suggested Deep Learning-based scheme for communication over the AWGN channel with noisy feedback, claimed to be superior to all previous schemes in the literature. Deepcode&#39;s use of nonlinear coding (via Deep Learning) has been inspired by known shortcomings (Y.-H. Kim et al 2007) of linear feedback schemes. In 2014, we presented a nonlinear feedback coding scheme based on a combination of the classical SK scheme and modulo-arithmetic, using a small number of elementary operations without any type of neural network. This Modulo-SK scheme has been omitted from the performance comparisons made in the Deepcode paper, due to its use of common randomness (dither), and in a later version since it was incorrectly interpreted as a variable-length coding scheme. However, the dither in Modulo-SK was used only for the standard purpose of tractable performance analysis, and is not required in practice. In this short note, we show that a fully-deterministic Modulo-SK (without dithering) can outperform Deepcode. For example, to attain an error probability of 10^(-4) at rate 1/3 Modulo-SK requires 3dB less feedback SNR than Deepcode. To attain an error probability of 10^(-6) with noiseless feedback, Deepcode requires 150 rounds of communication, whereas Modulo-SK requires only 15 rounds, even if the feedback is noisy (with 27dB SNR). We further address the numerical stability issues of the original SK scheme reported in the Deepcode paper, and explain how they can be avoided. We augment this report with an online-available, fully-functional Matlab simulation for both the classical and Modulo-SK schemes. Finally, note that Modulo-SK is by no means claimed to be the best possible solution; in particular, using deep learning in conjunction with modulo-arithmetic might lead to better designs, and remains a fascinating direction for future research.

preprint2019arXiv

On the Interactive Capacity of Finite-State Protocols

The interactive capacity of a noisy channel is the highest possible rate at which arbitrary interactive protocols can be simulated reliably over the channel. Determining the interactive capacity is notoriously difficult, and the best known lower bounds are far below the associated Shannon capacity, which serves as a trivial (and also generally the best known) upper bound. This paper considers the more restricted setup of simulating finite-state protocols. It is shown that all two-state protocols, as well as rich families of arbitrary finite-state protocols, can be simulated at the Shannon capacity, establishing the interactive capacity for those families of protocols.

preprint2018arXiv

Guessing with a Bit of Help

What is the value of a single bit to a guesser? We study this problem in a setup where Alice wishes to guess an i.i.d. random vector, and can procure one bit of information from Bob, who observes this vector through a memoryless channel. We are interested in the guessing efficiency, which we define as the best possible multiplicative reduction in Alice&#39;s guessing-moments obtainable by observing Bob&#39;s bit. For the case of a uniform binary vector observed through a binary symmetric channel, we provide two lower bounds on the guessing efficiency by analyzing the performance of the Dictator and Majority functions, and two upper bounds via maximum entropy and Fourier-analytic / hypercontractivity arguments. We then extend our maximum entropy argument to give a lower bound on the guessing efficiency for a general channel with a binary uniform input, via the strong data-processing inequality constant of the reverse channel. We compute this bound for the binary erasure channel, and conjecture that Greedy Dictator functions achieve the guessing efficiency.