Source author record

Wasim Huleihel

Wasim Huleihel appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Theory math.IT Machine Learning cond-mat.stat-mech Data Structures and Algorithms Information Retrieval math.PR math.ST Social and Information Networks Statistics Theory Subcellular Processes

Catalog footprint

What is connected

10works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Inferring Hidden Structures in Random Graphs

We study the two inference problems of detecting and recovering an isolated community of \emph{general} structure planted in a random graph. The detection problem is formalized as a hypothesis testing problem, where under the null hypothesis, the graph is a realization of an Erdős-Rényi random graph $\mathcal{G}(n,q)$ with edge density $q\in(0,1)$; under the alternative, there is an unknown structure $Γ_k$ on $k$ nodes, planted in $\mathcal{G}(n,q)$, such that it appears as an \emph{induced subgraph}. In case of a successful detection, we are concerned with the task of recovering the corresponding structure. For these problems, we investigate the fundamental limits from both the statistical and computational perspectives. Specifically, we derive lower bounds for detecting/recovering the structure $Γ_k$ in terms of the parameters $(n,k,q)$, as well as certain properties of $Γ_k$, and exhibit computationally unbounded optimal algorithms that achieve these lower bounds. We also consider the problem of testing in polynomial-time. As is customary in many similar structured high-dimensional problems, our model undergoes an "easy-hard-impossible" phase transition and computational constraints can severely penalize the statistical performance. To provide an evidence for this phenomenon, we show that the class of low-degree polynomials algorithms match the statistical performance of the polynomial-time algorithms we develop.

preprint2022arXiv

Optimal Reference for DNA Synthesis

In the recent years, DNA has emerged as a potentially viable storage technology. DNA synthesis, which refers to the task of writing the data into DNA, is perhaps the most costly part of existing storage systems. Accordingly, this high cost and low throughput limits the practical use in available DNA synthesis technologies. It has been found that the homopolymer run (i.e., the repetition of the same nucleotide) is a major factor affecting the synthesis and sequencing errors. Quite recently, [26] studied the role of batch optimization in reducing the cost of large scale DNA synthesis, for a given pool $\mathcal{S}$ of random quaternary strings of fixed length. Among other things, it was shown that the asymptotic cost savings of batch optimization are significantly greater when the strings in $\mathcal{S}$ contain repeats of the same character (homopolymer run of length one), as compared to the case where strings are unconstrained. Following the lead of [26], in this paper, we take a step forward towards the theoretical understanding of DNA synthesis, and study the homopolymer run of length $k\geq1$. Specifically, we are given a set of DNA strands $\mathcal{S}$, randomly drawn from a natural Markovian distribution modeling a general homopolymer run length constraint, that we wish to synthesize. For this problem, we prove that for any $k\geq 1$, the optimal reference strand, minimizing the cost of DNA synthesis is, perhaps surprisingly, the periodic sequence $\overline{\mathsf{ACGT}}$. It turns out that tackling the homopolymer constraint of length $k\geq2$ is a challenging problem; our main technical contribution is the representation of the DNA synthesis process as a certain constrained system, for which string techniques can be applied.

preprint2021arXiv

Learning User Preferences in Non-Stationary Environments

Recommendation systems often use online collaborative filtering (CF) algorithms to identify items a given user likes over time, based on ratings that this user and a large number of other users have provided in the past. This problem has been studied extensively when users' preferences do not change over time (static case); an assumption that is often violated in practical settings. In this paper, we introduce a novel model for online non-stationary recommendation systems which allows for temporal uncertainties in the users' preferences. For this model, we propose a user-based CF algorithm, and provide a theoretical analysis of its achievable reward. Compared to related non-stationary multi-armed bandit literature, the main fundamental difficulty in our model lies in the fact that variations in the preferences of a certain user may affect the recommendations for other users severely. We also test our algorithm over real-world datasets, showing its effectiveness in real-world applications. One of the main surprising observations in our experiments is the fact our algorithm outperforms other static algorithms even when preferences do not change over time. This hints toward the general conclusion that in practice, dynamic algorithms, such as the one we propose, might be beneficial even in stationary environments.

preprint2021arXiv

Variability in mRNA Translation: A Random Matrix Theory Approach

The rate of mRNA translation depends on the initiation, elongation, and termination rates of ribosomes along the mRNA. These rates depend on many "local" factors like the abundance of free ribosomes and tRNA molecules in the vicinity of the mRNA molecule. All these factors are stochastic and their experimental measurements are also noisy. An important question is how protein production in the cell is affected by this considerable variability. We develop a new theoretical framework for addressing this question by modeling the rates as identically and independently distributed random variables and using tools from random matrix theory to analyze the steady-state production rate. The analysis reveals a principle of universality: the average protein production rate depends only on the of the set of possible values that the random variable may attain. This explains how total protein production can be stabilized despite the overwhelming stochasticticity underlying cellular processes.

preprint2020arXiv

Centralized vs Decentralized Targeted Brute-Force Attacks: Guessing with Side-Information

According to recent empirical studies, a majority of users have the same, or very similar, passwords across multiple password-secured online services. This practice can have disastrous consequences, as one password being compromised puts all the other accounts at much higher risk. Generally, an adversary may use any side-information he/she possesses about the user, be it demographic information, password reuse on a previously compromised account, or any other relevant information to devise a better brute-force strategy (so called targeted attack). In this work, we consider a distributed brute-force attack scenario in which $m$ adversaries, each observing some side information, attempt breaching a password secured system. We compare two strategies: an uncoordinated attack in which the adversaries query the system based on their own side-information until they find the correct password, and a fully coordinated attack in which the adversaries pool their side-information and query the system together. For passwords $\mathbf{X}$ of length $n$, generated independently and identically from a distribution $P_X$, we establish an asymptotic closed-form expression for the uncoordinated and coordinated strategies when the side-information $\mathbf{Y}_{(m)}$ are generated independently from passing $\mathbf{X}$ through a memoryless channel $P_{Y|X}$, as the length of the password $n$ goes to infinity. We illustrate our results for binary symmetric channels and binary erasure channels, two families of side-information channels which model password reuse. We demonstrate that two coordinated agents perform asymptotically better than any finite number of uncoordinated agents for these channels, meaning that sharing side-information is very valuable in distributed attacks.

preprint2020arXiv

Sharp Thresholds of the Information Cascade Fragility Under a Mismatched Model

We analyze a sequential decision making model in which decision makers (or, players) take their decisions based on their own private information as well as the actions of previous decision makers. Such decision making processes often lead to what is known as the \emph{information cascade} or \emph{herding} phenomenon. Specifically, a cascade develops when it seems rational for some players to abandon their own private information and imitate the actions of earlier players. The risk, however, is that if the initial decisions were wrong, then the whole cascade will be wrong. Nonetheless, information cascade are known to be fragile: there exists a sequence of \emph{revealing} probabilities $\{p_{\ell}\}_{\ell\geq1}$, such that if with probability $p_{\ell}$ player $\ell$ ignores the decisions of previous players, and rely on his private information only, then wrong cascades can be avoided. Previous related papers which study the fragility of information cascades always assume that the revealing probabilities are known to all players perfectly, which might be unrealistic in practice. Accordingly, in this paper we study a mismatch model where players believe that the revealing probabilities are $\{q_\ell\}_{\ell\in\mathbb{N}}$ when they truly are $\{p_\ell\}_{\ell\in\mathbb{N}}$, and study the effect of this mismatch on information cascades. We consider both adversarial and probabilistic sequential decision making models, and derive closed-form expressions for the optimal learning rates at which the error probability associated with a certain decision maker goes to zero. We prove several novel phase transitions in the behaviour of the asymptotic learning rate.

preprint2016arXiv

Asymptotic MMSE Analysis Under Sparse Representation Modeling

Compressed sensing is a signal processing technique in which data is acquired directly in a compressed form. There are two modeling approaches that can be considered: the worst-case (Hamming) approach and a statistical mechanism, in which the signals are modeled as random processes rather than as individual sequences. In this paper, the second approach is studied. In particular, we consider a model of the form $\boldsymbol{Y} = \boldsymbol{H}\boldsymbol{X}+\boldsymbol{W}$, where each comportment of $\boldsymbol{X}$ is given by $X_i = S_iU_i$, where $\left\{U_i\right\}$ are i.i.d. Gaussian random variables, and $\left\{S_i\right\}$ are binary random variables independent of $\left\{U_i\right\}$, and not necessarily independent and identically distributed (i.i.d.), $\boldsymbol{H}\in\mathbb{R}^{k\times n}$ is a random matrix with i.i.d. entries, and $\boldsymbol{W}$ is white Gaussian noise. Using a direct relationship between optimum estimation and certain partition functions, and by invoking methods from statistical mechanics and from random matrix theory (RMT), we derive an asymptotic formula for the minimum mean-square error (MMSE) of estimating the input vector $\boldsymbol{X}$ given $\boldsymbol{Y}$ and $\boldsymbol{H}$, as $k,n\to\infty$, keeping the measurement rate, $R = k/n$, fixed. In contrast to previous derivations, which are based on the replica method, the analysis carried out in this paper is rigorous.

preprint2014arXiv

On Compressive Sensing in Coding Problems: A Rigorous Approach

We take an information theoretic perspective on a classical sparse-sampling noisy linear model and present an analytical expression for the mutual information, which plays central role in a variety of communications/processing problems. Such an expression was addressed previously either by bounds, by simulations and by the (non-rigorous) replica method. The expression of the mutual information is based on techniques used in [1], addressing the minimum mean square error (MMSE) analysis. Using these expressions, we study specifically a variety of sparse linear communications models which include coding in different settings, accounting also for multiple access channels and different wiretap problems. For those, we provide single-letter expressions and derive achievable rates, capturing the communications/processing features of these timely models.

preprint2014arXiv

Universal Decoding for Gaussian Intersymbol Interference Channels

A universal decoding procedure is proposed for the intersymbol interference (ISI) Gaussian channels. The universality of the proposed decoder is in the sense of being independent of the various channel parameters, and at the same time, attaining the same random coding error exponent as the optimal maximum-likelihood (ML) decoder, which utilizes full knowledge of these unknown parameters. The proposed decoding rule can be regarded as a frequency domain version of the universal maximum mutual information (MMI) decoder. Contrary to previously suggested universal decoders for ISI channels, our proposed decoding metric can easily be evaluated.

preprint2013arXiv

Analysis of Mismatched Estimation Errors Using Gradients of Partition Functions

We consider the problem of signal estimation (denoising) from a statistical-mechanical perspective, in continuation to a recent work on the analysis of mean-square error (MSE) estimation using a direct relationship between optimum estimation and certain partition functions. The paper consists of essentially two parts. In the first part, using the aforementioned relationship, we derive single-letter expressions of the mismatched MSE of a codeword (from a randomly selected code), corrupted by a Gaussian vector channel. In the second part, we provide several examples to demonstrate phase transitions in the behavior of the MSE. These examples enable us to understand more deeply and to gather intuition regarding the roles of the real and the mismatched probability measures in creating these phase transitions.

Wasim Huleihel

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Inferring Hidden Structures in Random Graphs

Optimal Reference for DNA Synthesis

Learning User Preferences in Non-Stationary Environments

Variability in mRNA Translation: A Random Matrix Theory Approach

Centralized vs Decentralized Targeted Brute-Force Attacks: Guessing with Side-Information

Sharp Thresholds of the Information Cascade Fragility Under a Mismatched Model

Asymptotic MMSE Analysis Under Sparse Representation Modeling

On Compressive Sensing in Coding Problems: A Rigorous Approach

Universal Decoding for Gaussian Intersymbol Interference Channels

Analysis of Mismatched Estimation Errors Using Gradients of Partition Functions