Source author record

Ali Khalesi

Ali Khalesi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, Cryptography and Security, eess.SP, Machine Learning, math.RA

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

Expert Routing for Communication-Efficient MoE via Finite Expert Banks

Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we treat the gate as a stochastic channel and use $I(X;T)$ to quantify the routing information available to the selected expert. To make the associated information quantities tractable beyond synthetic examples, we develop a finite-bank MNIST construction using pretrained CNN experts and a discrete, data-dependent selection rule. Since the selected model belongs to a finite candidate set, the algorithmic mutual information $I(S;W)$ admits a closed-form discrete-entropy estimator from the empirical posterior $q(W|S)$. Sweeping a data-dependence parameter $α$, we observe that $\widehat I(S;W)$ monotonically tracks the generalization gap, while the Xu-Raginsky bound exhibits the expected looseness. We also compare with a uniform union-bound baseline and introduce an empirical estimator of $I(X;T)$ together with a Blahut-Arimoto procedure for tracing an accuracy-rate curve over the expert bank. The proposed framework provides a practical tool for analyzing resource-aware MoE inference systems and for interpreting $I(X;T)$ and $D(R_g)$ as design proxies for efficient expert routing.
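As a rough illustration of the discrete-entropy estimator mentioned in this abstract, the sketch below computes a plug-in estimate of $I(S;W) = H(W) - H(W|S)$ from a table of empirical posteriors $q(W|S)$ over a finite candidate set. It is a minimal sketch under assumptions, not the authors' code: the function names, the uniform weighting over sampled datasets, and the toy posteriors are all hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0)


def plugin_mutual_information(posteriors):
    """Plug-in estimate of I(S;W) = H(W) - H(W|S).

    `posteriors` holds one row per sampled dataset S_i; row i is the
    empirical posterior q(W | S_i) over the finite candidate set.
    Assumption: every sampled dataset gets equal weight 1/n.
    """
    n = len(posteriors)
    m = len(posteriors[0])
    # Marginal over W: average of the per-dataset posteriors.
    marginal = [sum(row[j] for row in posteriors) / n for j in range(m)]
    # Conditional entropy H(W|S): average per-dataset entropy.
    conditional = sum(entropy(row) for row in posteriors) / n
    return entropy(marginal) - conditional


# Toy inputs (hypothetical): selection that ignores the data ...
independent = [[0.25, 0.25, 0.25, 0.25] for _ in range(3)]
# ... and selection that is fully determined by the data.
deterministic = [[1.0, 0.0], [0.0, 1.0]]

mi_independent = plugin_mutual_information(independent)
mi_deterministic = plugin_mutual_information(deterministic)
```

With data-independent selection the estimate vanishes, and with fully data-determined selection over two equally likely candidates it equals $\log 2$ nats, which matches the intuition that $I(S;W)$ measures how strongly the selected model depends on the training sample.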

preprint · 2023 · arXiv

Multi-User Distributed Computing Via Compressed Sensing

The multi-user linearly-separable distributed computing problem is considered here, in which $N$ servers help to compute the real-valued functions requested by $K$ users, where each function can be written as a linear combination of up to $L$ (generally non-linear) subfunctions. Each server computes a fraction $γ$ of the subfunctions, then communicates a function of its computed outputs to some of the users, and then each user collects its received data to recover its desired function. Our goal is to bound the ratio of the computation workload done by all servers to the number of datasets. To this end, we here reformulate the real-valued distributed computing problem into a matrix factorization problem and then into a basic sparse recovery problem, where sparsity implies computational savings. Building on this, we first give a simple probabilistic scheme for subfunction assignment, which allows us to upper bound the optimal normalized computation cost (the cost that a generally intractable $\ell_0$-minimization would give) as $γ\leq \frac{K}{N}$. To bypass the intractability of such optimal schemes, we show that if these optimal schemes enjoy $γ\leq - r\frac{K}{N}W^{-1}_{-1}(- \frac{2K}{e N r} )$ (where $W_{-1}(\cdot)$ is the Lambert function and $r$ calibrates the communication between servers and users), then they can actually be derived using a tractable Basis Pursuit $\ell_1$-minimization. This newly-revealed connection between distributed computation and compressed sensing opens up the possibility of designing practical distributed computing algorithms by employing tools and methods from compressed sensing.
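To get a feel for the Lambert-function threshold quoted in this abstract, the sketch below evaluates it numerically with the standard library only. It rests on two assumptions that the abstract does not confirm: that $W^{-1}_{-1}(\cdot)$ denotes the reciprocal $1/W_{-1}(\cdot)$ of the lower real branch, and that the argument $-\frac{2K}{eNr}$ lies in $[-1/e, 0)$ where that branch is real. The function names and the sample values of $K$, $N$ and $r$ are hypothetical.

```python
import math


def lambert_w_minus1(x, iters=200):
    """Lower real branch W_{-1}(x) for -1/e <= x < 0.

    Solves w * exp(w) = x for w <= -1 by bisection; on that interval
    w * exp(w) is strictly decreasing in w.
    """
    if not (-1.0 / math.e <= x < 0.0):
        raise ValueError("W_{-1} is real only on [-1/e, 0)")
    hi = -1.0  # here w * exp(w) = -1/e <= x
    lo = -2.0
    while lo * math.exp(lo) < x:  # move left until w * exp(w) >= x
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > x:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


def tractable_gamma_bound(K, N, r):
    """Evaluate -r * (K/N) / W_{-1}(-2K / (e * N * r)).

    Assumption: the superscript -1 in the abstract is a reciprocal.
    """
    x = -2.0 * K / (math.e * N * r)
    return -r * (K / N) / lambert_w_minus1(x)


# Hypothetical setting: 10 users, 100 servers, r = 1.
bound = tractable_gamma_bound(10, 100, 1.0)
simple_bound = 10 / 100  # the K/N bound from the probabilistic scheme
```

Because $|W_{-1}(\cdot)| \geq 1$, with $r = 1$ this threshold is never larger than $\frac{K}{N}$, so under the stated reading the $\ell_1$ route is available only for sufficiently sparse optimal schemes.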

preprint · 2022 · arXiv

Multi-User Linearly-Separable Distributed Computing

In this work, we explore the problem of multi-user linearly-separable distributed computation, where $N$ servers help compute the desired functions (jobs) of $K$ users, and where each desired function can be written as a linear combination of up to $L$ (generally non-linear) subtasks (or sub-functions). Each server computes some of the subtasks, communicates a function of its computed outputs to some of the users, and then each user collects its received data to recover its desired function. We explore the computation and communication relationship between how many servers compute each subtask vs. how much data each user receives. For a matrix $\mathbf{F}$ representing the linearly-separable form of the set of requested functions, our problem becomes equivalent to the open problem of sparse matrix factorization $\mathbf{F} = \mathbf{D}\mathbf{E}$ over finite fields, where a sparse decoding matrix $\mathbf{D}$ and encoding matrix $\mathbf{E}$ imply reduced communication and computation costs respectively. This paper establishes a novel relationship between our distributed computing problem, matrix factorization, syndrome decoding and covering codes. To reduce the computation cost, the above $\mathbf{D}$ is drawn from covering codes or from a here-introduced class of so-called 'partial covering' codes, whose study here yields computation cost results that we present.
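The factorization $\mathbf{F} = \mathbf{D}\mathbf{E}$ described in this abstract can be made concrete with a toy instance. The sketch below multiplies a small decoding matrix and encoding matrix over GF(2) and counts their nonzero entries as crude proxies for communication and computation cost. The matrices, the field size, the dimensions (2 users, 3 servers, 4 subtasks) and the helper names are all hypothetical choices for illustration, not taken from the paper.

```python
def matmul_mod(D, E, q):
    """Matrix product D @ E with entries reduced modulo the prime q."""
    inner = len(E)
    cols = len(E[0])
    return [
        [sum(row[k] * E[k][j] for k in range(inner)) % q for j in range(cols)]
        for row in D
    ]


def nnz(M):
    """Number of nonzero entries, used here as a simple sparsity measure."""
    return sum(1 for row in M for v in row if v != 0)


q = 2  # assumed field size GF(2)

# Decoding matrix D (users x servers): which servers each user listens to.
D = [
    [1, 0, 1],
    [0, 1, 1],
]

# Encoding matrix E (servers x subtasks): which subtasks each server computes.
E = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]

# Demand matrix F (users x subtasks) realized by this factorization.
F = matmul_mod(D, E, q)

communication_proxy = nnz(D)  # fewer nonzeros: less data sent to users
computation_proxy = nnz(E)    # fewer nonzeros: fewer subtasks computed
```

Finding sparse $\mathbf{D}$ and $\mathbf{E}$ for a given $\mathbf{F}$ is the hard direction; this example only checks a factorization that is already known.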

preprint · 2021 · arXiv

The Capacity Region of Distributed Multi-User Secret Sharing

In this paper, we study the problem of distributed multi-user secret sharing, including a trusted master node, $N\in \mathbb{N}$ storage nodes, and $K$ users, where each user has access to the contents of a subset of storage nodes. Each user has an independent secret message with a certain rate, defined as the size of the message normalized by the size of a storage node. Having access to the secret messages, the trusted master node places encoded shares in the storage nodes, such that (i) each user can recover its own message from the content of the storage nodes that it has access to, and (ii) each user cannot gain any information about the message of any other user. We characterize the capacity region of the distributed multi-user secret sharing, defined as the set of all achievable rate tuples, subject to the correctness and privacy constraints. In the achievable scheme, for each user, the master node forms a polynomial of degree equal to the number of its accessible storage nodes minus one, where the values of this polynomial at certain points are stored as the encoded shares. The message of that user is embedded in some of the coefficients of the polynomial. The remaining coefficients are determined such that the content of each storage node serves as the encoded shares for all users that have access to that storage node.
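The correctness half of the polynomial construction described in this abstract can be sketched for a single user. In the example below, a user with access to four storage nodes gets a degree-3 polynomial over a prime field; the message occupies two coefficients, the shares are evaluations at four distinct points, and the user recovers every coefficient by solving the resulting Vandermonde system. This is a simplified sketch under assumptions: the field size, the evaluation points, the choice of which coefficients carry the message, and the fixed filler coefficients are hypothetical, and the coupling across users that provides privacy in the paper is not modeled.

```python
P = 257  # assumed prime field size (hypothetical)


def evaluate(coeffs, x, p=P):
    """Horner evaluation of sum_k coeffs[k] * x**k modulo p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def recover_coeffs(points, shares, p=P):
    """Recover polynomial coefficients from evaluations at distinct points.

    Solves the Vandermonde system modulo p by Gauss-Jordan elimination.
    """
    n = len(points)
    rows = [[pow(x, k, p) for k in range(n)] + [y] for x, y in zip(points, shares)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, p)  # modular inverse (Python 3.8+)
        rows[col] = [(v * inv) % p for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[col])]
    return [rows[k][n] for k in range(n)]


message = [42, 7]    # message symbols, placed in the low-order coefficients
filler = [101, 200]  # remaining coefficients; fixed here only for repeatability
coeffs = message + filler

points = [1, 2, 3, 4]  # one evaluation point per accessible storage node
shares = [evaluate(coeffs, x) for x in points]

recovered = recover_coeffs(points, shares)
recovered_message = recovered[:len(message)]
```

The number of shares equals the number of coefficients, which is why the degree is tied to the number of accessible storage nodes minus one: with fewer evaluations the system would be underdetermined.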