Researcher profile

Ashish Khisti

Ashish Khisti contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
11 works
0 followers
8 topics
4 close collaborators

Published work

11 published items

preprint · 2026 · arXiv

List-Level Distribution Coupling with Applications to Speculative Decoding and Lossy Compression

We study a relaxation of the problem of coupling probability distributions: a list of samples is generated from one distribution, and an accept is declared if any one of these samples is identical to the sample generated from the other distribution. We propose a novel method for generating samples, which extends the Gumbel-max sampling suggested in Daliri et al. (arXiv:2408.07978) for coupling probability distributions. We also establish a corresponding lower bound on the acceptance probability, which we call the list matching lemma. We next discuss two applications of our setup. First, we develop a new mechanism for multi-draft speculative sampling that is simple to implement and achieves performance competitive with baselines such as SpecTr and SpecInfer across a range of language tasks. Our method also guarantees a certain degree of drafter invariance with respect to the output tokens, which is not supported by existing schemes. We also provide a theoretical lower bound on the token-level acceptance probability. As our second application, we consider distributed lossy compression with side information in a setting where a source sample is compressed and available to multiple decoders, each with independent side information. We propose a compression technique that is based on our generalization of Gumbel-max sampling and show that it provides significant gains in experiments involving synthetic Gaussian sources and the MNIST image dataset.
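
The list-level coupling idea in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function names, the choice of one independent Gumbel noise vector per draft, and the rule of taking the per-symbol maximum of the noise for the target sample are assumptions made here for illustration. It relies only on the standard fact that the maximum of k i.i.d. Gumbel variates is a Gumbel variate shifted by log(k), so the target sample still follows q.

```python
import math
import random


def gumbel_noise(rng, size):
    """Return `size` i.i.d. standard Gumbel variates, G = -log(E) with E ~ Exp(1)."""
    return [-math.log(max(rng.expovariate(1.0), 1e-300)) for _ in range(size)]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def list_coupled_draw(p, q, k, rng):
    """Draw k drafts from p and one target sample from q using shared noise.

    Assumes every entry of p and q is strictly positive (hypothetical helper).
    """
    n = len(p)
    noise = [gumbel_noise(rng, n) for _ in range(k)]  # one noise vector per draft
    # Draft j is an ordinary Gumbel-max sample from p using noise vector j.
    drafts = [argmax([math.log(p[i]) + noise[j][i] for i in range(n)])
              for j in range(k)]
    # Target: use the largest noise value per symbol across all k vectors.
    # The max of k i.i.d. Gumbels is a Gumbel shifted by log(k), and a common
    # shift does not change the argmax, so the target is distributed as q.
    target = argmax([math.log(q[i]) + max(noise[j][i] for j in range(k))
                     for i in range(n)])
    return drafts, target


def acceptance_rate(p, q, k, trials=20000, seed=0):
    """Monte Carlo estimate of P(target appears in the list of k drafts)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        drafts, target = list_coupled_draw(p, q, k, rng)
        hits += target in drafts
    return hits / trials
```

Calling `acceptance_rate(p, q, 1)` and `acceptance_rate(p, q, 4)` for two different distributions gives a rough feel for how a longer list raises the chance of a match; when `p` and `q` are identical, the target always appears in the list under this construction.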

preprint · 2022 · arXiv

Adaptive relaying for streaming erasure codes in a three node relay network

This paper investigates adaptive streaming codes over a three-node relayed network. In this setting, a source node transmits a sequence of message packets to a destination through a relay. The source-to-relay and relay-to-destination links are unreliable and introduce at most $N_1$ and $N_2$ packet erasures, respectively. The destination node must recover each message packet within a strict delay constraint $T$. The paper presents achievable streaming codes for all feasible parameters $\{N_1, N_2, T\}$ that exploit the fact that the relay naturally observes the erasure pattern occurring in the link from source to relay, thus it can adapt its relaying strategy based on these observations. In a recent work, Fong et al. provide streaming codes featuring channel-state-independent relaying strategies. The codes proposed in this paper achieve rates higher than the ones proposed by Fong et al. whenever $N_2 > N_1$, and achieve the same rate when $N_2 = N_1$. The paper also presents an upper bound on the achievable rate that takes into account erasures in both links in order to bound the rate in the second link. The upper bound is shown to be tighter than a trivial bound that considers only the erasures in the second link.

preprint · 2022 · arXiv

Error-correcting codes for low latency streaming over multiple link relay networks

This paper investigates the performance of streaming codes in low-latency applications over a multi-link three-node relayed network. The source wishes to transmit a sequence of messages to the destination through a relay. Each message must be reconstructed after a fixed decoding delay. The special case with one link connecting each node has been studied by Fong et al. [1], and a multi-hop multi-link setting has been studied by Domanovitz et al. [2]. The topology with three nodes and multiple links is studied in this paper. Each link is subject to a different number of erasures due to different channel conditions. An information-theoretic upper bound is derived, and an achievable scheme is presented. The proposed scheme judiciously allocates rates for each link based on the concept of delay spectrum. The achievable scheme is compared to two baseline schemes and the scheme proposed in [2]. Experimental results show that this scheme achieves higher rates than the other schemes, and can achieve the upper bound even in non-trivial scenarios. The scheme is further extended to handle different propagation delays in each link, something not previously considered in the literature. Simulations over statistical channels show that the proposed scheme can outperform the simpler baseline under practical models.

preprint · 2022 · arXiv

Optimal Streaming Erasure Codes over the Three-Node Relay Network

This paper investigates low-latency streaming codes for a three-node relay network. The source transmits a sequence of messages (streaming messages) to the destination through the relay between them, where the first-hop channel from the source to the relay and the second-hop channel from the relay to the destination are subject to packet erasures. Every source message must be recovered perfectly at the destination subject to a fixed decoding delay of $T$ time slots. In any sliding window of $T+1$ time slots, we assume no more than $N_1$ and $N_2$ erasures are introduced by the first-hop channel and second-hop channel respectively. Under this channel loss assumption, we fully characterize the maximum achievable rate in terms of $T$, $N_1$ and $N_2$. The achievability is proved by using a symbol-wise decode-forward strategy where the source symbols within the same message are decoded by the relay with different delays. The converse is proved by analyzing the maximum achievable rate for each channel when the erasures in the other channel are consecutive (bursty). In addition, we show that traditional message-wise decode-forward strategies, which require the source symbols within the same message to be decoded by the relay with the same delay, are sub-optimal in general.
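
The channel loss assumption in this abstract (at most $N_1$ and $N_2$ erasures on the two hops in any sliding window of $T+1$ time slots) can be checked mechanically. The sketch below is illustrative only: the function names and the representation of an erasure pattern as a list of 0/1 flags per time slot are assumptions made here, not part of the paper.

```python
def max_erasures_in_window(erasures, window):
    """Largest number of erasures in any `window` consecutive time slots.

    `erasures` is a list of 0/1 (or bool) flags, one per time slot.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    current = sum(erasures[:window])  # count in the first window
    worst = current
    for t in range(window, len(erasures)):
        # Slide the window by one slot: add the new slot, drop the oldest.
        current += erasures[t] - erasures[t - window]
        worst = max(worst, current)
    return worst


def is_admissible(first_hop, second_hop, T, N1, N2):
    """True if both hops respect their erasure budgets in every window of T+1 slots."""
    window = T + 1
    return (max_erasures_in_window(first_hop, window) <= N1
            and max_erasures_in_window(second_hop, window) <= N2)
```

For example, with `T = 3` the window spans four slots, so a first-hop pattern with erasures four slots apart stays within a budget of `N1 = 1`, while the same pattern violates that budget once `T = 4`.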

preprint · 2022 · arXiv

Variational Model Inversion Attacks

Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of the classes in the private dataset. In this work, we provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy. In order to optimize this variational objective, we choose a variational family defined in the code space of a deep generative model, trained on a public auxiliary dataset that shares some structural similarity with the target dataset. Empirically, our method substantially improves performance in terms of target attack accuracy, sample realism, and diversity on datasets of faces and chest X-ray images.

preprint · 2021 · arXiv

Sequential Classification with Empirically Observed Statistics

Motivated by real-world machine learning applications, we consider a statistical classification task in a sequential setting where test samples arrive sequentially. In addition, the generating distributions are unknown and only a set of empirically sampled sequences is available to a decision maker. The decision maker is tasked to classify a test sequence which is known to be generated according to one of the distributions. In particular, for the binary case, the decision maker wishes to perform the classification task with the minimum number of test samples, so at each step she declares that hypothesis 1 is true, declares that hypothesis 2 is true, or requests an additional test sample. We propose a classifier and analyze the type-I and type-II error probabilities. We demonstrate the significant advantage of our sequential scheme compared to an existing non-sequential classifier proposed by Gutman. Finally, we extend our setup and results to the multi-class classification scenario and again demonstrate that the variable-length nature of the problem affords significant advantages, as one can achieve the same set of exponents as Gutman's fixed-length setting but without the rejection option.

preprint · 2020 · arXiv

An Explicit Rate-Optimal Streaming Code for Channels with Burst and Arbitrary Erasures

This paper considers the transmission of an infinite sequence of messages (a streaming source) over a packet erasure channel, where every source message must be recovered perfectly at the destination subject to a fixed decoding delay. While the capacity of a channel that introduces only bursts of erasures is well known, only recently has the capacity of a channel with either one burst of erasures or multiple arbitrary erasures in any fixed-sized sliding window been established. However, the codes shown to achieve this capacity are either non-explicit constructions (proven to exist) or explicit constructions that require a large field size that scales exponentially with the delay. This work describes an explicit rate-optimal construction for admissible channel and delay parameters over a field size that scales only quadratically with the delay.

preprint · 2020 · arXiv

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. As compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of magnitude smaller.
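
For context, the information-theoretic framework of Xu and Raginsky (2017) mentioned in this abstract rests on a bound of the following form. The notation is introduced here for illustration and is not taken from the abstract: $S = (Z_1, \dots, Z_n)$ is the training sample drawn i.i.d. from $\mu$, $W$ is the output of the learning algorithm, $L_\mu$ and $L_S$ are the population and empirical risks, and the loss $\ell(w, Z)$ is assumed to be $\sigma$-subgaussian under $Z \sim \mu$ for every $w$.

```latex
\left| \mathbb{E}\bigl[ L_\mu(W) - L_S(W) \bigr] \right|
  \;\le\; \sqrt{ \frac{2\sigma^2}{n} \, I(S; W) }
```

Bounding the mutual information $I(S; W)$ for a noisy iterative algorithm such as SGLD therefore yields a bound on the expected generalization error, which is the quantity the stepwise analyses cited in the abstract control.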

preprint · 2020 · arXiv

Low-Latency Network-Adaptive Error Control for Interactive Streaming

We introduce a novel network-adaptive algorithm that is suitable for alleviating network packet losses for low-latency interactive communications between a source and a destination. Our network-adaptive algorithm estimates in real time the best parameters of a recently proposed streaming code that uses forward error correction (FEC) to correct both arbitrary and burst losses, which cause a crackling noise and undesirable jitters, respectively, in audio. In particular, the destination estimates appropriate coding parameters based on its observed packet loss pattern and sends them back to the source for updating the underlying code. In addition, a new explicit construction of practical low-latency streaming codes that achieve the optimal tradeoff between the capability of correcting arbitrary losses and the capability of correcting burst losses is provided. Simulation evaluations based on statistical losses and real-world packet loss traces reveal the following: (i) Our proposed network-adaptive algorithm combined with our optimal streaming codes can achieve significantly higher performance compared to uncoded and non-adaptive FEC schemes over UDP (User Datagram Protocol); (ii) Our explicit streaming codes can significantly outperform traditional MDS (maximum-distance separable) streaming schemes when they are used along with our network-adaptive algorithm.
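
The abstract says the destination derives coding parameters from its observed packet loss pattern but does not describe the estimator. As a rough illustration of the kind of statistics such a step might start from, the sketch below summarizes a loss trace into burst lengths and isolated losses. It is not the paper's algorithm: the function names, the dictionary keys, and the trace format (a list of 0/1 flags, 1 meaning the packet was lost) are all hypothetical.

```python
from itertools import groupby


def burst_lengths(trace):
    """Lengths of each maximal run of consecutive lost packets in the trace."""
    return [sum(1 for _ in run) for lost, run in groupby(trace) if lost]


def summarize_losses(trace):
    """Summarize a loss trace (hypothetical statistics, not the paper's estimator)."""
    bursts = burst_lengths(trace)
    return {
        "longest_burst": max(bursts, default=0),             # worst burst seen
        "isolated_losses": sum(1 for b in bursts if b == 1),  # single-packet losses
        "loss_rate": sum(bursts) / len(trace) if trace else 0.0,
    }
```

A receiver could recompute such a summary over a recent window of packets and feed it to whatever rule selects the burst-correction and arbitrary-loss-correction capabilities of the code; that selection rule is outside the scope of this sketch.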

preprint · 2020 · arXiv

Streaming Erasure Codes over Multi-hop Relay Network

This paper studies low-latency streaming codes for the multi-hop network. The source is transmitting a sequence of messages (streaming messages) to a destination through a chain of relays where each hop is subject to packet erasures. Every source message has to be recovered perfectly at the destination within a delay constraint of $T$ time slots. In any sliding window of $T+1$ time slots, we assume no more than $N_j$ erasures are introduced by the $j$-th hop channel. The capacity in the case of a single relay (a three-node network) was derived by Fong et al. [1]. While the converse derived for the three-node case can be extended to any number of nodes using a similar technique (analyzing the case where erasures on other links are consecutive), we demonstrate next that the achievable scheme, which suggested a clever symbol-wise decode-and-forward strategy, cannot be straightforwardly extended without a loss in performance. The coding scheme for the three-node network, which was shown to achieve the upper bound, was "state-independent" (i.e., it does not depend on the specific erasure pattern). While this is a very desirable property, in this paper we suggest a "state-dependent" scheme (i.e., a scheme that depends on the specific erasure pattern) and show that it achieves the upper bound up to the size of an additional header. Since, as we show, the size of the header does not depend on the field size, the gap between the achievable rate and the upper bound decreases as the field size increases.

preprint · 2020 · arXiv

Time-Resolved fMRI Shared Response Model using Gaussian Process Factor Analysis

Multi-subject fMRI studies are challenging due to the high variability of both brain anatomy and functional brain topographies across participants. An effective way of aggregating multi-subject fMRI data is to extract a shared representation that filters out unwanted variability among subjects. Some recent work has implemented probabilistic models to extract a shared representation in task fMRI. In the present work, we improve upon these models by incorporating temporal information in the common latent structures. We introduce a new model, Shared Gaussian Process Factor Analysis (S-GPFA), that discovers shared latent trajectories and subject-specific functional topographies, while modelling temporal correlation in fMRI data. We demonstrate the efficacy of our model in revealing ground truth latent structures using simulated data, and replicate experimental performance of time-segment matching and inter-subject similarity on the publicly available Raider and Sherlock datasets. We further test the utility of our model by analyzing its learned model parameters in the large multi-site SPINS dataset, on a social cognition task from participants with and without schizophrenia.