Source author record

Himanshu Tyagi

Himanshu Tyagi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Theory, math.IT, Cryptography and Security, Machine Learning, Data Structures and Algorithms, Discrete Mathematics, math.ST, Statistics Theory, Applications, astro-ph.GA, astro-ph.SR, Computational Complexity, Populations and Evolution, Social and Information Networks

Catalog footprint

19 works, 14 topics, 4 close collaborators


preprint, 2026, arXiv

The Structure and Kinematics of Three Class 0 Protostellar Jets from JWST

We present observations of jets within 2000 au of three deeply embedded protostars using 2.9-27 micron observations with JWST. These observations show the morphologies and kinematics of the collimated jets from three protostars, the low-mass Class 0 protostars B335 and HOPS 153, and the intermediate-mass protostar HOPS 370. These jets are traced by shock-ionized fine-structure line emission observed with the JWST NIRSpec and MIRI IFUs. We find that [Fe II] emission traces the full extent of the inner 1000 to 2000 au of the jets, depending on distance to the protostar, while other ions mostly trace isolated shocked knots. The jets show evidence of wiggling motion in the plane of the sky as well as asymmetries between blue and red-shifted lobes. The widths of the jets increase non-monotonically with distance from the central protostar, with opening angles ranging from 2.1 degrees to < 10.1 degrees for the three protostars in the sample. The jets have total velocities ranging from 147 to 184 km/s after correcting for disk inclination. For B335, an 8-month gap between NIRSpec and MIRI MRS observations enabled measurement of the tangential velocity of a shocked knot; in combination with the radial velocity, this shows that the jet has a different inclination than the outflow cavity. We find multiple knots before and during a recent outburst in B335, although the knots were more frequent during the burst. The asymmetries between blue- and red-shifted lobes strongly suggest complex interactions between the circumstellar disks and magnetic fields.

preprint, 2022, arXiv

The Role of Interactivity in Structured Estimation

We study high-dimensional sparse estimation under three natural constraints: communication constraints, local privacy constraints, and linear measurements (compressive sensing). Without sparsity assumptions, it has been established that interactivity cannot improve the minimax rates of estimation under these information constraints. The question of whether interactivity helps with natural inference tasks has been a topic of active research. We settle this question in the affirmative for the prototypical problems of high-dimensional sparse mean estimation and compressive sensing, by demonstrating a gap between interactive and noninteractive protocols. We further establish that the gap increases when we have more structured sparsity: for block sparsity this gap can be as large as polynomial in the dimensionality. Thus, the more structured the sparsity is, the greater is the advantage of interaction. Proving the lower bounds requires a careful breaking of a sum of correlated random variables into independent components using Baranyai's theorem on decomposition of hypergraphs, which might be of independent interest.

preprint, 2021, arXiv

Inference under Information Constraints III: Local Privacy Constraints

We study goodness-of-fit and independence testing of discrete distributions in a setting where samples are distributed across multiple users. The users wish to preserve the privacy of their data while enabling a central server to perform the tests. Under the notion of local differential privacy, we propose simple, sample-optimal, and communication-efficient protocols for these two questions in the noninteractive setting, where in addition users may or may not share a common random seed. In particular, we show that the availability of shared (public) randomness greatly reduces the sample complexity. Underlying our public-coin protocols are privacy-preserving mappings which, when applied to the samples, minimally contract the distance between their respective probability distributions.

preprint, 2021, arXiv

Multiple Support Recovery Using Very Few Measurements Per Sample

In the problem of multiple support recovery, we are given access to linear measurements of multiple sparse samples in $\mathbb{R}^{d}$. These samples can be partitioned into $\ell$ groups, with samples having the same support belonging to the same group. For a given budget of $m$ measurements per sample, the goal is to recover the $\ell$ underlying supports, in the absence of the knowledge of group labels. We study this problem with a focus on the measurement-constrained regime where $m$ is smaller than the support size $k$ of each sample. We design a two-step procedure that estimates the union of the underlying supports first, and then uses a spectral algorithm to estimate the individual supports. Our proposed estimator can recover the supports with $m<k$ measurements per sample, from $\tilde{O}(k^{4}\ell^{4}/m^{4})$ samples. Our guarantees hold for a general, generative model assumption on the samples and measurement matrices. We also provide results from experiments conducted on synthetic data and on the MNIST dataset.

preprint, 2020, arXiv

Communication Complexity of Distributed High Dimensional Correlation Testing

Two parties observe independent copies of a $d$-dimensional vector and a scalar. They seek to test if their data is correlated or not, namely they seek to test if the norm $\|ρ\|_2$ of the correlation vector $ρ$ between their observations exceeds $τ$ or is $0$. To that end, they communicate interactively and declare the output of the test. We show that roughly order $d/τ^2$ bits of communication are sufficient and necessary for resolving the distributed correlation testing problem above. Furthermore, we establish a lower bound of roughly $d^2/τ^2$ bits for communication needed for distributed correlation estimation, rendering the estimate-and-test approach suboptimal in communication required for distributed correlation testing. For the one-dimensional case with one-way communication, our bounds are tight even in the constant and provide a precise dependence of communication complexity on the probabilities of error of two types.

preprint, 2020, arXiv

How Reliable are Test Numbers for Revealing the COVID-19 Ground Truth and Applying Interventions?

The number of confirmed cases of COVID-19 is often used as a proxy for the actual number of ground truth COVID-19 infected cases in both public discourse and policy making. However, the number of confirmed cases depends on the testing policy, and it is important to understand how the number of positive cases obtained using different testing policies reveals the unknown ground truth. We develop an agent-based simulation framework in Python that can simulate various testing policies as well as interventions such as lockdown based on them. The interaction between the agents can take into account various communities and mobility patterns. A distinguishing feature of our framework is the presence of another 'flu'-like illness with symptoms similar to COVID-19, that allows us to model the noise in selecting the pool of patients to be tested. We instantiate our model for the city of Bengaluru in India, using census data to distribute agents geographically, and traffic flow mobility data to model long-distance interactions and mixing. We use the simulation framework to compare the performance of three testing policies: Random Symptomatic Testing (RST), Contact Tracing (CT), and a new Location Based Testing policy (LBT). We observe that if a sufficient fraction of symptomatic patients come out for testing, then RST can capture the ground truth quite closely even with very few daily tests. However, CT consistently captures more positive cases. Interestingly, our new LBT, which is operationally less intensive than CT, gives performance that is comparable with CT. In another direction, we compare the efficacy of these three testing policies in enabling lockdown, and observe that CT flattens the ground truth curve maximally, followed closely by LBT, and significantly better than RST.

preprint, 2020, arXiv

Limits on Gradient Compression for Stochastic Optimization

We consider stochastic optimization over $\ell_p$ spaces using access to a first-order oracle. We ask: What is the minimum precision required for oracle outputs to retain the unrestricted convergence rates? We characterize this precision for every $p\geq 1$ by deriving information theoretic lower bounds and by providing quantizers that (almost) achieve these lower bounds. Our quantizers are new and easy to implement. In particular, our results are exact for $p=2$ and $p=\infty$, showing the minimum precision needed in these settings is $Θ(d)$ and $Θ(\log d)$, respectively. The latter result is surprising since recovering the gradient vector will require $Ω(d)$ bits.

preprint, 2020, arXiv

Sample-Measurement Tradeoff in Support Recovery under a Subgaussian Prior

Data samples from $\mathbb{R}^{d}$ with a common support of size $k$ are accessed through $m$ random linear projections (measurements) per sample. It is well-known that roughly $k$ measurements from a single sample are sufficient to recover the support. In the multiple sample setting, do $k$ overall measurements still suffice when only $m$ measurements per sample are allowed, with $m<k$? We answer this question in the negative by considering a generative model setting with independent samples drawn from a subgaussian prior. We show that $n=Θ((k^2/m^2)\cdot\log k(d-k))$ samples are necessary and sufficient to recover the support exactly. In turn, this shows that when $m<k$, $k$ overall measurements are insufficient for support recovery; instead we need about $m$ measurements each from $k^{2}/m^2$ samples, i.e., $k^{2}/m$ overall measurements are necessary.
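As a rough illustration of the tradeoff stated in the abstract above, the following sketch evaluates the scalings with constants and the logarithmic factor dropped. The function name and the example values of k and m are hypothetical and chosen only for illustration.

```python
def measurement_budget(k, m):
    """Order-of-magnitude budget from the abstract, ignoring constants and log factors."""
    if not 0 < m < k:
        raise ValueError("this regime assumes 0 < m < k")
    samples = (k / m) ** 2   # about k^2 / m^2 samples
    overall = samples * m    # about k^2 / m overall measurements
    return samples, overall


# Hypothetical example values: support size k = 100, m = 10 measurements per sample.
samples, overall = measurement_budget(k=100, m=10)
print(samples, overall)
```

With k = 100 and m = 10 this gives about 100 samples and about 1000 overall measurements, ten times the roughly k = 100 measurements that suffice from a single sample.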

preprint, 2020, arXiv

Tracking an Auto-Regressive Process with Limited Communication per Unit Time

Samples from a high-dimensional AR[1] process are observed by a sender which can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant in real-time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses communication between sampling instants to refine estimates of the latest sample and study the following question: Is it better to collect communication of multiple slots to send better refined estimates, making the receiver wait more for every refinement, or to be fast but loose and send new information in every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is universally optimal asymptotically for a large dimension. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality, and they typically will have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.

preprint, 2016, arXiv

Estimating Renyi Entropy of Discrete Distributions

It was recently shown that estimating the Shannon entropy $H({\rm p})$ of a discrete $k$-symbol distribution ${\rm p}$ requires $Θ(k/\log k)$ samples, a number that grows near-linearly in the support size. In many applications $H({\rm p})$ can be replaced by the more general Rényi entropy of order $α$, $H_α({\rm p})$. We determine the number of samples needed to estimate $H_α({\rm p})$ for all $α$, showing that $α< 1$ requires a super-linear, roughly $k^{1/α}$ samples, noninteger $α>1$ requires a near-linear $k$ samples, but, perhaps surprisingly, integer $α>1$ requires only $Θ(k^{1-1/α})$ samples. Furthermore, developing on a recently established connection between polynomial approximation and estimation of additive functions of the form $\sum_{x} f({\rm p}_x)$, we reduce the sample complexity for noninteger values of $α$ by a factor of $\log k$ compared to the empirical estimator. The estimators achieving these bounds are simple and run in time linear in the number of samples. Our lower bounds provide explicit constructions of distributions with different Rényi entropies that are hard to distinguish.
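For reference, the empirical (plug-in) estimator that the abstract above uses as a baseline can be sketched as follows. This is a minimal illustration and not the paper's improved estimators. The function name is hypothetical, and the entropy is computed in nats from the standard definition $H_α({\rm p}) = \frac{1}{1-α}\log\sum_x {\rm p}_x^α$.

```python
import math
from collections import Counter


def empirical_renyi_entropy(samples, alpha):
    """Plug-in estimate of the Renyi entropy of order alpha, in nats."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)  # one pass over the samples
    # Sum of empirical probabilities raised to the power alpha.
    power_sum = sum((c / n) ** alpha for c in counts.values())
    return math.log(power_sum) / (1.0 - alpha)


# Samples that are exactly uniform over 4 symbols: every order gives log(4).
estimate = empirical_renyi_entropy([0, 1, 2, 3] * 25, alpha=2)
print(estimate, math.log(4))
```

The estimator runs in time linear in the number of samples, since it needs only the symbol counts.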

preprint, 2016, arXiv

Information Complexity Density and Simulation of Protocols

Two parties observing correlated random variables seek to run an interactive communication protocol. How many bits must they exchange to simulate the protocol, namely to produce a view with a joint distribution within a fixed statistical distance of the joint distribution of the input and the transcript of the original protocol? We present an information spectrum approach for this problem whereby the information complexity of the protocol is replaced by its information complexity density. Our single-shot bounds relate the communication complexity of simulating a protocol to tail bounds for information complexity density. As a consequence, we obtain a strong converse and characterize the second-order asymptotic term in communication complexity for independent and identically distributed observation sequences. Furthermore, we obtain a general formula for the rate of communication complexity which applies to any sequence of observations and protocols. Connections with results from theoretical computer science and implications for the function computation problem are discussed.

preprint, 2016, arXiv

Secret Key Agreement: General Capacity and Second-Order Asymptotics

We revisit the problem of secret key agreement using interactive public communication for two parties and propose a new secret key agreement protocol. The protocol attains the secret key capacity for general observations and attains the second-order asymptotic term in the maximum length of a secret key for independent and identically distributed observations. In contrast to the previously suggested secret key agreement protocols, the proposed protocol uses interactive communication. In fact, the standard one-way communication protocol used prior to this work fails to attain the asymptotic results above. Our converse proofs rely on a recently established upper bound for secret key lengths. Both our lower and upper bounds are derived in a single-shot setup and the asymptotic results are obtained as corollaries.

preprint, 2016, arXiv

Universal Hashing for Information Theoretic Security

The information theoretic approach to security entails harnessing the correlated randomness available in nature to establish security. It uses tools from information theory and coding and yields provable security, even against an adversary with unbounded computational power. However, the feasibility of this approach in practice depends on the development of efficiently implementable schemes. In this article, we review a special class of practical schemes for information theoretic security that are based on 2-universal hash families. Specific cases of secret key agreement and wiretap coding are considered, and general themes are identified. The scheme presented for wiretap coding is modular and can be implemented easily by including an extra pre-processing layer over the existing transmission codes.
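To make the notion of a 2-universal hash family concrete, here is a minimal sketch of the classic Carter-Wegman construction h(x) = ((a·x + b) mod p) mod m. This particular family, the choice of prime, and all function names are assumptions made for illustration; the article may build its schemes on a different family.

```python
import random

# A Mersenne prime; keys are assumed to be integers in [0, MERSENNE_PRIME).
MERSENNE_PRIME = (1 << 61) - 1


def make_hash(a, b, num_buckets, prime=MERSENNE_PRIME):
    """Return h(x) = ((a*x + b) mod prime) mod num_buckets."""
    if not (1 <= a < prime and 0 <= b < prime):
        raise ValueError("need 1 <= a < prime and 0 <= b < prime")

    def h(x):
        return ((a * x + b) % prime) % num_buckets

    return h


def sample_hash(num_buckets, prime=MERSENNE_PRIME, rng=random):
    """Draw one member of the family uniformly at random."""
    a = rng.randrange(1, prime)
    b = rng.randrange(0, prime)
    return make_hash(a, b, num_buckets, prime)


# Hypothetical usage: map a key to one of 16 buckets.
h = sample_hash(16, rng=random.Random(0))
print(h(12345))
```

For any two distinct keys below the prime, a randomly drawn member of this family maps them to the same bucket with probability at most 1/m. In a security setting the random seed would need to come from a cryptographically sound source rather than the `random` module used here for reproducibility.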

preprint, 2015, arXiv

Converses for Secret Key Agreement and Secure Computing

We consider information theoretic secret key agreement and secure function computation by multiple parties observing correlated data, with access to an interactive public communication channel. Our main result is an upper bound on the secret key length, which is derived using a reduction of binary hypothesis testing to multiparty secret key agreement. Building on this basic result, we derive new converses for multiparty secret key agreement. Furthermore, we derive converse results for the oblivious transfer problem and the bit commitment problem by relating them to secret key agreement. Finally, we derive a necessary condition for the feasibility of secure computation by trusted parties that seek to compute a function of their collective data, using an interactive public communication that by itself does not give away the value of the function. In many cases, we strengthen and improve upon previously known converse bounds. Our results are single-shot and use only the given joint distribution of the correlated observations. For the case when the correlated observations consist of independent and identically distributed (in time) sequences, we derive strong versions of previously known converses.

preprint, 2014, arXiv

Strong Converse for a Degraded Wiretap Channel via Active Hypothesis Testing

We establish an upper bound on the rate of codes for a wiretap channel with public feedback for a fixed probability of error and secrecy parameter. As a corollary, we obtain a strong converse for the capacity of a degraded wiretap channel with public feedback. Our converse proof is based on a reduction of active hypothesis testing for discriminating between two channels to coding for wiretap channel with feedback.

preprint, 2013, arXiv

Common Information and Secret Key Capacity

We study the generation of a secret key of maximum rate by a pair of terminals observing correlated sources and with the means to communicate over a noiseless public communication channel. Our main result establishes a structural equivalence between the generation of a maximum rate secret key and the generation of a common randomness that renders the observations of the two terminals conditionally independent. The minimum rate of such common randomness, termed interactive common information, is related to Wyner's notion of common information, and serves to characterize the minimum rate of interactive public communication required to generate an optimum rate secret key. This characterization yields a single-letter expression for the aforementioned communication rate when the number of rounds of interaction is bounded. An application of our results shows that interaction does not reduce this rate for binary symmetric sources. Further, we provide an example for which interaction does reduce the minimum rate of communication. Also, certain invariance properties of common information quantities are established that may be of independent interest.

preprint, 2013, arXiv

Distributed Function Computation with Confidentiality

A set of terminals observe correlated data and seek to compute functions of the data using interactive public communication. At the same time, it is required that the value of a private function of the data remains concealed from an eavesdropper observing this communication. In general, the private function and the functions computed by the nodes can be all different. We show that a class of functions are securely computable if and only if the conditional entropy of data given the value of private function is greater than the least rate of interactive communication required for a related multiterminal source-coding task. A single-letter formula is provided for this rate in special cases.

preprint, 2013, arXiv

How Many Queries Will Resolve Common Randomness?

A set of m terminals, observing correlated signals, communicate interactively to generate common randomness for a given subset of them. Knowing only the communication, how many direct queries of the value of the common randomness will resolve it? A general upper bound, valid for arbitrary signal alphabets, is developed for the number of such queries by using a query strategy that applies to all common randomness and associated communication. When the underlying signals are independent and identically distributed repetitions of m correlated random variables, the number of queries can be exponential in signal length. For this case, the mentioned upper bound is tight and leads to a single-letter formula for the largest query exponent, which coincides with the secret key capacity of a corresponding multiterminal source model. In fact, the upper bound constitutes a strong converse for the optimum query exponent, and implies also a new strong converse for secret key capacity. A key tool, estimating the size of a large probability set in terms of Renyi entropy, is interpreted separately, too, as a lossless block coding result for general sources. As a particularization, it yields the classic result for a discrete memoryless source.

preprint, 2010, arXiv

When is a Function Securely Computable?

A subset of a set of terminals that observe correlated signals seek to compute a given function of the signals using public communication. It is required that the value of the function be kept secret from an eavesdropper with access to the communication. We show that the function is securely computable if and only if its entropy is less than the "aided secret key" capacity of an associated secrecy generation model, for which a single-letter characterization is provided.
