Researcher profile

Bhaswar B. Bhattacharya

Bhaswar B. Bhattacharya contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2022arXiv

Fluctuations of Subgraph Counts in Graphon Based Random Graphs

Given a graphon $W$ and a finite simple graph $H$, with vertex set $V(H)$, denote by $X_n(H, W)$ the number of copies of $H$ in a $W$-random graph on $n$ vertices. The asymptotic distribution of $X_n(H, W)$ was recently obtained by Hladký, Pelekis, and Šileikis (2021) in the case where $H$ is a clique. In this paper, we extend this result to any fixed graph $H$. Towards this we introduce a notion of $H$-regularity of graphons and show that if the graphon $W$ is not $H$-regular, then $X_n(H, W)$ has Gaussian fluctuations with scaling $n^{|V(H)|-\frac{1}{2}}$. On the other hand, if $W$ is $H$-regular, then the fluctuations are of order $n^{|V(H)|-1}$ and the limiting distribution of $X_n(H, W)$ can have both Gaussian and non-Gaussian components, where the non-Gaussian component is a (possibly) infinite weighted sum of centered chi-squared random variables with the weights determined by the spectral properties of a graphon derived from $W$. Our proofs use the asymptotic theory of generalized $U$-statistics developed by Janson and Nowicki (1991). We also investigate the structure of $H$-regular graphons for which either the Gaussian or the non-Gaussian component of the limiting distribution (but not both) is degenerate. Interestingly, there are also $H$-regular graphons $W$ for which both the Gaussian or the non-Gaussian components are degenerate, that is, $X_n(H, W)$ has a degenerate limit even under the scaling $n^{|V(H)|-1}$. We give an example of this degeneracy with $H=K_{1, 3}$ (the 3-star) and also establish non-degeneracy in a few examples. This naturally leads to interesting open questions on higher-order degeneracies.

preprint2022arXiv

Phase Transitions of the Maximum Likelihood Estimates in the $p$-Spin Curie-Weiss Model

In this paper we consider the problem of parameter estimation in the $p$-spin Curie-Weiss model, for $p \geq 3$. We provide a complete description of the limiting properties of the maximum likelihood (ML) estimates of the inverse temperature and the magnetic field given a single realization from the $p$-spin Curie-Weiss model, complementing the well-known results in the 2-spin case (Comets and Gidas (1991)). Our results unearth various new phase transitions and surprising limit theorems, such as the existence of a 'critical' curve in the parameter space, where the limiting distribution of the ML estimates is a mixture with both continuous and discrete components. The number of mixture components is either two or three, depending on, among other things, the sign of one of the parameters and the parity of $p$. Another interesting revelation is the existence of certain 'special' points in the parameter space where the ML estimates exhibit a superefficiency phenomenon, converging to a non-Gaussian limiting distribution at rate $N^{\frac{3}{4}}$. Using these results we can obtain asymptotically valid confidence intervals for the inverse temperature and the magnetic field at all points in the parameter space where consistent estimation is possible.

preprint2022arXiv

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimensional discrete distributions (multinomials) under sparse alternatives. More precisely, we derive sharp detection thresholds for testing, based on $n$ samples, whether a discrete distribution supported on $d$ elements differs from the uniform distribution only in $s$ (out of the $d$) coordinates and is $\varepsilon$-far (in total variation distance) from uniformity. Our results reveal various interesting phase transitions which depend on the interplay of the sample size $n$ and the signal strength $\varepsilon$ with the dimension $d$ and the sparsity level $s$. For instance, if the sample size is less than a threshold (which depends on $d$ and $s$), then all tests are asymptotically powerless, irrespective of the magnitude of the signal strength. On the other hand, if the sample size is above the threshold, then the detection boundary undergoes a further phase transition depending on the signal strength. Here, a $χ^2$-type test attains the detection boundary in the dense regime, whereas in the sparse regime a Bonferroni correction of two maximum-type tests and a version of the Higher Criticism test is optimal up to sharp constants. These results combined provide a complete description of the phase diagram for the sparse uniformity testing problem across all regimes of the parameters $n$, $d$, and $s$. One of the challenges in dealing with multinomials is that the parameters are always constrained to lie in the simplex. This results in the aforementioned two-layered phase transition, a new phenomenon which does not arise in classical high-dimensional sparse testing problems.

preprint2021arXiv

Asymptotic Distribution and Detection Thresholds for Two-Sample Tests Based on Geometric Graphs

In this paper, we consider the problem of testing the equality of two multivariate distributions based on geometric graphs constructed using the interpoint distances between the observations. These include the tests based on the minimum spanning tree and the $K$-nearest neighbor (NN) graphs, among others. These tests are asymptotically distribution-free, universally consistent and computationally efficient, making them particularly useful in modern applications. However, very little is known about the power properties of these tests. In this paper, using the theory of stabilizing geometric graphs, we derive the asymptotic distribution of these tests under general alternatives, in the Poissonized setting. Using this, the detection threshold and the limiting local power of the test based on the $K$-NN graph are obtained, where interesting exponents depending on dimension emerge. This provides a way to compare and justify the performance of these tests in different examples.

preprint2021arXiv

Monochromatic Subgraphs in Randomly Colored Graphons

Let $T(H, G_n)$ be the number of monochromatic copies of a fixed connected graph $H$ in a uniformly random coloring of the vertices of the graph $G_n$. In this paper we give a complete characterization of the limiting distribution of $T(H, G_n)$, when $\{G_n\}_{n \geq 1}$ is a converging sequence of dense graphs. When the number of colors grows to infinity, depending on whether the expected value remains bounded, $T(H, G_n)$ either converges to a finite linear combination of independent Poisson variables or a normal distribution. On the other hand, when the number of colors is fixed, $T(H, G_n)$ converges to a (possibly infinite) linear combination of independent centered chi-squared random variables. This generalizes the classical birthday problem, which involves understanding the asymptotics of $T(K_s, K_n)$, the number of monochromatic $s$-cliques in a complete graph $K_n$ ($s$-matching birthdays among a group of $n$ friends), to general monochromatic subgraphs in a network.

preprint2020arXiv

A Nearest-Neighbor Based Nonparametric Test for Viral Remodeling in Heterogeneous Single-Cell Proteomic Data

An important problem in contemporary immunology studies based on single-cell protein expression data is to determine whether cellular expressions are remodeled post infection by a pathogen. One natural approach for detecting such changes is to use non-parametric two-sample statistical tests. However, in single-cell studies, direct application of these tests is often inadequate because single-cell level expression data from uninfected populations often contains attributes of several latent sub-populations with highly heterogeneous characteristics. As a result, viruses often infect these different sub-populations at different rates in which case the traditional nonparametric two-sample tests for checking similarity in distributions are no longer conservative. We propose a new nonparametric method for Testing Remodeling Under Heterogeneity (TRUH) that can accurately detect changes in the infected samples compared to possibly heterogeneous uninfected samples. Our testing framework is based on composite nulls and is designed to allow the null model to encompass the possibility that the infected samples, though unaltered by the virus, might be dominantly arising from under-represented sub-populations in the baseline data. The TRUH statistic, which uses nearest neighbor projections of the infected samples into the baseline uninfected population, is calibrated using a novel bootstrap algorithm. We demonstrate the non-asymptotic performance of the test via simulation experiments and derive the large sample limit of the test statistic, which provides theoretical support towards consistent asymptotic calibration of the test. We use the TRUH statistic for studying remodeling in tonsillar T cells under different types of HIV infection and find that unlike traditional tests, TRUH based statistical inference conforms to the biologically validated immunological theories on HIV infection.

preprint2020arXiv

A Nonparametric Likelihood Approach for Inference in Instrumental Variable Models

Instrumental variable methods allow for inference about the treatment effect by controlling for unmeasured confounding in randomized experiments with noncompliance. However, many studies do not consider the observed compliance behavior in the testing procedure, which can lead to a loss of power. In this paper, we propose a novel nonparametric likelihood approach, referred to as the binomial likelihood (BL) method, that incorporates information on compliance behavior while overcoming several limitations of previous techniques and utilizing the advantages of likelihood methods. Our proposed method produces proper estimates of the counterfactual distribution functions by maximizing the binomial likelihood over the space of distribution functions. Using this we propose two versions of a binomial likelihood ratio test for the null hypothesis of no treatment effect. We show that both versions are more powerful to detect any distributional change than existing methods in finite sample cases, and are asymptotically equivalent to the two-sample Anderson-Darling test. We also develop an efficient algorithm for computing our estimates, and apply the binomial likelihood method to a study of the effect of Medicaid coverage on mental health using the Oregon Health Insurance Experiment.

preprint2020arXiv

Estimation in Tensor Ising Models

The $p$-tensor Ising model is a one-parameter discrete exponential family for modeling dependent binary data, where the sufficient statistic is a multi-linear form of degree $p \geq 2$. This is a natural generalization of the matrix Ising model, that provides a convenient mathematical framework for capturing higher-order dependencies in complex relational data. In this paper, we consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes. Our estimate is based on the maximum pseudo-likelihood (MPL) method, which provides a computationally efficient algorithm for estimating the parameter that avoids computing the intractable partition function. We derive general conditions under which the MPL estimate is $\sqrt N$-consistent, that is, it converges to the true parameter at rate $1/\sqrt N$. In particular, we show the $\sqrt N$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model, spin systems on general $p$-uniform hypergraphs, and Ising models on the hypergraph stochastic block model (HSBM). In fact, for the HSBM we pin down the exact location of the phase transition threshold, which is determined by the positivity of a certain mean-field variational problem, such that above this threshold the MPL estimate is $\sqrt N$-consistent, while below the threshold no estimator is consistent. Finally, we derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model. An interesting consequence of our results is that the MPL estimate in the Curie-Weiss model saturates the Cramer-Rao lower bound at all points above the estimation threshold, that is, the MPL estimate incurs no loss in asymptotic efficiency, even though it is obtained by minimizing only an approximation of the true likelihood function for computational tractability.

preprint2020arXiv

Exact and Asymptotic Results on Coarse Ricci Curvature of Graphs

Ricci curvature was proposed by Ollivier in a general framework of metric measure spaces, and it has been studied extensively in the context of graphs in recent years. In this paper we prove upper bounds for Ollivier's Ricci curvature for bipartite graphs and for the graphs with girth at least 5. We also prove a general lower bound on the Ricci curvature in terms of the size of the maximum matching in an appropriate subgraph. As a consequence, we characterize the Ricci-flat graphs of girth 5. Moreover, using our general lower bound and the Birkhoff-von Neumann theorem, we give a necessary and sufficient condition for the structure of Ricci-flat regular graphs of girth 4. Finally, we obtain the asymptotic Ricci curvature of random bipartite graphs $G(n,n, p)$ and random graphs $G(n, p)$, in various regimes of $p$.

preprint2020arXiv

Geometric Systems of Unbiased Representatives

Let $P$ be a set of points in $\mathbb{R}^d$, $B$ a bicoloring of $P$ and $\Oo$ a family of geometric objects (that is, intervals, boxes, balls, etc). An object from $\Oo$ is called balanced with respect to $B$ if it contains the same number of points from each color of $B$. For a collection $\B$ of bicolorings of $P$, a geometric system of unbiased representatives (G-SUR) is a subset $\Oo'\subseteq\Oo$ such that for any bicoloring $B$ of $\B$ there is an object in $\Oo'$ that is balanced with respect to $B$. We study the problem of finding G-SURs. We obtain general bounds on the size of G-SURs consisting of intervals, size-restricted intervals, axis-parallel boxes and Euclidean balls. We show that the G-SUR problem is NP-hard even in the simple case of points on a line and interval ranges. Furthermore, we study a related problem on determining the size of the largest and smallest balanced intervals for points on the real line with a random distribution and coloring. Our results are a natural extension to a geometric context of the work initiated by Balachandran et al. on arbitrary systems of unbiased representatives.

preprint2020arXiv

Normal Approximation and Fourth Moment Theorems for Monochromatic Triangles

Given a graph sequence $\{G_n\}_{n \geq 1}$ denote by $T_3(G_n)$ the number of monochromatic triangles in a uniformly random coloring of the vertices of $G_n$ with $c \geq 2$ colors. This arises as a generalization of the birthday paradox, where $G_n$ corresponds to a friendship network and $T_3(G_n)$ counts the number of triples of friends with matching birthdays. In this paper we prove a central limit theorem (CLT) for $T_3(G_n)$ with explicit error rates. The proof involves constructing a martingale difference sequence by carefully ordering the vertices of $G_n$, based on a certain combinatorial score function, and using a quantitive version of the martingale CLT. We then relate this error term to the well-known fourth moment phenomenon, which, interestingly, holds only when the number of colors $c \geq 5$. We also show that the convergence of the fourth moment is necessary to obtain a Gaussian limit for any $c \geq 2$, which, together with the above result, implies that the fourth-moment condition characterizes the limiting normal distribution of $T_3(G_n)$, whenever $c \geq 5$. Finally, to illustrate the promise of our approach, we include an alternative proof of the CLT for the number of monochromatic edges, which provides quantitative rates for the results obtained in Bhattacharya et al. (2017).

preprint2020arXiv

Spectral Edge in Sparse Random Graphs: Upper and Lower Tail Large Deviations

In this paper we consider the problem of estimating the joint upper and lower tail large deviations of the edge eigenvalues of an Erdős-Rényi random graph $\mathcal{G}_{n,p}$, in the regime of $p$ where the edge of the spectrum is no longer governed by global observables, such as the number of edges, but rather by localized statistics, such as high degree vertices. Going beyond the recent developments in mean-field approximations of related problems, this paper provides a comprehensive treatment of the large deviations of the spectral edge in this entire regime, which notably includes the well studied case of constant average degree. In particular, for $r \geq 1$ fixed, we pin down the asymptotic probability that the top $r$ eigenvalues are jointly greater/less than their typical values by multiplicative factors bigger/smaller than $1$, in the regime mentioned above. The proof for the upper tail relies on a novel structure theorem, obtained by building on estimates of Krivelevich and Sudakov (2003), followed by an iterative cycle removal process, which shows, conditional on the upper tail large deviation event, with high probability the graph admits a decomposition in to a disjoint union of stars and a spectrally negligible part. On the other hand, the key ingredient in the proof of the lower tail is a Ramsey-type result which shows that if the $K$-th largest degree of a graph is not atypically small (for some large $K$ depending on $r$), then either the top eigenvalue or the $r$-th largest eigenvalue is larger than that allowed by the lower tail event on the top $r$ eigenvalues, thus forcing a contradiction. The above arguments reduce the problems to developing a large deviation theory for the extremal degrees which could be of independent interest.

preprint2020arXiv

The Second Moment Phenomenon for Monochromatic Subgraphs

What is the chance that among a group of $n$ friends, there are $s$ friends all of whom have the same birthday? This is the celebrated birthday problem which can be formulated as the existence of a monochromatic $s$-clique $K_s$ ($s$-matching birthdays) in the complete graph $K_n$, where every vertex of $K_n$ is uniformly colored with $365$ colors (corresponding to birthdays). More generally, for a general connected graph $H$, let $T(H, G_n)$ be the number of monochromatic copies of $H$ in a uniformly random coloring of the vertices of the graph $G_n$ with $c_n$ colors. In this paper we show that $T(H, G_n)$ converges to $\mathrm{Pois}(λ)$ whenever $\mathbb E T(H, G_n) \rightarrow λ$ and $\mathrm{Var} T(H, G_n) \rightarrow λ$, that is, the asymptotic Poisson distribution of $T(H, G_n)$ is determined just by the convergence of its mean and variance. Moreover, this condition is necessary if and only if $H$ is a star-graph. In fact, the second-moment phenomenon is a consequence of a more general theorem about the convergence of $T(H,G_n)$ to a finite linear combination of independent Poisson random variables. As an application, we derive the limiting distribution of $T(H, G_n)$, when $G_n\sim G(n, p)$ is the Erd\H os-Rényi random graph. Multiple phase-transitions emerge as $p$ varies from 0 to 1, depending on whether the graph $H$ is balanced or unbalanced.