Source author record

Subhroshekhar Ghosh

Subhroshekhar Ghosh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.PR, math.ST, Statistics Theory, Computation, cond-mat.dis-nn, econ.EM, eess.AS, eess.SP, Information Theory, math.IT, Methodology, quant-ph, Sound

Catalog footprint

What is connected

10 works

14 topics

4 close collaborators

preprint · 2026 · arXiv

State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

Determinantal point processes (DPPs) have emerged as a kernelized alternative to vanilla independent sampling for generating efficient minibatches, coresets and other parsimonious representations of large-scale datasets. While theoretical foundations and promising empirical performance have been demonstrated, there are two challenges for current proposals for DPP-based coresets or minibatches. The first is the need for families of DPPs with certain key variance reduction properties, usually constructed in a continuous setting, of which there are few known examples. The second is the need for an ad-hoc construction of a discrete DPP defined on a given dataset, that inherits such variance reduction. In this work, we contribute to the programme of establishing DPPs as a subsampling toolbox for ML by advancing on these two fronts. First, we propose new DPPs on the Euclidean space based on wavelets, with provably better accuracy guarantees than the best known rates. Second, we introduce a general method to convert such continuous DPPs, which are more amenable to proving analytical statements, into discrete kernels, which are pertinent for subsampling tasks such as minibatch and coreset constructions. This conversion mechanism simultaneously preserves the desired variance decay and reveals a low-rank decomposition of the discrete kernel, which makes sampling the corresponding DPP computationally inexpensive. En route, we enlarge the class of ML tasks amenable to improvements via DPP-based minibatches and coresets to include objective functions with arbitrarily low regularity, and rate guarantees that explicitly adapt to this regularity.
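To make the idea of a DPP minibatch concrete, here is a minimal Python sketch of the generic mechanism only, not the wavelet kernels or the discretization method of this paper. It samples a projection DPP $K = VV^T$ on a finite dataset with the classical spectral (HKPV-style) chain rule and weights each sampled loss by $1/K_{ii}$, which gives an unbiased estimate of the full-data sum because $P(i \in S) = K_{ii}$. The random orthonormal basis standing in for a kernel, the stand-in losses and the function names are hypothetical choices for illustration.

```python
import numpy as np

def sample_projection_dpp(V, rng):
    """Draw one sample from the projection DPP with kernel K = V V^T (V is n x k, orthonormal columns)."""
    V = V.copy()
    n, k = V.shape
    sample = []
    for _ in range(k):
        # Squared row norms of the current basis give the next-point probabilities.
        probs = np.sum(V ** 2, axis=1)
        probs[sample] = 0.0  # guard against round-off re-selecting a chosen point
        probs /= probs.sum()
        i = int(rng.choice(n, p=probs))
        sample.append(i)
        # Use the column with the largest entry in row i to zero out row i in all columns.
        j = int(np.argmax(np.abs(V[i])))
        vj = V[:, j].copy()
        V = V - np.outer(vj, V[i] / vj[i])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)  # re-orthonormalize the remaining columns
    return sample

def dpp_sum_estimate(losses, V, rng):
    """Estimate sum(losses) from a DPP minibatch, weighting each sampled loss by 1 / K_ii (assumes K_ii > 0)."""
    batch = sample_projection_dpp(V, rng)
    k_diag = np.sum(V ** 2, axis=1)  # diagonal of K = V V^T, i.e. inclusion probabilities
    return sum(losses[i] / k_diag[i] for i in batch), batch

rng = np.random.default_rng(0)
n, k = 200, 10
V, _ = np.linalg.qr(rng.standard_normal((n, k)))  # illustrative rank-k projection kernel
losses = rng.random(n)                            # stand-in per-example losses
estimate, batch = dpp_sum_estimate(losses, V, rng)
print(len(batch), estimate, losses.sum())
```

A projection kernel is used because its samples always have exactly $k$ points, which matches a fixed minibatch size.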

preprint · 2022 · arXiv

Fluctuation and Entropy in Spectrally Constrained random fields

We investigate the statistical properties of translation invariant random fields (including point processes) on Euclidean spaces (or lattices) under constraints on their spectrum or structure function. An important class of models that motivate our study are hyperuniform and stealthy hyperuniform systems, which are characterised by the vanishing of the structure function at the origin (resp., vanishing in a neighbourhood of the origin). We show that many key features of two classical statistical-mechanical measures of randomness, namely fluctuations and entropy, are governed only by some particular local aspects of their structure function. We obtain exponents for the fluctuations of the local mass in domains of growing size, and show that spatial geometric considerations, namely both the shape of the domain and the mode of spectral decay, play an important role. In doing so, we unveil intriguing oscillatory behaviour of spatial correlations of local masses in adjacent box domains. We describe very general conditions under which the field of local masses exhibits Gaussian asymptotics, with an explicitly described limit. We further demonstrate that stealthy hyperuniform systems with joint densities exhibit degeneracy in their asymptotic entropy per site. In fact, our analysis shows that entropic degeneracy sets in under much milder conditions than stealthiness, as soon as the structure function fails to be logarithmically integrable.
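The structure function can be estimated from a finite configuration with the standard formula $S(k) = |\sum_j e^{-i k \cdot x_j}|^2 / N$, and hyperuniformity corresponds to $S(k) \to 0$ as $k \to 0$. The minimal Python sketch below evaluates it at the smallest wavevector compatible with a periodic box for a perfect lattice, a weakly perturbed lattice and i.i.d. uniform points. The function name, box size and perturbation strength are assumptions for illustration, not the paper's code.

```python
import numpy as np

def structure_function(points, k):
    """Empirical structure function S(k) = |sum_j exp(-i k.x_j)|^2 / N."""
    phases = np.exp(-1j * (points @ k))
    return np.abs(phases.sum()) ** 2 / len(points)

rng = np.random.default_rng(1)
L = 20  # side of the periodic box; N = L^2 points
grid = np.stack(np.meshgrid(np.arange(L), np.arange(L)), axis=-1).reshape(-1, 2).astype(float)
perturbed = grid + 0.1 * rng.standard_normal(grid.shape)  # weakly perturbed lattice
uniform = rng.uniform(0, L, size=grid.shape)             # i.i.d. uniform points, no spectral constraint

k_min = np.array([2 * np.pi / L, 0.0])  # smallest nonzero wavevector compatible with the box
for name, pts in [("lattice", grid), ("perturbed", perturbed), ("uniform", uniform)]:
    print(name, structure_function(pts, k_min))
```

For the perturbed lattice the expected value at small $k$ is $1 - e^{-\sigma^2 |k|^2}$, which vanishes at the origin, while for independent points it stays of order one.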

preprint · 2022 · arXiv

Fractal Gaussian Networks: A sparse random graph model based on Gaussian Multiplicative Chaos

We propose a novel stochastic network model, called Fractal Gaussian Network (FGN), that embodies well-defined and analytically tractable fractal structures. Such fractal structures have been empirically observed in diverse applications. FGNs interpolate continuously between the popular purely random geometric graphs (a.k.a. the Poisson Boolean network), and random graphs with increasingly fractal behavior. In fact, they form a parametric family of sparse random geometric graphs that are parametrized by a fractality parameter which governs the strength of the fractal structure. FGNs are driven by the latent spatial geometry of Gaussian Multiplicative Chaos (GMC), a canonical model of fractality in its own right. We asymptotically characterize the expected number of edges, triangles, cliques and hub-and-spoke motifs in FGNs, unveiling a distinct pattern in their scaling with the size parameter of the network. We then examine the natural question of detecting the presence of fractality and the problem of parameter estimation based on observed network data, in addition to fundamental properties of the FGN as a random graph model. We also explore fractality in community structures by unveiling a natural stochastic block model in the setting of FGNs. Finally, we substantiate our results with phenomenological analysis of the FGN in the context of available scientific literature for fractality in networks, including applications to real-world massive network data.

preprint · 2022 · arXiv

Generative Principal Component Analysis

In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method over the classic power method and the truncated power method devised for sparse principal component analysis.
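The iteration described above can be pictured with a minimal Python sketch of a projected power method. The `project` argument is a hypothetical hook standing in for the projection onto the range of a generative model (which this sketch does not implement); with the identity it reduces to the classic power method. The spiked-matrix example and its parameters are assumptions for illustration.

```python
import numpy as np

def projected_power_method(M, project=lambda v: v, iters=100, seed=0):
    """Iterate v <- P(M v) / ||P(M v)||; P = identity gives the classic power method."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = project(M @ v)
        v = w / np.linalg.norm(w)
    return v

# Spiked matrix model: M = beta * x x^T + symmetric Gaussian noise.
rng = np.random.default_rng(1)
n, beta = 200, 5.0
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
G = rng.standard_normal((n, n)) / np.sqrt(n)
M = beta * np.outer(x, x) + (G + G.T) / np.sqrt(2)
v = projected_power_method(M)
print(abs(v @ x))  # overlap with the planted spike
```

In the generative setting, `project` would map a vector to (an approximation of) its closest point in the range of the generative model, which is where the structural prior enters.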

preprint · 2022 · arXiv

Learning with latent group sparsity via heat flow dynamics on networks

Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to learning under such group structure that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via heat flow-based local network dynamics. In fact, we demonstrate a procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the heat flow dynamics for a time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. We validate our approach by successful applications to real-world data from a wide array of application domains, including computer science, genetics, climatology and economics. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.
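Heat flow on a network is governed by the graph Laplacian $L = D - A$: a signal evolves as $du/dt = -Lu$, so $u(t) = e^{-tL} u(0)$. The minimal Python sketch below, assuming SciPy's `scipy.linalg.expm` and a toy two-community graph chosen for illustration, shows how short-time diffusion keeps heat mostly inside the community where it started. It is not the paper's network construction or penalty.

```python
import numpy as np
from scipy.linalg import expm

def heat_flow(adjacency, u0, t):
    """Solve du/dt = -L u with L = D - A, returning u(t) = expm(-t L) u0."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return expm(-t * laplacian) @ u0

# Toy graph: two 4-node cliques joined by a single edge (an assumption for illustration).
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

u0 = np.zeros(8)
u0[0] = 1.0  # all heat starts on node 0
u = heat_flow(A, u0, t=1.0)
print(np.round(u, 3))
```

Because the rows of $L$ sum to zero, total heat is conserved, and the slow leakage across the single bridge edge is what lets short diffusion times reveal community structure.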

preprint · 2022 · arXiv

Signal Analysis via the Stochastic Geometry of Spectrogram Level Sets

Spectrograms are fundamental tools in time-frequency analysis, being the squared magnitude of the so-called short-time Fourier transform (STFT). Signal analysis via spectrograms has traditionally explored their peaks, i.e., their maxima. This is complemented by a recent interest in their zeros or minima, following seminal work by Flandrin and others, which exploits connections with Gaussian analytic functions (GAFs). However, the zero sets (or extrema) of GAFs have a complicated stochastic structure, which hinders any direct theoretical analysis. Standard techniques largely rely on statistical observables from the analysis of spatial data, whose distributional properties for spectrograms are mostly understood only at an empirical level. In this work, we investigate spectrogram analysis via an examination of the stochastic geometric properties of their level sets. We obtain rigorous theorems demonstrating the efficacy of an approach based on spectrogram level sets for the detection and estimation of signals, framed in a concrete inferential setup. Exploiting these ideas as theoretical underpinnings, we propose a level-set-based algorithm for signal analysis that is intrinsic to given spectrogram data, and substantiate its effectiveness via extensive empirical studies. Our results also have theoretical implications for zero-based approaches to spectrogram signal analysis. To our knowledge, these results are among the first to provide a rigorous statistical understanding of signal detection and reconstruction in this setup, complemented with provable guarantees on detection thresholds and rates of convergence.
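A spectrogram level set is simply the region of the time-frequency plane where $|\mathrm{STFT}|^2$ exceeds a threshold. The minimal NumPy sketch below computes a Gaussian-window spectrogram of a noisy pure tone and extracts a superlevel set at a high empirical quantile. The window, hop size, test signal and quantile threshold are assumptions for illustration, and this is not the paper's detection algorithm.

```python
import numpy as np

def spectrogram(x, win_len=64, hop=16):
    """|STFT|^2 with a Gaussian window; rows are time frames, columns are frequency bins."""
    n = np.arange(win_len)
    window = np.exp(-0.5 * ((n - (win_len - 1) / 2) / (win_len / 6)) ** 2)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def superlevel_set(spec, level):
    """Boolean mask of time-frequency cells where the spectrogram exceeds `level`."""
    return spec > level

rng = np.random.default_rng(0)
t = np.arange(2048)
signal = np.sin(2 * np.pi * 0.125 * t)        # pure tone at 1/8 of the sampling rate
noisy = signal + rng.standard_normal(t.size)  # additive white Gaussian noise
spec = spectrogram(noisy)
mask = superlevel_set(spec, level=np.quantile(spec, 0.99))
print(mask.shape, mask.sum(axis=0).argmax())  # frequency bin most often above the level
```

With a 64-sample window the tone falls in frequency bin $0.125 \times 64 = 8$, so the high-level set concentrates along that row of the time-frequency plane.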

preprint · 2020 · arXiv

Learning from DPPs via Sampling: Beyond HKPV and symmetry

Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, or summary extraction, harnessing the intrinsic ability of these probabilistic models to facilitate sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough, Krishnapur, Peres and Virág (henceforth HKPV), which is in general time and resource intensive. For DPPs with symmetric kernels, scalable HKPV samplers have been proposed that either first downsample the ground set of items, or force the kernel to be low-rank, using e.g. Nyström-type decompositions. In the present work, we contribute an approach radically different from HKPV. Exploiting the fact that many statistical and learning objectives can be effectively accomplished by only sampling certain key observables of a DPP (so-called linear statistics), we invoke an expression for the Laplace transform of such an observable as a single determinant, which holds in complete generality. Combining traditional low-rank approximation techniques with Laplace inversion algorithms from numerical analysis, we show how to directly approximate the distribution function of a linear statistic of a DPP. This distribution function can then be used in hypothesis testing or to actually sample the linear statistic, as required. Our approach is scalable and applies to very general DPPs, beyond traditional symmetric kernels.
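On a finite ground set with marginal kernel $K$, the single-determinant identity takes the form $\mathbb{E}[e^{-s\sum_{i\in Y} f_i}] = \det(I - K(I - e^{-sF}))$ with $F = \mathrm{diag}(f)$; it follows from $P(A \subseteq Y) = \det K_A$ and uses no symmetry of $K$. The Python sketch below checks this against brute-force enumeration with $P(Y = A) = (-1)^{|A^c|}\det(K - I_{A^c})$ on a small example. The kernel, test function and helper names are assumptions for illustration, and the low-rank approximation and Laplace inversion steps are not included.

```python
import itertools
import numpy as np

def laplace_transform_det(K, f, s):
    """E[exp(-s * sum_{i in Y} f_i)] for a DPP with marginal kernel K, as one determinant."""
    n = K.shape[0]
    return np.linalg.det(np.eye(n) - K @ np.diag(1.0 - np.exp(-s * f)))

def laplace_transform_brute(K, f, s):
    """Same quantity by enumerating all subsets A, using P(Y = A) = (-1)^{|A^c|} det(K - I_{A^c})."""
    n = K.shape[0]
    total = 0.0
    for r in range(n + 1):
        for A in itertools.combinations(range(n), r):
            comp = np.ones(n)
            comp[list(A)] = 0.0  # indicator of the complement of A
            prob = (-1) ** (n - r) * np.linalg.det(K - np.diag(comp))
            total += prob * np.exp(-s * f[list(A)].sum())
    return total

rng = np.random.default_rng(0)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
K = Q @ np.diag(rng.uniform(0, 1, n)) @ Q.T  # symmetric, eigenvalues in [0, 1]
f = rng.uniform(0, 2, n)                     # values of a test function on the ground set
print(laplace_transform_det(K, f, 0.7), laplace_transform_brute(K, f, 0.7))
```

The determinant costs $O(n^3)$ while enumeration costs $O(2^n n^3)$, which is why only tiny ground sets are used for the cross-check.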

preprint · 2020 · arXiv

Transmission and navigation on disordered lattice networks, directed spanning forests and Brownian web

Stochastic networks based on random point sets as nodes have attracted considerable interest in many applications, particularly in communication networks, including wireless sensor networks, peer-to-peer networks and so on. The study of such networks generally requires the nodes to be independently and uniformly distributed as a Poisson point process. In this work, we venture beyond this standard paradigm and investigate the stochastic geometry of networks obtained from directed spanning forests (DSF) based on randomly perturbed lattices, which have desirable statistical properties as models of spatially dependent point fields. In the regime of low disorder, we show in 2D and 3D that the DSF almost surely consists of a single tree. In 2D, we further establish that the DSF, as a collection of paths, converges under diffusive scaling to the Brownian web.
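In one common convention for the directed spanning forest in 2D, each point is joined to the nearest point among those with strictly larger second coordinate. The Python sketch below builds these edges on a randomly perturbed square lattice. The upward direction, the uniform perturbation of strength 0.1 and the brute-force neighbour search are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def directed_spanning_forest(points):
    """Map each point index to its parent: the nearest point with strictly larger y-coordinate."""
    parent = {}
    for i, p in enumerate(points):
        above = np.flatnonzero(points[:, 1] > p[1])
        if above.size == 0:
            continue  # the topmost point of a finite window has no parent
        dists = np.linalg.norm(points[above] - p, axis=1)
        parent[i] = int(above[np.argmin(dists)])
    return parent

rng = np.random.default_rng(0)
L = 15
grid = np.stack(np.meshgrid(np.arange(L), np.arange(L)), axis=-1).reshape(-1, 2).astype(float)
points = grid + 0.1 * rng.uniform(-1, 1, grid.shape)  # low-disorder perturbed lattice
parent = directed_spanning_forest(points)
print(len(points), len(parent))  # every point but the topmost one gets an outgoing edge
```

In a finite window all paths trivially end at the topmost point, so the sketch cannot illustrate the single-tree or Brownian-web results, which concern the infinite perturbed lattice.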

preprint · 2016 · arXiv

Multivariate CLT follows from strong Rayleigh property

Let $(X_1, \ldots, X_d)$ be random variables taking nonnegative integer values and let $f(z_1, \ldots, z_d)$ be their probability generating function. Suppose that $f$ is real stable; equivalently, suppose that the polarization of this probability distribution is strong Rayleigh. In specific examples, such as occupation counts of disjoint sets by a determinantal point process, it is known (Soshnikov, 2002) that the joint distribution must approach a multivariate Gaussian distribution. We show that this conclusion already follows from the stability of $f$.
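A one-variable instance of the stability hypothesis can be checked numerically. For a DPP with symmetric marginal kernel $K$ on a finite set, the generating function of the total count is $\det(I - K + zK) = \prod_i (1 - \lambda_i + \lambda_i z)$, where $\lambda_i \in [0,1]$ are the eigenvalues of $K$, so its roots $1 - 1/\lambda_i$ are real and nonpositive; equivalently the count is a sum of independent Bernoulli($\lambda_i$) variables. The Python sketch below compares the determinant with that product form and inspects the roots. The random kernel is an assumption for illustration, and the sketch covers only this univariate DPP case, not the multivariate argument of the paper.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = rng.uniform(0.2, 0.9, n)
K = Q @ np.diag(lam) @ Q.T  # symmetric kernel with spectrum in (0, 1)

# Product form: coefficients (highest degree first) of prod_i (lam_i z + 1 - lam_i).
pgf = reduce(np.polymul, [np.array([l, 1.0 - l]) for l in lam])

def pgf_det(z):
    """Generating function of the total count, evaluated directly as det(I - K + z K)."""
    return np.linalg.det(np.eye(n) - K + z * K)

roots = np.roots(pgf)
print(np.allclose(pgf_det(0.3), np.polyval(pgf, 0.3)))  # the two forms agree
print(np.max(np.abs(roots.imag)), np.max(roots.real))   # real, nonpositive roots
```

The polynomial coefficients are exactly the probabilities $P(|Y| = m)$, so they sum to one.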

preprint · 2005 · arXiv

Quantum algorithm to distinguish Boolean functions of different weights

We exploit the Grover operator of the database search algorithm to obtain a weight decision algorithm. In this research, the weight decision problem is to find the exact weight $w$ from two given weights $w_1$ and $w_2$, where $w_1+w_2=1$ and $0<w_1<w_2<1$. First, if a Boolean function is given and the weights are $\{1/4, 3/4\}$, we can find $w$ with only one application of the Grover operator. Second, if we apply the Grover operator $k$ times, we can decide $w$ from the set of weights $\{\sin^2(\frac{k}{2k+1}\frac{\pi}{2}), \cos^2(\frac{k}{2k+1}\frac{\pi}{2})\}$. Finally, by modifying the last two Grover operators with two phase conditions, we can decide $w$ from any given set of two weights. If the quantum algorithm requires $O(k)$ Grover steps to decide $w$ with certainty, then the best known classical algorithm requires $\Omega(k^s)$ steps, where $s>2$. Hence the quantum algorithm achieves at least a quadratic speedup.
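The one-application claim for $\{1/4, 3/4\}$ follows from the textbook amplitude-amplification formula: if a fraction $w = \sin^2\theta$ of inputs is marked, then after $k$ Grover iterations a measurement returns a marked input with probability $\sin^2((2k+1)\theta)$. The Python sketch below evaluates this formula for the weight pair above and shows that the two success probabilities are 1 and 0 in some order, so one measurement decides $w$ with certainty. It relies only on that formula, not on a circuit simulation, and does not cover the phase-modified variant.

```python
import math

def success_probability(w, k):
    """Probability of measuring a marked input after k Grover iterations when a fraction w is marked."""
    theta = math.asin(math.sqrt(w))
    return math.sin((2 * k + 1) * theta) ** 2

for k in range(1, 5):
    angle = k / (2 * k + 1) * math.pi / 2
    w1, w2 = math.sin(angle) ** 2, math.cos(angle) ** 2
    p1, p2 = success_probability(w1, k), success_probability(w2, k)
    print(k, round(w1, 4), round(w2, 4), round(p1, 6), round(p2, 6))
```

For $w_1$ the rotated angle is $(2k+1)\theta_1 = k\pi/2$ and for $w_2$ it is $(k+1)\pi/2$; since $k$ and $k+1$ have opposite parity, one probability is 1 and the other is 0.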
