Researcher profile

Subhadeep Paul

Subhadeep Paul contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published items

preprint, 2026, arXiv

Recidivism and Peer Influence with LLM Text Embeddings in Low Security Correctional Facilities

Studying peer effects in language is critical because they often reflect behavioral and personality traits that are important determinants of economic outcomes. However, language is unstructured, non-numeric, and high-dimensional. We combine Large Language Model (LLM) embeddings with structural econometric identification to provide a unified framework for identifying peer effects in language. This unified framework is applied to 80,000-120,000 written exchanges among residents of low security correctional facilities. The LLM language profiles predict three-year recidivism 30% more accurately than pre-entry covariates alone, showing that text representations capture meaningful signals. We analyze peer effects on multidimensional language embeddings while addressing network endogeneity. We develop novel instrumental variable estimators for peer effects that accommodate multivariate outcomes, sparse networks, and multidimensional latent variables. Our methods achieve root-N consistency and asymptotic normality under realistic sparsity conditions, relaxing the dense-network assumption. Results reveal significant peer effects in residents' language profiles.

preprint, 2026, arXiv

Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

Federated Learning is a leading framework for training ML and AI models collaboratively across numerous user devices or databases. We study the trade-offs among estimation accuracy, privacy constraints, and communication cost for differentially private (DP) federated M estimation. The two standard methods in the literature are FedAvg, which may suffer from high federation bias, and FedSGD, which can incur high communication cost. Aimed at improving accuracy at a reduced communication cost, we propose FedHybrid, which uses FedSGD starting with an improved initialization by the FedAvg estimator. We propose FedNewton, which averages local Newton iterations to reduce bias in FedAvg, achieving an estimation accuracy comparable to FedSGD with far fewer communication rounds when the number of clients grows sufficiently slowly. We establish finite sample upper bounds on the mean-squared error rates of the DP versions of these estimators as functions of the number of clients, local sample sizes, privacy budget, and number of iterations. We further derive a minimax lower bound on the MSE of any iterative private federated procedure that provides a benchmark to assess the optimality gap of these methods. We numerically evaluate our methods for training a logistic regression and a neural network on the computer vision datasets MNIST and CIFAR-10.
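
As a rough illustration of the communication trade-off the abstract describes, the sketch below contrasts a one-shot averaging estimator with a multi-round gradient scheme on a toy scalar mean-estimation problem. It is not the paper's implementation: the function names, the squared-error loss, the step size, and the data are all assumptions chosen for illustration, and the differential privacy noise is left out entirely.

```python
from statistics import fmean

def fedavg_one_shot(clients):
    """One communication round: each client sends its local minimizer.

    For squared-error loss the local minimizer is the local mean, and
    the server averages those values (hypothetical toy setup).
    """
    return fmean([fmean(data) for data in clients])

def fedsgd(clients, theta0=0.0, lr=0.5, rounds=20):
    """Many communication rounds: clients send gradients at the shared theta.

    The gradient of the local loss mean(0.5 * (x - theta)**2) with
    respect to theta is theta - mean(x).
    """
    theta = theta0
    for _ in range(rounds):
        grads = [theta - fmean(data) for data in clients]
        theta -= lr * fmean(grads)  # server averages gradients, takes one step
    return theta

# Three equally sized clients with made-up data.
clients = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 2.0]]
theta_avg = fedavg_one_shot(clients)  # uses 1 round
theta_sgd = fedsgd(clients)           # uses 20 rounds
```

In this linear toy problem the two estimates agree up to the optimization error of the gradient scheme, so only the difference in communication rounds is visible. The federation bias of averaging that the abstract mentions arises for nonlinear M-estimators, which this sketch does not cover.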

preprint, 2022, arXiv

A Mutually Exciting Latent Space Hawkes Process Model for Continuous-time Networks

Networks and temporal point processes serve as fundamental building blocks for modeling complex dynamic relational data in various domains. We propose the latent space Hawkes (LSH) model, a novel generative model for continuous-time networks of relational events, using a latent space representation for nodes. We model relational events between nodes using mutually exciting Hawkes processes with baseline intensities dependent upon the distances between the nodes in the latent space and sender and receiver specific effects. We demonstrate that our proposed LSH model can replicate many features observed in real temporal networks including reciprocity and transitivity, while also achieving superior prediction accuracy and providing more interpretable fits than existing models.
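
To make the idea of a distance-dependent baseline plus self-excitation concrete, here is a minimal sketch of a Hawkes-style conditional intensity for one sender-receiver pair. The abstract does not state the functional forms, so everything here is an assumption: the exponential link in `baseline`, the exponential decay kernel, the parameter names `alpha` and `beta`, and the choice of which past events excite the pair.

```python
import math

def baseline(z_u, z_v, sender_effect, receiver_effect):
    """Assumed baseline rate: decreases with squared latent distance,
    shifted by sender- and receiver-specific effects."""
    dist2 = sum((a - b) ** 2 for a, b in zip(z_u, z_v))
    return math.exp(-dist2 + sender_effect + receiver_effect)

def intensity(t, mu, exciting_event_times, alpha=0.5, beta=1.0):
    """Assumed conditional intensity at time t: baseline mu plus an
    exponentially decaying bump for each earlier exciting event."""
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - s))
        for s in exciting_event_times
        if s < t
    )
    return mu + excitation

# Hypothetical two-dimensional latent positions for nodes u and v.
mu_uv = baseline([0.0, 0.0], [1.0, 0.0], sender_effect=0.2, receiver_effect=-0.1)

# Rate of u -> v events at t = 2.0, after v -> u events at times 1.0 and 1.5.
lam = intensity(2.0, mu_uv, [1.0, 1.5])
```

Passing the reciprocal pair's event times as `exciting_event_times` is one simple way to produce the reciprocity the abstract mentions. Nodes that sit closer in the latent space get a higher baseline rate under this assumed form.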

preprint, 2022, arXiv

The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks

The stochastic block model (SBM) is one of the most widely used generative models for network data. Many continuous-time dynamic network models are built upon the same assumption as the SBM: edges or events between all pairs of nodes are conditionally independent given the block or community memberships, which prevents them from reproducing higher-order motifs such as triangles that are commonly observed in real networks. We propose the multivariate community Hawkes (MULCH) model, an extremely flexible community-based model for continuous-time networks that introduces dependence between node pairs using structured multivariate Hawkes processes. We fit the model using a spectral clustering and likelihood-based local refinement procedure. We find that our proposed MULCH model is far more accurate than existing models both for predictive and generative tasks.
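
The conditional independence assumption described above can be seen in a minimal sampler for the standard static SBM, sketched below. This shows the baseline model only, not MULCH; the function name `sample_sbm`, the block labels, and the edge probabilities are hypothetical.

```python
import random

def sample_sbm(blocks, probs, seed=0):
    """Sample an undirected SBM adjacency matrix.

    blocks[i] is the community of node i, and probs[a][b] is the edge
    probability between communities a and b. Given the memberships,
    each pair gets its own independent Bernoulli draw.
    """
    rng = random.Random(seed)
    n = len(blocks)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[blocks[i]][blocks[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj

# Two communities of three nodes, dense within and sparse between.
blocks = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.1, 0.9]]
adj = sample_sbm(blocks, probs)
```

Because each draw ignores every other pair, an edge between nodes i and j does not change the chance of an edge between j and k once the memberships are fixed. That is why models built on this assumption struggle to reproduce triangles beyond what the block densities imply.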

preprint, 2021, arXiv

Joint Latent Space Model for Social Networks with Multivariate Attributes

In many application problems in social, behavioral, and economic sciences, researchers often have data on a social network among a group of individuals along with high dimensional multivariate measurements for each individual. To analyze such networked data structures, we propose a joint Attribute and Person Latent Space Model (APLSM) that summarizes information from the social network and the multiple attribute measurements in a person-attribute joint latent space. We develop a Variational Bayesian Expectation-Maximization estimation algorithm to estimate the posterior distribution of the attribute and person locations in the joint latent space. This methodology allows for effective integration, informative visualization, and prediction of social networks and high dimensional attribute measurements. Using APLSM, we explore the inner workings of the French financial elites based on their social networks and their career, political views, and social status. We observe a division in the social circles of the French elites in accordance with the differences in their individual characteristics.

preprint, 2020, arXiv

A random effects stochastic block model for joint community detection in multiple networks with applications to neuroimaging

Motivated by multi-subject experiments in neuroimaging studies, we develop a modeling framework for joint community detection in a group of related networks, which can be considered as a sample from a population of networks. The proposed random effects stochastic block model facilitates the study of group differences and subject-specific variations in the community structure. The model proposes a putative mean community structure which is representative of the group or the population under consideration but is not the community structure of any individual component network. Instead, the community memberships of nodes vary in each component network with a transition matrix, thus modeling the variation in community structure across a group of subjects. To estimate the quantities of interest we propose two methods, a variational EM algorithm, and a model-free "two-step" method based on either spectral or non-negative matrix factorization (NMF). Our NMF based method Co-OSNTF is of independent interest and we study its convergence properties to a stationary point. We also develop a resampling-based hypothesis test for differences in community structure in two populations both at the whole network level and node level. The methodology is applied to a publicly available fMRI dataset from multi-subject experiments involving schizophrenia patients. Our methods reveal an overall putative community structure representative of the group as well as subject-specific variations within each group. Using our network level hypothesis tests we are able to ascertain statistically significant difference in community structure between the two groups, while our node level tests help determine the nodes that are driving the difference.