Researcher profile

Sudipto Banerjee

Sudipto Banerjee contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
6 topics
4 close collaborators

Published work

9 published items

preprint · 2026 · arXiv

On Statistical Inference for Rates of Change in Spatial Processes over Riemannian Manifolds

Statistical inference for spatial processes from partially realized or scattered data has seen voluminous developments in diverse areas ranging from environmental sciences to business and economics. Inference on the associated rates of change has seen some recent developments. The literature has been restricted to Euclidean domains, where inference is sought on directional derivatives, rates along a chosen direction of interest, at arbitrary locations. Inference for higher order rates, particularly directional curvature, has also proved useful in these settings. Modern spatial data often arise from non-Euclidean domains. This manuscript particularly considers spatial processes defined over compact Riemannian manifolds. We develop a comprehensive inferential framework for spatial rates of change for such processes over vector fields. In doing so, we formalize smoothness of process realizations and construct differential processes: the derivative and curvature processes. We derive conditions for kernels that ensure the existence of these processes and establish validity of the joint multivariate process consisting of the "parent" Gaussian process (GP) over the manifold and the associated differential processes. Predictive inference on these rates is devised conditioned on the realized process over the manifold. Manifolds arise as polyhedral meshes in practice. The success of our simulation experiments for assessing derivatives for processes observed over such meshes validates our theoretical findings. By enhancing our understanding of GPs on manifolds, this manuscript unlocks a variety of potential applications in machine learning and statistics where GPs have seen wide usage. We propose a fully model-based approach to inference on the differential processes arising from a spatial process from partially observed or realized data across scattered locations on a manifold.

preprint · 2023 · arXiv

Fixed-Domain Asymptotics Under Vecchia's Approximation of Spatial Process Likelihoods

Statistical modeling for massive spatial data sets has generated a substantial literature on scalable spatial processes based upon Vecchia's approximation. Vecchia's approximation for Gaussian process models enables fast evaluation of the likelihood by restricting dependencies at a location to its neighbors. We establish inferential properties of microergodic spatial covariance parameters within the paradigm of fixed-domain asymptotics when they are estimated using Vecchia's approximation. The conditions required to formally establish these properties are explored, theoretically and empirically, and the effectiveness of Vecchia's approximation is further corroborated from the standpoint of fixed-domain asymptotics.

preprint · 2022 · arXiv

bayesassurance: An R package for calculating sample size and Bayesian assurance

We present bayesassurance, an R package that computes the Bayesian assurance under various settings characterized by different assumptions and objectives. The package offers a constructive set of simulation-based functions suitable for addressing a wide range of clinical trial study design problems. We provide a detailed description of the underlying framework embedded within each of the power and assurance functions and demonstrate their usage through a series of worked-out examples. Through these examples, we hope to corroborate the advantages that come with using a two-stage generalized structure. We also illustrate scenarios where the Bayesian assurance and frequentist power overlap, allowing the user to address both Bayesian and classical inference problems provided that the parameters are properly defined. All assurance-related functions included in this R package rely on a two-stage Bayesian method that assigns two distinct priors to evaluate the unconditional probability of observing a positive outcome, which in turn addresses subtle limitations that arise when using the standard single-prior approach.

preprint · 2021 · arXiv

A Compartment Model of Human Mobility and Early Covid-19 Dynamics in NYC

In this paper, we build a mechanistic system to understand the relation between a reduction in human mobility and Covid-19 spread dynamics within New York City. To this end, we propose a multivariate compartmental model that jointly models smartphone mobility data and case counts during the first 90 days of the epidemic. Parameter calibration is achieved through the formulation of a general Bayesian hierarchical model to provide uncertainty quantification of resulting estimates. The open-source probabilistic programming language Stan is used for the requisite computation. Through sensitivity analysis and out-of-sample forecasting, we find our simple and interpretable model provides evidence that reductions in human mobility altered case dynamics.

preprint · 2021 · arXiv

Hierarchical Multivariate Directed Acyclic Graph Auto-Regressive (MDAGAR) models for spatial diseases mapping

Disease mapping is an important statistical tool used by epidemiologists to assess geographic variation in disease rates and identify lurking environmental risk factors from spatial patterns. Such maps rely upon spatial models for regionally aggregated data, where neighboring regions tend to exhibit more similar outcomes than those farther apart. We contribute to the literature on multivariate disease mapping, which deals with measurements on multiple (two or more) diseases in each region. We aim to disentangle associations among the multiple diseases from spatial autocorrelation in each disease. We develop Multivariate Directed Acyclic Graphical Autoregression (MDAGAR) models to accommodate spatial and inter-disease dependence. The hierarchical construction imparts flexibility and richness, interpretability of spatial autocorrelation and inter-disease relationships, and computational ease, but depends upon the order in which the cancers are modeled. To obviate this, we demonstrate how Bayesian model selection and averaging across orders are easily achieved using bridge sampling. We compare our method with a competitor using simulation studies and present an application to multiple cancer mapping using data from the Surveillance, Epidemiology, and End Results (SEER) Program.

preprint · 2019 · arXiv

Bayesian spatially varying coefficient models in the spBayes R package

This paper describes and illustrates new functionality for fitting spatially varying coefficients models in the spBayes (version 0.4-2) R package. The new spSVC function uses a computationally efficient Markov chain Monte Carlo algorithm and extends current spBayes functions, which fit only space-varying intercept regression models, to fit independent or multivariate Gaussian process random effects for any set of columns in the regression design matrix. Newly added OpenMP parallelization options for spSVC are discussed and illustrated, as well as helper functions for joint and point-wise prediction and model fit diagnostics. The utility of the proposed models is illustrated using a PM10 analysis over central Europe.

preprint · 2013 · arXiv

Modeling temporal gradients in regionally aggregated California asthma hospitalization data

Advances in Geographical Information Systems (GIS) have led to the enormous recent burgeoning of spatial-temporal databases and associated statistical modeling. Here we depart from the rather rich literature in space-time modeling by considering the setting where space is discrete (e.g., aggregated data over regions), but time is continuous. Our major objective in this application is to carry out inference on gradients of a temporal process in our data set of monthly county level asthma hospitalization rates in the state of California, while at the same time accounting for spatial similarities of the temporal process across neighboring counties. Use of continuous time models here allows inference at a finer resolution than that at which the data are sampled. Rather than use parametric forms to model time, we opt for a more flexible stochastic process embedded within a dynamic Markov random field framework. Through the matrix-valued covariance function we can ensure that the temporal process realizations are mean square differentiable, and may thus carry out inference on temporal gradients in a posterior predictive fashion. We use this approach to evaluate temporal gradients where we are concerned with temporal changes in the residual and fitted rate curves after accounting for seasonality, spatiotemporal ozone levels and several important spatially resolved sociodemographic covariates.

preprint · 2013 · arXiv

spBayes for large univariate and multivariate point-referenced spatio-temporal data models

In this paper we detail the reformulation and rewrite of core functions in the spBayes R package. These efforts have focused on improving computational efficiency, flexibility, and usability for point-referenced data models. Attention is given to algorithm and computing developments that result in improved sampler convergence rate and efficiency by reducing parameter space; decreased sampler run-time by avoiding expensive matrix computations; and increased scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations. Beyond these general computational improvements for existing model functions, we detail new functions for modeling data indexed in both space and time. These new functions implement a class of dynamic spatio-temporal models for settings where space is viewed as continuous and time is taken as discrete.

preprint · 2010 · arXiv

Smoothed ANOVA with spatial effects as a competitor to MCAR in multivariate spatial smoothing

Rapid developments in geographical information systems (GIS) continue to generate interest in analyzing complex spatial datasets. One area of activity is in creating smoothed disease maps to describe the geographic variation of disease and generate hypotheses for apparent differences in risk. With multiple diseases, a multivariate conditionally autoregressive (MCAR) model is often used to smooth across space while accounting for associations between the diseases. The MCAR, however, imposes complex covariance structures that are difficult to interpret and estimate. This article develops a much simpler alternative approach building upon the techniques of smoothed ANOVA (SANOVA). Instead of simply shrinking effects without any structure, here we use SANOVA to smooth spatial random effects by taking advantage of the spatial structure. We extend SANOVA to cases in which one factor is a spatial lattice, which is smoothed using a CAR model, and a second factor is, for example, type of cancer. Datasets routinely lack enough information to identify the additional structure of MCAR. SANOVA offers a simpler and more intelligible structure than the MCAR while performing as well. We demonstrate our approach with simulation studies designed to compare SANOVA with different design matrices versus MCAR with different priors. Subsequently, a cancer-surveillance dataset, describing incidence of 3 cancers in Minnesota's 87 counties, is analyzed using both approaches, showing the competitiveness of the SANOVA approach.