Researcher profile

Abhik Ghosh

Abhik Ghosh contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works
0 followers
11 topics
4 close collaborators

Published work

14 published items

preprint · 2026 · arXiv

Robust Inference for Non-Linear Regression Models with Applications in Enzyme Kinetics

Although linear regression is the most popular statistical modelling technique, in real life we often need to deal with situations where the true relationship between the response and the covariates is nonlinear in the parameters. In such cases one needs to adopt an appropriate non-linear regression (NLR) analysis, which has wide applications in biochemical and medical studies, among many others. In this paper we propose new, improved robust estimation and testing methodologies for general NLR models based on the minimum density power divergence approach and apply our proposal to analyze the widely popular Michaelis-Menten (MM) model in enzyme kinetics. We establish the asymptotic properties of our proposed estimator and tests, along with their theoretical robustness characteristics through influence function analysis. For the particular MM model, we have further empirically justified the robustness and the efficiency of our proposed estimator and the testing procedure through extensive simulation studies and several interesting real data examples of enzyme-catalyzed (biochemical) reactions.

preprint · 2024 · arXiv

Robust and Efficient Estimation in Ordinal Response Models using the Density Power Divergence

In real life, we frequently come across data sets that involve some independent explanatory variable(s) generating a set of ordinal responses. These ordinal responses may correspond to an underlying continuous latent variable, which is linearly related to the covariate(s), and takes a particular (ordinal) label depending on whether this latent variable takes value in some suitable interval specified by a pair of (unknown) cut-offs. The most efficient way of estimating the unknown parameters (i.e., the regression coefficients and the cut-offs) is the method of maximum likelihood (ML). However, contamination in the data set either in the form of misspecification of ordinal responses, or the unboundedness of the covariate(s), might destabilize the likelihood function to a great extent where the ML based methodology might lead to completely unreliable inferences. In this paper, we explore a minimum distance estimation procedure based on the popular density power divergence (DPD) to yield robust parameter estimates for the ordinal response model. This paper highlights how the resulting estimator, namely the minimum DPD estimator (MDPDE), can be used as a practical robust alternative to the classical procedures based on the ML. We rigorously develop several theoretical properties of this estimator, and provide extensive simulations to substantiate the theory developed.

preprint · 2022 · arXiv

Robust and Efficient Parameter Estimation for Discretely Observed Stochastic Processes

In various practical situations, we encounter data from stochastic processes which can be efficiently modelled by an appropriate parametric model for subsequent statistical analyses. Unfortunately, the most common estimation and inference methods based on the maximum likelihood (ML) principle are susceptible to minor deviations from the assumed model or to data contamination due to their well-known lack of robustness. Since the alternative non-parametric procedures often lose significant efficiency, in this paper we develop a robust parameter estimation procedure for discretely observed data from a parametric stochastic process model, which exploits the nice properties of the popular density power divergence measure in the framework of minimum distance inference. In particular, here we define the minimum density power divergence estimators (MDPDE) for the independent increment and the Markov processes. We establish the asymptotic consistency and distributional results for the proposed MDPDEs in these dependent stochastic process set-ups and illustrate their benefits over the usual ML estimator for common examples like the Poisson process, drifted Brownian motion and auto-regressive models.
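As a rough illustration of the idea in this abstract (not the authors' implementation), the sketch below estimates the drift of a Brownian motion from equally spaced increments. It assumes the increments are i.i.d. N(mu*dt, sigma^2*dt) with sigma known, in which case the minimum density power divergence estimating equation reduces to a weighted mean with weights exp(-alpha * residual^2 / (2 * variance)). The function name `mdpde_drift`, the toy data and the choice alpha = 0.5 are all hypothetical.

```python
import math
from statistics import median


def mdpde_drift(increments, dt, sigma, alpha=0.5, tol=1e-10, max_iter=200):
    """Sketch of an MDPDE for the drift mu, assuming increments ~ N(mu*dt, sigma^2*dt)
    with sigma known. Solves the estimating equation by fixed-point iteration."""
    var = sigma ** 2 * dt
    m = median(increments)  # robust starting value for the mean increment
    for _ in range(max_iter):
        # Observations far from the current fit receive exponentially small weights.
        weights = [math.exp(-alpha * (d - m) ** 2 / (2.0 * var)) for d in increments]
        m_new = sum(w * d for w, d in zip(weights, increments)) / sum(weights)
        converged = abs(m_new - m) < tol
        m = m_new
        if converged:
            break
    return m / dt  # convert the mean increment back to a drift per unit time


# Toy increments (hypothetical): true drift near 1, plus one gross outlier.
increments = [0.9, 1.1, 1.0, 0.95, 1.05, 1.02, 0.98, 10.0]
mle = sum(increments) / len(increments) / 1.0  # ML estimate of the drift (dt = 1)
robust = mdpde_drift(increments, dt=1.0, sigma=0.1)
print(f"ML drift estimate:    {mle:.3f}")
print(f"MDPDE drift estimate: {robust:.3f}")
```

On this toy data the ML estimate is pulled towards the outlier, while the weighted fixed point stays close to the bulk of the increments. Larger alpha gives more down-weighting at the cost of efficiency under clean data.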

preprint · 2021 · arXiv

RaJIVE: Robust Angle Based JIVE for Integrating Noisy Multi-Source Data

With the increasing availability of high dimensional, multi-source data, the identification of joint and data-specific patterns of variability has become a subject of interest in many research areas. Several matrix decomposition methods have been formulated for this purpose, for example JIVE (Joint and Individual Variation Explained) and its angle-based variation, aJIVE. Although the effect of data contamination on the estimated joint and individual components has not been considered in the literature, gross errors and outliers in the data can cause instability in such methods and lead to incorrect estimation of joint and individual variance components. We focus on the aJIVE factorization method and provide a thorough analysis of the effect of outliers on the resulting variation decomposition. After showing that this effect is not negligible when all data sources are contaminated, we propose a robust extension of aJIVE (RaJIVE) that integrates a robust formulation of the singular value decomposition into the aJIVE approach. The proposed RaJIVE is shown to provide correct decompositions even in the presence of outliers and improves the performance of aJIVE. We use extensive simulation studies with different levels of data contamination to compare the two methods. Finally, we describe an application of RaJIVE to a multi-omics breast cancer dataset from The Cancer Genome Atlas. We provide the R package RaJIVE with a ready-to-use implementation of the methods and documentation of code and examples.

preprint · 2021 · arXiv

Strata-based Quantification of Distributional Uncertainty in Socio-Economic Indicators: A Comparative Study of Indian States

This paper reports a comprehensive study of distributional uncertainty in a few socio-economic indicators across the various states of India over the years 2001-2011. We show that the DGB distribution, a typical rank order distribution, provides excellent fits to the district-wise empirical data for the population size, literacy rate (LR) and work participation rate (WPR) within every state in India, through its two distributional parameters. Moreover, using the entropy formulation of the DGB distribution, a proposed uncertainty percentage (UP) unveils the dynamics of the uncertainty of LR and WPR in all states of India. We have also commented on the changes in the estimated parameters and the UP values from 2001 to 2011. Additionally, a gender-based analysis of the distribution of these important socio-economic variables within different states of India has also been discussed. Interestingly, it has been observed that, although the distributions of the numbers of literate and working people have a direct (linear) correspondence with that of the population size, the literacy and work-participation rates are distributed independently of the population distributions.

preprint · 2020 · arXiv

A robust variable screening procedure for ultra-high dimensional data

Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, sure independence screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and in small samples in particular this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. We demonstrate the claimed robustness through the use of influence functions, and we discuss the appropriate choice of the tuning parameter $α$. Finally, we illustrate its use on a small dataset from a study on the regulation of lipid metabolism.

preprint · 2020 · arXiv

All sky angular power spectrum: I. Estimating brightness temperature fluctuations using TGSS 150 MHz survey

Measurements of the Galactic synchrotron emission are relevant for 21-cm studies of the Epoch of Reionization. The study of the synchrotron emission is also useful for quantifying the fluctuations in the magnetic field and the cosmic ray electron density of the turbulent interstellar medium (ISM) of our Galaxy. Here, we present all-sky angular power spectrum $(C_{\ell})$ measurements of the diffuse synchrotron emission using the TIFR GMRT Sky Survey (TGSS) at 150 MHz. We estimate $C_{\ell}$ using visibility data both before and after subtracting the modelled point sources. The amplitude of the measured $C_{\ell}$ falls significantly after subtracting the point sources, and it is also slightly higher in the Galactic plane for the residual data. The residual $C_{\ell}$ is most likely dominated by the Galactic synchrotron emission. The amplitude of the residual $C_{\ell}$ falls significantly away from the Galactic plane. We find that the measurements are quite symmetric in the Northern and Southern hemispheres except in the latitude range $15-30^{\circ}$, which is the transition region from the disk-dominated to the diffuse halo-dominated region. The comparison of this interferometric measurement with the scaled version of the Haslam rms map at 150 MHz shows that the correlation coefficient $(r)$ is more than 0.5 for most of the latitude ranges considered here. This signifies that the TGSS survey is quite sensitive to the diffuse Galactic synchrotron radiation.

preprint · 2020 · arXiv

Consistent Fixed-Effects Selection in Ultra-high dimensional Linear Mixed Models with Error-Covariate Endogeneity

Recently, applied sciences, including longitudinal and clustered studies in biomedicine, require the analysis of ultra-high dimensional linear mixed effects models where we need to select important fixed effect variables from a vast pool of available candidates. However, all existing literature assumes that all the available covariates and random effect components are independent of the model error, which is often violated (endogeneity) in practice. In this paper, we first investigate this important issue in ultra-high dimensional linear mixed effects models with particular focus on fixed effects selection. We study the effects of different types of endogeneity on existing regularization methods and prove their inconsistencies. Then, we propose a new profiled focused generalized method of moments (PFGMM) approach to consistently select fixed effects under 'error-covariate' endogeneity, i.e., in the presence of correlation between the model error and covariates. Our proposal is proved to be oracle consistent with probability tending to one and works well under most other types of endogeneity too. Additionally, we propose and illustrate a few consistent parameter estimators, including those of the variance components, along with variable selection through PFGMM. Empirical simulations and an interesting real data example further support the claimed utility of our proposal.

preprint · 2020 · arXiv

Demonstrating the Tapered Gridded Estimator (TGE) for the Cosmological HI 21-cm Power Spectrum using $150 \, {\rm MHz}$ GMRT observations

We apply the Tapered Gridded Estimator (TGE) for estimating the cosmological 21-cm power spectrum from $150 \, {\rm MHz}$ GMRT observations, which correspond to the neutral hydrogen (HI) at redshift $z = 8.28$. Here TGE is used to measure the Multi-frequency Angular Power Spectrum (MAPS) $C_{\ell}(Δν)$ first, from which we estimate the 21-cm power spectrum $P(k_{\perp},k_{\parallel})$. The data here are much too small for a detection, and the aim is to demonstrate the capabilities of the estimator. We find that the estimated power spectrum is consistent with the expected foreground and noise behaviour. This demonstrates that the estimator correctly estimates the noise bias and subtracts it out to yield an unbiased estimate of the power spectrum. More than $47\%$ of the frequency channels had to be discarded from the data owing to radio-frequency interference; however, the estimated power spectrum does not show any artifacts due to missing channels. Finally, we show that it is possible to suppress the foreground contribution by tapering the sky response at large angular separations from the phase center. We combine the $k$ modes within a rectangular region in the 'EoR window' to obtain the spherically binned averaged dimensionless power spectra $Δ^{2}(k)$ along with the statistical error $σ$ associated with the measured $Δ^{2}(k)$. The lowest $k$-bin yields $Δ^{2}(k)=(61.47)^{2}\,{\rm K}^{2}$ at $k=1.59\,\textrm{Mpc}^{-1}$, with $σ=(27.40)^{2} \, {\rm K}^{2}$. We obtain a $2 \, σ$ upper limit of $(72.66)^{2}\,\textrm{K}^{2}$ on the mean squared HI 21-cm brightness temperature fluctuations at $k=1.59\,\textrm{Mpc}^{-1}$.
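The quoted upper limit can be checked with a few lines of arithmetic. The sketch below assumes the $2σ$ upper limit is formed as $Δ^{2}(k) + 2σ$; the abstract does not state this convention, but it reproduces the quoted value. The variable names are illustrative only.

```python
import math

# Values quoted in the abstract for the lowest k-bin (units: K^2).
delta2 = 61.47 ** 2   # measured dimensionless power spectrum
sigma = 27.40 ** 2    # statistical error on the measurement

# Assumed convention: upper limit = measurement + 2 * sigma.
upper_limit = delta2 + 2.0 * sigma
upper_limit_root = math.sqrt(upper_limit)  # expressed as (x)^2 K^2

print(f"2-sigma upper limit: ({upper_limit_root:.2f})^2 K^2")
```

Under this assumption the result agrees with the $(72.66)^{2}\,\textrm{K}^{2}$ limit reported above.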

preprint · 2020 · arXiv

Foreground modelling via Gaussian process regression: an application to HERA data

The key challenge in the observation of the redshifted 21-cm signal from cosmic reionization is its separation from the much brighter foreground emission. Such separation relies on the different spectral properties of the two components, although, in real life, the foreground intrinsic spectrum is often corrupted by the instrumental response, inducing systematic effects that can further jeopardize the measurement of the 21-cm signal. In this paper, we use Gaussian Process Regression to model both foreground emission and instrumental systematics in $\sim 2$ hours of data from the Hydrogen Epoch of Reionization Array. We find that a simple covariance model with three components matches the data well, giving a residual power spectrum with white noise properties. These consist of an "intrinsic" and an instrumentally corrupted component with coherence scales of 20 MHz and 2.4 MHz respectively (dominating the line-of-sight power spectrum over scales $k_{\parallel} \le 0.2$ h cMpc$^{-1}$) and a baseline-dependent periodic signal with a period of $\sim 1$ MHz (dominating over $k_{\parallel} \sim 0.4 - 0.8$ h cMpc$^{-1}$), which should be distinguishable from the 21-cm EoR signal, whose typical coherence scale is $\sim 0.8$ MHz.

preprint · 2020 · arXiv

General Robust Bayes Pseudo-Posterior: Exponential Convergence results with Applications

Although Bayesian inference is an immensely popular paradigm among a large segment of scientists including statisticians, most applications consider objective priors and need critical investigations (Efron, 2013, Science). While it has several optimal properties, a major drawback of Bayesian inference is the lack of robustness against data contamination and model misspecification, which becomes pernicious in the use of objective priors. This paper presents the general formulation of a Bayes pseudo-posterior distribution yielding robust inference. Exponential convergence results related to the new pseudo-posterior and the corresponding Bayes estimators are established under the general parametric set-up and illustrations are provided for the independent stationary as well as non-homogeneous models. Several additional details and properties of the procedure are described, including the estimation under fixed-design regression models.

preprint · 2020 · arXiv

On regularization methods based on Rényi's pseudodistances for sparse high-dimensional linear regression models

Several regularization methods have been considered over the last decade for sparse high-dimensional linear regression models, but the most common ones use the least squares (quadratic) or likelihood loss and hence are not robust against data contamination. Some authors have overcome the problem of non-robustness by considering suitable loss functions based on divergence measures (e.g., density power divergence, gamma-divergence, etc.) instead of the quadratic loss. In this paper we consider a loss function based on Rényi's pseudodistance jointly with non-concave penalties in order to simultaneously perform variable selection and obtain robust estimators of the parameters in a high-dimensional linear regression model of non-polynomial dimensionality. The desired oracle properties of our proposed method are derived theoretically and its usefulness is illustrated numerically through simulations and real data examples.

preprint · 2020 · arXiv

Robust Generalised Quadratic Discriminant Analysis

Quadratic discriminant analysis (QDA) is a widely used statistical tool to classify observations from different multivariate Normal populations. The generalized quadratic discriminant analysis (GQDA) classification rule/classifier, which generalizes the QDA and the minimum Mahalanobis distance (MMD) classifiers to discriminate between populations with underlying elliptically symmetric distributions, competes quite favorably with the QDA classifier when the latter is optimal and performs much better when QDA fails under non-Normal underlying distributions, e.g., the Cauchy distribution. However, the classification rule in GQDA is based on the sample mean vector and the sample dispersion matrix of a training sample, which are extremely non-robust under data contamination. In the real world, since it is quite common to face data highly vulnerable to outliers, the lack of robustness of the classical estimators of the mean vector and the dispersion matrix reduces the efficiency of the GQDA classifier significantly, increasing the misclassification errors. The present paper investigates the performance of the GQDA classifier when the classical estimators of the mean vector and the dispersion matrix used therein are replaced by various robust counterparts. Applications to various real data sets as well as simulation studies reveal far better performance of the proposed robust versions of the GQDA classifier. A comparative study has been made to advocate the appropriate choice of the robust estimators to be used for a given degree of contamination of the data sets.

preprint · 2020 · arXiv

Robust Wald-type test in GLM with random design based on minimum density power divergence estimators

We consider the problem of robust inference under the generalized linear model (GLM) with stochastic covariates. We derive the properties of the minimum density power divergence estimator of the parameters in GLM with random design and use this estimator to propose robust Wald-type tests for testing any general composite null hypothesis about the GLM. The asymptotic and robustness properties of the proposed tests are also examined for the GLM with random design. Application of the proposed robust inference procedures to the popular Poisson regression model for analyzing count data is discussed in detail both theoretically and numerically through simulation studies and real data examples.