Source author record

Abhik Ghosh

Abhik Ghosh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology, math.ST, Statistics Theory, astro-ph.CO, Applications, astro-ph.GA, Computation, Computer Vision, cond-mat.stat-mech, Machine Learning, physics.soc-ph, q-fin.GN, q-fin.ST, Quantitative Methods

Catalog footprint

What is connected

36 works

14 topics

4 close collaborators


preprint 2026 arXiv

Robust Inference for Non-Linear Regression Models with Applications in Enzyme Kinetics

Despite linear regression being the most popular statistical modelling technique, in real life we often need to deal with situations where the true relationship between the response and the covariates is nonlinear in the parameters. In such cases one needs to adopt appropriate non-linear regression (NLR) analysis, which has wide applications in biochemical and medical studies, among many others. In this paper we propose new, improved robust estimation and testing methodologies for general NLR models based on the minimum density power divergence approach and apply our proposal to analyze the widely popular Michaelis-Menten (MM) model in enzyme kinetics. We establish the asymptotic properties of our proposed estimator and tests, along with their theoretical robustness characteristics through influence function analysis. For the particular MM model, we have further empirically justified the robustness and the efficiency of our proposed estimator and the testing procedure through extensive simulation studies and several interesting real data examples of enzyme-catalyzed (biochemical) reactions.

preprint 2024 arXiv

Robust and Efficient Estimation in Ordinal Response Models using the Density Power Divergence

In real life, we frequently come across data sets that involve some independent explanatory variable(s) generating a set of ordinal responses. These ordinal responses may correspond to an underlying continuous latent variable, which is linearly related to the covariate(s), and takes a particular (ordinal) label depending on whether this latent variable takes value in some suitable interval specified by a pair of (unknown) cut-offs. The most efficient way of estimating the unknown parameters (i.e., the regression coefficients and the cut-offs) is the method of maximum likelihood (ML). However, contamination in the data set either in the form of misspecification of ordinal responses, or the unboundedness of the covariate(s), might destabilize the likelihood function to a great extent where the ML based methodology might lead to completely unreliable inferences. In this paper, we explore a minimum distance estimation procedure based on the popular density power divergence (DPD) to yield robust parameter estimates for the ordinal response model. This paper highlights how the resulting estimator, namely the minimum DPD estimator (MDPDE), can be used as a practical robust alternative to the classical procedures based on the ML. We rigorously develop several theoretical properties of this estimator, and provide extensive simulations to substantiate the theory developed.
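For orientation, the density power divergence mentioned above can be written in its standard generic form for independent and identically distributed data. The symbols $g$ (data density), $f_\theta$ (model density), $\alpha$ (tuning parameter) and $X_1, \dots, X_n$ (sample) are notation assumed here, not taken from the abstract, and the ordinal-response setting works with a model conditional on covariates, so this is only a sketch of the general idea.

```latex
d_\alpha(g, f_\theta)
  = \int \left\{ f_\theta^{1+\alpha}(x)
      - \left(1 + \frac{1}{\alpha}\right) g(x)\, f_\theta^{\alpha}(x)
      + \frac{1}{\alpha}\, g^{1+\alpha}(x) \right\} dx,
  \qquad \alpha > 0,

\hat{\theta}_\alpha
  = \arg\min_{\theta} \left\{ \int f_\theta^{1+\alpha}(x)\, dx
      - \left(1 + \frac{1}{\alpha}\right) \frac{1}{n} \sum_{i=1}^{n} f_\theta^{\alpha}(X_i) \right\}.
```

The last term of the divergence does not depend on $\theta$, which is why it drops out of the empirical objective. As $\alpha \to 0$ the divergence tends to the Kullback-Leibler divergence, so the estimator approaches the maximum likelihood estimator; larger $\alpha$ trades some efficiency for robustness.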

preprint 2022 arXiv

Robust and Efficient Parameter Estimation for Discretely Observed Stochastic Processes

In various practical situations, we encounter data from stochastic processes which can be efficiently modelled by an appropriate parametric model for subsequent statistical analyses. Unfortunately, the most common estimation and inference methods based on the maximum likelihood (ML) principle are susceptible to minor deviations from the assumed model or to data contamination, due to their well-known lack of robustness. Since the alternative non-parametric procedures often lose significant efficiency, in this paper we develop a robust parameter estimation procedure for discretely observed data from a parametric stochastic process model, which exploits the nice properties of the popular density power divergence measure in the framework of minimum distance inference. In particular, here we define the minimum density power divergence estimators (MDPDE) for the independent increment and the Markov processes. We establish the asymptotic consistency and distributional results for the proposed MDPDEs in these dependent stochastic process set-ups and illustrate their benefits over the usual ML estimator for common examples like the Poisson process, drifted Brownian motion and auto-regressive models.

preprint 2021 arXiv

RaJIVE: Robust Angle Based JIVE for Integrating Noisy Multi-Source Data

With increasing availability of high dimensional, multi-source data, the identification of joint and data specific patterns of variability has become a subject of interest in many research areas. Several matrix decomposition methods have been formulated for this purpose, for example JIVE (Joint and Individual Variation Explained), and its angle based variation, aJIVE. Although the effect of data contamination on the estimated joint and individual components has not been considered in the literature, gross errors and outliers in the data can cause instability in such methods, and lead to incorrect estimation of joint and individual variance components. We focus on the aJIVE factorization method and provide a thorough analysis of the effect of outliers on the resulting variation decomposition. After showing that such an effect is not negligible when all data-sources are contaminated, we propose a robust extension of aJIVE (RaJIVE) that integrates a robust formulation of the singular value decomposition into the aJIVE approach. The proposed RaJIVE is shown to provide correct decompositions even in the presence of outliers and improves the performance of aJIVE. We use extensive simulation studies with different levels of data contamination to compare the two methods. Finally, we describe an application of RaJIVE to a multi-omics breast cancer dataset from The Cancer Genome Atlas. We provide the R package RaJIVE with a ready-to-use implementation of the methods and documentation of code and examples.

preprint 2021 arXiv

Strata-based Quantification of Distributional Uncertainty in Socio-Economic Indicators: A Comparative Study of Indian States

This paper reports a comprehensive study of distributional uncertainty in a few socio-economic indicators across the various states of India over the years 2001-2011. We show that the DGB distribution, a typical rank order distribution, provides excellent fits to the district-wise empirical data for the population size, literacy rate (LR) and work participation rate (WPR) within every state in India, through its two distributional parameters. Moreover, resorting to the entropy formulation of the DGB distribution, a proposed uncertainty percentage (UP) unveils the dynamics of the uncertainty of LR and WPR in all states of India. We have also commented on the changes in the estimated parameters and the UP values from the years 2001 to 2011. Additionally, a gender-based analysis of the distribution of these important socio-economic variables within different states of India has also been discussed. Interestingly, it has been observed that, although the distributions of the numbers of literate and working people have a direct (linear) correspondence with that of the population size, the literacy and work-participation rates are distributed independently of the population distributions.

preprint 2020 arXiv

A robust variable screening procedure for ultra-high dimensional data

Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them the sure independence screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and in particular in small samples this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. We demonstrate the claimed robustness through use of influence functions, and we discuss appropriate choice of the tuning parameter $α$. Finally, we illustrate its use on a small dataset from a study on regulation of lipid metabolism.

preprint 2020 arXiv

All sky angular power spectrum: I. Estimating brightness temperature fluctuations using TGSS 150 MHz survey

Measurements of the Galactic synchrotron emission are relevant for the 21-cm studies from the Epoch of Reionization. The study of the synchrotron emission is also useful to quantify the fluctuations in the magnetic field and the cosmic ray electron density of the turbulent interstellar medium (ISM) of our Galaxy. Here, we present the all-sky angular power spectrum $(C_{\ell})$ measurements of the diffuse synchrotron emission using the TIFR GMRT Sky Survey (TGSS) at 150 MHz. We estimate $C_{\ell}$ using visibility data both before and after subtracting the modelled point sources. The amplitude of the measured $C_{\ell}$ falls significantly after subtracting the point sources, and it is also slightly higher in the Galactic plane for the residual data. The residual $C_{\ell}$ is most likely to be dominated by the Galactic synchrotron emission. The amplitude of the residual $C_{\ell}$ falls significantly away from the Galactic plane. We find the measurements are quite symmetric in the Northern and Southern hemispheres except in the latitude range $15-30^{\circ}$, which is the transition region from the disk dominated to the diffuse halo dominated region. The comparison between this interferometric measurement and the scaled version of the Haslam rms map at 150 MHz shows that the correlation coefficient $(r)$ is more than 0.5 for most of the latitude ranges considered here. This signifies that the TGSS survey is quite sensitive to the diffuse Galactic synchrotron radiation.

preprint 2020 arXiv

Consistent Fixed-Effects Selection in Ultra-high dimensional Linear Mixed Models with Error-Covariate Endogeneity

Recently, applied sciences, including longitudinal and clustered studies in biomedicine, require the analysis of ultra-high dimensional linear mixed effects models where we need to select important fixed effect variables from a vast pool of available candidates. However, all existing literature assumes that all the available covariates and random effect components are independent of the model error, an assumption that is often violated (endogeneity) in practice. In this paper, we first investigate this important issue in ultra-high dimensional linear mixed effects models with particular focus on the fixed effects selection. We study the effects of different types of endogeneity on existing regularization methods and prove their inconsistencies. Then, we propose a new profiled focused generalized method of moments (PFGMM) approach to consistently select fixed effects under 'error-covariate' endogeneity, i.e., in the presence of correlation between the model error and covariates. Our proposal is proved to be oracle consistent with probability tending to one and works well under most other types of endogeneity too. Additionally, we also propose and illustrate a few consistent parameter estimators, including those of the variance components, along with variable selection through PFGMM. Empirical simulations and an interesting real data example further support the claimed utility of our proposal.

preprint 2020 arXiv

Demonstrating the Tapered Gridded Estimator (TGE) for the Cosmological HI 21-cm Power Spectrum using $150 \, {\rm MHz}$ GMRT observations

We apply the Tapered Gridded Estimator (TGE) for estimating the cosmological 21-cm power spectrum from $150 \, {\rm MHz}$ GMRT observations, which correspond to the neutral hydrogen (HI) at redshift $z = 8.28$. Here TGE is used to measure the Multi-frequency Angular Power Spectrum (MAPS) $C_{\ell}(Δν)$ first, from which we estimate the 21-cm power spectrum $P(k_{\perp},k_{\parallel})$. The data here are much too small for a detection, and the aim is to demonstrate the capabilities of the estimator. We find that the estimated power spectrum is consistent with the expected foreground and noise behaviour. This demonstrates that this estimator correctly estimates the noise bias and subtracts this out to yield an unbiased estimate of the power spectrum. More than $47\%$ of the frequency channels had to be discarded from the data owing to radio-frequency interference; however, the estimated power spectrum does not show any artifacts due to missing channels. Finally, we show that it is possible to suppress the foreground contribution by tapering the sky response at large angular separations from the phase center. We combine the $k$ modes within a rectangular region in the 'EoR window' to obtain the spherically binned averaged dimensionless power spectra $Δ^{2}(k)$ along with the statistical error $σ$ associated with the measured $Δ^{2}(k)$. The lowest $k$-bin yields $Δ^{2}(k)=(61.47)^{2}\,{\rm K}^{2}$ at $k=1.59\,\textrm{Mpc}^{-1}$, with $σ=(27.40)^{2} \, {\rm K}^{2}$. We obtain a $2 \, σ$ upper limit of $(72.66)^{2}\,\textrm{K}^{2}$ on the mean squared HI 21-cm brightness temperature fluctuations at $k=1.59\,\textrm{Mpc}^{-1}$.
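The quoted upper limit can be checked with a few lines of arithmetic. The sketch below assumes the limit is formed as the measured value plus twice the statistical error, $Δ^{2}(k) + 2σ$; the abstract does not spell this convention out, and the variable names are illustrative only.

```python
import math

# Values quoted in the abstract for the lowest k-bin (units: K^2).
delta_sq = 61.47 ** 2   # measured Delta^2(k)
sigma = 27.40 ** 2      # statistical error on Delta^2(k)

# Assumed convention: upper limit = measured value + 2 * sigma.
upper_limit_sq = delta_sq + 2.0 * sigma

# Express the limit as an amplitude in K, to compare with (72.66)^2 K^2.
upper_limit_amplitude = math.sqrt(upper_limit_sq)

print(f"2-sigma upper limit: ({upper_limit_amplitude:.2f})^2 K^2")
# 2-sigma upper limit: (72.66)^2 K^2
```

Under that assumption the three numbers in the abstract are mutually consistent.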

preprint 2020 arXiv

Foreground modelling via Gaussian process regression: an application to HERA data

The key challenge in the observation of the redshifted 21-cm signal from cosmic reionization is its separation from the much brighter foreground emission. Such separation relies on the different spectral properties of the two components, although, in real life, the foreground intrinsic spectrum is often corrupted by the instrumental response, inducing systematic effects that can further jeopardize the measurement of the 21-cm signal. In this paper, we use Gaussian Process Regression to model both foreground emission and instrumental systematics in $\sim 2$ hours of data from the Hydrogen Epoch of Reionization Array. We find that a simple covariance model with three components matches the data well, giving a residual power spectrum with white noise properties. These consist of an "intrinsic" and an instrumentally corrupted component with coherence scales of 20 MHz and 2.4 MHz respectively (dominating the line of sight power spectrum over scales $k_{\parallel} \le 0.2$ h cMpc$^{-1}$) and a baseline dependent periodic signal with a period of $\sim 1$ MHz (dominating over $k_{\parallel} \sim 0.4 - 0.8$ h cMpc$^{-1}$) which should be distinguishable from the 21-cm EoR signal, whose typical coherence scale is $\sim 0.8$ MHz.

preprint 2020 arXiv

General Robust Bayes Pseudo-Posterior: Exponential Convergence results with Applications

Although Bayesian inference is an immensely popular paradigm among a large segment of scientists including statisticians, most applications consider objective priors and need critical investigations (Efron, 2013, Science). While it has several optimal properties, a major drawback of Bayesian inference is the lack of robustness against data contamination and model misspecification, which becomes pernicious in the use of objective priors. This paper presents the general formulation of a Bayes pseudo-posterior distribution yielding robust inference. Exponential convergence results related to the new pseudo-posterior and the corresponding Bayes estimators are established under the general parametric set-up and illustrations are provided for the independent stationary as well as non-homogeneous models. Several additional details and properties of the procedure are described, including the estimation under fixed-design regression models.

preprint 2020 arXiv

On regularization methods based on Rényi's pseudodistances for sparse high-dimensional linear regression models

Several regularization methods have been considered over the last decade for sparse high-dimensional linear regression models, but the most common ones use the least squares (quadratic) or likelihood loss and hence are not robust against data contamination. Some authors have overcome the problem of non-robustness by considering suitable loss functions based on divergence measures (e.g., density power divergence, gamma-divergence, etc.) instead of the quadratic loss. In this paper we shall consider a loss function based on Rényi's pseudodistance jointly with non-concave penalties in order to simultaneously perform variable selection and get robust estimators of the parameters in a high-dimensional linear regression model of non-polynomial dimensionality. The desired oracle properties of our proposed method are derived theoretically and its usefulness is illustrated numerically through simulations and real data examples.

preprint 2020 arXiv

Robust Generalised Quadratic Discriminant Analysis

Quadratic discriminant analysis (QDA) is a widely used statistical tool to classify observations from different multivariate Normal populations. The generalized quadratic discriminant analysis (GQDA) classification rule/classifier, which generalizes the QDA and the minimum Mahalanobis distance (MMD) classifiers to discriminate between populations with underlying elliptically symmetric distributions, competes quite favorably with the QDA classifier when it is optimal and performs much better when QDA fails under non-Normal underlying distributions, e.g. the Cauchy distribution. However, the classification rule in GQDA is based on the sample mean vector and the sample dispersion matrix of a training sample, which are extremely non-robust under data contamination. In the real world, since it is quite common to face data highly vulnerable to outliers, the lack of robustness of the classical estimators of the mean vector and the dispersion matrix reduces the efficiency of the GQDA classifier significantly, increasing the misclassification errors. The present paper investigates the performance of the GQDA classifier when the classical estimators of the mean vector and the dispersion matrix used therein are replaced by various robust counterparts. Applications to various real data sets as well as simulation studies reveal far better performance of the proposed robust versions of the GQDA classifier. A comparative study has been made to advocate the appropriate choice of the robust estimators to be used for a given degree of contamination of the data sets.

preprint 2020 arXiv

Robust Wald-type test in GLM with random design based on minimum density power divergence estimators

We consider the problem of robust inference under the generalized linear model (GLM) with stochastic covariates. We derive the properties of the minimum density power divergence estimator of the parameters in GLM with random design and use this estimator to propose robust Wald-type tests for testing any general composite null hypothesis about the GLM. The asymptotic and robustness properties of the proposed tests are also examined for the GLM with random design. Application of the proposed robust inference procedures to the popular Poisson regression model for analyzing count data is discussed in detail both theoretically and numerically through simulation studies and real data examples.

preprint 2016 arXiv

Tapering the sky response for angular power spectrum estimation from low-frequency radio-interferometric data

It is important to correctly subtract point sources from radio-interferometric data in order to measure the power spectrum of diffuse radiation like the Galactic synchrotron or the Epoch of Reionization 21-cm signal. It is computationally very expensive and challenging to image a very large area and accurately subtract all the point sources from the image. The problem is particularly severe at the sidelobes and the outer parts of the main lobe where the antenna response is highly frequency dependent and the calibration also differs from that of the phase center. Here we show that it is possible to overcome this problem by tapering the sky response. Using simulated 150 MHz observations, we demonstrate that it is possible to suppress the contribution due to point sources from the outer parts by using the Tapered Gridded Estimator to measure the angular power spectrum $C_{\ell}$ of the sky signal. We also show from the simulation that this method can self-consistently compute the noise bias and accurately subtract it to provide an unbiased estimation of $C_{\ell}$.

preprint 2015 arXiv

A Bayesian analysis of redshifted 21-cm HI signal and foregrounds: Simulations for LOFAR

Observations of the EoR with the 21-cm hyperfine emission of neutral hydrogen (HI) promise to open an entirely new window onto the formation of the first stars, galaxies and accreting black holes. In order to characterize the weak 21-cm signal, we need to develop imaging techniques which can reconstruct the extended emission very precisely. Here, we present an inversion technique for LOFAR baselines at NCP, based on a Bayesian formalism with optimal spatial regularization, which is used to reconstruct the diffuse foreground map directly from the simulated visibility data. We notice the spatial regularization de-noises the images to a large extent, allowing one to recover the 21-cm power-spectrum over a considerable $k_{\perp}-k_{\parallel}$ space in the range of $0.03\,{\rm Mpc^{-1}}<k_{\perp}<0.19\,{\rm Mpc^{-1}}$ and $0.14\,{\rm Mpc^{-1}}<k_{\parallel}<0.35\,{\rm Mpc^{-1}}$ without subtracting the noise power-spectrum. We find that, in combination with using the GMCA, a non-parametric foreground removal technique, we can mostly recover the spherically averaged power-spectrum within $2σ$ statistical fluctuations for an input Gaussian random rms noise level of $60 \, {\rm mK}$ in the maps after 600 hrs of integration over a $10 \, {\rm MHz}$ bandwidth.

preprint 2015 arXiv

Influence Analysis of Robust Wald-type Tests

We consider a robust version of the classical Wald test statistics for testing simple and composite null hypotheses for general parametric models. These test statistics are based on the minimum density power divergence estimators instead of the maximum likelihood estimators. An extensive study of their robustness properties is given through the influence functions as well as the chi-square inflation factors. It is theoretically established that the level and power of these robust tests are stable against outliers, whereas the classical Wald test breaks down. Some numerical examples confirm the validity of the theoretical results.
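The contrast between the classical and the robust Wald-type test can be illustrated on a toy problem. The sketch below is not the paper's code: it assumes a normal model with known standard deviation, tests a simple null on the mean, and uses the standard estimating equation and asymptotic variance of the minimum density power divergence estimator for that model. The function names, the data and the tuning value `alpha = 0.5` are all made up for illustration.

```python
import math
import statistics


def mdpde_normal_mean(data, sigma, alpha, tol=1e-10, max_iter=500):
    """Minimum DPD estimate of a normal mean with known sigma.

    Solves sum_i (x_i - mu) * exp(-alpha * (x_i - mu)^2 / (2 sigma^2)) = 0
    by fixed-point iteration. With alpha = 0 every weight equals 1 and the
    iteration returns the sample mean (the maximum likelihood estimate).
    """
    mu = statistics.median(data)  # robust starting value
    for _ in range(max_iter):
        weights = [math.exp(-alpha * (x - mu) ** 2 / (2.0 * sigma ** 2)) for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def wald_type_statistic(data, mu0, sigma, alpha):
    """Wald-type statistic n * (mu_hat - mu0)^2 / v_alpha for H0: mu = mu0.

    v_alpha = sigma^2 * (1 + alpha)^3 / (1 + 2 alpha)^(3/2) is the asymptotic
    variance of the estimator in this model; it reduces to sigma^2 at alpha = 0.
    """
    n = len(data)
    mu_hat = mdpde_normal_mean(data, sigma, alpha)
    v_alpha = sigma ** 2 * (1.0 + alpha) ** 3 / (1.0 + 2.0 * alpha) ** 1.5
    return n * (mu_hat - mu0) ** 2 / v_alpha


# Hypothetical data: eleven points symmetric about 0, plus two gross outliers.
clean = [-1.5, -1.2, -0.9, -0.6, -0.3, 0.0, 0.3, 0.6, 0.9, 1.2, 1.5]
data = clean + [8.0, 9.0]

CHI2_1_95 = 3.841  # 95% quantile of the chi-square distribution with 1 df

classical_w = wald_type_statistic(data, mu0=0.0, sigma=1.0, alpha=0.0)
robust_w = wald_type_statistic(data, mu0=0.0, sigma=1.0, alpha=0.5)

print("classical Wald rejects H0:", classical_w > CHI2_1_95)
print("robust Wald-type rejects H0:", robust_w > CHI2_1_95)
```

In this toy data set the two outliers pull the sample mean far enough from zero for the classical statistic to exceed the critical value, while the exponentially down-weighted estimate stays close to zero.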

preprint 2015 arXiv

Predictions for the 21cm-galaxy cross-power spectrum observable with LOFAR and Subaru

The 21cm-galaxy cross-power spectrum is expected to be one of the promising probes of the Epoch of Reionization (EoR), as it could offer information about the progress of reionization and the typical scale of ionized regions at different redshifts. With upcoming observations of 21cm emission from the EoR with the Low Frequency Array (LOFAR), and of high redshift Ly$\alpha$ emitters (LAEs) with Subaru's Hyper Suprime Cam (HSC), we investigate the observability of such a cross-power spectrum with these two instruments, which are both planning to observe the ELAIS-N1 field at z=6.6. In this paper we use N-body + radiative transfer (both for continuum and Ly$\alpha$ photons) simulations at redshifts 6.68, 7.06 and 7.3 to compute the 3D theoretical 21cm-galaxy cross-power spectrum, as well as to predict the 2D 21cm-galaxy cross-power spectrum expected to be observed by LOFAR and HSC. Once noise and projection effects are accounted for, our predictions of the 21cm-galaxy cross-power spectrum show clear anti-correlation on scales larger than ~ 60 h$^{-1}$ Mpc (corresponding to k ~ 0.1 h Mpc$^{-1}$), with levels of significance p=0.04 at z=6.6 and p=0.048 at z=7.3. On smaller scales, instead, the signal is completely contaminated.

preprint 2015 arXiv

Testing Composite Null Hypothesis Based on $S$-Divergences

We present a robust test for composite null hypotheses based on the general $S$-divergence family. This requires a non-trivial extension of the results of Ghosh et al. (2015). We derive the asymptotic and theoretical robustness properties of the resulting test along with the properties of the minimum $S$-divergence estimators under parameter restrictions imposed by the null hypothesis. An illustration in the context of the normal model is also presented.

preprint 2014 arXiv

Asymptotic Properties of Minimum S-Divergence Estimator for Discrete Models

Robust inference based on the minimization of statistical divergences has proved to be a useful alternative to the classical techniques based on maximum likelihood and related methods. Recently Ghosh et al. (2013) proposed a general class of divergence measures, namely the S-Divergence Family, and discussed its usefulness in robust parametric estimation through some numerical illustrations. In the present paper, we develop the asymptotic properties of the proposed minimum S-Divergence estimators under discrete models.

preprint 2014 arXiv

Constraining the epoch of reionization with the variance statistic: simulations of the LOFAR case

Several experiments are underway to detect the cosmic redshifted 21-cm signal from neutral hydrogen from the Epoch of Reionization (EoR). Due to their very low signal-to-noise ratio, these observations aim for a statistical detection of the signal by measuring its power spectrum. We investigate the extraction of the variance of the signal as a first step towards detecting and constraining the global history of the EoR. Signal variance is the integral of the signal's power spectrum, and it is expected to be measured with a high significance. We demonstrate this through results from a simulation and parameter estimation pipeline developed for the Low Frequency Array (LOFAR)-EoR experiment. We show that LOFAR should be able to detect the EoR in 600 hours of integration using the variance statistic. Additionally, the redshift ($z_r$) and duration ($Δz$) of reionization can be constrained assuming a parametrization. We use an EoR simulation of $z_r = 7.68$ and $Δz = 0.43$ to test the pipeline. We are able to detect the simulated signal with a significance of 4 standard deviations and extract the EoR parameters as $z_r = 7.72^{+0.37}_{-0.18}$ and $Δz = 0.53^{+0.12}_{-0.23}$ in 600 hours, assuming that systematic errors can be adequately controlled. We further show that the significance of detection and constraints on EoR parameters can be improved by measuring the cross-variance of the signal by cross-correlating consecutive redshift bins.
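The statement that the signal variance is the integral of its power spectrum has a simple discrete analogue (Parseval's theorem), which can be checked numerically. The sketch below is a one-dimensional toy illustration with a made-up signal; it is not the LOFAR simulation or parameter estimation pipeline, and the normalisation $|X_k|^2 / N^2$ is one convention chosen here so that the sum over modes equals the variance.

```python
import cmath
import math


def dft(signal):
    """Plain discrete Fourier transform: X_k = sum_m x_m exp(-2 pi i k m / N)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(signal))
        for k in range(n)
    ]


# Hypothetical signal: two Fourier modes plus a constant offset.
N = 64
signal = [
    math.sin(2 * math.pi * 3 * m / N) + 0.5 * math.cos(2 * math.pi * 7 * m / N) + 0.2
    for m in range(N)
]

# Variance computed directly in the signal domain.
mean = sum(signal) / N
variance = sum((x - mean) ** 2 for x in signal) / N

# Power per mode, normalised so that summing over modes gives the variance.
power = [abs(c) ** 2 / N ** 2 for c in dft(signal)]

# Skip k = 0, which carries the mean rather than the fluctuations.
variance_from_power = sum(power[1:])

print(f"variance from signal:         {variance:.6f}")
print(f"variance from power spectrum: {variance_from_power:.6f}")
```

Both routes give 0.625 for this signal (0.5 from the unit-amplitude sine and 0.125 from the half-amplitude cosine), which is the sense in which a single variance number summarises the whole power spectrum.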

preprint 2014 arXiv

Estimation of Multivariate Location and Covariance using the S-Hellinger Distance

This paper describes a generalization of the Hellinger distance which we call the S-Hellinger distance; this general family connects the Hellinger distance smoothly with the $L_2$-divergence by a tuning parameter $α$ and is indeed a subfamily of the S-Divergence family of Ghosh et al. (2013 a, b). We use this general divergence in the context of estimating the location and covariances under (continuous) multivariate models and show that the proposed minimum S-Hellinger distance estimator is affine equivariant, asymptotically consistent and has a high breakdown point under suitable conditions. We also illustrate its performance through an extensive simulation study which shows that the proposed estimators are more robust than the minimum Hellinger distance estimator for the location and correlation parameters under different types of contamination, with the contamination proportion being as high as 20%.

preprint 2014 arXiv

Influence Function Analysis of the Restricted Minimum Divergence Estimators : A General Form

The minimum divergence estimators have proved to be useful tools in the area of robust inference. The robustness of such estimators is measured using the classical influence functions. However, many complex situations, like testing a composite hypothesis using divergences, require the estimators to be restricted to some subspace of the parameter space. The robustness of these restricted minimum divergence estimators is very important in order to have overall robust inference. In this paper we provide a comprehensive description of the robustness of such restricted estimators in terms of their influence function for a general class of density based divergences along with their unrestricted versions. In particular, the robustness of some popular minimum divergence estimators is also demonstrated under certain usual restrictions. Thus this paper provides a general framework for the influence function analysis of a large class of minimum divergence estimators with or without restrictions on the parameters.

preprint 2014 arXiv

On the Robustness of a Divergence based Test of Simple Statistical Hypotheses

The most popular hypothesis testing procedure, the likelihood ratio test, is known to be highly non-robust in many real situations. Basu et al. (2013a) provided an alternative robust procedure of hypothesis testing based on the density power divergence; however, although the robustness properties of the latter test were intuitively argued for by the authors together with extensive empirical substantiation of the same, no theoretical robustness properties were presented in this work. In the present paper we will consider a more general class of tests which form a superfamily of the procedures described by Basu et al. (2013a). This superfamily derives from the class of $S$-divergences recently proposed by Basu et al. (2013a). In this context we theoretically prove several robustness results of the new class of tests and illustrate them in the normal model. All the theoretical robustness properties of the Basu et al. (2013a) proposal follow as special cases of our results.

preprint 2014 arXiv

Robust Estimation in Generalised Linear Models : The Density Power Divergence Approach

The generalised linear model (GLM) is a very important tool for analysing real data in biology, sociology, agriculture, engineering and many other application domains where the relationship between the response and explanatory variables may not be linear or the distributions may not be normal in all the cases. However, quite often such real data contain a significant number of outliers in relation to the standard parametric model used in the analysis; in such cases the classical maximum likelihood estimator may fail to produce reasonable estimates and related inference could be unreliable. In this paper, we develop a robust estimation procedure for the generalised linear models that can generate robust estimators with little loss in efficiency. We will also explore two particular cases of the generalised linear model in detail -- Poisson regression for count data and logistic regression for binary data -- which are widely applied in real life experiments. We will also illustrate the performance of the proposed estimators through several interesting data examples.

preprint 2014 arXiv

Robust Estimation of Bivariate Tail Dependence Coefficient

The problem of estimating the coefficient of bivariate tail dependence is considered here from the robustness point of view; it combines two apparently contradictory theories of robust statistics and extreme value statistics. The usual maximum likelihood based or the moment type estimators of tail dependence coefficient are highly sensitive to the presence of outlying observations in data. This paper proposes some alternative robust estimators obtained by minimizing the density power divergence with suitable model assumptions; their robustness properties are examined through the classical influence function analysis. The performance of the proposed estimators is illustrated through an extensive empirical study considering several important bivariate extreme value distributions.

preprint2014arXiv

The Logarithmic Super Divergence and its use in Statistical Inference

This paper introduces a new superfamily of divergences that is similar in spirit to the S-divergence family introduced by Ghosh et al. (2013). This new family serves as an umbrella that contains the logarithmic power divergence family (Rényi, 1961; Maji, Chakraborty and Basu, 2014) and the logarithmic density power divergence family (Jones et al., 2001) as special cases. Various properties of this new family and the corresponding minimum distance procedures are discussed with particular emphasis on the robustness issue; these properties are demonstrated through simulation studies. In particular, the method demonstrates the limitation of the first order influence function in assessing the robustness of the corresponding minimum distance procedures.

preprint2014arXiv

The Logarithmic Super Divergence and Statistical Inference : Asymptotic Properties

Statistical inference based on divergence measures has a long history. Recently, Maji, Ghosh and Basu (2014) have introduced a general family of divergences called the logarithmic super divergence (LSD) family. This family acts as a superfamily for both the logarithmic power divergence (LPD) family (e.g., Rényi, 1961) and the logarithmic density power divergence (LDPD) family introduced by Jones et al. (2001). In this paper we describe the asymptotic properties of the inference procedures resulting from this divergence in discrete models. The properties are well supported by real data examples.

preprint2014arXiv

The Minimum S-Divergence Estimator under Continuous Models: The Basu-Lindsay Approach

Robust inference based on the minimization of statistical divergences has proved to be a useful alternative to the classical maximum likelihood based techniques. Recently Ghosh et al. (2013) proposed a general class of divergence measures for robust statistical inference, named the S-Divergence Family. Ghosh (2014) discussed its asymptotic properties for the discrete model of densities. In the present paper, we develop the asymptotic properties of the proposed minimum S-Divergence estimators under continuous models. Here we use the Basu-Lindsay approach (1994) of smoothing the model densities that, unlike previous approaches, avoids many of the complications of kernel bandwidth selection. Illustrations are presented to support the performance of the resulting estimators, in terms of both efficiency and robustness, through extensive simulation studies and real data examples.

preprint2014arXiv

Visibility based angular power spectrum estimation in low frequency radio interferometric observations

We present two estimators to quantify the angular power spectrum of the sky signal directly from the visibilities measured in radio interferometric observations. This is relevant for both the foregrounds and the cosmological 21-cm signal buried therein. The discussion here is restricted to the Galactic synchrotron radiation, the most dominant foreground component after point source removal. Our theoretical analysis is validated using simulations at 150 MHz, mainly for GMRT and also briefly for LOFAR. The Bare Estimator uses pairwise correlations of the measured visibilities, while the Tapered Gridded Estimator uses the visibilities after gridding in the uv plane. The former is very precise, but computationally expensive for large data. The latter has a lower precision, but takes less computation time, which is proportional to the data volume. The latter also allows tapering of the sky response, leading to sidelobe suppression, a useful ingredient for foreground removal. Both estimators avoid the positive bias that arises due to the system noise. We consider amplitude and phase errors of the gain, and the w-term, as possible sources of error. We find that the estimated angular power spectrum is exponentially sensitive to the variance of the phase errors but insensitive to amplitude errors. The statistical uncertainties of the estimators are affected by both amplitude and phase errors. The w-term does not have a significant effect at the angular scales of our interest. We propose the Tapered Gridded Estimator as an effective tool to observationally quantify both foregrounds and the cosmological 21-cm signal.

preprint2013arXiv

A Model Explaining Correlation Between Observed Values in Contingency Tables

In this article, a model is proposed using Bayesian techniques to account for the high correlation between many observed sets of contingency tables. Such high correlation is encountered in many real-life data sets. Simulation studies are also given to check the effectiveness of this model.

preprint2013arXiv

Estimating Copula and Test of Independence based on a generalized framework of all rank-based Statistics in Bivariate Sample

Copulas are mathematical objects that fully capture the dependence structure among random variables and hence offer great flexibility in building multivariate stochastic models. In statistics, a copula is used as a general way of formulating a multivariate distribution in such a way that various general types of dependence can be represented. In the case of a bivariate sample, the notion of estimating the copula is closely related to that of testing independence, since when the components of the bivariate sample are independent the copula becomes simply the product of two uniform distributions. So, apart from non-parametric estimation of copulas, we also consider it relevant to introduce some non-parametric tests to better understand the very essence of the copula in explaining the association between the components. In fact, we develop a general multivariate statistic that gives rise to a much larger class of non-parametric rank-based statistics. This class of statistics can be used in estimation and testing for the association present in the bivariate sample. We choose some representative statistics from that class and compare their power in testing independence using simulation, as an attempt to choose the best candidate in that class.

preprint2012arXiv

Characterizing Foreground for redshifted 21-cm radiation: 150 MHz GMRT observations

Foreground removal is a major challenge for detecting the redshifted 21-cm neutral hydrogen (HI) signal from the Epoch of Reionization (EoR). We have used 150 MHz GMRT observations to characterize the statistical properties of the foregrounds in four different fields of view. The measured multi-frequency angular power spectrum C_l(Delta nu) is found to have values in the range 10^4 mK^2 to 2 x 10^4 mK^2 across 700 <= l <= 2 x 10^4 and Delta nu <= 2.5 MHz, which is consistent with model predictions where point sources are the most dominant foreground component. The measured C_l(Delta nu) does not show a smooth Delta nu dependence, which poses a severe difficulty for foreground removal using polynomial fitting. The observational data were used to assess point source subtraction. Considering the brightest source (~ 1 Jy) in each field, we find that the residual artifacts are less than 1.5% in the most sensitive field (FIELD I). We have used FIELD I, which has an rms noise of 1.3 mJy/Beam, to study the properties of the radio source population to a limiting flux of 9 mJy. The differential source count is well fitted with a single power law of slope -1.6. We find no evidence for flattening of the source counts towards lower flux densities, which suggests that the source population is dominated by the classical radio-loud Active Galactic Nucleus (AGN). The diffuse Galactic emission is revealed after the point sources are subtracted out from FIELD I. We find C_l \propto l^{-2.34} for 253 <= l <= 800, which is characteristic of the Galactic synchrotron radiation measured at higher frequencies and larger angular scales. We estimate the fluctuations in the Galactic synchrotron emission to be sqrt{l(l+1)C_l/2 pi} ~ 10 K at l=800 (theta > 10'). The measured C_l is dominated by the residual point sources and artifacts at smaller angular scales, where C_l ~ 10^3 mK^2 for l > 800.

preprint2011arXiv

Improved foreground removal in GMRT 610 MHz observations towards redshifted 21-cm tomography

Foreground removal is a challenge for 21-cm tomography of the high redshift Universe. We use archival GMRT data (obtained for completely different astronomical goals) to estimate the foregrounds at a redshift ~ 1. The statistic we use is the cross power spectrum between two frequencies separated by Δν at the angular multipole l, or equivalently the multi-frequency angular power spectrum C_l(Δν). An earlier measurement of C_l(Δν) using this data had revealed the presence of oscillatory patterns along Δν, which turned out to be a severe impediment for foreground removal (Ghosh et al. 2011). Using the same data, in this paper we show that it is possible to considerably reduce these oscillations by suppressing the sidelobe response of the primary antenna elements. The suppression works best at the angular multipoles l for which there is a dense sampling of the u-v plane. For three angular multipoles, l = 1405, 1602 and 1876, this sidelobe suppression together with low-order polynomial fitting yields residuals of ≤ 0.02 mK^2, consistent with the noise at the 3σ level. Since the polynomial fitting is done after estimation of the power spectrum, it can be ensured that the estimation of the HI signal is not biased. The corresponding 99% upper limit on the HI signal is x_HI b ≤ 2.9, where x_HI is the mean neutral fraction and b is the bias.

preprint2010arXiv

Consumer Expenditure Distribution in India, 1983-2007: Evidence of a Long Pareto Tail

This work presents an empirical study of the evolution of the consumer expenditure distribution in India during 1983-2007. We have used the National Sample Survey Organization data and analysed the expenditure distribution for the urban and rural sectors. It is found that this distribution is a mixture of two distributions: it follows a lognormal distribution in the lower tail and a Pareto distribution at the higher end. The Pareto tail comprises a remarkable 30-40% of the population at the upper end, and the lower end is suitably modeled by the lognormal distribution. The goodness-of-fit tests endorse the proposed distribution. Moreover, the Pareto tail is widening over time for the rural sector. The Gini coefficient, a prominent measure of inequality, is found to be stable for the expenditure distribution over the entire time span.
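For readers unfamiliar with the Gini coefficient mentioned in this abstract, a minimal sketch of the standard sample formula for individual (ungrouped) values follows. It is not taken from the paper, which works with survey data; the function name `gini` and the toy inputs are assumptions for illustration only.

```python
import numpy as np

def gini(values):
    """Sample Gini coefficient: 0 for perfect equality, (n - 1) / n when one unit holds everything."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    ranks = np.arange(1, n + 1)
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

equal_spending = gini([1, 1, 1, 1])      # everyone spends the same amount
concentrated = gini([0, 0, 0, 1])        # one unit accounts for all spending
print(equal_spending, concentrated)
```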

preprint2010arXiv

GMRT observation towards detecting the Post-reionization 21-cm signal

We have analyzed 610 MHz GMRT observations towards detecting the redshifted 21-cm signal from z=1.32. The multi-frequency angular power spectrum C_l(Delta nu) is used to characterize the statistical properties of the background radiation across angular scales ~20" to 10', and a frequency bandwidth of 7.5 MHz with resolution 125 kHz. The measured C_l(Delta nu), which ranges from 7 mK^2 to 18 mK^2, is dominated by foregrounds; the expected HI signal C_l^HI(Delta nu) ~ 10^{-6} - 10^{-7} mK^2 is several orders of magnitude smaller. The foregrounds, believed to originate from continuum sources, are expected to vary smoothly with Delta nu, whereas the HI signal decorrelates within ~0.5 MHz, and this holds the promise of separating the two. For each l, we use the interval 0.5 < Delta nu < 7.5 MHz to fit a fourth order polynomial, which is subtracted from the measured C_l(Delta nu) to remove any smoothly varying component across the entire bandwidth Delta nu < 7.5 MHz. The residual C_l(Delta nu), we find, has an oscillatory pattern with amplitude and period respectively ~0.1 mK^2 and Delta nu = 3 MHz at the smallest l value of 1476, with the amplitude and period decreasing with increasing l. Applying a suitably chosen high pass filter, we are able to remove the residual oscillatory pattern for l=1476, where the residual C_l(Delta nu) is now consistent with zero at the 3-sigma noise level. We conclude that we have successfully removed the foregrounds at l=1476 and the residuals are consistent with noise. We use this to place an upper limit on the HI signal, whose amplitude is determined by x_HI b, where x_HI and b are the HI neutral fraction and the HI bias respectively. A value of x_HI b greater than 7.95 would have been detected in our observation, and is therefore ruled out at the 3-sigma level. (abridged)
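The polynomial-subtraction step described in this abstract can be sketched as follows. This is only an illustration on synthetic numbers, not the authors' pipeline: the function name `subtract_smooth_component`, the made-up smooth foreground and the 0.1 mK^2 ripple with a 3 MHz period are assumptions chosen to mimic the quantities quoted above.

```python
import numpy as np

def subtract_smooth_component(delta_nu, c_l, fit_min=0.5, fit_max=7.5, order=4):
    """Fit a polynomial in delta_nu over (fit_min, fit_max) and subtract it at every delta_nu."""
    delta_nu = np.asarray(delta_nu, dtype=float)
    c_l = np.asarray(c_l, dtype=float)
    # Only separations inside the open interval enter the fit.
    mask = (delta_nu > fit_min) & (delta_nu < fit_max)
    coeffs = np.polyfit(delta_nu[mask], c_l[mask], order)
    # The fitted smooth component is removed across the whole bandwidth.
    return c_l - np.polyval(coeffs, delta_nu)

# Synthetic example (all values hypothetical): 125 kHz channels below 7.5 MHz.
delta_nu = np.arange(0.0, 7.5, 0.125)                  # MHz
smooth = 12.0 - 0.8 * delta_nu + 0.05 * delta_nu**2    # smooth "foreground", mK^2
ripple = 0.1 * np.cos(2 * np.pi * delta_nu / 3.0)      # oscillatory residual, mK^2
residual = subtract_smooth_component(delta_nu, smooth + ripple)
print(residual[:5])
```

A purely smooth input of polynomial order four or lower is removed to numerical precision, whereas an oscillatory component is only partly absorbed by the fit, which is why a further filtering step is described for the ripple.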

36 published item(s)

Robust Inference for Non-Linear Regression Models with Applications in Enzyme Kinetics

Robust and Efficient Estimation in Ordinal Response Models using the Density Power Divergence

Robust and Efficient Parameter Estimation for Discretely Observed Stochastic Processes

RaJIVE: Robust Angle Based JIVE for Integrating Noisy Multi-Source Data

Strata-based Quantification of Distributional Uncertainty in Socio-Economic Indicators: A Comparative Study of Indian States

A robust variable screening procedure for ultra-high dimensional data

All sky angular power spectrum: I. Estimating brightness temperature fluctuations using TGSS 150 MHz survey

Consistent Fixed-Effects Selection in Ultra-high dimensional Linear Mixed Models with Error-Covariate Endogeneity

Demonstrating the Tapered Gridded Estimator (TGE) for the Cosmological HI 21-cm Power Spectrum using 150 MHz GMRT observations

Foreground modelling via Gaussian process regression: an application to HERA data

General Robust Bayes Pseudo-Posterior: Exponential Convergence results with Applications

On regularization methods based on Rényi's pseudodistances for sparse high-dimensional linear regression models

Robust Generalised Quadratic Discriminant Analysis

Robust Wald-type test in GLM with random design based on minimum density power divergence estimators

Tapering the sky response for angular power spectrum estimation from low-frequency radio-interferometric data

A Bayesian analysis of redshifted 21-cm HI signal and foregrounds: Simulations for LOFAR

Influence Analysis of Robust Wald-type Tests

Predictions for the 21cm-galaxy cross-power spectrum observable with LOFAR and Subaru

Testing Composite Null Hypothesis Based on S-Divergences

Asymptotic Properties of Minimum S-Divergence Estimator for Discrete Models

Constraining the epoch of reionization with the variance statistic: simulations of the LOFAR case

Estimation of Multivariate Location and Covariance using the S-Hellinger Distance

Influence Function Analysis of the Restricted Minimum Divergence Estimators : A General Form

On the Robustness of a Divergence based Test of Simple Statistical Hypotheses

Robust Estimation in Generalised Linear Models : The Density Power Divergence Approach

Robust Estimation of Bivariate Tail Dependence Coefficient

The Logarithmic Super Divergence and its use in Statistical Inference

The Logarithmic Super Divergence and Statistical Inference : Asymptotic Properties

The Minimum S-Divergence Estimator under Continuous Models: The Basu-Lindsay Approach

Visibility based angular power spectrum estimation in low frequency radio interferometric observations

A Model Explaining Correlation Between Observed Values in Contingency Tables

Estimating Copula and Test of Independence based on a generalized framework of all rank-based Statistics in Bivariate Sample

Characterizing Foreground for redshifted 21-cm radiation: 150 MHz GMRT observations

Improved foreground removal in GMRT 610 MHz observations towards redshifted 21-cm tomography

Consumer Expenditure Distribution in India, 1983-2007: Evidence of a Long Pareto Tail

GMRT observation towards detecting the Post-reionization 21-cm signal