Source author record

Dipak K. Dey

Dipak K. Dey appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Applications Methodology Computation math.ST Statistics Theory

Catalog footprint

What is connected

7works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Fast Bayesian inference of Block Nearest Neighbor Gaussian process for large data

This paper presents the development of a spatial block-Nearest Neighbor Gaussian process (block-NNGP) for location-referenced large spatial data. The key idea behind this approach is to divide the spatial domain into several blocks which are dependent under some constraints. The cross-blocks capture the large-scale spatial dependence, while each block captures the small-scale spatial dependence. The resulting block-NNGP enjoys Markov properties reflected on its sparse precision matrix. It is embedded as a prior within the class of latent Gaussian models, thus Bayesian inference is obtained using the integrated nested Laplace approximation (INLA). The performance of the block-NNGP is illustrated on simulated examples and massive real data for locations in the order of $10^4$.

preprint2020arXiv

Flexible Modeling of Hurdle Conway-Maxwell-Poisson Distributions with Application to Mining Injuries

While the hurdle Poisson regression is a popular class of models for count data with excessive zeros, the link function in the binary component may be unsuitable for highly imbalanced cases. Ordinary Poisson regression is unable to handle the presence of dispersion. In this paper, we introduce Conway-Maxwell-Poisson (CMP) distribution and integrate use of flexible skewed Weibull link functions as better alternative. We take a fully Bayesian approach to draw inference from the underlying models to better explain skewness and quantify dispersion, with Deviance Information Criteria (DIC) used for model selection. For empirical investigation, we analyze mining injury data for period 2013-2016 from the U.S. Mine Safety and Health Administration (MSHA). The risk factors describing proportions of employee hours spent in each type of mining work are compositional data; the probabilistic principal components analysis (PPCA) is deployed to deal with such covariates. The hurdle CMP regression is additionally adjusted for exposure, measured by the total employee working hours, to make inference on rate of mining injuries; we tested its competitiveness against other models. This can be used as predictive model in the mining workplace to identify features that increase the risk of injuries so that prevention can be implemented.

preprint2020arXiv

Heckman selection-t model: parameter estimation via the EM-algorithm

Heckman selection model is perhaps the most popular econometric model in the analysis of data with sample selection. The analyses of this model are based on the normality assumption for the error terms, however, in some applications, the distribution of the error term departs significantly from normality, for instance, in the presence of heavy tails and/or atypical observation. In this paper, we explore the Heckman selection-t model where the random errors follow a bivariate Student's-t distribution. We develop an analytically tractable and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step, that rely on formulas for the mean and variance of the truncated Student's-t distributions. Simulations studies show the vulnerability of the Heckman selection-normal model, as well as the robustness aspects of the Heckman selection-t model. Two real examples are analyzed, illustrating the usefulness of the proposed methods. The proposed algorithms and methods are implemented in the new R package HeckmanEM.

preprint2020arXiv

Power laws distributions in objective priors

The use of objective prior in Bayesian applications has become a common practice to analyze data without subjective information. Formal rules usually obtain these priors distributions, and the data provide the dominant information in the posterior distribution. However, these priors are typically improper and may lead to improper posterior. Here, we show, for a general family of distributions, that the obtained objective priors for the parameters either follow a power-law distribution or has an asymptotic power-law behavior. As a result, we observed that the exponents of the model are between 0.5 and 1. Understand these behaviors allow us to easily verify if such priors lead to proper or improper posteriors directly from the exponent of the power-law. The general family considered in our study includes essential models such as Exponential, Gamma, Weibull, Nakagami-m, Haf-Normal, Rayleigh, Erlang, and Maxwell Boltzmann distributions, to list a few. In summary, we show that comprehending the mechanisms describing the shapes of the priors provides essential information that can be used in situations where additional complexity is presented.

preprint2020arXiv

Skewed link regression models for imbalanced binary response with applications to life insurance

For a portfolio of life insurance policies observed for a stated period of time, e.g., one year, mortality is typically a rare event. When we examine the outcome of dying or not from such portfolios, we have an imbalanced binary response. The popular logistic and probit regression models can be inappropriate for imbalanced binary response as model estimates may be biased, and if not addressed properly, it can lead to serious adverse predictions. In this paper, we propose the use of skewed link regression models (Generalized Extreme Value, Weibull, and Frechet link models) as more superior models to handle imbalanced binary response. We adopt a fully Bayesian approach for the generalized linear models (GLMs) under the proposed link functions to help better explain the high skewness. To calibrate our proposed Bayesian models, we use a real dataset of death claims experience drawn from a life insurance company's portfolio. Bayesian estimates of parameters were obtained using the Metropolis-Hastings algorithm and for Bayesian model selection and comparison, the Deviance Information Criterion (DIC) statistic has been used. For our mortality dataset, we find that these skewed link models are more superior than the widely used binary models with standard link functions. We evaluate the predictive power of the different underlying models by measuring and comparing aggregated death counts and death benefits.

preprint2014arXiv

A new class of flexible link functions with application to species co-occurrence in cape floristic region

Understanding the mechanisms that allow biological species to co-occur is of great interest to ecologists. Here we investigate the factors that influence co-occurrence of members of the genus Protea in the Cape Floristic Region of southwestern Africa, a global hot spot of biodiversity. Due to the binomial nature of our response, a critical issue is to choose appropriate link functions for the regression model. In this paper we propose a new family of flexible link functions for modeling binomial response data. By introducing a power parameter into the cumulative distribution function (c.d.f.) corresponding to a symmetric link function and its mirror reflection, greater flexibility in skewness can be achieved in both positive and negative directions. Through simulated data sets and analysis of the Protea co-occurrence data, we show that the proposed link function is quite flexible and performs better against link misspecification than standard link functions.

preprint2011arXiv

Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption

In the information system research, a question of particular interest is to interpret and to predict the probability of a firm to adopt a new technology such that market promotions are targeted to only those firms that were more likely to adopt the technology. Typically, there exists significant difference between the observed number of ``adopters'' and ``nonadopters,'' which is usually coded as binary response. A critical issue involved in modeling such binary response data is the appropriate choice of link functions in a regression model. In this paper we introduce a new flexible skewed link function for modeling binary response data based on the generalized extreme value (GEV) distribution. We show how the proposed GEV links provide more flexible and improved skewed link regression models than the existing skewed links, especially when dealing with imbalance between the observed number of 0's and 1's in a data. The flexibility of the proposed model is illustrated through simulated data sets and a billing data set of the electronic payments system adoption from a Fortune 100 company in 2005.

Dipak K. Dey

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Fast Bayesian inference of Block Nearest Neighbor Gaussian process for large data

Flexible Modeling of Hurdle Conway-Maxwell-Poisson Distributions with Application to Mining Injuries

Heckman selection-t model: parameter estimation via the EM-algorithm

Power laws distributions in objective priors

Skewed link regression models for imbalanced binary response with applications to life insurance

A new class of flexible link functions with application to species co-occurrence in cape floristic region

Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption