Source author record

Subrata Chakraborty

Subrata Chakraborty appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: math.ST, Statistics Theory, Computer Vision, eess.IV, Methodology, Machine Learning, Populations and Evolution

Catalog footprint

What is connected

24 works

7 topics

4 close collaborators

preprint 2023 arXiv

Bivariate binomial conditionals distributions with positive and negative correlations: A statistical study

In this article, we discuss a bivariate distribution that exhibits negative correlation, whose conditionals are univariate binomial distributions and whose marginals are not binomial. Some useful structural properties of this distribution, namely marginals, moments, generating functions and stochastic ordering, are investigated. Simple proofs of negative correlation, marginal over-dispersion, distribution of sum and conditional given the sum are also derived. The distribution is shown to be a member of the multi-parameter exponential family and some natural but useful consequences are also outlined. The proposed distribution tends to a recently investigated conditional Poisson distribution studied by Ghosh et al. (2020). Finally, the distribution is fitted to two bivariate count data sets with an inherent negative correlation to illustrate its suitability.

preprint 2022 arXiv

2-speed network ensemble for efficient classification of incremental land-use/land-cover satellite image chips

The ever-growing volume of satellite imagery data presents a challenge for industry and governments making data-driven decisions based on the timely analysis of very large data sets. Commonly used deep learning algorithms for automatic classification of satellite images are time and resource-intensive to train. The cost of retraining in the context of Big Data presents a practical challenge when new image data and/or classes are added to a training corpus. Recognizing the need for an adaptable, accurate, and scalable satellite image chip classification scheme, in this research we present an ensemble of: i) a slow to train but high accuracy vision transformer; and ii) a fast to train, low-parameter convolutional neural network. The vision transformer model provides a scalable and accurate foundation model. The high-speed CNN provides an efficient means of incorporating newly labelled data into analysis, at the expense of lower accuracy. To simulate incremental data, the very large (~400,000 images) So2Sat LCZ42 satellite image chip dataset is divided into four intervals, with the high-speed CNN retrained every interval and the vision transformer trained every half interval. This experimental setup mimics an increase in data volume and diversity over time. For the task of automated land-cover/land-use classification, the ensemble models for each data increment outperform each of the component models, with best accuracy of 65% against a holdout test partition of the So2Sat dataset. The proposed ensemble and staggered training schedule provide a scalable and cost-effective satellite image classification scheme that is optimized to process very large volumes of satellite data.

preprint 2022 arXiv

Debiasing pipeline improves deep learning model generalization for X-ray based lung nodule detection

Lung cancer is the leading cause of cancer death worldwide and a good prognosis depends on early diagnosis. Unfortunately, screening programs for the early diagnosis of lung cancer are uncommon. This is in part due to the at-risk groups being located in rural areas far from medical facilities. Reaching these populations would require a scaled approach that combines mobility, low cost, speed, accuracy, and privacy. We can resolve these issues by combining the chest X-ray imaging mode with a federated deep-learning approach, provided that the federated model is trained on homogeneous data to ensure that no single data source can adversely bias the model at any point in time. In this study we show that an image pre-processing pipeline that homogenizes and debiases chest X-ray images can improve both internal classification and external generalization, paving the way for a low-cost and accessible deep learning-based clinical system for lung cancer screening. An evolutionary pruning mechanism is used to train a nodule detection deep learning model on the most informative images from a publicly available lung nodule X-ray dataset. Histogram equalization is used to remove systematic differences in image brightness and contrast. Model training is performed using all combinations of lung field segmentation, close cropping, and rib suppression operators. We show that this pre-processing pipeline results in deep learning models that successfully generalize to an independent lung nodule dataset, using ablation studies to assess the contribution of each operator in this pipeline. In stripping chest X-ray images of known confounding variables by lung field segmentation, along with suppression of signal noise from the bone structure, we can train a highly accurate deep learning lung nodule detection algorithm with outstanding generalization accuracy of 89% to nodule samples in unseen data.
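The histogram equalization step mentioned in this abstract can be illustrated with a minimal sketch. This is the textbook cumulative-histogram mapping on a flat list of 8-bit gray levels, not the authors' pipeline; the function name `equalize_histogram` and the sample pixel values are assumptions made for illustration.

```python
def equalize_histogram(pixels, levels=256):
    """Spread gray levels using the normalized cumulative histogram.

    `pixels` is a flat list of integers in [0, levels - 1].
    Hypothetical helper for illustration only.
    """
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative histogram: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return list(pixels)

    # Map the darkest occupied level to 0 and the brightest to levels - 1.
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# A dark, low-contrast strip of pixels is stretched over the full range.
dim = [100, 100, 101, 102, 103, 103, 104, 105]
stretched = equalize_histogram(dim)
print(stretched)
```

Because the mapping depends only on the rank of each gray level, two images of the same scene with different exposure settings end up with similar intensity distributions, which is the kind of systematic difference the abstract says the step removes.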

preprint 2021 arXiv

A new method for constructing continuous distributions on the unit interval

A novel approach towards construction of absolutely continuous distributions over the unit interval is proposed. Considering two absolutely continuous random variables with positive support, this method conditions on their convolution to generate a new random variable in the unit interval. This approach is demonstrated using some popular choices of the positive random variables such as the exponential, Lindley and gamma. Some existing distributions like the uniform and the beta are formulated with this method. Several new structures of density functions having potential for future application in real-life problems are also provided. One of the new one-parameter distributions is considered for parameter estimation and a real-life modelling application, and is shown to provide a better fit than the popular one-parameter Topp-Leone model.
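One way to read "conditions on their convolution" is as the density of X given X + Y = 1 for independent positive X and Y; that reading is an assumption here, not a statement of the paper's exact construction. Under it, the sketch below checks numerically that two exponentials yield the uniform density and two gammas with a common rate yield a beta density, matching the special cases the abstract names. All function names are hypothetical.

```python
import math


def gamma_pdf(x, shape, rate=1.0):
    """Gamma density with the given shape and rate (shape 1 is exponential)."""
    if x <= 0:
        return 0.0
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)


def conditional_density(x, pdf_x, pdf_y, steps=20000):
    """Density of X at x given X + Y = 1, for independent positive X and Y.

    The normalizing constant is the density of X + Y at 1, computed
    with a midpoint rule on (0, 1).
    """
    h = 1.0 / steps
    norm = h * sum(
        pdf_x((i + 0.5) * h) * pdf_y(1.0 - (i + 0.5) * h) for i in range(steps)
    )
    return pdf_x(x) * pdf_y(1.0 - x) / norm


# Exponential inputs: the conditional density is flat (uniform on (0, 1)).
expo = lambda t: gamma_pdf(t, 1.0)
print(conditional_density(0.3, expo, expo))

# Gamma(2) and Gamma(3) inputs: the conditional density matches Beta(2, 3).
g2 = lambda t: gamma_pdf(t, 2.0)
g3 = lambda t: gamma_pdf(t, 3.0)
print(conditional_density(0.3, g2, g3), beta_pdf(0.3, 2.0, 3.0))
```

Swapping in other positive densities for `pdf_x` and `pdf_y` gives further densities on the unit interval, which is the spirit of the construction as described.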

preprint 2021 arXiv

MAVIDH Score: A COVID-19 Severity Scoring using Chest X-Ray Pathology Features

The application of computer vision for COVID-19 diagnosis is complex and challenging, given the risks associated with patient misclassifications. Arguably, the primary value of medical imaging for COVID-19 lies rather in patient prognosis. Radiological images can guide physicians in assessing the severity of the disease, and a series of images from the same patient at different stages can help to gauge disease progression. Hence, a simple method based on lung-pathology interpretable features for scoring disease severity from Chest X-rays is proposed here. As the primary contribution, this method correlates well to patient severity in different stages of disease progression with competitive results compared to other existing, more complex methods. An original data selection approach is also proposed, allowing the simple model to learn the severity-related features. It is hypothesized that the resulting competitive performance presented here is related to the method being feature-based rather than reliant on lung involvement or opacity as others in the literature. A second contribution comes from the validation of the results, conceptualized as the scoring of patient groups from different stages of the disease. Besides performing such validation on an independent data set, the results were also compared with other proposed scoring methods in the literature. The results show that there is a significant correlation between the scoring system (MAVIDH) and patient outcome, which could potentially help physicians rate and follow disease progression in COVID-19 patients.

preprint 2021 arXiv

Potential Features of ICU Admission in X-ray Images of COVID-19 Patients

X-ray images may present non-trivial features with predictive information of patients that develop severe symptoms of COVID-19. If true, this hypothesis may have practical value in allocating resources to particular patients while using a relatively inexpensive imaging technique. The difficulty of testing such a hypothesis comes from the need for large sets of labelled data, which need to be well-annotated and should contemplate the post-imaging severity outcome. This paper presents an original methodology for extracting semantic features that correlate to severity from a data set with patient ICU admission labels through interpretable models. The methodology employs a neural network trained to recognise lung pathologies to extract the semantic features, which are then analysed with low-complexity models to limit overfitting while increasing interpretability. This analysis points out that only a few features explain most of the variance between patients that developed severe symptoms. When applied to an unrelated larger data set with pathology-related clinical notes, the method has shown to be capable of selecting images for the learned features, which could translate some information about their common locations in the lung. Besides attesting separability on patients that eventually develop severe symptoms, the proposed methods represent a statistical approach highlighting the importance of features related to ICU admission that may have been only qualitatively reported. While handling limited data sets, notable methodological aspects are adopted, such as presenting a state-of-the-art lung segmentation network and the use of low-complexity models to avoid overfitting. The code for methodology and experiments is also available.

preprint 2020 arXiv

A Practical Blockchain Framework using Image Hashing for Image Authentication

Blockchain is a relatively new technology that can be seen as a decentralised database. Blockchain systems heavily rely on cryptographic hash functions to store their data, which makes it difficult to tamper with any data stored in the system. A topic that was researched along with blockchain is image authentication. Image authentication focuses on investigating and maintaining the integrity of images. As a blockchain system can be useful for maintaining data integrity, image authentication has the potential to be enhanced by blockchain. There are many techniques that can be used to authenticate images; the technique investigated by this work is image hashing. Image hashing is a technique used to calculate how similar two different images are. This is done by converting the images into hashes and then comparing them using a distance formula. To investigate the topic, an experiment involving a simulated blockchain was created. The blockchain acted as a database for images. This blockchain was made up of devices which contained their own unique image hashing algorithms. The blockchain was tested by creating modified copies of the images contained in the database, and then submitting them to the blockchain to see if it will return the original image. Through this experiment it was discovered that it is plausible to create an image authentication system using blockchain and image hashing. However, the design proposed by this work requires refinement, as it appears to struggle in some situations. This work shows that blockchain can be a suitable approach for authenticating images, particularly via image hashing. Other observations include that using multiple image hash algorithms at the same time can increase performance in some cases, as well as that each type of test done to the blockchain has its own unique pattern to its data.
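The hash-then-compare idea described in this abstract can be sketched as follows. The abstract does not name the hashing algorithms or the distance formula used, so this example swaps in a simple average hash and the Hamming distance as stand-ins; the function names and the tiny 4x4 grayscale grids are assumptions for illustration.

```python
def average_hash(gray):
    """Average hash: one bit per pixel, 1 where the pixel is at least the mean.

    `gray` is a small 2D list of gray levels (a real system would first
    downscale the image to a fixed grid).
    """
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p >= mean else 0 for p in flat]


def hamming_distance(hash_a, hash_b):
    """Number of bit positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# A uniformly brightened copy keeps the same light/dark layout.
brightened = [[p + 10 for p in row] for row in original]
# A different image: dark top half, bright bottom half.
different = [[10] * 4, [20] * 4, [200] * 4, [210] * 4]

h = average_hash(original)
print(hamming_distance(h, average_hash(brightened)))
print(hamming_distance(h, average_hash(different)))
```

A small distance suggests a modified copy of a stored image, while a large distance suggests an unrelated image; the threshold separating the two is a design choice the abstract does not specify.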

preprint 2020 arXiv

Beta Poisson-G Family of Distributions: Its Properties and Application with Failure Time Data

A new generalization of the Poisson-G family, called the beta Poisson-G family of distributions, is proposed. Useful expansions of the probability density function and the cumulative distribution function of the proposed family are derived and seen as infinite mixtures of the Poisson-G distribution. Moment generating function, power moments, entropy, quantile function, skewness and kurtosis are investigated. Numerical values of moments, skewness, kurtosis and entropy are tabulated for select parameter values. Furthermore, estimation by the method of maximum likelihood is discussed. A simulation study is carried out under varying sample sizes to assess the performance of this model. Finally, the suitability of the proposed model in comparison to recently introduced models is checked by modeling two real-life data sets.

preprint 2020 arXiv

Changing Clusters of Indian States with respect to number of Cases of COVID-19 using incrementalKMN Method

The novel Coronavirus (COVID-19) incidence in India is currently experiencing an exponential rise, but with apparent spatial variation in growth rate and doubling time rate. We classify the states into five clusters ranging from low- to high-risk categories and study how the different states moved from one cluster to another from the onset of the first case on $30^{th}$ January 2020 until the end of Unlock 1, that is, $30^{th}$ June 2020. We have implemented a new clustering technique called the incrementalKMN (Prasad, R. K., Sarmah, R., Chakraborty, S.(2019))

preprint 2020 arXiv

On a family that unifies Generalized Marshall-Olkin and Poisson-G family of distribution

By unifying the generalized Marshall-Olkin (GMO) and Poisson-G (P-G) families, a new family of distributions is proposed. The density and the survival function are expressed as infinite mixtures of the P-G family. The quantile function, asymptotes, shapes, stochastic ordering, moment generating function, order statistics, probability weighted moments and Rényi entropy are derived. Maximum likelihood estimation with large sample properties is presented. A Monte Carlo simulation is used to examine the pattern of the bias and the mean square error of the maximum likelihood estimators. A comparison with some of the important sub-models of the family in modeling a real data set illustrates the utility of the proposed family.

preprint 2020 arXiv

The Poisson Transmuted-G Family of Distributions: Its Properties and Applications

This paper introduces a new family of continuous distributions, namely the Poisson transmuted-G family of distributions, obtained by inducing two additional parameters on the baseline G distribution. Some of its mathematical properties, including explicit expressions for the moment generating function, order statistics, probability weighted moments, stress-strength reliability, residual life, reversed residual life, Rényi entropy and mean deviation, are derived. Some special models of the new family are listed. Estimation of the model parameters by the maximum likelihood method is discussed. The advantage of the proposed family in data fitting is illustrated by means of two applications to failure time data sets.

preprint 2016 arXiv

A modified Conway-Maxwell-Poisson type binomial distribution and its applications

This paper proposes a generalized binomial distribution with four parameters, which is derived from the finite capacity queueing system with state-dependent service and arrival rates. This distribution is also generated from the conditional Conway-Maxwell-Poisson distribution given a sum of two Conway-Maxwell-Poisson variables. In this paper, we consider the properties about the probability mass function, index of dispersion, skewness and kurtosis and give applications of the proposed distribution from its geneses. The estimation method and simulation study are also considered.

preprint 2016 arXiv

An Alternative Discrete Skew Logistic Distribution

In this paper, an alternative discrete skew logistic distribution is proposed, which is derived by using the general approach of discretizing a continuous distribution while retaining its survival function. The properties of the distribution are explored and it is compared to a discrete distribution defined on integers recently proposed in the literature. The estimation of its parameters is discussed, with particular focus on the maximum likelihood method and the method of proportion, which is particularly suitable for such a discrete model. A Monte Carlo simulation study is carried out to assess the statistical properties of these inferential techniques. An application of the proposed model to a real-life data set is given as well.

preprint 2016 arXiv

Analysis of Count Data by Transmuted Geometric Distribution

Transmuted geometric distribution (TGD) was recently introduced and investigated by Chakraborty and Bhati (2016). This is a flexible extension of the geometric distribution having an additional parameter that determines its zero inflation as well as the tail length. In the present article we further study this distribution for some of its reliability, stochastic ordering and parameter estimation properties. In parameter estimation, among others, we discuss an EM algorithm, and the performance of the estimators is evaluated through extensive simulation. For assessing the statistical significance of the additional parameter, the likelihood ratio test, Rao's score test and Wald's test are developed and their empirical powers are compared via simulation. We demonstrate two applications of TGD in modeling real-life count data.

preprint 2016 arXiv

Beta generated Kumaraswamy-G and other new families of distributions

A new generalization of the family of Kumaraswamy-G (Cordeiro and de Castro, 2011) distribution that includes three recently proposed families, namely the Garhy generated family (Elgarhy et al., 2016), Beta-Dagum and Beta-Singh-Maddala distribution (Domma and Condino, 2016), is proposed by constructing the beta generated Kumaraswamy-G distribution. Useful expansions of the pdf and the cdf of the proposed family are derived and seen as infinite mixtures of the Kumaraswamy-G distribution. Order statistics, probability weighted moments, moment generating function, Rényi entropies, quantile power series, random sample generation, asymptotes and shapes are also investigated. Two methods of parameter estimation are presented. The suitability of the proposed model in comparison to its sub-models is assessed using two real-life data sets. Finally, some new classes of beta generated families are proposed for future investigations.

preprint 2016 arXiv

The Beta Generalized Marshall-Olkin-G Family of Distributions

In this paper we propose a new family of distributions considering the Generalized Marshall-Olkin distribution as the baseline distribution in the Beta-G family of construction. The new family includes Beta-G (Eugene et al. 2002 and Jones, 2004) and (Jayakumar and Mathew, 2008) families as particular cases. The probability density function (pdf) and the cumulative distribution function (cdf) are expressed as mixtures of the Marshall-Olkin (Marshall and Olkin, 1997) distribution. Series expansions of the pdf of the order statistics are also obtained. Moments, moment generating function, Rényi entropies, quantile power series, random sample generation and asymptotes are also investigated. Parameter estimation by the method of maximum likelihood and the method of moments is also presented. Finally, the proposed model is compared to the Generalized Marshall-Olkin Kumaraswamy extended family (Handique and Chakraborty, 2015) by considering three data fitting examples with real-life data sets.

preprint 2016 arXiv

The Generalized Marshall-Olkin-Kumaraswamy-G family of distributions

A new family of distributions is proposed by using the Kumaraswamy-G (Cordeiro and de Castro, 2011) distribution as the baseline distribution in the Generalized Marshall-Olkin (Jayakumar and Mathew, 2008) construction. A number of special cases are presented. By expanding the probability density function and the survival function as infinite series, the proposed family is seen as infinite mixtures of the Kumaraswamy-G (Cordeiro and de Castro, 2011) distribution. The density function and its series expansions for order statistics are also obtained. Order statistics, moments, moment generating function, Rényi entropy, quantile function, random sample generation, asymptotes, shapes and stochastic orderings are also investigated. Parameter estimation by the method of maximum likelihood and the method of moments is presented. Large sample standard errors and confidence intervals for the MLEs are also discussed. One real-life application of comparative data fitting with some of the important sub-models of the family and some other models is considered.

preprint 2016 arXiv

The Kumaraswamy Generalized Marshall-Olkin-G family of distributions

Another new family of continuous probability distributions is proposed by using the Generalized Marshall-Olkin distribution as the baseline distribution in the Kumaraswamy-G distribution. This family includes the (Cordeiro and de Castro, 2011) and (Jayakumar and Mathew, 2008) families as special cases, besides a number of other distributions. The probability density function (pdf) and the survival function (sf) are expressed as series so that they can be seen as mixtures of the Generalized Marshall-Olkin distribution. Series expansions of the pdf of order statistics are also obtained. Moments, moment generating function, Rényi entropies, quantile function, random sample generation and asymptotes are also investigated. Parameter estimation by the method of maximum likelihood and the method of moments is also presented. Finally, the proposed model is compared to the Generalized Marshall-Olkin Kumaraswamy extended family (Handique and Chakraborty, 2015) by considering four examples of real-life data modeling.

preprint 2016 arXiv

The Marshall-Olkin-Kumaraswamy-G family of distributions

A new family of continuous distributions is proposed by using the Kumaraswamy-G (Cordeiro and de Castro, 2011) distribution as the baseline distribution in the Marshall-Olkin (Marshall and Olkin, 1997) construction. A number of known distributions are derived as particular cases. Various properties of the proposed family, like formulation of the pdf as different mixtures of exponentiated baseline distributions, order statistics, moments, moment generating function, Rényi entropy, quantile function and random sample generation, have been investigated. Asymptotes, shapes and stochastic ordering are also investigated. Parameter estimation by the method of maximum likelihood, with large sample standard errors and confidence intervals, and by the method of moments is also presented. Two members of the proposed family are compared with corresponding members of the Kumaraswamy-Marshall-Olkin-G family (Alizadeh et al., 2015) by fitting two real-life data sets.
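The construction named in this abstract can be written out from the two standard building blocks it cites. The notation below is an assumption (the abstract gives no symbols): $G$ is the baseline cdf, $a, b > 0$ are the Kumaraswamy-G parameters, $\alpha > 0$ is the Marshall-Olkin tilt parameter and $\bar{F} = 1 - F$ denotes a survival function.

```latex
% Kumaraswamy-G family (Cordeiro and de Castro, 2011), baseline cdf G:
F_{\mathrm{Kw}}(x) = 1 - \left[ 1 - G(x)^{a} \right]^{b},
\qquad
\bar{F}_{\mathrm{Kw}}(x) = \left[ 1 - G(x)^{a} \right]^{b}.

% Marshall-Olkin construction (Marshall and Olkin, 1997) applied to a
% survival function \bar{H}:
\bar{F}_{\mathrm{MO}}(x)
  = \frac{\alpha \, \bar{H}(x)}{1 - (1 - \alpha)\, \bar{H}(x)} .

% Taking \bar{H} = \bar{F}_{\mathrm{Kw}} gives the survival function of
% the resulting Marshall-Olkin-Kumaraswamy-G type family:
\bar{F}(x)
  = \frac{\alpha \left[ 1 - G(x)^{a} \right]^{b}}
         {1 - (1 - \alpha) \left[ 1 - G(x)^{a} \right]^{b}} .
```

Setting $\alpha = 1$ recovers the Kumaraswamy-G family, and setting $a = b = 1$ recovers the Marshall-Olkin extension of the baseline $G$, which is consistent with the abstract's remark that known distributions arise as particular cases.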

preprint 2015 arXiv

A Discrete Power Distribution

A new discrete distribution has been proposed as a discrete analogue of the two sided power distribution [Van Dorp, J. R. and Kotz, S. (2002a). A novel extension of the triangular distribution and its parameter estimation, Journal of the Royal Statistical Society, Series D (The Statistician), 51, 1: 63-79]. The probability mass function and hazard rate function of this distribution can assume a variety of shapes including bath tub, rectangular, trapezoidal, triangular, J, inverse J, U, inverse U, strictly decreasing and strictly increasing shapes. Its moment and reliability properties along with parameter estimation have been investigated.

preprint 2015 arXiv

Extended Conway-Maxwell-Poisson distribution and its properties and applications

A new three parameter natural extension of the Conway-Maxwell-Poisson (COM-Poisson) distribution is proposed. This distribution includes the recently proposed COM-Poisson type negative binomial (COM-NB) distribution [Chakraborty, S. and Ong, S. H. (2014): A COM-type Generalization of the Negative Binomial Distribution, Accepted in Communications in Statistics-Theory and Methods] and the generalized COM-Poisson (GCOMP) distribution [Imoto, T. (2014): A generalized Conway-Maxwell-Poisson distribution which includes the negative binomial distribution, Applied Mathematics and Computation, 247, 824-834]. The proposed distribution is derived from a queuing system with state dependent arrival and service rates and also from an exponential combination of negative binomial and COM-Poisson distributions. Some distributional, reliability and stochastic ordering properties are investigated. Computational asymptotic approximations, different characterizations, parameter estimation and a data fitting example are also discussed.

preprint 2015 arXiv

Transmuted Geometric Distribution and its Properties

Transmuted geometric distribution with two parameters is proposed as a new generalization of the geometric distribution by employing the quadratic transmutation techniques of Shaw and Buckley (2007). Its important distributional and reliability properties are investigated. Parameter estimation methods are discussed.
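As a rough illustration of the quadratic transmutation idea, the sketch below applies the quadratic rank transmutation map F(x) = (1 + alpha) G(x) - alpha G(x)^2 to a geometric cdf. The symbols `q` and `alpha`, the support {0, 1, 2, ...} and the function names are assumptions for this example; the paper's own parametrization may differ.

```python
def geometric_cdf(x, q):
    """CDF of the geometric distribution on 0, 1, 2, ... with
    P(X = x) = (1 - q) * q**x, for 0 < q < 1."""
    return 1.0 - q ** (x + 1) if x >= 0 else 0.0


def transmuted_cdf(x, q, alpha):
    """Quadratic rank transmutation of the geometric cdf G:
    F(x) = (1 + alpha) * G(x) - alpha * G(x)**2, with -1 <= alpha <= 1."""
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    g = geometric_cdf(x, q)
    return (1.0 + alpha) * g - alpha * g * g


def transmuted_pmf(x, q, alpha):
    """Probability mass at integer x, as a difference of consecutive cdf values."""
    return transmuted_cdf(x, q, alpha) - transmuted_cdf(x - 1, q, alpha)


# alpha = 0 leaves the geometric distribution unchanged; other values
# shift mass towards or away from zero.
for alpha in (-0.5, 0.0, 0.5):
    print(alpha, [round(transmuted_pmf(x, 0.4, alpha), 4) for x in range(4)])
```

The restriction of `alpha` to [-1, 1] keeps the transmuted cdf non-decreasing, so every probability mass is non-negative.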

preprint 2014 arXiv

A Discrete Gumbel Distribution

A discrete version of the Gumbel (Type I) extreme value distribution has been derived by using the general approach of discretization of a continuous distribution. Important distributional and reliability properties have been explored. It has been shown that, depending on the choice of parameters, the proposed distribution can be positively or negatively skewed, possess long tail(s), and exhibit equal, over or under dispersion. Log concavity of the distribution and consequential results have been established. Estimation of parameters by the method of maximum likelihood, the method of moments, and the method of proportions has been presented. A method of checking model adequacy and regression type estimation based on the empirical survival function has also been examined. A simulation study has been carried out to check the efficacy of the maximum likelihood estimators. Finally, the proposed distribution has been applied to model three real-life count data sets regarding maximum flood discharges and annual maximum wind speeds from the literature.
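A minimal sketch of the discretization idea, assuming the common survival-function approach P(Y = k) = S(k) - S(k + 1) applied to the Gumbel (Type I, maximum) survival function S(x) = 1 - exp(-exp(-(x - mu) / sigma)). The parameter names `mu` and `sigma` and the function names are assumptions; the paper's exact parametrization is not given in the abstract.

```python
import math


def gumbel_survival(x, mu=0.0, sigma=1.0):
    """Survival function of the continuous Gumbel (Type I, maximum) distribution."""
    z = -(x - mu) / sigma
    if z > 700.0:
        # exp(z) would overflow; the cdf is numerically 0, so survival is 1.
        return 1.0
    # 1 - exp(-t) computed via expm1 to stay accurate for tiny t.
    return -math.expm1(-math.exp(z))


def discrete_gumbel_pmf(k, mu=0.0, sigma=1.0):
    """Mass at integer k obtained by differencing the survival function."""
    return gumbel_survival(k, mu, sigma) - gumbel_survival(k + 1, mu, sigma)


# Probabilities over a wide integer range add up to (numerically) one.
mu, sigma = 2.0, 1.5
total = sum(discrete_gumbel_pmf(k, mu, sigma) for k in range(-50, 200))
print(total)
print([round(discrete_gumbel_pmf(k, mu, sigma), 4) for k in range(0, 6)])
```

Because the masses telescope, the discrete distribution has the same survival function as the continuous one at every integer, which is what makes estimation based on the empirical survival function natural for this kind of model.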

preprint 2014 arXiv

Mittag - Leffler function distribution - A new generalization of hyper-Poisson distribution

In this paper a new generalization of the hyper-Poisson distribution is proposed using the Mittag-Leffler function. The hyper-Poisson, displaced Poisson, Poisson and geometric distributions among others are seen as particular cases. This Mittag-Leffler function distribution (MLFD) belongs to the generalized hypergeometric and generalized power series families and also arises as a weighted Poisson distribution. MLFD is a flexible distribution with varying shapes, such as non-increasing with a unique mode at zero, or unimodal with one or two non-zero modes. It can be under, equi or over dispersed. Various distributional properties like the recurrence relation for the pmf, cumulative distribution function, generating functions, formulae for different types of moments, their recurrence relations, index of dispersion, its classification, log-concavity, reliability properties like survival, increasing failure rate, unimodality, and stochastic ordering with respect to the hyper-Poisson distribution have been discussed. The distribution has been found to fare well when compared with the hyper-Poisson distributions in its suitability for empirical modeling of differently dispersed count data. It is therefore expected that the proposed MLFD, with its interesting features and flexibility, will be a useful addition as a model for count data.
