Researcher profile

Subrata Chakraborty

Subrata Chakraborty contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
11 works
0 followers
7 topics
4 close collaborators

Published work

11 published items

preprint, 2023, arXiv

Bivariate binomial conditionals distributions with positive and negative correlations: A statistical study

In this article, we discuss a bivariate distribution whose conditionals are univariate binomial distributions, whose marginals are not binomial, and which exhibits negative correlation. Some useful structural properties of this distribution, namely marginals, moments, generating functions, and stochastic ordering, are investigated. Simple proofs of negative correlation, marginal over-dispersion, the distribution of the sum, and the conditional distribution given the sum are also derived. The distribution is shown to be a member of the multi-parameter exponential family, and some natural but useful consequences are also outlined. The proposed distribution tends to a recently investigated conditional Poisson distribution studied by Ghosh et al. (2020). Finally, the distribution is fitted to two bivariate count data sets with an inherent negative correlation to illustrate its suitability.

preprint, 2022, arXiv

2-speed network ensemble for efficient classification of incremental land-use/land-cover satellite image chips

The ever-growing volume of satellite imagery data presents a challenge for industry and governments making data-driven decisions based on the timely analysis of very large data sets. Commonly used deep learning algorithms for automatic classification of satellite images are time and resource-intensive to train. The cost of retraining in the context of Big Data presents a practical challenge when new image data and/or classes are added to a training corpus. Recognizing the need for an adaptable, accurate, and scalable satellite image chip classification scheme, in this research we present an ensemble of: i) a slow to train but high accuracy vision transformer; and ii) a fast to train, low-parameter convolutional neural network. The vision transformer model provides a scalable and accurate foundation model. The high-speed CNN provides an efficient means of incorporating newly labelled data into analysis, at the expense of lower accuracy. To simulate incremental data, the very large (~400,000 images) So2Sat LCZ42 satellite image chip dataset is divided into four intervals, with the high-speed CNN retrained every interval and the vision transformer trained every half interval. This experimental setup mimics an increase in data volume and diversity over time. For the task of automated land-cover/land-use classification, the ensemble models for each data increment outperform each of the component models, with best accuracy of 65% against a holdout test partition of the So2Sat dataset. The proposed ensemble and staggered training schedule provide a scalable and cost-effective satellite image classification scheme that is optimized to process very large volumes of satellite data.
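The abstract does not say how the two models' outputs are combined, so the following is only a minimal sketch of one common choice: a weighted average of per-class probabilities. The function name `ensemble_predict`, the `weight_slow` parameter, and the example probabilities are hypothetical and are not taken from the paper.

```python
def ensemble_predict(probs_slow, probs_fast, weight_slow=0.5):
    """Combine two class-probability vectors by weighted averaging.

    probs_slow: probabilities from the slow, high-accuracy model (assumed).
    probs_fast: probabilities from the fast, low-parameter model (assumed).
    weight_slow: hypothetical mixing weight in [0, 1] for the slow model.
    Returns (index of the winning class, combined probabilities).
    """
    if len(probs_slow) != len(probs_fast):
        raise ValueError("both models must score the same set of classes")
    combined = [
        weight_slow * p + (1.0 - weight_slow) * q
        for p, q in zip(probs_slow, probs_fast)
    ]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Made-up scores for three classes: the two models disagree on the top class.
slow_scores = [0.6, 0.3, 0.1]
fast_scores = [0.2, 0.7, 0.1]
equal_choice, _ = ensemble_predict(slow_scores, fast_scores)
slow_heavy_choice, _ = ensemble_predict(slow_scores, fast_scores, weight_slow=0.8)
```

With equal weights the fast model's preferred class wins in this toy case; shifting the weight toward the slow model flips the decision, which is the kind of trade-off an ensemble of this sort has to tune.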

preprint, 2022, arXiv

Debiasing pipeline improves deep learning model generalization for X-ray based lung nodule detection

Lung cancer is the leading cause of cancer death worldwide and a good prognosis depends on early diagnosis. Unfortunately, screening programs for the early diagnosis of lung cancer are uncommon. This is in part due to the at-risk groups being located in rural areas far from medical facilities. Reaching these populations would require a scaled approach that combines mobility, low cost, speed, accuracy, and privacy. We can resolve these issues by combining the chest X-ray imaging mode with a federated deep-learning approach, provided that the federated model is trained on homogeneous data to ensure that no single data source can adversely bias the model at any point in time. In this study we show that an image pre-processing pipeline that homogenizes and debiases chest X-ray images can improve both internal classification and external generalization, paving the way for a low-cost and accessible deep learning-based clinical system for lung cancer screening. An evolutionary pruning mechanism is used to train a nodule detection deep learning model on the most informative images from a publicly available lung nodule X-ray dataset. Histogram equalization is used to remove systematic differences in image brightness and contrast. Model training is performed using all combinations of lung field segmentation, close cropping, and rib suppression operators. Using ablation studies to assess the contribution of each operator in this pipeline, we show that this pre-processing pipeline results in deep learning models that successfully generalize to an independent lung nodule dataset. By stripping chest X-ray images of known confounding variables through lung field segmentation, along with suppression of signal noise from the bone structure, we can train a highly accurate deep learning lung nodule detection algorithm with outstanding generalization accuracy of 89% to nodule samples in unseen data.
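As an illustration of the histogram equalization step mentioned in the abstract, here is a minimal sketch of plain global equalization for 8-bit grayscale images stored as nested lists. The abstract does not state which variant the authors use (global, adaptive, or otherwise), so the function `equalize_histogram` and the tiny example images are assumptions made for illustration.

```python
def equalize_histogram(pixels, levels=256):
    """Global histogram equalization for a 2D list of integer gray levels."""
    flat = [v for row in pixels for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1

    # Cumulative count of pixels at or below each gray level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return [row[:] for row in pixels]

    # Map each gray level so that the cumulative distribution becomes
    # approximately linear over the full range.
    lut = [
        max(0, round((cdf[v] - cdf_min) * (levels - 1) / (total - cdf_min)))
        for v in range(levels)
    ]
    return [[lut[v] for v in row] for row in pixels]


# Two made-up images with the same structure but different brightness.
dark_image = [[50, 51], [52, 53]]
bright_image = [[150, 151], [152, 153]]
dark_equalized = equalize_histogram(dark_image)
bright_equalized = equalize_histogram(bright_image)
```

Both toy images map to the same equalized output, which is the sense in which equalization removes a systematic brightness offset between sources.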

preprint, 2021, arXiv

A new method for constructing continuous distributions on the unit interval

A novel approach towards construction of absolutely continuous distributions over the unit interval is proposed. Considering two absolutely continuous random variables with positive support, this method conditions on their convolution to generate a new random variable in the unit interval. This approach is demonstrated using some popular choices of the positive random variables, such as the exponential, Lindley, and gamma. Some existing distributions, like the uniform and the beta, are formulated with this method. Several new structures of density functions with potential for future application to real-life problems are also provided. One of the new one-parameter distributions is considered for parameter estimation and a real-life modelling application, and is shown to provide a better fit than the popular one-parameter Topp-Leone model.
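One possible reading of "conditions on their convolution" is to take independent positive variables X and Y and use the conditional density of X given X + Y = 1, which is proportional to f_X(x) f_Y(1 - x) on (0, 1). The sketch below works under that assumption, which the abstract does not confirm, and checks numerically that equal-rate exponentials give the uniform density and equal-rate gammas give a beta density. All function names are hypothetical.

```python
import math


def gamma_pdf(x, shape, rate=1.0):
    """Density of a gamma(shape, rate) variable; shape = 1 is the exponential."""
    if x <= 0:
        return 0.0
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def beta_pdf(x, a, b):
    """Density of a beta(a, b) variable on (0, 1)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)


def conditional_density(f_x, f_y, grid_size=2000):
    """Return g(x) = f_x(x) * f_y(1 - x) / normalizing constant on (0, 1).

    The normalizing constant is approximated with the midpoint rule.
    """
    h = 1.0 / grid_size
    norm = h * sum(
        f_x((i + 0.5) * h) * f_y(1.0 - (i + 0.5) * h) for i in range(grid_size)
    )

    def g(x):
        return f_x(x) * f_y(1.0 - x) / norm

    return g


# Two exponential(1) variables: the conditional density should be uniform.
uniform_like = conditional_density(
    lambda x: gamma_pdf(x, 1.0), lambda y: gamma_pdf(y, 1.0)
)

# gamma(2) and gamma(3) with a common rate: should match beta(2, 3).
beta_like = conditional_density(
    lambda x: gamma_pdf(x, 2.0), lambda y: gamma_pdf(y, 3.0)
)
```

Evaluating `uniform_like` anywhere in (0, 1) returns a value close to 1, and `beta_like(x)` agrees with `beta_pdf(x, 2, 3)` up to the integration error, consistent with the abstract's remark that the uniform and the beta can be formulated this way.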

preprint, 2021, arXiv

MAVIDH Score: A COVID-19 Severity Scoring using Chest X-Ray Pathology Features

The application of computer vision for COVID-19 diagnosis is complex and challenging, given the risks associated with patient misclassifications. Arguably, the primary value of medical imaging for COVID-19 lies rather in patient prognosis. Radiological images can guide physicians in assessing the severity of the disease, and a series of images from the same patient at different stages can help to gauge disease progression. Hence, a simple method based on interpretable lung-pathology features for scoring disease severity from chest X-rays is proposed here. As the primary contribution, this method correlates well with patient severity in different stages of disease progression, with competitive results compared to other existing, more complex methods. An original data selection approach is also proposed, allowing the simple model to learn the severity-related features. It is hypothesized that the resulting competitive performance presented here is related to the method being feature-based rather than reliant on lung involvement or opacity, as others in the literature are. A second contribution comes from the validation of the results, conceptualized as the scoring of patient groups from different stages of the disease. Besides performing such validation on an independent data set, the results were also compared with other proposed scoring methods in the literature. The results show that there is a significant correlation between the scoring system (MAVIDH) and patient outcome, which could potentially help physicians rate and follow disease progression in COVID-19 patients.

preprint, 2021, arXiv

Potential Features of ICU Admission in X-ray Images of COVID-19 Patients

X-ray images may present non-trivial features with predictive information of patients that develop severe symptoms of COVID-19. If true, this hypothesis may have practical value in allocating resources to particular patients while using a relatively inexpensive imaging technique. The difficulty of testing such a hypothesis comes from the need for large sets of labelled data, which need to be well-annotated and should contemplate the post-imaging severity outcome. This paper presents an original methodology for extracting semantic features that correlate to severity from a data set with patient ICU admission labels through interpretable models. The methodology employs a neural network trained to recognise lung pathologies to extract the semantic features, which are then analysed with low-complexity models to limit overfitting while increasing interpretability. This analysis points out that only a few features explain most of the variance between patients that developed severe symptoms. When applied to an unrelated larger data set with pathology-related clinical notes, the method has shown to be capable of selecting images for the learned features, which could translate some information about their common locations in the lung. Besides attesting separability on patients that eventually develop severe symptoms, the proposed methods represent a statistical approach highlighting the importance of features related to ICU admission that may have been only qualitatively reported. While handling limited data sets, notable methodological aspects are adopted, such as presenting a state-of-the-art lung segmentation network and the use of low-complexity models to avoid overfitting. The code for methodology and experiments is also available.

preprint, 2020, arXiv

A Practical Blockchain Framework using Image Hashing for Image Authentication

Blockchain is a relatively new technology that can be seen as a decentralised database. Blockchain systems heavily rely on cryptographic hash functions to store their data, which makes it difficult to tamper with any data stored in the system. A topic that was researched along with blockchain is image authentication. Image authentication focuses on investigating and maintaining the integrity of images. As a blockchain system can be useful for maintaining data integrity, image authentication has the potential to be enhanced by blockchain. There are many techniques that can be used to authenticate images; the technique investigated by this work is image hashing. Image hashing is a technique used to calculate how similar two different images are. This is done by converting the images into hashes and then comparing them using a distance formula. To investigate the topic, an experiment involving a simulated blockchain was created. The blockchain acted as a database for images. This blockchain was made up of devices which contained their own unique image hashing algorithms. The blockchain was tested by creating modified copies of the images contained in the database, and then submitting them to the blockchain to see if it will return the original image. Through this experiment it was discovered that it is plausible to create an image authentication system using blockchain and image hashing. However, the design proposed by this work requires refinement, as it appears to struggle in some situations. This work shows that blockchain can be a suitable approach for authenticating images, particularly via image hashing. Other observations include that using multiple image hash algorithms at the same time can increase performance in some cases, as well as that each type of test done to the blockchain has its own unique pattern to its data.
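The abstract describes image hashing as converting images into hashes and comparing them with a distance formula, without naming the algorithms or the distance used. The sketch below uses an average hash and the Hamming distance, two common choices that are swapped in here purely for illustration. The function names and test images are hypothetical, and the input is assumed to be a grayscale image whose sides are multiples of the hash size.

```python
def average_hash(pixels, hash_size=8):
    """Average hash of a grayscale image given as a 2D list.

    The image is reduced to hash_size x hash_size block means, and each
    block contributes one bit: 1 if it is brighter than the overall mean.
    Image dimensions are assumed to be divisible by hash_size.
    """
    rows, cols = len(pixels), len(pixels[0])
    block_h, block_w = rows // hash_size, cols // hash_size
    cells = []
    for r in range(hash_size):
        for c in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def hamming_distance(hash_a, hash_b):
    """Number of bit positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# A 16x16 horizontal gradient, a uniformly brightened copy, and a mirror image.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brightened = [[value + 10 for value in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

distance_to_brightened = hamming_distance(
    average_hash(original), average_hash(brightened)
)
distance_to_mirrored = hamming_distance(
    average_hash(original), average_hash(mirrored)
)
```

In this toy setup a uniform brightness change leaves the hash unchanged, while the mirrored image flips every bit. A lookup system of the kind described would treat a small distance as a probable match to a stored original, with the threshold being a design choice.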

preprint, 2020, arXiv

Beta Poisson-G Family of Distributions: Its Properties and Application with Failure Time Data

A new generalization of the Poisson-G family, called the beta Poisson-G family of distributions, is proposed. Useful expansions of the probability density function and the cumulative distribution function of the proposed family are derived and seen as infinite mixtures of the Poisson-G distribution. The moment generating function, power moments, entropy, quantile function, skewness and kurtosis are investigated. Numerical computations of moments, skewness, kurtosis and entropy are tabulated for select parameter values. Furthermore, estimation by the method of maximum likelihood is discussed. A simulation study is carried out under varying sample sizes to assess the performance of this model. Finally, the suitability of the proposed model in comparison to recently introduced models is checked by modeling two real-life data sets.

preprint, 2020, arXiv

Changing Clusters of Indian States with respect to number of Cases of COVID-19 using incrementalKMN Method

The novel Coronavirus (COVID-19) incidence in India is currently experiencing an exponential rise, but with apparent spatial variation in growth rate and doubling time. We classify the states into five clusters ranging from low-risk to high-risk categories and study how the different states moved from one cluster to another from the onset of the first case on $30^{th}$ January 2020 until the end of Unlock 1 on $30^{th}$ June 2020. We have implemented a new clustering technique called the incrementalKMN (Prasad, R. K., Sarmah, R., Chakraborty, S. (2019))

preprint, 2020, arXiv

On a family that unifies Generalized Marshall-Olkin and Poisson-G family of distribution

Unifying the generalized Marshall-Olkin (GMO) and Poisson-G (P-G) families, a new family of distributions is proposed. The density and the survival function are expressed as infinite mixtures of the P-G family. The quantile function, asymptotes, shapes, stochastic ordering, moment generating function, order statistics, probability weighted moments and Rényi entropy are derived. Maximum likelihood estimation with large sample properties is presented. A Monte Carlo simulation is used to examine the pattern of the bias and the mean square error of the maximum likelihood estimators. A comparison with some of the important sub-models of the family in modeling a real data set illustrates the utility of the proposed family.

preprint, 2020, arXiv

The Poisson Transmuted-G Family of Distributions: Its Properties and Applications

This paper introduces a new family of continuous distributions, namely the Poisson transmuted-G family of distributions, obtained by inducing two additional parameters on the baseline G distribution. Some of its mathematical properties, including explicit expressions for the moment generating function, order statistics, probability weighted moments, stress-strength reliability, residual life, reversed residual life, Rényi entropy and mean deviation, are derived. Some special models of the new family are listed. Estimation of the model parameters by the maximum likelihood method is discussed. The advantage of the proposed family in data fitting is illustrated by means of two applications to failure time data sets.