Researcher profile

Soudeep Deb

Soudeep Deb contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified | Verification L1 | Unclaimed author
4 works
0 followers
3 topics
3 close collaborators

Published work

4 published items

Preprint, 2022, arXiv

A review and recommendations on variable selection methods in regression models for binary data

The selection of essential variables in logistic regression is vital because of its extensive use in medical studies, finance, economics and related fields. In this paper, we explore four main typologies (test-based, penalty-based, screening-based, and tree-based) of frequentist variable selection methods in the logistic regression setup. The primary objective of this work is to give a comprehensive overview of the existing literature for practitioners. Underlying assumptions and theory, along with the specifics of their implementations, are detailed as well. Next, we conduct a thorough simulation study to explore the performance of fifteen different methods in terms of variable selection, estimation of coefficients, prediction accuracy, and time complexity under various settings. We take low, moderate and high dimensional setups and consider different correlation structures for the covariates. A real-life application, using a high-dimensional gene expression dataset, is also included in this study to further understand the efficacy and consistency of the methods. Finally, based on our findings in the simulated data and in the real data, we provide recommendations for practitioners on the choice of variable selection methods under various contexts.

Preprint, 2021, arXiv

A mathematical take on the competitive balance of a football league

Competitive balance in a football league is extremely important from the perspective of economic growth of the industry. Many researchers have previously proposed different measures of competitive balance, which are primarily adapted from standard economic theory. However, these measures fail to capture the finer nuances of the game. In this work, we discuss a new framework which is more suitable for a football league. First, we present a mathematical proof of an ideal situation where a football league becomes perfectly balanced. Next, a goal-based index for competitive balance is developed. We present relevant theoretical results and show how the proposed index can be used to formally test for the presence of imbalance. The methods are implemented on data from the top five European leagues, and the results show that the new approach can better explain the changes in the seasonal competitive balance of the leagues. Further, using appropriate panel data models, we show that the proposed index is more suitable for analyzing the variability in total revenues of the football leagues.

Preprint, 2021, arXiv

Analyzing count data using a time series model with an exponentially decaying covariance structure

Count data appears in various disciplines. In this work, a new method to analyze time series count data is proposed. The method assumes an exponentially decaying covariance structure, a special case of the Matérn covariance function, for the latent variable in a Poisson regression model. It is implemented in a Bayesian framework, with the help of Gibbs sampling and ARMS sampling techniques. The proposed approach provides reliable estimates for the covariate effects and estimates the extent of variability explained by the temporally dependent process and the white noise process. The method is flexible, allows irregularly spaced data, and can be extended naturally to bigger datasets. The Bayesian implementation helps us to compute the posterior predictive distribution and hence is more appropriate and attractive for count data forecasting problems. Two real-life applications of different flavors are included in the paper. These two examples and a short simulation study establish that the proposed approach has good inferential and predictive abilities and performs better than the other competing models.
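As a rough illustration of the covariance structure named in this abstract, the sketch below builds an exponentially decaying covariance matrix for irregularly spaced time points. The Matérn covariance with smoothness 1/2 reduces to this exponential form. The function name `exponential_covariance`, the parameters `sigma2` and `rho`, and the time points are assumptions chosen for the example; the paper's own parametrization and sampler are not reproduced here.

```python
import math


def exponential_covariance(times, sigma2=1.0, rho=1.0):
    """Return the matrix with entries sigma2 * exp(-|t_i - t_j| / rho).

    `sigma2` (variance) and `rho` (range) are hypothetical names for this
    sketch, not notation taken from the paper.
    """
    if sigma2 <= 0 or rho <= 0:
        raise ValueError("sigma2 and rho must be positive")
    return [
        [sigma2 * math.exp(-abs(ti - tj) / rho) for tj in times]
        for ti in times
    ]


# Irregularly spaced observation times (made up for illustration).
times = [0.0, 1.0, 2.5, 6.0]
cov = exponential_covariance(times, sigma2=2.0, rho=1.5)

for row in cov:
    print(["%.4f" % value for value in row])
```

Because each entry depends only on the gap between two time points, unevenly spaced observations need no special handling, which is consistent with the flexibility the abstract describes.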

Preprint, 2020, arXiv

A time series method to analyze incidence pattern and estimate reproduction number of COVID-19

The ongoing pandemic of Coronavirus disease (COVID-19) emerged in Wuhan, China at the end of 2019. It has already affected more than 300,000 people, with the number of deaths nearing 13,000 across the world. As it has been posing a huge threat to global public health, it is of utmost importance to identify the rate at which the disease is spreading. In this study, we propose a time series model to analyze the trend pattern of the incidence of the COVID-19 outbreak. We also incorporate information on total or partial lockdown, wherever available, into the model. The model is concise in structure, and using appropriate diagnostic measures, we show that a time-dependent quadratic trend successfully captures the incidence pattern of the disease. We also estimate the basic reproduction number across different countries, and find that it is consistent except for the United States of America. The above statistical analysis sheds light on the trends of the outbreak, and gives insight into what epidemiological stage a region is in. This has the potential to help in prompting policies to address the COVID-19 pandemic in different countries.
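The abstract does not spell out the model, so the following is only a minimal sketch of what fitting a quadratic trend in time can look like. It uses ordinary least squares on a synthetic series, not the authors' time series model. The function name `fit_quadratic_trend` and the example series are assumptions made up for illustration.

```python
def fit_quadratic_trend(y):
    """Least-squares fit of y_t = a + b*t + c*t**2 for t = 0, 1, ..., len(y) - 1.

    Solves the 3x3 normal equations by Gauss-Jordan elimination.
    Hypothetical helper for illustration only.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    t = range(n)
    # Power sums of t (orders 0..4) and cross moments of t**k with y (orders 0..2).
    s = [sum(ti ** k for ti in t) for k in range(5)]
    r = [sum((ti ** k) * yi for ti, yi in zip(t, y)) for k in range(3)]
    # Augmented matrix of the normal equations.
    m = [[float(s[i + j]) for j in range(3)] + [float(r[i])] for i in range(3)]
    for col in range(3):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * p for x, p in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Synthetic series that follows an exact quadratic trend (not real case counts).
series = [2.0 + 3.0 * t + 0.5 * t ** 2 for t in range(10)]
a, b, c = fit_quadratic_trend(series)
print(a, b, c)
```

On real incidence data the residuals from such a fit would still need the diagnostic checks the abstract mentions before the trend could be trusted.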