Source author record

Palash Ghosh

Palash Ghosh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Methodology, Computation

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators

Preprint, 2026, arXiv

MDAS: A Diagnostic Approach to Assess the Quality of Data Splitting in Machine Learning

In the field of machine learning, model performance is usually assessed by randomly splitting data into training and test sets. Different random splits, however, can yield markedly different performance estimates, so a genuinely good model may be discarded or a poor one selected purely due to an unlucky partition. This motivates a principled way to diagnose the quality of a given data split. We propose a diagnostic framework based on a new discrepancy measure, the Mahalanobis Distribution Alignment Score (MDAS). MDAS is a symmetric dissimilarity measure between two multivariate samples, rather than a strict metric. MDAS captures both mean and covariance differences and is affine invariant. Building on this, we construct a Monte Carlo test that evaluates whether an observed split is statistically compatible with typical random splits, yielding an interpretable p-value for split quality. Using several real data sets, we study the relationship between MDAS and model robustness, including its association with the normalized Akaike information criterion. Finally, we apply MDAS to compare existing state-of-the-art deterministic data-splitting strategies with standard random splitting. The experimental results show that MDAS provides a simple, model-agnostic tool for auditing data splits and improving the reliability of empirical model evaluation.
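The Monte Carlo test described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the MDAS formula, so `discrepancy` below is a hypothetical stand-in that merely shares the stated properties (symmetric, sensitive to mean and covariance differences, affine invariant). The function names, the number of reference splits, and the p-value convention are all assumptions.

```python
import numpy as np


def discrepancy(a, b):
    """Stand-in two-sample discrepancy (NOT the paper's MDAS formula).

    Combines a Mahalanobis-type distance between the sample means with a
    whitened difference between the sample covariances.
    """
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    # Averaging the two covariances keeps the measure symmetric in (a, b).
    pooled_inv = np.linalg.pinv((cov_a + cov_b) / 2.0)
    mean_term = mean_diff @ pooled_inv @ mean_diff
    whitened = pooled_inv @ (cov_a - cov_b)
    cov_term = np.trace(whitened @ whitened)
    return float(mean_term + cov_term)


def monte_carlo_split_pvalue(x, test_idx, n_splits=500, seed=0):
    """Compare an observed train/test split against random splits.

    Returns the observed discrepancy and the fraction of random splits
    (same test size) whose discrepancy is at least as large.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    test_mask = np.zeros(n, dtype=bool)
    test_mask[test_idx] = True
    observed = discrepancy(x[~test_mask], x[test_mask])
    n_test = int(test_mask.sum())

    exceed = 0
    for _ in range(n_splits):
        perm = rng.permutation(n)
        reference = discrepancy(x[perm[n_test:]], x[perm[:n_test]])
        exceed += reference >= observed
    # The +1 terms count the observed split itself, so p is never zero.
    return observed, (1 + exceed) / (1 + n_splits)
```

Under these assumptions, a small p-value flags a split whose training and test parts differ more than typical random splits of the same size do.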

Preprint, 2023, arXiv

Optimal Adaptive SMART Designs with Binary Outcomes

In a sequential multiple-assignment randomized trial (SMART), a sequence of treatments is given to a patient over multiple stages. In each stage, randomization may be done to allocate patients to different treatment groups. Although SMART designs are becoming popular among clinical researchers, methodologies for adaptive randomization at different stages of a SMART are few and not sophisticated enough to handle the complexity of optimal treatment allocation at every stage of a trial. The lack of optimal allocation methodologies can raise serious ethical concerns about SMART designs. In this work, we develop an optimal adaptive allocation procedure to minimize the expected number of treatment failures for a SMART with a binary primary outcome. Issues related to optimal adaptive allocations are explored theoretically with supporting simulations. The applicability of the proposed methodology is demonstrated using a recently conducted SMART study named M-Bridge for developing universal and resource-efficient dynamic treatment regimes (DTRs) for incoming first-year college students as a bridge to desirable treatments to address alcohol-related risks.

Preprint, 2022, arXiv

A Novel Approach to Assess Dynamic Treatment Regimes Embedded in a SMART with an Ordinal Outcome

Sequential multiple assignment randomized trials (SMARTs) are used to construct data-driven optimal intervention strategies for subjects based on their intervention and covariate histories in different branches of health and behavioral sciences where a sequence of interventions is given to a participant. Sequential intervention strategies are often called dynamic treatment regimes (DTRs). In the existing literature, the majority of the analysis methodologies for SMART data assume a continuous primary outcome. However, ordinal outcomes are also quite common in clinical practice. In this work, first, we introduce the notion of the generalized odds ratio (GOR) to compare two DTRs embedded in a SMART with an ordinal outcome and discuss some combinatorial properties of this measure. Next, we propose a likelihood-based approach to estimate the GOR from SMART data, and derive the asymptotic properties of its estimate. We discuss alternative ways to estimate the GOR using concordant-discordant pairs and a two-sample U-statistic. We derive the required sample size formula for designing SMARTs with ordinal outcomes based on the GOR. A simulation study shows the performance of the estimated GOR in terms of the estimated power corresponding to the derived sample size. The methodology is applied to analyze data from the SMART+ study, conducted in the UK, to improve carbohydrate periodization behavior in athletes using a menu planner mobile application, Hexis Performance. A freely available Shiny web app using R is provided to make the proposed methodology accessible to other researchers and practitioners.
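The concordant-discordant idea mentioned in this abstract can be illustrated for the simplest case of two independent groups with ordinal outcomes. This sketch assumes one common definition of the generalized odds ratio, P(Y_A > Y_B) / P(Y_A < Y_B), with ties ignored. The abstract does not state the paper's exact definition, and the weighting needed for regimes embedded in a SMART is not shown here. The function name is hypothetical.

```python
import numpy as np


def generalized_odds_ratio(y_a, y_b):
    """Pairwise estimate of P(Y_A > Y_B) / P(Y_A < Y_B) for ordinal scores.

    Assumes two independent groups and ignores tied pairs; this is an
    illustrative simplification, not the SMART-specific estimator.
    """
    y_a = np.asarray(y_a)
    y_b = np.asarray(y_b)
    # Compare every outcome in group A with every outcome in group B.
    diff = y_a[:, None] - y_b[None, :]
    higher_in_a = int((diff > 0).sum())
    higher_in_b = int((diff < 0).sum())
    if higher_in_b == 0:
        return float("inf") if higher_in_a > 0 else float("nan")
    return higher_in_a / higher_in_b
```

A value above 1 indicates that group A tends to have higher ordinal outcomes than group B, and swapping the groups inverts the ratio.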