Source author record

Hassan Doosti

Hassan Doosti appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning Methodology

Catalog footprint

What is connected

2 works

2 topics

4 close collaborators

Preprint, 2026, arXiv

A New Flexible Train-Test Split Algorithm, an approach for choosing among the Hold-out, K-fold cross-validation, and Hold-out iteration

Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their impact on generalizability and real-world performance. Common approaches such as hold-out validation or k-fold cross-validation with fixed k values are frequently applied based solely on empirical practice. To address this issue, we propose a flexible Python-based framework that systematically examines how different validation strategies affect predictive performance across seven widely used machine learning algorithms, including Decision Trees, K-Nearest Neighbors, Naive Bayes variants, Logistic Regression, calibrated linear Support Vector Machines, and histogram-based gradient boosting. The framework evaluates these methods under a wide range of validation schemes, including hold-out splits from 10% to 90%, k-fold cross-validation with k between 3 and 15, repeated hold-out, and nested cross-validation. The framework is applied to three biomedical datasets of varying size, and performance is assessed using ROC-AUC, accuracy, and the Matthews correlation coefficient. The results show that no single validation strategy consistently outperforms others across all algorithms and datasets, indicating that optimal validation depends on the interaction between the algorithm, dataset characteristics, and evaluation metric.
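The validation schemes named in the abstract differ mainly in how they partition sample indices. The following is a minimal sketch of that partitioning step only, written with the Python standard library. It is not the authors' framework: the function names `holdout_split`, `kfold_splits`, and `repeated_holdout` are hypothetical, nested cross-validation is omitted, and no model is trained.

```python
import random


def holdout_split(n_samples, test_fraction, seed=0):
    """Return (train_idx, test_idx) for one shuffled hold-out split."""
    if n_samples < 2:
        raise ValueError("need at least 2 samples")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Keep at least one sample on each side of the split.
    n_test = min(n_samples - 1, max(1, round(n_samples * test_fraction)))
    return indices[n_test:], indices[:n_test]


def kfold_splits(n_samples, k, seed=0):
    """Yield k (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives fold sizes that differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def repeated_holdout(n_samples, test_fraction, n_repeats, seed=0):
    """Yield n_repeats independent hold-out splits (assumed reading of 'repeated hold-out')."""
    for r in range(n_repeats):
        yield holdout_split(n_samples, test_fraction, seed=seed + r)


if __name__ == "__main__":
    n = 100  # hypothetical dataset size
    # Ranges mirror the abstract: hold-out 10% to 90%, k from 3 to 15.
    for pct in range(10, 100, 10):
        train, test = holdout_split(n, pct / 100)
        print(f"hold-out {pct}%: train={len(train)} test={len(test)}")
    for k in range(3, 16):
        sizes = [len(test) for _, test in kfold_splits(n, k)]
        print(f"{k}-fold: test fold sizes {min(sizes)}..{max(sizes)}")
```

In practice each `(train_idx, test_idx)` pair would be passed to a model-fitting and scoring routine, and the resulting metrics compared across schemes.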

Preprint, 2021, arXiv

Tilted Nonparametric Regression Function Estimation

This paper develops theory for the convergence rate of the tilted version of the linear smoother. We study the tilted linear smoother, a nonparametric regression function estimator obtained by minimizing the distance to an infinite order flat-top trapezoidal kernel estimator. We prove that the proposed estimator achieves a high level of accuracy while preserving the attractive properties of the infinite order flat-top kernel estimator. We also present an extensive numerical study analysing the finite-sample performance of two members of the tilted linear smoother class, tilted Nadaraya-Watson and tilted local linear. The simulation study shows that, under some conditions, tilted Nadaraya-Watson and tilted local linear outperform their classical analogs in terms of Mean Integrated Squared Error (MISE). Finally, the performance of these estimators and of the conventional estimators is illustrated by curve fitting to COVID-19 data for 12 countries and to a dose-response data set.
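For orientation, here is a minimal sketch of the classical Nadaraya-Watson estimator that the tilted variant is compared against, together with a grid approximation of integrated squared error. It does not implement the tilting step or the infinite order flat-top trapezoidal kernel from the paper. A Gaussian kernel is swapped in for simplicity, and the function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Classical Nadaraya-Watson estimate of E[Y | X = x0]:
    a kernel-weighted average of the observed responses."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def integrated_squared_error(estimate, truth, grid):
    """Trapezoid-rule approximation of the integral of (estimate - truth)^2
    over an increasing grid of evaluation points."""
    errs = [(estimate(x) - truth(x)) ** 2 for x in grid]
    return sum(
        0.5 * (errs[i] + errs[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


if __name__ == "__main__":
    # Hypothetical noiseless design, for illustration only.
    xs = [i / 20 for i in range(21)]
    ys = [math.sin(2 * math.pi * x) for x in xs]
    grid = [i / 100 for i in range(101)]
    ise = integrated_squared_error(
        lambda x: nadaraya_watson(x, xs, ys, bandwidth=0.05),
        lambda x: math.sin(2 * math.pi * x),
        grid,
    )
    print(f"ISE on [0, 1]: {ise:.6f}")
```

Averaging `integrated_squared_error` over many independently simulated samples gives a Monte Carlo approximation of the MISE criterion mentioned in the abstract.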