Source author record

Haris I. Sair

Haris I. Sair appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Applications, Machine Learning, Methodology, Neurons and Cognition

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

Adaptive Label Error Detection: A Bayesian Approach to Mislabeled Data Detection

Machine learning classification systems are susceptible to poor performance when trained with incorrect ground truth labels, even when data is well-curated by expert annotators. As machine learning becomes more widespread, it is increasingly imperative to identify and correct mislabeling to develop more powerful models. In this work, we motivate and describe Adaptive Label Error Detection (ALED), a novel method of detecting mislabeling. ALED extracts an intermediate feature space from a deep convolutional neural network, denoises the features, models the reduced manifold of each class with a multidimensional Gaussian distribution, and performs a simple likelihood ratio test to identify mislabeled samples. We show that ALED has markedly increased sensitivity, without compromising precision, compared to established label error detection methods, on multiple medical imaging datasets. We demonstrate an example where fine-tuning a neural network on corrected data results in a 33.8% decrease in test set errors, providing strong benefits to end users. The ALED detector is deployed in the Python package statlab.
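The pipeline described in this abstract (extract features, denoise, fit one Gaussian per class, apply a likelihood ratio test) can be illustrated with a minimal NumPy sketch. This is not the statlab API or the authors' implementation: the function names, the use of PCA as the denoising step, the `n_components`, `ridge` and `threshold` parameters, and the comparison against the single best competing class are all assumptions made for illustration. The input `features` stands in for activations already extracted from a network.

```python
import numpy as np

def fit_class_gaussians(features, labels, n_components=2, ridge=1e-6):
    """Project features onto leading principal components (assumed denoising
    step), then fit one Gaussian per labeled class in the reduced space."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    params = {}
    for c in np.unique(labels):
        pts = reduced[labels == c]
        # Small ridge keeps the covariance invertible.
        cov = np.cov(pts, rowvar=False) + ridge * np.eye(n_components)
        params[c] = (pts.mean(axis=0), cov)
    return reduced, params

def gaussian_logpdf(x, mu, cov):
    """Row-wise log density of a multivariate normal."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
    return -0.5 * (maha + logdet + x.shape[1] * np.log(2 * np.pi))

def flag_suspect_labels(features, labels, threshold=0.0, n_components=2):
    """Flag samples whose best competing class is more likely than the
    assigned class by more than `threshold` (log likelihood ratio)."""
    reduced, params = fit_class_gaussians(features, labels, n_components)
    classes = np.array(sorted(params))
    loglik = np.column_stack(
        [gaussian_logpdf(reduced, *params[c]) for c in classes]
    )
    rows = np.arange(len(labels))
    own_idx = np.searchsorted(classes, labels)
    own = loglik[rows, own_idx]
    other = loglik.copy()
    other[rows, own_idx] = -np.inf  # exclude the assigned class
    return (other.max(axis=1) - own) > threshold
```

Calling `flag_suspect_labels(features, labels)` returns a boolean mask over samples. Raising `threshold` trades sensitivity for precision, which is the trade-off the abstract reports on.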

preprint · 2016 · arXiv

Stability and Localization of inter-individual differences in functional connectivity

Much recent attention has been paid to quantifying anatomic and functional neuroimaging on the individual subject level. For optimal individual subject characterization, specific acquisition and analysis features need to be identified that maximize inter-individual variability while concomitantly minimizing intra-subject variability. Here we develop a non-parametric statistical metric that quantifies the degree to which a parameter set allows this individual subject differentiation. We apply this metric to analyzing publicly available test-retest resting-state fMRI (rs-fMRI) data sets. We find that for the question of maximizing individual differentiation, there is a relative tradeoff between increasing sampling through increased sampling frequency or increased acquisition time; that for the sizes of the interrogated data sets, only 4-5 min of acquisition time is necessary to perfectly differentiate each subject; and that brain regions that most contribute to individuals' unique characterization lie in association cortices thought to contribute to higher cognitive function. These findings may guide optimal rs-fMRI experiment design and may aid elucidation of the neural bases for subject-to-subject differences.
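The abstract does not define its metric, so the following is only a hypothetical stand-in. It shows one simple rank-based way to score how well a test-retest data set separates subjects. The function name, the use of Euclidean distance, and the two-session layout are assumptions, not the paper's method. A score of 1.0 means every subject's retest scan is closer to their own first scan than to any other subject's scan.

```python
import numpy as np

def differentiation_score(session_a, session_b):
    """Fraction of cross-subject distances that exceed the matching
    within-subject distance. Inputs are (n_subjects, n_features) arrays
    with subjects in the same row order."""
    a = np.asarray(session_a, dtype=float)
    b = np.asarray(session_b, dtype=float)
    # dist[i, j]: distance between subject i (scan A) and subject j (scan B).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    within = np.diag(dist)
    off_diag = ~np.eye(dist.shape[0], dtype=bool)
    # Count cross-subject pairs that are farther apart than the subject's own pair.
    wins = (dist > within[:, None]) & off_diag
    return wins.sum() / off_diag.sum()
```

Because the score depends only on the ordering of distances, it makes no distributional assumptions, in the spirit of a non-parametric metric. It could be recomputed for different scan lengths or sampling rates to compare acquisition settings.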

preprint · 2013 · arXiv

Estimating a graphical intra-class correlation coefficient (GICC) using multivariate probit-linear mixed models

Data reproducibility is a critical issue in all scientific experiments. In this manuscript, we consider the problem of quantifying the reproducibility of graphical measurements. We generalize the concept of the image intra-class correlation coefficient (I2C2) and propose the graphical intra-class correlation coefficient (GICC) for this purpose. The GICC is based on multivariate probit-linear mixed effect models. We present a Markov Chain EM (MCEM) algorithm for estimating the GICC. Simulation results with varied settings are demonstrated, and our method is applied to the KIRBY21 test-retest dataset.
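For orientation, the classical scalar intra-class correlation that I2C2 and GICC generalize can be computed in a few lines. The sketch below implements the standard one-way random-effects ICC(1,1) from ANOVA mean squares. It is not the GICC estimator or the MCEM algorithm from the manuscript, and the function name is an assumption.

```python
import numpy as np

def icc_oneway(measurements):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_repeats) array."""
    y = np.asarray(measurements, dtype=float)
    n, k = y.shape
    subject_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((y - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values near 1 indicate that repeated measurements of the same subject agree closely relative to differences between subjects. That is the notion of reproducibility that the graphical version extends to network-valued data.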