Source author record

Jeffrey D. Hart

Jeffrey D. Hart appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

3 works
1 topic
3 close collaborators

Published work

3 published items

Preprint, 2023, arXiv

Screening Methods for Classification Based on Non-parametric Bayesian Tests

Feature or variable selection is a problem inherent to large data sets. While many methods have been proposed to deal with this problem, some can scale poorly with the number of predictors in a data set. Screening methods scale linearly with the number of predictors by checking each predictor one at a time, and are a tool used to decrease the number of variables to consider before further analysis or variable selection. For classification, there is a variety of techniques. There are parametric based screening tests, such as t-test or SIS based screening, and non-parametric based screening tests, such as Kolmogorov distance based screening, and MV-SIS. We propose a method for variable screening that uses Bayesian-motivated tests, compare it to SIS based screening, and provide example applications of the method on simulated and real data. It is shown that our screening method can lead to improvements in classification rate. This is so even when our method is used in conjunction with a classifier, such as DART, which is designed to select a sparse subset of variables. Finally, we propose a classifier based on kernel density estimates that in some cases can produce dramatic improvements in classification rates relative to DART.
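The one-predictor-at-a-time idea behind screening can be illustrated with a minimal sketch. It uses the two-sample t-test, which the abstract names as a parametric baseline, and it is not the Bayesian-motivated test the authors propose. The function names `welch_t` and `screen_features`, the Welch (unequal variance) form of the statistic, and the simulated data are assumptions made for illustration only.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of numbers.

    Assumes each group has at least two values and nonzero variance.
    """
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err


def screen_features(X, y, keep):
    """Rank features by |t| between class 0 and class 1; return the top `keep` indices.

    X is a list of rows, y a list of 0/1 labels. Each feature is scored on
    its own, so the cost grows linearly with the number of features.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        group0 = [row[j] for row, label in zip(X, y) if label == 0]
        group1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(group0, group1)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


# Hypothetical simulated data: 50 features, only features 0 and 1 differ by class.
rng = random.Random(42)
labels = [i % 2 for i in range(200)]
data = [
    [rng.gauss(3.0 if (j < 2 and label == 1) else 0.0, 1.0) for j in range(50)]
    for label in labels
]
print(screen_features(data, labels, keep=5))
```

The retained indices would then be passed to a classifier or to a second, more expensive selection step. A t-statistic only detects mean shifts, which is one reason nonparametric screening statistics are of interest.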

Preprint, 2016, arXiv

Partitioned Cross-Validation for Divide-and-Conquer Density Estimation

We present an efficient method to estimate cross-validation bandwidth parameters for kernel density estimation in very large datasets where ordinary cross-validation is rendered highly inefficient, both statistically and computationally. Our approach relies on calculating multiple cross-validation bandwidths on partitions of the data, followed by suitable scaling and averaging to return a partitioned cross-validation bandwidth for the entire dataset. The partitioned cross-validation approach produces substantial computational gains over ordinary cross-validation. We additionally show that partitioned cross-validation can be statistically efficient compared to ordinary cross-validation. We derive analytic expressions for the asymptotically optimal number of partitions and study its finite sample accuracy through a detailed simulation study. We additionally propose a permuted version of partitioned cross-validation which attains even higher efficiency. Theoretical properties of the estimators are studied and the methodology is applied to the Higgs Boson dataset with 11 million observations.
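The partition, scale and average steps described in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel, least-squares cross-validation minimized over a fixed grid, equal-sized partitions, and the standard n^(-1/5) bandwidth rate for second-order kernels, so a bandwidth chosen on n/p points is rescaled by p^(-1/5). All function names are hypothetical.

```python
import math
import random


def gaussian_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def lscv_score(x, h):
    """Least-squares cross-validation criterion for a Gaussian-kernel KDE."""
    n = len(x)
    conv_sum = 0.0  # sum over all pairs of the kernel convolved with itself
    loo_sum = 0.0   # sum over pairs i != j for the leave-one-out term
    for i in range(n):
        for j in range(n):
            d = x[i] - x[j]
            conv_sum += gaussian_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                loo_sum += gaussian_pdf(d, h)
    return conv_sum / n**2 - 2.0 * loo_sum / (n * (n - 1))


def lscv_bandwidth(x, grid):
    """Grid value that minimizes the LSCV criterion."""
    return min(grid, key=lambda h: lscv_score(x, h))


def partitioned_cv_bandwidth(x, n_parts, grid):
    """Average the per-partition LSCV bandwidths, then rescale to the full sample size.

    Trailing observations are dropped if len(x) is not divisible by n_parts.
    """
    size = len(x) // n_parts
    parts = [x[k * size:(k + 1) * size] for k in range(n_parts)]
    h_parts = [lscv_bandwidth(part, grid) for part in parts]
    return (sum(h_parts) / n_parts) * n_parts ** (-1.0 / 5.0)


# Hypothetical example: 200 standard normal draws split into 4 partitions.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
grid = [0.05 + 0.05 * k for k in range(20)]
print(partitioned_cv_bandwidth(sample, n_parts=4, grid=grid))
```

Each partition costs on the order of (n/p)^2 kernel evaluations per grid point, so p partitions cost roughly n^2/p in total, compared with n^2 for ordinary cross-validation on the full sample. The permuted variant mentioned in the abstract is not shown here.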

Preprint, 2016, arXiv

Theoretical Properties and Practical Performance of Fully Robust One-Sided Cross-Validation

Fully robust OSCV is a modification of the OSCV method that produces consistent bandwidth in the cases of smooth and nonsmooth regression functions. The current implementation of the method uses the kernel $H_I$ that is almost indistinguishable from the Gaussian kernel on the interval $[-4,4]$, but has negative tails. The theoretical properties and practical performances of the $H_I$- and $ϕ$-based OSCV versions are compared. The kernel $H_I$ tends to produce too low bandwidths in the smooth case. The $H_I$-based OSCV curves are shown to have wiggles appearing in the neighborhood of zero. The kernel $H_I$ uncovers sensitivity of the OSCV method to a tiny modification of the kernel used for the cross-validation purposes. The recently found robust bimodal kernels tend to produce OSCV curves with multiple local minima. The problem of finding a robust unimodal nonnegative kernel remains open.