Source author record

Justin Khim

Justin Khim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.ST, Statistics Theory, Social and Information Networks

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2020, arXiv

Class-Weighted Classification: Trade-offs and Robust Approaches

We address imbalanced classification, the problem in which a label may have low marginal probability relative to other labels, by weighting losses according to the correct class. First, we examine the convergence rates of the expected excess weighted risk of plug-in classifiers where the weighting for the plug-in classifier and the risk may be different. This leads to irreducible errors that do not converge to the weighted Bayes risk, which motivates our consideration of robust risks. We define a robust risk that minimizes risk over a set of weightings and show excess risk bounds for this problem. We then show that particular choices of the weighting set lead to a special instance of conditional value at risk (CVaR) from stochastic programming, which we call label conditional value at risk (LCVaR). Additionally, we generalize this weighting to derive a new robust risk problem that we call label heterogeneous conditional value at risk (LHCVaR). Finally, we empirically demonstrate the efficacy of LCVaR and LHCVaR on improving class conditional risks.
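The link between a set of class weightings and CVaR can be illustrated with a small sketch. It relies on the standard dual form of CVaR for a discrete distribution: the worst-case weighted risk over weightings q with q_i <= p_i / alpha and sum(q) = 1. The function name, the argument names, and the choice to treat per-class risks as fixed inputs are assumptions made for illustration; this is not the paper's exact definition of LCVaR or LHCVaR.

```python
def worst_case_weighted_risk(class_risks, class_probs, alpha):
    """Worst-case weighted risk over weightings q with q_i <= p_i / alpha.

    Hypothetical helper: class_risks[i] is the risk on class i, class_probs[i]
    its marginal probability (assumed to sum to 1), and alpha in (0, 1] the
    CVaR level. This is the standard dual form of CVaR, not the paper's code.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # Put as much weight as allowed on the classes with the largest risk.
    order = sorted(range(len(class_risks)),
                   key=lambda i: class_risks[i], reverse=True)
    remaining = 1.0  # total weight still to be assigned
    total = 0.0
    for i in order:
        weight = min(class_probs[i] / alpha, remaining)
        total += weight * class_risks[i]
        remaining -= weight
        if remaining <= 0:
            break
    return total
```

With alpha = 1 the only feasible weighting is q = p, so the result is the ordinary average risk; as alpha shrinks, more weight shifts to the worst-performing classes.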

Preprint, 2020, arXiv

Multiclass Classification via Class-Weighted Nearest Neighbors

We study statistical properties of the k-nearest neighbors algorithm for multiclass classification, with a focus on settings where the number of classes may be large and/or classes may be highly imbalanced. In particular, we consider a variant of the k-nearest neighbors classifier with non-uniform class weightings, for which we derive upper and minimax lower bounds on accuracy, class-weighted risk, and uniform error. Additionally, we show that uniform error bounds lead to bounds on the difference between empirical confusion matrix quantities and their population counterparts across a set of weights. As a result, we may adjust the class weights to optimize classification metrics such as the F1 score or the Matthews correlation coefficient, which are commonly used in practice, particularly in settings with imbalanced classes. We additionally provide a simple example to instantiate our bounds and numerical experiments.
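One simple way to picture a k-nearest neighbors classifier with non-uniform class weightings is to scale each class's vote count among the k nearest points by a per-class weight before taking the maximum. The sketch below does exactly that; the function name, the Euclidean distance, and the default weight of 1.0 for unlisted classes are illustrative assumptions, and the estimator analyzed in the paper may differ in its details.

```python
import math
from collections import Counter


def weighted_knn_predict(train_points, train_labels, query, k, class_weights):
    """Predict a label for `query` from class-weighted votes of its k nearest points.

    Hypothetical helper: class_weights maps a label to a positive weight;
    labels missing from the mapping get weight 1.0.
    """
    # Rank training points by Euclidean distance to the query.
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    # Count how many of the k nearest neighbors fall in each class.
    votes = Counter(label for _, label in ranked[:k])
    # Scale each count by its class weight and return the largest.
    return max(votes, key=lambda c: class_weights.get(c, 1.0) * votes[c])
```

Raising the weight of a rare class lets it win even when it holds a minority of the k votes, which is the knob one would tune when targeting a metric such as the F1 score.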

Preprint, 2019, arXiv

Permutation Tests for Infection Graphs

We formulate and analyze a novel hypothesis testing problem for inferring the edge structure of an infection graph. In our model, a disease spreads over a network via contagion or random infection, where the random variables governing the rates of contracting the disease from neighbors or random infection are independent exponential random variables with unknown rate parameters. A subset of nodes is also censored uniformly at random. Given the statuses of nodes in the network, the goal is to determine the underlying graph. We present a procedure based on permutation testing, and we derive sufficient conditions for the validity of our test in terms of automorphism groups of the graphs corresponding to the null and alternative hypotheses. Further, the test is valid more generally for infection processes satisfying a basic symmetry condition. Our test is easy to compute and does not involve estimating unknown parameters governing the process. We also derive risk bounds for our permutation test in a variety of settings, and motivate our test statistic in terms of approximate equivalence to likelihood ratio testing and maximin tests. We conclude with an application to real data from an HIV infection network.
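The general shape of a permutation test on a graph can be sketched as follows: compute a statistic from the observed infected set, recompute it after randomly relabeling the nodes, and report the fraction of relabelings whose statistic is at least as large. The statistic used here (the number of edges with both endpoints infected), the function names, and the Monte Carlo p-value are assumptions chosen for illustration; the abstract does not specify the paper's statistic, and this sketch ignores censoring.

```python
import random


def infected_edge_count(edges, infected):
    """Number of edges whose two endpoints are both infected (illustrative statistic)."""
    return sum(1 for u, v in edges if u in infected and v in infected)


def permutation_p_value(edges, nodes, infected, n_permutations=1000, seed=0):
    """Monte Carlo permutation p-value for the infected-edge statistic.

    Hypothetical helper: `edges` is a list of node pairs from the candidate
    graph, `nodes` lists all nodes, and `infected` is the set of infected nodes.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    observed = infected_edge_count(edges, infected)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Relabel the nodes uniformly at random and move the infection with them.
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        permuted = {relabel[v] for v in infected}
        if infected_edge_count(edges, permuted) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed labeling itself, so the p-value is never 0.
    return (1 + at_least_as_large) / (1 + n_permutations)
```

Note that no rate parameters of the infection process are estimated: only the graph and the infection statuses enter the computation, which mirrors the property the abstract highlights.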