Researcher profile

Justin Khim

Justin Khim contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified | Verification L1 | Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2020, arXiv

Class-Weighted Classification: Trade-offs and Robust Approaches

We address imbalanced classification, the problem in which a label may have low marginal probability relative to other labels, by weighting losses according to the correct class. First, we examine the convergence rates of the expected excess weighted risk of plug-in classifiers where the weighting for the plug-in classifier and the risk may be different. This leads to irreducible errors that do not converge to the weighted Bayes risk, which motivates our consideration of robust risks. We define a robust risk that minimizes risk over a set of weightings and show excess risk bounds for this problem. Next, we show that particular choices of the weighting set lead to a special instance of conditional value at risk (CVaR) from stochastic programming, which we call label conditional value at risk (LCVaR). Additionally, we generalize this weighting to derive a new robust risk problem that we call label heterogeneous conditional value at risk (LHCVaR). Finally, we empirically demonstrate the efficacy of LCVaR and LHCVaR on improving class conditional risks.
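
As a rough illustration of the CVaR idea mentioned in this abstract, the sketch below applies the standard CVaR of a discrete distribution to per-class risks: the worst-case reweighting may inflate each class probability by at most a factor of 1/alpha. The function name `label_cvar`, the dictionary inputs, and the example numbers are hypothetical, and whether this matches the paper's exact LCVaR definition is an assumption.

```python
def label_cvar(class_probs, class_risks, alpha):
    """CVaR at level alpha of the class-conditional risks (illustrative sketch).

    class_probs: label -> marginal probability (assumed to sum to 1).
    class_risks: label -> class-conditional risk of some fixed classifier.
    Computes max sum_y q_y * risk_y subject to 0 <= q_y <= p_y / alpha and
    sum_y q_y = 1, which a greedy pass over the worst classes solves exactly.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = 1.0  # weight mass still to be assigned
    total = 0.0
    # Visit classes from highest to lowest risk, giving each its maximum mass.
    for label in sorted(class_risks, key=class_risks.get, reverse=True):
        mass = min(class_probs[label] / alpha, remaining)
        total += mass * class_risks[label]
        remaining -= mass
        if remaining <= 0:
            break
    return total


# Hypothetical example: a rare class with a much higher error rate.
probs = {"majority": 0.9, "minority": 0.1}
risks = {"majority": 0.05, "minority": 0.4}
average_risk = label_cvar(probs, risks, alpha=1.0)    # ordinary average risk
worst_case_risk = label_cvar(probs, risks, alpha=0.1)  # all mass on the minority class
```

With alpha equal to 1 the quantity reduces to the usual average risk, and as alpha shrinks it moves toward the risk of the worst class, which is the trade-off a robust weighting is meant to expose.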

Preprint, 2020, arXiv

Multiclass Classification via Class-Weighted Nearest Neighbors

We study statistical properties of the k-nearest neighbors algorithm for multiclass classification, with a focus on settings where the number of classes may be large and/or classes may be highly imbalanced. In particular, we consider a variant of the k-nearest neighbors classifier with non-uniform class weightings, for which we derive upper and minimax lower bounds on accuracy, class-weighted risk, and uniform error. Additionally, we show that uniform error bounds lead to bounds on the difference between empirical confusion matrix quantities and their population counterparts across a set of weights. As a result, we may adjust the class weights to optimize classification metrics such as the F1 score or the Matthews correlation coefficient that are commonly used in practice, particularly in settings with imbalanced classes. We additionally provide a simple example to instantiate our bounds and numerical experiments.
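
A minimal sketch of what a class-weighted k-nearest neighbors vote could look like, using only the Python standard library. The function name `weighted_knn_predict`, the Euclidean metric, the tie-breaking rule, and the toy data are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_points, train_labels, query, k, class_weights):
    """Predict a label by a class-weighted vote among the k nearest neighbors.

    class_weights maps label -> nonnegative weight (hypothetical values chosen
    by the caller); labels missing from the map get weight 1.0, so an empty
    map recovers the ordinary majority vote.
    """
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Each neighbor contributes its class weight instead of a unit vote.
    scores = defaultdict(float)
    for i in order[:k]:
        label = train_labels[i]
        scores[label] += class_weights.get(label, 1.0)
    # Highest weighted score wins; ties go to the smallest label.
    return max(sorted(scores), key=lambda label: scores[label])


# Toy data: three points of a common class "a" and one of a rare class "b".
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
labels = ["a", "a", "a", "b"]
query = (0.4, 0.4)
plain_vote = weighted_knn_predict(points, labels, query, k=3, class_weights={})
reweighted_vote = weighted_knn_predict(points, labels, query, k=3,
                                       class_weights={"a": 1.0, "b": 3.0})
```

Here the unweighted vote favors the common class, while up-weighting the rare class flips the prediction. Tuning such weights against a metric like the F1 score is the kind of adjustment the abstract refers to.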

Preprint, 2019, arXiv

Permutation Tests for Infection Graphs

We formulate and analyze a novel hypothesis testing problem for inferring the edge structure of an infection graph. In our model, a disease spreads over a network via contagion or random infection, where the random variables governing the rates of contracting the disease from neighbors or random infection are independent exponential random variables with unknown rate parameters. A subset of nodes is also censored uniformly at random. Given the statuses of nodes in the network, the goal is to determine the underlying graph. We present a procedure based on permutation testing, and we derive sufficient conditions for the validity of our test in terms of automorphism groups of the graphs corresponding to the null and alternative hypotheses. Further, the test is valid more generally for infection processes satisfying a basic symmetry condition. Our test is easy to compute and does not involve estimating unknown parameters governing the process. We also derive risk bounds for our permutation test in a variety of settings, and motivate our test statistic in terms of approximate equivalence to likelihood ratio testing and maximin tests. We conclude with an application to real data from an HIV infection network.
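
The general shape of a permutation test on a graph can be sketched as follows. The test statistic used here (the number of edges joining two infected nodes), the uniformly random relabeling of nodes, and all function names are hypothetical choices for illustration; the paper's own statistic and its validity conditions in terms of automorphism groups are not reproduced here.

```python
import random


def edges_within_infected(edges, infected):
    """Count edges whose two endpoints are both infected (illustrative statistic)."""
    return sum(1 for u, v in edges if u in infected and v in infected)


def permutation_p_value(nodes, edges, infected, num_permutations=999, seed=0):
    """Monte Carlo permutation p-value for the statistic above.

    Infection statuses are reassigned by uniformly random relabelings of the
    nodes, and the p-value is the share of relabelings (plus the observed
    labeling itself) whose statistic is at least the observed one.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    nodes = list(nodes)
    observed = edges_within_infected(edges, infected)
    at_least_as_large = 0
    for _ in range(num_permutations):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        permuted = {relabel[u] for u in infected}
        if edges_within_infected(edges, permuted) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed labeling, so the p-value is never zero.
    return (1 + at_least_as_large) / (1 + num_permutations)


# Hypothetical example: a path graph whose infected nodes are all adjacent.
path_nodes = range(10)
path_edges = [(i, i + 1) for i in range(9)]
p_value = permutation_p_value(path_nodes, path_edges, infected={0, 1, 2, 3})
```

A small p-value suggests that the infected nodes are more clustered along the candidate graph than a random relabeling would produce. No rate parameters of the infection process are estimated, which mirrors the property highlighted in the abstract.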