Source author record

Caleb Levy

Caleb Levy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · astro-ph.CO · cs.CY · Data Structures and Algorithms · hep-ph · Machine Learning · Social and Information Networks

Catalog footprint

What is connected

3 works

7 topics

4 close collaborators

Preprint · 2022 · arXiv

Classic Graph Structural Features Outperform Factorization-Based Graph Embedding Methods on Community Labeling

Graph representation learning (also called graph embeddings) is a popular technique for incorporating network structure into machine learning models. Unsupervised graph embedding methods aim to capture graph structure by learning a low-dimensional vector representation (the embedding) for each node. Despite the widespread use of these embeddings for a variety of downstream transductive machine learning tasks, there is little principled analysis of the effectiveness of this approach for common tasks. In this work, we provide an empirical and theoretical analysis of the performance of a class of embeddings on the common task of pairwise community labeling. This is a binary variant of the classic community detection problem, which seeks to build a classifier to determine whether a pair of vertices participate in a community. In line with our goal of foundational understanding, we focus on a popular class of unsupervised embedding techniques that learn low-rank factorizations of a vertex proximity matrix (this class includes methods like GraRep, DeepWalk, node2vec, and NetMF). We perform detailed empirical analysis for community labeling over a variety of real and synthetic graphs with ground truth. In all cases we studied, the models trained from embedding features perform poorly on community labeling. In contrast, a simple logistic model with classic graph structural features handily outperforms the embedding models. For a more principled understanding, we provide a theoretical analysis of the (in)effectiveness of these embeddings in capturing the community structure. We formally prove that popular low-dimensional factorization methods either cannot produce community structure, or can only produce "unstable" communities. These communities are inherently unstable under small perturbations.
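To make the pairwise community labeling task concrete, the following is a minimal sketch of the "simple logistic model with classic graph structural features" side of the comparison. It is not the authors' code: the toy graph (two 4-cliques joined by one bridge edge), the two features (common-neighbor count and Jaccard coefficient), and the helper names `pair_features`, `same_community`, and `fit_logistic` are all assumptions chosen for illustration, and no embedding baseline is included.

```python
import math
from itertools import combinations

# Hypothetical toy graph: two 4-cliques joined by a single bridge edge (3, 4).
communities = [{0, 1, 2, 3}, {4, 5, 6, 7}]
edges = [pair for block in communities for pair in combinations(sorted(block), 2)]
edges.append((3, 4))

neighbors = {v: set() for block in communities for v in block}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def pair_features(u, v):
    """Classic structural features for a vertex pair, plus a constant bias term."""
    common = neighbors[u] & neighbors[v]
    union = neighbors[u] | neighbors[v]
    jaccard = len(common) / len(union) if union else 0.0
    return [float(len(common)), jaccard, 1.0]


def same_community(u, v):
    """Ground-truth label: 1 if both vertices share a community, else 0."""
    return int(any(u in block and v in block for block in communities))


pairs = list(combinations(sorted(neighbors), 2))
X = [pair_features(u, v) for u, v in pairs]
y = [same_community(u, v) for u, v in pairs]


def predict_proba(weights, x):
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=1.0, steps=20000):
    """Full-batch gradient descent on the mean logistic loss."""
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, label in zip(X, y):
            err = predict_proba(weights, x) - label
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


weights = fit_logistic(X, y)
predictions = [int(predict_proba(weights, x) >= 0.5) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy on {len(pairs)} pairs: {accuracy:.2f}")
```

On this toy graph every same-community pair shares two neighbors while every cross-community pair shares at most one, so the classes are linearly separable in the chosen features. Real graphs with ground truth are far less clean, and the sketch says nothing about how embedding features would fare on the same task.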

Preprint · 2022 · arXiv

Constraining Dark Matter properties with the first generation of stars

Dark Matter (DM) can be trapped by the gravitational field of any star, since collisions with nuclei in dense environments can slow down the DM particle below the escape velocity ($v_{esc}$) at the surface of the star. If captured, the DM particles can self-annihilate and therefore provide a new source of energy for the star. We investigate this phenomenon for capture of DM particles by the first generation of stars [Population III (Pop III) stars], using the multiscatter capture formalism. Pop III stars are particularly good DM captors, since they form in DM-rich environments, at the center of $\sim 10^6 M_\odot$ DM minihalos, at redshifts $z\sim 15$. Assuming a DM-proton scattering cross section ($\sigma$) at the current deepest exclusion limits provided by the XENON1T experiment, we find that captured DM annihilations at the core of Pop III stars can lead, via the Eddington limit, to upper bounds on stellar masses that can be as low as a few $M_\odot$ if the ambient DM density ($\rho_X$) at the location of the Pop III star is sufficiently high. Conversely, when Pop III stars are identified, one can use their observed mass ($M_\star$) to place bounds on $\rho_X \sigma$. Using adiabatic contraction to estimate the ambient DM density in the environment surrounding Pop III stars, we place projected upper limits on $\sigma$, for $M_\star$ in the $100-1000~M_\odot$ range, and find bounds that are competitive with, or deeper than, those provided by the most sensitive current direct detection experiments for both spin-independent and spin-dependent (SD) interactions, for a wide range of DM masses. Most intriguingly, we find that Pop III stars with mass $M_\star \gtrsim 300 M_\odot$ could be used to probe the SD proton-DM cross section below the "neutrino floor," i.e., the region of parameter space where DM direct detection experiments will soon become overwhelmed by neutrino backgrounds.
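The Eddington-limit argument in this abstract can be written schematically as follows. This is a sketch under stated assumptions, not the paper's multiscatter calculation: $\sigma_T$ (the Thomson cross section, distinct from the DM-proton cross section $\sigma$), the nuclear luminosity $L_{\rm nuc}$, the capture rate $C$, the DM particle mass $m_X$, and the deposited-energy fraction $f$ are symbols introduced here for illustration. The numerical prefactor assumes fully ionized hydrogen.

```latex
\begin{align}
  % Eddington luminosity for electron-scattering opacity
  L_{\rm Edd}(M_\star) &= \frac{4\pi G\, m_p\, c}{\sigma_T}\, M_\star
    \approx 1.26\times 10^{38} \left(\frac{M_\star}{M_\odot}\right) {\rm erg\,s^{-1}}, \\
  % capture-annihilation equilibrium: captured mass-energy is released at rate C m_X c^2
  L_{\rm DM}(M_\star) &\approx f\, m_X c^2\, C(M_\star;\, \rho_X, \sigma, m_X), \\
  % the star remains stable only while the total luminosity stays sub-Eddington
  L_{\rm nuc}(M_\star) + L_{\rm DM}(M_\star) &\le L_{\rm Edd}(M_\star).
\end{align}
```

The capture rate is linear in the ambient density $\rho_X$ and, in the optically thin regime, also linear in $\sigma$. Under that assumption the inequality for a star of observed mass $M_\star$ constrains the product $\rho_X \sigma$, and an independent estimate of $\rho_X$ turns it into a limit on $\sigma$ alone.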

Preprint · 2021 · arXiv

Fair Classification with Group-Dependent Label Noise

This work examines how to train fair classifiers in settings where training labels are corrupted with random noise, and where the error rates of corruption depend both on the label class and on the membership function for a protected subgroup. Heterogeneous label noise models systematic biases towards particular groups when generating annotations. We begin by presenting analytical results which show that naively imposing parity constraints on demographic disparity measures, without accounting for heterogeneous and group-dependent error rates, can decrease both the accuracy and the fairness of the resulting classifier. Our experiments demonstrate these issues arise in practice as well. We address these problems by performing empirical risk minimization with carefully defined surrogate loss functions and surrogate constraints that help avoid the pitfalls introduced by heterogeneous label noise. We provide both theoretical and empirical justifications for the efficacy of our methods. We view our results as an important example of how imposing fairness on biased data sets without proper care can do at least as much harm as it does good.
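One way to see what a surrogate loss can do under group-dependent label noise is the standard unbiased noise correction from the noisy-label literature, applied with a separate pair of flip rates per group. The sketch below illustrates that general technique and is not necessarily the surrogate the authors define. The group names, the flip rates in `GROUP_FLIP_RATES`, and the function names are hypothetical, and the flip rates are assumed to be known.

```python
import math

# Hypothetical flip rates per protected group:
# (P(noisy = -1 | clean = +1), P(noisy = +1 | clean = -1)).
GROUP_FLIP_RATES = {"a": (0.10, 0.30), "b": (0.25, 0.05)}


def logistic_loss(score, label):
    """Logistic loss for a real-valued score and a label in {-1, +1}."""
    return math.log1p(math.exp(-label * score))


def corrected_loss(score, noisy_label, flip_pos, flip_neg, base_loss=logistic_loss):
    """Noise-corrected loss whose expectation over label noise equals the clean loss."""
    denom = 1.0 - flip_pos - flip_neg
    if denom <= 0:
        raise ValueError("flip rates must satisfy flip_pos + flip_neg < 1")
    # Flip rate of the observed class and of the opposite class.
    flip_given = flip_pos if noisy_label == 1 else flip_neg
    flip_other = flip_neg if noisy_label == 1 else flip_pos
    return ((1.0 - flip_other) * base_loss(score, noisy_label)
            - flip_given * base_loss(score, -noisy_label)) / denom


def expected_corrected_loss(score, clean_label, group):
    """Average the corrected loss over the noisy label, given the clean label."""
    flip_pos, flip_neg = GROUP_FLIP_RATES[group]
    flip = flip_pos if clean_label == 1 else flip_neg
    kept = corrected_loss(score, clean_label, flip_pos, flip_neg)
    flipped = corrected_loss(score, -clean_label, flip_pos, flip_neg)
    return (1.0 - flip) * kept + flip * flipped


for group in GROUP_FLIP_RATES:
    gap = abs(expected_corrected_loss(0.7, 1, group) - logistic_loss(0.7, 1))
    print(f"group {group}: |E[corrected] - clean| = {gap:.2e}")
```

Because each group is corrected with its own flip rates, the expected corrected loss matches the clean loss in both groups. A single shared correction would not have this property, which is the kind of mismatch the abstract warns about when parity constraints are imposed naively.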