Researcher profile

Christopher Musco

Christopher Musco contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values

With origins in game theory, probabilistic values like Shapley values, Banzhaf values, and semi-values have emerged as a central tool in explainable AI. They are used for feature attribution, data attribution, data valuation, and more. Since all of these values require exponential time to compute exactly, research has focused on efficient approximation methods using two techniques: Monte Carlo sampling and linear regression formulations. In this work, we present a new way of combining both of these techniques. Our approach is more flexible than prior algorithms, allowing for linear regression to be replaced with any function family whose probabilistic values can be computed efficiently. This allows us to harness the accuracy of tree-based models like XGBoost, while still producing unbiased estimates. From experiments across eight datasets, we find that our methods give state-of-the-art performance for estimating probabilistic values. For Shapley values, the error of our methods can be $6.5\times$ lower than Permutation SHAP (the most popular Monte Carlo method), $3.8\times$ lower than Kernel SHAP (the most popular linear regression method), and $2.6\times$ lower than Leverage SHAP (the prior state-of-the-art Shapley value estimator). For more general probabilistic values, we can obtain error $215\times$ lower than the best estimator from prior work.

preprint2022arXiv

Error bounds for Lanczos-based matrix function approximation

We analyze the Lanczos method for matrix function approximation (Lanczos-FA), an iterative algorithm for computing $f(\mathbf{A}) \mathbf{b}$ when $\mathbf{A}$ is a Hermitian matrix and $\mathbf{b}$ is a given vector. Assuming that $f : \mathbb{C} \rightarrow \mathbb{C}$ is piecewise analytic, we give a framework, based on the Cauchy integral formula, which can be used to derive a priori and a posteriori error bounds for Lanczos-FA in terms of the error of Lanczos used to solve linear systems. Unlike many error bounds for Lanczos-FA, these bounds account for fine-grained properties of the spectrum of $\mathbf{A}$, such as clustered or isolated eigenvalues. Our results are derived assuming exact arithmetic, but we show that they are easily extended to finite precision computations using existing theory about the Lanczos algorithm in finite precision. We also provide generalized bounds for the Lanczos method used to approximate quadratic forms $\mathbf{b}^\textsf{H} f(\mathbf{A}) \mathbf{b}$, and demonstrate the effectiveness of our bounds with numerical experiments.

preprint2022arXiv

Fast Regression for Structured Inputs

We study the $\ell_p$ regression problem, which requires finding $\mathbf{x}\in\mathbb R^{d}$ that minimizes $\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$ for a matrix $\mathbf{A}\in\mathbb R^{n \times d}$ and response vector $\mathbf{b}\in\mathbb R^{n}$. There has been recent interest in developing subsampling methods for this problem that can outperform standard techniques when $n$ is very large. However, all known subsampling approaches have run time that depends exponentially on $p$, typically, $d^{\mathcal{O}(p)}$, which can be prohibitively expensive. We improve on this work by showing that for a large class of common \emph{structured matrices}, such as combinations of low-rank matrices, sparse matrices, and Vandermonde matrices, there are subsampling based methods for $\ell_p$ regression that depend polynomially on $p$. For example, we give an algorithm for $\ell_p$ regression on Vandermonde matrices that runs in time $\mathcal{O}(n\log^3 n+(dp^2)^{0.5+ω}\cdot\text{polylog}\,n)$, where $ω$ is the exponent of matrix multiplication. The polynomial dependence on $p$ crucially allows our algorithms to extend naturally to efficient algorithms for $\ell_\infty$ regression, via approximation of $\ell_\infty$ by $\ell_{\mathcal{O}(\log n)}$. Of practical interest, we also develop a new subsampling algorithm for $\ell_p$ regression for arbitrary matrices, which is simpler than previous approaches for $p \ge 4$.

preprint2022arXiv

Sublinear Time Spectral Density Estimation

We present a new sublinear time algorithm for approximating the spectral density (eigenvalue distribution) of an $n\times n$ normalized graph adjacency or Laplacian matrix. The algorithm recovers the spectrum up to $ε$ accuracy in the Wasserstein-1 distance in $O(n\cdot \text{poly}(1/ε))$ time given sample access to the graph. This result compliments recent work by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (2018), which obtains a solution with runtime independent of $n$, but exponential in $1/ε$. We conjecture that the trade-off between dimension dependence and accuracy is inherent. Our method is simple and works well experimentally. It is based on a Chebyshev polynomial moment matching method that employees randomized estimators for the matrix trace. We prove that, for any Hermitian $A$, this moment matching method returns an $ε$ approximation to the spectral density using just $O({1}/ε)$ matrix-vector products with $A$. By leveraging stability properties of the Chebyshev polynomial three-term recurrence, we then prove that the method is amenable to the use of coarse approximate matrix-vector products. Our sublinear time algorithm follows from combining this result with a novel sampling algorithm for approximating matrix-vector products with a normalized graph adjacency matrix. Of independent interest, we show a similar result for the widely used \emph{kernel polynomial method} (KPM), proving that this practical algorithm nearly matches the theoretical guarantees of our moment matching method. Our analysis uses tools from Jackson's seminal work on approximation with positive polynomial kernels.

preprint2020arXiv

Projection-Cost-Preserving Sketches: Proof Strategies and Constructions

In this note we illustrate how common matrix approximation methods, such as random projection and random sampling, yield projection-cost-preserving sketches, as introduced in [FSS13, CEM+15]. A projection-cost-preserving sketch is a matrix approximation which, for a given parameter $k$, approximately preserves the distance of the target matrix to all $k$-dimensional subspaces. Such sketches have applications to scalable algorithms for linear algebra, data science, and machine learning. Our goal is to simplify the presentation of proof techniques introduced in [CEM+15] and [CMM17] so that they can serve as a guide for future work. We also refer the reader to [CYD19], which gives a similar simplified exposition of the proof covered in Section 2.

preprint2020arXiv

The Statistical Cost of Robust Kernel Hyperparameter Tuning

This paper studies the statistical complexity of kernel hyperparameter tuning in the setting of active regression under adversarial noise. We consider the problem of finding the best interpolant from a class of kernels with unknown hyperparameters, assuming only that the noise is square-integrable. We provide finite-sample guarantees for the problem, characterizing how increasing the complexity of the kernel class increases the complexity of learning kernel hyperparameters. For common kernel classes (e.g. squared-exponential kernels with unknown lengthscale), our results show that hyperparameter optimization increases sample complexity by just a logarithmic factor, in comparison to the setting where optimal parameters are known in advance. Our result is based on a subsampling guarantee for linear regression under multiple design matrices, combined with an ε-net argument for discretizing kernel parameterizations.