Researcher profile

Carey Priebe

Carey Priebe contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Black-box model classification under the discriminative factorization

Access to modern generative systems is often restricted to querying an API (the ``black-box" setting) and many properties of the system are unknown to the user at inference time. While recent work has shown that low-dimensional representations of models based on the relationship between their embedded responses to a set of queries are useful for inferring model-level properties, the quality of these representations is highly sensitive to the query set. We introduce the \emph{discriminative factorization} to distinguish between high- and low-quality query sets in the context of black-box model-level classification. Under this framework, the probability of chance-level classification decays exponentially in the query budget. On three auditing tasks, estimated factorization parameters predict the empirical performance decay rate. We conclude by showing that query sets selected using the estimated discriminative field reproduce the empirical ordering of oracle query sets.

preprint2026arXiv

Control Charts for Multi-agent Systems

Generative agents have proven to be powerful assistants in a wide variety of contexts. Given this success, users are now deploying agents with minimal restrictions in open ended, multi-agent environments. Current methods for monitoring the dynamics of open-ended multi-agent systems are limited to qualitative inspection. In this paper, we extend the process-theoretic notion of adaptive control charts to multi-agent systems to enable automated monitoring. Using simulation, we demonstrate that adaptive control charts are necessary for monitoring multi-agent systems that can learn from their environment. We further demonstrate, both empirically and theoretically, that adaptive control charts are susceptible to adversarial agents that defect sufficiently slowly. These results illustrate a fundamental tradeoff in multi-agent system control: either agents in a system cannot learn or the system is susceptible to adversaries.

preprint2026arXiv

Query-efficient model evaluation using cached responses

Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice, responses from previously-evaluated models are often cached -- creating a potential opportunity to use this additional information to decrease the number of queries required to accurately evaluate a new model. In this paper, we introduce an approach for predicting benchmark performance that leverages cached model responses based on the Data Kernel Perspective Space (DKPS), a method for quantifying the relationship between models in the black-box setting. Theoretically, we show that DKPS-based methods are query-efficient under certain conditions. Empirically, we demonstrate that DKPS-based methods achieve the same mean absolute error as baselines with a substantially decreased query budget. We conclude by proposing an offline method for selecting a set of queries that maximizes the goodness-of-fit on reference models, improving prediction accuracy over random query selection.

preprint2023arXiv

Spectral embedding of weighted graphs

When analyzing weighted networks using spectral embedding, a judicious transformation of the edge weights may produce better results. To formalize this idea, we consider the asymptotic behavior of spectral embedding for different edge-weight representations, under a generic low rank model. We measure the quality of different embeddings -- which can be on entirely different scales -- by how easy it is to distinguish communities, in an information-theoretic sense. For common types of weighted graphs, such as count networks or p-value networks, we find that transformations such as tempering or thresholding can be highly beneficial, both in theory and in practice.

preprint2022arXiv

Entrywise Estimation of Singular Vectors of Low-Rank Matrices with Heteroskedasticity and Dependence

We propose an estimator for the singular vectors of high-dimensional low-rank matrices corrupted by additive subgaussian noise, where the noise matrix is allowed to have dependence within rows and heteroskedasticity between them. We prove finite-sample $\ell_{2,\infty}$ bounds and a Berry-Esseen theorem for the individual entries of the estimator, and we apply these results to high-dimensional mixture models. Our Berry-Esseen theorem clearly shows the geometric relationship between the signal matrix, the covariance structure of the noise, and the distribution of the errors in the singular vector estimation task. These results are illustrated in numerical simulations. Unlike previous results of this type, which rely on assumptions of gaussianity or independence between the entries of the additive noise, handling the dependence between entries in the proofs of these results requires careful leave-one-out analysis and conditioning arguments. Our results depend only on the signal-to-noise ratio, the sample size, and the spectral properties of the signal matrix.

preprint2020arXiv

The exact form of the 'Ockham factor' in model selection

We explore the arguments for maximizing the `evidence' as an algorithm for model selection. We show, using a new definition of model complexity which we term `flexibility', that maximizing the evidence should appeal to both Bayesian and Frequentist statisticians. This is due to flexibility's unique position in the exact decomposition of log-evidence into log-fit minus flexibility. In the Gaussian linear model, flexibility is asymptotically equal to the Bayesian Information Criterion (BIC) penalty, but we caution against using BIC in place of flexibility for model selection.

preprint2020arXiv

Vertex nomination: The canonical sampling and the extended spectral nomination schemes

Suppose that one particular block in a stochastic block model is of interest, but block labels are only observed for a few of the vertices in the network. Utilizing a graph realized from the model and the observed block labels, the vertex nomination task is to order the vertices with unobserved block labels into a ranked nomination list with the goal of having an abundance of interesting vertices near the top of the list. There are vertex nomination schemes in the literature, including the optimally precise canonical nomination scheme~$\mathcal{L}^C$ and the consistent spectral partitioning nomination scheme~$\mathcal{L}^P$. While the canonical nomination scheme $\mathcal{L}^C$ is provably optimally precise, it is computationally intractable, being impractical to implement even on modestly sized graphs. With this in mind, an approximation of the canonical scheme---denoted the {\it canonical sampling nomination scheme} $\mathcal{L}^{CS}$---is introduced; $\mathcal{L}^{CS}$ relies on a scalable, Markov chain Monte Carlo-based approximation of $\mathcal{L}^{C}$, and converges to $\mathcal{L}^{C}$ as the amount of sampling goes to infinity. The spectral partitioning nomination scheme is also extended to the {\it extended spectral partitioning nomination scheme}, $\mathcal{L}^{EP}$, which introduces a novel semisupervised clustering framework to improve upon the precision of $\mathcal{L}^P$. Real-data and simulation experiments are employed to illustrate the precision of these vertex nomination schemes, as well as their empirical computational complexity. Keywords: vertex nomination, Markov chain Monte Carlo, spectral partitioning, Mclust MSC[2010]: 60J22, 65C40, 62H30, 62H25