Researcher profile

James A. Evans

James A. Evans contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
6 works
0 followers
13 topics
4 close collaborators

Published work

6 published items

Preprint · 2026 · arXiv

Missing vs. Unused Knowledge Hypothesis for Language Model Bottlenecks in Patent Understanding

While large language models (LLMs) excel at factual recall, the real challenge lies in knowledge application. A gap persists between their ability to answer complex questions and their effectiveness in performing tasks that require that knowledge. We investigate this gap using a patent classification problem that requires deep conceptual understanding to distinguish semantically similar but objectively different patents written in dense, strategic technical language. We find that LLMs often struggle with this distinction. To diagnose the source of these failures, we introduce a framework that decomposes model errors into two categories: missing knowledge and unused knowledge. Our method prompts models to generate clarifying questions and compares three settings: raw performance, self-answered questions that activate internal knowledge, and externally provided answers that supply missing knowledge (if any). We show that most errors stem from failures to deploy existing knowledge rather than from true knowledge gaps. We also examine how models differ in constructing task-specific question-answer databases. Smaller models tend to generate simpler questions that they, and other models, can retrieve and use effectively, whereas larger models produce more complex questions that are less effective, suggesting complementary strengths across model scales. Together, our findings highlight that shifting evaluation from static fact recall to dynamic knowledge application offers a more informative view of model capabilities.

Preprint · 2023 · arXiv

Disrupted Routines Anticipate Musical Exploration

Prior research suggests that taste preferences relate to personality traits, values, shifts in mood, and immigration destination, but understanding of everyday listening patterns and of the function music plays in life has remained elusive, despite speculation that musical nostalgia may compensate for local disruption. Using more than a hundred million streams of 4 million songs by tens of thousands of international listeners from a global music service catering to local tastes, here we show that breaches in personal routine are systematically associated with personal musical exploration. As people visited new cities and countries, their preferences diversified, converging towards their destinations. As people experienced COVID-19 lockdowns, and then again when they experienced reopenings, their preferences diversified further.

Preprint · 2021 · arXiv

Policy-Aware Mobility Model Explains the Growth of COVID-19 in Cities

With the continued spread of coronavirus, the task of forecasting distinctive COVID-19 growth curves in different cities, which remain inadequately explained by standard epidemiological models, is critical for medical supply and treatment. Predictions must take into account non-pharmaceutical interventions to slow the spread of coronavirus, including stay-at-home orders, social distancing, quarantine and compulsory mask-wearing, which lead to reductions in intra-city mobility and viral transmission. Moreover, recent work associating coronavirus with human mobility and detailed movement data suggests the need to consider urban mobility in disease forecasts. Here we show that by incorporating intra-city mobility and policy adoption into a novel metapopulation SEIR model, we can accurately predict complex COVID-19 growth patterns in U.S. cities ($R^2 = 0.990$). Estimated mobility change due to policy interventions is consistent with empirical observation from Apple Mobility Trends Reports (Pearson's R = 0.872), suggesting the utility of model-based predictions where data are limited. Our model also reproduces urban "superspreading", where a few neighborhoods account for most secondary infections across urban space, arising from uneven neighborhood populations and heightened intra-city churn in popular neighborhoods. Therefore, our model can facilitate location-aware mobility reduction policy that more effectively mitigates disease transmission at similar social cost. Finally, we demonstrate that our model can serve as a fine-grained analytic and simulation framework that informs the design of rational non-pharmaceutical intervention policies.

Preprint · 2020 · arXiv

Human Evaluation of Interpretability: The Case of AI-Generated Music Knowledge

Interpretability of machine learning models has gained increasing attention among researchers in the artificial intelligence (AI) and human-computer interaction (HCI) communities. Most existing work focuses on decision making, whereas we consider knowledge discovery. In particular, we focus on evaluating AI-discovered knowledge/rules in the arts and humanities. Using a specific scenario, we present an experimental procedure to collect and assess human-generated verbal interpretations of AI-generated music theory/rules rendered as sophisticated symbolic/numeric objects. Our goal is to reveal both the possibilities and the challenges in such a process of decoding expressive messages from AI sources. We treat this as a first step towards 1) better design of AI representations that are human interpretable and 2) a general methodology to evaluate interpretability of AI-discovered knowledge representations.

Preprint · 2020 · arXiv

Too many cooks: Bayesian inference for coordinating multi-agent collaboration

Collaboration requires agents to coordinate their behavior on the fly, sometimes cooperating to solve a single task together and other times dividing it up into sub-tasks to work on in parallel. Underlying the human ability to collaborate is theory-of-mind, the ability to infer the hidden mental states that drive others to act. Here, we develop Bayesian Delegation, a decentralized multi-agent learning mechanism with these abilities. Bayesian Delegation enables agents to rapidly infer the hidden intentions of others by inverse planning. We test Bayesian Delegation in a suite of multi-agent Markov decision processes inspired by cooking problems. On these tasks, agents with Bayesian Delegation coordinate both their high-level plans (e.g. what sub-task they should work on) and their low-level actions (e.g. avoiding getting in each other's way). In a self-play evaluation, Bayesian Delegation outperforms alternative algorithms. Bayesian Delegation is also a capable ad-hoc collaborator and successfully coordinates with other agent types even in the absence of prior experience. Finally, in a behavioral experiment, we show that Bayesian Delegation makes inferences similar to human observers about the intent of others. Together, these results demonstrate the power of Bayesian Delegation for decentralized multi-agent collaboration.

Preprint · 2019 · arXiv

Quantifying dynamics of failure across science, startups, and security

Human achievements are often preceded by repeated attempts that initially fail, yet little is known about the mechanisms governing the dynamics of failure. Here, building on the rich literature on innovation, human dynamics and learning, we develop a simple one-parameter model that mimics how successful future attempts build on those past. Analytically solving this model reveals a phase transition that separates dynamics of failure into regions of stagnation or progression, predicting that near the critical threshold, agents who share similar characteristics and learning strategies may experience fundamentally different outcomes following failures. Below the critical point, we see those who explore disjoint opportunities without a pattern of improvement, and above it, those who exploit incremental refinements to systematically advance toward success. The model makes several empirically testable predictions, demonstrating that those who eventually succeed and those who do not may be initially similar, yet are characterized by fundamentally distinct failure dynamics in terms of the efficiency and quality of each subsequent attempt. We collected large-scale data from three disparate domains, tracing repeated attempts by (i) NIH investigators to fund their research, (ii) innovators to successfully exit their startup ventures, and (iii) terrorist organizations to post casualties in violent attacks, finding broadly consistent empirical support across all three domains. Together, our findings unveil identifiable yet previously unknown early signals that allow us to identify failure dynamics that will lead to ultimate victory or defeat. Given the ubiquitous nature of failures and the paucity of quantitative approaches to understand them, these results represent a crucial step toward deeper understanding of the complex dynamics beneath failures, the essential prerequisites for success.