Researcher profile

Robert C. Wilson

Robert C. Wilson contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

Post-training makes large language models less human-like

Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction -- a popular technique for eliciting human-like behavior by conditioning models on participant-specific information -- does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful assistants also make them less accurate models of human behavior.

preprint2026arXiv

The Position Curse: LLMs Struggle to Locate the Last Few Items in a List

Modern large language models (LLMs) can find a needle in a haystack (locating a single relevant fact buried among hundreds of thousands of irrelevant tokens) with near-saturated accuracy, yet fail to retrieve the last few items in a short list. We call this failure the Position Curse. For instance, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To characterize this failure, we evaluated two complementary queries: given a position in a sequence (of letters or words), retrieve the corresponding item; and given an item, return its position. Each position is specified as a forward or backward offset from an anchor, either an endpoint of the list (its start or end) or another item in the list. Across both open-source and frontier closed-source models, backward retrieval substantially lags forward retrieval. To test whether this capability can be rescued by post-training, we constructed PosBench, a position-focused training dataset. LoRA fine-tuning improves both forward and backward retrieval and generalizes to a held-out code-understanding benchmark (PyIndex), yet absolute performance remains far from saturated. As LLM coding agents increasingly operate over large codebases where precise indexing becomes essential for code understanding and editing, position-based retrieval emerges as a key capability for future pretraining objectives and model design.

preprint2026arXiv

Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior

Computational cognitive models discovered using large language models have so far relied solely on behavioral data. However, it is well-known that models produced from the behavioral trajectory alone are typically under-determined. In this work, we explore the use of Think Aloud traces as an additional form of data constraint during automated model discovery. When applied to the domain of risky decision-making, we find that the models discovered with think-aloud achieve significantly improved predictive performance on held-out data. Additionally, we find that the discovered models belong to different structural classes than those discovered from behavior alone for the majority of participants (69.4\%), specifically, it shifts from Explicit comparator towards Integrated utility. These results suggest that process-level language data not only improve model fit, but also systematically reshape the structure of the discovered cognitive models, enabling the identification of mechanisms that are not recoverable from behavior alone.

preprint2021arXiv

Human Inference in Changing Environments With Temporal Structure

To make informed decisions in natural environments that change over time, humans must update their beliefs as new observations are gathered. Studies exploring human inference as a dynamical process that unfolds in time have focused on situations in which the statistics of observations are history-independent. Yet temporal structure is everywhere in nature, and yields history-dependent observations. Do humans modify their inference processes depending on the latent temporal statistics of their observations? We investigate this question experimentally and theoretically using a change-point inference task. We show that humans adapt their inference process to fine aspects of the temporal structure in the statistics of stimuli. As such, humans behave qualitatively in a Bayesian fashion, but, quantitatively, deviate away from optimality. Perhaps more importantly, humans behave suboptimally in that their responses are not deterministic, but variable. We show that this variability itself is modulated by the temporal statistics of stimuli. To elucidate the cognitive algorithm that yields this behavior, we investigate a broad array of existing and new models that characterize different sources of suboptimal deviations away from Bayesian inference. While models with 'output noise' that corrupts the response-selection process are natural candidates, human behavior is best described by sampling-based inference models, in which the main ingredient is a compressed approximation of the posterior, represented through a modest set of random samples and updated over time. This result comes to complement a growing literature on sample-based representation and learning in humans.