Post-training makes large language models less human-like

Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction -- a popular technique for eliciting human-like behavior by conditioning models on participant-specific information -- does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful assistants also make them less accurate models of human behavior.

Preprint, 2026, arXiv, open access

Marcel Binz, Elif Akata, Abdullah Almaatouq, Mohammed Alsobay, Oleksii Ariasov, Franziska Brändle, David Broska, Jason W. Burton, Nuno Busch, Frederick Callaway, Vanessa Cheung, Brian Christian, Julian Coda-Forno, Can Demircan, Vittoria Dentella, Maria K. Eckstein, Noémi Éltető, Michael Franke, Thomas L. Griffiths, Fritz Günther, Susanne Haridi, Sebastian Hellmann, Stefan Herytash, Linus Hof, Eleanor Holton, Isabelle Hoxha, Zak Hussain, Akshay Jagadish, Elif Kara, Valentin Kriegmair, Evelina Leivada, Li Ji-An, Tobias Ludwig, Maximilian Maier, Marcelo G. Mattar, Marvin Mathony, Alireza Modirshanechi, Robin Na, Mariia Nadverniuk, Antonios Nasioulas, Surabhi S. Nath, Helen Niemeyer, Kate Nussenbaum, Sebastian Olschewski, Thorsten Pachur, Stefano Palminteri, Aliona Petrenco, Camille V. Phaneuf-Hadd, Angelo Pirrone, Manuel Rausch, Laura Raveling, Shashank Reddy, Milena Rmus, Evan M. Russek, Tankred Saanum, Kai Sandbrink, Louis Schiekiera, Johannes A. Schubert, Luca M. Schulze Buschoff, Nishad Singhi, Leah H. Somerville, Mikhail S. Spektor, Xin Sui, Christopher Summerfield, Mirko Thalmann, Anna I. Thoma, Taisiia Tikhomirova, Vuong Truong, Polina Tsvilodub, Konstantinos Voudouris, Robert C. Wilson, Kristin Witte, Shuchen Wu, Dirk U. Wulff, Hua-Dong Xiong, Songlin Xu, Lance Ying, Xinyu Zhang, Jian-Qiao Zhu, Eric Schulz

Topics: Computation and Language, Machine Learning, Artificial Intelligence
