Source author record

Yian Yin

Yian Yin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

physics.soc-ph, Digital Libraries, Social and Information Networks, Artificial Intelligence, cs.CY

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint, 2026, arXiv

LLM hallucinations in the wild: Large-scale evidence from non-existent citations

Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.

Preprint, 2020, arXiv

Quantifying Policy Responses to a Global Emergency: Insights from the COVID-19 Pandemic

Public policy must confront emergencies that evolve in real time and in uncertain directions, yet little is known about the nature of policy response. Here we take the coronavirus pandemic as a global and extraordinarily consequential case, and study the global policy response by analyzing a novel dataset recording policy documents published by government agencies, think tanks, and intergovernmental organizations (IGOs) across 114 countries (37,725 policy documents from Jan 2nd through May 26th 2020). Our analyses reveal four primary findings. (1) Global policy attention to COVID-19 follows a remarkably similar trajectory as the total confirmed cases of COVID-19, yet with evolving policy focus from public health to broader social issues. (2) The COVID-19 policy frontier disproportionately draws on the latest, peer-reviewed, and high-impact scientific insights. Moreover, policy documents that cite science appear especially impactful within the policy domain. (3) The global policy frontier is primarily interconnected through IGOs, such as the WHO, which produce policy documents that are central to the COVID-19 policy network and draw especially strongly on scientific literature. Removing IGOs' contributions fundamentally alters the global policy landscape, with the policy citation network among government agencies increasingly fragmented into many isolated clusters. (4) Countries exhibit highly heterogeneous policy attention to COVID-19. Most strikingly, a country's early policy attention to COVID-19 shows a surprising degree of predictability for the country's subsequent deaths. Overall, these results uncover fundamental patterns of policy interactions and, given the consequential nature of emergent threats and the paucity of quantitative approaches to understand them, open up novel dimensions for assessing and effectively coordinating global and local responses to COVID-19 and beyond.

Preprint, 2020, arXiv

Quantifying the Immediate Effects of the COVID-19 Pandemic on Scientists

The COVID-19 pandemic has undoubtedly disrupted the scientific enterprise, but we lack empirical evidence on the nature and magnitude of these disruptions. Here we report the results of a survey of approximately 4,500 Principal Investigators (PIs) at U.S.- and Europe-based research institutions. Distributed in mid-April 2020, the survey solicited information about how scientists' work changed from the onset of the pandemic, how their research output might be affected in the near future, and a wide range of individuals' characteristics. Scientists report a sharp decline in time spent on research on average, but there is substantial heterogeneity with a significant share reporting no change or even increases. Some of this heterogeneity is due to field-specific differences, with laboratory-based fields being the most negatively affected, and some is due to gender, with female scientists reporting larger declines. However, among the individuals' characteristics examined, the largest disruptions are connected to a usually unobserved dimension: childcare. Reporting a young dependent is associated with declines similar in magnitude to those reported by the laboratory-based fields and can account for a significant fraction of gender differences. Amidst scarce evidence about the role of parenting in scientists' work, these results highlight the fundamental and heterogeneous ways this pandemic is affecting the scientific workforce, and may have broad relevance for shaping responses to the pandemic's effect on science and beyond.

Preprint, 2020, arXiv

Scientific elite revisited: Patterns of productivity, collaboration, authorship and impact

Throughout history, a relatively small number of individuals have made a profound and lasting impact on science and society. Despite long-standing, multi-disciplinary interests in understanding careers of elite scientists, there have been limited attempts for a quantitative, career-level analysis. Here, we leverage a comprehensive dataset we assembled, allowing us to trace the entire career histories of nearly all Nobel laureates in physics, chemistry, and physiology or medicine over the past century. We find that, although Nobel laureates were energetic producers from the outset, producing works that garner unusually high impact, their careers before winning the prize follow relatively similar patterns as ordinary scientists, being characterized by hot streaks and increasing reliance on collaborations. We also uncovered notable variations along their careers, often associated with the Nobel prize, including shifting coauthorship structure in the prize-winning work, and a significant but temporary dip in the impact of work they produce after winning the Nobel. Together, these results document quantitative patterns governing the careers of scientific elites, offering an empirical basis for a deeper understanding of the hallmarks of exceptional careers in science.

Preprint, 2019, arXiv

Quantifying dynamics of failure across science, startups, and security

Human achievements are often preceded by repeated attempts that initially fail, yet little is known about the mechanisms governing the dynamics of failure. Here, building on the rich literature on innovation, human dynamics and learning, we develop a simple one-parameter model that mimics how successful future attempts build on those past. Analytically solving this model reveals a phase transition that separates dynamics of failure into regions of stagnation or progression, predicting that near the critical threshold, agents who share similar characteristics and learning strategies may experience fundamentally different outcomes following failures. Below the critical point, we see those who explore disjoint opportunities without a pattern of improvement, and above it, those who exploit incremental refinements to systematically advance toward success. The model makes several empirically testable predictions, demonstrating that those who eventually succeed and those who do not may be initially similar, yet are characterized by fundamentally distinct failure dynamics in terms of the efficiency and quality of each subsequent attempt. We collected large-scale data from three disparate domains, tracing repeated attempts by (i) NIH investigators to fund their research, (ii) innovators to successfully exit their startup ventures, and (iii) terrorist organizations to post casualties in violent attacks, finding broadly consistent empirical support across all three domains. Together, our findings unveil identifiable yet previously unknown early signals that allow us to identify failure dynamics that will lead to ultimate victory or defeat. Given the ubiquitous nature of failures and the paucity of quantitative approaches to understand them, these results represent a crucial step toward deeper understanding of the complex dynamics beneath failures, the essential prerequisites for success.
