Researcher profile

Paul Ginsparg

Paul Ginsparg contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

LLM hallucinations in the wild: Large-scale evidence from non-existent citations

Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.

preprint2021arXiv

Attention-based Quantum Tomography

With rapid progress across platforms for quantum systems, the problem of many-body quantum state reconstruction for noisy quantum states becomes an important challenge. Recent works found promise in recasting the problem of quantum state reconstruction to learning the probability distribution of quantum state measurement vectors using generative neural network models. Here we propose the "Attention-based Quantum Tomography" (AQT), a quantum state reconstruction using an attention mechanism-based generative network that learns the mixed state density matrix of a noisy quantum state. The AQT is based on the model proposed in "Attention is all you need" by Vishwani et al (2017) that is designed to learn long-range correlations in natural language sentences and thereby outperform previous natural language processing models. We demonstrate not only that AQT outperforms earlier neural-network-based quantum state reconstruction on identical tasks but that AQT can accurately reconstruct the density matrix associated with a noisy quantum state experimentally realized in an IBMQ quantum computer. We speculate the success of the AQT stems from its ability to model quantum entanglement across the entire quantum system much as the attention model for natural language processing captures the correlations among words in a sentence.

preprint2021arXiv

Experimental error mitigation using linear rescaling for variational quantum eigensolving with up to 20 qubits

Quantum computers have the potential to help solve a range of physics and chemistry problems, but noise in quantum hardware currently limits our ability to obtain accurate results from the execution of quantum-simulation algorithms. Various methods have been proposed to mitigate the impact of noise on variational algorithms, including several that model the noise as damping expectation values of observables. In this work, we benchmark various methods, including a new method proposed here. We compare their performance in estimating the ground-state energies of several instances of the 1D mixed-field Ising model using the variational-quantum-eigensolver algorithm with up to 20 qubits on two of IBM's quantum computers. We find that several error-mitigation techniques allow us to recover energies to within 10% of the true values for circuits containing up to about 25 ansatz layers, where each layer consists of CNOT gates between all neighboring qubits and Y-rotations on all qubits.

preprint2020arXiv

Sensitivity of collective outcomes identifies pivotal components

A social system is susceptible to perturbation when its collective properties depend sensitively on a few pivotal components. Using the information geometry of minimal models from statistical physics, we develop an approach to identify pivotal components to which coarse-grained, or aggregate, properties are sensitive. As an example, we introduce our approach on a reduced toy model with a median voter who always votes in the majority. The sensitivity of majority-minority divisions to changing voter behaviour pinpoints the unique role of the median. More generally, the sensitivity identifies pivotal components that precisely determine collective outcomes generated by a complex network of interactions. Using perturbations to target pivotal components in the models, we analyse datasets from political voting, finance and Twitter. Across these systems, we find remarkable variety, from systems dominated by a median-like component to those whose components behave more equally. In the context of political institutions such as courts or legislatures, our methodology can help describe how changes in voters map to new collective voting outcomes. For economic indices, differing system response reflects varying fiscal conditions across time. Thus, our information-geometric approach provides a principled, quantitative framework that may help assess the robustness of collective outcomes to targeted perturbation and compare social institutions, or even biological networks, with one another and across time.