Researcher profile

Siddharth Mishra-Sharma

Siddharth Mishra-Sharma contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint · 2026 · arXiv

Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction

Autonomous language-model agents are increasingly evaluated on long-horizon tool-use tasks, but existing benchmarks rarely capture the complexity and nuance of real scientific work. To address this gap, we introduce Collider-Bench, a benchmark for evaluating whether LLM agents can reproduce experimental analyses from the Large Hadron Collider (LHC) using only public papers and open scientific software. Such analyses are often difficult to reproduce because the public toolchain only approximates the software used internally by the experimental collaborations, while the published papers inevitably omit implementation details needed for a faithful reconstruction. Agents must therefore rely on physical reasoning, domain knowledge, and trial-and-error to fill these gaps. Each task requires the agent to turn a published analysis into an executable simulation-and-selection pipeline and submit predicted collision event yields in specified signal regions. These predictions are evaluated with standard histogram metrics that provide continuous fidelity scores without a hand-written rubric. We also report the computational cost incurred by each agent per task. Finally, we evaluate the codebase and full session trace using an LLM judge to catch qualitative failure modes such as fabrications, hallucinations, and duplications. We release an initial set of tasks drawn from LHC searches, together with a containerized sandbox and event simulation tools. We evaluate across a capability ladder of general-purpose coding agents. Our results show that on average no agent reliably beats the physicist-in-the-loop solution.
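As a rough illustration of how a histogram metric can turn predicted yields into a continuous score, the sketch below compares per-signal-region yields against reference yields. The two metrics (a symmetric chi-square distance and a normalized overlap), the function names, and the numbers are assumptions chosen for illustration; the abstract does not say which metrics Collider-Bench actually uses.

```python
from typing import Sequence


def chi2_distance(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Symmetric chi-square distance between two binned yield vectors (0 = identical)."""
    if len(pred) != len(ref):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for p, r in zip(pred, ref):
        if p + r > 0:  # skip bins that are empty in both histograms
            total += (p - r) ** 2 / (p + r)
    return 0.5 * total


def overlap_score(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Histogram intersection of the normalized shapes, in [0, 1] (1 = same shape)."""
    if len(pred) != len(ref):
        raise ValueError("histograms must have the same number of bins")
    sum_pred, sum_ref = sum(pred), sum(ref)
    if sum_pred <= 0 or sum_ref <= 0:
        raise ValueError("histograms must have positive total yield")
    return sum(min(p / sum_pred, r / sum_ref) for p, r in zip(pred, ref))


# Hypothetical yields in three signal regions (illustrative numbers only).
reference = [120.0, 45.0, 10.0]
predicted = [110.0, 50.0, 14.0]

print("chi2 distance:", chi2_distance(predicted, reference))
print("shape overlap:", overlap_score(predicted, reference))
```

Both quantities vary smoothly with the predicted yields, so a partially correct pipeline receives partial credit without a hand-written rubric.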

preprint · 2022 · arXiv

A neural simulation-based inference approach for characterizing the Galactic Center $\gamma$-ray excess

The nature of the Fermi gamma-ray Galactic Center Excess (GCE) has remained a persistent mystery for over a decade. Although the excess is broadly compatible with emission expected due to dark matter annihilation, an explanation in terms of a population of unresolved astrophysical point sources, e.g., millisecond pulsars, remains viable. The effort to uncover the origin of the GCE is hampered in particular by an incomplete understanding of diffuse emission of Galactic origin. This can lead to spurious features that make it difficult to robustly differentiate smooth emission, as expected for a dark matter origin, from more "clumpy" emission expected for a population of relatively bright, unresolved point sources. We use recent advancements in the field of simulation-based inference, in particular density estimation techniques using normalizing flows, in order to characterize the contribution of modeled components, including unresolved point source populations, to the GCE. Compared to traditional techniques based on the statistical distribution of photon counts, our machine learning-based method is able to utilize more of the information contained in a given model of the Galactic Center emission, and in particular can perform posterior parameter estimation while accounting for pixel-to-pixel spatial correlations in the gamma-ray map. This makes the method demonstrably more resilient to certain forms of model misspecification. On application to Fermi data, the method generically attributes a smaller fraction of the GCE flux to unresolved point sources when compared to traditional approaches. We nevertheless infer such a contribution to make up a non-negligible fraction of the GCE across all analysis variations considered, with at least $38^{+9}_{-19}\%$ of the excess attributed to unresolved point sources in our baseline analysis.

preprint · 2022 · arXiv

Astrophysical and Cosmological Probes of Dark Matter

While astrophysical and cosmological probes provide a remarkably precise and consistent picture of the quantity and general properties of dark matter, its fundamental nature remains one of the most significant open questions in physics. Obtaining a more comprehensive understanding of dark matter within the next decade will require overcoming a number of theoretical challenges: the groundwork for these strides is being laid now, yet much remains to be done. Chief among the upcoming challenges is establishing the theoretical foundation needed to harness the full potential of new observables in the astrophysical and cosmological domains, spanning the early Universe to the inner portions of galaxies and the stars therein. Identifying the nature of dark matter will also entail repurposing and implementing a wide range of theoretical techniques from outside the typical toolkit of astrophysics, ranging from effective field theory to the dramatically evolving world of machine learning and artificial-intelligence-based statistical inference. Through this work, the theory frontier will be at the heart of dark matter discoveries in the upcoming decade.

preprint · 2022 · arXiv

Inferring dark matter substructure with astrometric lensing beyond the power spectrum

Astrometry, the precise measurement of positions and motions of celestial objects, has emerged as a promising avenue for characterizing the dark matter population in our Galaxy. By leveraging recent advances in simulation-based inference and neural network architectures, we introduce a novel method to search for global dark matter-induced gravitational lensing signatures in astrometric datasets. Our method, based on neural likelihood-ratio estimation, shows significantly enhanced sensitivity to a cold dark matter population and more favorable scaling with measurement noise compared to existing approaches based on two-point correlation statistics. We demonstrate the real-world viability of our method by showing it to be robust to non-trivial modeled as well as unmodeled noise features expected in astrometric measurements. This establishes machine learning as a powerful tool for characterizing dark matter using astrometric data.

preprint · 2022 · arXiv

Machine Learning and Cosmology

Methods based on machine learning have recently made substantial inroads in many corners of cosmology. Through this process, new computational tools, new perspectives on data collection, model development, analysis, and discovery, as well as new communities and educational pathways have emerged. Despite rapid progress, substantial potential at the intersection of cosmology and machine learning remains untapped. In this white paper, we summarize current and ongoing developments relating to the application of machine learning within cosmology and provide a set of recommendations aimed at maximizing the scientific impact of these burgeoning tools over the coming decade through both technical development as well as the fostering of emerging communities.

preprint · 2022 · arXiv

Snowmass2021 Cosmic Frontier White Paper: Puzzling Excesses in Dark Matter Searches and How to Resolve Them

Intriguing signals with excesses over expected backgrounds have been observed in many astrophysical and terrestrial settings, which could potentially have a dark matter origin. Astrophysical excesses include the Galactic Center GeV gamma-ray excess detected by the Fermi Gamma-Ray Space Telescope, the AMS antiproton and positron excesses, and the 511 and 3.5 keV X-ray lines. Direct detection excesses include the DAMA/LIBRA annual modulation signal, the XENON1T excess, and low-threshold excesses in solid state detectors. We discuss avenues to resolve these excesses, with actions the field can take over the next several years.

preprint · 2022 · arXiv

Snowmass2021: Vera C. Rubin Observatory as a Flagship Dark Matter Experiment

Establishing that Vera C. Rubin Observatory is a flagship dark matter experiment is an essential pathway toward understanding the physical nature of dark matter. In the past two decades, wide-field astronomical surveys and terrestrial laboratories have jointly created a phase transition in the ecosystem of dark matter models and probes. Going forward, any robust understanding of dark matter requires astronomical observations, which still provide the only empirical evidence for dark matter to date. We have a unique opportunity right now to create a dark matter experiment with Rubin Observatory Legacy Survey of Space and Time (LSST). This experiment will be a coordinated effort to perform dark matter research, and provide a large collaborative team of scientists with the necessary organizational and funding supports. This approach leverages existing investments in Rubin. Studies of dark matter with Rubin LSST will also guide the design of, and confirm the results from, other dark matter experiments. Supporting a collaborative team to carry out a dark matter experiment with Rubin LSST is the key to achieving the dark matter science goals that have already been identified as high priority by the high-energy physics and astronomy communities.

preprint · 2020 · arXiv

Characterizing the Nature of the Unresolved Point Sources in the Galactic Center

The Galactic Center Excess (GCE) of GeV gamma rays can be explained as a signal of annihilating dark matter or of emission from unresolved astrophysical sources, such as millisecond pulsars. Evidence for the latter is provided by a statistical procedure, referred to as Non-Poissonian Template Fitting (NPTF), that distinguishes the smooth distribution of photons expected for dark matter annihilation from a "clumpy" photon distribution expected for point sources. In this paper, we perform an extensive study of the NPTF on simulated data, exploring its ability to recover the flux and luminosity function of unresolved sources at the Galactic Center. When astrophysical background emission is perfectly modeled, we find that the NPTF successfully distinguishes between the dark matter and point source hypotheses when either component makes up the entirety of the GCE. When the GCE is a mixture of dark matter and point sources, the NPTF may fail to reconstruct the correct contribution of each component. We further study the impact of mismodeling the Galactic diffuse backgrounds, finding that while a dark matter signal could be attributed to point sources in some outlying cases for the scenarios we consider, the significance of a true point source signal remains robust. Our work enables us to comment on a recent study by Leane and Slatyer (2019) that questions prior NPTF conclusions because the method does not recover an artificial dark matter signal injected on actual Fermi data. We demonstrate that the failure of the NPTF to extract an artificial dark matter signal can be natural when point sources are present in the data (with the effect further exacerbated by the presence of diffuse mismodeling) and does not on its own invalidate the conclusions of the NPTF analysis in the Inner Galaxy.
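The contrast between "smooth" and "clumpy" photon statistics can be made concrete with a toy simulation. The sketch below is not the NPTF itself: it only compares the variance-to-mean ratio (Fano factor) of pixel counts for purely Poissonian emission and for a hypothetical point-source population with the same mean. All function names and parameter values are assumptions made for illustration.

```python
import math
import random
import statistics


def poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def smooth_map(rng, n_pix, mean_counts):
    """Smooth emission: every pixel is an independent Poisson draw."""
    return [poisson(rng, mean_counts) for _ in range(n_pix)]


def clumpy_map(rng, n_pix, mean_counts, photons_per_source):
    """Point sources: a Poisson number of sources per pixel, each emitting Poisson photons."""
    mean_sources = mean_counts / photons_per_source
    counts = []
    for _ in range(n_pix):
        n_src = poisson(rng, mean_sources)
        counts.append(sum(poisson(rng, photons_per_source) for _ in range(n_src)))
    return counts


def fano(counts):
    """Variance-to-mean ratio; close to 1 for purely Poissonian counts."""
    return statistics.pvariance(counts) / statistics.fmean(counts)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
smooth = smooth_map(rng, 5000, 5.0)
clumpy = clumpy_map(rng, 5000, 5.0, 5.0)
fano_smooth = fano(smooth)
fano_clumpy = fano(clumpy)

print(f"smooth: mean={statistics.fmean(smooth):.2f}, Fano={fano_smooth:.2f}")
print(f"clumpy: mean={statistics.fmean(clumpy):.2f}, Fano={fano_clumpy:.2f}")
```

Both maps have the same expected counts per pixel, but the point-source map is over-dispersed: in this toy model the expected Fano factor is 1 + photons_per_source. Methods based on photon-count statistics exploit this kind of difference, with far more careful treatment of the source-count distribution, instrument response, and templates than this sketch attempts.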

preprint · 2020 · arXiv

Foreground Mismodeling and the Point Source Explanation of the Fermi Galactic Center Excess

The Fermi Large Area Telescope has observed an excess of ~GeV energy gamma rays from the center of the Milky Way, which may arise from near-thermal dark matter annihilation. Firmly establishing the dark matter origin for this excess is, however, complicated by challenges in modeling diffuse cosmic-ray foregrounds as well as unresolved astrophysical sources, such as millisecond pulsars. Non-Poissonian Template Fitting (NPTF) is one statistical technique that has previously been used to show that at least some fraction of the GeV excess is likely due to a population of dim point sources. These results were recently called into question by Leane and Slatyer (2019), who showed that a synthetic dark matter annihilation signal injected on top of the real Fermi data is not recovered by the NPTF procedure. In this work, we perform a dedicated study of the Fermi data and explicitly show that the central result of Leane and Slatyer (2019) is likely driven by the fact that their choice of model for the Galactic foreground emission does not provide a sufficiently good description of the data. We repeat the NPTF analyses using a state-of-the-art model for diffuse gamma-ray emission in the Milky Way and introduce a novel statistical procedure, based on spherical-harmonic marginalization, to provide an improved description of the Galactic diffuse emission in a data-driven fashion. With these improvements, we find that the NPTF results continue to robustly favor the interpretation that the Galactic Center excess is due, in part, to unresolved astrophysical point sources across the analysis variations that we have explored.

preprint · 2020 · arXiv

Harnessing the Population Statistics of Subhalos to Search for Annihilating Dark Matter

The Milky Way's dark matter halo is expected to host numerous low-mass subhalos with no detectable associated stellar component. Such subhalos are invisible unless their dark matter annihilates to visible states such as photons. One of the established methods for identifying candidate subhalos is to search for individual unassociated gamma-ray sources with properties consistent with the dark matter expectation. However, robustly ruling out an astrophysical origin for any such candidate is challenging. In this work, we present a complementary approach that harnesses information about the entire population of subhalos, such as their spatial and mass distribution in the Galaxy, to search for a signal of annihilating dark matter. Using simulated data, we show that the collective emission from subhalos can imprint itself in a unique way on the statistics of observed photons, even when individual subhalos may be too dim to be resolved on their own. Additionally, we demonstrate that, for the models we consider, the signal can be identified even in the face of unresolved astrophysical point-source emission of extragalactic and Galactic origin. This establishes a new search technique for subhalos that is complementary to established methods, and that could have important ramifications for gamma-ray dark matter searches using observatories such as the Fermi Large Area Telescope and the Cherenkov Telescope Array.

preprint · 2020 · arXiv

The Power of Halometry

Astrometric weak gravitational lensing is a powerful probe of the distribution of matter on sub-Galactic scales, which harbor important information about the fundamental nature of dark matter. We propose a novel method that utilizes angular power spectra to search for the correlated pattern of apparent motions of celestial objects induced from time-dependent lensing by a population of Galactic subhalos. Application of this method to upcoming astrometric datasets will allow for the direct measurement of the properties of Galactic substructure, with implications for the underlying particle physics. We show that, with near-future astrometric observations, it may be possible to statistically detect populations of cold dark matter subhalos, compact objects, as well as density fluctuations sourced by scalar field dark matter. Currently-unconstrained parameter space will already be accessible using upcoming data from the ongoing Gaia mission.

preprint · 2019 · arXiv

Mining for Dark Matter Substructure: Inferring subhalo population properties from strong lenses with machine learning

The subtle and unique imprint of dark matter substructure on extended arcs in strong lensing systems contains a wealth of information about the properties and distribution of dark matter on small scales and, consequently, about the underlying particle physics. However, teasing out this effect poses a significant challenge since the likelihood function for realistic simulations of population-level parameters is intractable. We apply recently-developed simulation-based inference techniques to the problem of substructure inference in galaxy-galaxy strong lenses. By leveraging additional information extracted from the simulator, neural networks are efficiently trained to estimate likelihood ratios associated with population-level parameters characterizing substructure. Through proof-of-principle application to simulated data, we show that these methods can provide an efficient and principled way to simultaneously analyze an ensemble of strong lenses, and can be used to mine the large sample of lensing images deliverable by near-future surveys for signatures of dark matter substructure.
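The idea of training a classifier to estimate a likelihood ratio can be sketched with a toy problem. A classifier s(x) trained on balanced samples from two parameter points approximates p(x|θ1) / (p(x|θ0) + p(x|θ1)), so s / (1 - s) estimates the ratio r(x) = p(x|θ1) / p(x|θ0). The example below is only an illustration under stated assumptions: a one-dimensional Gaussian stands in for the lensing simulator, a logistic regression stands in for the neural network, and the additional simulator information mentioned in the abstract is not used. All names are hypothetical.

```python
import math
import random


def simulate(rng: random.Random, theta: float, n: int) -> list:
    """Toy 'simulator' (assumption): draws x ~ Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def fit_logistic(xs, ys, lr=1.0, steps=200):
    """Fit s(x) = sigmoid(w*x + b) by full-batch gradient descent on the cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            s = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (s - y) * x
            grad_b += s - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
x0 = simulate(rng, 0.0, 2000)  # samples from theta0, labeled 0
x1 = simulate(rng, 1.0, 2000)  # samples from theta1, labeled 1
w, b = fit_logistic(x0 + x1, [0.0] * len(x0) + [1.0] * len(x1))


def log_ratio_hat(x: float) -> float:
    """Estimated log r(x) = log[s / (1 - s)], which for a sigmoid is just the logit."""
    return w * x + b


# For this toy model the exact answer is log r(x) = x - 0.5.
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  estimated={log_ratio_hat(x):+.3f}  exact={x - 0.5:+.3f}")
```

In this toy case the exact log ratio is linear in x, so a linear classifier is sufficient. For realistic simulators the ratio is intractable, which is the setting where a flexible network trained the same way becomes useful.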