Researcher profile

Jing Luo

Jing Luo contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
15 topics
4 close collaborators

Published work

12 published item(s)

Preprint, 2026, arXiv

Can Coding Agents Reproduce Findings in Computational Materials Science?

Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ability to navigate complex, domain-specific procedures and to interpret results in the context of scientific claims. To address this question, we present AutoMat, a benchmark for evaluating LLM-based agents' ability to reproduce claims from computational materials science. AutoMat poses three interrelated challenges: recovering underspecified computational procedures, navigating specialized toolchains, and determining whether the resulting evidence supports a claim. By working closely with subject matter experts, we curate a set of claims from real materials science papers to test whether coding agents can recover and execute the end-to-end workflow needed to support (or undermine) such claims. We then evaluate multiple representative coding agent settings across several foundation models. Our results show that current LLM-based agents obtain low overall success rates on AutoMat, with the best-performing setting achieving a success rate of only 54.1%. Error analysis further reveals that agents perform worst when workflows must be reconstructed from paper text alone and that they fail primarily due to incomplete procedures, methodological deviations, and execution fragility. Taken together, these findings position AutoMat as both a benchmark for computational scientific reproducibility and a tool for diagnosing the current limitations of agentic systems in AI-for-science settings.

Preprint, 2026, arXiv

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Traditional workflow-based agents exhibit limited intelligence when addressing real-world problems requiring tool invocation. Tool-integrated reasoning (TIR) agents capable of autonomous reasoning and tool invocation are rapidly emerging as a powerful approach for complex decision-making tasks involving multi-step interactions with external environments. In this work, we introduce MindWatcher, a TIR agent integrating interleaved thinking and multimodal chain-of-thought (CoT) reasoning. MindWatcher can autonomously decide whether and how to invoke diverse tools and coordinate their use, without relying on human prompts or workflows. The interleaved thinking paradigm enables the model to switch between thinking and tool calling at any intermediate stage, while its multimodal CoT capability allows manipulation of images during reasoning to yield more precise search results. We implement automated data auditing and evaluation pipelines, complemented by manually curated high-quality datasets for training, and we construct a benchmark, called MindWatcher-Evaluate Bench (MWE-Bench), to evaluate its performance. MindWatcher is equipped with a comprehensive suite of auxiliary reasoning tools, enabling it to address broad-domain multimodal problems. A large-scale, high-quality local image retrieval database, covering eight categories including cars, animals, and plants, endows the model with robust object recognition despite its small size. Finally, we design a more efficient training infrastructure for MindWatcher, enhancing training speed and hardware utilization. Experiments not only demonstrate that MindWatcher matches or exceeds the performance of larger or more recent models through superior tool invocation, but also uncover critical insights for agent training, such as the genetic inheritance phenomenon in agentic RL.

Preprint, 2026, arXiv

Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling

Music generative artificial intelligence (AI) is rapidly expanding music content, necessitating automated song aesthetics evaluation. However, existing studies largely focus on speech, audio or singing quality, leaving song aesthetics underexplored. Moreover, conventional approaches often predict a precise Mean Opinion Score (MOS) value directly, which struggles to capture the nuances of human perception in song aesthetics evaluation. This paper proposes a song-oriented aesthetics evaluation framework, featuring two novel modules: 1) Multi-Stem Attention Fusion (MSAF) builds bidirectional cross-attention between mixture-vocal and mixture-accompaniment pairs, fusing them to capture complex musical features; 2) Hierarchical Granularity-Aware Interval Aggregation (HiGIA) learns multi-granularity score probability distributions, aggregates them into a score interval, and applies a regression within the interval to produce the final score. We evaluate the framework on two datasets of full-length songs, the SongEval dataset (AI-generated) and an internal aesthetics dataset (human-created), and compare it with two state-of-the-art (SOTA) models. Results show that the proposed method achieves stronger performance for multi-dimensional song aesthetics evaluation.

Preprint, 2026, arXiv

The NANOGrav 15 yr Data Set: Piecewise Power-Law Reconstruction of the Gravitational-Wave Background

The NANOGrav 15-year (NG15) data set provides evidence for a gravitational-wave background (GWB) signal at nanohertz frequencies, which is expected to originate either from a cosmic population of inspiraling supermassive black-hole binaries or new particle physics in the early Universe. A firm identification of the source of the NG15 signal requires an accurate reconstruction of its frequency spectrum. In this paper, we provide such a spectral characterization of the NG15 signal based on a piecewise power-law (PPL) ansatz that strikes a balance between existing alternatives in the literature. Our PPL reconstruction is more flexible than the standard constant-power-law model, which describes the GWB spectrum in terms of only two parameters: an amplitude A and a spectral index gamma. Concurrently, it better approximates physically realistic GWB spectra -- especially those of cosmological origin -- than the free spectral model, since the latter allows for arbitrary variations in the GWB amplitude from one frequency bin to the next. Our PPL reconstruction of the NG15 signal relies on individual PPL models with a fixed number of internal nodes (i.e., constant power law, broken power law, doubly broken power law, etc.) that are ultimately combined in a Bayesian model average. The data products resulting from our analysis provide the basis for fast refits of spectral GWB models.
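The contrast between a constant power law and a piecewise power law can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: it assumes the piecewise model is linear in log-log space between nodes, and the function names, node positions, and node values are hypothetical.

```python
import math


def constant_power_law(f, log10_amp, slope, f_ref=1.0):
    """Two-parameter model: amplitude times (f / f_ref) raised to a fixed slope."""
    return 10.0 ** log10_amp * (f / f_ref) ** slope


def piecewise_power_law(f, node_freqs, node_log10_vals):
    """Spectrum that is a straight line in log-log space between adjacent nodes.

    node_freqs must be strictly increasing. Frequencies outside the node range
    are extrapolated along the first or last segment (an assumption made here).
    """
    if len(node_freqs) != len(node_log10_vals) or len(node_freqs) < 2:
        raise ValueError("need at least two nodes with matching values")
    x = math.log10(f)
    xs = [math.log10(v) for v in node_freqs]
    # Locate the segment [xs[i], xs[i + 1]] that contains x.
    i = 0
    while i < len(xs) - 2 and x > xs[i + 1]:
        i += 1
    slope = (node_log10_vals[i + 1] - node_log10_vals[i]) / (xs[i + 1] - xs[i])
    return 10.0 ** (node_log10_vals[i] + slope * (x - xs[i]))


# With no internal nodes, the piecewise model reduces to a constant power law.
two_node = piecewise_power_law(3.0, [1.0, 10.0], [0.0, -2.0])
single = constant_power_law(3.0, 0.0, -2.0)

# One internal node at f = 10 gives a broken power law (slopes -1, then -2).
broken = piecewise_power_law(10.0 ** 1.5, [1.0, 10.0, 100.0], [0.0, -1.0, -3.0])
```

Each additional internal node adds one more break, which matches the abstract's sequence of constant, broken, and doubly broken power laws.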

Preprint, 2022, arXiv

ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition

Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, the learning of affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, this method uses an adapted VGGNet as a spatial feature learning module (SFLM) to obtain spatial features across different levels. Then, these features are fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to get multi-level emotion-related spatial-temporal features (ESTFs), which can discriminate emotions well in the final emotion space. In addition, a novel data processing method is devised to cut the single-channel input into multiple channels to improve computational efficiency while ensuring the quality of MER. Experiments show that our proposed method achieves 10.43% and 4.82% relative improvement in valence and arousal, respectively, on the R2 score compared to the state-of-the-art model, and also performs better on datasets with distinct scales and in multi-task learning.

Preprint, 2022, arXiv

Carrier mobilities of Janus transition metal dichalcogenides monolayers studied by Born effective charge and first-principles calculation

Two-dimensional (2D) Janus transition metal dichalcogenides (TMDs) are a new class of materials with unique physical properties. However, the carrier mobility of most Janus TMDs calculated by deformation potential theory (DPT) is not reliable because part of the lattice scattering is not considered. In this work, we propose a new method based on the Born effective charge (BEC) to calculate the carrier mobility of Janus TMDs by including the important factors that are neglected in the DPT. The BEC can be used in calculations for both pure and defective Janus TMDs by employing density functional perturbation theory. We have identified the relationship between the carrier mobility and the value of the BEC: the lower the absolute BEC, the higher the electron or hole mobility. Using the new method, we have calculated the carrier mobility of commonly studied Janus TMDs with and without defects. The method may shed light on high-throughput calculations for selecting 2D materials with high carrier mobility.

Preprint, 2022, arXiv

Localizing FRBs through VLBI with the Algonquin Radio Observatory 10-m Telescope

The CHIME/FRB experiment has detected thousands of Fast Radio Bursts (FRBs) due to its sensitivity and wide field of view; however, its low angular resolution prevents it from localizing events to their host galaxies. Very Long Baseline Interferometry (VLBI), triggered by FRB detections from CHIME/FRB, will solve the challenge of localization for non-repeating events. Using a refurbished 10-m radio dish at the Algonquin Radio Observatory located in Ontario, Canada, we developed a testbed for a VLBI experiment with a theoretical ~<30 masec precision. We provide an overview of the 10-m system and describe its refurbishment, the data acquisition, and a procedure for fringe fitting that simultaneously estimates the geometric delay used for localization and the dispersive delay from the ionosphere. Using single pulses from the Crab pulsar, we validate the system and localization procedure, and analyze the clock stability between sites, which is critical for phase-referencing an FRB event. We find that a localization of 50 masec is possible with the performance of the current system. Furthermore, for sources with insufficient signal or restricted wideband to simultaneously measure both geometric and ionospheric delays, we show that the differential ionospheric contribution between the two sites must be measured to a precision of 1e-8 pc/cc to provide a reasonable localization from a detection in the 400--800 MHz band. Finally, we show the detection of an FRB observed simultaneously by CHIME and the Algonquin 10-m telescope, the first FRB cross-correlated on this very long baseline. This project serves as a testbed for the forthcoming CHIME/FRB Outriggers project.

Preprint, 2022, arXiv

Searching For Gravitational Waves From Cosmological Phase Transitions With The NANOGrav 12.5-year dataset

We search for a first-order phase transition gravitational wave signal in 45 pulsars from the NANOGrav 12.5-year dataset. We find that the data can be modeled in terms of a strong first-order phase transition taking place at temperatures below the electroweak scale. However, we do not observe any strong preference for a phase-transition interpretation of the signal over the standard astrophysical interpretation in terms of supermassive black hole mergers; we expect to gain additional discriminating power with future datasets, improving the signal-to-noise ratio and extending the sensitivity window to lower frequencies. An interesting open question is how well gravitational wave observatories could separate such signals.

Preprint, 2021, arXiv

The NANOGrav 12.5-year Data Set: Search For An Isotropic Stochastic Gravitational-Wave Background

We search for an isotropic stochastic gravitational-wave background (GWB) in the $12.5$-year pulsar timing data set collected by the North American Nanohertz Observatory for Gravitational Waves. Our analysis finds strong evidence of a stochastic process, modeled as a power-law, with common amplitude and spectral slope across pulsars. The Bayesian posterior of the amplitude for an $f^{-2/3}$ power-law spectrum, expressed as the characteristic GW strain, has median $1.92 \times 10^{-15}$ and $5\%$--$95\%$ quantiles of $1.37$--$2.67 \times 10^{-15}$ at a reference frequency of $f_\mathrm{yr} = 1 ~\mathrm{yr}^{-1}$. The Bayes factor in favor of the common-spectrum process versus independent red-noise processes in each pulsar exceeds $10,000$. However, we find no statistically significant evidence that this process has quadrupolar spatial correlations, which we would consider necessary to claim a GWB detection consistent with general relativity. We find that the process has neither monopolar nor dipolar correlations, which may arise from, for example, reference clock or solar system ephemeris systematics, respectively. The amplitude posterior has significant support above previously reported upper limits; we explain this in terms of the Bayesian priors assumed for intrinsic pulsar red noise. We examine potential implications for the supermassive black hole binary population under the hypothesis that the signal is indeed astrophysical in nature.
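The power-law spectrum quoted above can be evaluated directly. The following is a minimal sketch, assuming the standard form $h_c(f) = A\,(f/f_\mathrm{yr})^{\alpha}$ with $\alpha = -2/3$; the function name, the constant names, and the 365.25-day year are choices made here for illustration, and the amplitudes are the median and quantile values quoted in the abstract.

```python
# Length of a year in seconds (365.25 days is an assumption of this sketch).
SECONDS_PER_YEAR = 365.25 * 24 * 3600
# Reference frequency f_yr = 1 / yr, expressed in Hz.
F_YR = 1.0 / SECONDS_PER_YEAR


def characteristic_strain(f_hz, amplitude, alpha=-2.0 / 3.0):
    """Characteristic strain h_c(f) = amplitude * (f / f_yr) ** alpha."""
    return amplitude * (f_hz / F_YR) ** alpha


# Median and 5%-95% quantiles of the amplitude at f_yr, as quoted above.
A_MEDIAN, A_LOW, A_HIGH = 1.92e-15, 1.37e-15, 2.67e-15

# At the reference frequency the strain equals the amplitude itself.
h_ref = characteristic_strain(F_YR, A_MEDIAN)

# Eight times lower in frequency, the strain grows by 8 ** (2 / 3) = 4.
h_low_f = characteristic_strain(F_YR / 8.0, A_MEDIAN)
```

Because the spectral index is negative, the strain rises toward lower frequencies, which is where pulsar timing arrays with long baselines are most sensitive.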

Preprint, 2020, arXiv

Multi-Messenger Gravitational Wave Searches with Pulsar Timing Arrays: Application to 3C66B Using the NANOGrav 11-year Data Set

When galaxies merge, the supermassive black holes in their centers may form binaries and, during the merger, emit low-frequency gravitational radiation. In this paper we consider the galaxy 3C66B, which was used as the target of the first multi-messenger search for gravitational waves. Due to the observed periodicities present in the photometric and astrometric data of the source, it has been theorized to contain a supermassive black hole binary. Its apparent 1.05-year orbital period would place the gravitational wave emission directly in the pulsar timing band. Since the first pulsar timing array study of 3C66B, revised models of the source have been published, and timing array sensitivities and techniques have improved dramatically. With these advances, we further constrain the chirp mass of the potential supermassive black hole binary in 3C66B to less than $(1.65\pm0.02) \times 10^9~{M_\odot}$ using data from the NANOGrav 11-year data set. This upper limit provides a factor of 1.6 improvement over previous limits, and a factor of 4.3 over the first search done. Nevertheless, the most recent orbital model for the source is still consistent with our limit from pulsar timing array data. In addition, we are able to quantify the improvement made by the inclusion of source properties gleaned from electromagnetic data to 'blind' pulsar timing array searches. With these methods, it is apparent that it is not necessary to obtain exact a priori knowledge of the period of a binary to gain meaningful astrophysical inferences.

Preprint, 2020, arXiv

Surrogate representation of sink strengths and the long-term role of crystalline interfaces in the development of irradiation-induced bubbles

The present article addresses an early-stage attempt at replacing the analyticity-based sink strength terms in rate equations by surrogate models of machine learning representation. Here we emphasise, in the context of multiscale modelling, a combinative use of machine learning with scale analysis, through which a set of fine-resolution problems of partial differential equations describing the (quasi-steady) short-range individual sink behaviour can be asymptotically sorted out from the mean-field kinetics. Hence the training of machine learning is restrictively oriented, that is, to express the local and already identified, but analytically unavailable nonlinear functional relationships between the sink strengths and other local continuum field quantities. With the trained models, one is able to quantitatively investigate the biased effect shown by a void/bubble being a point defect sink, and the results are compared with existing ones over well-studied scenarios. Moreover, the faster diffusive mechanisms on crystalline interfaces are modelled separately by locally planar rate equations, and their linkages with rate equations for bulk diffusion are formulated through derivative jumps of point defect concentrations across the interfaces. Thus the distinctive role of crystalline interfaces as partial sinks and quick diffusive channels can be investigated. Methodologically, the present treatment is also applicable for studying more complicated situations of long-term sink behaviour observed in irradiated materials.

Preprint, 2020, arXiv

Weighted directed networks with a differentially private bi-degree sequence

The $p_0$ model is an exponential random graph model for directed networks with the bi-degree sequence as the exclusively sufficient statistic. It captures the network feature of degree heterogeneity. The consistency and asymptotic normality of a differentially private estimator of the parameter in the private $p_0$ model have been established. However, the $p_0$ model only focuses on binary edges. In many realistic networks, edges could be weighted, taking a set of finite discrete values. In this paper, we further show that the moment estimators of the parameters based on the differentially private bi-degree sequence in the weighted $p_0$ model are consistent and asymptotically normal. Numerical studies demonstrate our theoretical findings.
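A small sketch of what a differentially private bi-degree sequence might look like for a weighted directed network follows. It is not the paper's procedure: it assumes a continuous Laplace mechanism, an edge-level notion of privacy, and an L1 sensitivity of `2 * max_weight` (one edge changes one out-degree and one in-degree by at most `max_weight` each). All function and parameter names are hypothetical.

```python
import random


def bi_degree_sequence(weights):
    """Return (out_degrees, in_degrees) of a weighted directed adjacency matrix.

    weights[i][j] is the weight of the edge i -> j; self-loops are ignored.
    """
    n = len(weights)
    out_deg = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(weights[i][j] for i in range(n) if i != j) for j in range(n)]
    return out_deg, in_deg


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_bi_degree_sequence(weights, epsilon, max_weight, rng=None):
    """Add Laplace noise to the bi-degree sequence.

    Assumption of this sketch: changing one edge alters one out-degree and one
    in-degree by at most max_weight each, so the L1 sensitivity is 2 * max_weight.
    """
    rng = rng or random.Random()
    scale = 2.0 * max_weight / epsilon
    out_deg, in_deg = bi_degree_sequence(weights)
    noisy_out = [d + laplace_noise(scale, rng) for d in out_deg]
    noisy_in = [d + laplace_noise(scale, rng) for d in in_deg]
    return noisy_out, noisy_in


# Hypothetical 3-node network with edge weights in {0, 1, 2, 3}.
example = [
    [0, 2, 1],
    [0, 0, 3],
    [1, 0, 0],
]
out_deg, in_deg = bi_degree_sequence(example)
noisy_out, noisy_in = private_bi_degree_sequence(
    example, epsilon=1.0, max_weight=3, rng=random.Random(0)
)
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, which is the trade-off that consistency and asymptotic normality results for estimators built on the noisy sequence have to account for.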