Source author record

Ding Chen

Ding Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

astro-ph.HE, astro-ph.IM, Computation and Language, hep-ex, physics.ins-det, Artificial Intelligence, astro-ph.CO, astro-ph.SR, Computer Vision, cs.CY, Multiagent Systems, Neural and Evolutionary Computing, physics.soc-ph, q-fin.GN, q-fin.TR

Catalog footprint

What is connected

14 works

15 topics

4 close collaborators

preprint, 2026, arXiv

HaluMem: Evaluating Hallucinations in Memory Systems of Agents

Memory systems are key components that enable AI systems such as LLMs and AI agents to achieve long-term learning and sustained interaction. However, during memory storage and retrieval, these systems frequently exhibit memory hallucinations, including fabrication, errors, conflicts, and omissions. Existing evaluations of memory hallucinations are primarily end-to-end question answering, which makes it difficult to localize the operational stage within the memory system where hallucinations arise. To address this, we introduce the Hallucination in Memory Benchmark (HaluMem), the first operation level hallucination evaluation benchmark tailored to memory systems. HaluMem defines three evaluation tasks (memory extraction, memory updating, and memory question answering) to comprehensively reveal hallucination behaviors across different operational stages of interaction. To support evaluation, we construct user-centric, multi-turn human-AI interaction datasets, HaluMem-Medium and HaluMem-Long. Both include about 15k memory points and 3.5k multi-type questions. The average dialogue length per user reaches 1.5k and 2.6k turns, with context lengths exceeding 1M tokens, enabling evaluation of hallucinations across different context scales and task complexities. Empirical studies based on HaluMem show that existing memory systems tend to generate and accumulate hallucinations during the extraction and updating stages, which subsequently propagate errors to the question answering stage. Future research should focus on developing interpretable and constrained memory operation mechanisms that systematically suppress hallucinations and improve memory reliability.

preprint, 2026, arXiv

MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory. Most systems adopt the "retrieve-then-rerank" two-stage paradigm, but generic reranking models rely on semantic similarity matching and lack genuine reasoning capabilities, leading to a problem where recalled results are semantically highly relevant yet do not contain the key information needed to answer the question. This deficiency manifests in memory scenarios as three specific problems. First, relevance scores are miscalibrated, making threshold-based filtering difficult. Second, ranking degrades when facing temporal constraints, causal reasoning, and other complex queries. Third, the model cannot leverage dialogue context for semantic disambiguation. This report introduces MemReranker, a reranking model family (0.6B/4B) built on Qwen3-Reranker through multi-stage LLM knowledge distillation. Multi-teacher pairwise comparisons generate calibrated soft labels, BCE pointwise distillation establishes well-distributed scores, and InfoNCE contrastive learning enhances hard-sample discrimination. Training data combines general corpora with memory-specific multi-turn dialogue data covering temporal constraints, causal reasoning, and coreference resolution. On the memory retrieval benchmark, MemReranker-0.6B substantially outperforms BGE-Reranker and matches open-source 4B/8B models as well as GPT-4o-mini on key metrics. MemReranker-4B further achieves 0.737 MAP, with several metrics on par with Gemini-3-Flash, while maintaining inference latency at only 10--20% of large models. On finance and healthcare vertical-domain benchmarks, the models preserve generalization capabilities on par with mainstream large-parameter rerankers.
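The abstract names two standard training objectives: pointwise binary cross-entropy (BCE) against soft labels, and InfoNCE contrastive loss. As a rough illustration of what these two losses compute, here is a minimal sketch. It is not the authors' training code; the function names, the temperature value, and the way the losses would be weighted or combined are assumptions made for this example.

```python
import math


def bce_soft(pred, soft_label, eps=1e-12):
    """Binary cross-entropy between a predicted relevance score in (0, 1)
    and a soft label in [0, 1] (for example, a teacher-derived score)."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(soft_label * math.log(pred)
             + (1.0 - soft_label) * math.log(1.0 - pred))


def info_nce(pos_score, neg_scores, temperature=0.05):
    """InfoNCE: negative log-softmax of the positive candidate's score
    among the positive and all negatives. The temperature is hypothetical."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # A confident, correct prediction gives a smaller BCE than a wrong one.
    print(bce_soft(0.9, 1.0), bce_soft(0.1, 1.0))
    # A positive that outscores a hard negative gives a small InfoNCE loss.
    print(info_nce(0.8, [0.7, 0.2]))
```

In this sketch, BCE pulls each score toward its soft label independently, while InfoNCE only cares about how the positive ranks against the negatives, which is why the two are often treated as complementary.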

preprint, 2026, arXiv

Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such as agent specification, memory representation, interaction protocols, and environment design. Small perturbations that appear minor to researchers can cascade into macro-level outcomes through repeated interaction, creating a "butterfly effect." Consequently, scientific claims drawn from LLM social simulations may reflect implementation artifacts rather than the social mechanisms being modeled. We support this position with two case studies: a repeated Prisoner's Dilemma and a social media echo chamber simulation. Across multiple models, minor perturbations in persona format and game-instruction framing shift cooperation rates by up to 76 percentage points, while network homophily and hub assignment produce significant and consistent shifts in polarization metrics. We also find that sensitivity is unevenly distributed across both architectural choices and model families: the same perturbation that produces the 76 pp shift in one frontier model only shifts another by 1 pp. Robustness is therefore a property that should be measured per claim and per model, not assumed. To address this validation gap, we introduce TRAILS (Taxonomy for Robustness Audits In LLM Simulations), a robustness-audit taxonomy spanning three levels of simulation design: agent (micro-level), interaction (meso-level), and system (macro-level). We call for robustness to become a first-order validation requirement before LLM social simulations are used to explain mechanisms, evaluate interventions, or inform decisions.

preprint, 2025, arXiv

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

With the release of OpenAI's o1 model, reasoning models that adopt slow-thinking strategies have become increasingly common. Their outputs often contain complex reasoning, intermediate steps, and self-reflection, making existing evaluation methods and reward models inadequate. In particular, they struggle to judge answer equivalence and to reliably extract final answers from long, complex responses. To address this challenge, we propose xVerify, an efficient answer verifier for evaluating reasoning models. xVerify shows strong equivalence judgment capabilities, enabling accurate comparison between model outputs and reference answers across diverse question types. To train and evaluate xVerify, we construct the VAR dataset, which consists of question-answer pairs generated by multiple LLMs across various datasets. The dataset incorporates multiple reasoning models and challenging evaluation sets specifically designed for reasoning assessment, with a multi-round annotation process to ensure label quality. Based on VAR, we train xVerify models at different scales. Experimental results on both test and generalization sets show that all xVerify variants achieve over 95% F1 score and accuracy. Notably, the smallest model, xVerify-0.5B-I, outperforms all evaluation methods except GPT-4o, while xVerify-3B-Ib surpasses GPT-4o in overall performance. In addition, reinforcement learning experiments using xVerify as the reward model yield an 18.4% improvement for Qwen2.5-7B compared with direct generation, exceeding the gains achieved with Math Verify as the reward. These results demonstrate the effectiveness and generalizability of xVerify. All xVerify resources are available on GitHub (https://github.com/IAAR-Shanghai/xVerify).

preprint, 2024, arXiv

Parallel Spiking Neurons with High Efficiency and Ability to Learn Long-term Dependencies

Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset to a general formulation, we propose the Parallel Spiking Neuron (PSN), which generates hidden states that are independent of their predecessors, resulting in parallelizable neuronal dynamics and extremely high simulation speed. The weights of inputs in the PSN are fully connected, which maximizes the utilization of temporal information. To avoid the use of future inputs for step-by-step inference, the weights of the PSN can be masked, resulting in the masked PSN. By sharing weights across time-steps based on the masked PSN, the sliding PSN is proposed to handle sequences of varying lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first study about parallelizing spiking neurons and can be a cornerstone for the spiking deep learning research. Our code is available at https://github.com/fangwei123456/Parallel-Spiking-Neuron.
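To make the "non-iterative form" concrete, the sketch below compares a serial leaky integrator without reset against the same membrane potentials written as a weighted sum over inputs. The decay-based update rule, the fixed weight matrix, and all function names are assumptions chosen for illustration; in the paper the weights are learnable, and this is not the authors' implementation.

```python
def serial_potentials(inputs, decay):
    """Iterate H[t] = decay * H[t-1] + X[t], one step at a time (no reset)."""
    h, out = 0.0, []
    for x in inputs:
        h = decay * h + x
        out.append(h)
    return out


def causal_weights(n_steps, decay):
    """Lower-triangular W with W[t][i] = decay**(t - i) for i <= t.
    The zeros above the diagonal play the role of a mask on future inputs."""
    return [[decay ** (t - i) if i <= t else 0.0 for i in range(n_steps)]
            for t in range(n_steps)]


def parallel_potentials(inputs, weights):
    """H = W X: each H[t] depends only on the inputs, not on H[t-1],
    so every row could be evaluated independently (in parallel)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def spikes(potentials, threshold):
    """Fire wherever the potential reaches the threshold."""
    return [1 if h >= threshold else 0 for h in potentials]


if __name__ == "__main__":
    x = [1.0, 0.0, 2.0, 0.5]
    w = causal_weights(len(x), decay=0.5)
    print(serial_potentials(x, 0.5))
    print(parallel_potentials(x, w))
    print(spikes(parallel_potentials(x, w), threshold=1.0))
```

The two potential sequences agree because removing the reset makes the recurrence linear, so it unrolls into a fixed weighted sum of past inputs.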

preprint, 2021, arXiv

The design of the Ali CMB Polarization Telescope receiver

The Ali CMB Polarization Telescope (AliCPT-1) is the first CMB degree-scale polarimeter to be deployed on the Tibetan plateau at 5,250 m above sea level. AliCPT-1 is a 90/150 GHz, 72 cm aperture, two-lens refracting telescope cooled down to 4 K. Alumina lenses, 800 mm in diameter, image the CMB in a 33.4° field of view on a 636 mm wide focal plane. The modularized focal plane consists of dichroic polarization-sensitive Transition-Edge Sensors (TESes). Each module includes 1,704 optically active TESes fabricated on a 150 mm diameter silicon wafer. Each TES array is read out with a microwave multiplexing readout system capable of a multiplexing factor up to 2,048. Such a large multiplexing factor has allowed the practical deployment of tens of thousands of detectors, enabling the design of a receiver that can operate up to 19 TES arrays for a total of 32,376 TESes. AliCPT-1 leverages the technological advancements in the detector design from multiple generations of previously successful feedhorn-coupled polarimeters, and in the instrument design from BICEP-3, but applied on a larger scale. The cryostat receiver is currently under integration and testing. During the first deployment year, the focal plane will be populated with up to 4 TES arrays. Further TES arrays will be deployed in the following years, fully populating the focal plane with 19 arrays in the fourth deployment year. Here we present the AliCPT-1 receiver design, and how the design has been optimized to meet the experimental requirements.

preprint, 2020, arXiv

Modeling the broadest spectral band of the Crab nebula and constraining the ions acceleration efficiency

Although it is widely accepted that the electromagnetic spectrum from radio to very-high-energy $γ$-rays of pulsar wind nebulae (PWNe) originates from leptons, there is still an open question that protons (or more generally, ions) may exist in pulsar wind and are further accelerated in PWN. The broadband spectrum of the prototype PWN Crab, extended recently by the detection of the Tibet AS$γ$ and HAWC experiments above 100 TeV, may be helpful in constraining the acceleration efficiency of ions. Here, we model the broadest energy spectrum of Crab and find that the broadband spectrum can be explained by the one-zone leptonic model in which the electrons/positrons produce the emission from radio to soft $γ$-rays via the synchrotron process, and simultaneously generate the GeV-TeV $γ$-rays through inverse Compton scattering including the synchrotron self-Compton process. In the framework of this leptonic model, the fraction of energy converted into the energetic protons is constrained to be below $0.5\ (n_{\rm t}/10\ {\rm cm}^{-3})^{-1}$ per cent, where $n_{\rm t}$ is the target gas density in the Crab. However, this fraction can be up to $7\ (n_{\rm t}/10\ {\rm cm}^{-3})^{-1}$ per cent if only the $γ$-rays are used.

preprint, 2016, arXiv

Performance of new 8-inch photomultiplier tube used for the Tibet muon-detector array

A new hybrid experiment has been constructed by the Tibet AS$γ$ collaboration in Tibet, China, since 2014 to measure the chemical composition of cosmic rays around the "knee" over a wide energy range. It consists of a high-energy air-shower-core array (YAC-II), a high-density air-shower array (Tibet-III) and a large underground water-Cherenkov muon-detector array (MD). In order to obtain the primary proton, helium and iron spectra and their "knee" positions in the energy range below $10^{16}$ eV, each PMT installed in an MD cell is required, according to Monte Carlo simulations, to measure photon numbers over a wide dynamic range of 100 - $10^{6}$ photoelectrons (PEs). In this paper, we first compare the characteristic features of the R5912-PMT made by Japan Hamamatsu and the CR365-PMT made by Beijing Hamamatsu. This is the first comparison between R5912-PMT and CR365-PMT. If there is no serious difference, we will then add two 8-inch-diameter PMTs to each MD cell to meet our requirements, responsible for the ranges of 100 - 10000 PEs and 2000 - 1000000 PEs, respectively. That is, each MD cell is expected to be able to measure the number of muons over 6 orders of magnitude.

preprint, 2015, arXiv

Development of Yangbajing Air shower Core detector array for a new EAS hybrid Experiment

Aiming at the observation of cosmic-ray chemical composition in the "knee" energy region, we have been developing a new type of air-shower core detector (YAC, Yangbajing Air shower Core detector array) to be set up at Yangbajing (90.522$^\circ$ E, 30.102$^\circ$ N, 4300 m above sea level, atmospheric depth: 606 g/m$^2$) in Tibet, China. YAC works together with the Tibet air-shower array (Tibet-III) and an underground water Cherenkov muon detector array (MD) as a hybrid experiment. Each YAC detector unit consists of lead plates 3.5 cm thick and a scintillation counter which detects the burst size induced by high energy particles in the air-shower cores. The burst size can be measured from 1 MIP (Minimum Ionization Particle) to $10^{6}$ MIPs. The first phase of this experiment, named "YAC-I", consists of 16 YAC detectors, each 40 cm $\times$ 50 cm in size and distributed in a grid with an effective area of 10 m$^{2}$. YAC-I is used to check hadronic interaction models. The second phase of the experiment, called "YAC-II", consists of 124 YAC detectors with a coverage of about 500 m$^2$. The inner 100 detectors of 80 cm $\times$ 50 cm each are deployed in a 10 $\times$ 10 matrix with a 1.9 m separation, and the outer 24 detectors of 100 cm $\times$ 50 cm each are distributed around them to reject non-core events whose shower cores are far from the YAC-II array. YAC-II is used to study the primary cosmic-ray composition, in particular to obtain the energy spectra of proton, helium and iron nuclei between $5\times10^{13}$ eV and $10^{16}$ eV, covering the "knee" and also connecting with direct observations at energies around 100 TeV. We present the design and performance of YAC-II in this paper.

preprint, 2015, arXiv

Spectra of cosmic ray electrons and diffuse gamma rays with the constraints of AMS-02 and HESS data

Recently, AMS-02 reported their observed results on cosmic rays (CRs). In addition to the AMS-02 data, we add HESS data to estimate the spectra of CR electrons and the diffuse gamma rays above TeV. In the conventional diffusion model, a global analysis is performed on the spectral features of CR electrons and the diffuse gamma rays with the GALPROP package. The results show that the spectrum structure of the primary component of CR electrons cannot be fully reproduced by a simple power law and that the relevant break is around a hundred GeV. At 99\% C.L., the injection indices above the break decrease from 2.54 to 2.35, but the ones below the break are only in the range 2.746 - 2.751. The spectrum of CR electrons also does not need a TeV cutoff to match the features of the HESS data. Based on the difference between the fluxes of CR electrons and their primary component, the predicted excess of CR positrons is consistent with the interpretations as pulsar or dark matter. In the analysis of the Galactic diffuse gamma rays with the indirect constraint of AMS-02 and HESS data, it is found that the fluxes of Galactic diffuse gamma rays are consistent with the GeV data of Fermi-LAT in the high latitude regions. The results indicate that inverse Compton scattering (IC) is the dominant component in the range of the hundred GeV to tens of TeV respectively from the high latitude regions to the low ones, and that in all regions of the Galaxy the flux of diffuse gamma rays is less than that of CR electrons at the energy scale of 20 TeV.

preprint, 2014, arXiv

Very Long Baseline Interferometry Measured Proper Motion and Parallax of the $γ$-ray Millisecond Pulsar PSR J0218+4232

PSR J0218$+$4232 is a millisecond pulsar (MSP) with a flux density $\sim$ 0.9 mJy at 1.4 GHz. It is very bright in the high-energy X-ray and $γ$-ray domains. We conducted an astrometric program using the European VLBI Network (EVN) at 1.6 GHz to measure its proper motion and parallax. A model-independent distance would also help constrain its $γ$-ray luminosity. We achieved a detection of signal-to-noise ratio S/N > 37 for the weak pulsar in all five epochs. Using an extragalactic radio source lying 20 arcmin away from the pulsar, we estimate the pulsar's proper motion to be $μ_α\cosδ=5.35\pm0.05$ mas yr$^{-1}$ and $μ_δ=-3.74\pm 0.12$ mas yr$^{-1}$, and a parallax of $π=0.16\pm0.09$ mas. The very long baseline interferometry (VLBI) proper motion has significantly improved upon the estimates from long-term pulsar timing observations. The VLBI parallax provides the first model-independent distance constraints: $d=6.3^{+8.0}_{-2.3}$ kpc, with a corresponding $3σ$ lower-limit of $d=2.3$ kpc. This is the first pulsar trigonometric parallax measurement based solely on EVN observations. Using the derived distance, we believe that PSR J0218$+$4232 is the most energetic $γ$-ray MSP known to date. The luminosity based on even our 3$σ$ lower-limit distance is high enough to pose challenges to the conventional outer gap and slot gap models.
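As a quick consistency check on the quoted distance, a parallax in milliarcseconds inverts to a distance in kiloparsecs via d = 1/π. The sketch below applies that simple inversion to the values in the abstract. Treating the error bars as a plain inversion of π ± σ is an assumption of this example, and the authors' actual procedure may differ; the function and variable names are hypothetical.

```python
def distance_kpc(parallax_mas):
    """Distance in kpc from a trigonometric parallax in milliarcseconds."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive for a simple inversion")
    return 1.0 / parallax_mas


PARALLAX_MAS = 0.16  # parallax quoted in the abstract
SIGMA_MAS = 0.09     # quoted 1-sigma uncertainty

best = distance_kpc(PARALLAX_MAS)
far = distance_kpc(PARALLAX_MAS - SIGMA_MAS)   # smaller parallax, larger distance
near = distance_kpc(PARALLAX_MAS + SIGMA_MAS)  # larger parallax, smaller distance
lower_limit_3sigma = distance_kpc(PARALLAX_MAS + 3 * SIGMA_MAS)

if __name__ == "__main__":
    print(f"d = {best:.2f} kpc (+{far - best:.2f} / -{best - near:.2f})")
    print(f"3-sigma lower limit: {lower_limit_3sigma:.2f} kpc")
```

Under these assumptions the inversion agrees, to within rounding, with the quoted $d=6.3^{+8.0}_{-2.3}$ kpc and the $3σ$ lower limit of 2.3 kpc. The strongly asymmetric error bar follows directly from inverting a parallax whose uncertainty is a large fraction of its value.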

preprint, 2012, arXiv

Radio and Gamma-ray Pulsed Emission from Millisecond Pulsars

Pulsed gamma-ray emission from millisecond pulsars (MSPs) has been detected by the sensitive Fermi telescope, which sheds light on studies of the emission region and mechanism. In particular, the specific patterns of radio and gamma-ray emission from PSR J0101-6422 challenge the popular pulsar models, e.g. the outer gap and two-pole caustic (TPC) models. Using the three-dimensional (3D) annular gap model, we have jointly simulated radio and gamma-ray light curves for three representative MSPs (PSR J0034-0534, PSR J0101-6422 and PSR J0437-4715) with distinct radio phase lags and present the best simulated results for these MSPs, particularly for PSR J0101-6422 with complex radio and gamma-ray pulse profiles and for PSR J0437-4715 with a radio interpulse. It is found that both the gamma-ray and radio emission originate from the annular gap region located in only one magnetic pole, and the radio emission region is not primarily lower than the gamma-ray region in most cases. In addition, the annular gap model with a small magnetic inclination angle instead of an "orthogonal rotator" can account for MSPs' radio interpulse with a large phase separation from the main pulse. The annular gap model is a self-consistent model not only for young pulsars but also for MSPs, and multi-wavelength light curves can be fundamentally explained by this model.

preprint, 2011, arXiv

Optimal Filtration and a Pulsar Time Scale

An algorithm is proposed for constructing a group (ensemble) pulsar time based on the application of optimal Wiener filters. This algorithm makes it possible to separate the contributions of variations of the atomic time scale and of the pulsar rotation to barycentric residual deviations of the pulse arrival times. The method is applied to observations of the pulsars PSR B1855+09 and PSR B1937+21, and is used to obtain corrections to UTC relative to the group pulsar time PT$_{\rm ens}$. Direct comparison of the terrestrial time TT(BIPM06) and the group pulsar time PT$_{\rm ens}$ shows that they disagree by no more than $0.4\pm 0.17\; μ$s. Based on the fractional instability of the time difference TT(BIPM06) -- PT$_{\rm ens}$, a new limit for the energy density of the gravitational-wave background is established at the level $Ω_g {h}^2\sim 10^{-9}$.

preprint, 2010, arXiv

A Security Price Volatile Trading Conditioning Model

We develop a theoretical trading conditioning model subject to price volatility and return information in terms of market psychological behavior, based on analytical transaction volume-price probability wave distributions in which we use transaction volume probability to describe price volatility uncertainty and intensity. Applying the model to a test on high-frequency data from the Chinese stock market, our main findings are as follows: 1) there is, in general, a significant positive correlation between the rate of mean return and that of change in trading conditioning intensity; 2) the correlation, although positive, lacks significance in the two time intervals right before and just after bubble crashes; and 3) there is, notably, a significant negative correlation in a time interval when the SSE Composite Index is rising during a bull market. Our model and findings can test both the disposition effect and herd behavior simultaneously, and explain excessive trading (volume) and other anomalies in the stock market.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint