Researcher profile

Xintong Li

Xintong Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
19works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

19 published item(s)

preprint2026arXiv

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

Large Language Models (LLMs) are increasingly deployed as long-term interactive agents, yet their limited context windows make it difficult to sustain coherent behavior over extended interactions. Existing memory systems often store isolated records and retrieve fragments, limiting their ability to consolidate evolving user states and resolve conflicts. We introduce EverMemOS, a self-organizing memory operating system that implements an engram-inspired lifecycle for computational memory. Episodic Trace Formation converts dialogue streams into MemCells that capture episodic traces, atomic facts, and time-bounded Foresight signals. Semantic Consolidation organizes MemCells into thematic MemScenes, distilling stable semantic structures and updating user profiles. Reconstructive Recollection performs MemScene-guided agentic retrieval to compose the necessary and sufficient context for downstream reasoning. Experiments on LoCoMo and LongMemEval show that EverMemOS achieves state-of-the-art performance on memory-augmented reasoning tasks. We further report a profile study on PersonaMem v2 and qualitative case studies illustrating chat-oriented capabilities such as user profiling and Foresight. Code is available at https://github.com/EverMind-AI/EverMemOS.

preprint2026arXiv

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

Traditional retrieval pipelines optimize utility through stages of candidate retrieval and reranking, where ranking operates over a predefined candidate set. Large Language Models (LLMs) broaden this into a generative process: given a candidate pool, an LLM can generate a subset and order it within a single autoregressive pass. However, this flexibility introduces a new optimization challenge: the model must search a combinatorial output space while receiving utility feedback only after the full ranked list is generated. Because this feedback is defined over the completed sequence, it cannot distinguish whether a poor result arises from failing to generate a relevant subset or from failing to rank that subset correctly. This credit assignment gap makes end-to-end optimization unstable and sample-inefficient. Existing systems often address this by separating candidate generation from ranking. However, such decoupling remains misaligned with downstream utility because ranking is limited by the candidate set it receives. To bridge this gap, we propose a unified framework that performs both within a single autoregressive rollout and optimizes them end-to-end via factorized group-relative policy optimization (F-GRPO). Our framework factorizes the policy into candidate generation and ranking while sharing a single LLM backbone, and jointly trains them with an order-invariant coverage reward and a position-aware utility reward. To address the resulting phase-specific credit assignment problem, we use separate group-relative advantages for generation and ranking within a two-phase sequence-level objective. Across sequential recommendation and multi-hop question answering benchmarks, F-GRPO improves top-ranked performance over GRPO and decoupled baselines, outperforms supervised alternatives, and remains competitive with strong zero-shot rerankers, with no architectural changes at inference time.

preprint2026arXiv

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

Enabling Large Language Models (LLMs) to continuously improve from environmental interactions is a central challenge in post-training. While on-policy self-distillation offers a promising paradigm, existing methods predominantly treat environmental feedback as a passive conditioning signal. Consequently, they heavily rely on successful demonstrations and struggle to learn in rare-success regimes. To bridge this gap, we introduce Reflection-Enhanced Self-Distillation (RESD), a framework that transforms raw failure feedback into an active source of corrective supervision. Instead of passively appending feedback, RESD interprets failed trajectories by generating retrospective reflections to diagnose local errors, and curates a persistent global playbook to preserve reusable lessons across training steps. The enriched context enables the self-teacher to provide actionable token-level supervision even in the absence of successful rollouts. Empirical evaluations on multiple continual learning tasks demonstrate that RESD substantially outperforms standard self-distillation baselines. Furthermore, RESD achieves significantly faster early-stage improvement than GRPO with $8\times$ samples using only a single rollout per prompt, highlighting its superior interaction efficiency.

preprint2026arXiv

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

Multi-negative preference optimization under the Plackett--Luce (PL) model extends Direct Preference Optimization (DPO) by leveraging comparative signals across one preferred and multiple rejected responses. However, optimizing over large negative pools is costly, and many candidates contribute redundant gradients due to their similar effects on policy updates. We introduce MASS-DPO, a multi-negative active sample selection method that derives a PL-specific Fisher-information objective for selecting compact, informative negative subsets within each prompt. The resulting log-determinant objective selects negatives that contribute complementary information for policy updates, yielding compact subsets that retain the full pool's information while reducing redundancy. In practice, this favors negatives whose gradients cover different update directions, reducing redundant signal from near-duplicate candidates while preserving the most useful training information. Across four benchmarks spanning recommendation and multiple-choice QA and three model families, MASS-DPO consistently exceeds or matches existing methods in accuracy, improves Recall/NDCG and margin-based optimization dynamics, and delivers stronger alignment with substantially fewer negatives.

preprint2026arXiv

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability. Despite this need for deployment-time improvement, existing inference-time adaptation methods for LLM agents mainly rely on prompting or retrieval, which influence behavior indirectly through context manipulation. For ReAct-style agents, such approaches do not expose an explicit decision layer that can score candidate actions, represent uncertainty, or be updated online from action-level feedback. As a result, they provide limited support for trackable, fine-grained, and uncertainty-aware adaptation during deployment. We propose OLIVIA, an inference-time action adaptation framework for ReAct-style agents. OLIVIA models the LLM's final action-selection layer as a contextual linear bandit over candidate actions, with frozen hidden states as decision contexts. This choice is particularly suitable for deployment because it adapts behavior directly at the action-selection interface, preserves the underlying reasoning process, and provides explicit uncertainty estimates and lightweight online updates from action-level feedback. With upper-confidence-bound exploration, OLIVIA improves the policy sample-efficiently with minimal computational overhead. We instantiate OLIVIA on four benchmarks and show that it consistently improves task performance over static ReAct and prompt-based inference-time baselines. Our results suggest that explicit online decision layers provide an effective alternative to purely prompt- or retrieval-based adaptation for LLM agents during deployment.

preprint2026arXiv

SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes

Multimodal large language models often struggle with faithful reasoning in complex visual scenes, where intricate entities and relations require precise visual grounding at each step. This reasoning unfaithfulness frequently manifests as hallucinated entities, mis-grounded relations, skipped steps, and over-specified reasoning. Existing preference-based approaches, typically relying on textual perturbations or answer-conditioned rationales, fail to address this challenge as they allow models to exploit language priors to bypass visual grounding. To address this, we propose SceneAlign, a framework that leverages scene graphs as structured visual information to perform controllable structural interventions. By identifying reasoning-critical nodes and perturbing them through four targeted strategies that mimic typical grounding failures, SceneAlign constructs hard negative rationales that remain linguistically plausible but are grounded in inaccurate visual facts. These contrastive pairs are used in Direct Preference Optimization to steer models toward fine-grained, structure-faithful reasoning. Across seven visual reasoning benchmarks, SceneAlign consistently improves answer accuracy and reasoning faithfulness, highlighting the effectiveness of grounding-aware alignment for multimodal reasoning.

preprint2026arXiv

Skill-R1: Agent Skill Evolution via Reinforcement Learning

Agentic large language models often rely on skills, reusable natural language procedures that guide planning, action, and tool use. In practice, skills are typically improved through prompt engineering or by aligning the task LLM itself, which is costly, model-specific, and often infeasible for closed-source models. Skill optimization is not a one-step problem but a recurrent process with two coupled levels of credit assignment: a useful skill must improve rollout quality under current conditioning, while a useful revision must turn observed outcomes into a better skill for the next round. We propose Skill-R1, a reinforcement learning framework for instance-level recurrent skill optimization from verifiable rewards. Rather than updating the task LLM, Skill-R1 trains a lightweight skill generator that conditions on the task context, prior rollouts, and their verified outcomes to produce skills that steer a frozen task LLM. This preserves black-box compatibility with both open- and closed-source models while making adaptation substantially cheaper than model-level updates. Skill-R1 proceeds over multiple generations: at each step, the current skill induces rollouts whose verified outcomes are fed back to produce the next revision. To optimize this recurrent process, we introduce a bi-level group-relative policy optimization objective combining intra-generation and inter-generation advantages. The intra-generation term compares rollouts under shared skill conditioning, while the inter-generation term rewards revisions that improve behavior across successive generations. Together, these provide a principled objective for directional skill evolution rather than one-shot self-refinement. Empirically, Skill-R1 achieves consistent gains over no-skill baselines and standard GRPO across benchmarks with verifiable rewards, with particularly strong improvements on complex, multi-step tasks.

preprint2022arXiv

A State-of-the-art Survey of Artificial Neural Networks for Whole-slide Image Analysis:from Popular Convolutional Neural Networks to Potential Visual Transformers

To increase the objectivity and accuracy of pathologists' work, artificial neural network(ANN) methods have been generally needed in the segmentation, classification, and detection of histopathological WSI. In this paper, WSI analysis methods based on ANN are reviewed. Firstly, the development status of WSI and ANN methods is introduced. Secondly, we summarize the common ANN methods. Next, we discuss publicly available WSI datasets and evaluation metrics. These ANN architectures for WSI processing are divided into classical neural networks and deep neural networks(DNNs) and then analyzed. Finally, the application prospect of the analytical method in this field is discussed. The important potential method is Visual Transformers.

preprint2022arXiv

A State-of-the-art Survey of U-Net in Microscopic Image Analysis: from Simple Usage to Structure Mortification

Image analysis technology is used to solve the inadvertences of artificial traditional methods in disease, wastewater treatment, environmental change monitoring analysis and convolutional neural networks (CNN) play an important role in microscopic image analysis. An important step in detection, tracking, monitoring, feature extraction, modeling and analysis is image segmentation, in which U-Net has increasingly applied in microscopic image segmentation. This paper comprehensively reviews the development history of U-Net, and analyzes various research results of various segmentation methods since the emergence of U-Net and conducts a comprehensive review of related papers. First, this paper has summarized the improved methods of U-Net and then listed the existing significance of image segmentation techniques and their improvements that has introduced over the years. Finally, focusing on the different improvement strategies of U-Net in different papers, the related work of each application target is reviewed according to detailed technical categories to facilitate future research. Researchers can clearly see the dynamics of transmission of technological development and keep up with future trends in this interdisciplinary field.

preprint2022arXiv

A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation. However, all the above tasks are in the direction of speech understanding, but for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-quality speech. To address this problem, we propose our framework, Alignment-Aware Acoustic-Text Pretraining (A$^3$T), which reconstructs masked acoustic signals with text input and acoustic-text alignment during training. In this way, the pretrained model can generate high quality reconstructed spectrogram, which can be applied to the speech editing and unseen speaker TTS directly. Experiments show A$^3$T outperforms SOTA models on speech editing, and improves multi-speaker speech synthesis without the external speaker verification model.

preprint2022arXiv

All Electrical Control and Temperature Dependence of the Spin and Valley Hall Effect in Monolayer WSe2 Transistors

Heavy metal-based two-dimensional van der Waals materials have a large, coupled spin and valley Hall effect (SVHE) that has potential use in spintronics and valleytronics. Optical measurements of the SVHE have largely been performed below 30 K and understanding of the SVHE-induced spin/valley polarizations that can be electrically generated is limited. Here, we study the SVHE in monolayer p-type tungsten diselenide (WSe2). Kerr rotation (KR) measurements show the spatial distribution of the SVHE at different temperatures, its persistence up to 160 K, and that it can be electrically modulated via gate and drain bias. A spin/valley drift and diffusion model together with reflection spectra data is used to interpret the KR data and predict a lower-bound spin/valley lifetime of 4.1 ns below 90 K and 0.26 ns at 160 K. The excess spin and valley per unit length along the edge is calculated to be 109 per micron at 45 K, which corresponds to a spin/valley polarization on the edge of 6%. These results are important steps towards practical use of the SVHE.

preprint2022arXiv

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the development and research of speech processing technologies by providing an easy-to-use command-line interface and a simple code structure. This paper describes the design philosophy and core architecture of PaddleSpeech to support several essential speech-to-text and text-to-speech tasks. PaddleSpeech achieves competitive or state-of-the-art performance on various speech datasets and implements the most popular methods. It also provides recipes and pretrained models to quickly reproduce the experimental results in this paper. PaddleSpeech is publicly avaiable at https://github.com/PaddlePaddle/PaddleSpeech.

preprint2022arXiv

What Can Machine Vision Do for Lymphatic Histopathology Image Analysis: A Comprehensive Review

In the past ten years, the computing power of machine vision (MV) has been continuously improved, and image analysis algorithms have developed rapidly. At the same time, histopathological slices can be stored as digital images. Therefore, MV algorithms can provide doctors with diagnostic references. In particular, the continuous improvement of deep learning algorithms has further improved the accuracy of MV in disease detection and diagnosis. This paper reviews the applications of image processing technology based on MV in lymphoma histopathological images in recent years, including segmentation, classification and detection. Finally, the current methods are analyzed, some more potential methods are proposed, and further prospects are made.

preprint2021arXiv

A Comprehensive Review of Computer-aided Whole-slide Image Analysis: from Datasets to Feature Extraction, Segmentation, Classification, and Detection Approaches

With the development of computer-aided diagnosis (CAD) and image scanning technology, Whole-slide Image (WSI) scanners are widely used in the field of pathological diagnosis. Therefore, WSI analysis has become the key to modern digital pathology. Since 2004, WSI has been used more and more in CAD. Since machine vision methods are usually based on semi-automatic or fully automatic computers, they are highly efficient and labor-saving. The combination of WSI and CAD technologies for segmentation, classification, and detection helps histopathologists obtain more stable and quantitative analysis results, save labor costs and improve diagnosis objectivity. This paper reviews the methods of WSI analysis based on machine learning. Firstly, the development status of WSI and CAD methods are introduced. Secondly, we discuss publicly available WSI datasets and evaluation metrics for segmentation, classification, and detection tasks. Then, the latest development of machine learning in WSI segmentation, classification, and detection are reviewed continuously. Finally, the existing methods are studied, the applicabilities of the analysis methods are analyzed, and the application prospects of the analysis methods in this field are forecasted.

preprint2021arXiv

Evolution of Charge and Pair Density Modulations in Overdoped Bi2Sr2CuO6+delta

One of the central issues concerning the mechanism of high temperature superconductivity in cuprates is the nature of the ubiquitous charge order and its implications to superconductivity. Here we use scanning tunneling microscopy to investigate the evolution of charge order from the optimally doped to strongly overdoped Bi2Sr2CuO6+δ cuprates. We find that with increasing hole concentration, the long-range checkerboard order gradually evolves into short-range glassy patterns consisting of diluted charge puddles. Each charge puddle has a unidirectional nematic internal structure, and exhibits clear pair density modulations as revealed by the spatial variations of superconducting coherence peak and gap depth. Both the charge puddles and the nematicity vanish completely in the strongly overdoped non-superconducting regime, when another type of short-range order with root2 * root2 periodicity emerges. These results shed important new lights on the intricate interplay between the intertwined orders and the superconducting phase of cuprates.

preprint2021arXiv

Particle-hole asymmetric superconducting coherence peaks in overdoped cuprates

To elucidate the superconductor to metal transition at the end of superconducting dome, the overdoped regime has stepped onto the center stage of cuprate research recently. Here, we use scanning tunneling microscopy to investigate the atomic-scale electronic structure of overdoped trilayer Bi-2223 and bilayer Bi-2212 cuprates. At low energies the spectroscopic maps are well described by dispersive quasiparticle interference patterns. However, as the bias increases to the superconducting coherence peak energy, a virtually non-dispersive pattern with sqrt(2)*sqrt(2) periodicity emerges. Remarkably, the position of the coherence peaks exhibits evident particle-hole asymmetry which also modulates with the same period. We propose that this is an extreme quasiparticle interference phenomenon, caused by pairing-breaking scattering between flat anti-nodal Bogoliubov bands, which is ultimately responsible for the superconductor to metal transition.

preprint2020arXiv

Machine-learning classifiers for logographic name matching in public health applications: approaches for incorporating phonetic, visual, and keystroke similarity in large-scale probabilistic record linkage

Approximate string-matching methods to account for complex variation in highly discriminatory text fields, such as personal names, can enhance probabilistic record linkage. However, discriminating between matching and non-matching strings is challenging for logographic scripts, where similarities in pronunciation, appearance, or keystroke sequence are not directly encoded in the string data. We leverage a large Chinese administrative dataset with known match status to develop logistic regression and Xgboost classifiers integrating measures of visual, phonetic, and keystroke similarity to enhance identification of potentially-matching name pairs. We evaluate three methods of leveraging name similarity scores in large-scale probabilistic record linkage, which can adapt to varying match prevalence and information in supporting fields: (1) setting a threshold score based on predicted quality of name-matching across all record pairs; (2) setting a threshold score based on predicted discriminatory power of the linkage model; and (3) using empirical score distributions among matches and nonmatches to perform Bayesian adjustment of matching probabilities estimated from exact-agreement linkage. In experiments on holdout data, as well as data simulated with varying name error rates and supporting fields, a logistic regression classifier incorporated via the Bayesian method demonstrated marked improvements over exact-agreement linkage with respect to discriminatory power, match probability estimation, and accuracy, reducing the total number of misclassified record pairs by 21% in test data and up to an average of 93% in simulated datasets. Our results demonstrate the value of incorporating visual, phonetic, and keystroke similarity for logographic name matching, as well as the promise of our Bayesian approach to leverage name-matching within large-scale record linkage.

preprint2020arXiv

Regularized Context Gates on Transformer for Machine Translation

Context gates are effective to control the contributions from the source and target contexts in the recurrent neural network (RNN) based neural machine translation (NMT). However, it is challenging to extend them into the advanced Transformer architecture, which is more complicated than RNN. This paper first provides a method to identify source and target contexts and then introduce a gate mechanism to control the source and target contributions in Transformer. In addition, to further reduce the bias problem in the gate mechanism, this paper proposes a regularization method to guide the learning of the gates with supervision automatically generated using pointwise mutual information. Extensive experiments on 4 translation datasets demonstrate that the proposed model obtains an averaged gain of 1.0 BLEU score over a strong Transformer baseline.

preprint2019arXiv

Effect of structural supermodulation on superconductivity in tri-layer cuprate Bi2Sr2Ca2Cu3O10+x

We investigate the spatial and doping evolutions of the superconducting properties of tri-layer cuprate Bi2Sr2Ca2Cu3O10+x by using scanning tunneling microscopy and spectroscopy. Both the superconducting coherence peak and gap size exhibit periodic variations with the structural supermodulation, but the effect is much more pronounced in the underdoped regime than at optimal doping. Moreover, a new type of tunneling spectrum characterized by two superconducting gaps emerges with increasing doping, and the two-gap features also correlate with the supermodulation. We propose that the interaction between the inequivalent outer and inner CuO2 planes is responsible for these novel features that are unique to tri-layer cuprates.