Source author record

Xintong Li

Xintong Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Computation and Language, Computer Vision, cond-mat.supr-con, Artificial Intelligence, cond-mat.str-el, eess.AS, eess.IV, Sound, Applications, cond-mat.mes-hall, cond-mat.mtrl-sci, cond-mat.stat-mech, Information Retrieval

Catalog footprint

What is connected

22 works

14 topics

4 close collaborators

preprint, 2026, arXiv

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

Large Language Models (LLMs) are increasingly deployed as long-term interactive agents, yet their limited context windows make it difficult to sustain coherent behavior over extended interactions. Existing memory systems often store isolated records and retrieve fragments, limiting their ability to consolidate evolving user states and resolve conflicts. We introduce EverMemOS, a self-organizing memory operating system that implements an engram-inspired lifecycle for computational memory. Episodic Trace Formation converts dialogue streams into MemCells that capture episodic traces, atomic facts, and time-bounded Foresight signals. Semantic Consolidation organizes MemCells into thematic MemScenes, distilling stable semantic structures and updating user profiles. Reconstructive Recollection performs MemScene-guided agentic retrieval to compose the necessary and sufficient context for downstream reasoning. Experiments on LoCoMo and LongMemEval show that EverMemOS achieves state-of-the-art performance on memory-augmented reasoning tasks. We further report a profile study on PersonaMem v2 and qualitative case studies illustrating chat-oriented capabilities such as user profiling and Foresight. Code is available at https://github.com/EverMind-AI/EverMemOS.

preprint, 2026, arXiv

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

Traditional retrieval pipelines optimize utility through stages of candidate retrieval and reranking, where ranking operates over a predefined candidate set. Large Language Models (LLMs) broaden this into a generative process: given a candidate pool, an LLM can generate a subset and order it within a single autoregressive pass. However, this flexibility introduces a new optimization challenge: the model must search a combinatorial output space while receiving utility feedback only after the full ranked list is generated. Because this feedback is defined over the completed sequence, it cannot distinguish whether a poor result arises from failing to generate a relevant subset or from failing to rank that subset correctly. This credit assignment gap makes end-to-end optimization unstable and sample-inefficient. Existing systems often address this by separating candidate generation from ranking. However, such decoupling remains misaligned with downstream utility because ranking is limited by the candidate set it receives. To bridge this gap, we propose a unified framework that performs both within a single autoregressive rollout and optimizes them end-to-end via factorized group-relative policy optimization (F-GRPO). Our framework factorizes the policy into candidate generation and ranking while sharing a single LLM backbone, and jointly trains them with an order-invariant coverage reward and a position-aware utility reward. To address the resulting phase-specific credit assignment problem, we use separate group-relative advantages for generation and ranking within a two-phase sequence-level objective. Across sequential recommendation and multi-hop question answering benchmarks, F-GRPO improves top-ranked performance over GRPO and decoupled baselines, outperforms supervised alternatives, and remains competitive with strong zero-shot rerankers, with no architectural changes at inference time.
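The idea of separate group-relative advantages for the two phases can be illustrated with a minimal sketch. It assumes the common GRPO-style standardization of rewards within a group of rollouts for the same prompt; the function name, the reward values, and the `eps` constant are hypothetical, and the abstract does not give the exact objective.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when all rewards in the group are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four rollouts of a single prompt.
coverage_rewards = [1.0, 0.5, 0.5, 0.0]  # order-invariant: subset coverage
utility_rewards = [0.9, 0.2, 0.6, 0.1]   # position-aware: quality of the ranking

# Each phase gets its own advantage signal, computed from its own reward.
generation_adv = group_relative_advantages(coverage_rewards)
ranking_adv = group_relative_advantages(utility_rewards)
```

Under this assumption, tokens from the generation phase would be weighted by `generation_adv` and tokens from the ranking phase by `ranking_adv`, so a rollout with good coverage but a poor ordering is not penalized uniformly.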

preprint, 2026, arXiv

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

Enabling Large Language Models (LLMs) to continuously improve from environmental interactions is a central challenge in post-training. While on-policy self-distillation offers a promising paradigm, existing methods predominantly treat environmental feedback as a passive conditioning signal. Consequently, they heavily rely on successful demonstrations and struggle to learn in rare-success regimes. To bridge this gap, we introduce Reflection-Enhanced Self-Distillation (RESD), a framework that transforms raw failure feedback into an active source of corrective supervision. Instead of passively appending feedback, RESD interprets failed trajectories by generating retrospective reflections to diagnose local errors, and curates a persistent global playbook to preserve reusable lessons across training steps. The enriched context enables the self-teacher to provide actionable token-level supervision even in the absence of successful rollouts. Empirical evaluations on multiple continual learning tasks demonstrate that RESD substantially outperforms standard self-distillation baselines. Furthermore, RESD achieves significantly faster early-stage improvement than GRPO with $8\times$ samples using only a single rollout per prompt, highlighting its superior interaction efficiency.

preprint, 2026, arXiv

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

Multi-negative preference optimization under the Plackett--Luce (PL) model extends Direct Preference Optimization (DPO) by leveraging comparative signals across one preferred and multiple rejected responses. However, optimizing over large negative pools is costly, and many candidates contribute redundant gradients due to their similar effects on policy updates. We introduce MASS-DPO, a multi-negative active sample selection method that derives a PL-specific Fisher-information objective for selecting compact, informative negative subsets within each prompt. The resulting log-determinant objective selects negatives that contribute complementary information for policy updates, yielding compact subsets that retain the full pool's information while reducing redundancy. In practice, this favors negatives whose gradients cover different update directions, reducing redundant signal from near-duplicate candidates while preserving the most useful training information. Across four benchmarks spanning recommendation and multiple-choice QA and three model families, MASS-DPO consistently exceeds or matches existing methods in accuracy, improves Recall/NDCG and margin-based optimization dynamics, and delivers stronger alignment with substantially fewer negatives.
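As a small illustration of the Plackett-Luce setting with one preferred and several rejected responses, the sketch below computes the log-probability that the preferred response is ranked first, which is a softmax over the scores. The function name and the score values are hypothetical, and the sketch does not implement the negative-selection step that MASS-DPO contributes.

```python
import math

def pl_top1_log_prob(preferred_score, rejected_scores):
    """Log-probability under Plackett-Luce that the preferred item ranks first."""
    scores = [preferred_score] + list(rejected_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return preferred_score - log_norm

# Hypothetical scores: one preferred response against three negatives.
loss = -pl_top1_log_prob(1.5, [0.2, 0.1, -0.4])
```

Every extra negative adds a term to the normalizer, which is why near-duplicate negatives raise cost without adding much new signal.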

preprint, 2026, arXiv

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability. Despite this need for deployment-time improvement, existing inference-time adaptation methods for LLM agents mainly rely on prompting or retrieval, which influence behavior indirectly through context manipulation. For ReAct-style agents, such approaches do not expose an explicit decision layer that can score candidate actions, represent uncertainty, or be updated online from action-level feedback. As a result, they provide limited support for trackable, fine-grained, and uncertainty-aware adaptation during deployment. We propose OLIVIA, an inference-time action adaptation framework for ReAct-style agents. OLIVIA models the LLM's final action-selection layer as a contextual linear bandit over candidate actions, with frozen hidden states as decision contexts. This choice is particularly suitable for deployment because it adapts behavior directly at the action-selection interface, preserves the underlying reasoning process, and provides explicit uncertainty estimates and lightweight online updates from action-level feedback. With upper-confidence-bound exploration, OLIVIA improves the policy sample-efficiently with minimal computational overhead. We instantiate OLIVIA on four benchmarks and show that it consistently improves task performance over static ReAct and prompt-based inference-time baselines. Our results suggest that explicit online decision layers provide an effective alternative to purely prompt- or retrieval-based adaptation for LLM agents during deployment.
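A contextual linear bandit with upper-confidence-bound exploration can be sketched in a few lines. This is a generic LinUCB-style learner with a shared weight vector, not OLIVIA's implementation: the class name, the hyperparameters `alpha` and `ridge`, and the two-dimensional feature vectors standing in for frozen hidden states are all assumptions.

```python
import math

class LinUCB:
    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # exploration strength
        # Inverse of the regularized design matrix A = ridge * I + sum of x x^T.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def score(self, x):
        a_inv_x = self._matvec(x)
        theta = self._matvec(self.b)  # ridge estimate A^{-1} b
        estimate = sum(t * xi for t, xi in zip(theta, x))
        bonus = self.alpha * math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return estimate + bonus

    def select(self, candidates):
        # Pick the candidate action with the highest upper confidence bound.
        return max(range(len(candidates)), key=lambda i: self.score(candidates[i]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after adding x x^T to A.
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]

# Hypothetical usage: two candidate actions, the second one is always rewarded.
bandit = LinUCB(dim=2, alpha=0.5)
candidates = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(20):
    bandit.update(candidates[1], reward=1.0)
    bandit.update(candidates[0], reward=0.0)
choice = bandit.select(candidates)
```

The rank-one update keeps each feedback step cheap, which matches the abstract's emphasis on lightweight online updates from action-level feedback.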

preprint, 2026, arXiv

SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes

Multimodal large language models often struggle with faithful reasoning in complex visual scenes, where intricate entities and relations require precise visual grounding at each step. This reasoning unfaithfulness frequently manifests as hallucinated entities, mis-grounded relations, skipped steps, and over-specified reasoning. Existing preference-based approaches, typically relying on textual perturbations or answer-conditioned rationales, fail to address this challenge as they allow models to exploit language priors to bypass visual grounding. To address this, we propose SceneAlign, a framework that leverages scene graphs as structured visual information to perform controllable structural interventions. By identifying reasoning-critical nodes and perturbing them through four targeted strategies that mimic typical grounding failures, SceneAlign constructs hard negative rationales that remain linguistically plausible but are grounded in inaccurate visual facts. These contrastive pairs are used in Direct Preference Optimization to steer models toward fine-grained, structure-faithful reasoning. Across seven visual reasoning benchmarks, SceneAlign consistently improves answer accuracy and reasoning faithfulness, highlighting the effectiveness of grounding-aware alignment for multimodal reasoning.

preprint, 2026, arXiv

Skill-R1: Agent Skill Evolution via Reinforcement Learning

Agentic large language models often rely on skills, reusable natural language procedures that guide planning, action, and tool use. In practice, skills are typically improved through prompt engineering or by aligning the task LLM itself, which is costly, model-specific, and often infeasible for closed-source models. Skill optimization is not a one-step problem but a recurrent process with two coupled levels of credit assignment: a useful skill must improve rollout quality under current conditioning, while a useful revision must turn observed outcomes into a better skill for the next round. We propose Skill-R1, a reinforcement learning framework for instance-level recurrent skill optimization from verifiable rewards. Rather than updating the task LLM, Skill-R1 trains a lightweight skill generator that conditions on the task context, prior rollouts, and their verified outcomes to produce skills that steer a frozen task LLM. This preserves black-box compatibility with both open- and closed-source models while making adaptation substantially cheaper than model-level updates. Skill-R1 proceeds over multiple generations: at each step, the current skill induces rollouts whose verified outcomes are fed back to produce the next revision. To optimize this recurrent process, we introduce a bi-level group-relative policy optimization objective combining intra-generation and inter-generation advantages. The intra-generation term compares rollouts under shared skill conditioning, while the inter-generation term rewards revisions that improve behavior across successive generations. Together, these provide a principled objective for directional skill evolution rather than one-shot self-refinement. Empirically, Skill-R1 achieves consistent gains over no-skill baselines and standard GRPO across benchmarks with verifiable rewards, with particularly strong improvements on complex, multi-step tasks.

preprint, 2022, arXiv

A State-of-the-art Survey of Artificial Neural Networks for Whole-slide Image Analysis:from Popular Convolutional Neural Networks to Potential Visual Transformers

To increase the objectivity and accuracy of pathologists' work, artificial neural network (ANN) methods are generally needed in the segmentation, classification, and detection of histopathological WSI. In this paper, WSI analysis methods based on ANN are reviewed. Firstly, the development status of WSI and ANN methods is introduced. Secondly, we summarize the common ANN methods. Next, we discuss publicly available WSI datasets and evaluation metrics. These ANN architectures for WSI processing are divided into classical neural networks and deep neural networks (DNNs) and then analyzed. Finally, the application prospect of the analytical method in this field is discussed. Visual Transformers are identified as an important potential method.

preprint, 2022, arXiv

A State-of-the-art Survey of U-Net in Microscopic Image Analysis: from Simple Usage to Structure Mortification

Image analysis technology is used to overcome the shortcomings of traditional manual methods in disease analysis, wastewater treatment, and environmental change monitoring, and convolutional neural networks (CNNs) play an important role in microscopic image analysis. An important step in detection, tracking, monitoring, feature extraction, modeling and analysis is image segmentation, in which U-Net has been increasingly applied. This paper comprehensively reviews the development history of U-Net and analyzes the research results of various segmentation methods since the emergence of U-Net. First, the paper summarizes the improved methods of U-Net and then lists the significance of existing image segmentation techniques and the improvements that have been introduced over the years. Finally, focusing on the different improvement strategies of U-Net in different papers, the related work of each application target is reviewed according to detailed technical categories to facilitate future research. Researchers can clearly see how the technology has developed and keep up with future trends in this interdisciplinary field.

preprint, 2022, arXiv

A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation. However, all the above tasks are in the direction of speech understanding; for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-quality speech. To address this problem, we propose our framework, Alignment-Aware Acoustic-Text Pretraining (A$^3$T), which reconstructs masked acoustic signals with text input and acoustic-text alignment during training. In this way, the pretrained model can generate high-quality reconstructed spectrograms, which can be applied directly to speech editing and unseen-speaker TTS. Experiments show A$^3$T outperforms SOTA models on speech editing, and improves multi-speaker speech synthesis without the external speaker verification model.

preprint, 2022, arXiv

All Electrical Control and Temperature Dependence of the Spin and Valley Hall Effect in Monolayer WSe2 Transistors

Heavy metal-based two-dimensional van der Waals materials have a large, coupled spin and valley Hall effect (SVHE) that has potential use in spintronics and valleytronics. Optical measurements of the SVHE have largely been performed below 30 K and understanding of the SVHE-induced spin/valley polarizations that can be electrically generated is limited. Here, we study the SVHE in monolayer p-type tungsten diselenide (WSe2). Kerr rotation (KR) measurements show the spatial distribution of the SVHE at different temperatures, its persistence up to 160 K, and that it can be electrically modulated via gate and drain bias. A spin/valley drift and diffusion model together with reflection spectra data is used to interpret the KR data and predict a lower-bound spin/valley lifetime of 4.1 ns below 90 K and 0.26 ns at 160 K. The excess spin and valley per unit length along the edge is calculated to be 109 per micron at 45 K, which corresponds to a spin/valley polarization on the edge of 6%. These results are important steps towards practical use of the SVHE.

preprint, 2022, arXiv

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the development and research of speech processing technologies by providing an easy-to-use command-line interface and a simple code structure. This paper describes the design philosophy and core architecture of PaddleSpeech to support several essential speech-to-text and text-to-speech tasks. PaddleSpeech achieves competitive or state-of-the-art performance on various speech datasets and implements the most popular methods. It also provides recipes and pretrained models to quickly reproduce the experimental results in this paper. PaddleSpeech is publicly available at https://github.com/PaddlePaddle/PaddleSpeech.

preprint, 2022, arXiv

What Can Machine Vision Do for Lymphatic Histopathology Image Analysis: A Comprehensive Review

In the past ten years, the computing power of machine vision (MV) has been continuously improved, and image analysis algorithms have developed rapidly. At the same time, histopathological slices can be stored as digital images. Therefore, MV algorithms can provide doctors with diagnostic references. In particular, the continuous improvement of deep learning algorithms has further improved the accuracy of MV in disease detection and diagnosis. This paper reviews the applications of image processing technology based on MV in lymphoma histopathological images in recent years, including segmentation, classification and detection. Finally, the current methods are analyzed, some more potential methods are proposed, and further prospects are made.

preprint, 2021, arXiv

A Comprehensive Review of Computer-aided Whole-slide Image Analysis: from Datasets to Feature Extraction, Segmentation, Classification, and Detection Approaches

With the development of computer-aided diagnosis (CAD) and image scanning technology, Whole-slide Image (WSI) scanners are widely used in the field of pathological diagnosis. Therefore, WSI analysis has become the key to modern digital pathology. Since 2004, WSI has been used more and more in CAD. Since machine vision methods are usually based on semi-automatic or fully automatic computers, they are highly efficient and labor-saving. The combination of WSI and CAD technologies for segmentation, classification, and detection helps histopathologists obtain more stable and quantitative analysis results, save labor costs and improve diagnosis objectivity. This paper reviews the methods of WSI analysis based on machine learning. Firstly, the development status of WSI and CAD methods is introduced. Secondly, we discuss publicly available WSI datasets and evaluation metrics for segmentation, classification, and detection tasks. Then, the latest developments of machine learning in WSI segmentation, classification, and detection are reviewed. Finally, the existing methods are studied, the applicability of the analysis methods is analyzed, and the application prospects of the analysis methods in this field are forecasted.

preprint, 2021, arXiv

Evolution of Charge and Pair Density Modulations in Overdoped Bi2Sr2CuO6+delta

One of the central issues concerning the mechanism of high temperature superconductivity in cuprates is the nature of the ubiquitous charge order and its implications for superconductivity. Here we use scanning tunneling microscopy to investigate the evolution of charge order from the optimally doped to strongly overdoped Bi2Sr2CuO6+δ cuprates. We find that with increasing hole concentration, the long-range checkerboard order gradually evolves into short-range glassy patterns consisting of diluted charge puddles. Each charge puddle has a unidirectional nematic internal structure, and exhibits clear pair density modulations as revealed by the spatial variations of superconducting coherence peak and gap depth. Both the charge puddles and the nematicity vanish completely in the strongly overdoped non-superconducting regime, when another type of short-range order with $\sqrt{2}\times\sqrt{2}$ periodicity emerges. These results shed important new light on the intricate interplay between the intertwined orders and the superconducting phase of cuprates.

preprint, 2021, arXiv

Particle-hole asymmetric superconducting coherence peaks in overdoped cuprates

To elucidate the superconductor to metal transition at the end of superconducting dome, the overdoped regime has stepped onto the center stage of cuprate research recently. Here, we use scanning tunneling microscopy to investigate the atomic-scale electronic structure of overdoped trilayer Bi-2223 and bilayer Bi-2212 cuprates. At low energies the spectroscopic maps are well described by dispersive quasiparticle interference patterns. However, as the bias increases to the superconducting coherence peak energy, a virtually non-dispersive pattern with $\sqrt{2}\times\sqrt{2}$ periodicity emerges. Remarkably, the position of the coherence peaks exhibits evident particle-hole asymmetry which also modulates with the same period. We propose that this is an extreme quasiparticle interference phenomenon, caused by pairing-breaking scattering between flat anti-nodal Bogoliubov bands, which is ultimately responsible for the superconductor to metal transition.

preprint, 2020, arXiv

Machine-learning classifiers for logographic name matching in public health applications: approaches for incorporating phonetic, visual, and keystroke similarity in large-scale probabilistic record linkage

Approximate string-matching methods to account for complex variation in highly discriminatory text fields, such as personal names, can enhance probabilistic record linkage. However, discriminating between matching and non-matching strings is challenging for logographic scripts, where similarities in pronunciation, appearance, or keystroke sequence are not directly encoded in the string data. We leverage a large Chinese administrative dataset with known match status to develop logistic regression and Xgboost classifiers integrating measures of visual, phonetic, and keystroke similarity to enhance identification of potentially-matching name pairs. We evaluate three methods of leveraging name similarity scores in large-scale probabilistic record linkage, which can adapt to varying match prevalence and information in supporting fields: (1) setting a threshold score based on predicted quality of name-matching across all record pairs; (2) setting a threshold score based on predicted discriminatory power of the linkage model; and (3) using empirical score distributions among matches and nonmatches to perform Bayesian adjustment of matching probabilities estimated from exact-agreement linkage. In experiments on holdout data, as well as data simulated with varying name error rates and supporting fields, a logistic regression classifier incorporated via the Bayesian method demonstrated marked improvements over exact-agreement linkage with respect to discriminatory power, match probability estimation, and accuracy, reducing the total number of misclassified record pairs by 21% in test data and up to an average of 93% in simulated datasets. Our results demonstrate the value of incorporating visual, phonetic, and keystroke similarity for logographic name matching, as well as the promise of our Bayesian approach to leverage name-matching within large-scale record linkage.
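The third method, Bayesian adjustment of a match probability using score distributions among matches and nonmatches, can be illustrated with Bayes' rule in odds form. This is a minimal sketch under stated assumptions: the function name and all probabilities are hypothetical, and the abstract does not specify how the paper estimates the score distributions.

```python
def bayes_adjust(prior_match_prob, p_score_given_match, p_score_given_nonmatch):
    """Update a match probability with a name-similarity score via Bayes' rule.

    Assumes 0 < prior_match_prob < 1 and p_score_given_nonmatch > 0.
    """
    prior_odds = prior_match_prob / (1.0 - prior_match_prob)
    # How much more likely this score is among true matches than among nonmatches.
    likelihood_ratio = p_score_given_match / p_score_given_nonmatch
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical record pair: the other fields suggest a 50% match probability,
# and the observed name score is four times as common among matches.
adjusted = bayes_adjust(0.5, 0.8, 0.2)
```

A score that is equally common among matches and nonmatches leaves the prior unchanged, so the adjustment only moves the estimate when the name comparison is informative.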

preprint, 2020, arXiv

Regularized Context Gates on Transformer for Machine Translation

Context gates are effective for controlling the contributions from the source and target contexts in recurrent neural network (RNN) based neural machine translation (NMT). However, it is challenging to extend them to the advanced Transformer architecture, which is more complicated than RNN. This paper first provides a method to identify source and target contexts and then introduces a gate mechanism to control the source and target contributions in Transformer. In addition, to further reduce the bias problem in the gate mechanism, this paper proposes a regularization method to guide the learning of the gates with supervision automatically generated using pointwise mutual information. Extensive experiments on 4 translation datasets demonstrate that the proposed model obtains an averaged gain of 1.0 BLEU score over a strong Transformer baseline.

preprint, 2019, arXiv

Effect of structural supermodulation on superconductivity in tri-layer cuprate Bi2Sr2Ca2Cu3O10+x

We investigate the spatial and doping evolutions of the superconducting properties of tri-layer cuprate Bi2Sr2Ca2Cu3O10+x by using scanning tunneling microscopy and spectroscopy. Both the superconducting coherence peak and gap size exhibit periodic variations with the structural supermodulation, but the effect is much more pronounced in the underdoped regime than at optimal doping. Moreover, a new type of tunneling spectrum characterized by two superconducting gaps emerges with increasing doping, and the two-gap features also correlate with the supermodulation. We propose that the interaction between the inequivalent outer and inner CuO2 planes is responsible for these novel features that are unique to tri-layer cuprates.

preprint, 2015, arXiv

Strong similarities between the local electronic structure of insulating iron pnictide and lightly doped cuprate

One of the major puzzles regarding unconventional superconductivity is how some of the most interesting superconductors are related to an insulating phase that lies in close proximity. Here we report scanning tunneling microscopy studies of the local electronic structure of Cu-doped NaFeAs across the superconductor to insulator transition. We find that in the highly insulating regime the electronic spectrum develops an energy gap with diminishing density of states at the Fermi level. The overall lineshape and strong spatial variations of the spectra are strikingly similar to those of lightly doped cuprates close to the parent Mott insulator. We propose that the suppression of itinerant electron states and the strong impurity potential induced by Cu dopants lead to this insulating iron pnictide.

preprint, 2015, arXiv

Structural phase transition and electronic structure evolution in Ir1-xPtxTe2 studied by scanning tunneling microscopy

The IrTe2 transition metal dichalcogenide undergoes a series of structural and electronic phase transitions when doped with Pt. The nature of each phase and the mechanism of the phase transitions have attracted much attention. In this paper, we report scanning tunneling microscopy and spectroscopy studies of Pt doped IrTe2 with varied Pt contents. In pure IrTe2, we find that the ground state has a 1/6 superstructure, and the electronic structure is inconsistent with Fermi surface nesting induced charge density wave order. Upon Pt doping, the crystal structure changes to a 1/5 superstructure and then to a quasi-periodic hexagonal phase. First principles calculations show that the superstructures and electronic structures are determined by the global chemical strain and local impurity states that can be tuned systematically by Pt doping.

preprint, 2011, arXiv

Random walks in small-world exponential treelike networks

In this paper, we investigate random walks in a family of small-world trees having an exponential degree distribution. First, we address a trapping problem, that is, a particular case of random walks with an immobile trap located at the initial node. We obtain the exact mean trapping time defined as the average of first-passage time (FPT) from all nodes to the trap, which scales linearly with the network order $N$ in large networks. Then, we determine analytically the mean sending time, which is the mean of the FPTs from the initial node to all other nodes, and show that it grows with $N$ in the order of $N \ln N$. After that, we compute the precise global mean first-passage time among all pairs of nodes and find that it also varies in the order of $N \ln N$ in the large limit of $N$. After obtaining the relevant quantities, we compare them with each other and relate our results to the efficiency for information transmission by regarding the walker as an information messenger. Finally, we compare our results with those previously reported for other trees with different structural properties (e.g., degree distribution), such as the standard fractal trees and the scale-free small-world trees, and show that the shortest path between a pair of nodes in a tree is responsible for the scaling of FPT between the two nodes.
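The mean trapping time can be computed exactly on any tree, which gives a small worked illustration of the quantity studied here. The sketch relies on the standard result that, for an unbiased random walk on a tree, the expected time to step from a node to its parent is twice the size of the node's subtree minus one. It works on an arbitrary tree given as an adjacency dictionary rather than the specific network family of the paper, and averaging over non-trap nodes only is an assumed convention.

```python
def mean_trapping_time(adjacency, trap):
    """Exact mean first-passage time to `trap` for an unbiased walk on a tree."""
    # Root the tree at the trap and record a parent-before-child visiting order.
    parent = {trap: None}
    order = [trap]
    for node in order:  # the list grows while we iterate (breadth-first search)
        for nb in adjacency[node]:
            if nb not in parent:
                parent[nb] = node
                order.append(nb)
    # Subtree sizes, accumulated from the leaves upward.
    size = {node: 1 for node in order}
    for node in reversed(order[1:]):
        size[parent[node]] += size[node]
    # Expected time from a node to its parent is 2 * size - 1; sum along the path.
    fpt = {trap: 0}
    for node in order[1:]:
        fpt[node] = fpt[parent[node]] + 2 * size[node] - 1
    others = order[1:]
    return sum(fpt[n] for n in others) / len(others)

# Hypothetical example: a path 0 - 1 - 2 with the trap at node 0.
path = {0: [1], 1: [0, 2], 2: [1]}
mtt = mean_trapping_time(path, trap=0)  # FPTs are 3 and 4, so the mean is 3.5
```

Because the first-passage time accumulates along the unique path to the trap, the sketch also reflects the abstract's closing point that the shortest path between two nodes governs the scaling of the FPT between them.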

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2605.09359:author:5:xintong-li

Imported May 20, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2605.12741:author:9:xintong-li

Imported May 20, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2605.12995:author:4:xintong-li

Imported May 20, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2605.10784:author:2:xintong-li

Imported May 20, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2605.11169:author:3:xintong-li

Imported May 20, 2026; synced May 20, 2026

6 works

Jingbo Shang

Researcher

5 works

Julian McAuley

Researcher

5 works

Yayu Wang

Researcher

4 works

Chen Li

Researcher
