Source author record

Xiang Li

Xiang Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

16works

21topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

Non-verbal Vocalizations (NVs), such as laughter and sighs, are vital for conveying emotion and intention in human speech, yet most existing speech systems neglect them, which severely compromises communicative richness and emotional intelligence. Existing methods for NVs acquisition are either costly and unscalable (relying on manual annotation/recording) or unnatural (relying on rule-based synthesis). To address these limitations, we propose a highly scalable automatic annotation framework to label non-verbal phenomena from natural speech, which is low-cost, easily extendable, and inherently diverse and natural. This framework leverages a unified detection model to accurately identify NVs in natural speech and integrates them with transcripts via temporal-semantic alignment method. Using this framework, we created and released \textbf{NonVerbalSpeech-38K}, a diverse, real-world dataset featuring 38,718 samples across 10 NV categories collected from in-the-wild media. Experimental results demonstrate that our dataset provides superior controllability for NVs generation and achieves comparable performance for NVs understanding.

preprint2026arXiv

Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models

Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models typically operate from a single, unchallenged perspective, limiting their ability to conduct deep, self-correcting reasoning over external documents, and (2) existing training paradigms rely excessively on outcome-oriented rewards, which provide insufficient signal for shaping the complex, multi-step reasoning process. To address these issues, we propose an Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier engage in reasoning on retrieved evidence and critiquing each other's logic while being guided by process-aware advantage that requires no external scoring model. This reward combines explicit observational signals with internal model uncertainty to jointly optimize reasoning fidelity and verification rigor. Experiments on multiple benchmarks demonstrate the effectiveness of our method.

preprint2026arXiv

APEX: Academic Poster Editing Agentic Expert

Designing academic posters is a labor-intensive process requiring the precise balance of high-density content and sophisticated layout. While existing paper-to-poster generation methods automate initial drafting, they are typically single-pass and non-interactive, often fail to align with complex, subjective user intent. To bridge this gap, we propose APEX (Academic Poster Editing agentic eXpert), the first agentic framework for interactive academic poster editing, supporting fine-grained control with robust multi-level API-based editing and a review-and-adjustment Mechanism. In addition, we introduce APEX-Bench, the first systematic benchmark comprising 514 academic poster editing instructions, categorized by a multi-dimensional taxonomy including operation type, difficulty, and abstraction level, constructed via reference-guided and reference-free strategies to ensure realism and diversity. We further establish a multi-dimensional VLM-as-a-judge evaluation protocol to assess instruction fulfillment, modification scope, and visual consistency & harmony. Experimental results demonstrate that APEX significantly outperforms baseline methods. Our implementation is available at https://github.com/Breesiu/APEX.

preprint2026arXiv

GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence

Retrieval-Augmented Generation (RAG) integrates external knowledge to enhance Large Language Models (LLMs), yet systems remain susceptible to two critical flaws: providing correct answers without explicit grounded evidence and producing fabricated responses when the retrieved context is insufficient. While prior research has addressed these issues independently, a unified framework that integrates evidence-based grounding and reliable abstention is currently lacking. In this paper, we propose GRACE, a reinforcement-learning framework that simultaneously mitigates both types of flaws. GRACE employs a data construction method that utilizes heterogeneous retrievers to generate diverse training samples without manual annotation. A multi-stage gated reward function is then employed to train the model to assess evidence sufficiency, extract key supporting evidence, and provide answers or explicitly abstain. Experimental results on two benchmarks demonstrate that GRACE achieves state-of-the-art overall accuracy and strikes a favorable balance between accurate response and rejection, while requiring only 10% of the annotation costs of prior methods. Our code is available at https://github.com/YiboZhao624/Grace..

preprint2026arXiv

Gyral-Sulcal-Net: An Integrated Network Representation of Brain Folding Patterns

Our brain functions as a complex communication network, and studying it from a network perspective offers valuable insights into its organizational principles and links to cognitive functions and brain disorders. However, most current network studies typically use brain regions as nodes, often overlooking the intricate folding patterns of finer-scale anatomical landmarks within these regions. In this study, we introduce a novel approach to integrate the brain's two primary folding patterns - gyri and sulci - into a unified network termed the Gyral-Sulcal-Net (GS-Net), in which three different types of finer-scale landmarks have been successfully identified. We evaluated the proposed GS-Net across multiple datasets, comprising over 1,600 brains, spanning different age groups (from 34 gestational weeks to elderly adults) and cohorts (healthy brains and those with pathological conditions). The experimental results demonstrate that the GS-Net can effectively integrate and represent diverse cortical folding patterns from a network perspective. More importantly, this approach offers a promising way for integrating different folding patterns into a unified anatomical brain network, alongside structural and functional networks, providing a comprehensive framework for studying brain networks.

preprint2026arXiv

Iteration Sums of The Euler Totient Function Regarding Powers of Fermat Primes

Euler totient function $ϕ(n)$ plays a central role in number theory and is applied in areas such as cryptography. In this paper, we study iterations of the totient function. We first prove that for any integer $n>2$, iteratively applying $ϕ$ eventually yields the value $2$. Motivated by this terminal behavior, we examine sums of iterated totient values of the form $ϕ(n)+ϕ(ϕ(n))+ϕ(ϕ(ϕ(n)))+\cdots+ϕ(2)$, where the summation terminates at $ϕ(2)$. We show that for all integers of the form $n = 3^k$, this sum is equal to $n$. We then extend this result to all powers of Fermat primes, deriving a closed-form expression for the corresponding summations.

preprint2026arXiv

Machine Learning Model Trading with Verification under Information Asymmetry

Machine learning (ML) model trading, known for its role in protecting data privacy, faces a major challenge: information asymmetry. This issue can lead to model deception, a problem that current literature has not fully solved, where the seller misrepresents model performance to earn more. We propose a game-theoretic approach, adding a verification step in the ML model market that lets buyers check model quality before buying. However, this method can be expensive and offers imperfect information, making it harder for buyers to decide. Our analysis reveals that a seller might probabilistically conduct model deception considering the chance of model verification. This deception probability decreases with the verification accuracy and increases with the verification cost. To maximize seller payoff, we further design optimal pricing schemes accounting for heterogeneous buyers' strategic behaviors. Interestingly, we find that reducing information asymmetry benefits both the seller and buyer. Meanwhile, protecting buyer order information doesn't improve the payoff for the buyer or the seller. These findings highlight the importance of reducing information asymmetry in ML model trading and open new directions for future research.

preprint2026arXiv

MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation

Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks, nevertheless, still heavily relying on manual operations or railway systems for movement. The AgriVLN method and the A2A benchmark pioneeringly extended Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position following a natural language instruction. Unlike human binocular vision, most agricultural robots are only given a single camera for monocular vision, which results in limited spatial perception. To bridge this gap, we present the method of Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which we propose the MDE module generating depth features from RGB images, to assist the decision-maker on multimodal reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method successfully increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43m to 4.08m, demonstrating the state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.

preprint2026arXiv

Mechanism Design for Federated Learning with Non-Monotonic Network Effects

Mechanism design is pivotal to federated learning (FL) for maximizing social welfare by coordinating self-interested clients. Existing mechanisms, however, often overlook the network effects of client participation and the diverse model performance requirements (i.e., generalization error) across applications, leading to suboptimal incentives and social welfare, or even inapplicability in real deployments. To address this gap, we explore incentive mechanism design for FL with network effects and application-specific requirements of model performance. We develop a theoretical model to quantify the impact of network effects on heterogeneous client participation, revealing the non-monotonic nature of such effects. Based on these insights, we propose a Model Trading and Sharing (MoTS) framework, which enables clients to obtain FL models through either participation or purchase. To further address clients' strategic behaviors, we design a Social Welfare maximization with Application-aware and Network effects (SWAN) mechanism, exploiting model customer payments for incentivization. Experimental results on a hardware prototype demonstrate that our SWAN mechanism outperforms existing FL mechanisms, improving social welfare by up to $352.42\%$ and reducing extra incentive costs by $93.07\%$.

preprint2026arXiv

TF-Mamba: Text-enhanced Fusion Mamba with Missing Modalities for Robust Multimodal Sentiment Analysis

Multimodal Sentiment Analysis (MSA) with missing modalities has attracted increasing attention recently. While current Transformer-based methods leverage dense text information to maintain model robustness, their quadratic complexity hinders efficient long-range modeling and multimodal fusion. To this end, we propose a novel and efficient Text-enhanced Fusion Mamba (TF-Mamba) framework for robust MSA with missing modalities. Specifically, a Text-aware Modality Enhancement (TME) module aligns and enriches non-text modalities, while reconstructing the missing text semantics. Moreover, we develop Text-based Context Mamba (TC-Mamba) to capture intra-modal contextual dependencies under text collaboration. Finally, Text-guided Query Mamba (TQ-Mamba) queries text-guided multimodal information and learns joint representations for sentiment prediction. Extensive experiments on three MSA datasets demonstrate the effectiveness and efficiency of the proposed method under missing modality scenarios. Our code is available at https://github.com/codemous/TF-Mamba.

preprint2026arXiv

Ultra-sensitive graphene-based electro-optic sensors for optically-multiplexed neural recording

Large-scale neural recording with high spatio-temporal resolution is essential for understanding information processing in brain, yet current neural interfaces fall far short of comprehensively capturing brain activity due to extremely high neuronal density and limited scalability. Although recent advances have miniaturized neural probes and increased channel density, fundamental design constraints still prevent dramatic scaling of simultaneously recorded channels. To address this limitation, we introduce a novel electro-optic sensor that directly converts ultra-low-amplitude neural electrical signals into optical signals with high signal-to-noise ratio. By leveraging the ultra-high bandwidth and intrinsic multiplexing capability of light, this approach offers a scalable path toward massively parallel neural recording beyond the limits of traditional electrical interfaces. The sensor integrates an on-chip photonic microresonator with a graphene layer, enabling direct detection of neural signals without genetically encoded optical indicators or tissue modification, making it suitable for human translation. Neural signals are locally transduced into amplified optical modulations and transmitted through on-chip waveguides, enabling interference-free recording without bulky electromagnetic shielding. Arrays of wavelength-selective sensors can be multiplexed on a single bus waveguide using wavelength-division multiplexing (WDM), greatly improving scalability while maintaining a minimal footprint to reduce tissue damage. We demonstrate detection of evoked neural signals as small as 25 $μ$V with 3 dB SNR from mouse brain tissue and show multiplexed recording from 10 sensors on a single waveguide. These results establish a proof-of-concept for optically multiplexed neural recording and point toward scalable, high-density neural interfaces for neurological research and clinical applications.

preprint2026arXiv

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction

Neural Audio Codecs (NACs) can reduce transmission overhead by performing compact compression and reconstruction, which also aim to bridge the gap between continuous and discrete signals. Existing NACs can be divided into two categories: multi-codebook and single-codebook codecs. Multi-codebook codecs face challenges such as structural complexity and difficulty in adapting to downstream tasks, while single-codebook codecs, though structurally simpler, suffer from low-fidelity, ineffective modeling of unified audio, and an inability to support modeling of high-frequency audio. We propose the UniSRCodec, a single-codebook codec capable of supporting high sampling rate, low-bandwidth, high fidelity, and unified. We analyze the inefficiency of waveform-based compression and introduce the time and frequency compression method using the Mel-spectrogram, and cooperate with a Vocoder to recover the phase information of the original audio. Moreover, we propose a sub-band reconstruction technique to achieve high-quality compression across both low and high frequency bands. Subjective and objective experimental results demonstrate that UniSRCodec achieves state-of-the-art (SOTA) performance among cross-domain single-codebook codecs with only a token rate of 40, and its reconstruction quality is comparable to that of certain multi-codebook methods. Our demo page is available at https://wxzyd123.github.io/unisrcodec.

preprint2026arXiv

Unsupervised Text Style Transfer for Controllable Intensity

Unsupervised Text Style Transfer (UTST) aims to build a system to transfer the stylistic properties of a given text without parallel text pairs. Compared with text transfer between style polarities, UTST for controllable intensity is more challenging due to the subtle differences in stylistic features across different intensity levels. Faced with the challenges posed by the lack of parallel data and the indistinguishability between adjacent intensity levels, we propose a SFT-then-PPO paradigm to fine-tune an LLM. We first fine-tune the LLM with synthesized parallel data. Then, we further train the LLM with PPO, where the rewards are elaborately designed for distinguishing the stylistic intensity in hierarchical levels. Both the global and local stylistic features are considered to formulate the reward functions. The experiments on two UTST benchmarks showcase that both rewards have their advantages and applying them to LLM fine-tuning can effectively improve the performance of an LLM backbone based on various evaluation metrics. Even for close levels of intensity, we can still observe the noticeable stylistic difference between the generated text.

preprint2026arXiv

WOW-Seg: A Word-free Open World Segmentation Model

Open world image segmentation aims to achieve precise segmentation and semantic understanding of targets within images by addressing the infinitely open set of object categories encountered in the real world. However, traditional closed-set segmentation approaches struggle to adapt to complex open world scenarios, while foundation segmentation models such as SAM exhibit notable discrepancies between their strong segmentation capabilities and relatively weaker semantic understanding. To bridge these discrepancies, we propose WOW-Seg, a Word-free Open World Segmentation model for segmenting and recognizing objects from open-set categories. Specifically, WOW-Seg introduces a novel visual prompt module, Mask2Token, which transforms image masks into visual tokens and ensures their alignment with the VLLM feature space. Moreover, we introduce the Cascade Attention Mask to decouple information across different instances. This approach mitigates inter-instance interference, leading to a significant improvement in model performance. We further construct an open world region recognition test benchmark: the Region Recognition Dataset (RR-7K). With 7,662 classes, it represents the most extensive category-rich region recognition dataset to date. WOW-Seg attains strong results on the LVIS dataset, achieving a semantic similarity of 89.7 and a semantic IoU of 82.4. This performance surpasses the previous SOTA while using only one-eighth the parameter count. These results underscore the strong open world generalization capabilities of WOW-Seg. The code and related resources are available at https://github.com/AAwcAA/WOW-Seg-Meta.

preprint2025arXiv

HeteroHBA: A Generative Structure-Manipulating Backdoor Attack on Heterogeneous Graphs

Heterogeneous graph neural networks (HGNNs) have achieved strong performance in many real-world applications, yet targeted backdoor poisoning on heterogeneous graphs remains less studied. We consider backdoor attacks for heterogeneous node classification, where an adversary injects a small set of trigger nodes and connections during training to force specific victim nodes to be misclassified into an attacker-chosen label at test time while preserving clean performance. We propose HeteroHBA, a generative backdoor framework that selects influential auxiliary neighbors for trigger attachment via saliency-based screening and synthesizes diverse trigger features and connection patterns to better match the local heterogeneous context. To improve stealthiness, we combine Adaptive Instance Normalization (AdaIN) with a Maximum Mean Discrepancy (MMD) loss to align the trigger feature distribution with benign statistics, thereby reducing detectability, and we optimize the attack with a bilevel objective that jointly promotes attack success and maintains clean accuracy. Experiments on multiple real-world heterogeneous graphs with representative HGNN architectures show that HeteroHBA consistently achieves higher attack success than prior backdoor baselines with comparable or smaller impact on clean accuracy; moreover, the attack remains effective under our heterogeneity-aware structural defense, CSD. These results highlight practical backdoor risks in heterogeneous graph learning and motivate the development of stronger defenses.

preprint2025arXiv

Tightening the mixed integer linear formulation for the piecewise linear approximation in general dimensions

This paper addresses the problem of tightening the mixed-integer linear programming (MILP) formulation for continuous piecewise linear (CPWL) approximations of data sets in arbitrary dimensions. The MILP formulation leverages the difference-of-convex (DC) representation of CPWL functions. We introduce the concept of well-behaved CPWL interpolations and demonstrate that any CPWL interpolation of a data set has a well-behaved version. This result is critical to tighten the MILP problem. We present six different strategies to tighten the problem, which include fixing the values of some variables, introducing additional constraints, identifying small big-M parameter values and applying tighter variable bounds. These methods leverage key aspects of the DC representation and the inherent structure of well-behaved CPWL interpolations. Experimental results demonstrate that specific combinations of these tightening strategies lead to significant improvement in solution times, especially for tightening strategies that consider well-behaved CPWL solutions.

Xiang Li

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models

APEX: Academic Poster Editing Agentic Expert

GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence

Gyral-Sulcal-Net: An Integrated Network Representation of Brain Folding Patterns

Iteration Sums of The Euler Totient Function Regarding Powers of Fermat Primes

Machine Learning Model Trading with Verification under Information Asymmetry

MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation

Mechanism Design for Federated Learning with Non-Monotonic Network Effects

TF-Mamba: Text-enhanced Fusion Mamba with Missing Modalities for Robust Multimodal Sentiment Analysis

Ultra-sensitive graphene-based electro-optic sensors for optically-multiplexed neural recording

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction

Unsupervised Text Style Transfer for Controllable Intensity

WOW-Seg: A Word-free Open World Segmentation Model

HeteroHBA: A Generative Structure-Manipulating Backdoor Attack on Heterogeneous Graphs

Tightening the mixed integer linear formulation for the piecewise linear approximation in general dimensions