Source author record

Wei Li

Wei Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Computation and Language, Machine Learning, math.AP, Applications, cond-mat.str-el, eess.SY, math-ph, math.MP, math.OC, math.SP, Networking and Internet Architecture, physics.optics, Systems and Control

Catalog footprint

What is connected

16 works

15 topics

4 close collaborators

preprint · 2026 · arXiv

A new partial differential nonlinear system containing quasivariational and parabolic variational inequalities and its application

We study a new nonlinear system which contains a partial differential equation, a quasivariational inequality and a parabolic variational inequality in Banach spaces. We obtain the unique solvability of the coupled system under moderate conditions by using Banach's fixed point theorem. We employ the main results to investigate a viscoelastic frictional contact problem with long-memory effects, wear processes, and damage phenomena.

preprint · 2026 · arXiv

Assessing Interactive Causes of an Occurred Outcome Due to Two Binary Exposures

In contrast to evaluating treatment effects, causal attribution analysis focuses on identifying the key factors responsible for an observed outcome. For two binary exposure variables and a binary outcome variable, researchers need to assess not only the likelihood that an observed outcome was caused by a particular exposure, but also the likelihood that it resulted from the interaction between the two exposures. For example, in the case of a male worker who smoked, was exposed to asbestos, and developed lung cancer, researchers aim to explore whether the cancer resulted from smoking, asbestos exposure, or their interaction. Even in randomized controlled trials, widely regarded as the gold standard for causal inference, identifying and evaluating retrospective causal interactions between two exposures remains challenging. In this paper, we define posterior probabilities to characterize the interactive causes of an observed outcome. We establish the identifiability of posterior probabilities by using a secondary outcome variable that may appear after the primary outcome. We apply the proposed method to the classic case of smoking and asbestos exposure. Our results indicate that for lung cancer patients who smoked and were exposed to asbestos, the disease is primarily attributable to the synergistic effect between smoking and asbestos exposure.

preprint · 2026 · arXiv

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and the adoption of Parameter-Efficient Fine-Tuning (PEFT) strategies is enabling these models to achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially in the limited exploration subspace within PEFT. To overcome this challenge, this paper pioneers a systematic exploration of adopting Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify the incompatibility of naive full-ZO adoption in VLCL due to optimization process instability. We then investigate the application of ZO optimization from a modality branch-wise level to a fine-grained layer-wise level across various training units to identify an optimal strategy. In addition, a key theoretical insight reveals that the vision modality exhibits higher variance than its language counterpart in VLCL during the ZO optimization process, and we propose a modality-aware ZO strategy, which adopts gradient sign normalization in ZO and constrains vision modality perturbation to further improve performance. Benefiting from the adoption of ZO optimization, PEFT-based VLCL gains a better ability to escape local minima during the optimization process. Extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art results.
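The zeroth-order idea in the abstract above can be illustrated with a generic two-point estimator combined with a sign-normalized update. This is only a minimal sketch under stated assumptions, not the paper's method: the function names (`zo_gradient`, `zo_sign_step`), the toy quadratic loss, and all hyperparameter values are hypothetical, and the modality-aware constraint on vision perturbations is not modeled.

```python
import random

def zo_gradient(loss, params, mu, rng):
    """Two-point zeroth-order gradient estimate using one random Gaussian direction."""
    u = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + mu * ui for p, ui in zip(params, u)]
    minus = [p - mu * ui for p, ui in zip(params, u)]
    # Finite-difference estimate of the directional derivative along u.
    scale = (loss(plus) - loss(minus)) / (2.0 * mu)
    return [scale * ui for ui in u]

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def zo_sign_step(loss, params, lr, mu, rng):
    """One update that uses only the sign of the estimated gradient."""
    g = zo_gradient(loss, params, mu, rng)
    return [p - lr * sign(gi) for p, gi in zip(params, g)]

def quadratic(params):
    """Toy loss (hypothetical stand-in for a real training objective)."""
    return sum(p * p for p in params)

rng = random.Random(0)          # fixed seed so the run is repeatable
theta = [1.0, -2.0, 0.5]
start = quadratic(theta)
for _ in range(200):
    theta = zo_sign_step(quadratic, theta, lr=0.01, mu=1e-3, rng=rng)
end = quadratic(theta)
```

Only loss evaluations are needed, no backpropagation, which is what makes this family of methods attractive when gradients are expensive or unavailable.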

preprint · 2026 · arXiv

CliCARE: Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records

Large Language Models (LLMs) hold significant promise for improving clinical decision support and reducing physician burnout by synthesizing complex, longitudinal cancer Electronic Health Records (EHRs). However, their implementation in this critical field faces three primary challenges: the inability to effectively process the extensive length and fragmented nature of patient records for accurate temporal analysis; a heightened risk of clinical hallucination, as conventional grounding techniques such as Retrieval-Augmented Generation (RAG) do not adequately incorporate process-oriented clinical guidelines; and unreliable evaluation metrics that hinder the validation of AI systems in oncology. To address these issues, we propose CliCARE, a framework for Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records. The framework operates by transforming unstructured, longitudinal EHRs into patient-specific Temporal Knowledge Graphs (TKGs) to capture long-range dependencies, and then grounding the decision support process by aligning these real-world patient trajectories with a normative guideline knowledge graph. This approach provides oncologists with evidence-grounded decision support by generating a high-fidelity clinical summary and an actionable recommendation. We validated our framework using large-scale, longitudinal data from a private Chinese cancer dataset and the public English MIMIC-IV dataset. In these settings, CliCARE significantly outperforms baselines, including leading long-context LLMs and Knowledge Graph-enhanced RAG methods. The clinical validity of our results is supported by a robust evaluation protocol, which demonstrates a high correlation with assessments made by oncologists.

preprint · 2026 · arXiv

Dominant Kitaev Interaction and Field-induced Quantum Disordered Phase in the Cobaltate Na$_2$Co$_2$TeO$_6$

The identification of quantum spin liquid phases in Kitaev candidate materials remains a major experimental challenge. Since most Kitaev candidates develop antiferromagnetic (AFM) order at low temperatures, there is currently great interest in field-induced magnetically disordered phases in these compounds that are distinct from (partially) polarized states. Recently, the cobaltate Na$_2$Co$_2$TeO$_6$ has emerged as a promising Kitaev candidate with a high-spin $t^{5}_{2g}e^2_g$ configuration and a spin-orbit entangled $J_{\rm eff} = 1/2$ honeycomb lattice. There have been intensive studies of field-induced magnetic states and phase transitions under in-plane magnetic fields. In this study, we propose an intermediate disordered phase induced by an out-of-plane field along the $c$-axis, through high-field magnetization and magnetocaloric effect measurements. To explain the high-field behavior of Na$_2$Co$_2$TeO$_6$, we develop an effective $K$-$J$-$\Gamma$-$\Gamma^{\prime}$ spin model featuring a dominant AFM Kitaev interaction. This framework uncovers an intermediate quantum spin liquid phase, establishing the material as a unique platform for exploring Kitaev physics and field-induced quantum-disordered states.

preprint · 2026 · arXiv

EntroLnn: Entropy-Guided Liquid Neural Networks for Operando Refinement of Battery Capacity Fade Trajectories

Battery capacity degradation prediction has long been a central topic in battery health analytics, and most studies focus on state of health (SoH) estimation and end of life (EoL) prediction. This study extends the scope to online refinement of the entire capacity fade trajectory (CFT) through EntroLnn, a framework based on entropy-guided transformable liquid neural networks (LNNs). EntroLnn treats CFT refinement as an integrated process rather than two independent tasks for pointwise SoH and EoL. We introduce entropy-based features derived from online temperature fields, applied for the first time in battery analytics, and combine them with customized LNNs that model temporal battery dynamics effectively. The framework enhances both static and dynamic adaptability of LNNs and achieves robust and generalizable CFT refinement across different batteries and operating conditions. The approach provides a high fidelity battery health model with lightweight computation, achieving mean absolute errors of only 0.004577 for CFT and 18 cycles for EoL prediction. This work establishes a foundation for entropy-informed learning in battery analytics and enables self-adaptive, lightweight, and interpretable battery health prediction in practical battery management systems.

preprint · 2026 · arXiv

FashionMAC: Deformation-Free Fashion Image Generation with Fine-Grained Model Appearance Customization

Garment-centric fashion image generation aims to synthesize realistic and controllable human models wearing a given garment, which has attracted growing interest due to its practical applications in e-commerce. The key challenges of the task lie in two aspects: (1) faithfully preserving the garment details, and (2) gaining fine-grained controllability over the model's appearance. Existing methods typically require performing garment deformation in the generation process, which often leads to garment texture distortions. Also, they fail to control the fine-grained attributes of the generated models, due to the lack of specifically designed mechanisms. To address these issues, we propose FashionMAC, a novel diffusion-based deformation-free framework that achieves high-quality and controllable fashion showcase image generation. The core idea of our framework is to eliminate the need for performing garment deformation and directly outpaint the garment segmented from a dressed person, which enables faithful preservation of the intricate garment details. Moreover, we propose a novel region-adaptive decoupled attention (RADA) mechanism along with a chained mask injection strategy to achieve fine-grained appearance controllability over the synthesized human models. Specifically, RADA adaptively predicts the generated regions for each fine-grained text attribute and enforces the text attribute to focus on the predicted regions by a chained mask injection strategy, significantly enhancing the visual fidelity and the controllability. Extensive experiments validate the superior performance of our framework compared to existing state-of-the-art methods.

preprint · 2026 · arXiv

Learning from Mistakes: Negative Reasoning Samples Enhance Out-of-Domain Generalization

Supervised fine-tuning (SFT) on chain-of-thought (CoT) trajectory demonstrations is a common approach for enabling reasoning in large language models. Standard practice typically retains only trajectories with correct final answers (positives) while ignoring the rest (negatives). We argue that this paradigm discards substantial supervision and exacerbates overfitting, limiting out-of-domain (OOD) generalization. Specifically, we surprisingly find that incorporating negative trajectories into SFT yields substantial OOD generalization gains over positive-only training, as these trajectories often retain valid intermediate reasoning despite incorrect final answers. To understand this effect in depth, we systematically analyze data, training dynamics, and inference behavior, identifying 22 recurring patterns in negative chains that serve a dual role: they moderate loss descent to mitigate overfitting during training and boost policy entropy by 35.67% during inference to facilitate exploration. Motivated by these observations, we further propose Gain-based LOss Weighting (GLOW), an adaptive, sample-aware scheme that exploits such distinctive training dynamics by rescaling per-sample loss based on inter-epoch progress. Empirically, GLOW efficiently leverages unfiltered trajectories, yielding a 5.51% OOD gain over positive-only SFT on Qwen2.5-7B and boosting MMLU from 72.82% to 76.47% as an RL initialization.

preprint · 2026 · arXiv

Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models

Continual learning in multimodal large language models (MLLMs) aims to sequentially acquire knowledge while mitigating catastrophic forgetting, yet existing methods face inherent limitations: architecture-based approaches incur additional computational overhead and often generalize poorly to new tasks, rehearsal-based methods rely on storing historical data, raising privacy and storage concerns, and conventional regularization-based strategies alone are insufficient to fully prevent parameter interference. We propose Octopus, a two-stage continual learning framework based on History-Free Gradient Orthogonalization (HiFGO), which enforces gradient-level orthogonality without historical task data. Our proposed two-stage finetuning strategy decouples task adaptation from regularization, achieving a principled balance between plasticity and stability. Experiments on UCIT show that Octopus establishes state-of-the-art performance, surpassing prior SOTA by 2.14% and 6.82% in terms of Avg and Last.

preprint · 2026 · arXiv

ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding

Recent Omni-multimodal Large Language Models show promise in unified audio, vision, and text modeling. However, streaming audio-video understanding remains challenging, as existing approaches suffer from disjointed capabilities: they typically exhibit incomplete modality support or lack autonomous proactive monitoring. To address this, we present ROMA, a real-time omni-multimodal assistant for unified reactive and proactive interaction. ROMA processes continuous inputs as synchronized multimodal units, aligning dense audio with discrete video frames to handle granularity mismatches. For online decision-making, we introduce a lightweight speak head that decouples response initiation from generation to ensure precise triggering without task conflict. We train ROMA with a curated streaming dataset and a two-stage curriculum that progressively optimizes for streaming format adaptation and proactive responsiveness. To standardize the fragmented evaluation landscape, we reorganize diverse benchmarks into a unified suite covering both proactive (alert, narration) and reactive (QA) settings. Extensive experiments across 12 benchmarks demonstrate that ROMA achieves state-of-the-art performance on proactive tasks while remaining competitive in reactive settings, validating its robustness in unified real-time omni-multimodal understanding.

preprint · 2026 · arXiv

Synecdoche: Efficient and Accurate In-Network Traffic Classification via Direct Packet Sequential Pattern Matching

Traffic classification on the programmable data plane holds great promise for line-rate processing, with methods evolving from per-packet to flow-level analysis for higher accuracy. However, a trade-off between accuracy and efficiency persists. Statistical feature-based methods align with hardware constraints but often exhibit limited accuracy, while online deep learning methods using packet sequential features achieve superior accuracy but require substantial computational resources. This paper presents Synecdoche, the first traffic classification framework that successfully deploys packet sequential features on a programmable data plane via pattern matching, achieving both high accuracy and efficiency. Our key insight is that discriminative information concentrates in short sub-sequences, termed Key Segments, that serve as compact traffic features for efficient data plane matching. Synecdoche employs an "offline discovery, online matching" paradigm: deep learning models automatically discover Key Segment patterns offline, which are then compiled into optimized table entries for direct data plane matching. Extensive experiments demonstrate Synecdoche's superior accuracy, improving F1-scores by up to 26.4% against statistical methods and 18.3% against online deep learning methods, while reducing latency by 13.0% and achieving a 79.2% reduction in SRAM usage. The source code of Synecdoche is publicly available to facilitate reproducibility and further research.

preprint · 2026 · arXiv

When Smaller Wins: Dual-Stage Distillation and Pareto-Guided Compression of Liquid Neural Networks for Edge Battery Prognostics

Battery management systems increasingly require accurate battery health prognostics under strict on-device constraints. This paper presents DLNet, a practical framework with dual-stage distillation of liquid neural networks that turns a high-capacity model into compact and edge-deployable models for battery health prediction. DLNet first applies Euler discretization to reformulate liquid dynamics for embedded compatibility. It then performs dual-stage knowledge distillation to transfer the teacher model's temporal behavior and recover it after further compression. Pareto-guided selection under joint error-cost objectives retains student models that balance accuracy and efficiency. We evaluate DLNet on a widely used dataset and validate real-device feasibility on an Arduino Nano 33 BLE Sense using int8 deployment. The final deployed student achieves a low error of 0.0066 when predicting battery health over the next 100 cycles, which is 15.4% lower than the teacher model. It reduces the model size from 616 kB to 94 kB (an 84.7% reduction) and takes 21 ms per inference on the device. These results support a practical "smaller wins" observation: a small model can match or exceed a large teacher for edge-based prognostics with proper supervision and selection. Beyond batteries, the DLNet framework can extend to other industrial analytics tasks with strict hardware constraints.
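The Euler discretization step mentioned in the abstract above can be illustrated for a single neuron. The sketch below assumes the commonly used liquid time-constant form dx/dt = -(1/tau + f) * x + f * a with an input-dependent sigmoid gate f; whether DLNet uses exactly this form is not stated in the abstract, and the function name `ltc_euler_step` and all parameter values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ltc_euler_step(x, inp, dt, tau=1.0, w=1.0, b=0.0, a=1.0):
    """One explicit Euler step of dx/dt = -(1/tau + f) * x + f * a."""
    f = sigmoid(w * inp + b)              # input-dependent gate (assumed form)
    dxdt = -(1.0 / tau + f) * x + f * a   # continuous-time dynamics
    return x + dt * dxdt                  # fixed-step update, no ODE solver needed

# With a constant input the state settles at f * a / (1/tau + f).
x = 0.0
for _ in range(500):
    x = ltc_euler_step(x, inp=0.0, dt=0.1)
```

Replacing an adaptive ODE solver with a fixed-step update like this leaves only multiply-add operations and one activation per step, which is the kind of workload that suits a microcontroller.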

preprint · 2025 · arXiv

From Sequential to Spatial: Reordering Autoregression for Efficient Visual Generation

Inspired by the remarkable success of autoregressive models in language modeling, this paradigm has been widely adopted in visual generation. However, the sequential token-by-token decoding mechanism inherent in traditional autoregressive models leads to low inference efficiency. In this paper, we propose RadAR, an efficient and parallelizable framework designed to accelerate autoregressive visual generation while preserving its representational capacity. Our approach is motivated by the observation that visual tokens exhibit strong local dependencies and spatial correlations with their neighbors, a property not fully exploited in standard raster-scan decoding orders. Specifically, we organize the generation process around a radial topology: an initial token is selected as the starting point, and all other tokens are systematically grouped into multiple concentric rings according to their spatial distances from this center. Generation then proceeds in a ring-wise manner, from inner to outer regions, enabling the parallel prediction of all tokens within the same ring. This design not only preserves the structural locality and spatial coherence of visual scenes but also substantially increases parallelization. Furthermore, to address the risk of inconsistent predictions arising from simultaneous token generation with limited context, we introduce a nested attention mechanism. This mechanism dynamically refines implausible outputs during the forward pass, thereby mitigating error accumulation and preventing model collapse. By integrating radial parallel prediction with dynamic output correction, RadAR significantly improves generation efficiency.
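The ring-wise ordering described in the abstract above can be sketched on a small token grid. This is an illustrative assumption rather than the paper's implementation: the abstract does not say which distance defines a ring, so the sketch uses Chebyshev distance (square rings), and the function name `ring_schedule` is hypothetical.

```python
from collections import defaultdict

def ring_schedule(height, width, center=None):
    """Group grid positions into rings, ordered from the center outward.

    Each ring is a list of (row, col) positions that would be predicted
    in parallel in a single decoding step.
    """
    if center is None:
        center = (height // 2, width // 2)
    cr, cc = center
    rings = defaultdict(list)
    for r in range(height):
        for c in range(width):
            # Chebyshev distance: an assumption, chosen to give square rings.
            d = max(abs(r - cr), abs(c - cc))
            rings[d].append((r, c))
    return [rings[d] for d in sorted(rings)]

schedule = ring_schedule(4, 4)
ring_sizes = [len(ring) for ring in schedule]
```

On a 4 x 4 grid this gives three rings, so a ring-wise decoder would need three steps where raster-scan decoding needs sixteen.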

preprint · 2025 · arXiv

Mathematical Theory for Photonic Hall Effect in Honeycomb Photonic Crystals

In this work, we develop a mathematical theory for the photonic Hall effect and prove the existence of guided electromagnetic waves at the interface of two honeycomb photonic crystals. The guided wave resembles the edge states in electronic systems: it is induced by the topological Hall effect, and the wave propagates along the interface but not in the bulk media. Starting from a symmetric honeycomb photonic crystal that attains Dirac points at the high-symmetry points of the Brillouin zone, $K$ and $K'$, we introduce two classes of perturbations for the periodic medium. The perturbations lift the Dirac degeneracy, forming a spectral band valley at the points $K$ and $K'$ with a well-defined topological phase that depends on the sign of the perturbation parameters. By employing layer potential techniques and spectral analysis, we investigate the existence of a guided wave along an interface when two honeycomb photonic crystals are glued together. In particular, we elucidate the relationship between the existence of the interface mode and the nature of perturbations imposed on the two periodic media separated by the interface.

preprint · 2025 · arXiv

Model Predictive Path Integral Control for Roll-to-Roll Manufacturing

Roll-to-roll (R2R) manufacturing is a continuous processing technology essential for scalable production of thin-film materials and printed electronics, but precise control remains challenging due to subsystem interactions, nonlinearities, and process disturbances. This paper proposes a Model Predictive Path Integral (MPPI) control formulation for R2R systems, leveraging a GPU-based Monte-Carlo sampling approach to efficiently approximate optimal controls online. Crucially, MPPI easily handles non-differentiable cost functions, enabling the incorporation of complex performance criteria relevant to advanced manufacturing processes. A case study is presented that demonstrates that MPPI significantly improves tension regulation performance compared to conventional model predictive control (MPC), highlighting its suitability for real-time control in advanced manufacturing.

preprint · 2025 · arXiv

TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model

World models aim to endow AI systems with the ability to represent, generate, and interact with dynamic environments in a coherent and temporally consistent manner. While recent video generation models have demonstrated impressive visual quality, they remain limited in real-time interaction, long-horizon consistency, and persistent memory of dynamic scenes, hindering their evolution into practical world models. In this report, we present TeleWorld, a real-time multimodal 4D world modeling framework that unifies video generation, dynamic scene reconstruction, and long-term world memory within a closed-loop system. TeleWorld introduces a novel generation-reconstruction-guidance paradigm, where generated video streams are continuously reconstructed into a dynamic 4D spatio-temporal representation, which in turn guides subsequent generation to maintain spatial, temporal, and physical consistency. To support long-horizon generation with low latency, we employ an autoregressive diffusion-based video model enhanced with Macro-from-Micro Planning (MMPL), a hierarchical planning method that reduces error accumulation from the frame level to the segment level, alongside efficient Distribution Matching Distillation (DMD), enabling real-time synthesis under practical computational budgets. Our approach achieves seamless integration of dynamic object modeling and static scene representation within a unified 4D framework, advancing world models toward practical, interactive, and computationally accessible systems. Extensive experiments demonstrate that TeleWorld achieves strong performance in both static and dynamic world understanding, long-term consistency, and real-time generation efficiency, positioning it as a practical step toward interactive, memory-enabled world models for multimodal generation and embodied intelligence.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2601.00051:author:9:wei-li

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2506.03119:author:4:wei-li

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2512.24639:author:3:wei-li

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2512.24477:author:1:wei-li

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.14938:author:6:wei-li

Imported May 20, 2026 · Synced May 21, 2026
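The external ids listed above appear to follow a colon-separated pattern of source, paper identifier, the literal word `author`, a numeric position, and a name slug. That layout is inferred from the five examples on this page rather than from any documented schema, and the meaning of the number (for instance whether it counts from 0 or 1) is not stated. Under those assumptions, a minimal parsing sketch with hypothetical names (`AuthorRef`, `parse_external_id`) might look like this:

```python
from typing import NamedTuple

class AuthorRef(NamedTuple):
    """Parsed pieces of an external author id (field names are hypothetical)."""
    source: str
    paper_id: str
    position: int
    slug: str

def parse_external_id(external_id: str) -> AuthorRef:
    """Split an id such as 'arxiv:2601.00051:author:9:wei-li' into its parts."""
    parts = external_id.split(":")
    # Assumed layout: source, paper id, the literal 'author', position, slug.
    if len(parts) != 5 or parts[2] != "author":
        raise ValueError(f"unexpected external id: {external_id!r}")
    source, paper_id, _, position, slug = parts
    return AuthorRef(source, paper_id, int(position), slug)

ref = parse_external_id("arxiv:2601.00051:author:9:wei-li")
```

Grouping parsed records by `paper_id` would show which source papers contributed to this author record.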

2 works

Bingbing Xu

Researcher

Bingbing Xu contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Heng Dong

Researcher

Heng Dong contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Huawei Shen

Researcher

Huawei Shen contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Wei Zhang

Researcher

Wei Zhang contributes to research discovery and scholarly infrastructure.

Open to collaborate
