Source author record

Li Wang

Li Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Cryptography and Security, math.NA, Numerical Analysis, physics.optics, Computation and Language, cond-mat.mtrl-sci, eess.IV, eess.SP, Machine Learning, math.AP, Methodology, physics.app-ph, physics.data-an

Catalog footprint

What is connected

16 works

15 topics

4 close collaborators


preprint · 2026 · arXiv

An NPDo Approach for Principal Joint Block Diagonalization

Matrix joint block-diagonalization (JBD) frequently arises in diverse applications such as independent component analysis, blind source separation, and common principal component analysis (CPCA), among others. In particular, CPCA aims at joint diagonalization, i.e., each block is 1-by-1. This paper is concerned with principal joint block-diagonalization (PJBD), which aims to achieve two goals: 1) partial joint block-diagonalization, and 2) identification of dominant common block-diagonal parts for all involved matrices. This is in contrast to most existing methods, especially the popular ones based on Givens rotation, which focus on full joint diagonalization and quickly become impractical for matrices of even moderate size (300-by-300 or larger). An NPDo approach is proposed, built on a nonlinear polar decomposition with orthogonal polar factor dependency that characterizes the solutions of the optimization problem designed to achieve PJBD. The associated SCF iteration is shown to be globally convergent to a stationary point, with the objective function increasing monotonically during the iterative process. Numerical experiments are presented to illustrate the effectiveness of the NPDo approach and its superiority to Givens rotation-based methods.

preprint · 2026 · arXiv

Beyond Known Fakes: Generalized Detection of AI-Generated Images via Post-hoc Distribution Alignment

The rapid proliferation of highly realistic AI-generated images poses serious security threats such as misinformation and identity fraud. Detecting generated images in open-world settings is particularly challenging when they originate from unknown generators, as existing methods typically rely on model-specific artifacts and require retraining on new fake data, limiting their generalization and scalability. In this work, we propose Post-hoc Distribution Alignment (PDA), a generalized and model-agnostic framework for detecting AI-generated images under unknown generative threats. Specifically, PDA reformulates detection as a distribution alignment task by regenerating test images through a known generative model. When real images are regenerated, they inherit model-specific artifacts and align with the known fake distribution. In contrast, regenerated unknown fakes contain incompatible or mixed artifacts and remain misaligned. This difference allows an existing detector, trained on the known generative model, to accurately distinguish real images from unknown fakes without requiring access to unseen data or retraining. Extensive experiments across 16 state-of-the-art generative models, including GANs, diffusion models, and commercial text-to-image APIs (e.g., Midjourney), demonstrate that PDA achieves average detection accuracy of 96.69%, outperforming the best baseline by 10.71%. Comprehensive ablation studies and robustness analyses further confirm PDA's generalizability and resilience to distribution shifts and image transformations. Overall, our work provides a practical and scalable solution for real-world AI-generated image detection where new generative models emerge continuously.

preprint · 2026 · arXiv

Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization

Large Language Models (LLMs) often generate substantively relevant content but fail to adhere to formal constraints, leading to outputs that are conceptually correct but procedurally flawed. Traditional prompt refinement approaches focus on rephrasing the description of the primary task an LLM has to perform, neglecting the granular constraints that function as acceptance criteria for its response. We propose a novel multi-agentic workflow that decouples optimization of the primary task description from its constraints, using quantitative scores as feedback to iteratively rewrite and improve them. Our evaluation demonstrates that this method produces revised prompts that yield significantly higher compliance scores from models like Llama 3.1 8B and Mixtral-8x7B.

preprint · 2026 · arXiv

Evaluating the Diagnostic Classification Ability of Multimodal Large Language Models: Insights from the Osteoarthritis Initiative

Multimodal large language models (MLLMs) show promising performance on medical visual question answering (VQA) and report generation, but these generation and explanation abilities do not reliably transfer to disease-specific classification. We evaluated MLLM architectures on knee osteoarthritis (OA) radiograph classification, which remains underrepresented in existing medical MLLM benchmarks, even though knee OA affects an estimated 300 to 400 million people worldwide. Through systematic ablation studies manipulating the vision encoder, the connector, and the large language model (LLM) across diverse training strategies, we measured each component's contribution to diagnostic accuracy. In our classification task, a trained vision encoder alone could outperform full MLLM pipelines in classification accuracy, and fine-tuning the LLM provided no meaningful improvement over prompt-based guidance. Moreover, LoRA fine-tuning on a small, class-balanced dataset (500 images) gave better results than training on a much larger but class-imbalanced set (5,778 images), indicating that data balance and quality can matter more than raw scale for this task. These findings suggest that for domain-specific medical classification, LLMs are more effective as interpreters and report generators than as primary classifiers. The MLLM architecture therefore appears less suitable for medical image diagnostic classification tasks that demand high certainty. We recommend prioritizing vision encoder optimization and careful dataset curation when developing clinically applicable systems.

preprint · 2026 · arXiv

High-Ti induced planar-fault transformation toward superlattice extrinsic stacking faults and microtwins in crept CoNi-based superalloys

Controlling planar fault shearing mechanisms is key for improving the high-temperature creep performance of gamma prime-strengthened superalloys. This work examines how the Ti concentration in L1₂-strengthened CoNi-based alloys affects planar fault formation during creep. Interrupted compressive creep tests were conducted at 1223 K in air under a constant load stress of 241 MPa. We found, for the first time, that high Ti additions shift the dominant gamma prime shearing mode from antiphase boundaries (APBs) in Ti-free and low-Ti alloys to superlattice extrinsic stacking faults (SESFs). Systematic ab initio calculations show that in high-Ti alloys, the elevated APB energy renders the APB-shearing mode unfavorable. At the same time, the SESF energy decreases relative to that in low-Ti compositions, and an increased ratio of complex intrinsic stacking fault (CISF) energy to SESF energy promotes the transformation of high-energy CISFs into lower-energy SESFs. Chemical analysis using scanning transmission electron microscopy combined with energy-dispersive X-ray spectroscopy further reveals that SESFs in high-Ti alloys are enriched in Ti, Mo and W, yet no grid-like ordering is observed. Together with the ab initio calculations, this suggests that Mo and W additions in high-Ti alloys could facilitate the transformation from the L1₂ structure to the low-energy D0₂₄ structure, indicating that Mo and W segregation along SESFs is energetically favourable. Furthermore, successive SESF thickening facilitates microtwinning in the absence of D0₂₄ ordering along SESFs, providing an additional major carrier of creep strain. These new findings clarify the role of Ti in controlling planar fault shearing mechanisms, providing new insights for optimizing the creep performance of next-generation CoNi-based superalloys.

preprint · 2026 · arXiv

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

Large language models (LLMs) have increasingly leveraged tool invocation to enhance their reasoning capabilities. However, existing approaches typically couple tool invocation tightly with immediate execution. Such immediate tool interaction may disrupt the reasoning coherence of LLMs and constrain their expressivity, ultimately degrading reasoning performance. To this end, for the first time, we propose and formalize the problem of decoupling tool invocation from execution during reasoning, and introduce delayed execution with explicit control to enhance tool-integrated reasoning (TIR). Furthermore, we propose a hierarchical control framework and theoretically derive a surrogate loss that enables an implicitly hierarchical policy to learn behavior equivalent to that of an explicit hierarchical policy, leading to the proposed IH-GRPO algorithm. In extensive experiments, IH-GRPO achieves absolute improvements of 1.87%, 2.16%, and 2.53% on Qwen3-1.7B, Qwen3-4B, and Qwen3-8B across six out-of-domain mathematical reasoning benchmarks over the strongest baseline method, while also yielding consistent performance gains in other domains. Our code is available at https://github.com/Lumina04/IH-GRPO-01.

preprint · 2026 · arXiv

Learn to Evolve: Self-supervised Neural JKO Operator for Wasserstein Gradient Flow

The Jordan-Kinderlehrer-Otto (JKO) scheme provides a stable variational framework for computing Wasserstein gradient flows, but its practical use is often limited by the high computational cost of repeatedly solving the JKO subproblems. We propose a self-supervised approach for learning a JKO solution operator without requiring numerical solutions of any JKO trajectories. The learned operator maps an input density directly to the minimizer of the corresponding JKO subproblem, and can be iteratively applied to efficiently generate the gradient-flow evolution. A key challenge is that only a number of initial densities are typically available for training. To address this, we introduce a Learn-to-Evolve algorithm that jointly learns the JKO operator and its induced trajectories by alternating between trajectory generation and operator updates. As training progresses, the generated data increasingly approximates true JKO trajectories. Meanwhile, this Learn-to-Evolve strategy serves as a natural form of data augmentation, significantly enhancing the generalization ability of the learned operator. Numerical experiments demonstrate the accuracy, stability, and robustness of the proposed method across various choices of energies and initial conditions.
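For reference, a single step of the JKO scheme can be written as below. The symbols are notation assumed here, not taken from the abstract: $E$ is the energy functional, $\tau > 0$ the step size, $\rho_k$ the current density, and $W_2$ the 2-Wasserstein distance.

```latex
\rho_{k+1} \in \operatorname*{arg\,min}_{\rho}
  \Bigl\{ E(\rho) + \frac{1}{2\tau}\, W_2^2(\rho, \rho_k) \Bigr\}
```

In this notation, the learned operator described in the abstract would approximate the map $\rho_k \mapsto \rho_{k+1}$, and applying it repeatedly would generate the gradient-flow evolution.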

preprint · 2026 · arXiv

Normalized Solutions for Schrödinger-Bopp-Podolsky Systems with Critical Choquard-Type Nonlinearity on Bounded Domains

In this paper, we study normalized solutions for the following critical Schrödinger-Bopp-Podolsky system: $$-\Delta u + q(x)\phi u = \lambda u + |u|^{p-2}u + \bigl(I_\alpha * |u|^{3+\alpha}\bigr)|u|^{1+\alpha}u \quad \text{in } \Omega_r,$$ $$-\Delta\phi + \Delta^2\phi = q(x)u^2 \quad \text{in } \Omega_r,$$ where $\Omega_r \subset \mathbb{R}^3$ is a smooth bounded domain, $p \in \left(2, \frac{8}{3}\right)$, $q(x) \in C(\bar{\Omega}_r) \setminus \{0\}$, and $\lambda \in \mathbb{R}$ is the Lagrange multiplier associated with the constraint $\int_{\Omega_r} |u|^2\, \mathrm{d}x = b^2$ for some $b > 0$. Here $\alpha > 0$, $I_\alpha$ denotes the Riesz potential, and the domain parameter $r$ reflects the size of $\Omega_r$, whose precise definition will be given in Section 3. By applying a special minimax principle together with a truncation technique, we prove that there exists $b^* > 0$ such that the system admits multiple normalized solutions whenever $b \in (0, b^*)$ under Navier boundary conditions.

preprint · 2026 · arXiv

Pulse thermal imaging of FUHAO bronze artifact

The accurate identification of historical restoration traces and material degradation is essential for the scientific preservation of ancient bronzes. In this study, the prestigious FUHAO bronze artifact (late Shang period, 13th-11th century BCE) was non-destructively examined using pulsed thermal imaging (PT). By combining single- and double-layer heat conduction models with Thermal Tomography (TT), this approach allowed for precise spatial localization of repair crevices, patches, and filler materials, while also distinguishing restorative interventions from the original bronze substrate. The artifact was revealed to have been assembled from multiple fragments, exhibiting uneven surface corrosion and clear evidence of prior conservation. The results not only provide direct insights for conservation strategy and historical interpretation but also demonstrate the capability of pulsed thermal imaging as an effective diagnostic tool for the integrated surface and subsurface assessment of cultural heritage objects.

preprint · 2026 · arXiv

RemoteDet-Mamba: A Hybrid Mamba-CNN Network for Multi-modal Object Detection in Remote Sensing Images

Unmanned Aerial Vehicle (UAV) remote sensing, with its advantages of rapid information acquisition and low cost, has been widely applied in scenarios such as emergency response. However, due to the long imaging distance and complex imaging mechanisms, targets in remote sensing images often face challenges such as small object size, dense distribution, and low inter-class discriminability. To address these issues, this paper proposes a multi-modal remote sensing object detection network called RemoteDet-Mamba, which is based on a patch-level four-direction selective scanning fusion strategy. This method simultaneously learns unimodal local features and fuses cross-modal patch-level global semantic information, thereby enhancing the distinguishability of small objects and improving inter-class discrimination. Furthermore, the designed lightweight fusion mechanism effectively decouples densely packed targets while reducing computational complexity. Experimental results on the DroneVehicle dataset demonstrate that RemoteDet-Mamba achieves superior detection performance compared to current mainstream methods, while maintaining low parameter count and computational overhead, showing promising potential for practical applications.

preprint · 2026 · arXiv

Spectral point transformer for significant wave height estimation from sea clutter

This paper presents a method for estimating significant wave height (Hs) from sparse spectral points using a Transformer-based approach, the Spectral Point Transformer (SPT). Based on empirical observations that only a minority of spectral points with strong power contribute to wave energy, the proposed SPT effectively integrates geometric and spectral characteristics of ocean surface waves to estimate Hs through multi-dimensional feature representation. The experiment reveals an intriguing phenomenon: the learned features of SPT align well with physical dispersion relations, where the contribution-score map of selected points is concentrated along dispersion curves. Compared to conventional vision networks that process image sequences and full spectra, SPT demonstrates superior performance in Hs regression while consuming significantly fewer computational resources. On a consumer-grade GPU, SPT completes training of the regression model on 1080 sea clutter image sequences within 4 minutes, showcasing its potential to reduce deployment costs for radar wave-measuring systems. The open-source implementation of SPT will be available at https://github.com/joeyee/spt

preprint · 2026 · arXiv

Towards Efficient 3D Object Detection for Vehicle-Infrastructure Collaboration via Risk-Intent Selection

Vehicle-Infrastructure Collaborative Perception (VICP) is pivotal for resolving occlusion in autonomous driving, yet the trade-off between communication bandwidth and feature redundancy remains a critical bottleneck. While intermediate fusion mitigates data volume compared to raw sharing, existing frameworks typically rely on spatial compression or static confidence maps, which inefficiently transmit spatially redundant features from non-critical background regions. To address this, we propose Risk-intent Selective detection (RiSe), an interaction-aware framework that shifts the paradigm from identifying visible regions to prioritizing risk-critical ones. Specifically, we introduce a Potential Field-Trajectory Correlation Model (PTCM) grounded in potential field theory to quantitatively assess kinematic risks. Complementing this, an Intention-Driven Area Prediction Module (IDAPM) leverages ego-motion priors to proactively predict and filter key Bird's-Eye-View (BEV) areas essential for decision-making. By integrating these components, RiSe implements a semantic-selective fusion scheme that transmits high-fidelity features only from high-interaction regions, effectively acting as a feature denoiser. Extensive experiments on the DeepAccident dataset demonstrate that our method reduces communication volume to 0.71% of full feature sharing while maintaining state-of-the-art detection accuracy, establishing a competitive Pareto frontier between bandwidth efficiency and perception performance.

preprint · 2026 · arXiv

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompasses sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets. We will release all datasets and the benchmark codebase at https://huggingface.co/datasets/yanglei18/V2X-Radar and https://github.com/yanglei18/V2X-Radar.

preprint · 2026 · arXiv

VidLeaks: Membership Inference Attacks Against Text-to-Video Models

The proliferation of powerful Text-to-Video (T2V) models, trained on massive web-scale datasets, raises urgent concerns about copyright and privacy violations. Membership inference attacks (MIAs) provide a principled tool for auditing such risks, yet existing techniques (designed for static data like images or text) fail to capture the spatio-temporal complexities of video generation. In particular, they overlook the sparsity of memorization signals in keyframes and the instability introduced by stochastic temporal dynamics. In this paper, we conduct the first systematic study of MIAs against T2V models and introduce a novel framework, VidLeaks, which probes sparse-temporal memorization through two complementary signals: 1) Spatial Reconstruction Fidelity (SRF), using a Top-K similarity to amplify spatial memorization signals from sparsely memorized keyframes, and 2) Temporal Generative Stability (TGS), which measures semantic consistency across multiple queries to capture temporal leakage. We evaluate VidLeaks under three progressively restrictive black-box settings: supervised, reference-based, and query-only. Experiments on three representative T2V models reveal severe vulnerabilities: VidLeaks achieves an AUC of 82.92% on AnimateDiff and 97.01% on InstructVideo even in the strict query-only setting, posing a realistic and exploitable privacy risk. Our work provides the first concrete evidence that T2V models leak substantial membership information through both sparse and temporal memorization, establishing a foundation for auditing video generation systems and motivating the development of new defenses. Code is available at: https://zenodo.org/records/17972831.
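As a rough illustration of why a Top-K aggregate can amplify sparse signals, the sketch below averages only the K highest per-frame similarity scores. The function name `top_k_mean`, the toy scores, and the use of a plain mean are assumptions made for this example; the abstract does not specify how VidLeaks computes SRF.

```python
def top_k_mean(similarities, k):
    """Average the k largest per-frame similarity scores.

    Hypothetical helper: when only a few keyframes are memorized, averaging
    over all frames dilutes the signal, while a top-k mean keeps it.
    """
    if k <= 0 or not similarities:
        raise ValueError("need k > 0 and at least one similarity score")
    top = sorted(similarities, reverse=True)[:k]
    return sum(top) / len(top)


# Toy per-frame similarities: two frames stand out, the rest are background.
frame_scores = [0.21, 0.19, 0.93, 0.22, 0.88, 0.20]

plain_mean = sum(frame_scores) / len(frame_scores)
sparse_signal = top_k_mean(frame_scores, k=2)
print(f"plain mean: {plain_mean:.3f}, top-2 mean: {sparse_signal:.3f}")
```

With these toy numbers the two high-similarity frames dominate the top-2 mean, whereas the plain mean is pulled toward the background level.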

preprint · 2026 · arXiv

Weighted least squares estimation by multivariate-dependent weights for linear regression models

Multivariate linear regression models often face the problem of heteroscedasticity caused by multiple explanatory variables. Weighted least squares estimation with univariate-dependent weights has limitations in constructing weight functions. Therefore, this paper proposes a multivariate-dependent weighted least squares estimation method. By constructing a linear combination of explanatory variables and maximizing its Spearman rank correlation coefficient with the absolute residual value, combined with the maximum likelihood method to depict heteroscedasticity, it can comprehensively reflect the trend of variance changes in the random error and improve the accuracy of the model. This paper demonstrates that the optimal linear combination exponent estimator for heteroscedastic volatility obtained by our algorithm possesses consistency and asymptotic normality. In the simulation experiment, three scenarios of heteroscedasticity were designed, and the comparison showed that the proposed method was superior to the univariate-dependent weighting method in parameter estimation and model prediction. In the real data applications, the proposed method was applied to two real-world datasets about consumer spending in China and housing prices in Boston. From the perspectives of MAE, RSE, cross-validation, and fitting performance, its accuracy and stability were verified in terms of model prediction, interval estimation, and generalization ability. Additionally, the proposed method demonstrated relative advantages in fitting data with large fluctuations. This study provides an effective new approach for dealing with heteroscedasticity in multivariate linear regression.
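For readers unfamiliar with the baseline technique, the following is a minimal sketch of ordinary weighted least squares with fixed, known weights, solving the normal equations (XᵀWX)β = XᵀWy. It does not implement the paper's weight construction (the Spearman-correlation and maximum-likelihood steps); the helper names, the toy data, and the variance model used to build the weights are all assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_least_squares(rows, y, weights):
    """Return beta minimizing sum_i w_i * (y_i - x_i . beta)^2."""
    p = len(rows[0])
    xtwx = [[sum(w * r[j] * r[k] for r, w in zip(rows, weights))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * r[j] * yi for r, yi, w in zip(rows, y, weights))
            for j in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Toy design: intercept plus two explanatory variables (hypothetical data).
points = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 4.0)]
rows = [[1.0, x1, x2] for x1, x2 in points]
true_beta = [1.0, 2.0, -0.5]
y = [sum(b * v for b, v in zip(true_beta, r)) for r in rows]  # noise-free

# Assumed variance model: the error standard deviation grows with x1 + 2*x2,
# so each observation is weighted by the inverse of that assumed variance.
weights = [1.0 / (x1 + 2.0 * x2) ** 2 for x1, x2 in points]

beta_hat = weighted_least_squares(rows, y, weights)
print([round(b, 6) for b in beta_hat])
```

Because the toy responses are noise-free, any positive weights recover the same coefficients; with heteroscedastic noise, the choice of weights is what the estimation method in the abstract is concerned with.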

preprint · 2025 · arXiv

Enhanced TM-Mode 3D Coupled Wave Theory for Photonic Crystal Surface-Emitting Terahertz Quantum Cascade Lasers

In this study, we propose and develop an enhanced three-dimensional coupled wave theory (3D CWT) to investigate the optical field behavior in photonic crystal surface-emitting terahertz quantum cascade lasers (THz-QCLs). By incorporating an effective permittivity enhancement (EP) model and a self-consistent iteration (SCI) method, we successfully address the numerical dispersion issues encountered in analytical methods when dealing with metallic waveguide structures. The results demonstrate that the EP and SCI-enhanced 3D TM mode CWT achieves computational accuracy comparable to traditional numerical simulation methods such as finite-difference time-domain (FDTD), while significantly reducing the required computational resources, including time and memory, to just tens of minutes. Moreover, this method provides a clear physical insight, revealing the reasons behind the current low extraction efficiency in surface-emitting THz-QCLs. Our study showcases the potential of the EP and SCI-enhanced 3D CWT as a powerful simulation tool in the research of photonic crystal surface-emitting lasers, offering a new theoretical foundation and optimization direction for future laser designs.
