Source author record

Yang Wu

Yang Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

54 works

25 topics

4 close collaborators


Preprint · 2026 · arXiv

GEM: Generating LiDAR World Model via Deformable Mamba

World models, which simulate environmental dynamics and generate sensor observations, are gaining increasing attention in autonomous driving. However, progress in LiDAR-based world models has lagged behind that of models built on camera videos or occupancy data, primarily due to two core challenges: the inherent disorder of LiDAR point clouds and the difficulty of distinguishing dynamic objects from static structures. To address these issues, we propose GEM, a Generative LiDAR world model that leverages a deformable Mamba architecture, significantly improving fidelity and imaginative capability. Specifically, exploiting the structural similarity between sequential laser scanning and Mamba's processing mechanism, we first tokenize LiDAR sweeps into compact representations with a custom LiDAR scene tokenizer. After the tokenized features are disentangled without supervision by a dynamic-static separator, a tri-path deformable Mamba performs selective scanning and adaptive gating fusion over the disentangled features, enhancing spatial-temporal understanding of how the world evolves. Optionally, a planner and a BEV layout controller can be integrated to explore the model's capability for autonomous rollout and its potential to generate "what-if" scenarios. Extensive experiments show that GEM achieves state-of-the-art performance across diverse benchmarks and evaluation settings. Project page: https://github.com/wuyang98/GEM.

Preprint · 2026 · arXiv

Reasoning Before Diagnosis: Physician-Inspired Structured Thinking for ECG Classification

Electrocardiogram (ECG) diagnosis in clinical practice relies on structured reasoning over multiple hierarchical aspects, including cardiac rhythm, conduction properties, waveform morphology, and overall diagnostic impression. However, most existing approaches predict labels directly from ECG signals without explicit clinical reasoning, resulting in opaque decisions that lack clinical alignment. To bridge this gap, we propose CardioThink, a physician-inspired multimodal large language model (MLLM) framework that explicitly models the diagnostic reasoning process through human-interpretable intermediate stages (rhythm, conduction, morphology, and impression) to derive final classification results. Furthermore, we introduce Structured Set Policy Optimization (SSPO) to jointly optimize adherence to this structured reasoning format and the accuracy of variable-size diagnostic sets, without requiring manually annotated reasoning traces. Extensive experiments on diverse ECG benchmarks demonstrate the significant superiority of our approach in diagnostic accuracy, while simultaneously providing interpretable clinical reasoning. Notably, reasoning quality evaluations confirm that SSPO substantially enhances the clinical validity of the generated rationales. These findings reveal that moving beyond direct label prediction toward structured reasoning offers a more clinically aligned direction for future ECG modeling.
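The abstract describes SSPO as jointly rewarding adherence to the four-stage reasoning format and the accuracy of a variable-size diagnostic set. The Python sketch below shows one way such a combined reward could be computed. It is not CardioThink's implementation: the `<rhythm>`-style stage tags, the 0.5 weighting, and the use of set-level F1 as the accuracy term are all assumptions made for illustration.

```python
# Hypothetical stage tags; the abstract names the stages but not an output format.
STAGES = ("rhythm", "conduction", "morphology", "impression")


def format_ok(text: str) -> bool:
    """True if each stage tag appears exactly once and the tags are in order."""
    positions = []
    for stage in STAGES:
        tag = f"<{stage}>"
        if text.count(tag) != 1:
            return False
        positions.append(text.index(tag))
    return positions == sorted(positions)


def set_f1(pred: set, gold: set) -> float:
    """F1 score between a predicted and a reference set of diagnostic labels."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def reward(text: str, pred: set, gold: set, format_weight: float = 0.5) -> float:
    """Weighted sum of a format term and a set-accuracy term (the weight is an assumption)."""
    return format_weight * float(format_ok(text)) + (1 - format_weight) * set_f1(pred, gold)


output = ("<rhythm>sinus</rhythm><conduction>normal</conduction>"
          "<morphology>ST elevation</morphology><impression>STEMI</impression>")
print(reward(output, {"STEMI"}, {"STEMI"}))   # 1.0
```

Scoring sets with F1 rather than exact match gives partial credit when a multi-label prediction is only partly right, which is one plausible reading of "variable-size diagnostic sets".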

Preprint · 2024 · arXiv

GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation

In this paper, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (GUESS for short). The strategy sets up generation objectives by grouping body joints of detailed skeletons that are in close semantic proximity and replacing each such joint group with a single body-part node. Applied recursively, this operation abstracts a human pose into increasingly coarse skeletons at multiple granularity levels. As the abstraction level increases, human motion becomes more concise and stable, which significantly benefits the cross-modal motion synthesis task. The whole text-driven human motion synthesis problem is then divided into multiple abstraction levels and solved with a multi-stage generation framework built on a cascaded latent diffusion model: an initial generator first produces the coarsest human motion guess from a given text description; then a series of successive generators gradually enrich the motion details based on the textual description and the previously synthesized results. Notably, we further integrate GUESS with the proposed dynamic multi-condition fusion mechanism, which dynamically balances the cooperative effects of the given textual condition and the synthesized coarse motion prompt at different generation stages. Extensive experiments on large-scale datasets verify that GUESS outperforms existing state-of-the-art methods by large margins in terms of accuracy, realism, and diversity. Code is available at https://github.com/Xuehao-Gao/GUESS.
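The joint-grouping step described above can be illustrated with a small sketch. Here each joint group is replaced by one body-part node placed at the mean position of its joints, and the operation is repeated level by level. Mean pooling, the toy six-joint skeleton, and the group indices are hypothetical choices; the abstract does not say how GUESS aggregates joints.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def abstract_pose(pose: Sequence[Point], groups: Sequence[Sequence[int]]) -> List[Point]:
    """Replace each joint group with one node at the mean position of its joints."""
    coarse = []
    for group in groups:
        n = len(group)
        coarse.append(tuple(sum(pose[j][axis] for j in group) / n for axis in range(3)))
    return coarse


def pose_pyramid(pose: Sequence[Point],
                 hierarchy: Sequence[Sequence[Sequence[int]]]) -> List[List[Point]]:
    """Apply abstract_pose level by level; index 0 is the finest skeleton."""
    levels = [list(pose)]
    for groups in hierarchy:
        levels.append(abstract_pose(levels[-1], groups))
    return levels


# Hypothetical 6-joint skeleton: two arm joints, two torso joints, two leg joints.
pose = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 4, 0), (0, 0, 6), (0, 0, 8)]
hierarchy = [[[0, 1], [2, 3], [4, 5]],   # level 1: three body-part nodes
             [[0, 1, 2]]]                 # level 2: a single whole-body node
for level, skeleton in enumerate(pose_pyramid(pose, hierarchy)):
    print(level, skeleton)
```

Each level would then serve as the target of one stage in the cascade, with the coarsest level generated first.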

Preprint · 2022 · arXiv

A Constrained Deformable Convolutional Network for Efficient Single Image Dynamic Scene Blind Deblurring with Spatially-Variant Motion Blur Kernels Estimation

Most existing deep-learning-based single image dynamic scene blind deblurring (SIDSBD) methods design deep networks that directly remove spatially-variant motion blur from a single input motion-blurred image, without estimating blur kernels. In this paper, inspired by the Projective Motion Path Blur (PMPB) model and deformable convolution, we propose a novel constrained deformable convolutional network (CDCN) for efficient single image dynamic scene blind deblurring, which simultaneously achieves accurate estimation of spatially-variant motion blur kernels and high-quality image restoration from only one observed motion-blurred image. In CDCN, we first construct a novel multi-scale multi-level multi-input multi-output (MSML-MIMO) encoder-decoder architecture for stronger feature extraction. Second, unlike DLVBD methods that use multiple consecutive frames, we propose a novel constrained deformable convolution reblurring (CDCR) strategy. Deformable convolution is first applied to the blurred features of the single input image to learn the sampling points of each pixel's motion blur kernel, which is similar to estimating the motion density function of the camera shake in the PMPB model. A novel PMPB-based reblurring loss function then constrains the convergence of the learned sampling points, so that they better match the relative motion trajectory of each pixel and improve the accuracy of spatially-variant motion blur kernel estimation.
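As a rough illustration of the reblurring idea, the sketch below works on a 1-D image and treats the learned sampling points as per-pixel offsets. The reblurred value at pixel x is the average of the sharp signal sampled at x + d_k(x), in the spirit of the PMPB model, and the loss is the mean absolute difference from the observed blur. The 1-D setting, linear interpolation, and L1 loss are assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def sample(signal: Sequence[float], pos: float) -> float:
    """Linearly interpolate a 1-D signal at a fractional position, clamped to the borders."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    i = int(math.floor(pos))
    j = min(i + 1, len(signal) - 1)
    t = pos - i
    return (1 - t) * signal[i] + t * signal[j]


def reblur(sharp: Sequence[float], offsets: Sequence[Sequence[float]]) -> List[float]:
    """offsets[x] lists the sampling offsets of pixel x's blur kernel; average the samples."""
    return [sum(sample(sharp, x + d) for d in offsets[x]) / len(offsets[x])
            for x in range(len(sharp))]


def reblur_loss(sharp, offsets, blurred) -> float:
    """Mean absolute error between the reblurred estimate and the observed blurry signal."""
    estimate = reblur(sharp, offsets)
    return sum(abs(a - b) for a, b in zip(estimate, blurred)) / len(blurred)


sharp = [0.0, 0.0, 3.0, 0.0, 0.0]
offsets = [[-1.0, 0.0, 1.0]] * len(sharp)   # same 3-tap horizontal path at every pixel
print(reblur(sharp, offsets))               # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Because the offsets may differ per pixel, the same routine expresses spatially-variant kernels; in a network, the offsets would come from the deformable convolution rather than being fixed by hand.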

Preprint · 2022 · arXiv

A Weighted Random Forest Based Positioning Algorithm for 6G Indoor Communications

Due to indoor non-line-of-sight (NLoS) propagation and multi-access interference (MAI), achieving centimeter-level positioning accuracy in indoor scenarios is a great challenge. However, sixth-generation (6G) wireless communications offer a good opportunity for centimeter-level positioning: in 6G, millimeter wave (mmWave) and terahertz (THz) communications have ultra-broad bandwidth, so the channel state information (CSI) has high resolution. In this paper, a weighted random forest (WRF) based indoor positioning algorithm using CSI-based channel fingerprint features is proposed to achieve high-precision positioning for 6G indoor communications. In addition, ray tracing (RT) is used to improve the efficiency of establishing the channel fingerprint database. The simulation results demonstrate the accuracy and robustness of the proposed algorithm: its positioning accuracy remains within 6 cm in different indoor scenarios when the channel fingerprint database is established at 0.2 m intervals.
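One common way to weight a random forest is to give each tree a weight inversely proportional to its validation error and then average the tree outputs. The sketch below assumes that scheme, with hypothetical per-tree (x, y) predictions. The abstract does not specify how the WRF weights are derived, and training the trees on CSI fingerprints is omitted.

```python
from typing import List, Sequence, Tuple

Position = Tuple[float, float]


def tree_weights(val_errors: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Weight each tree by the inverse of its validation error, normalised to sum to 1."""
    raw = [1.0 / (err + eps) for err in val_errors]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_forest_predict(tree_preds: Sequence[Position],
                            weights: Sequence[float]) -> Position:
    """Weighted average of the (x, y) positions predicted by the individual trees."""
    x = sum(w * p[0] for w, p in zip(weights, tree_preds))
    y = sum(w * p[1] for w, p in zip(weights, tree_preds))
    return (x, y)


# Hypothetical: two trees with validation errors of 0.1 m and 0.3 m.
weights = tree_weights([0.1, 0.3])                                   # roughly [0.75, 0.25]
print(weighted_forest_predict([(0.0, 0.0), (4.0, 4.0)], weights))  # roughly (1.0, 1.0)
```

Compared with a plain forest average, this lets trees that fit the fingerprint database poorly contribute less to the final position estimate.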

Preprint · 2022 · arXiv

Evolution of the electronic structure of ultrathin MnBi2Te4 Films

Ultrathin films of the intrinsic magnetic topological insulator MnBi2Te4 exhibit fascinating quantum properties such as the quantum anomalous Hall effect and the axion insulator state. In this work, we systematically investigate the evolution of the electronic structure of MnBi2Te4 thin films. With increasing film thickness, the electronic structure changes from an insulator type with a large energy gap to one with in-gap topological surface states, which nevertheless remains drastically different from that of the bulk material. Under surface doping with alkali-metal atoms, a Rashba split band gradually emerges and hybridizes with the topological surface states. This not only reconciles the puzzling difference between the electronic structures of bulk and thin-film MnBi2Te4 but also provides an interesting platform for realizing a Rashba ferromagnet, which is attractive for the (quantum) anomalous Hall effect. Our results provide important insights into understanding and engineering the intriguing quantum properties of MnBi2Te4 thin films.

Preprint · 2022 · arXiv

Experimental violation of the Leggett-Garg inequality with a single-spin system

Investigating the boundary between the quantum mechanical description and the classical realistic view is of fundamental importance. The Leggett-Garg inequality provides a criterion to distinguish between quantum and classical systems and can be used to demonstrate macroscopic superposition states. A larger upper bound of the Leggett-Garg function can be obtained in a multi-level system. Here, we present an experimental violation of the Leggett-Garg inequality in a three-level system using a nitrogen-vacancy center in diamond, via ideal negative-result measurements. The maximum experimental value of the Leggett-Garg function is $K_{3}^{\mathrm{exp}} = 1.625 \pm 0.022$, which exceeds the Lüders bound at a $5\sigma$ confidence level.
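For context, the Leggett-Garg function here is K3 = C12 + C23 - C13, where Cij is the correlation between measurements at times ti and tj; macrorealism requires K3 ≤ 1. For a precessing two-level system with projective (Lüders) measurements at equally spaced times, C12 = C23 = cos θ and C13 = cos 2θ. This gives K3 = 2cos θ - cos 2θ, whose maximum of 3/2 is the Lüders bound. The sketch below checks that maximum numerically and computes how many standard deviations the reported value lies above it. The equal-spacing precession model is a textbook assumption, not a model of the NV-center experiment itself.

```python
import math


def k3_two_level(theta: float) -> float:
    """K3 = C12 + C23 - C13 with C12 = C23 = cos(theta) and C13 = cos(2*theta)."""
    return 2 * math.cos(theta) - math.cos(2 * theta)


# Scan the precession angle between measurements over [0, pi].
grid = [i * math.pi / 3000 for i in range(3001)]
k3_max, theta_max = max((k3_two_level(t), t) for t in grid)

K3_EXP, K3_ERR = 1.625, 0.022   # values reported in the abstract
LUDERS_BOUND = 1.5              # maximum of K3 for a two-level system
excess_sigma = (K3_EXP - LUDERS_BOUND) / K3_ERR

print(f"two-level max K3 = {k3_max:.4f} at theta = {theta_max:.4f} rad")
print(f"reported value exceeds the Lüders bound by {excess_sigma:.1f} sigma")
```

The margin works out to (1.625 - 1.5) / 0.022, a little under 6σ, which is consistent with the "5σ level of confidence" stated in the abstract.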

Preprint · 2022 · arXiv

Influences of the dissipative topological edge state on quantized transport in MnBi2Te4

The beauty of the quantum Hall (QH) effect is the metrological precision of Hall resistance quantization, which originates from topological edge states. Understanding the factors that lead to quantization breakdown not only provides important insights into the nature of the topological protection of these edge states, but also benefits device applications involving such quantized transport. In this work, we combine conventional transport and real-space conductivity mapping to investigate whether quantization breakdown is tied to the disappearance of edge states in the intensively studied MnBi2Te4 system. Our experimental results unambiguously show that topological edge states do exist when quantization breakdown occurs. These edge states are dissipative in nature and can lead to quantization breakdown because their diffusive character causes them to overlap with bulk and other edge states in real devices. Our findings draw attention to issues that are generally inaccessible in transport studies of the QH effect but can play important roles in practical measurements and device applications.

Preprint · 2022 · arXiv

MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with Multimodal Fine-grained Aligned Annotations

Multimodal fine-grained sentiment analysis has recently attracted increasing attention due to its broad applications. However, existing multimodal fine-grained sentiment datasets mostly focus on annotating fine-grained elements in text but ignore those in images, so the fine-grained elements in visual content do not receive the attention they deserve. In this paper, we propose a new dataset, the Multimodal Aspect-Category Sentiment Analysis (MACSA) dataset, which contains more than 21K text-image pairs. The dataset provides fine-grained annotations for both textual and visual content and is the first to use the aspect category as the pivot for aligning fine-grained elements between the two modalities. Based on our dataset, we propose the Multimodal ACSA task and a multimodal graph-based aligned model (MGAM), which adopts a fine-grained cross-modal fusion method. Experimental results show that our method can serve as a baseline for comparison in future research on this corpus. We will make the dataset and code publicly available.

Preprint · 2022 · arXiv

Model Averaging for Generalized Linear Models in Fragmentary Data Prediction

Fragmentary data is becoming increasingly common in many areas, which poses major challenges for researchers and data analysts. Most existing methods for fragmentary data consider a continuous response, while in many applications the response variable is discrete. In this paper, we propose a model averaging method for generalized linear models in fragmentary data prediction. The candidate models are fitted based on different combinations of covariate availability and sample size. The optimal weight is selected by minimizing the Kullback-Leibler loss on the completed cases, and its asymptotic optimality is established. Empirical evidence from a simulation study and a real data analysis of Alzheimer's disease is presented.
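A simplified sketch of the weight selection follows. With two candidate models' predicted probabilities on the completed cases, it chooses the mixing weight w in [0, 1] that minimizes the Bernoulli negative log-likelihood. That quantity estimates the Kullback-Leibler loss up to a term that does not depend on w. Restricting to two binary-response candidates, using a plain grid search, and omitting any penalty term the paper may use are all assumptions.

```python
import math
from typing import Sequence, Tuple


def neg_log_lik(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli negative log-likelihood of 0/1 responses y under probabilities p."""
    return -sum(yi * math.log(pk) + (1 - yi) * math.log(1 - pk) for yi, pk in zip(y, p))


def select_weight(y, p1, p2, steps: int = 1000) -> Tuple[float, float]:
    """Grid-search w in [0, 1] for the averaged prediction w*p1 + (1 - w)*p2."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        mix = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
        loss = neg_log_lik(y, mix)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Hypothetical completed cases and two candidate models' predicted probabilities.
y = [1, 0, 1, 0]
p1 = [0.9, 0.4, 0.6, 0.2]
p2 = [0.6, 0.1, 0.8, 0.3]
w, loss = select_weight(y, p1, p2)
print(f"w = {w:.3f}, loss = {loss:.4f}")
```

Because the grid includes w = 0 and w = 1, the averaged model is never worse on this criterion than either candidate alone.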

Preprint · 2022 · arXiv

Quantifying Wetting Dynamics with Triboelectrification

Wetting is often perceived as an intrinsic surface property of materials, but determining its evolution is complicated by its complex dependence on roughness across scales. The Wenzel state, in which liquids are in intimate contact with the rough substrate, and the Cassie-Baxter state, in which liquids rest on air pockets formed between asperities, are only two of the plethora of wetting behaviors. Furthermore, transitions from the Cassie-Baxter to the Wenzel state lead to completely different surface performance in applications such as anti-contamination, anti-icing, and drag reduction; however, little is known about how transitions between the various wetting modes unfold over time. In this paper, we show that wetting dynamics can be accurately quantified and tracked using solid-liquid triboelectrification. Our theoretical analysis reveals how surface micro- and nano-geometries regulate stability and infiltration, and demonstrates the generality of our approach for understanding wetting transitions.
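The two limiting states named above are usually described by textbook relations for the apparent contact angle. The Wenzel relation is cos θ_W = r cos θ_Y, with roughness ratio r ≥ 1. The Cassie-Baxter relation is cos θ_CB = f (cos θ_Y + 1) - 1, with wetted solid fraction f. The sketch below evaluates both; it covers the static states only and is not the paper's triboelectric measurement method. The example angles and parameters are hypothetical.

```python
import math


def wenzel_angle(theta_y_deg: float, r: float) -> float:
    """Apparent angle from cos(theta_W) = r * cos(theta_Y); r >= 1 is the roughness ratio."""
    c = r * math.cos(math.radians(theta_y_deg))
    c = max(-1.0, min(1.0, c))   # values outside [-1, 1] mean complete wetting or dewetting
    return math.degrees(math.acos(c))


def cassie_baxter_angle(theta_y_deg: float, f: float) -> float:
    """Apparent angle from cos(theta_CB) = f * (cos(theta_Y) + 1) - 1; f is the wetted solid fraction."""
    c = f * (math.cos(math.radians(theta_y_deg)) + 1.0) - 1.0
    return math.degrees(math.acos(c))


# Hypothetical hydrophobic coating with a 110 degree Young angle.
print(round(wenzel_angle(110, 1.5), 1))          # rough surface, liquid fills the texture
print(round(cassie_baxter_angle(110, 0.2), 1))   # liquid rests on air pockets
```

For a hydrophobic material, both states raise the apparent angle, but the Cassie-Baxter state does so far more strongly. A Cassie-to-Wenzel transition therefore shows up as a drop in contact angle.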

Preprint · 2022 · arXiv

Regulating effect of biaxial strain on electronic, optical and photocatalytic properties in promising X2PAs (X = Si, Ge and Sn) monolayers

Photocatalytic water splitting is an effective way to obtain renewable clean energy. The challenge is to design tunable photocatalysts that meet the needs of different environments. At the same time, the oxygen and hydrogen evolution reactions (OER and HER) on the photocatalyst should be spatially separated, which facilitates the separation of products. The electronic, optical and photocatalytic properties of Janus X2PAs (X = Si, Ge and Sn) monolayers are explored by first-principles calculations. All strain-free X2PAs monolayers exhibit excellent photocatalytic properties, with band edge positions straddling the standard redox potentials of water and large visible-light absorption coefficients (up to $10^{5}$ cm$^{-1}$). Interestingly, the intrinsic internal electric field favors separating photogenerated carriers to different surfaces of the monolayer, which helps the OER and HER take place on opposite sides. In particular, the band edge positions of X2PAs monolayers can be tuned effectively by biaxial strain, which in turn modulates the photocatalytic reactions. This suggests that X2PAs monolayers can act as a piezo-photocatalytic switch between the OER, the HER and the full redox reaction of water splitting. This investigation not only shows that X2PAs photocatalysts with separated OER and HER can be effectively tuned by mechanical strain, but also provides a new strategy for designing highly adaptable and tunable piezo-photocatalysts.
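The straddling condition can be checked mechanically. HER requires the conduction band minimum (CBM) to lie above the H+/H2 level, and OER requires the valence band maximum (VBM) to lie below the O2/H2O level. The sketch uses the common vacuum-scale reference values of -4.44 eV and -5.67 eV at pH 0, each shifted by 0.059 eV per pH unit. The band-edge numbers in the example are hypothetical, and the sketch ignores the difference in vacuum level between the two surfaces of a Janus monolayer.

```python
def water_splitting_capability(cbm_ev: float, vbm_ev: float, ph: float = 0.0) -> str:
    """Return 'full', 'HER', 'OER' or 'none' from band edges on the vacuum scale (eV).

    Reference levels at pH 0: H+/H2 at -4.44 eV and O2/H2O at -5.67 eV;
    both move up by 0.059 eV per pH unit.
    """
    e_h2 = -4.44 + 0.059 * ph
    e_o2 = -5.67 + 0.059 * ph
    her = cbm_ev > e_h2   # electrons at the CBM can reduce H+ to H2
    oer = vbm_ev < e_o2   # holes at the VBM can oxidise H2O to O2
    if her and oer:
        return "full"
    if her:
        return "HER"
    if oer:
        return "OER"
    return "none"


# Hypothetical band edges (eV vs vacuum) for a monolayer under three strain states.
for cbm, vbm in [(-4.0, -6.0), (-4.0, -5.5), (-4.6, -6.0)]:
    print(cbm, vbm, water_splitting_capability(cbm, vbm))
```

Shifting one band edge across its reference level flips the result between "full", "HER" and "OER". This is the sense in which strain-tuned band edges can act as a reaction switch.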

Preprint · 2022 · arXiv

Ultrafast coherent interlayer phonon dynamics in atomically thin layers of MnBi2Te4

The atomically thin MnBi2Te4 crystal is a novel magnetic topological insulator exhibiting exotic quantum physics. Here we report a systematic investigation of ultrafast carrier dynamics and coherent interlayer phonons in few-layer MnBi2Te4 as a function of layer number using time-resolved pump-probe reflectivity spectroscopy. Pronounced coherent phonon oscillations from the interlayer breathing mode are directly observed in the time domain. We find that the coherent oscillation frequency and the photocarrier and coherent phonon decay rates all depend sensitively on sample thickness. The time-resolved measurements are complemented by ultralow-frequency Raman spectroscopy, which confirms the interlayer breathing mode and additionally enables observation of the interlayer shear mode. The layer dependence of these modes allows us to extract both the out-of-plane and in-plane interlayer force constants. Our studies not only reveal the interlayer van der Waals coupling strengths but also shed light on the ultrafast optical properties of this novel two-dimensional material.
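Interlayer force constants are commonly extracted from layer-dependent mode frequencies with a linear-chain model. In that model, branch j of an N-layer stack has angular frequency ω_j = sqrt(2K/μ (1 - cos(jπ/N))), where K is the interlayer force constant per unit area and μ is the mass per unit area of one layer. The sketch below assumes that model. Which branch j is observed, and the example values of K and μ, are hypothetical and not taken from the paper.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def mode_frequency_cm1(k: float, mu: float, n_layers: int, j: int) -> float:
    """Linear-chain frequency (cm^-1) of branch j (1 <= j < N) for an N-layer stack.

    k: interlayer force constant per unit area (N/m^3); mu: mass per unit area of one layer (kg/m^2).
    """
    omega = math.sqrt(2 * k / mu * (1 - math.cos(j * math.pi / n_layers)))  # rad/s
    return omega / (2 * math.pi * C_CM_PER_S)


def force_constant(freq_cm1: float, mu: float, n_layers: int, j: int) -> float:
    """Invert mode_frequency_cm1: recover k from a measured frequency of branch j."""
    omega = 2 * math.pi * C_CM_PER_S * freq_cm1
    return mu * omega ** 2 / (2 * (1 - math.cos(j * math.pi / n_layers)))


# Hypothetical values for illustration only (not MnBi2Te4 data).
MU = 4.0e-6    # kg/m^2
K = 5.0e19     # N/m^3
for n in range(2, 6):
    print(n, round(mode_frequency_cm1(K, MU, n, n - 1), 2))
```

Fitting measured frequencies for several N to this expression would give K for the breathing mode (out-of-plane) and, applied to the shear mode, the in-plane constant.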

Preprint · 2022 · arXiv

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

Finding relevant moments and highlights in videos according to natural language queries is a natural and highly valuable need in the current era of explosive growth in video content. Nevertheless, jointly conducting moment retrieval and highlight detection is an emerging research topic, even though its component problems and some related tasks have already been studied for a while. In this paper, we present the first unified framework, named Unified Multi-modal Transformers (UMT), capable of realizing such joint optimization while also being easily reduced to solve the individual problems. As far as we are aware, this is the first scheme to integrate multi-modal (visual-audio) learning for either joint optimization or the individual moment retrieval task, and it tackles moment retrieval as a keypoint detection problem using a novel query generator and query decoder. Extensive comparisons with existing methods and ablation studies on the QVHighlights, Charades-STA, YouTube Highlights, and TVSum datasets demonstrate the effectiveness, superiority, and flexibility of the proposed method under various settings. Source code and pre-trained models are available at https://github.com/TencentARC/UMT.

Preprint · 2021 · arXiv

Dynamic Face Video Segmentation via Reinforcement Learning

For real-time semantic video segmentation, most recent works utilised a dynamic framework with a key scheduler to make online key/non-key decisions. Some works used a fixed key scheduling policy, while others proposed adaptive key scheduling methods based on heuristic strategies; both may lead to suboptimal global performance. To overcome this limitation, we model the online key decision process in dynamic video segmentation as a deep reinforcement learning problem and learn an efficient and effective scheduling policy from expert information about decision history and from the process of maximising global return. Moreover, we study the application of dynamic video segmentation to face videos, a field that has not been investigated before. Evaluating on the 300VW dataset, we show that our reinforcement key scheduler outperforms various baselines in terms of both effective key selection and running speed. Further results on the Cityscapes dataset demonstrate that our method also generalises to other scenarios. To the best of our knowledge, this is the first work to use reinforcement learning for online key-frame decisions in dynamic video segmentation, and also the first to apply it to face videos.

Preprint · 2021 · arXiv

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions

Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence. It is of great fundamental importance and meets strong industrial needs. Deep neural networks (DNNs) have greatly improved performance on many concrete tasks, with the help of large amounts of training data and powerful new computational resources. Although recognition accuracy is usually the first concern for new advances, efficiency is also important and sometimes critical for both academic research and industrial applications. Moreover, the entire community needs insightful views on the opportunities and challenges of efficiency. While general surveys on the efficiency of DNNs have been conducted from various perspectives, as far as we are aware, scarcely any of them has focused on visual recognition systematically, so it is unclear which advances are applicable to it and what else should be considered. In this paper, we review recent advances and suggest possible new directions for improving the efficiency of DNN-related visual recognition approaches. We investigate the problem not only from the model point of view but also from the data point of view (which existing surveys do not), and focus on the three most studied data types (images, videos and points). This paper attempts to provide a systematic summary via a comprehensive survey that can serve as a valuable reference and inspire both researchers and practitioners working on visual recognition problems.

Preprint · 2021 · arXiv

Magnetization-tuned topological quantum phase transition in MnBi2Te4 devices

Recently, the intrinsic magnetic topological insulator MnBi2Te4 has attracted enormous research interest due to its great success in realizing exotic topological quantum states, such as the quantum anomalous Hall effect (QAHE), the axion insulator state, and high-Chern-number and high-temperature Chern insulator states. One key issue in this field is to effectively manipulate these states and control topological phase transitions. Here, through systematic angle-dependent transport measurements, we reveal a magnetization-tuned topological quantum phase transition from a Chern insulator to a magnetic insulator with gapped Dirac surface states in MnBi2Te4 devices. Specifically, as the magnetic field is tilted away from the out-of-plane direction by around 40-60 degrees, the Hall resistance deviates from its quantized value and a colossal, anisotropic magnetoresistance is detected. Theoretical analyses based on a modified Landauer-Büttiker formalism show that the field-tilt-driven switching from the ferromagnetic state to a canted antiferromagnetic state induces this topological quantum phase transition. Our work provides an efficient means of modulating topological quantum states and topological quantum phase transitions.

Preprint · 2021 · arXiv

ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation

We aim to improve the performance of Multiple Object Tracking and Segmentation (MOTS) through refinement. However, refining MOTS results remains challenging, because appearance features are not adapted to target videos and proper discrimination thresholds are difficult to find. To tackle this issue, we propose a self-supervised refining MOTS framework (ReMOTS). ReMOTS mainly takes four steps to refine MOTS results from the data association perspective: (1) training the appearance encoder using predicted masks; (2) associating observations across adjacent frames to form short-term tracklets; (3) training the appearance encoder using short-term tracklets as reliable pseudo labels; (4) merging short-term tracklets into long-term tracklets using the adapted appearance features and thresholds obtained automatically from statistical information. Using ReMOTS, we reached $1^{st}$ place in the CVPR 2020 MOTS Challenge 1, with an sMOTSA score of $69.9$.
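Step (2) can be pictured with a minimal greedy association: detections in consecutive frames are linked when their overlap exceeds a threshold, and unmatched detections start new tracklets. The sketch uses bounding boxes instead of masks, a fixed IoU threshold of 0.5, and greedy matching. These are simplifying assumptions, whereas ReMOTS derives its thresholds from statistics and later relies on learned appearance features.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def short_term_tracklets(frames: Sequence[Sequence[Box]],
                         thresh: float = 0.5) -> List[List[Tuple[int, int]]]:
    """Greedily link detections in adjacent frames; each tracklet is a list of (frame, index)."""
    tracklets: List[List[Tuple[int, int]]] = []
    prev_owner: Dict[int, int] = {}  # detection index in previous frame -> tracklet id
    for t, boxes in enumerate(frames):
        owner: Dict[int, int] = {}
        if t > 0:
            prev = frames[t - 1]
            pairs = sorted(((iou(prev[i], boxes[j]), i, j)
                            for i in prev_owner for j in range(len(boxes))), reverse=True)
            used_prev = set()
            for score, i, j in pairs:   # highest-overlap pairs are matched first
                if score < thresh:
                    break
                if i in used_prev or j in owner:
                    continue
                owner[j] = prev_owner[i]
                tracklets[owner[j]].append((t, j))
                used_prev.add(i)
        for j in range(len(boxes)):     # unmatched detections start new tracklets
            if j not in owner:
                owner[j] = len(tracklets)
                tracklets.append([(t, j)])
        prev_owner = owner
    return tracklets


frames = [
    [(0, 0, 10, 10), (20, 20, 30, 30)],
    [(1, 1, 11, 11), (21, 21, 31, 31)],
    [(2, 2, 12, 12)],
]
print(short_term_tracklets(frames))
# [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1)]]
```

Keeping the linking conservative at this stage is what makes the resulting short tracklets usable as pseudo labels in step (3).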


54 published items

GEM: Generating LiDAR World Model via Deformable Mamba

Reasoning Before Diagnosis: Physician-Inspired Structured Thinking for ECG Classification

GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation

A Constrained Deformable Convolutional Network for Efficient Single Image Dynamic Scene Blind Deblurring with Spatially-Variant Motion Blur Kernels Estimation

A Weighted Random Forest Based Positioning Algorithm for 6G Indoor Communications

Evolution of the electronic structure of ultrathin MnBi2Te4 Films

Experimental violation of the Leggett-Garg inequality with a single-spin system

Influences of the dissipative topological edge state on quantized transport in MnBi2Te4

MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with Multimodal Fine-grained Aligned Annotations

Model Averaging for Generalized Linear Models in Fragmentary Data Prediction

Quantifying Wetting Dynamics with Triboelectrification

Regulating effect of biaxial strain on electronic, optical and photocatalytic properties in promising X2PAs (X = Si, Ge and Sn) monolayers

Ultrafast coherent interlayer phonon dynamics in atomically thin layers of MnBi2Te4

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

Dynamic Face Video Segmentation via Reinforcement Learning

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions

Magnetization-tuned topological quantum phase transition in MnBi2Te4 devices

ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation

Beyond Intra-modality: A Survey of Heterogeneous Person Re-identification

Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

Compressing 3DCNNs Based on Tensor Train Decomposition

Electronic states and magnetic response of MnBi2Te4 by scanning tunneling microscopy and spectroscopy

Energy-Efficient Trajectory Design for UAV-Enabled Communication Under Malicious Jamming

Enhancement of superconductivity in organic-inorganic hybrid topological materials

High-Chern-Number and High-Temperature Quantum Hall Effect without Landau Levels

Make Skeleton-based Action Recognition Model Smaller, Faster and Better

Metallic Microswimmers Driven up the Wall by Gravity

Multiple Object Tracking by Flowing and Fusing

Robust axion insulator and Chern insulator phases in a two-dimensional antiferromagnetic topological insulator

Spin-orbit torque magnetization switching in MoTe2/permalloy heterostructures

Unified spectral hamiltonian results of balanced bipartite graphs and complementary graphs

Using Panoramic Videos for Multi-person Localization and Tracking in a 3D Panoramic Coordinate

Video Region Annotation with Sparse Bounding Boxes

When Person Re-identification Meets Changing Clothes

Enhanced spin-orbit torque via modulation of spin current absorption

Experimental observation of topological Fermi arcs in type-II Weyl semimetal MoTe2

Experimental time-optimal universal control of spin qubits in solids

Helicity dependent photovoltaic effect in Bi2Se3 under normal incident light

High performance THz emitters based on ferromagnetic/nonmagnetic heterostructures

Raman signatures of inversion symmetry breaking and structural phase transition in type-II Weyl semimetal MoTe2

Spin orbit torques and Dzyaloshinskii-Moriya interaction in dual-interfaced Co-Ni multilayers

Experimental fault-tolerant universal quantum gates with solid-state spins under ambient conditions

Feedback-optimized Extraordinary Optical Transmission of Continuous-variable Entangled States

Graphene terahertz modulators by ionic liquid gating

Spin-orbit torque engineering via oxygen manipulation

Thermal and Vibrational Properties of Thermoelectric ZnSb - Exploring the Origin of Low Thermal Conductivity

Collaborative Representation for Classification, Sparse or Non-sparse?

Strain-enhanced tunneling magnetoresistance in MgO magnetic tunnel junctions

A new route to spin-orbit torque engineering via oxygen manipulation

Dynamics of Open-Source Software Developer's Commit Behavior: An Empirical Investigation of Subversion

Graphene/liquid crystal based terahertz phase shifters

Ground state pairing correlations in the $S_4$ symmetric microscopic model for iron-based superconductors

Post-Acceleration Study for Neutrino Super-beam at CSNS

Variable Electron-Phonon Coupling in Isolated Metallic Carbon Nanotubes Observed by Raman Scattering