Source author record

Bing Wang

Bing Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

93 works

51 topics

4 close collaborators

preprint 2026 arXiv

A Hybrid Tucker-LSTM Tensor Network Model for SOC Prediction in Electric Vehicles

Accurate state of charge (SOC) estimation is critical for electric vehicle (EV) battery management strategies, but conventional estimators suffer from two fundamental shortcomings: cumulative errors that grow over time and reliance on simplified battery models that do not reflect real-world dynamics. This paper presents a hybrid approach combining Tucker tensor decomposition with LSTM networks, using full-lifecycle EV field data for SOC prediction. The inputs are charge status, mileage, voltage, current, cell differentials, and temporal features. Tucker decomposition reduces dimensionality while maintaining the temporal structure, allowing a direct, fair comparison with a standard LSTM. Tucker-LSTM outperforms the baseline on all metrics, with MSE dropping 70.5% (from 21.07 to 6.22), MAE improving 48.7% (from 3.37% to 1.73%), RMSE falling from 4.59% to 2.49%, and $R^2$ rising from 0.918 to 0.976. Since the experimental results demonstrate that tensor decomposition compresses high-dimensional battery data without loss of predictive fidelity, this paper opens up a new direction for tensor-based analytics in electric vehicle battery management.
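The percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below recomputes the relative MSE and MAE changes and compares the square roots of the two MSE values with the quoted RMSE figures. The helper name `relative_drop` is illustrative and does not come from the paper.

```python
import math


def relative_drop(before: float, after: float) -> float:
    """Percentage reduction when a metric goes from `before` to `after`."""
    return (before - after) / before * 100.0


# Figures quoted in the abstract (baseline LSTM vs. Tucker-LSTM).
mse_drop = relative_drop(21.07, 6.22)
mae_drop = relative_drop(3.37, 1.73)
rmse_baseline = math.sqrt(21.07)
rmse_tucker = math.sqrt(6.22)

print(f"MSE drop: {mse_drop:.1f}%")            # MSE drop: 70.5%
print(f"MAE drop: {mae_drop:.1f}%")            # MAE drop: 48.7%
print(f"sqrt(MSE) baseline: {rmse_baseline:.2f}")  # 4.59
print(f"sqrt(MSE) Tucker:   {rmse_tucker:.2f}")    # 2.49
```

The square roots of the MSE values match the quoted RMSE figures, so the reported numbers are internally consistent.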

preprint 2026 arXiv

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Quantization is a key method for reducing the GPU memory requirement of training large language models (LLMs). Yet, current approaches are ineffective for 4-bit activations and 8-bit gradients, which can easily cause slow convergence or accuracy loss. To address this, we introduce AGoQ, incorporating two new techniques: 1) a layer-aware activation quantization algorithm that allocates appropriate bit-widths for activations of various layers based on their types and pipeline stages to achieve near 4-bit activation storage, and 2) a gradient quantization algorithm that reduces memory usage and shortens communication time by employing 8-bit gradient storage and precision-preserving 8-bit All-Reduce communication. We conduct extensive experiments using different sizes of LLMs on two GPU clusters (up to 64 GPUs), and the experimental results show that our AGoQ reduces the memory by up to 52% and achieves up to 1.34× improvement of training speed compared to state-of-the-art training systems Megatron-LM (w/ or w/o ZeRO), COAT and DeepSpeed with 8B to 32B LLaMA models, while achieving convergence loss on pretraining and comparable accuracy on downstream tasks with LLaMA architectures.
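To make "8-bit gradient storage" concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic textbook scheme, not AGoQ's algorithm: the abstract does not specify the quantizer, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the signed 8-bit range used here.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


grads = [0.5, -1.27, 0.003, 0.0]
codes, scale = quantize_int8(grads)
restored = dequantize(codes, scale)

# The rounding error of each value is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(grads, restored))
print(codes, scale, max_error)
```

Each stored value takes one byte instead of two or four, at the cost of a rounding error bounded by `scale / 2`. Small gradients such as `0.003` collapse to zero under a single shared scale, which is one reason practical systems use finer-grained scales.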

preprint 2026 arXiv

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process. Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.

preprint 2026 arXiv

Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

Large language models (LLMs) have achieved remarkable success in complex reasoning tasks via long chain-of-thought (CoT), yet their immense computational overhead hinders real-world deployment. LLM reasoning distillation addresses this by transferring reasoning capabilities from formidable teacher models to compact student models. However, existing distillation paradigms face a fundamental dilemma. Typical off-policy distillation strictly utilizes teacher-generated golden trajectories, suffering from an exposure bias due to the mismatch between training distributions and student-generated inference contexts, which leads to error cascades in long CoT reasoning. To address this, on-policy distillation allows students to explore their own trajectories, but we demonstrate that it inherently introduces a reciprocal reversed exposure bias: the teacher model also struggles to provide positive guidance when conditioned on student-generated sub-optimal contexts. To resolve this dual exposure biases problem, we propose Monitoring Trajectories and Backtracking when it strays (MOTAB), a new LLM reasoning distillation pipeline. Specifically, MOTAB dynamically monitors the student's on-policy generation against an adaptive safety boundary. When the generation strays and exceeds this threshold, MOTAB backtracks to the last safe state and leverages teacher intervention to correct the course. This approach inherently tolerates minor student errors to mitigate exposure bias, while preventing sub-optimal contexts to circumvent reversed exposure bias. Extensive experiments on the LIMO-v2 and AceReason datasets demonstrate that MOTAB effectively alleviates the dual exposure biases, yielding a roughly 3% average performance improvement in reasoning tasks.

preprint 2026 arXiv

Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning

Recently, the prominent performance of large language models (LLMs) has been largely driven by multi-task instruct-tuning. Unfortunately, this training paradigm suffers from a key issue, named cross-task interference, due to conflicting gradients over shared parameters among different tasks. Some previous methods mitigate this issue by isolating task-specific parameters, e.g., task-specific neuron selection and mixture-of-experts. In this paper, we empirically reveal that cross-task interference still exists in these solutions because many parameters are still shared by different tasks, and accordingly, we propose a novel solution, namely Basic Abilities Decomposition for multi-task Instruct-Tuning (BADIT). Specifically, we empirically find that certain parameters are consistently co-activated, and that co-activated parameters naturally organize into base groups. This motivates the analogy that LLMs encode several orthogonal basic abilities, and that any task can be represented as a linear combination of these abilities. Accordingly, BADIT decomposes LLM parameters into orthogonal high-singular-value LoRA experts representing basic abilities, and dynamically enforces their orthogonality during training via spherical clustering of rank-1 components. We conduct extensive experiments on the SuperNI benchmark with 6 LLMs, and empirical results demonstrate that BADIT can outperform SOTA methods and mitigate the degree of cross-task interference.

preprint 2026 arXiv

Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

In this report, we introduce DASD-4B-Thinking, a lightweight yet highly capable, fully open-source reasoning model. It achieves SOTA performance among open-source models of comparable scale across challenging benchmarks in mathematics, scientific reasoning, and code generation -- even outperforming several larger models. We begin by critically reexamining a widely adopted distillation paradigm in the community: SFT on teacher-generated responses, also known as sequence-level distillation. Although a series of recent works following this scheme have demonstrated remarkable efficiency and strong empirical performance, they are primarily grounded in the SFT perspective. Consequently, these approaches focus predominantly on designing heuristic rules for SFT data filtering, while largely overlooking the core principle of distillation itself -- enabling the student model to learn the teacher's full output distribution so as to inherit its generalization capability. Specifically, we identify three critical limitations in current practice: i) Inadequate representation of the teacher's sequence-level distribution; ii) Misalignment between the teacher's output distribution and the student's learning capacity; and iii) Exposure bias arising from teacher-forced training versus autoregressive inference. In summary, these shortcomings reflect a systemic absence of explicit teacher-student interaction throughout the distillation process, leaving the essence of distillation underexploited. To address these issues, we propose several methodological innovations that collectively form an enhanced sequence-level distillation training pipeline. Remarkably, DASD-4B-Thinking obtains competitive results using only 448K training samples -- an order of magnitude fewer than those employed by most existing open-source efforts. To support community research, we publicly release our models and the training dataset.

preprint 2026 arXiv

ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking

Parking is a critical task for autonomous driving systems (ADS), with unique challenges in crowded parking slots and GPS-denied environments. However, existing works focus on 2D parking slot perception, mapping, and localization, while 3D reconstruction, which is crucial for capturing complex spatial geometry in parking scenarios, remains underexplored. Naively improving the visual quality of reconstructed parking scenes does not directly benefit autonomous parking, as the key entry point for parking is the slot perception module. To address these limitations, we curate the first benchmark named ParkRecon3D, specifically designed for parking scene reconstruction. It includes sensor data from four surround-view fisheye cameras with calibrated extrinsics and dense parking slot annotations. We then propose ParkGaussian, the first framework that integrates 3D Gaussian Splatting (3DGS) for parking scene reconstruction. To further improve the alignment between reconstruction and downstream parking slot detection, we introduce a slot-aware reconstruction strategy that leverages existing parking perception methods to enhance the synthesis quality of slot regions. Experiments on ParkRecon3D demonstrate that ParkGaussian achieves state-of-the-art reconstruction quality and better preserves perception consistency for downstream tasks. The code and dataset will be released at: https://github.com/wm-research/ParkGaussian

preprint 2026 arXiv

Pixel-Perfect Visual Geometry Estimation

Recovering clean and accurate geometry from images is essential for robotics and augmented reality. However, existing geometry foundation models still suffer severely from flying pixels and the loss of fine details. In this paper, we present pixel-perfect visual geometry models that can predict high-quality, flying-pixel-free point clouds by leveraging generative modeling in the pixel space. We first introduce Pixel-Perfect Depth (PPD), a monocular depth foundation model built upon pixel-space diffusion transformers (DiT). To address the high computational complexity associated with pixel-space diffusion, we propose two key designs: 1) Semantics-Prompted DiT, which incorporates semantic representations from vision foundation models to prompt the diffusion process, preserving global semantics while enhancing fine-grained visual details; and 2) Cascade DiT architecture that progressively increases the number of image tokens, improving both efficiency and accuracy. To further extend PPD to video (PPVD), we introduce a new Semantics-Consistent DiT, which extracts temporally consistent semantics from a multi-view geometry foundation model. We then perform reference-guided token propagation within the DiT to maintain temporal coherence with minimal computational and memory overhead. Our models achieve the best performance among all generative monocular and video depth estimation models and produce significantly cleaner point clouds than all other models.

preprint 2026 arXiv

PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations

High-fidelity reconstruction of driving scenes is crucial for autonomous driving. While recent feedforward 3D Gaussian Splatting (3DGS) methods enable fast reconstruction, their per-pixel Gaussian prediction paradigm often suffers from multi-view inconsistency and layering artifacts. Moreover, existing methods often model dynamic instances via dense flow prediction, which lacks explicit cross-view correspondence and instance-level consistency. In this paper, we propose PointForward, a feedforward driving reconstruction framework through point-aligned representations. Unlike pixel-aligned methods, we initialize sparse 3D queries in world space and aggregate multi-view image information via spatial-temporal fusion onto these queries, enforcing explicit cross-view consistency in a single feedforward pass. To handle scene dynamics, we introduce scene graphs that explicitly organize moving instances during reconstruction. By leveraging 3D bounding boxes, our method enables instance-level motion propagation and temporally consistent dynamic representations. Extensive experiments demonstrate that PointForward achieves state-of-the-art performance on large-scale driving benchmarks. The code will be available upon the publication of the paper.

preprint 2026 arXiv

Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

On-policy distillation (OPD) trains a student model on its own rollouts using dense feedback from a stronger teacher. Prior literature suggests that, provided teacher feedback is available, supervising the full sequence of response tokens should monotonically improve performance. However, we demonstrate that this assumption sometimes fails to hold in strong-to-weak OPD settings. While later segments of a generated trajectory may still exhibit a non-zero teacher-student advantage, they frequently lack the local contrast that makes dense feedback effective for prioritizing student learning. We term this failure mode local teachability collapse. The resulting principle is straightforward: supervision should concentrate on trajectory regions where the teacher's feedback remains discriminative, rather than uniformly covering the entire response. We operationalize this principle through a trajectory-specific release rule. This rule measures the teacher's margin over the student's top-$K$ candidate set, aggregates this margin across NLTK-tokenized sentence segments, and truncates dense OPD supervision upon detecting a BIC-style downward change point. Experimental results across strong-to-weak distillation tasks using the Qwen3 model family indicate that this release rule consistently outperforms standard full-trajectory OPD across five in-domain benchmarks at various student scales. Furthermore, compared to baseline distillation methods, our approach better preserves model capabilities on out-of-domain tasks. These results suggest that effective strong-to-weak OPD requires evaluating not only the availability of teacher guidance but also its local utility, ensuring that the generated feedback remains teachable.
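The last step of the release rule, detecting a BIC-style downward change point, can be illustrated with a small sketch. It assumes the per-sentence margins have already been computed (the top-$K$ margin and the NLTK segmentation are omitted) and uses a simple Gaussian two-segment model. The function names, the parameter counts in the penalty, and the `min_seg` guard are all assumptions, not the paper's formulation.

```python
import math
from statistics import fmean


def gaussian_cost(xs):
    """n * log(variance): Gaussian negative log-likelihood up to constants."""
    mu = fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return len(xs) * math.log(max(var, 1e-8))  # floor avoids log(0)


def release_index(margins, min_seg=2):
    """Return the segment index where supervision would stop (len(margins) = never)."""
    n = len(margins)
    if n < 2 * min_seg:
        return n
    # No-change model: one mean and one variance (2 parameters).
    best_k, best_bic = n, gaussian_cost(margins) + 2 * math.log(n)
    for k in range(min_seg, n - min_seg + 1):
        left, right = margins[:k], margins[k:]
        if fmean(right) >= fmean(left):
            continue  # only downward shifts count as a collapse
        # Change model: two means, two variances, one change location (5 parameters).
        bic = gaussian_cost(left) + gaussian_cost(right) + 5 * math.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k


margins = [2.0, 2.1, 1.9, 2.0, 0.1, 0.0, 0.2, 0.1]
print(release_index(margins))  # 4: supervise the first four segments only
```

The BIC penalty means a split is accepted only when the drop in margin explains the data well enough to justify the extra parameters, so small fluctuations do not trigger truncation.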

preprint 2026 arXiv

R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations

Relative spatial relations provide a compact representation of spatial structure and are fundamental to relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer such relations, but the inferred relations are often unreliable and are typically handled with post-hoc heuristics. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. Our key motivation is that multi-hop reasoning requires repeated reference-frame transformations, which accumulate errors in inferred relations and lead to semantic and metric drift. To mitigate this, we propose invariant spatial decomposition to break coupled relation chains, and consistent spatial imagination to promote self-consistency through an imagine-and-revise loop. We further introduce supportive spatial optimization to ease pose optimization via global-to-local coordinate re-parameterization. Extensive experiments across diverse scene types and instructions demonstrate that R$^3$L produces more physically feasible and semantically consistent layouts. Notably, our analysis shows that resolving frame-induced inconsistencies is crucial for reliable multi-hop relative spatial reasoning. The code is available at https://github.com/Neal2020GitHub/R3L.

preprint 2026 arXiv

VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness

The safe deployment of autonomous driving (AD) systems is fundamentally hindered by the long-tail problem, where rare yet critical driving scenarios are severely underrepresented in real-world data. Existing solutions including safety-critical scenario generation and closed-loop learning often rely on rule-based heuristics, resampling methods and generative models learned from offline datasets, limiting their ability to produce diverse and novel challenges. While recent works leverage Vision Language Models (VLMs) to produce scene descriptions that guide a separate, downstream model in generating hazardous trajectories for agents, such two-stage framework constrains the generative potential of VLMs, as the diversity of the final trajectories is ultimately limited by the generalization ceiling of the downstream algorithm. To overcome these limitations, we introduce VILTA (VLM-In-the-Loop Trajectory Adversary), a novel framework that integrates a VLM into the closed-loop training of AD agents. Unlike prior works, VILTA actively participates in the training loop by comprehending the dynamic driving environment and strategically generating challenging scenarios through direct, fine-grained editing of surrounding agents' future trajectories. This direct-editing approach fully leverages the VLM's powerful generalization capabilities to create a diverse curriculum of plausible yet challenging scenarios that extend beyond the scope of traditional methods. We demonstrate that our approach substantially enhances the safety and robustness of the resulting AD policy, particularly in its ability to navigate critical long-tail events.

preprint 2026 arXiv

Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

This report presents a unified technical system addressing the two core capabilities of world models for autonomous driving: world representation and world generation. For world representation, we propose WorldRec, a feed-forward reconstruction architecture driven by sparse scene queries. WorldRec initializes structured queries in 3D space, leveraging them to aggregate cross-view, cross-temporal features, thereby naturally enforcing spatial consistency across frames and yielding compact yet high-fidelity 3D Gaussian scene representations. For world generation, we propose WorldGen, a two-stage training framework of bidirectional pretraining followed by causal fine-tuning through three progressive stages (Teacher Forcing, ODE distillation, and DMD), enabling high-quality online causal video generation in as few as 4 denoising steps. Building on both modules, we further introduce the JWM, which deeply integrates WorldRec and WorldGen to achieve synergistic gains in generation stability, cross-frame consistency, and visual fidelity, providing a solid foundation for closed-loop simulation, data synthesis, and end-to-end training in autonomous driving.

preprint 2025 arXiv

Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes

Vision-centric autonomous driving systems rely on diverse and scalable training data to achieve robust performance. While video object editing offers a promising path for data augmentation, existing methods often struggle to maintain both high visual fidelity and temporal coherence. In this work, we propose Mirage, a one-step video diffusion model for photorealistic and coherent asset editing in driving scenes. Mirage builds upon a text-to-video diffusion prior to ensure temporal consistency across frames. However, 3D causal variational autoencoders often suffer from degraded spatial fidelity due to compression, and directly passing 3D encoder features to decoder layers breaks temporal causality. To address this, we inject temporally agnostic latents from a pretrained 2D encoder into the 3D decoder to restore detail while preserving causal structures. Furthermore, because scene objects and inserted assets are optimized under different objectives, their Gaussians exhibit a distribution mismatch that leads to pose misalignment. To mitigate this, we introduce a two-stage data alignment strategy combining coarse 3D alignment and fine 2D refinement, thereby improving alignment and providing cleaner supervision. Extensive experiments demonstrate that Mirage achieves high realism and temporal consistency across diverse editing scenarios. Beyond asset editing, Mirage can also generalize to other video-to-video translation tasks, serving as a reliable baseline for future research. Our code is available at https://github.com/wm-research/mirage.

preprint 2023 arXiv

Heat kernel on Ricci shrinkers (II)

This paper is the sequel to our study of heat kernels on Ricci shrinkers in \cite{LW20}. In this paper, we improve many estimates in \cite{LW20} and extend the recent progress of Bamler \cite{Bam20a}. In particular, we drop the compactness and curvature boundedness assumptions and show that the theory of $\mathbb{F}$-convergence holds naturally on any Ricci flows induced by Ricci shrinkers.

preprint 2022 arXiv

A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence. However, it is always sensitive to the multi-aspect challenge, where features of multiple aspects in a sentence will affect each other. To mitigate this issue, we design a novel training framework, called Contrastive Cross-Channel Data Augmentation (C3DA), which leverages an in-domain generator to construct more multi-aspect samples and then boosts the robustness of ABSA models via contrastive learning on these generated data. In practice, given a generative pretrained language model and some limited ABSA labeled data, we first employ some parameter-efficient approaches to perform the in-domain fine-tuning. Then, the obtained in-domain generator is used to generate synthetic sentences from two channels, i.e., the Aspect Augmentation Channel and the Polarity Augmentation Channel, which generate sentences conditioned on a given aspect and polarity respectively. Specifically, our C3DA performs the sentence generation in a cross-channel manner to obtain more sentences, and proposes an Entropy-Minimization Filter to filter low-quality generated samples. Extensive experiments show that our C3DA can outperform those baselines without any augmentations by about 1% on accuracy and Macro-F1. Code and data are released at https://github.com/wangbing1416/C3DA.
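One plausible reading of an entropy-based filter is to discard generated sentences on which a classifier's polarity prediction is too uncertain. The sketch below follows that reading. The abstract does not give the filter's definition, so the threshold, the input format, and the names `entropy` and `entropy_filter` are assumptions for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_filter(samples, max_entropy):
    """Keep sentences whose predicted polarity distribution is confident enough."""
    return [text for text, probs in samples if entropy(probs) <= max_entropy]


# Hypothetical generated sentences with (positive, neutral, negative) probabilities.
samples = [
    ("The pasta was superb.", [0.90, 0.05, 0.05]),
    ("The place has tables.", [0.34, 0.33, 0.33]),
]
kept = entropy_filter(samples, max_entropy=0.6)
print(kept)  # ['The pasta was superb.']
```

A near-uniform distribution has entropy close to `log(3)`, about 1.10 nats for three classes, so the second sentence is dropped while the confidently labeled one is kept.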

preprint 2022 arXiv

AutoPlace: Robust Place Recognition with Single-chip Automotive Radar

This paper presents a novel place recognition approach to autonomous vehicles by using low-cost, single-chip automotive radar. Aimed at improving recognition robustness and fully exploiting the rich information provided by this emerging automotive radar, our approach follows a principled pipeline that comprises (1) dynamic points removal from instant Doppler measurement, (2) spatial-temporal feature embedding on radar point clouds, and (3) retrieved candidates refinement from Radar Cross Section measurement. Extensive experimental results on the public nuScenes dataset demonstrate that existing visual/LiDAR/spinning radar place recognition approaches are less suitable for single-chip automotive radar. In contrast, our purpose-built approach for automotive radar consistently outperforms a variety of baseline methods via a comprehensive set of metrics, providing insights into the efficacy when used in a realistic system.

preprint 2022 arXiv

Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection

There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR. The camera provides rich semantic information such as color and texture, and the LiDAR reflects the 3D shape and locations of surrounding objects. Fusing these two modalities has been found to significantly boost the performance of 3D perception models, as each modality has complementary information to the other. However, we observe that current datasets are captured from expensive vehicles that are explicitly designed for data collection purposes, and cannot truly reflect the realistic data distribution due to various reasons. To this end, we collect a series of real-world cases with noisy data distribution, and systematically formulate a robustness benchmark toolkit that simulates these cases on any clean autonomous driving dataset. We showcase the effectiveness of our toolkit by establishing the robustness benchmark on two widely adopted autonomous driving datasets, nuScenes and Waymo, and then, to the best of our knowledge, holistically benchmark the state-of-the-art fusion methods for the first time. We observe that: i) most fusion methods, when solely developed on these data, tend to fail inevitably when there is a disruption to the LiDAR input; ii) the improvement of the camera input is significantly inferior to the LiDAR one. We further propose an efficient robust training strategy to improve the robustness of the current fusion method. The benchmark and code are available at https://github.com/kcyu2014/lidar-camera-robust-benchmark

preprint 2022 arXiv

Directional transport of active particles confined in 3D smooth corrugated channel

The transport of active particles confined in a 3D (three-dimensional) corrugated channel with Gaussian noise is investigated. Large noise intensity perpendicular to the symmetry axis is good for the diffusion and current along the axis. The generalized resonance transport phenomenon appears with increasing noise intensity parallel to the symmetry axis. Large noise intensity parallel to the axis can suppress the diffusion. The diffusion coefficient has a maximum with increasing polar angle noise intensity. There exists an optimal value of parameter f that results in maximum movement speed. Large f is good for the diffusion. A transport reversal phenomenon appears with increasing channel parameters ε and Δ. Too large or too small values of ε and Δ can suppress the diffusion.

preprint 2022 arXiv

Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation

Knowledge Distillation has shown very promising ability in transferring learned representation from the larger model (teacher) to the smaller one (student). Despite many efforts, prior methods ignore the important role of retaining inter-channel correlation of features, and thus fail to capture the intrinsic distribution of the feature space and the diversity properties of features in the teacher network. To solve the issue, we propose the novel Inter-Channel Correlation for Knowledge Distillation (ICKD), with which the diversity and homology of the feature space of the student network can align with that of the teacher network. The correlation between two channels is interpreted as diversity if they are irrelevant to each other, otherwise homology. Then the student is required to mimic the correlation within its own embedding space. In addition, we introduce the grid-level inter-channel correlation, making it capable of dense prediction tasks. Extensive experiments on two vision tasks, including ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consistently outperforms many existing methods, advancing the state-of-the-art in the field of Knowledge Distillation. To our knowledge, ours is the first knowledge distillation method to boost ResNet18 beyond 72% Top-1 accuracy on ImageNet classification. Code is available at: https://github.com/ADLab-AutoDrive/ICKD.
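The idea of matching inter-channel correlation can be sketched in plain Python. The example treats each channel as a flattened list of activations, builds a channel-by-channel correlation matrix, and penalizes the squared difference between the student's and teacher's matrices. The normalization and the assumption that both networks already have the same number of channels are simplifications of this sketch, and `icc_matrix` and `icc_loss` are hypothetical names, not the repository's API.

```python
def icc_matrix(features):
    """Channel-by-channel inner products, averaged over spatial positions.

    `features` is a list of C channels, each a flat list of H*W activations.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
        for ci in features
    ]


def icc_loss(student, teacher):
    """Mean squared difference between the two correlation matrices."""
    gs, gt = icc_matrix(student), icc_matrix(teacher)
    c = len(gs)
    return sum((gs[i][j] - gt[i][j]) ** 2 for i in range(c) for j in range(c)) / c**2


teacher = [[1.0, 0.0], [0.0, 1.0]]   # two uncorrelated channels ("diversity")
student = [[1.0, 0.0], [1.0, 0.0]]   # two identical channels ("homology")
print(icc_matrix(teacher))           # [[0.5, 0.0], [0.0, 0.5]]
print(icc_loss(student, teacher))    # 0.125
```

The loss is zero only when the student reproduces the teacher's pattern of which channels move together and which do not, regardless of the individual activation values.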

preprint 2022 arXiv

Formation and Immediate Deformation of a Small Filament Through Intermittent Magnetic Interactions

It is generally believed that filament formation involves a process of the accumulation of magnetic energy. However, in this paper we discuss the idea that filaments will not erupt and will only deform when the stored magnetic energy is released gradually. Combining high-quality observations from Solar Dynamics Observatory and other instruments, we present the formation and immediate deformation of a small filament (F1) in the active region (AR) 12760 on 28-30 April 2020. Before the filament formation, three successive dipoles quickly emerged with separation motions in the center of AR 12760. Due to the magnetic interaction between magnetic dipoles and pre-existing positive polarities, coronal brightenings consequently appeared in the overlying atmosphere. Subsequently, because of the continuous cancellation of magnetic flux that happened around the adjacent ends of F1 and another nearby filament (F2), magnetic reconnections occurred intermittently between F1 and F2. Finally, F1 lessened in the shear, and F2 became shorter. All the results show that the formation of F1 was closely associated with intermittent interactions between the sequence of emerging dipoles and pre-existing magnetic polarities, and the immediate deformation of F1 was intimately related to intermittent interactions between F1 and F2. We also suggest that the intermittent magnetic interactions driven by the continuous magnetic activities (magnetic-flux emergence, cancellation, and convergence) play an important role in the formation and deformation of filaments.

preprint2022arXiv

Learning Selective Sensor Fusion for States Estimation

Autonomous vehicles and mobile robotic systems are typically equipped with multiple sensors to provide redundancy. By integrating the observations from different sensors, these mobile agents are able to perceive the environment and estimate system states, e.g. locations and orientations. Although deep learning approaches for multimodal odometry estimation and localization have gained traction, they rarely focus on the issue of robust sensor fusion, a necessary consideration to deal with noisy or incomplete sensor observations in the real world. Moreover, current deep odometry models suffer from a lack of interpretability. To this end, we propose SelectFusion, an end-to-end selective sensor fusion module which can be applied to useful pairs of sensor modalities such as monocular images and inertial measurements, or depth images and LIDAR point clouds. Our model is a uniform framework that is not restricted to a specific modality or task. During prediction, the network is able to assess the reliability of the latent features from different sensor modalities and estimate trajectory both at scale and global pose. In particular, we propose two fusion modules, a deterministic soft fusion and a stochastic hard fusion, and offer a comprehensive study of the new strategies compared to trivial direct fusion. We extensively evaluate all fusion strategies both on public datasets and on progressively degraded datasets that present synthetic occlusions, noisy and missing data and time misalignment between sensors, and we investigate the effectiveness of the different fusion strategies in attending to the most reliable features, which in itself provides insights into the operation of the various models.

preprint2022arXiv

No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces

Scene flow is a powerful tool for capturing the motion field of 3D point clouds. However, it is difficult to directly apply flow-based models to dynamic point cloud classification since the unstructured points make it hard or even impossible to efficiently and effectively trace point-wise correspondences. To capture 3D motions without explicitly tracking correspondences, we propose a kinematics-inspired neural network (Kinet) by generalizing the kinematic concept of ST-surfaces to the feature space. By unrolling the normal solver of ST-surfaces in the feature space, Kinet implicitly encodes feature-level dynamics and gains advantages from the use of mature backbones for static point cloud processing. With only minor changes in network structures and low computing overhead, it is painless to jointly train and deploy our framework with a given static model. Experiments on NvGesture, SHREC'17, MSRAction-3D, and NTU-RGBD demonstrate its efficacy in performance, efficiency in both the number of parameters and computational complexity, as well as its versatility to various static backbones. Noticeably, Kinet achieves the accuracy of 93.27% on MSRAction-3D with only 3.20M parameters and 10.35G FLOPS.

preprint2022arXiv

Numerical Study on Droplet Evaporation and Propagation Stability in Normal-temperature Two-phase Rotating Detonation System

A numerical study is carried out on the droplet-laden two-phase rotating detonation wave (RDW) of kerosene/oxygen-enriched air at normal temperature. Two types of combustors, without and with the inlet mixing section (IMS), are constructed to illustrate the effect of IMS on the combustion characteristics of two-phase RDW. The important role of the preheating zone in the IMS after the back-propagation shock on the droplet evaporation is analyzed. The parameter sensitivity of RDW propagation stability to the average droplet diameter d0 is further discussed. Results show that the droplets mainly evaporate after the detonation front in the combustor without IMS, and the reaction heat release is completed in a short distance, which propels continuous propagation of the detonation wave. When d0 gradually increases, the droplet evaporation distance increases, and the coupling between the incident shock and reaction is continuously weakened, finally resulting in detonation quenching. In the combustor with IMS, a preheating zone is induced close to the contact surface by the back-propagation shock of the RDW. A large number of droplets evaporate in this zone and generate a sufficient mixture of fuel vapor and oxidizer in front of the detonation wave to maintain the detonation propagation. Compared with the combustor without IMS, the droplet evaporation relies less on the high-temperature inlet airflow thanks to the preheating zone, and thus the wave propagation stability is enhanced and the RDW can be sustained over a wider range of d0. The present analysis provides a new understanding of two-phase rotating detonation systems.

preprint2022arXiv

On the pressure gain of stable flow systems with variable cross-section area

To clarify the total pressure gain performance of rotating detonation propulsion systems, the extended Hugoniot curve is proposed and discussed for stable flow systems with variable cross-section area (SFSVA). The dimensionless pressure integral (θ) along the system walls is found to have a critical impact on the pressure gain of the SFSVA with given inlet Mach number (M0) and heat release (q). The key to obtaining positive pressure gain in an SFSVA is to match θ, M0 and q.

preprint2022arXiv

RangeUDF: Semantic Surface Reconstruction from 3D Point Clouds

We present RangeUDF, a new implicit representation based framework to recover the geometry and semantics of continuous 3D scene surfaces from point clouds. Unlike occupancy fields or signed distance fields which can only model closed 3D surfaces, our approach is not restricted to any type of topology. Being different from the existing unsigned distance fields, our framework does not suffer from any surface ambiguity. In addition, our RangeUDF can jointly estimate precise semantics for continuous surfaces. The key to our approach is a range-aware unsigned distance function together with a surface-oriented semantic segmentation module. Extensive experiments show that RangeUDF clearly surpasses state-of-the-art approaches for surface reconstruction on four point cloud datasets. Moreover, RangeUDF demonstrates superior generalization capability across multiple unseen datasets, which is nearly impossible for all existing approaches.

preprint2022arXiv

Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences

Sequential recommendation predicts users' next behaviors with their historical interactions. Recommending with longer sequences improves recommendation accuracy and increases the degree of personalization. As sequences get longer, existing works have not yet addressed the following two main challenges. Firstly, modeling long-range intra-sequence dependency is difficult with increasing sequence lengths. Secondly, it requires efficient memory and computational speeds. In this paper, we propose a Sparse Attentive Memory (SAM) network for long sequential user behavior modeling. SAM supports efficient training and real-time inference for user behavior sequences with lengths on the scale of thousands. In SAM, we model the target item as the query and the long sequence as the knowledge database, where the former continuously elicits relevant information from the latter. SAM simultaneously models target-sequence dependencies and long-range intra-sequence dependencies with O(L) complexity and O(1) number of sequential updates, which can only be achieved by the self-attention mechanism with O(L^2) complexity. Extensive empirical results demonstrate that our proposed solution is effective not only in long user behavior modeling but also on short sequences modeling. Implemented on sequences of length 1000, SAM is successfully deployed on one of the largest international E-commerce platforms. This inference time is within 30ms, with a substantial 7.30% click-through rate improvement for the online A/B test. To the best of our knowledge, it is the first end-to-end long user sequence modeling framework that models intra-sequence and target-sequence dependencies with the aforementioned degree of efficiency and successfully deployed on a large-scale real-time industrial recommender system.

preprint2022arXiv

Strain effects on topological and valley properties of Janus monolayer $\mathrm{VSiGeN_4}$

Strain is an effective method to tune the electronic properties of two-dimensional (2D) materials, and can induce novel phase transitions. Recently, 2D $\mathrm{MA_2Z_4}$ family materials have attracted interest because of their emerging topological, magnetic and superconducting properties. Here, we investigate the impact of strain effects ($a/a_0$: 0.96$\sim$1.04) on the physical properties of Janus monolayer $\mathrm{VSiGeN_4}$ as a derivative of $\mathrm{VSi_2N_4}$ or $\mathrm{VGe_2N_4}$, which possesses dynamical, mechanical and thermal stabilities. For out-of-plane magnetic anisotropy, with increasing strain, $\mathrm{VSiGeN_4}$ undergoes transitions between ferrovalley semiconductor (FVS), half-valley-metal (HVM), valley-polarized quantum anomalous Hall insulator (VQAHI), HVM and FVS. These imply two topological phase transitions, which are related to sign-reversible Berry curvature and band inversion between $d_{xy}$+$d_{x^2-y^2}$ and $d_{z^2}$ orbitals for K or -K valley. The band inversion also leads to transformation of valley splitting strength between valence and conduction bands. However, for in-plane magnetic anisotropy, no special quantum anomalous Hall (QAH) states and valley polarization exist within the considered strain range. The actual magnetic anisotropy energy (MAE) shows no special QAH and HVM states in monolayer $\mathrm{VSiGeN_4}$. Fortunately, these can be easily achieved by an external magnetic field, which adjusts the easy magnetization axis of $\mathrm{VSiGeN_4}$ from in-plane to out-of-plane. Our findings shed light on how strain can be employed to engineer the electronic states of $\mathrm{VSiGeN_4}$, which may open new perspectives for multifunctional quantum devices in valleytronics and spintronics.

preprint2022arXiv

Strain-driven valley states and phase transitions in Janus VSiGeN4 monolayer

The interplay between topology and valley degree of freedom has attracted much interest because it can realize new phenomena and applications. Here, based on first-principles calculations, we demonstrate intrinsically valley-polarized quantum anomalous Hall effect in monolayer ferrovalley material: Janus VSiGeN4, of which the edge states are chiral-spin-valley locking. Besides, a small tensile or compressive strain can drive phase transition in the material from valley-polarized quantum anomalous Hall state to half-valley-metal state. With the increase of the strain, the material turns into ferrovalley semiconductor with valley anomalous Hall effect. The origin of phase transition is sequent band inversion of V d orbital at K valley. Moreover, we find that phase transition causes the sign reversal of Berry curvature and induces different polarized light absorption in different valley states. Our work provides an ideal material platform for practical applications and experimental exploration of the interplay between topology, spintronics, and valleytronics.

preprint2022arXiv

Twin extreme ultraviolet waves in the solar corona

Solar extreme ultraviolet (EUV) waves are spectacular propagating disturbances with EUV enhancements in annular shapes in the solar corona. These EUV waves carry critical information about the coronal magnetised plasma that can shed light on the elusive physical parameters (e.g. the magnetic field strength) by global solar coronal magneto-seismology. EUV waves are closely associated with a wide range of solar atmospheric eruptions, from violent flares and coronal mass ejections (CMEs) to less energetic plasma jets or mini-filament eruptions. However, the physical nature and driving mechanism of EUV waves are still controversial. Here, we report the unique discovery of twin EUV waves (TEWs) that were formed in a single eruption with observations from two different perspectives. In all earlier studies, a single eruption was associated at most with a single EUV wave. The newly found TEWs urge us to revisit our theoretical understanding of the underlying formation mechanism(s) of coronal EUV waves. Two distinct scenarios of TEWs were found. In the first scenario, the two waves were separately associated with a filament eruption and a precursor jet, while in the second scenario the two waves were successively associated with a filament eruption. Hence, we label these distinct scenarios as "fraternal TEWs" and "identical TEWs", respectively. Further, we also suggest that impulsive lateral expansions of two distinct groups of coronal loops are critical to the formation of TEWs in a single eruption.

preprint2021arXiv

On the structure of Ricci shrinkers

We develop a structure theory for non-collapsed Ricci shrinkers without any curvature condition. As applications, we obtain some curvature estimates of the Ricci shrinkers depending only on the non-collapsing constant.

preprint2021arXiv

Photonic Floquet time crystals

The public and scientists constantly have different perspectives. While on a time crystal, they stand in line and ask: What is a time crystal? Show me a material that is spontaneously crystalline in time? This study synthesizes a photonic material of Floquet time crystals and experimentally observes its indicative period-2T beating. We explicitly reconstruct a discrete time-crystalline ground state and reveal using an appropriately-designed photonic Floquet simulator the rigid period-doubling as a signature of the spontaneous breakage of the discrete time-translational symmetry. Unlike the result of the exquisite many-body interaction, the photonic time crystal is derived from a single-particle topological phase that can be extensively accessed by many pertinent nonequilibrium and periodically-driven platforms. Our observation will drive theoretical and technological interests toward condensed matter physics and topological photonics, and demystify time crystals for the non-scientific public.

93 published item(s)

A Hybrid Tucker-LSTM Tensor Network Model for SOC Prediction in Electric Vehicles

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning

Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking

Pixel-Perfect Visual Geometry Estimation

PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations

Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations

VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness

Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes

Heat kernel on Ricci shrinkers (II)

A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis

AutoPlace: Robust Place Recognition with Single-chip Automotive Radar

Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection

Directional transport of active particles confined in 3D smooth corrugated channel

Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation

Formation and Immediate Deformation of a Small Filament Through Intermittent Magnetic Interactions

Learning Selective Sensor Fusion for States Estimation

No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces

Numerical Study on Droplet Evaporation and Propagation Stability in Normal-temperature Two-phase Rotating Detonation System

On the pressure gain of stable flow systems with variable cross-section area

RangeUDF: Semantic Surface Reconstruction from 3D Point Clouds

Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences

Strain effects on topological and valley properties of Janus monolayer $\mathrm{VSiGeN_4}$

Strain-driven valley states and phase transitions in Janus VSiGeN4 monolayer

Twin extreme ultraviolet waves in the solar corona

On the structure of Ricci shrinkers

Photonic Floquet time crystals

A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence

An Extreme Ultraviolet Wave Associated with A Solar Filament Activation

Efficient Routing for Quantum Key Distribution Networks

Monolithic photonic chips for multi-channel frequency mixers and single photon detectors

Non-destructive testing and evaluation of composite materials/structures: A state-of-the-art review

Observation of pseudogap in SnSe2 atomic layers grown on graphite

On the regular-convexity of Ricci shrinker limit spaces

Planar Turán Number of intersecting triangles

Random walks in time-varying networks with memory

Relational Deep Reinforcement Learning for Routing in Wireless Networks

Representations and fusion rules for the orbifold vertex operator algebras $L_{\widehat{\frak{sl}_2}}(k,0)^{\mathbb{Z}_3}$

Ricci flow smoothing for locally collapsing manifolds

See Through Smoke: Robust Indoor Mapping with Low-cost mmWave Radar

The initiation of a solar streamer blowout coronal mass ejection arising from the streamer flank

The maximum number of s-cliques in connected graphs and its application to spectral moment

Transport of Finite Size Self-Propelled Particles Confined in a 2D Zigzag Channel with Gaussian Colored Noise

ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment

Representations of the orbifold VOAS $L_{\hat{\frak{sl}_2}}(k,0)^K$ and the commutant VOAS $C_{{L_{\hat{\mathfrak{so}_m}}(1,0)}^{\otimes 3}}({L_{\hat{\mathfrak{so}_m}}(3,0)})$

An Eruptive Hot-Channel Structure Observed at Metric Wavelength as a Moving Type-IV Solar Radio Burst

Dynamics of a prominence-horn structure during its evaporation in the solar corona

Exceptional Points and Asymmetric Mode Switching in Plasmonic Waveguides

Joint Learning of Siamese CNNs and Temporally Constrained Metrics for Tracklet Association

Learning Contextual Dependencies with Convolutional Hierarchical Recurrent Neural Networks

Monochromatic loose path partitions in k-uniform hypergraphs

Observation of a Metric Type N Solar Radio Burst

Optimal Deployment of Multistatic Radar System Using Multi-Objective Particle Swarm Optimization

Remarks of weak-compactness along Kahler Ricci flow

Scene Parsing with Integration of Parametric and Non-parametric Models

Slipping magnetic reconnections with multiple flare ribbons during an X-class solar flare

Space of Ricci flows (II)

Tracklet Association by Online Target-Specific Metric Learning and Coherent Dynamics Estimation

An observational revisit of band-split solar type-II radio bursts

DAG-Recurrent Neural Networks For Scene Labeling

Detecting structural breaks in seasonal time series by regularized optimization

Evidence of the Solar EUV hot channel as a magnetic flux rope from remote-sensing and in-situ observations

Highly-sensitive detection of the lattice distortion in single bent ZnO nanowires by second-harmonic generation microscopy

Systematic study of complete fusion suppression in reactions involving weakly bound nuclei at energies above the Coulomb barrier

The average velocity of self-propelled particles in a two-dimensional potential with colored noise

Theoretical study of fusion reactions $^{32}$S + $^{94,96}$Zr and $^{40}$Ca + $^{94,96}$Zr and quadrupole deformation of $^{94}$Zr

A solar type II radio burst from CME-coronal ray interaction: simultaneous radio and EUV imaging

Possible role of coronal streamer as magnetically-closed structure in shock-induced energetic electrons and metric type II radio bursts

Radial basis function process neural network training based on generalized frechet distance and GA-SA hybrid strategy