Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
20 works
0 followers
22 topics
4 close collaborators

Published work

20 published items

preprint · 2026 · arXiv

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

Operating and maintaining (O&M) large-scale online engine systems (e.g., search, recommendation, and advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. Although LLM-based agents are inherently well suited to such operational scenarios, the critical bottleneck impeding their practical deployment lies not in reasoning but in orchestration capability: specifically, the precise selection of relevant data (metrics, logs, and change events) and applicable knowledge (handbook-defined rules and empirically derived practitioner experience) tailored to each individual operational event. Feeding all signals indiscriminately causes dilution and hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of daily releases. Here we present Bian Que, an agentic operating framework with three contributions: (i) a unified operational paradigm, which abstracts routine daily O&M actions into three canonical patterns: release interception, proactive inspection, and alert root cause analysis; (ii) a flexible Skill Arrangement, in which each predefined Skill explicitly defines the requisite data and operational knowledge for a specific context; such Skills can be automatically generated and updated by LLM agents and can also be iteratively optimized by on-call engineers via natural language instructions; and (iii) a unified self-evolving mechanism, in which each correction signal enables two parallel evolutionary pathways: distilling event memory into knowledge, and targeted refinement of Skills. Deployed on the e-commerce search engine of KuaiShou, Bian Que reduces alert volume by 75%, achieves 80% root-cause analysis accuracy, cuts mean time to resolution by over 50%, and attains a 99.0% pass rate on offline evaluations. Code is available at https://github.com/benchen4395/BianQue_Assistant.

preprint · 2025 · arXiv

High-Frequency Thermal Graviton Remnant from the End of Inflation

The standard inflationary theory focuses on the freezing of super-horizon fluctuations, which generate a scale-invariant spectrum, while the sub-horizon modes are expected to remain in thermal equilibrium. Building upon recent developments in the quantum thermodynamics of the de Sitter (dS) universe, we investigate the graviton remnant originating from this thermal horizon radiation released at the end of inflation. Unlike the stochastic background from super-horizon fluctuations, this signal represents a snapshot of the thermal dS state, which subsequently decouples and undergoes cosmological redshift. We present a semi-analytical approximate prediction for this relic background, which typically peaks near the MHz band with a characteristic energy density of $\log_{10}(\Omega_{\rm G} h^2) \sim \mathcal{O}(-18)$. These signals occupy a high-frequency band, offering a potential novel probe of the reheating temperature and the thermal history of the early universe.

preprint · 2025 · arXiv

MorphoCopter: Design, Modeling, and Control of a New Transformable Quad-Bi Copter

This paper presents a novel morphing quadrotor, named MorphoCopter, covering its design, modeling, control, and experimental tests. It features a unique single rotary joint that enables rapid transformation into an ultra-narrow profile. Although quadrotors have seen widespread adoption in applications such as cinematography, agriculture, and disaster management with increasingly sophisticated control systems, their hardware configurations have remained largely unchanged, limiting their capabilities in certain environments. Our design addresses this by enabling the hardware configuration to change on the fly when required. In standard flight mode, the MorphoCopter adopts an X configuration, functioning as a traditional quadcopter, but can quickly fold into a stacked bicopter arrangement or any configuration in between. Existing morphing designs often sacrifice controllability in compact configurations or rely on complex multi-joint systems. Moreover, our design achieves a greater width reduction than any existing solution. We develop a new inertia and control-action aware adaptive control system that maintains robust performance across all rotary-joint configurations. The prototype can reduce its width from 447 mm to 138 mm (nearly 70% reduction) in just a few seconds. We validated the MorphoCopter through rigorous simulations and a comprehensive series of flight experiments, including robustness tests, trajectory tracking, and narrow-gap passing tests.
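
The "nearly 70%" figure follows directly from the two widths quoted in the abstract. A minimal check, where the function name is purely illustrative:

```python
def width_reduction(full_mm, folded_mm):
    """Fractional width reduction when folding from full_mm down to folded_mm."""
    return (full_mm - folded_mm) / full_mm


# Widths quoted in the abstract: 447 mm unfolded, 138 mm folded.
print(f"{width_reduction(447, 138):.1%}")  # prints 69.1%
```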

preprint · 2023 · arXiv

CyberLoc: Towards Accurate Long-term Visual Localization

This technical report introduces CyberLoc, an image-based visual localization pipeline for robust and accurate long-term pose estimation under challenging conditions. The proposed method comprises four modules connected in a sequence. First, a mapping module is applied to build accurate 3D maps of the scene, one map for each reference sequence if there exist multiple reference sequences under different conditions. Second, a single-image-based localization pipeline (retrieval--matching--PnP) is performed to estimate 6-DoF camera poses for each query image, one for each 3D map. Third, a consensus set maximization module is proposed to filter out outlier 6-DoF camera poses, and outputs one 6-DoF camera pose for a query. Finally, a robust pose refinement module is proposed to optimize 6-DoF query poses, taking candidate global 6-DoF camera poses and their corresponding global 2D-3D matches, sparse 2D-2D feature matches between consecutive query images and SLAM poses of the query sequence as input. Experiments on the 4seasons dataset show that our method achieves high accuracy and robustness. In particular, our approach wins the localization challenge of ECCV 2022 workshop on Map-based Localization for Autonomous Driving (MLAD-ECCV2022).
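
One simple reading of the consensus step is: among the candidate poses for a query (one per 3D map), keep the candidate that the largest number of candidates agree with. The sketch below is only an illustration under strong assumptions: it compares camera positions alone (no rotation), the distance threshold is hypothetical, and the function name and sample values are not taken from the report.

```python
import math


def max_consensus(candidates, threshold=0.5):
    """Return (best_index, inlier_indices) for candidate 3D positions.

    A candidate's consensus set holds every candidate within `threshold`
    (same units as the positions) of it, including itself. Ties keep the
    earliest candidate.
    """
    best_index, best_inliers = -1, []
    for i, ci in enumerate(candidates):
        inliers = [j for j, cj in enumerate(candidates) if math.dist(ci, cj) <= threshold]
        if len(inliers) > len(best_inliers):
            best_index, best_inliers = i, inliers
    return best_index, best_inliers


# Hypothetical camera positions for one query image, one per reference map.
positions = [(10.0, 2.0, 0.0), (10.1, 2.1, 0.0), (25.0, -3.0, 0.5), (9.9, 1.9, 0.1)]
print(max_consensus(positions))  # the third candidate is rejected as an outlier
```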

preprint · 2023 · arXiv

Exploring Iterative Refinement with Diffusion Models for Video Grounding

Video grounding aims to localize the target moment in an untrimmed video corresponding to a given sentence query. Existing methods typically select the best prediction from a set of predefined proposals or directly regress the target span in a single-shot manner, resulting in the absence of a systematic prediction refinement process. In this paper, we propose DiffusionVG, a novel framework with diffusion models that formulates video grounding as a conditional generation task, where the target span is generated from Gaussian noise inputs and iteratively refined in the reverse diffusion process. During training, DiffusionVG progressively adds noise to the target span with a fixed forward diffusion process and learns to recover the target span in the reverse diffusion process. During inference, DiffusionVG can generate the target span from Gaussian noise inputs by the learned reverse diffusion process conditioned on the video-sentence representations. Without bells and whistles, our DiffusionVG demonstrates superior performance compared to existing well-crafted models on mainstream Charades-STA, ActivityNet Captions and TACoS benchmarks.
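
The fixed forward process mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard DDPM closed form, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with a linear beta schedule; the schedule values, the (center, width) span encoding, and all function names are illustrative assumptions rather than details from the paper.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under an assumed linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_span(span, t, rng, num_steps=1000):
    """Sample x_t ~ q(x_t | x_0) for a span encoded as (center, width)."""
    a = alpha_bar(t, num_steps)
    return tuple(
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in span
    )


rng = random.Random(0)
target = (0.40, 0.20)  # hypothetical normalized (center, width) of the target moment
for t in (0, 250, 1000):
    # At t = 0 the span is untouched; by t = 1000 it is almost pure Gaussian noise.
    print(t, alpha_bar(t), noise_span(target, t, rng))
```

A trained model would then run the reverse direction, starting from noise and denoising step by step while conditioned on the video and sentence features; that learned part is not sketched here.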

preprint · 2022 · arXiv

Bridging the Gap between Deep Learning and Frustrated Quantum Spin System for Extreme-scale Simulations on New Generation of Sunway Supercomputer

Efficient numerical methods are promising tools for delivering unique insights into the fascinating properties of physical systems such as highly frustrated quantum many-body systems. However, the computational complexity of obtaining the wave functions that accurately describe the quantum states increases exponentially with particle number. Here we present a novel convolutional neural network (CNN) for simulating the two-dimensional highly frustrated spin-$1/2$ $J_1-J_2$ Heisenberg model; the simulation is performed on an extreme-scale system with low cost and high scalability. By employing transfer learning and the CNN's translational invariance, we successfully investigate the quantum system with lattice sizes up to $24\times24$, using up to 30 million cores of the new generation of Sunway supercomputer. The final result demonstrates the effectiveness of the CNN-based representation of quantum states and raises the state of the art in both accuracy and scale.

preprint · 2022 · arXiv

Coexistence of in-plane and out-of-plane exchange bias in correlated kagome antiferromagnet Mn3-xCrxSn

Materials exhibiting exchange bias (EB) have been extensively investigated, mainly because of their technological applications in magnetic sensors, but the underlying mechanism remains elusive. Here we report the coexistence of in-plane and out-of-plane EB in Cr-doped Mn3Sn, a noncollinear antiferromagnet with a geometrically frustrated kagome plane of Mn. Field-cooling experiments with the applied field parallel and perpendicular to the frustrated kagome plane exhibit loop shifts and enhanced coercivities. Interestingly, a maximum EB field of 1090 Oe is observed along the out-of-plane direction in the Mn2.58Cr0.42Sn sample, higher than the in-plane value. Our results indicate that the exchange bias perpendicular to the kagome plane is primarily induced by Dzyaloshinskii-Moriya interactions due to the breaking of interfacial symmetry, while the EB along the kagome plane is due to exchange effects at the interface of the antiferromagnetic (AFM) and ferromagnetic (FM) components originating from the net moment. These findings provide new insight into EB in kagome AFM materials, which are important for and show strong potential in antiferromagnetic spintronics applications.

preprint · 2022 · arXiv

Dressi: A Hardware-Agnostic Differentiable Renderer with Reactive Shader Packing and Soft Rasterization

Differentiable rendering (DR) enables various computer graphics and computer vision applications through gradient-based optimization with derivatives of the rendering equation. Most rasterization-based approaches are built on general-purpose automatic differentiation (AD) libraries and DR-specific modules handcrafted using CUDA. Such a system design mixes DR algorithm implementation and algorithm building blocks, resulting in hardware dependency and limited performance. In this paper, we present a practical hardware-agnostic differentiable renderer called Dressi, which is based on a new full AD design. The DR algorithms of Dressi are fully written in our Vulkan-based AD for DR, Dressi-AD, which supports all primitive operations for DR. Dressi-AD and our inverse UV technique inside it bring hardware independence and acceleration by graphics hardware. Stage packing, our runtime optimization technique, can adapt hardware constraints and efficiently execute complex computational graphs of DR with reactive cache considering the render pass hierarchy of Vulkan. HardSoftRas, our novel rendering process, is designed for inverse rendering with a graphics pipeline. Under the limited functionalities of the graphics pipeline, HardSoftRas can propagate the gradients of pixels from the screen space to far-range triangle attributes. Our experiments and applications demonstrate that Dressi establishes hardware independence, high-quality and robust optimization with fast speed, and photorealistic rendering.

preprint · 2022 · arXiv

Modality-Balanced Embedding for Video Retrieval

Video search has become the main routine for users to discover videos relevant to a text query on large short-video sharing platforms. While training a query-video bi-encoder model on online search logs, we identify a modality bias phenomenon: the video encoder relies almost entirely on text matching, neglecting other modalities of the videos such as vision and audio. This modality imbalance results from a) modality gap: the relevance between a query and a video text is much easier to learn as the query is also a piece of text, with the same modality as the video text; b) data bias: most training samples can be solved solely by text matching. Here we share our practices to improve the first retrieval stage, including our solution for the modality imbalance issue. We propose MBVR (short for Modality Balanced Video Retrieval) with two key components: manually generated modality-shuffled (MS) samples and a dynamic margin (DM) based on visual relevance. They encourage the video encoder to pay balanced attention to each modality. Through extensive experiments on a real-world dataset, we show empirically that our method is both effective and efficient in solving the modality bias problem. We have also deployed MBVR on a large video platform and observed a statistically significant boost over a highly optimized baseline in an A/B test and manual GSB evaluations.

preprint · 2022 · arXiv

Region Specific Optimization (RSO)-based Deep Interactive Registration

Medical image registration is a fundamental and vital task that affects the efficacy of many downstream clinical tasks. Deep learning (DL)-based deformable image registration (DIR) methods have been investigated, showing state-of-the-art performance. A test time optimization (TTO) technique was proposed to further improve the DL models' performance. Despite the substantial accuracy improvement with this TTO technique, some regions still exhibited large registration errors even after many TTO iterations. To mitigate this challenge, we first identified the reason why the TTO technique was slow, or even failed, to improve those regions' registration results. We then proposed a two-level TTO technique, i.e., image-specific optimization (ISO) and region-specific optimization (RSO), where the region can be interactively indicated by the clinician while reviewing the registration result. For both efficiency and accuracy, we further envisioned a three-step DL-based image registration workflow. Experimental results showed that our proposed method outperformed the conventional method qualitatively and quantitatively.

preprint · 2021 · arXiv

Deep learning based CT-to-CBCT deformable image registration for autosegmentation in head and neck adaptive radiation therapy

The purpose of this study is to develop a deep learning based method that can automatically generate segmentations on cone-beam CT (CBCT) for head and neck online adaptive radiation therapy (ART), where expert-drawn contours in planning CT (pCT) can serve as prior knowledge. Due to the many artifacts and truncations on CBCT, we propose to utilize a learning based deformable image registration method and contour propagation to get updated contours on CBCT. Our method takes CBCT and pCT as inputs and outputs a deformation vector field and synthetic CT (sCT) at the same time by jointly training a CycleGAN model and a 5-cascaded Voxelmorph model. The CycleGAN serves to generate sCT from CBCT, while the 5-cascaded Voxelmorph serves to warp pCT to sCT's anatomy. The segmentation results were compared to Elastix, Voxelmorph and 5-cascaded Voxelmorph on 18 structures including left brachial plexus, right brachial plexus, brainstem, oral cavity, middle pharyngeal constrictor, superior pharyngeal constrictor, inferior pharyngeal constrictor, esophagus, nodal gross tumor volume, larynx, mandible, left masseter, right masseter, left parotid gland, right parotid gland, left submandibular gland, right submandibular gland, and spinal cord. Results show that our proposed method achieves an average Dice similarity coefficient and 95% Hausdorff distance of 0.83 and 2.01 mm, respectively. Compared to other methods, our method shows better accuracy than Voxelmorph and 5-cascaded Voxelmorph, and comparable accuracy to Elastix but much higher efficiency. The proposed method can rapidly and simultaneously generate sCT with correct CT numbers and propagate contours from pCT to CBCT for online ART re-planning.
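
For readers unfamiliar with the metric, the Dice similarity coefficient between a propagated contour A and a reference contour B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flat binary masks; the masks are made-up toy data, and the handling of two empty masks is an assumed convention.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention (assumed): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


propagated = [0, 1, 1, 1, 0, 0]  # hypothetical contour propagated from pCT
reference = [0, 0, 1, 1, 1, 0]   # hypothetical expert contour on CBCT
print(dice_coefficient(propagated, reference))  # 2 * 2 / (3 + 3), about 0.667
```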

preprint · 2021 · arXiv

Hybrid convolutional neural network and PEPS wave functions for quantum many-particle states

Neural networks have been used as variational wave functions for quantum many-particle problems. It has been shown that the correct sign structure is crucial for obtaining highly accurate ground-state energies. In this work, we propose a hybrid wave function combining the convolutional neural network (CNN) and projected entangled pair states (PEPS), in which the sign structures are determined by the PEPS and the amplitudes of the wave functions are provided by the CNN. We benchmark the ansatz on the highly frustrated spin-1/2 $J_1$-$J_2$ model. We show that the achieved ground-state energies are competitive with state-of-the-art results.

preprint · 2020 · arXiv

A Real-Time Receding Horizon Sequence Planner for Disassembly in A Human-Robot Collaboration Setting

Product disassembly is a labor-intensive process and is far from being automated. Typically, disassembly is not robust enough to handle product varieties from different shapes, models, and physical uncertainties due to component imperfections, damage throughout component usage, or insufficient product information. To overcome these difficulties and to automate the disassembly procedure through human-robot collaboration without excessive computational cost, this paper proposes a real-time receding horizon sequence planner that distributes tasks between robot and human operator while taking real-time human motion into consideration. The sequence planner aims to address several issues in the disassembly line, such as varying orientations, safety constraints of human operators, uncertainty of human operation, and the computational cost of large number of disassembly tasks. The proposed disassembly sequence planner identifies both the positions and orientations of the to-be-disassembled items, as well as the locations of human operator, and obtains an optimal disassembly sequence that follows disassembly rules and safety constraints for human operation. Experimental tests have been conducted to validate the proposed planner: the robot can locate and disassemble the components following the optimal sequence, and consider explicitly human operator's real-time motion, and collaborate with the human operator without violating safety constraints.

preprint · 2020 · arXiv

Generalizability issues with deep learning models in medicine and their potential solutions: illustrated with Cone-Beam Computed Tomography (CBCT) to Computed Tomography (CT) image conversion

Generalizability is a concern when applying a deep learning (DL) model trained on one dataset to other datasets. Training a universal model that works anywhere, anytime, for anybody is unrealistic. In this work, we demonstrate the generalizability problem, then explore potential solutions based on transfer learning (TL) by using the cone-beam computed tomography (CBCT) to computed tomography (CT) image conversion task as the testbed. Previous works have converted CBCT to CT-like images. However, all of those works studied only one or two anatomical sites and used images from the same vendor's scanners. Here, we investigated how a model trained for one machine and one anatomical site works on other machines and other sites. We trained a model on CBCT images acquired from one vendor's scanners for head and neck cancer patients and applied it to images from another vendor's scanners and for other disease sites. We found that generalizability could be a significant problem for this particular application when applying a trained DL model to datasets from another vendor's scanners. We then explored three practical solutions based on TL to solve this generalization problem: the target model, which is trained on a target domain from scratch; the combined model, which is trained on both source and target domain datasets from scratch; and the adapted model, which fine-tunes the trained source model to a target domain. We found that when there are sufficient data in the target domain, all three models can achieve good performance. When the target dataset is limited, the adapted model works the best, which indicates that using the fine-tuning strategy to adapt the trained model to an unseen target domain dataset is a viable and easy way to implement DL models in the clinic.

preprint · 2020 · arXiv

Including Image-based Perception in Disturbance Observer for Warehouse Drones

Grasping and releasing objects would cause oscillations to delivery drones in the warehouse. To reduce such undesired oscillations, this paper treats the to-be-delivered object as an unknown external disturbance and presents an image-based disturbance observer (DOB) to estimate and reject such disturbance. Different from the existing DOB technique that can only compensate for the disturbance after the oscillations happen, the proposed image-based one incorporates image-based disturbance prediction into the control loop to further improve the performance of the DOB. The proposed image-based DOB consists of two parts. The first one is deep-learning-based disturbance prediction. By taking an image of the to-be-delivered object, a sequential disturbance signal is predicted in advance using a connected pre-trained convolutional neural network (CNN) and a long short-term memory (LSTM) network. The second part is a conventional DOB in the feedback loop with a feedforward correction, which utilizes the deep learning prediction to generate a learning signal. Numerical studies are performed to validate the proposed image-based DOB regarding oscillation reduction for delivery drones during the grasping and releasing periods of the objects.

preprint · 2020 · arXiv

Model Uncertainty Quantification for Reliable Deep Vision Structural Health Monitoring

Computer vision leveraging deep learning has achieved significant success in the last decade. Despite the promising performance of the existing deep models in the recent literature, the extent of models' reliability remains unknown. Structural health monitoring (SHM) is a crucial task for the safety and sustainability of structures, and thus prediction mistakes can have fatal outcomes. This paper proposes Bayesian inference for deep vision SHM models where uncertainty can be quantified using the Monte Carlo dropout sampling. Three independent case studies for cracks, local damage identification, and bridge component detection are investigated using Bayesian inference. Aside from better prediction results, mean class softmax variance and entropy, the two uncertainty metrics, are shown to have good correlations with misclassifications. While the uncertainty metrics can be used to trigger human intervention and potentially improve prediction results, interpretation of uncertainty masks can be challenging. Therefore, surrogate models are introduced to take the uncertainty as input such that the performance can be further boosted. The proposed methodology in this paper can be applied to future deep vision SHM frameworks to incorporate model uncertainty in the inspection processes.
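
The two uncertainty metrics named in the abstract can be computed from repeated dropout-enabled forward passes. This sketch assumes the softmax outputs of T such passes are already available as plain lists; the precise definitions used in the paper may differ, and the function name and sample numbers are illustrative.

```python
import math


def mc_dropout_uncertainty(softmax_samples):
    """Return (mean probs, mean class variance, predictive entropy) from T passes.

    softmax_samples: list of T lists, each holding C class probabilities.
    """
    t = len(softmax_samples)
    c = len(softmax_samples[0])
    mean = [sum(p[k] for p in softmax_samples) / t for k in range(c)]
    # Variance of each class probability across passes, averaged over classes.
    var = [sum((p[k] - mean[k]) ** 2 for p in softmax_samples) / t for k in range(c)]
    mean_class_variance = sum(var) / c
    # Entropy (in nats) of the averaged predictive distribution.
    entropy = -sum(m * math.log(m) for m in mean if m > 0.0)
    return mean, mean_class_variance, entropy


confident = [[0.98, 0.02], [0.97, 0.03], [0.99, 0.01]]  # passes agree
uncertain = [[0.90, 0.10], [0.30, 0.70], [0.55, 0.45]]  # passes disagree
for name, samples in (("confident", confident), ("uncertain", uncertain)):
    _, variance, entropy = mc_dropout_uncertainty(samples)
    print(name, variance, entropy)
```

Both numbers come out larger for the second set of passes, which is the kind of signal that could be thresholded to trigger human review.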

preprint · 2020 · arXiv

RelSen: An Optimization-based Framework for Simultaneously Sensor Reliability Monitoring and Data Cleaning

Recent advances in Internet of Things (IoT) technology have led to a surge in the popularity of sensing applications. As a result, people increasingly rely on information obtained from sensors to make decisions in their daily life. Unfortunately, in most sensing applications, sensors are known to be error-prone and their measurements can become misleading at any unexpected time. Therefore, to enhance the reliability of sensing applications, we believe it is highly important to monitor not only the physical phenomena/processes of interest but also the reliability of the sensors, and to clean the sensor data before it is analyzed. Existing studies often regard sensor reliability monitoring and sensor data cleaning as separate problems. In this work, we propose RelSen, a novel optimization-based framework that addresses the two problems simultaneously by utilizing the mutual dependence between them. Furthermore, RelSen is not application-specific, as its implementation assumes minimal prior knowledge of the process dynamics under monitoring. This significantly improves its generality and applicability in practice. In our experiments, we apply RelSen to an outdoor air pollution monitoring system and a condition monitoring system for a cement rotary kiln. Experimental results show that our framework can promptly identify unreliable sensors and remove sensor measurement errors caused by three types of the most commonly observed sensor faults.

preprint · 2019 · arXiv

A new probe to study symmetry energy at low density by deuteron breakup reaction

The reactions of nucleons and polarized deuterons scattered off a heavy target at large impact parameter with intermediate energies have been investigated using the improved quantum molecular dynamics model. It is found that, because the isovector potential affects protons and neutrons differently, there is a significant difference between the angular distributions of elastically scattered protons and neutrons. To overcome the lack of a monochromatic neutron beam, the reaction of a polarized deuteron peripherally scattered off the heavy target is used in place of individual proton and neutron scattering off a heavy target to study the isospin effect. It is found that the elastic scattering angle distributions of protons and neutrons originating from the breakup of the deuteron are very similar to the results of the individual proton- and neutron-induced reactions. A new, more effective and cleaner probe, namely the difference between the elastic scattering angles of the proton and neutron originating from the breakup of the polarized deuteron, is proposed to constrain the symmetry energy at subsaturation density.

preprint · 2019 · arXiv

The Quantum Cocktail Party Problem

The cocktail party problem refers to the famous selective attention problem of how to recover the signal of each individual source from the signals of a number of detectors. In the classical cocktail party problem, the signal of each source is a sequence of data such as the voice from a speaker, and each detector detects a signal that is a linear combination of all sources. This problem can be solved by an unsupervised machine learning algorithm known as independent component analysis. In this work we propose a quantum analog of the cocktail party problem. Here each source is a density matrix of a pure state and each detector detects a density matrix that is a linear combination of all pure-state density matrices. The quantum cocktail party problem is to recover the pure-state density matrices from a number of observed mixed-state density matrices. We propose a physical realization of this problem and show how to solve it either through classical Newton's optimization method or by mapping the problem to the ground state of an Ising-type spin Hamiltonian.

preprint · 2017 · arXiv

Generation of Bose-Einstein Condensates' Ground State Through Machine Learning

We show that the ground states of both single-component and two-component Bose-Einstein condensates (BECs) can be simulated by deep convolutional neural networks of the same structure. We trained the neural network by inputting the coupling strength in the dimensionless Gross-Pitaevskii equation (GPE) and outputting the ground-state wave function. After training, the neural network generates ground states faster than the method of imaginary time evolution, while the relative mean-square error between predicted states and original states is on the order of $10^{-5}$ to $10^{-4}$. Comparing the eigen-energies based on predicted states and original states shows that the neural network can predict eigen-energies with high precision. Therefore, the BEC ground states, which are continuous wave functions, can be represented by deep convolutional neural networks.