Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
18 works
0 followers
15 topics
4 close collaborators

Published work

18 published item(s)

preprint, 2026, arXiv

CAST-LUT: Tokenizer-Guided HSV Look-Up Tables for Purple Flare Removal

Purple flare, a diffuse chromatic aberration artifact commonly found around highlight areas, severely degrades the tone transition and color of the image. Existing traditional methods are based on hand-crafted features, which lack flexibility and rely entirely on fixed priors, while the scarcity of paired training data critically hampers deep learning. To address this issue, we propose a novel network built upon decoupled HSV Look-Up Tables (LUTs). The method aims to simplify color correction by adjusting the Hue (H), Saturation (S), and Value (V) components independently. This approach resolves the inherent color coupling problems in traditional methods. Our model adopts a two-stage architecture: first, a Chroma-Aware Spectral Tokenizer (CAST) converts the input image from RGB space to HSV space and independently encodes the Hue (H) and Value (V) channels into a set of semantic tokens describing the purple flare status; second, the HSV-LUT module takes these tokens as input and dynamically generates independent correction curves (1D-LUTs) for the three channels H, S, and V. To effectively train and validate our model, we built the first large-scale purple flare dataset with diverse scenes. We also propose new metrics and a loss function specifically designed for this task. Extensive experiments demonstrate that our model not only significantly outperforms existing methods in visual effects but also achieves state-of-the-art performance on all quantitative metrics.
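To make the idea of independent per-channel correction curves concrete, here is a minimal sketch of applying three separate 1D LUTs to the H, S, and V channels. It is not the paper's HSV-LUT module: the curves are fixed by hand rather than generated from tokens, the function names `apply_1d_lut` and `correct_hsv` are hypothetical, values are assumed to lie in [0, 1], and hue is treated as a plain linear range rather than a circular quantity.

```python
import numpy as np

def apply_1d_lut(channel, lut):
    """Map values in [0, 1] through a 1D look-up table using linear interpolation."""
    grid = np.linspace(0.0, 1.0, len(lut))  # LUT sample positions
    return np.interp(channel, grid, lut)

def correct_hsv(hsv, lut_h, lut_s, lut_v):
    """Apply one independent curve per channel to an (..., 3) HSV array."""
    h = apply_1d_lut(hsv[..., 0], lut_h)
    s = apply_1d_lut(hsv[..., 1], lut_s)
    v = apply_1d_lut(hsv[..., 2], lut_v)
    return np.stack([h, s, v], axis=-1)

# Hand-written curves (assumptions for illustration only).
identity = np.linspace(0.0, 1.0, 33)   # leaves a channel unchanged
desaturate = 0.5 * identity            # halves saturation everywhere

pixels = np.array([[0.8, 0.6, 0.9],
                   [0.1, 0.4, 0.2]])   # two HSV pixels
out = correct_hsv(pixels, identity, desaturate, identity)
```

Because each channel has its own curve, changing the saturation curve leaves hue and value untouched, which is the decoupling the abstract describes.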

preprint, 2026, arXiv

Lifelong Domain Adaptive 3D Human Pose Estimation

3D Human Pose Estimation (3D HPE) is vital in various applications, from person re-identification and action recognition to virtual reality. However, the reliance on annotated 3D data collected in controlled environments poses challenges for generalization to diverse in-the-wild scenarios. Existing domain adaptation (DA) paradigms like general DA and source-free DA for 3D HPE overlook the issues of non-stationary target pose datasets. To address these challenges, we propose a novel task named lifelong domain adaptive 3D HPE. To our knowledge, we are the first to introduce the lifelong domain adaptation to the 3D HPE task. In this lifelong DA setting, the pose estimator is pretrained on the source domain and subsequently adapted to distinct target domains. Moreover, during adaptation to the current target domain, the pose estimator cannot access the source and all the previous target domains. The lifelong DA for 3D HPE involves overcoming challenges in adapting to current domain poses and preserving knowledge from previous domains, particularly combating catastrophic forgetting. We present an innovative Generative Adversarial Network (GAN) framework, which incorporates 3D pose generators, a 2D pose discriminator, and a 3D pose estimator. This framework effectively mitigates domain shifts and aligns original and augmented poses. Moreover, we construct a novel 3D pose generator paradigm, integrating pose-aware, temporal-aware, and domain-aware knowledge to enhance the current domain's adaptation and alleviate catastrophic forgetting on previous domains. Our method demonstrates superior performance through extensive experiments on diverse domain adaptive 3D HPE datasets.

preprint, 2026, arXiv

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition

Parameter-efficient fine-tuning (PEFT) is a scalable approach for adapting large speech foundation models to new domains. While methods such as LoRA and its state-of-the-art variants reduce adaptation costs, they typically allocate parameters uniformly across model subspaces, which limits their efficiency and scalability in speech applications. Building on our prior work, this paper introduces SSVD-Outer (SSVD-O), an extension of the structured SVD-guided (SSVD) fine-tuning method. SSVD-O combines input acoustic feature space-associated inner transformations with output semantic feature space-associated outer transformations to enable scalable and balanced adaptation. We conduct the first systematic analysis of parameter budget allocation across model subspaces in PEFT for automatic speech recognition (ASR), and investigate the trade-off between learning and forgetting under constrained resources. SSVD-O is benchmarked against LoRA, DoRA, PiSSA, and SSVD on domain-shifted ASR tasks, including child speech and regional accents, across model scales from 0.1B to 2B within the ESPnet framework. Experimental results show that SSVD-O consistently narrows the performance gap to full fine-tuning while improving generalization and mitigating catastrophic forgetting.
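For readers unfamiliar with the LoRA baseline named above, the following is a minimal sketch of a low-rank adapter on a single linear layer. It illustrates plain LoRA only, not SSVD or SSVD-O; the dimensions, the `scale` parameter, and the function name `lora_forward` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 16, 2

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(rank, d_in))   # trainable down-projection
B = np.zeros((d_out, rank))         # trainable up-projection, zero-initialised

def lora_forward(x, W, A, B, scale=1.0):
    """Compute x W^T plus the low-rank update x A^T B^T."""
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
y = lora_forward(x, W, A, B)

full_params = W.size           # parameters updated by full fine-tuning
lora_params = A.size + B.size  # parameters updated by the adapter
```

With `B` initialised to zero the adapted layer starts out identical to the pretrained one, and only `rank * (d_in + d_out)` parameters are trained instead of `d_in * d_out`.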

preprint, 2022, arXiv

A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose

Existing deep learning-based human mesh reconstruction approaches have a tendency to build larger networks in order to achieve higher accuracy. Computational complexity and model size are often neglected, despite being key characteristics for practical use of human mesh reconstruction models (e.g. virtual try-on systems). In this paper, we present GTRS, a lightweight pose-based method that can reconstruct human mesh from 2D human pose. We propose a pose analysis module that uses graph transformers to exploit structured and implicit joint correlations, and a mesh regression module that combines the extracted pose feature with the mesh template to reconstruct the final human mesh. We demonstrate the efficiency and generalization of GTRS by extensive evaluations on the Human3.6M and 3DPW datasets. In particular, GTRS achieves better accuracy than the SOTA pose-based method Pose2Mesh while only using 10.2% of the parameters (Params) and 2.5% of the FLOPs on the challenging in-the-wild 3DPW dataset. Code will be publicly available.

preprint, 2022, arXiv

AutoQML: Automated Quantum Machine Learning for Wi-Fi Integrated Sensing and Communications

Commercial Wi-Fi devices can be used for integrated sensing and communications (ISAC) to jointly exchange data and monitor the indoor environment. In this paper, we investigate a proof-of-concept approach using an automated quantum machine learning (AutoQML) framework called AutoAnsatz to recognize human gestures. We address how to efficiently design quantum circuits to configure quantum neural networks (QNN). The effectiveness of AutoQML is validated by an in-house experiment for human pose recognition, achieving state-of-the-art performance of greater than 80% accuracy for a limited data size with a significantly small number of trainable parameters.

preprint, 2022, arXiv

Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding

End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora, followed by fine-tuning on application-specific data. The resulting models are too large for on-edge applications. For instance, BERT-based systems contain over 110M parameters. Observing that the model is overparameterized, we propose a lean transformer structure where the dimension of the attention mechanism is automatically reduced using group sparsity. We propose a variant where the learned attention subspace is transferred to an attention bottleneck layer. In a low-resource setting and without pre-training, the resulting compact SLU model achieves accuracies competitive with pre-trained large models.

preprint, 2022, arXiv

Deep learning on rail profiles matching

Matching the rail cross-section profiles measured on site with the designed profile is necessary to evaluate rail wear, which is very important for track maintenance and rail safety. The measured rail profiles to be matched usually have four characteristics: a large amount of data, diverse section shapes, hardware-induced errors, and the need for human experience to resolve complex on-site situations during the matching process. Traditional matching methods based on feature points or feature lines can no longer meet these requirements. To this end, we first establish a rail profile matching dataset composed of 46386 pairs of professionally hand-matched data, then propose a general high-precision method for rail profile matching using a pre-trained convolutional neural network (CNN). This deep learning-based method is a promising candidate to become the dominant approach for this problem. Source code is at https://github.com/Kunqi1994/Deep-learning-on-rail-profile-matching.

preprint, 2022, arXiv

EdgeML: Towards Network-Accelerated Federated Learning over Wireless Edge

Federated learning (FL) is a distributed machine learning technology for next-generation AI systems that allows a number of workers, i.e., edge devices, to collaboratively learn a shared global model while keeping their data local to prevent privacy leakage. Enabling FL over wireless multi-hop networks can democratize AI and make it accessible in a cost-effective manner. However, the noisy bandwidth-limited multi-hop wireless connections can lead to delayed and nomadic model updates, which significantly slow down the FL convergence speed. To address such challenges, this paper aims to accelerate FL convergence over wireless edge by optimizing the multi-hop federated networking performance. In particular, the FL convergence optimization problem is formulated as a Markov decision process (MDP). To solve this MDP, multi-agent reinforcement learning (MA-RL) algorithms along with domain-specific action space refining schemes are developed, which learn online the delay-minimum forwarding paths to minimize the model exchange latency between the edge devices (i.e., workers) and the remote server. To validate the proposed solutions, FedEdge is developed and implemented, which is the first experimental framework in the literature for FL over multi-hop wireless edge computing networks. FedEdge allows us to rapidly prototype, deploy, and evaluate novel FL algorithms along with RL-based system optimization methods on real wireless devices. Moreover, a physical experimental testbed is implemented by customizing the widely adopted Linux wireless routers and ML computing nodes. Finally, our experimentation results on the testbed show that the proposed network-accelerated FL system can practically and significantly improve FL convergence speed, compared to the FL system empowered by the production-grade commercially available wireless networking protocol, BATMAN-Adv.

preprint, 2022, arXiv

Exploiting Temporal Relations on Radar Perception for Autonomous Driving

We consider the object recognition problem in autonomous driving using automotive radar sensors. Compared to Lidar sensors, radar is cost-effective and robust in all-weather conditions for perception in autonomous driving. However, radar signals suffer from low angular resolution and precision in recognizing surrounding objects. To enhance the capacity of automotive radar, in this work, we exploit the temporal information from successive ego-centric bird-eye-view radar image frames for radar object recognition. We leverage the consistency of an object's existence and attributes (size, orientation, etc.), and propose a temporal relational layer to explicitly model the relations between objects within successive radar images. In both object detection and multiple object tracking, we show the superiority of our method compared to several baseline approaches.

preprint, 2022, arXiv

From Raman frequency combs to supercontinuum generation in nitrogen-filled hollow-core anti-resonant fiber

We demonstrate a route to supercontinuum generation in gas-filled hollow-core anti-resonant fibers through the creation of a broad vibrational Raman frequency comb followed by continuous broadening and merging of the comb lines through either rotational Raman scattering or the optical Kerr effect. Our demonstration experiments, utilizing a single pump pulse with 20 ps duration at 532 nm in a nitrogen-filled fiber, produce a supercontinuum spanning from 440 nm to 1200 nm, with an additional deep ultraviolet continuum from 250 nm to 360 nm. Numerical results suggest that this approach can produce even broader supercontinuum spectra extending from the ultraviolet to mid-infrared.

preprint, 2022, arXiv

KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting

While considering the spatial and temporal features of traffic, capturing the impacts of various external factors on travel is an essential step towards achieving accurate traffic forecasting. However, existing studies seldom consider external factors or neglect the effect of the complex correlations among external factors on traffic. Intuitively, knowledge graphs can naturally describe these correlations. Since knowledge graphs and traffic networks are essentially heterogeneous networks, it is challenging to integrate the information in both networks. Against this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks. We first construct a knowledge graph for traffic forecasting and derive knowledge representations by a knowledge representation learning method named KR-EAR. Then, we propose the Knowledge Fusion Cell (KF-Cell) to combine the knowledge and traffic features as the input of a spatial-temporal graph convolutional backbone network. Experimental results on a real-world dataset show that our strategy enhances the forecasting performance of backbones at various prediction horizons. The ablation and perturbation analyses further verify the effectiveness and robustness of the proposed method. To the best of our knowledge, this is the first study that constructs and utilizes a knowledge graph to facilitate traffic forecasting; it also offers a promising direction to integrate external information and spatial-temporal information for traffic forecasting. The source code is available at https://github.com/lehaifeng/T-GCN/tree/master/KST-GCN.

preprint, 2022, arXiv

Line-field Coherent Sensing with LED Illumination

We describe a method of low-coherence interferometry based optical profilometry using standard light-emitting diode (LED) illumination and complementary metal-oxide-semiconductor (CMOS) image sensors. A line-field illumination strategy allows for the simultaneous measurement of many points in space. Micron scale accuracy and resolution are achieved and demonstrated using a variety of targets.

preprint, 2022, arXiv

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning

Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. Code is available at https://github.com/mmendiet/FedAlign
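The proximal terms mentioned above can be illustrated with a short sketch in the style of FedProx, where each client adds a penalty on the distance between its local weights and the current global model. This shows the kind of restriction the abstract contrasts itself with, not FedAlign; the function names and the coefficient `mu` are assumptions for illustration.

```python
import numpy as np

def proximal_penalty(local_w, global_w, mu):
    """Penalty (mu / 2) * ||w_local - w_global||^2 added to the local loss."""
    return 0.5 * mu * float(np.sum((local_w - global_w) ** 2))

def proximal_grad(local_w, global_w, mu):
    """Gradient of the penalty with respect to the local weights."""
    return mu * (local_w - global_w)

global_w = np.array([0.0, 0.0])
local_w = np.array([1.0, 2.0])
mu = 0.1

penalty = proximal_penalty(local_w, global_w, mu)
grad = proximal_grad(local_w, global_w, mu)
```

The gradient always points from the global model towards the local weights, so a gradient step pulls the client back towards the global model by an amount proportional to `mu`.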

preprint, 2022, arXiv

Oxygen Deficient α-MoO3 with Promoted Adsorption and State-Quenching of H2O for Gas Sensor: A DFT Study

Semiconducting oxides with reducible cations are ideal platforms for various functional applications in nanoelectronics and catalysis. Here we report an ultrathin monolayer alpha-MoO3 whose electronic properties become tunable and whose gas adsorption behavior changes upon introducing oxygen vacancies (VO). The unique property of alpha-MoO3 is that it contains three different types of oxygen atoms occupying three Wyckoff sites, a feature absent in other low-dimensional oxides, which provides rich electronic hybridized states. The presence of VO triggers an intermediate state in the gap at ~0.59 eV below the conduction band minimum and reduces the work function dramatically, together with new excitations in the near infrared. The realigned Fermi level associated with the dangling state of VO reduces the neighboring Mo atoms and thereby affects gas adsorption. The binding energy of H2O molecules above VO reaches -0.75 eV, 2.5 times that at a perfect lattice site, and the direction of electron transfer also reverses. The latter is related to the shallow localized state in the band gap induced by H2O adsorbed above perfect MoO3, which is quenched upon adsorption at the VO site. These rich in-gap defect states in oxygen-deficient MoO3, which broaden the light absorption and promote the uptake of water, are conducive to the application of alpha-MoO3 in optoelectronics, photothermal therapy, and moisture sensing.

preprint, 2022, arXiv

Quantum Transfer Learning for Wi-Fi Sensing

Beyond data communications, commercial-off-the-shelf Wi-Fi devices can be used to monitor human activities, track device locomotion, and sense the ambient environment. In particular, spatial beam attributes that are inherently available in the 60-GHz IEEE 802.11ad/ay standards have been shown to be effective in terms of overhead and channel measurement granularity for these indoor sensing tasks. In this paper, we investigate transfer learning to mitigate domain shift in human monitoring tasks when Wi-Fi settings and environments change over time. As a proof-of-concept study, we consider quantum neural networks (QNN) as well as classical deep neural networks (DNN) for the future quantum-ready society. The effectiveness of both DNN and QNN is validated by an in-house experiment for human pose recognition, achieving greater than 90% accuracy with a limited data size.

preprint, 2021, arXiv

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

Most existing deep neural networks are static, which means they can only do inference at a fixed complexity. But the resource budget can vary substantially across different devices. Even on a single device, the affordable budget can change with different scenarios, and repeatedly training networks for each required budget would be incredibly expensive. Therefore, in this work, we propose a general method called MutualNet to train a single network that can run at a diverse set of resource constraints. Our method trains a cohort of model configurations with various network widths and input resolutions. This mutual learning scheme not only allows the model to run at different width-resolution configurations but also transfers the unique knowledge among these configurations, helping the model to learn stronger representations overall. MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet, 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets. Since we only train the model once, it also greatly reduces the training cost compared to independently training several models. Surprisingly, MutualNet can also be used to significantly boost the performance of a single network, if dynamic resource constraint is not a concern. In summary, MutualNet is a unified method for both static and adaptive, 2D and 3D networks. Codes and pre-trained models are available at https://github.com/taoyang1122/MutualNet.

preprint, 2020, arXiv

Rapid Determination of Antimicrobial Susceptibility by Stimulated Raman Scattering Imaging of D2O Metabolic Incorporation in a Single Bacterium

Rapid antimicrobial susceptibility testing (AST) is urgently needed for treating infections with correct antibiotics and slowing down the emergence of antibiotic-resistant bacteria. Here, we report a phenotypic platform that rapidly produces AST results by femtosecond stimulated Raman scattering imaging of deuterium oxide (D2O) metabolism. Metabolic incorporation of D2O into biomass in a single bacterium is probed in as little as 10 minutes after culture in 70% D2O medium, the fastest among current technologies. Single-cell metabolism inactivation concentration (SC-MIC) is obtained in less than 2.5 hours from colony to results. The SC-MIC results of 37 sets of samples, which include 8 major bacterial species and 14 different antibiotics often encountered in the clinic, are validated by standard minimal inhibitory concentration blindly measured via broth microdilution. Towards clinical translation, we demonstrate SRS imaging of D2O metabolic incorporation and SC-MIC determination for bacteria in urine or whole blood after 1-h antibiotic treatment and 30-minute incubation in a mixture of D2O and antibiotics.