Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
19 works
0 followers
16 topics
4 close collaborators

Published work

19 published item(s)

preprint 2026 arXiv

EdgeFM: Efficient Edge Inference for Vision-Language Models

Vision-language models (VLMs) have demonstrated strong applicability in edge industrial applications, yet their deployment remains severely constrained by requirements for deterministic low latency and stable execution under resource limitations. Existing frameworks either rely on bloated general-purpose designs or force developers into opaque, hardware-specific closed-source ecosystems, leading to hardware lock-in and poor cross-platform adaptability. Observing that modern AI agents can efficiently search and tune configurations to generate highly optimized low-level kernels for standard LLM operators, we propose EdgeFM, a lightweight, agent-driven VLM/LLM inference framework tailored for cross-platform industrial edge deployment. EdgeFM removes non-essential features to reduce single-request latency, and encapsulates agent-tuned kernel optimizations as a modular library of reusable skills. By allowing direct invocation of these skills rather than waiting for closed-source implementations, it effectively closes the performance gap long dominated by proprietary toolchains. The framework natively supports mainstream platforms including x86 and NVIDIA Orin SoCs, and represents the first end-to-end VLA deployment on the domestic Horizon Journey platform, enhancing cross-platform portability. In most cases, it yields clearly better inference performance than conventional vendor-specific toolchains, achieving up to a 1.49x speedup over TensorRT-Edge-LLM on the NVIDIA Orin platform. Experimental results show that EdgeFM delivers favorable end-to-end inference performance, providing an open-source, production-grade solution for diverse edge industrial scenarios.

preprint 2025 arXiv

CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards

Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly annotations. We introduce CEC-Zero, a zero-supervision reinforcement learning framework that addresses this by enabling LLMs to correct their own mistakes. CEC-Zero synthesizes errorful inputs from clean text, computes cluster-consensus rewards via semantic similarity and candidate agreement, and optimizes the policy with PPO. It outperforms supervised baselines by 10--13 F$_1$ points and strong LLM fine-tunes by 5--8 points across 9 benchmarks, with theoretical guarantees of unbiased rewards and convergence. CEC-Zero establishes a label-free paradigm for robust, scalable CSC, unlocking LLM potential in noisy text pipelines.

preprint 2022 arXiv

ContrastMask: Contrastive Learning to Segment Every Thing

Partially-supervised instance segmentation is a task that requires segmenting objects from novel, unseen categories by learning on limited seen categories with annotated masks, thus eliminating the need for heavy annotation. The key to addressing this task is to build an effective class-agnostic mask segmentation model. Unlike previous methods that learn such models only on seen categories, in this paper, we propose a new method, named ContrastMask, which learns a mask segmentation model on both seen and unseen categories under a unified pixel-level contrastive learning framework. In this framework, annotated masks of seen categories and pseudo masks of unseen categories serve as a prior for contrastive learning, where features from the mask regions (foreground) are pulled together, and are contrasted against those from the background, and vice versa. Through this framework, feature discrimination between foreground and background is largely improved, facilitating learning of the class-agnostic mask segmentation model. Exhaustive experiments on the COCO dataset demonstrate the superiority of our method, which outperforms previous state-of-the-art methods.

preprint 2022 arXiv

Geometric Synthesis: A Free lunch for Large-scale Palmprint Recognition Model Pretraining

Palmprints are private and stable information for biometric recognition. In the deep learning era, the development of palmprint recognition is limited by the lack of sufficient training data. In this paper, by observing that palmar creases are the key information to deep-learning-based palmprint recognition, we propose to synthesize training data by manipulating palmar creases. Concretely, we introduce an intuitive geometric model which represents palmar creases with parameterized Bézier curves. By randomly sampling Bézier parameters, we can synthesize massive training samples of diverse identities, which enables us to pretrain large-scale palmprint recognition models. Experimental results demonstrate that such synthetically pretrained models have a very strong generalization ability: they can be efficiently transferred to real datasets, leading to significant performance improvements on palmprint recognition. For example, under the open-set protocol, our method improves the strong ArcFace baseline by more than 10\% in terms of TAR@1e-6. And under the closed-set protocol, our method reduces the equal error rate (EER) by an order of magnitude.
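As a rough illustration of the crease-synthesis idea described above, the sketch below samples random control points and evaluates the resulting Bézier curves. It is a minimal sketch under stated assumptions, not the authors' generator: the cubic curve order, the unit-square sampling range, the number of creases per identity, and the names `cubic_bezier`, `sample_crease` and `synth_identity` are all hypothetical choices made for illustration.

```python
import random


def cubic_bezier(p0, p1, p2, p3, n=50):
    """Return n points on the cubic Bezier curve with control points p0..p3."""
    points = []
    for i in range(n):
        t = i / (n - 1)
        s = 1.0 - t
        # Bernstein basis weights for a cubic curve.
        b = (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)
        x = b[0] * p0[0] + b[1] * p1[0] + b[2] * p2[0] + b[3] * p3[0]
        y = b[0] * p0[1] + b[1] * p1[1] + b[2] * p2[1] + b[3] * p3[1]
        points.append((x, y))
    return points


def sample_crease(rng):
    """Hypothetical sampler: four random control points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(4)]


def synth_identity(seed, n_creases=3):
    """One synthetic 'identity' = a fixed set of creases derived from a seed."""
    rng = random.Random(seed)
    return [cubic_bezier(*sample_crease(rng)) for _ in range(n_creases)]
```

Seeding the generator per identity keeps each synthetic identity reproducible, so the same crease layout can be re-rendered with different augmentations; that design choice is likewise an assumption of this sketch.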

preprint 2022 arXiv

Learning an Efficient Multimodal Depth Completion Model

With the wide application of sparse ToF sensors in mobile devices, RGB image-guided sparse depth completion has attracted extensive attention recently, but still faces some problems. First, the fusion of multimodal information requires more network modules to process different modalities. But the application scenarios of sparse ToF measurements usually demand lightweight structure and low computational cost. Second, fusing sparse and noisy depth data with dense pixel-wise RGB data may introduce artifacts. In this paper, a light but efficient depth completion network is proposed, which consists of a two-branch global and local depth prediction module and a funnel convolutional spatial propagation network. The two-branch structure extracts and fuses cross-modal features with lightweight backbones. The improved spatial propagation module can refine the completed depth map gradually. Furthermore, corrected gradient loss is presented for the depth completion problem. Experimental results demonstrate the proposed method can outperform some state-of-the-art methods with a lightweight architecture. The proposed method also wins the championship in the MIPI2022 RGB+TOF depth completion challenge.

preprint 2022 arXiv

Nonlinear semigroup approach to Hamilton-Jacobi equations -- A toy model

In this paper, we discuss the existence and multiplicity problem of viscosity solutions to the Hamilton-Jacobi equation $$h(x,d_x u)+λ(x)u=c,\quad x\in M,$$ where $M$ is a closed manifold and $λ:M\rightarrow\mathbb{R}$ changes sign on $M$, via the nonlinear semigroup method. It turns out that a bifurcation phenomenon occurs when the parameter $c$ crosses the critical value. As an application of the main result, we analyse the structure of the set of viscosity solutions of a one-dimensional example in detail.

preprint 2022 arXiv

SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets

Today's scientific high performance computing (HPC) applications or advanced instruments are producing vast volumes of data across a wide range of domains, which introduces a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely used in the scientific community, because not only can it significantly reduce the data volumes but it can also strictly control the data distortion based on the user-specified error bound. Existing lossy compressors, however, cannot offer ultra-fast compression speed, which is highly demanded by quite a few applications or use-cases (such as in-memory compression and online instrument data compression). In this paper, we propose a novel ultra-fast error-bounded lossy compressor, which can obtain fairly high compression performance on both CPU and GPU, also with reasonably high compression ratios. The key contributions are three-fold: (1) We propose a novel, generic ultra-fast error-bounded lossy compression framework -- called UFZ, by confining our design to be composed of only super-lightweight operations such as bitwise and addition/subtraction operations, still keeping a certain high compression ratio. (2) We implement UFZ on both CPU and GPU and optimize the performance according to their architectures carefully. (3) We perform a comprehensive evaluation with 6 real-world production-level scientific datasets on both CPU and GPU. Experiments show that UFZ is 2~16X as fast as the second-fastest existing error-bounded lossy compressor (either SZ or ZFP) on CPU and GPU, with respect to both compression and decompression.

preprint 2022 arXiv

Variational attraction of the KAM torus for the conformally symplectic system

For the conformally symplectic system \[ \left\{ \begin{aligned} \dot{q}&=H_p(q,p),\quad(q,p)\in T^*\mathbb{T}^n\\ \dot p&=-H_q(q,p)-λp, \quad λ>0 \end{aligned} \right. \] with a positive definite Hamiltonian, we discuss the variational significance of invariant Lagrangian graphs and explain how the KAM torus impacts the $W^{1,\infty}-$convergence speed of the Lax-Oleinik semigroup.

preprint 2021 arXiv

Res2Net: A New Multi-scale Backbone Architecture

Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models are available on https://mmcheng.net/res2net/.
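The "hierarchical residual-like connections within one single residual block" can be sketched roughly as follows. This is an illustrative reading, not the reference implementation: it assumes the block's features are split into groups, that the first group passes through unchanged, and that each later group is transformed after adding the previous group's output. The function name `res2net_split` and the plain callables standing in for convolutions are hypothetical.

```python
def res2net_split(x_groups, transforms):
    """Hierarchical pass over feature groups inside one block (sketch).

    x_groups:   list of s feature groups, each a list of floats.
    transforms: list of s - 1 callables standing in for per-group
                convolutions (an assumption of this sketch).
    """
    outputs = [x_groups[0]]  # first group is passed through unchanged
    prev = None
    for x_i, k_i in zip(x_groups[1:], transforms):
        # Each later group also receives the previous group's output,
        # so its effective receptive field grows with the group index.
        inp = x_i if prev is None else [a + b for a, b in zip(x_i, prev)]
        prev = k_i(inp)
        outputs.append(prev)
    return outputs


def double(values):
    """Toy stand-in for a convolution: scale every value by 2."""
    return [2.0 * v for v in values]


groups = [[1.0], [2.0], [3.0], [4.0]]
result = res2net_split(groups, [double, double, double])
```

Because later groups pass through more transforms than earlier ones, concatenating the outputs mixes features seen at several scales within a single block.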

preprint 2021 arXiv

The localization of quantum random walks on Sierpinski gaskets

We consider the discrete time quantum random walks on a Sierpinski gasket. We study the hitting probability as the level of the fractal goes to infinity in terms of their localization exponents $β_w$, total variation exponents $δ_w$ and relative entropy exponents $η_w$. We define and solve the amplitude Green functions recursively when the level of the fractal graph goes to infinity. We obtain exact recursive formulas for the amplitude Green functions, based on which the hitting probabilities and expectation of the first-passage time are calculated. Using the recursive formula with the aid of Monte Carlo integration, we evaluate their numerical values. We also show that when the level of the fractal graph goes to infinity, with probability 1, the quantum random walks will return to the origin, i.e., the quantum walks on the Sierpinski gasket are recurrent.

preprint 2021 arXiv

Time-periodic solutions of contact Hamilton-Jacobi equations on the circle

We are concerned with the existence and multiplicity of nontrivial time-periodic viscosity solutions to \[ \partial_t w(x,t) + H( x,\partial_x w(x,t),w(x,t) )=0,\quad (x,t)\in \mathbb{S} \times [0,+\infty). \] We find that there are infinitely many nontrivial time-periodic viscosity solutions with different periods when $\frac{\partial H}{\partial u}(x,p,u)\leqslant-δ<0$ by analyzing the asymptotic behavior of the dynamical system $(C(\mathbb{S} ,\mathbb{R}),\{T_t\}_{t\geqslant 0})$, where $\{T_t\}_{t\geqslant 0}$ was introduced in \cite{WWY1}. Moreover, in view of the convergence of $T_{t_n}φ$, we get the existence of nontrivial periodic points of $T_t$, where $φ$ are initial data satisfying certain properties. This is a long-time behavior result for the solution to the above equation with initial data $φ$. At last, as an application, we describe to readers a bifurcation phenomenon for \[ \partial_t w(x,t) + H( x,\partial_x w(x,t),λw(x,t) )=0,\quad (x,t)\in \mathbb{S} \times [0,+\infty), \] when the sign of the parameter $λ$ varies. The structure of the unit circle $\mathbb{S}$ plays an essential role here. The most important novelty is the discovery of the nontrivial recurrence of $(C(\mathbb{S} ,\mathbb{R}),\{T_t\}_{t\geqslant 0})$.

preprint 2020 arXiv

cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data

Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also can retain high fidelity for postanalysis. Because supercomputers and HPC applications are becoming heterogeneous using accelerator-based architectures, in particular GPUs, several development teams have recently released GPU versions of their lossy compressors. However, existing state-of-the-art GPU-based lossy compressors suffer from either low compression and decompression throughput or low compression quality. In this paper, we present an optimized GPU version, cuSZ, for one of the best error-bounded lossy compressors, SZ. To the best of our knowledge, cuSZ is the first error-bounded lossy compressor on GPUs for scientific data. Our contributions are fourfold. (1) We propose a dual-quantization scheme to entirely remove the data dependency in the prediction step of SZ such that this step can be performed very efficiently on GPUs. (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth. (4) We evaluate our cuSZ on five real-world HPC application datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and GPUs. Experiments show that our cuSZ improves SZ's compression throughput by up to 370.1x and 13.1x over the production version running on single and multiple CPU cores, respectively, while getting the same quality of reconstructed data. It also improves the compression ratio by up to 3.48x on the tested data compared with another state-of-the-art GPU supported lossy compressor.
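One way to picture how a two-stage quantization can remove the data dependency in prediction is the 1D sketch below. It is a simplified illustration under assumptions, not cuSZ's actual kernels: it assumes the first stage snaps each value to an integer multiple of twice the error bound, and the second stage predicts each integer from its already-quantized left neighbour. The function names are hypothetical, and the sketch runs serially on the CPU.

```python
def dual_quant_compress(data, eb):
    """Return integer deltas for `data` under absolute error bound `eb` (sketch)."""
    # Stage 1: quantize every value independently of all others.
    pre = [round(v / (2 * eb)) for v in data]
    # Stage 2: predict from the previous *quantized* value. Every element of
    # `pre` is already known, so each delta can be computed independently,
    # without waiting for a reconstructed neighbour.
    return [pre[i] - (pre[i - 1] if i > 0 else 0) for i in range(len(pre))]


def dual_quant_decompress(deltas, eb):
    """Invert the sketch above: running sum of deltas, then rescale."""
    out = []
    acc = 0
    for d in deltas:
        acc += d
        out.append(acc * 2 * eb)
    return out


values = [0.10, 0.12, 0.15, 0.11, 0.40]
bound = 0.01
restored = dual_quant_decompress(dual_quant_compress(values, bound), bound)
```

The deltas are small integers for smooth data, which is what makes a subsequent entropy coder such as Huffman coding effective; the reconstruction stays within the bound up to floating-point rounding.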

preprint 2020 arXiv

Dependency Aware Filter Pruning

Convolutional neural networks (CNNs) are typically over-parameterized, bringing considerable computational overhead and memory footprint in inference. Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost. For this purpose, identifying unimportant convolutional filters is the key to effective filter pruning. Previous work prunes filters according to either their weight norms or the corresponding batch-norm scaling factors, while neglecting the sequential dependency between adjacent layers. In this paper, we further develop the norm-based importance estimation by taking the dependency between the adjacent layers into consideration. Besides, we propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity. In this way, we can identify unimportant filters and search for the optimal network architecture within certain resource budgets in a more principled manner. Comprehensive experimental results demonstrate the proposed method performs favorably against the existing strong baseline on the CIFAR, SVHN, and ImageNet datasets. The training sources will be publicly available after the review process.

preprint 2020 arXiv

FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks

Convolutional neural networks (CNNs) are becoming more and more important for solving challenging and critical problems in many fields. CNN inference applications have been deployed in safety-critical systems, which may suffer from soft errors caused by high-energy particles, high temperature, or abnormal voltage. Of critical importance is ensuring the stability of the CNN inference process against soft errors. Traditional fault tolerance methods are not suitable for CNN inference because error-correcting code is unable to protect computational components, instruction duplication techniques incur high overhead, and existing algorithm-based fault tolerance (ABFT) techniques cannot protect all convolution implementations. In this paper, we focus on how to protect the CNN inference process against soft errors as efficiently as possible, with the following three contributions. (1) We propose several systematic ABFT schemes based on checksum techniques and analyze their fault protection ability and runtime thoroughly. Unlike traditional ABFT based on matrix-matrix multiplication, our schemes support any convolution implementations. (2) We design a novel workflow integrating all the proposed schemes to obtain a high detection/correction ability with limited total runtime overhead. (3) We perform our evaluation using ImageNet with well-known CNN models including AlexNet, VGG-19, ResNet-18, and YOLOv2. Experimental results demonstrate that our implementation can handle soft errors with very limited runtime overhead (4%~8% in both error-free and error-injected situations).
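The general idea behind a checksum-based check on convolution can be sketched as follows. This is a generic illustration that relies only on the linearity of convolution, not a reproduction of the paper's specific schemes: the 1D setting, the tolerance, and the names `conv1d` and `checksum_ok` are assumptions made for this example.

```python
def conv1d(x, w):
    """Valid-mode 1D cross-correlation of signal x with filter w."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]


def checksum_ok(x, filters, outputs, tol=1e-9):
    """Check outputs against one extra convolution with the summed filter.

    By linearity, the element-wise sum of all output maps must equal the
    convolution of x with the element-wise sum of all filters.
    """
    w_sum = [sum(col) for col in zip(*filters)]
    expected = conv1d(x, w_sum)
    actual = [sum(col) for col in zip(*outputs)]
    return all(abs(e - a) <= tol for e, a in zip(expected, actual))


signal = [1.0, 2.0, 3.0, 4.0]
bank = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
maps = [conv1d(signal, w) for w in bank]
clean = checksum_ok(signal, bank, maps)

maps[1][0] += 5.0  # simulate a soft error in one output value
corrupted = checksum_ok(signal, bank, maps)
```

The check treats the convolution routine as a black box, which is why a checksum of this kind does not depend on how the convolution itself is implemented; locating or correcting the faulty value would need additional checksums beyond this sketch.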

preprint 2020 arXiv

Progress of Quantum Molecular Dynamics model and its applications in Heavy Ion Collisions

In this review article, we first briefly introduce the transport theory and quantum molecular dynamics model applied in the study of heavy ion collisions from low to intermediate energies. The developments of the improved quantum molecular dynamics model (ImQMD) and the ultra-relativistic quantum molecular dynamics model (UrQMD) are reviewed. The reaction mechanism and phenomena related to the fusion, multinucleon transfer, fragmentation, collective flow and particle production are reviewed and discussed within the framework of the two models. The constraints on the isospin asymmetric nuclear equation of state and in-medium nucleon-nucleon cross sections obtained by comparing the heavy ion collision data with transport model calculations in recent decades are also discussed, and the uncertainties of these constraints are analyzed as well. Finally, we discuss the future direction of the development of the transport models for improving the understanding of the reaction mechanism, the descriptions of various observables, the constraint on the nuclear equation of state, as well as for the constraint on in-medium nucleon-nucleon cross sections.

preprint 2019 arXiv

LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search

Motivation: Predicting the secondary structure of an RNA sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Results: We present a novel alternative $O(n^3)$-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in $O(n)$ time and $O(n)$ space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models. Availability: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100,000nt).

preprint 2018 arXiv

Vanishing contact structure problem and convergence of the viscosity solutions

This paper is devoted to studying the vanishing contact structure problem, which is a generalization of the vanishing discount problem. Let $H^λ(x,p,u)$ be a family of Hamiltonians of contact type with parameter $λ>0$ that converges to $G(x,p)$. For the contact type Hamilton-Jacobi equation with respect to $H^λ$, we prove that, under mild assumptions, the associated viscosity solution $u^λ$ converges to a specific viscosity solution $u^0$ of the vanished contact equation. As applications, we give some convergence results for the nonlinear vanishing discount problem.