Researcher profile

Rui Huang

Rui Huang contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
18 works
0 followers
10 topics
4 close collaborators

Published work

18 published items

preprint, 2026, arXiv

Beyond Fixed Formulas: Data-Driven Linear Predictor for Efficient Diffusion Models

To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a simple data-driven caching framework that replaces fixed coefficients with learnable per-timestep weights. Rapidly trained in ~20 seconds on a single GPU, L2P accurately reconstructs current features from past trajectories. L2P significantly outperforms existing baselines: it achieves a 4.55x FLOPs reduction and 4.15x latency speedup on FLUX.1-dev, and maintains high visual fidelity under up to 7.18x acceleration on Qwen-Image models, where prior methods show noticeable quality degradation. Our results show learning linear predictors is highly effective for efficient DiT inference. Code is available at https://github.com/Aredstone/L2P-Cache.
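The idea of replacing fixed forecasting coefficients with learned per-timestep weights can be illustrated with a small sketch. This is not the L2P-Cache implementation: it assumes a closed-form least-squares fit instead of the paper's training procedure, and the history length `k` and the function names `fit_linear_predictor` and `predict` are hypothetical.

```python
import numpy as np

def fit_linear_predictor(trajectories, k=2):
    """Fit per-timestep weights w[t] so that
    feat[t] ~= sum_j w[t, j] * feat[t - 1 - j].

    trajectories: array of shape (num_samples, num_steps, dim).
    Returns an array of shape (num_steps, k); rows for t < k stay zero.
    """
    _, num_steps, _ = trajectories.shape
    weights = np.zeros((num_steps, k))
    for t in range(k, num_steps):
        # Each past feature becomes one column of the design matrix.
        past = np.stack(
            [trajectories[:, t - 1 - j, :].reshape(-1) for j in range(k)],
            axis=1,
        )
        target = trajectories[:, t, :].reshape(-1)
        weights[t], *_ = np.linalg.lstsq(past, target, rcond=None)
    return weights

def predict(history, w_t):
    """Combine cached features; history[j] is the feature from step t - 1 - j."""
    return sum(w * h for w, h in zip(w_t, history))

# Toy data (assumption): features that drift linearly over the timesteps.
rng = np.random.default_rng(0)
start = rng.normal(size=(8, 1, 4))
slope = rng.normal(size=(8, 1, 4))
steps = np.arange(6).reshape(1, 6, 1)
traj = start + slope * steps

w = fit_linear_predictor(traj, k=2)
approx = predict([traj[0, 4], traj[0, 3]], w[5])
```

On this toy data the fitted weights match the fixed linear extrapolation rule `2 * f[t-1] - f[t-2]`. On real feature trajectories the learned weights would generally differ from any fixed formula, which is the point the abstract makes.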

preprint, 2026, arXiv

Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction

Dense 3D reconstruction from continuous image streams requires both accurate geometric aggregation and stable long-term memory management. Recent feed-forward reconstruction frameworks integrate observations through persistent memory representations, yet most rely primarily on appearance-based similarity when updating memory. Such appearance-driven integration often leads to redundant accumulation of observations and unstable geometry when viewpoint changes occur. In this work, we propose a ray-aware pointer memory for streaming 3D reconstruction that explicitly models both spatial location and viewing direction within a unified memory representation. Each memory pointer stores its 3D position, associated ray direction, and feature embedding, allowing the system to reason jointly about geometric proximity and viewpoint consistency. Based on this representation, we introduce an adaptive pointer update strategy that replaces traditional fusion-based memory compression with a retain-or-replace mechanism. Instead of averaging nearby observations, the system selectively retains informative pointers while discarding redundant ones, preserving distinctive geometric structures while maintaining bounded memory growth. Furthermore, the joint reasoning over spatial distance and ray-direction discrepancy enables the system to distinguish between local redundancy, novel observations, and potential loop revisits in a unified manner. When loop candidates are detected, pose refinement is triggered to enforce global geometric consistency across the reconstruction. Extensive experiments demonstrate that the proposed ray-aware memory design significantly improves long-term reconstruction stability and camera pose accuracy while maintaining efficient streaming inference. Our approach provides a principled framework for scalable and drift-resistant online 3D reconstruction from image streams.

preprint, 2024, arXiv

Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack

Backdoor attacks aim to deceive a victim model on backdoor instances while maintaining its performance on benign data. Current methods use manual patterns or special perturbations as triggers but often overlook robustness against data corruption, making the attacks easy to defend against in practice. To address this issue, we propose a novel backdoor attack method named Spy-Watermark, which remains effective when facing data collapse and backdoor defense. We introduce a learnable watermark embedded in the latent domain of images, serving as the trigger. Then, we search for a watermark that can withstand collapse during image decoding, cooperating with several anti-collapse operations to further enhance the resilience of our trigger against data corruption. Extensive experiments are conducted on CIFAR10, GTSRB, and ImageNet datasets, demonstrating that Spy-Watermark outperforms ten state-of-the-art methods in terms of robustness and stealthiness.

preprint, 2023, arXiv

Joint Representation Learning for Text and 3D Point Cloud

Recent advancements in vision-language pre-training (e.g. CLIP) have shown that vision models can benefit from language supervision. While many models using language modality have achieved great success on 2D vision tasks, the joint representation learning of 3D point cloud with text remains under-explored due to the difficulty of 3D-Text data pair acquisition and the irregularity of 3D data structure. In this paper, we propose a novel Text4Point framework to construct language-guided 3D point cloud models. The key idea is utilizing 2D images as a bridge to connect the point cloud and the language modalities. The proposed Text4Point follows the pre-training and fine-tuning paradigm. During the pre-training stage, we establish the correspondence of images and point clouds based on the readily available RGB-D data and use contrastive learning to align the image and point cloud representations. Together with the well-aligned image and text features achieved by CLIP, the point cloud features are implicitly aligned with the text embeddings. Further, we propose a Text Querying Module to integrate language information into 3D representation learning by querying text embeddings with point cloud features. For fine-tuning, the model learns task-specific 3D representations under informative language guidance from the label set without 2D images. Extensive experiments demonstrate that our model shows consistent improvement on various downstream tasks, such as point cloud semantic segmentation, instance segmentation, and object detection. The code will be available here: https://github.com/LeapLabTHU/Text4Point

preprint, 2022, arXiv

Deep Semantic Statistics Matching (D2SM) Denoising Network

The ultimate aim of image restoration tasks such as denoising is to find an exact correlation between the noisy and clear image domains. But end-to-end denoising learning with pixel-wise losses is optimized in a sample-to-sample manner, which ignores the intrinsic correlation of images, especially semantics. In this paper, we introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network. It exploits semantic features of pretrained classification networks, then implicitly matches the probabilistic distribution of clear images in the semantic feature space. By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks, and the denoised results can be better understood by high-level vision tasks. Comprehensive experiments conducted on the noisy Cityscapes dataset demonstrate the superiority of our method on both denoising performance and semantic segmentation accuracy. Moreover, the performance improvement observed on our extended tasks, including super-resolution and dehazing experiments, shows its potential as a new general plug-and-play component.

preprint, 2022, arXiv

Domain Adaptation via Prompt Learning

Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain, where only unlabeled samples are given. Current UDA approaches learn domain-invariant features by aligning source and target feature spaces. Such alignments are imposed by constraints such as statistical discrepancy minimization or adversarial training. However, these constraints could lead to the distortion of semantic feature structures and loss of class discriminability. In this paper, we introduce a novel prompt learning paradigm for UDA, named Domain Adaptation via Prompt Learning (DAPL). In contrast to prior works, our approach makes use of pre-trained vision-language models and optimizes only very few parameters. The main idea is to embed domain information into prompts, a form of representation generated from natural language, which is then used to perform classification. This domain information is shared only by images from the same domain, thereby dynamically adapting the classifier according to each domain. By adopting this paradigm, we show that our model not only outperforms previous methods on several cross-domain benchmarks but is also very efficient to train and easy to implement.

preprint, 2022, arXiv

Fully Attentional Network for Semantic Segmentation

Recent non-local self-attention methods have proven to be effective in capturing long-range dependencies for semantic segmentation. These methods usually form a similarity map of $\mathbb{R}^{C\times C}$ (by compressing spatial dimensions) or $\mathbb{R}^{HW\times HW}$ (by compressing channels) to describe the feature relations along either channel or spatial dimensions, where $C$ is the number of channels, and $H$ and $W$ are the spatial dimensions of the input feature map. However, such practices tend to condense feature dependencies along the other dimensions, hence causing attention missing, which might lead to inferior results for small/thin categories or inconsistent segmentation inside large objects. To address this problem, we propose a new approach, namely Fully Attentional Network (FLANet), to encode both spatial and channel attentions in a single similarity map while maintaining high computational efficiency. Specifically, for each channel map, our FLANet can harvest feature responses from all other channel maps, and the associated spatial positions as well, through a novel fully attentional module. Our new method has achieved state-of-the-art performance on three challenging semantic segmentation datasets, i.e., 83.6%, 46.99%, and 88.5% on the Cityscapes test set, the ADE20K validation set, and the PASCAL VOC test set, respectively.

preprint, 2022, arXiv

Interlayer Coupling and Strain Localization in Small-Twist-Angle Graphene Flakes

Twisted bilayer graphene (TBG) exhibits a wide range of intriguing physical properties, such as superconductivity, ferromagnetism, and superlubricity. Depending on the twist angle, periodic moiré superlattices form in twisted bilayer graphene, with inhomogeneous interlayer coupling and lattice deformation. For a small twist angle (typically <2°), each moiré supercell contains a large number of atoms (>10,000), making it computationally expensive for first-principles and atomistic modeling. In this work, a finite element method based on a continuum model is used to simulate the inhomogeneous interlayer and intralayer deformations of twisted graphene flakes on a rigid graphene substrate. The van der Waals interactions between the graphene layers are described by a periodic potential energy function, whereas the graphene flake is treated as a continuum membrane with effective elastic properties. Our simulations show that structural relaxation and the induced strain localization are most significant in a relatively large graphene flake at small twist angles, where the strain distribution is highly localized as shear strain solitons along the boundaries between neighboring domains of commensurate AB stacking. Moreover, it is found that there exist many metastable equilibrium configurations at particular twist angles, depending on the flake size. The nonlinear mechanics of twisted bilayer graphene is thus expected to be essential for understanding the strain distributions in the moiré superlattices and the strain effects on other physical properties.
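To give a feel for why small twist angles are expensive to model atomistically, the sketch below uses the standard rigid-twist geometric estimate for the moiré period, L = a / (2 sin(θ/2)), and counts 4 atoms per graphene-sized cell (2 layers, 2 atoms each). The formula and the lattice constant of 0.246 nm are textbook values, not taken from the paper, and the function names are illustrative.

```python
import math

GRAPHENE_LATTICE_NM = 0.246  # in-plane lattice constant of graphene

def moire_period_nm(twist_deg, a_nm=GRAPHENE_LATTICE_NM):
    """Moiré period for a rigid twist between two identical lattices."""
    half_angle = math.radians(twist_deg) / 2.0
    return a_nm / (2.0 * math.sin(half_angle))

def atoms_per_moire_cell(twist_deg):
    """Approximate atom count: 2 layers x 2 atoms per graphene unit cell."""
    ratio = moire_period_nm(twist_deg) / GRAPHENE_LATTICE_NM
    return 4.0 * ratio ** 2

for angle in (2.0, 1.1, 0.5):
    print(f"{angle:4.1f} deg: L = {moire_period_nm(angle):6.2f} nm, "
          f"atoms ~ {atoms_per_moire_cell(angle):,.0f}")
```

The atom count scales roughly as 1/θ², so it grows quickly as the twist angle shrinks, which motivates the continuum finite element treatment described in the abstract.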

preprint, 2022, arXiv

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated. Relocalization in large-scale indoor environments enables attractive applications such as augmented reality and robot navigation. However, appearance changes fast in such environments when the camera moves, which is challenging for the relocalization system. To address this problem, we propose a virtual view synthesis-based approach, RenderNet, to enrich the database and refine poses regarding this particular scenario. Instead of rendering real images, which requires high-quality 3D models, we opt to directly render the needed global and local features of virtual viewpoints and apply them in the subsequent image retrieval and feature matching operations respectively. The proposed method can largely improve the performance in large-scale indoor environments, e.g., achieving an improvement of 7.1% and 12.2% on the InLoc dataset.

preprint, 2021, arXiv

Automatic Segmentation of Organs-at-Risk from Head-and-Neck CT using Separable Convolutional Neural Network with Hard-Region-Weighted Loss

Nasopharyngeal Carcinoma (NPC) is a leading form of Head-and-Neck (HAN) cancer in the Arctic, China, Southeast Asia, and the Middle East/North Africa. Accurate segmentation of Organs-at-Risk (OAR) from Computed Tomography (CT) images with uncertainty information is critical for effective planning of radiation therapy for NPC treatment. Despite the state-of-the-art performance achieved by Convolutional Neural Networks (CNNs) for automatic segmentation of OARs, existing methods do not provide uncertainty estimation of the segmentation results for treatment planning, and their accuracy is still limited by several factors, including the low contrast of soft tissues in CT, highly imbalanced sizes of OARs and large inter-slice spacing. To address these problems, we propose a novel framework for accurate OAR segmentation with reliable uncertainty estimation. First, we propose a Segmental Linear Function (SLF) to transform the intensity of CT images to make multiple organs more distinguishable than existing methods based on a simple window width/level, which often gives better visibility of one organ while hiding the others. Second, to deal with the large inter-slice spacing, we introduce a novel 2.5D network (named 3D-SepNet) specially designed for clinical HAN CT scans with anisotropic spacing. Third, existing hardness-aware loss functions often deal with class-level hardness, but our proposed attention to hard voxels (ATH) uses a voxel-level hardness strategy, which is better suited to hard regions even when the corresponding class is easy. Our code is now available at https://github.com/HiLab-git/SepNet.
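The contrast between a single window width/level and a segmental (piecewise) linear intensity mapping can be sketched as follows. The breakpoints below are hypothetical placeholders chosen for illustration; the abstract does not give the SLF values used by the authors, and the function names are not from the SepNet repository.

```python
import numpy as np

# Hypothetical knots (HU -> normalized intensity), for illustration only.
HU_KNOTS = np.array([-500.0, -200.0, 200.0, 1500.0])
OUT_KNOTS = np.array([0.0, 0.2, 0.8, 1.0])

def segmental_linear(ct_hu):
    """Map CT intensities through a piecewise linear curve.

    Values outside the knot range are clamped to the end values.
    """
    return np.interp(ct_hu, HU_KNOTS, OUT_KNOTS)

def window(ct_hu, level=0.0, width=400.0):
    """Conventional single window width/level mapping, for comparison."""
    low = level - width / 2.0
    return np.clip((ct_hu - low) / width, 0.0, 1.0)

sample = np.array([-1000.0, -200.0, 0.0, 200.0, 800.0])
slf_out = segmental_linear(sample)
win_out = window(sample)
```

With these placeholder knots, the single window saturates both 200 HU and 800 HU to the same output, while the segmental mapping keeps them distinct and still spends most of its output range on soft tissue.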

preprint, 2020, arXiv

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

Detecting objects in 3D LiDAR data is a core technology for autonomous driving and other robotics applications. Although LiDAR data is acquired over time, most 3D object detection algorithms propose object bounding boxes independently for each frame and neglect the useful information available in the temporal domain. To address this problem, in this paper we propose a sparse LSTM-based multi-frame 3D object detection algorithm. We use a U-Net style 3D sparse convolution network to extract features for each frame's LiDAR point cloud. These features are fed to the LSTM module together with the hidden and memory features from the last frame to predict the 3D objects in the current frame as well as hidden and memory features that are passed to the next frame. Experiments on the Waymo Open Dataset show that our algorithm outperforms the traditional frame-by-frame approach by 7.5% mAP@0.7 and other multi-frame approaches by 1.2% while using less memory and computation per frame. To the best of our knowledge, this is the first work to use an LSTM for 3D object detection in sparse point clouds.

preprint, 2020, arXiv

An X-ray and SZ bright diffuse source toward M31: a Local Hot Bridge

We report a large-scale ($r\approx 20^\circ$) X-ray and Sunyaev-Zeldovich (SZ)-bright diffuse enhancement toward M31, which might be a Local Hot Bridge connecting the Milky Way (MW) with M31. We subtract the Galactic emission from the all-sky O VII and O VIII emission line measurement survey, and find that the emission of these two ions is enhanced within $r\approx20^\circ$ around M31. The mean emission enhancements are $5.6\pm 1.3$ L.U., and $2.8\pm0.6$ L.U. for O VII and O VIII, respectively ($>4σ$ for both ions). We also extract the SZ signal around M31, which suggests a surface brightness $y$ of $2-4\times10^{-7}$, an enhancement $>2.5σ$ (and a best fit of $5.9σ$). These three measurements trace the hot gas with a temperature $\log~T({\rm K})> 6$, showing similar plateau shapes (flat within $\approx15^\circ$, and zero beyond $\approx30^\circ$). A single-phase assumption leads to a temperature of $\log~T({\rm K})=6.34\pm0.03$, which is determined by the O VII/O VIII line ratio. Combining X-ray and SZ measurements, we suggest that this feature is unlikely to be the hot halo around M31 (too massive) or in the MW (too high pressure and X-ray bright). The plateau shape may be explained by a cylinder connecting the MW and M31 (the Local Hot Bridge). We constrain its length to be about 400 kpc, with a radius of 120 kpc, a density of $\approx 2\times10^{-4}-10^{-3} ~\rm cm^{-3}$, and a metallicity of $0.02-0.1~ Z_\odot$. The baryon mass is $\gtrsim10^{11}~M_\odot$, and the oxygen mass is about $\gtrsim10^8~M_\odot$, which contribute to the baryon or metal budget of the Local Group.
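As a rough consistency check on the quoted numbers, the gas mass of a uniform cylinder with the stated length, radius, and density range can be computed directly. This sketch assumes the quoted density is a hydrogen number density and uses a mean gas mass of 1.4 proton masses per hydrogen atom; both are assumptions made here, not statements from the paper, and the function name is illustrative.

```python
import math

KPC_CM = 3.0857e21          # centimeters per kiloparsec
PROTON_MASS_G = 1.6726e-24  # grams
SOLAR_MASS_G = 1.989e33     # grams
MASS_PER_H = 1.4            # assumption: gas mass per hydrogen atom, in proton masses

def cylinder_gas_mass_msun(length_kpc, radius_kpc, n_h_cm3):
    """Mass of a uniform-density cylinder, in solar masses."""
    volume_cm3 = math.pi * (radius_kpc * KPC_CM) ** 2 * (length_kpc * KPC_CM)
    mass_g = n_h_cm3 * MASS_PER_H * PROTON_MASS_G * volume_cm3
    return mass_g / SOLAR_MASS_G

low = cylinder_gas_mass_msun(400.0, 120.0, 2e-4)
high = cylinder_gas_mass_msun(400.0, 120.0, 1e-3)
print(f"{low:.2e} to {high:.2e} solar masses")
```

Under these assumptions the lower end of the density range gives a mass slightly above $10^{11}~M_\odot$, in line with the lower bound quoted in the abstract.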

preprint, 2020, arXiv

Background Model for the High-Energy Telescope of Insight-HXMT

Accurate background estimation is essential for spectral and temporal analysis in astrophysics. In this work, we construct the in-orbit background model for the High-Energy Telescope (HE) of the Hard X-ray Modulation Telescope (dubbed Insight-HXMT). Based on the two-year blank sky observations of Insight-HXMT/HE, we first investigate the basic properties of the background and find that both the background spectral shape and intensity have long-term evolution at different geographical sites. The entire globe is then divided into small grids, each with a typical area of 5x5 square degrees in the geographical coordinate system. For each grid, an empirical function is used to describe the long-term evolution of each channel of the background spectrum; the intensity of the background can vary, and a modification factor is introduced to account for this variability by measuring the contemporary flux of the blind detector. For a given pointing observation, the background model is obtained by integrating over the grids that the satellite track passes through in each orbit. The background model is tested with both the blank sky observations and campaigns for observations of a series of celestial sources. The results show an average systematic error of 1.5% for the background energy spectrum (26-100 keV) under a typical exposure of 8 ks, and <3% for background light curve estimation (30-150 keV). Therefore, the background model introduced in this paper is included in the Insight-HXMT software as a standard part specialized for both spectral and temporal analyses.

preprint, 2020, arXiv

Disentangle Perceptual Learning through Online Contrastive Learning

Pursuing realistic results according to human visual perception is the central concern in image transformation tasks. Perceptual learning approaches like perceptual loss are empirically powerful for such tasks, but they usually rely on a pre-trained classification network to provide features, which are not necessarily optimal in terms of visual perception of image transformation. In this paper, we argue that, among the feature representations from the pre-trained classification network, only limited dimensions are related to human visual perception, while others are irrelevant, although both will affect the final image transformation results. Under this assumption, we try to disentangle the perception-relevant dimensions from the representation through our proposed online contrastive learning. The resulting network includes the pre-training part and a feature selection layer, followed by the contrastive learning module, which utilizes the transformed results, target images, and task-oriented distorted images as the positive, negative, and anchor samples, respectively. The contrastive learning aims at activating the perception-relevant dimensions and suppressing the irrelevant ones by using the triplet loss, so that the original representation can be disentangled for better perceptual quality. Experiments on various image transformation tasks demonstrate the superiority of our framework, in terms of human visual perception, to the existing approaches using pre-trained networks and empirically designed losses.

preprint, 2020, arXiv

Errata on the Calculation of Hot Gas Properties in a Few Li Jiang-Tao's Papers

This is a combination of the errata of seven papers published between 2008 and 2016 with Jiang-Tao Li (JTL) as the first author. All the problems are caused by two mistakes in the original scripts written by JTL used to calculate the physical parameters of the hot gas from X-ray spectral analysis with a thermal plasma code. The mistakes will result in an overestimate of some parameters, such as the electron number density and hot gas mass by a factor of $\sqrt{10}\approx3.162$, and an overestimate of the thermal pressure by a factor of $\approx2.725$. JTL apologizes to the community for the inconvenience caused by these mistakes. We present an update on the text, numbers, figures, and tables of all the seven papers affected by these mistakes. Other papers led by JTL or co-authored papers are not affected.

preprint, 2020, arXiv

Global Optimum Search in Quantum Deep Learning

This paper aims to solve machine learning optimization problems using quantum circuits. Two approaches, namely the average approach and the Partial Swap Test Cut-off method (PSTC), are proposed to search for the global minimum/maximum of two different objective functions. The current cost is $O(\sqrt{|Θ|} N)$, but there is potential to improve PSTC further to $O(\sqrt{|Θ|} \cdot sublinear \ N)$ by enhancing the checking process.

preprint, 2020, arXiv

Multi-organ Segmentation via Co-training Weight-averaged Models from Few-organ Datasets

Multi-organ segmentation is widely used in clinical applications. To segment multiple organs of interest, it is generally quite difficult to collect full annotations of all the organs on the same images, as some medical centers might only annotate a portion of the organs due to their own clinical practice. In most scenarios, one might obtain annotations of a single or a few organs from one training set, and obtain annotations of the other organs from another set of training images. Existing approaches mostly train and deploy a single model for each subset of organs, which is memory intensive and time inefficient. In this paper, we propose to co-train weight-averaged models for learning a unified multi-organ segmentation network from few-organ datasets. We collaboratively train two networks and let the coupled networks teach each other on un-annotated organs. To alleviate the noisy teaching supervision between the networks, the weight-averaged models are adopted to produce more reliable soft labels. In addition, a novel region mask is utilized to selectively apply the consistency constraint on the un-annotated organ regions that require collaborative teaching, which further boosts the performance. Extensive experiments on three publicly available single-organ datasets (LiTS, KiTS, Pancreas) and manually constructed single-organ datasets from MOBA show that our method can better utilize the few-organ datasets and achieves superior performance with lower inference computational cost.
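A weight-averaged model is commonly maintained as an exponential moving average of the trained network's parameters. The sketch below shows that update on plain Python lists. Treating the paper's weight averaging as an exponential moving average is an assumption here, and the `decay` value, parameter names, and function name are hypothetical.

```python
def update_weight_average(avg_params, params, decay=0.99):
    """Move the averaged parameters a small step toward the current ones."""
    return {
        name: [decay * a + (1.0 - decay) * p
               for a, p in zip(avg_params[name], params[name])]
        for name in params
    }

# The averaged copy starts as a clone of the trained network.
network = {"conv.weight": [0.0, 1.0], "conv.bias": [0.5]}
averaged = {name: list(values) for name, values in network.items()}

# Pretend one optimizer step changed the trained network.
network = {"conv.weight": [1.0, 1.0], "conv.bias": [0.0]}
averaged = update_weight_average(averaged, network, decay=0.9)
```

Because the averaged copy changes slowly, its predictions fluctuate less between training steps than those of the network being optimized, which is the usual motivation for using it to generate soft labels.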

preprint, 2018, arXiv

Theoretical and numerical studies on global stability of traveling waves with oscillations for time-delayed nonlocal dispersion equations

This paper is concerned with the global stability of non-critical/critical traveling waves with oscillations for time-delayed nonlocal dispersion equations. We first theoretically prove that all traveling waves, especially the critical oscillatory traveling waves, are globally stable in a certain weighted space, where the convergence rates to the non-critical oscillatory traveling waves are time-exponential, and the convergence to the critical oscillatory traveling waves is time-algebraic. Both rates are optimal. The approach adopted is the weighted energy method with the fundamental solution theory for time-delayed equations. Second, we carry out numerical computations in different cases, which also confirm our theoretical results. Because of the oscillations of the solutions and the nonlocality of the equation, the numerical results obtained by the regular finite difference scheme are unstable and can even blow up. To overcome these obstacles, we propose a new finite difference scheme by adding artificial viscosities to both sides of the equation, and obtain the desired numerical results.