Source author record

Qian He

Qian He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, cond-mat.stat-mech, Populations and Evolution, Quantitative Methods, Biomolecules, cond-mat.mtrl-sci, eess.IV, Human-Computer Interaction, Information Theory, Machine Learning, math.IT, math.ST, Multimedia, Robotics, Statistics Theory

Catalog footprint

What is connected

13 works

15 topics

4 close collaborators


preprint, 2026, arXiv

FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy

Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging, as it requires balancing proximal precision with distal foresight. Existing approaches typically focus on optimizing intra-chunk action distributions, often neglecting inter-chunk coherence. Consequently, inter-chunk discontinuities significantly impede the learning of coherent long-horizon actions. To overcome this limitation and achieve a synergetic balance between precision and foresight, we propose FocalPolicy, a foresight-aware visuomotor policy that combines Frequency-Optimized Chunking with Locally Anchored flow matching. We introduce a foresight composite objective that supervises time-domain alignment within the proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence. To efficiently learn complex action distributions, we design locally anchored sampling to enhance target signal propagation efficiency during consistency flow matching training. Extensive experiments demonstrate that FocalPolicy outperforms existing approaches and confirm the generalizability of our modules to other baselines. Project website: https://focalpolicy.github.io/

preprint, 2022, arXiv

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Training a text-to-image generator in the general domain (e.g., DALL-E, CogView) requires huge amounts of paired text-image data, which is too expensive to collect. In this paper, we propose a self-supervised scheme named CLIP-GEN for general text-to-image generation with the language-image priors extracted with a pre-trained CLIP model. In our approach, we only require a set of unlabeled images in the general domain to train a text-to-image generator. Specifically, given an image without text labels, we first extract the embedding of the image in the unified language-vision embedding space with the image encoder of CLIP. Next, we convert the image into a sequence of discrete tokens in the VQGAN codebook space (the VQGAN model can be trained with the unlabeled image dataset in hand). Finally, we train an autoregressive transformer that maps the image tokens from its unified language-vision representation. Once trained, the transformer can generate coherent image tokens based on the text embedding extracted from the text encoder of CLIP upon an input text. Such a strategy enables us to train a strong and general text-to-image generator with a large text-free image dataset such as ImageNet. Qualitative and quantitative evaluations verify that our method significantly outperforms optimization-based text-to-image methods in terms of image quality while not compromising the text-image matching. Our method can even achieve performance comparable to flagship supervised models like CogView.

preprint, 2022, arXiv

Current and perspective sensing methods for monkeypox virus: a reemerging zoonosis in its infancy

Objectives: This review evaluates the current monkeypox virus (MPXV) detection methods, discusses their pros and cons, and recommends solutions to the problems identified. Methods: The literature for this review was identified through searches in PubMed, Web of Science, Google Scholar, ResearchGate, and the Science Direct advanced search for articles published in English, with no start date and up to June 2022, using the terms "monkeypox virus" or "poxvirus" along with "diagnosis"; "PCR"; "real-time PCR"; "LAMP"; "RPA"; "immunoassay"; "reemergence"; "biothreat"; "endemic"; and "multi-country outbreak", and also by tracking citations of the relevant papers. The most relevant articles are included in the review. Results: Our literature review shows that PCR is the gold standard method for MPXV detection. In addition, loop-mediated isothermal amplification (LAMP) and recombinase polymerase amplification (RPA) have been reported as alternatives to PCR. Immunodiagnostics, whole particle detection, and image-based detection are the non-nucleic acid-based MPXV detection modalities. Conclusions: PCR is easy to leverage and adapt for a quick response to an outbreak, but PCR-based MPXV detection approaches may not be suitable for marginalized settings. Limited progress has been made towards innovations in MPXV diagnostics, leaving room for the development of novel detection techniques for this virus.

preprint, 2022, arXiv

Part-aware Prototypical Graph Network for One-shot Skeleton-based Action Recognition

In this paper, we study the problem of one-shot skeleton-based action recognition, which poses unique challenges in learning transferable representation from base classes to novel classes, particularly for fine-grained actions. Existing meta-learning frameworks typically rely on the body-level representations in spatial dimension, which limits the generalisation to capture subtle visual differences in the fine-grained label space. To overcome the above limitation, we propose a part-aware prototypical representation for one-shot skeleton-based action recognition. Our method captures skeleton motion patterns at two distinctive spatial levels, one for global contexts among all body joints, referred to as body level, and the other attends to local spatial regions of body parts, referred to as the part level. We also devise a class-agnostic attention mechanism to highlight important parts for each action class. Specifically, we develop a part-aware prototypical graph network consisting of three modules: a cascaded embedding module for our dual-level modelling, an attention-based part fusion module to fuse parts and generate part-aware prototypes, and a matching module to perform classification with the part-aware representations. We demonstrate the effectiveness of our method on two public skeleton-based action recognition datasets: NTU RGB+D 120 and NW-UCLA.

preprint, 2022, arXiv

Region-Aware Face Swapping

This paper presents a novel Region-Aware Face Swapping (RAFSwap) network to achieve identity-consistent harmonious high-resolution face generation in a local-global manner: 1) Local Facial Region-Aware (FRA) branch augments local identity-relevant features by introducing the Transformer to effectively model misaligned cross-scale semantic interaction. 2) Global Source Feature-Adaptive (SFA) branch further complements global identity-relevant cues for generating identity-consistent swapped faces. Besides, we propose a Face Mask Predictor (FMP) module incorporated with StyleGAN2 to predict identity-relevant soft facial masks in an unsupervised manner that is more practical for generating harmonious high-resolution faces. Abundant experiments qualitatively and quantitatively demonstrate the superiority of our method for generating more identity-consistent high-resolution swapped faces over SOTA methods, e.g., obtaining 96.70 ID retrieval that outperforms SOTA MegaFS by 5.87.

preprint, 2022, arXiv

SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

Convolutional neural networks (CNNs) are good at extracting contextual features within certain receptive fields, while transformers can model global long-range dependency features. By combining the advantages of transformers and CNNs, Swin Transformer shows strong feature representation ability. Based on it, we propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection. It is driven by Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contours of salient objects. Specifically, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module is presented to optimize intra-level cross-modality features. To clarify fuzzy boundaries, an edge-guided decoder achieves inter-level cross-modality fusion under the guidance of edge features. The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets, showing that it provides more insight into the cross-modality complementarity task.

preprint, 2022, arXiv

Weakly Supervised Nuclei Segmentation via Instance Learning

Weakly supervised nuclei segmentation is a critical problem for pathological image analysis and greatly benefits the community by significantly reducing labeling cost. Adopting point annotations, previous methods mostly rely on less expressive representations for nuclei instances and thus have difficulty handling crowded nuclei. In this paper, we propose to decouple weakly supervised semantic and instance segmentation in order to enable more effective subtask learning and to promote instance-aware representation learning. To achieve this, we design a modular deep network with two branches: a semantic proposal network and an instance encoding network, which are trained in a two-stage manner with an instance-sensitive loss. Empirical results show that our approach achieves state-of-the-art performance on two public benchmarks of pathological images from different types of organs.

preprint, 2022, arXiv

XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation

Generating a new font library is a very labor-intensive and time-consuming job for glyph-rich scripts. Few-shot font generation is thus desirable, as it requires only a few glyph references and no fine-tuning at test time. Existing methods follow the style-content disentanglement paradigm and expect novel fonts to be produced by combining the style codes of the reference glyphs and the content representations of the source. However, these few-shot font generation methods either fail to capture content-independent style representations, or employ localized component-wise style representations, which are insufficient to model many Chinese font styles that involve hyper-component features such as inter-component spacing and "connected-stroke". To resolve these drawbacks and make the style representations more reliable, we propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder that is conditioned jointly on the glyph image and the corresponding stroke labels. The cross-modality encoder is pre-trained in a self-supervised manner to allow effective capture of cross- and intra-modality correlations, which facilitates content-style disentanglement and the modeling of style representations at all scales (stroke-level, component-level and character-level). The pre-trained encoder is then applied to the downstream font generation task without fine-tuning. Experimental comparisons of our method with state-of-the-art methods demonstrate that our method successfully transfers styles of all scales. In addition, it requires only one reference glyph and achieves the lowest rate of bad cases in the few-shot font generation task, 28% lower than the second best.

preprint, 2015, arXiv

Generalized Cramer-Rao Bound for Joint Estimation of Target Position and Velocity for Active and Passive Radar Networks

In this paper, we derive the Cramer-Rao bound (CRB) for joint target position and velocity estimation using an active or passive distributed radar network under more general, and practically occurring, conditions than assumed in previous work. In particular, the presented results allow nonorthogonal signals, spatially dependent Gaussian reflection coefficients, and spatially dependent Gaussian clutter-plus-noise. These bounds allow designers to compare the performance of their developed approaches, which are deemed to be of acceptable complexity, to the best achievable performance. If their developed approaches lead to performance close to the bounds, these approaches can be deemed "good enough". A good example is a recent study in which algorithms were developed for a practical radar application that must involve nonorthogonal signals and for which the best achievable performance is unknown. The presented results do not assume that the approximate location of the target is known from previous target detection signal processing. In addition, for situations in which we do not know some parameters accurately, we also derive the mismatched CRB. Numerical investigations of the mean squared error of the maximum likelihood estimation are employed to support the validity of the CRBs. In order to demonstrate the utility of the provided results to a topic of great current interest, the numerical results focus on a passive radar system using the Global System for Mobile communication (GSM) cellular system.

preprint, 2015, arXiv

HAADF-STEM Study of Mo/V Distributions in Mo-V-Te-Ta-O M1 Phases and Their Correlations with Surface Reactivity

The MoVTeTaO M1 phases were prepared by conventional hydrothermal (HT) and microwave-assisted HT (MW) synthesis methods employing two different Ta precursors, Ta ethoxide and a custom-made Ta oxalate complex. Intensity profile analysis of HAADF-STEM images of M1 phases oriented along [hk0] directions, taken from the surface to the bulk region, indicated that the chemical composition of surface ab planes is very similar to their composition in the bulk. The HAADF-STEM image analysis showed that synthesis methods have a significant impact on the Mo/V distribution in the MoVTeTaO M1 phases and their reactivity in propane ammoxidation. Enhanced acrylonitrile (ACN) yield and first-order irreversible reaction rate constants for propane consumption, normalized to the estimated surface ab plane areas, correlated with increased V content in the proposed catalytic center (S2-S4-S4-S7-S7). These observations lend further support to the idea that multiple VOx sites present in the surface ab planes may be responsible for the activity and selectivity of the M1 phase in propane ammoxidation.

preprint, 2012, arXiv

On the relationship between cyclic and hierarchical three-species predator-prey systems and the two-species Lotka-Volterra model

We aim to clarify the relationship between interacting three-species models and the two-species Lotka-Volterra (LV) model. We utilize mean-field theory and Monte Carlo simulations on two-dimensional square lattices to explore the temporal evolution characteristics of two different interacting three-species predator-prey systems: (1) a cyclic rock-paper-scissors (RPS) model with conserved total particle number but strongly asymmetric reaction rates that lets the system evolve towards one corner of configuration space; (2) a hierarchical food chain where an additional intermediate species is inserted between the predator and prey in the LV model. For model variant (1), we demonstrate that the evolutionary properties of both minority species in the steady state of this stochastic spatial three-species corner RPS model are well approximated by the LV system, with its emerging characteristic features of localized population clustering, persistent oscillatory dynamics, correlated spatio-temporal patterns, and fitness enhancement through quenched spatial disorder in the predation rates. In contrast, we could not identify any regime where the hierarchical model (2) would reduce to the two-species LV system. In the presence of pair exchange processes, the system remains essentially well-mixed, and we generally find the Monte Carlo simulation results for the spatially extended model (2) to be consistent with the predictions from the corresponding mean-field rate equations. If spreading occurs only through nearest-neighbor hopping, small population clusters emerge; yet the requirement of an intermediate species cluster obviously disrupts spatio-temporal correlations between predator and prey, and correspondingly eliminates many of the intriguing fluctuation phenomena that characterize the stochastic spatial LV system.
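As a rough illustration of the mean-field description mentioned in this abstract, the Python sketch below integrates the textbook mean-field rate equations for a cyclic three-species (rock-paper-scissors) system with conserved total density. It is not the authors' code: the rate names k_a, k_b, k_c, the function names, the step size, and the parameter values are all assumptions chosen for illustration, and the sketch ignores the lattice structure and stochastic fluctuations that the paper studies.

```python
# Mean-field rock-paper-scissors sketch (illustrative assumptions throughout).
# Assumed reactions: A eats B at rate k_a, B eats C at rate k_b, C eats A at rate k_c.

def rps_rates(state, k_a, k_b, k_c):
    """Time derivatives of the densities (a, b, c); they sum to zero."""
    a, b, c = state
    return (
        a * (k_a * b - k_c * c),
        b * (k_b * c - k_a * a),
        c * (k_c * a - k_b * b),
    )

def rk4_step(state, dt, k_a, k_b, k_c):
    """Advance the densities by one classical Runge-Kutta step of size dt."""
    def shifted(slope, h):
        return tuple(x + h * s for x, s in zip(state, slope))

    s1 = rps_rates(state, k_a, k_b, k_c)
    s2 = rps_rates(shifted(s1, dt / 2), k_a, k_b, k_c)
    s3 = rps_rates(shifted(s2, dt / 2), k_a, k_b, k_c)
    s4 = rps_rates(shifted(s3, dt), k_a, k_b, k_c)
    return tuple(
        x + dt / 6 * (p + 2 * q + 2 * r + w)
        for x, p, q, r, w in zip(state, s1, s2, s3, s4)
    )

def integrate(state, dt, steps, k_a, k_b, k_c):
    """Integrate the rate equations for a fixed number of steps."""
    for _ in range(steps):
        state = rk4_step(state, dt, k_a, k_b, k_c)
    return state

def fixed_point(k_a, k_b, k_c):
    """Coexistence fixed point for total density 1: (a, b, c) ~ (k_b, k_c, k_a)."""
    total = k_a + k_b + k_c
    return (k_b / total, k_c / total, k_a / total)

if __name__ == "__main__":
    # Strongly asymmetric rates (hypothetical values): the fixed point sits
    # near the corner where species A dominates.
    rates = (0.2, 1.0, 0.2)
    print("fixed point:", fixed_point(*rates))
    print("densities after integration:", integrate((0.3, 0.3, 0.4), 0.01, 2000, *rates))
```

With asymmetric rates the coexistence fixed point moves toward one corner of the density simplex, which is the qualitative situation in which the abstract compares the two minority species with a two-species Lotka-Volterra system.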

preprint, 2011, arXiv

Co-existence in the two-dimensional May-Leonard model with random rates

We employ Monte Carlo simulations to numerically study the temporal evolution and transient oscillations of the population densities, the associated frequency power spectra, and the spatial correlation functions in the (quasi-)steady state in two-dimensional stochastic May-Leonard models of mobile individuals, allowing for particle exchanges with nearest neighbors and hopping onto empty sites. We therefore consider a class of four-state three-species cyclic predator-prey models whose total particle number is not conserved. We demonstrate that quenched disorder in either the reaction or the mobility rates hardly impacts the dynamical evolution, the emergence and structure of spiral patterns, or the mean extinction time in this system. We also show that direct particle pair exchange processes promote the formation of regular spiral structures. Moreover, upon increasing the rates of mobility, we observe a remarkable change in the extinction properties in the May-Leonard system (for small system sizes): (1) As the mobility rate exceeds a threshold that separates a species coexistence (quasi-)steady state from an absorbing state, the mean extinction time as a function of system size N crosses over from a functional form ~ e^{cN} / N (where c is a constant) to a linear dependence; (2) the measured histogram of extinction times displays a corresponding crossover from an (approximately) exponential to a Gaussian distribution. The latter results are found to hold true also when the mobility rates are randomly distributed.

preprint, 2010, arXiv

Spatial Rock-Paper-Scissors Models with Inhomogeneous Reaction Rates

We study several variants of the stochastic four-state rock-paper-scissors game or, equivalently, cyclic three-species predator-prey models with conserved total particle density, by means of Monte Carlo simulations on one- and two-dimensional lattices. Specifically, we investigate the influence of spatial variability of the reaction rates and site occupancy restrictions on the transient oscillations of the species densities and on spatial correlation functions in the quasi-stationary coexistence state. For small systems, we also numerically determine the dependence of typical extinction times on the number of lattice sites. In stark contrast with two-species stochastic Lotka-Volterra systems, we find that for our three-species models with cyclic competition quenched disorder in the reaction rates has very little effect on the dynamics and the long-time properties of the coexistence state. Similarly, we observe that site restriction only has a minor influence on the system's dynamical properties. Our results therefore demonstrate that the features of the spatial rock-paper-scissors system are remarkably robust with respect to model variations, and stochastic fluctuations as well as spatial correlations play a comparatively minor role.
