Researcher profile

Jianfu Zhang

Jianfu Zhang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
6 topics
4 close collaborators

Published work

12 published items

preprint, 2026, arXiv

DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport

Recent diffusion- and flow-based VTON methods achieve strong results with pretrained generative models, but their reliance on multi-step sampling incurs high inference cost, while existing acceleration methods largely overlook the intrinsic structure of the try-on task. In this paper, we highlight a key observation: VTON outputs are highly constrained by the conditional inputs, suggesting that the conditional sampling trajectory can be much straighter than that in general image generation, making one-step generation a natural solution. However, limited task-specific data makes training from scratch impractical, forcing existing methods to fine-tune pretrained models whose objectives do not encourage such straight conditional trajectories. Thus, the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself. Motivated by this insight, we encourage straighter VTON sampling trajectories through three targeted modifications: pure conditional transport, a garment preservation loss, and a self-consistency loss. We further introduce a one-step distillation stage. Extensive experiments show that our method achieves state-of-the-art performance with one-step sampling, establishing a new standard for efficient and high-quality VTON.

preprint, 2026, arXiv

Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation

Pedestrian motion, due to its causal nature, is strongly influenced by domain gaps arising from discrepancies between training and testing data distributions. Focusing on 3D human pose estimation, this work presents a controllable human pose generation framework that synthesizes diverse video data by systematically varying poses, backgrounds, and camera viewpoints. This generative augmentation enriches training datasets, enhances model generalization, and alleviates the limitations of existing methods in handling domain discrepancies. By leveraging both indoor/real-world and outdoor/virtual datasets, we perform cross-domain data fusion and controllable video generation to construct enriched training data, tailored to realistic deployment settings. Extensive experiments show that the augmented datasets significantly improve model performance on unseen scenarios and datasets, validating the effectiveness of the proposed approach.

preprint, 2026, arXiv

High-Quality 3D Head Reconstruction from Any Single Portrait Image

In this work, we introduce a novel high-fidelity 3D head reconstruction method from a single portrait image, regardless of perspective, expression, or accessories. Despite significant efforts in adapting 2D generative models for novel view synthesis and 3D optimization, most methods struggle to produce high-quality 3D portraits. The lack of crucial information, such as identity, expression, hair, and accessories, limits these approaches in generating realistic 3D head models. To address these challenges, we construct a new high-quality dataset containing 227 sequences of digital human portraits captured from 96 different perspectives, totalling 21,792 frames, featuring diverse expressions and accessories. To further improve performance, we integrate identity and expression information into the multi-view diffusion process to enhance facial consistency across views. Specifically, we apply identity- and expression-aware guidance and supervision to extract accurate facial representations, which guide the model and enforce objective functions to ensure high identity and expression consistency during generation. Finally, we generate an orbital video around the portrait consisting of 96 multi-view frames, which can be used for 3D portrait model reconstruction. Our method demonstrates robust performance across challenging scenarios, including side-face angles and complex accessories.

preprint, 2022, arXiv

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Learning to generate new images for a novel category based on only a few images, known as few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have yielded impressive results, but the diversity is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures intra-category transformation, i.e., "delta", between same-category pairs. The generation subnetwork generates a sample-specific "delta" for an input image, which is combined with this input image to generate a new image within the same category. In addition, an adversarial delta matching loss is designed to link the two subnetworks together. Extensive experiments on five few-shot image datasets demonstrate the effectiveness of our proposed method.

preprint, 2022, arXiv

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Learning to generate new images for a novel category based on only a few images, known as few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have yielded impressive results, but the diversity is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures intra-category transformation, i.e., delta, between same-category pairs. The generation subnetwork generates a sample-specific delta for an input image, which is combined with this input image to generate a new image within the same category. In addition, an adversarial delta matching loss is designed to link the two subnetworks together. Extensive experiments on six benchmark datasets demonstrate the effectiveness of our proposed method. Our code is available at https://github.com/bcmi/DeltaGAN-Few-Shot-Image-Generation.

preprint, 2022, arXiv

Few-shot Image Generation Using Discrete Content Representation

Few-shot image generation and few-shot image translation are two related tasks, both of which aim to generate new images for an unseen category from only a few images. In this work, we make the first attempt to adapt a few-shot image translation method to the few-shot image generation task. Few-shot image translation disentangles an image into a style vector and a content map. An unseen style vector can be combined with different seen content maps to produce different images. However, this approach needs to store seen images to provide content maps, and the unseen style vector may be incompatible with the seen content maps. To adapt it to the few-shot image generation task, we learn a compact dictionary of local content vectors by quantizing continuous content maps into discrete content maps instead of storing seen images. Furthermore, we model the autoregressive distribution of the discrete content map conditioned on the style vector, which can alleviate the incompatibility between the content map and the style vector. Qualitative and quantitative results on three real datasets demonstrate that our model can produce images of higher diversity and fidelity for unseen categories than previous methods.
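The quantization step described in this abstract can be illustrated with a minimal sketch. It assumes a plain nearest-neighbour lookup under squared Euclidean distance; the abstract does not state how the dictionary is learned or searched, and the names `quantize`, `codebook`, and `content_map`, along with all numeric values, are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(content_map: List[List[Vector]], codebook: List[Vector]) -> List[List[int]]:
    """Replace each local content vector with the index of its nearest codebook entry."""
    return [
        [
            min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
            for v in row
        ]
        for row in content_map
    ]


# Hypothetical dictionary of three 2-D local content vectors (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical 2x2 continuous content map; each cell holds one local content vector.
content_map = [
    [(0.1, 0.1), (0.9, 0.2)],
    [(0.2, 0.8), (0.6, 0.1)],
]

discrete_map = quantize(content_map, codebook)
print(discrete_map)  # [[0, 1], [2, 1]]
```

Storing the integer map plus the shared dictionary, rather than the seen images, is what makes the representation compact in this sketch.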

preprint, 2022, arXiv

Measurement of MHD turbulence properties by synchrotron radiation techniques

It is well known that magnetohydrodynamic (MHD) turbulence is ubiquitous in astrophysical environments. The correct understanding of the fundamental properties of MHD turbulence is a prerequisite for revealing many key astrophysical processes. The development of observation-based measurement techniques has significantly promoted MHD turbulence theory and its implications in astrophysics. After describing the modern understanding of MHD turbulence based on theoretical analysis and direct numerical simulations, we review recent developments related to synchrotron fluctuation techniques. Specifically, we comment on the validation of synchrotron fluctuation techniques and the measurement performance of several properties of magnetic turbulence based on data cubes from MHD turbulence simulations and observations. Furthermore, we propose strengthening studies that measure the magnetization and 3D magnetic field structure of interstellar turbulence. At the same time, we also discuss the prospects of new techniques for measuring magnetic field properties and understanding astrophysical processes, using a large number of data cubes from the Low-Frequency Array (LOFAR) and the Square Kilometre Array (SKA).

preprint, 2022, arXiv

Shadow Generation for Composite Image in Real-world Scenes

Image composition aims to insert a foreground object into a background image. Most previous image composition methods focus on adjusting the foreground to make it compatible with the background while ignoring the shadow that the foreground casts on the background. In this work, we focus on generating a plausible shadow for the foreground object in the composite image. First, we contribute a real-world shadow generation dataset, DESOBA, by generating synthetic composite images based on paired real images and deshadowed images. Then, we propose a novel shadow generation network, SGRNet, which consists of a shadow mask prediction stage and a shadow filling stage. In the shadow mask prediction stage, foreground and background information interact thoroughly to generate the foreground shadow mask. In the shadow filling stage, shadow parameters are predicted to fill the shadow area. Extensive experiments on our DESOBA dataset and real composite images demonstrate the effectiveness of our proposed method. Our dataset and code are available at https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBA.

preprint, 2020, arXiv

Beyond without Forgetting: Multi-Task Learning for Classification with Disjoint Datasets

Multi-task Learning (MTL) for classification with disjoint datasets aims to explore MTL when one task only has one labeled dataset. In existing methods, for each task, the unlabeled datasets are not fully exploited to facilitate this task. Inspired by semi-supervised learning, we use unlabeled datasets with pseudo labels to facilitate each task. However, there are two major issues: 1) the pseudo labels are very noisy; 2) the unlabeled datasets and the labeled dataset for each task have a considerable data distribution mismatch. To address these issues, we propose our MTL with Selective Augmentation (MTL-SA) method to select the training samples in unlabeled datasets with confident pseudo labels and a data distribution close to that of the labeled dataset. Then, we use the selected training samples to add information and use the remaining training samples to preserve information. Extensive experiments on face-centric and human-centric applications demonstrate the effectiveness of our MTL-SA method.
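The selection idea in this abstract (keep unlabeled samples whose pseudo labels are confident and whose distribution is close to the labeled data) can be sketched as follows. The abstract does not give the actual criteria, so everything here is an assumption for illustration: confidence is taken as the maximum predicted class probability, closeness as Euclidean distance to the labeled feature mean, and the function name `select_samples` and both thresholds are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_samples(
    probs: List[Sequence[float]],
    features: List[Sequence[float]],
    labeled_mean: Sequence[float],
    conf_threshold: float = 0.9,   # assumed value, not from the paper
    dist_threshold: float = 1.0,   # assumed value, not from the paper
) -> Tuple[List[int], List[int]]:
    """Split unlabeled sample indices into (selected, remaining)."""
    selected, remaining = [], []
    for i, (p, f) in enumerate(zip(probs, features)):
        confidence = max(p)  # confidence of the pseudo label
        distance = sum((x - m) ** 2 for x, m in zip(f, labeled_mean)) ** 0.5
        if confidence >= conf_threshold and distance <= dist_threshold:
            selected.append(i)
        else:
            remaining.append(i)
    return selected, remaining


# Hypothetical class probabilities and 2-D features for three unlabeled samples.
probs = [(0.95, 0.05), (0.60, 0.40), (0.97, 0.03)]
features = [(0.1, 0.2), (0.0, 0.1), (3.0, 3.0)]
labeled_mean = (0.0, 0.0)

selected, remaining = select_samples(probs, features, labeled_mean)
print(selected, remaining)  # [0] [1, 2]
```

In this toy run, sample 1 is rejected for low confidence and sample 2 for being far from the labeled data, which mirrors the two issues the abstract names.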

preprint, 2020, arXiv

F2GAN: Fusing-and-Filling GAN for Few-shot Image Generation

In order to generate images for a given category, existing deep generative models generally rely on abundant training images. However, extensive data acquisition is expensive, and the ability to learn quickly from limited data is required in real-world applications. Also, these existing methods are not well-suited for fast adaptation to a new category. Few-shot image generation, aiming to generate images from only a few images for a new category, has attracted some research interest. In this paper, we propose a Fusing-and-Filling Generative Adversarial Network (F2GAN) to generate realistic and diverse images for a new category with only a few images. In our F2GAN, a fusion generator is designed to fuse the high-level features of conditional images with random interpolation coefficients, and then fill in attended low-level details with a non-local attention module to produce a new image. Moreover, our discriminator can ensure the diversity of generated images by a mode seeking loss and an interpolation regression loss. Extensive experiments on five datasets demonstrate the effectiveness of our proposed method for few-shot image generation.
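The fusing step in this abstract (combining high-level features of the conditional images with random interpolation coefficients) can be illustrated with a small sketch. It assumes the coefficients are positive and normalized to sum to one, so the fused feature is a convex combination; the abstract does not specify the sampling scheme, and the helper names and feature values are hypothetical.

```python
import random
from typing import List


def random_interpolation_coefficients(k: int, rng: random.Random) -> List[float]:
    """Draw k positive coefficients that sum to one (assumed normalization)."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and is never zero.
    raw = [1.0 - rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def fuse_features(features: List[List[float]], coefficients: List[float]) -> List[float]:
    """Weighted sum of the feature vectors, one weight per conditional image."""
    dim = len(features[0])
    return [sum(c * f[d] for c, f in zip(coefficients, features)) for d in range(dim)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical high-level features of three conditional images.
features = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0]]

coefficients = random_interpolation_coefficients(len(features), rng)
fused = fuse_features(features, coefficients)
print(coefficients, fused)
```

Because the weights form a convex combination in this sketch, each fused value stays between the smallest and largest corresponding input values; drawing new coefficients yields a different fused feature from the same conditional images.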

preprint, 2020, arXiv

Image Harmonization Dataset iHarmony4: HCOCO, HAdobe5k, HFlickr, and Hday2night

Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of the composite image. Image harmonization, which aims to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of a high-quality public dataset for image harmonization significantly hinders the development of image harmonization techniques. Therefore, we contribute an image harmonization dataset iHarmony4 by generating synthesized composite images based on existing COCO (resp., Adobe5k, day2night) dataset, leading to our HCOCO (resp., HAdobe5k, Hday2night) sub-dataset. To enrich the diversity of our dataset, we also generate synthesized composite images based on our collected Flickr images, leading to our HFlickr sub-dataset. The image harmonization dataset iHarmony4 is released at https://github.com/bcmi/Image_Harmonization_Datasets.

preprint, 2020, arXiv

MatchingGAN: Matching-based Few-shot Image Generation

To generate new images for a given category, most deep generative models require abundant training images from this category, which are often too expensive to acquire. To achieve the goal of generation based on only a few images, we propose a matching-based Generative Adversarial Network (GAN) for few-shot generation, which includes a matching generator and a matching discriminator. The matching generator can match random vectors with a few conditional images from the same category and generate new images for this category based on the fused features. The matching discriminator extends the conventional GAN discriminator by matching the feature of the generated image with the fused feature of the conditional images. Extensive experiments on three datasets demonstrate the effectiveness of our proposed method.