Source author record

Liqing Zhang

Liqing Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computation and Language Artificial Intelligence eess.IV Numerical Analysis Social and Information Networks Computational Engineering, Finance, and Science Data Structures and Algorithms Human-Computer Interaction Information Theory math.IT Multimedia q-fin.GN

Catalog footprint

What is connected

33works

14topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport

Recent diffusion- and flow-based VTON methods achieve strong results with pretrained generative models, but their reliance on multi-step sampling incurs high inference cost, while existing acceleration methods largely overlook the intrinsic structure of the try-on task. In this paper, we highlight a key observation: VTON outputs are highly constrained by the conditional inputs, suggesting that the conditional sampling trajectory can be much straighter than that in general image generation, making one-step generation a natural solution. However, limited task-specific data makes training from scratch impractical, forcing existing methods to fine-tune pretrained models whose objectives do not encourage such straight conditional trajectories. Thus, the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself. Motivated by this insight, we encourage straighter VTON sampling trajectories through three targeted modifications: pure conditional transport, a garment preservation loss, and a self consistency loss. We further introduce a one-step distillation stage. Extensive experiments show that our method achieves state-of-the-art performance with one-step sampling, establishing a new standard for efficient and high-quality VTON.

preprint2026arXiv

Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation

Pedestrian motion, due to its causal nature, is strongly influenced by domain gaps arising from discrepancies between training and testing data distributions. Focusing on 3D human pose estimation, this work presents a controllable human pose generation framework that synthesizes diverse video data by systematically varying poses, backgrounds, and camera viewpoints. This generative augmentation enriches training datasets, enhances model generalization, and alleviates the limitations of existing methods in handling domain discrepancies. By leveraging both indoor/real-world and outdoor/virtual datasets, we perform cross-domain data fusion and controllable video generation to construct enriched training data, tailored to realistic deployment settings. Extensive experiments show that the augmented datasets significantly improve model performance on unseen scenarios and datasets, validating the effectiveness of the proposed approach.

preprint2026arXiv

High-Quality 3D Head Reconstruction from Any Single Portrait Image

In this work, we introduce a novel high-fidelity 3D head reconstruction method from a single portrait image, regardless of perspective, expression, or accessories. Despite significant efforts in adapting 2D generative models for novel view synthesis and 3D optimization, most methods struggle to produce high-quality 3D portraits. The lack of crucial information, such as identity, expression, hair, and accessories, limits these approaches in generating realistic 3D head models. To address these challenges, we construct a new high-quality dataset containing 227 sequences of digital human portraits captured from 96 different perspectives, totalling 21,792 frames, featuring diverse expressions and accessories. To further improve performance, we integrate identity and expression information into the multi-view diffusion process to enhance facial consistency across views. Specifically, we apply identity- and expression-aware guidance and supervision to extract accurate facial representations, which guide the model and enforce objective functions to ensure high identity and expression consistency during generation. Finally, we generate an orbital video around the portrait consisting of 96 multi-view frames, which can be used for 3D portrait model reconstruction. Our method demonstrates robust performance across challenging scenarios, including side-face angles and complex accessories

preprint2026arXiv

PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning

Cell-type-specific marker genes are fundamental to plant biology, yet existing resources primarily rely on curated databases or high-throughput studies without explicitly modeling the supporting evidence found in scientific literature. We introduce PlantMarkerBench, a multi-species benchmark for evaluating literature-grounded plant marker evidence interpretation from full-text biological papers. PlantMarkerBench is constructed using a modular curation pipeline integrating large-scale literature retrieval, hybrid search, species-aware biological grounding, structured evidence extraction, and targeted human review. The benchmark spans four plant species -- Arabidopsis, maize, rice, and tomato -- and contains 5,550 sentence-level evidence instances annotated for marker-evidence validity, evidence type, and support strength. We define two benchmark tasks: determining whether a candidate sentence provides valid marker evidence for a gene-cell-type pair, and classifying the evidence into expression, localization, function, indirect, or negative categories. We benchmark diverse open-weight and closed-source language models across species and prompting strategies. Although frontier models achieve relatively strong performance on direct expression evidence, performance drops substantially on functional, indirect, and weak-support evidence, with evidence-type confusion emerging as a dominant failure mode. Open-weight models additionally exhibit elevated false-positive rates under ambiguous biological contexts. PlantMarkerBench provides a challenging and reproducible evaluation framework for literature-grounded biological evidence attribution and supports future research on trustworthy scientific information extraction and AI-assisted plant biology.

preprint2022arXiv

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Learning to generate new images for a novel category based on only a few images, named as few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have yielded impressive results, but the diversity is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures intra-category transformation, i.e., delta, between same-category pairs. The generation subnetwork generates sample-specific delta for an input image, which is combined with this input image to generate a new image within the same category. Besides, an adversarial delta matching loss is designed to link the above two subnetworks together. Extensive experiments on six benchmark datasets demonstrate the effectiveness of our proposed method. Our code is available at https://github.com/bcmi/DeltaGAN-Few-Shot-Image-Generation.

preprint2022arXiv

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Learning to generate new images for a novel category based on only a few images, named as few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have yielded impressive results, but the diversity is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures intra-category transformation, i.e., "delta", between same-category pairs. The generation subnetwork generates sample-specific "delta" for an input image, which is combined with this input image to generate a new image within the same category. Besides, an adversarial delta matching loss is designed to link the above two subnetworks together. Extensive experiments on five few-shot image datasets demonstrate the effectiveness of our proposed method.

preprint2022arXiv

Few-shot Image Generation Using Discrete Content Representation

Few-shot image generation and few-shot image translation are two related tasks, both of which aim to generate new images for an unseen category with only a few images. In this work, we make the first attempt to adapt few-shot image translation method to few-shot image generation task. Few-shot image translation disentangles an image into style vector and content map. An unseen style vector can be combined with different seen content maps to produce different images. However, it needs to store seen images to provide content maps and the unseen style vector may be incompatible with seen content maps. To adapt it to few-shot image generation task, we learn a compact dictionary of local content vectors via quantizing continuous content maps into discrete content maps instead of storing seen images. Furthermore, we model the autoregressive distribution of discrete content map conditioned on style vector, which can alleviate the incompatibility between content map and style vector. Qualitative and quantitative results on three real datasets demonstrate that our model can produce images of higher diversity and fidelity for unseen categories than previous methods.

preprint2022arXiv

From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation

Zero-shot learning has been actively studied for image classification task to relieve the burden of annotating image labels. Interestingly, semantic segmentation task requires more labor-intensive pixel-wise annotation, but zero-shot semantic segmentation has only attracted limited research interest. Thus, we focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations provided for unseen categories. In this paper, we propose a novel Context-aware feature Generation Network (CaGNet), which can synthesize context-aware pixel-wise visual features for unseen categories based on category-level semantic representations and pixel-wise contextual information. The synthesized features are used to finetune the classifier to enable segmenting unseen objects. Furthermore, we extend pixel-wise feature generation and finetuning to patch-wise feature generation and finetuning, which additionally considers inter-pixel relationship. Experimental results on Pascal-VOC, Pascal-Context, and COCO-stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods. Code is available at https://github.com/bcmi/CaGNetv2-Zero-Shot-Semantic-Segmentation.

preprint2022arXiv

From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Video understanding has achieved great success in representation learning, such as video caption, video object grounding, and video descriptive question-answer. However, current methods still struggle on video reasoning, including evidence reasoning and commonsense reasoning. To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution by answering the question and providing a proper reason. Through extensive experiments on existing VideoQA methods, we find that the state-of-the-art methods are strong in descriptions but weak in reasoning. We hope that Causal-VidQA can guide the research of video understanding from representation learning to deeper reasoning. The dataset and related resources are available at \url{https://github.com/bcmi/Causal-VidQA.git}.

preprint2022arXiv

High-Resolution Image Harmonization via Collaborative Dual Transformations

Given a composite image, image harmonization aims to adjust the foreground to make it compatible with the background. High-resolution image harmonization is in high demand, but still remains unexplored. Conventional image harmonization methods learn global RGB-to-RGB transformation which could effortlessly scale to high resolution, but ignore diverse local context. Recent deep learning methods learn the dense pixel-to-pixel transformation which could generate harmonious outputs, but are highly constrained in low resolution. In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network. Our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module to take advantage of both. Extensive experiments on high-resolution benchmark dataset and our created high-resolution real composite images demonstrate that our CDTNet strikes a good balance between efficiency and effectiveness. Our used datasets can be found in https://github.com/bcmi/CDTNet-High-Resolution-Image-Harmonization.

preprint2022arXiv

Human-centric Image Cropping with Partition-aware and Content-preserving Features

Image cropping aims to find visually appealing crops in an image, which is an important yet challenging task. In this paper, we consider a specific and practical application: human-centric image cropping, which focuses on the depiction of a person. To this end, we propose a human-centric image cropping method with two novel feature designs for the candidate crop: partition-aware feature and content-preserving feature. For partition-aware feature, we divide the whole image into nine partitions based on the human bounding box and treat different partitions in a candidate crop differently conditioned on the human information. For content-preserving feature, we predict a heatmap indicating the important content to be included in a good crop, and extract the geometric relation between the heatmap and a candidate crop. Extensive experiments demonstrate that our method can perform favorably against state-of-the-art image cropping methods on human-centric image cropping task. Code is available at https://github.com/bcmi/Human-Centric-Image-Cropping.

preprint2022arXiv

OPA: Object Placement Assessment Dataset

Image composition aims to generate realistic composite image by inserting an object from one image into another background image, where the placement (e.g., location, size, occlusion) of inserted object may be unreasonable, which would significantly degrade the quality of the composite image. Although some works attempted to learn object placement to create realistic composite images, they did not focus on assessing the plausibility of object placement. In this paper, we focus on object placement assessment task, which verifies whether a composite image is plausible in terms of the object placement. To accomplish this task, we construct the first Object Placement Assessment (OPA) dataset consisting of composite images and their rationality labels. We also propose a simple yet effective baseline for this task. Dataset is available at https://github.com/bcmi/Object-Placement-Assessment-Dataset-OPA.

preprint2022arXiv

Shadow Generation for Composite Image in Real-world Scenes

Image composition targets at inserting a foreground object into a background image. Most previous image composition methods focus on adjusting the foreground to make it compatible with background while ignoring the shadow effect of foreground on the background. In this work, we focus on generating plausible shadow for the foreground object in the composite image. First, we contribute a real-world shadow generation dataset DESOBA by generating synthetic composite images based on paired real images and deshadowed images. Then, we propose a novel shadow generation network SGRNet, which consists of a shadow mask prediction stage and a shadow filling stage. In the shadow mask prediction stage, foreground and background information are thoroughly interacted to generate foreground shadow mask. In the shadow filling stage, shadow parameters are predicted to fill the shadow area. Extensive experiments on our DESOBA dataset and real composite images demonstrate the effectiveness of our proposed method. Our dataset and code are available at https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBA.

preprint2022arXiv

Spatial Transformation for Image Composition via Correspondence Learning

When using cut-and-paste to acquire a composite image, the geometry inconsistency between foreground and background may severely harm its fidelity. To address the geometry inconsistency in composite images, several existing works learned to warp the foreground object for geometric correction. However, the absence of annotated dataset results in unsatisfactory performance and unreliable evaluation. In this work, we contribute a Spatial TRAnsformation for virtual Try-on (STRAT) dataset covering three typical application scenarios. Moreover, previous works simply concatenate foreground and background as input without considering their mutual correspondence. Instead, we propose a novel correspondence learning network (CorrelNet) to model the correspondence between foreground and background using cross-attention maps, based on which we can predict the target coordinate that each source coordinate of foreground should be mapped to on the background. Then, the warping parameters of foreground object can be derived from pairs of source and target coordinates. Additionally, we learn a filtering mask to eliminate noisy pairs of coordinates to estimate more accurate warping parameters. Extensive experiments on our STRAT dataset demonstrate that our proposed CorrelNet performs more favorably against previous methods.

preprint2022arXiv

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation

Video Instance Segmentation (VIS) is a task that simultaneously requires classification, segmentation, and instance association in a video. Recent VIS approaches rely on sophisticated pipelines to achieve this goal, including RoI-related operations or 3D convolutions. In contrast, we present a simple and efficient single-stage VIS framework based on the instance segmentation method CondInst by adding an extra tracking head. To improve instance association accuracy, a novel bi-directional spatio-temporal contrastive learning strategy for tracking embedding across frames is proposed. Moreover, an instance-wise temporal consistency scheme is utilized to produce temporally coherent results. Experiments conducted on the YouTube-VIS-2019, YouTube-VIS-2021, and OVIS-2021 datasets validate the effectiveness and efficiency of the proposed method. We hope the proposed framework can serve as a simple and strong alternative for many other instance-level video association tasks.

preprint2022arXiv

XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

Recently, various multimodal networks for Visually-Rich Document Understanding(VRDU) have been proposed, showing the promotion of transformers by integrating visual and layout information with the text embeddings. However, most existing approaches utilize the position embeddings to incorporate the sequence information, neglecting the noisy improper reading order obtained by OCR tools. In this paper, we propose a robust layout-aware multimodal network named XYLayoutLM to capture and leverage rich layout information from proper reading orders produced by our Augmented XY Cut. Moreover, a Dilated Conditional Position Encoding module is proposed to deal with the input sequence of variable lengths, and it additionally extracts local layout information from both textual and visual modalities while generating position embeddings. Experiment results show that our XYLayoutLM achieves competitive results on document understanding tasks.

preprint2020arXiv

Beyond without Forgetting: Multi-Task Learning for Classification with Disjoint Datasets

Multi-task Learning (MTL) for classification with disjoint datasets aims to explore MTL when one task only has one labeled dataset. In existing methods, for each task, the unlabeled datasets are not fully exploited to facilitate this task. Inspired by semi-supervised learning, we use unlabeled datasets with pseudo labels to facilitate each task. However, there are two major issues: 1) the pseudo labels are very noisy; 2) the unlabeled datasets and the labeled dataset for each task has considerable data distribution mismatch. To address these issues, we propose our MTL with Selective Augmentation (MTL-SA) method to select the training samples in unlabeled datasets with confident pseudo labels and close data distribution to the labeled dataset. Then, we use the selected training samples to add information and use the remaining training samples to preserve information. Extensive experiments on face-centric and human-centric applications demonstrate the effectiveness of our MTL-SA method.

preprint2020arXiv

Context-aware Feature Generation for Zero-shot Semantic Segmentation

Existing semantic segmentation models heavily rely on dense pixel-wise annotations. To reduce the annotation pressure, we focus on a challenging task named zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. This task can be accomplished by transferring knowledge across categories via semantic word embeddings. In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet. In particular, with the observation that a pixel-wise feature highly depends on its contextual information, we insert a contextual module in a segmentation network to capture the pixel-wise contextual information, which guides the process of generating more diverse and context-aware features from semantic word embeddings. Our method achieves state-of-the-art results on three benchmark datasets for zero-shot segmentation. Codes are available at: https://github.com/bcmi/CaGNet-Zero-Shot-Semantic-Segmentation.

preprint2020arXiv

F2GAN: Fusing-and-Filling GAN for Few-shot Image Generation

In order to generate images for a given category, existing deep generative models generally rely on abundant training images. However, extensive data acquisition is expensive and fast learning ability from limited data is necessarily required in real-world applications. Also, these existing methods are not well-suited for fast adaptation to a new category. Few-shot image generation, aiming to generate images from only a few images for a new category, has attracted some research interest. In this paper, we propose a Fusing-and-Filling Generative Adversarial Network (F2GAN) to generate realistic and diverse images for a new category with only a few images. In our F2GAN, a fusion generator is designed to fuse the high-level features of conditional images with random interpolation coefficients, and then fills in attended low-level details with non-local attention module to produce a new image. Moreover, our discriminator can ensure the diversity of generated images by a mode seeking loss and an interpolation regression loss. Extensive experiments on five datasets demonstrate the effectiveness of our proposed method for few-shot image generation.

preprint2020arXiv

Hard Pixel Mining for Depth Privileged Semantic Segmentation

Semantic segmentation has achieved remarkable progress but remains challenging due to the complex scene, object occlusion, and so on. Some research works have attempted to use extra information such as a depth map to help RGB based semantic segmentation because the depth map could provide complementary geometric cues. However, due to the inaccessibility of depth sensors, depth information is usually unavailable for the test images. In this paper, we leverage only the depth of training images as the privileged information to mine the hard pixels in semantic segmentation, in which depth information is only available for training images but not available for test images. Specifically, we propose a novel Loss Weight Module, which outputs a loss weight map by employing two depth-related measurements of hard pixels: Depth Prediction Error and Depthaware Segmentation Error. The loss weight map is then applied to segmentation loss, with the goal of learning a more robust model by paying more attention to the hard pixels. Besides, we also explore a curriculum learning strategy based on the loss weight map. Meanwhile, to fully mine the hard pixels on different scales, we apply our loss weight module to multi-scale side outputs. Our hard pixels mining method achieves the state-of-the-art results on two benchmark datasets, and even outperforms the methods which need depth input during testing.

preprint2020arXiv

Image Harmonization Dataset iHarmony4: HCOCO, HAdobe5k, HFlickr, and Hday2night

Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of composite image. Image harmonization, which aims to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of high-quality public dataset for image harmonization, which significantly hinders the development of image harmonization techniques. Therefore, we contribute an image harmonization dataset iHarmony4 by generating synthesized composite images based on existing COCO (resp., Adobe5k, day2night) dataset, leading to our HCOCO (resp., HAdobe5k, Hday2night) sub-dataset. To enrich the diversity of our dataset, we also generate synthesized composite images based on our collected Flick images, leading to our HFlickr sub-dataset. The image harmonization dataset iHarmony4 is released at https://github.com/bcmi/Image_Harmonization_Datasets.

preprint2020arXiv

Learning from Web Data with Self-Organizing Memory Module

Learning from web data has attracted lots of research interest in recent years. However, crawled web images usually have two types of noises, label noise and background noise, which induce extra difficulties in utilizing them effectively. Most existing methods either rely on human supervision or ignore the background noise. In this paper, we propose a novel method, which is capable of handling these two types of noises together, without the supervision of clean images in the training stage. Particularly, we formulate our method under the framework of multi-instance learning by grouping ROIs (i.e., images and their region proposals) from the same category into bags. ROIs in each bag are assigned with different weights based on the representative/discriminative scores of their nearest clusters, in which the clusters and their scores are obtained via our designed memory module. Our memory module could be naturally integrated with the classification module, leading to an end-to-end trainable system. Extensive experiments on four benchmark datasets demonstrate the effectiveness of our method.

preprint2020arXiv

MatchingGAN: Matching-based Few-shot Image Generation

To generate new images for a given category, most deep generative models require abundant training images from this category, which are often too expensive to acquire. To achieve the goal of generation based on only a few images, we propose matching-based Generative Adversarial Network (GAN) for few-shot generation, which includes a matching generator and a matching discriminator. Matching generator can match random vectors with a few conditional images from the same category and generate new images for this category based on the fused features. The matching discriminator extends conventional GAN discriminator by matching the feature of generated image with the fused feature of conditional images. Extensive experiments on three datasets demonstrate the effectiveness of our proposed method.

preprint2020arXiv

Prediction defaults for networked-guarantee loans

Networked-guarantee loans may cause the systemic risk related concern of the government and banks in China. The prediction of default of enterprise loans is a typical extremely imbalanced prediction problem, and the networked-guarantee make this problem more difficult to solve. Since the guaranteed loan is a debt obligation promise, if one enterprise in the guarantee network falls into a financial crisis, the debt risk may spread like a virus across the guarantee network, even lead to a systemic financial crisis. In this paper, we propose an imbalanced network risk diffusion model to forecast the enterprise default risk in a short future. Positive weighted k-nearest neighbors (p-wkNN) algorithm is developed for the stand-alone case -- when there is no default contagious; then a data-driven default diffusion model is integrated to further improve the prediction accuracy. We perform the empirical study on a real-world three-years loan record from a major commercial bank. The results show that our proposed method outperforms conventional credit risk methods in terms of AUC. In summary, our quantitative risk evaluation model shows promising prediction performance on real-world data, which could be useful to both regulators and stakeholders.

preprint2020arXiv

Visual analytics for networked-guarantee loans risk management

Groups of enterprises guarantee each other and form complex guarantee networks when they try to obtain loans from banks. Such secured loan can enhance the solvency and promote the rapid growth in the economic upturn period. However, potential systemic risk may happen within the risk binding community. Especially, during the economic down period, the crisis may spread in the guarantee network like a domino. Monitoring the financial status, preventing or reducing systematic risk when crisis happens is highly concerned by the regulatory commission and banks. We propose visual analytics approach for loan guarantee network risk management, and consolidate the five analysis tasks with financial experts: i) visual analytics for enterprises default risk, whereby a hybrid representation is devised to predict the default risk and developed an interface to visualize key indicators; ii) visual analytics for high default groups, whereby a community detection based interactive approach is presented; iii) visual analytics for high defaults pattern, whereby a motif detection based interactive approach is described, and we adopt a Shneiderman Mantra strategy to reduce the computation complexity. iv) visual analytics for evolving guarantee network, whereby animation is used to help understanding the guarantee dynamic; v) visual analytics approach and interface for default diffusion path. The temporal diffusion path analysis can be useful for the government and bank to monitor the default spread status. It also provides insight for taking precautionary measures to prevent and dissolve systemic financial risk. We implement the system with case studies on a real-world guarantee network. Two financial experts are consulted with endorsement on the developed tool. To the best of our knowledge, this is the first visual analytics tool to explore the guarantee network risks in a systematic manner.

preprint2020arXiv

Zero-Shot Sketch-Based Image Retrieval with Structure-aware Asymmetric Disentanglement

The goal of Sketch-Based Image Retrieval (SBIR) is using free-hand sketches to retrieve images of the same category from a natural image gallery. However, SBIR requires all test categories to be seen during training, which cannot be guaranteed in real-world applications. So we investigate more challenging Zero-Shot SBIR (ZS-SBIR), in which test categories do not appear in the training stage. After realizing that sketches mainly contain structure information while images contain additional appearance information, we attempt to achieve structure-aware retrieval via asymmetric disentanglement.For this purpose, we propose our STRucture-aware Asymmetric Disentanglement (STRAD) method, in which image features are disentangled into structure features and appearance features while sketch features are only projected to structure space. Through disentangling structure and appearance space, bi-directional domain translation is performed between the sketch domain and the image domain. Extensive experiments demonstrate that our STRAD method remarkably outperforms state-of-the-art methods on three large-scale benchmark datasets.

preprint2016arXiv

Tensor Ring Decomposition

Tensor networks have in recent years emerged as the powerful tools for solving the large-scale optimization problems. One of the most popular tensor network is tensor train (TT) decomposition that acts as the building blocks for the complicated tensor networks. However, the TT decomposition highly depends on permutations of tensor dimensions, due to its strictly sequential multilinear products over latent cores, which leads to difficulties in finding the optimal TT representation. In this paper, we introduce a fundamental tensor decomposition model to represent a large dimensional tensor by a circular multilinear products over a sequence of low dimensional cores, which can be graphically interpreted as a cyclic interconnection of 3rd-order tensors, and thus termed as tensor ring (TR) decomposition. The key advantage of TR model is the circular dimensional permutation invariance which is gained by employing the trace operation and treating the latent cores equivalently. TR model can be viewed as a linear combination of TT decompositions, thus obtaining the powerful and generalized representation abilities. For optimization of latent cores, we present four different algorithms based on the sequential SVDs, ALS scheme, and block-wise ALS techniques. Furthermore, the mathematical properties of TR model are investigated, which shows that the basic multilinear algebra can be performed efficiently by using TR representaions and the classical tensor decompositions can be conveniently transformed into the TR representation. Finally, the experiments on both synthetic signals and real-world datasets were conducted to evaluate the performance of different algorithms.

preprint2015arXiv

Bayesian Robust Tensor Factorization for Incomplete Multiway Data

We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing the robust predictive distribution over missing entries. The low-CP-rank tensor is modeled by multilinear interactions between multiple latent factors on which the column sparsity is enforced by a hierarchical prior, while the sparse tensor is modeled by a hierarchical view of Student-$t$ distribution that associates an individual hyperparameter with each element independently. For model learning, we develop an efficient closed-form variational inference under a fully Bayesian treatment, which can effectively prevent the overfitting problem and scales linearly with data size. In contrast to existing related works, our method can perform model selection automatically and implicitly without need of tuning parameters. More specifically, it can discover the groundtruth of CP rank and automatically adapt the sparsity inducing priors to various types of outliers. In addition, the tradeoff between the low-rank approximation and the sparse representation can be optimized in the sense of maximum model evidence. The extensive experiments and comparisons with many state-of-the-art algorithms on both synthetic and real-world datasets demonstrate the superiorities of our method from several perspectives.

preprint2015arXiv

Bayesian Sparse Tucker Models for Dimension Reduction and Tensor Completion

Tucker decomposition is the cornerstone of modern machine learning on tensorial data analysis, which have attracted considerable attention for multiway feature extraction, compressive sensing, and tensor completion. The most challenging problem is related to determination of model complexity (i.e., multilinear rank), especially when noise and missing data are present. In addition, existing methods cannot take into account uncertainty information of latent factors, resulting in low generalization performance. To address these issues, we present a class of probabilistic generative Tucker models for tensor decomposition and completion with structural sparsity over multilinear latent space. To exploit structural sparse modeling, we introduce two group sparsity inducing priors by hierarchial representation of Laplace and Student-t distributions, which facilitates fully posterior inference. For model learning, we derived variational Bayesian inferences over all model (hyper)parameters, and developed efficient and scalable algorithms based on multilinear operations. Our methods can automatically adapt model complexity and infer an optimal multilinear rank by the principle of maximum lower bound of model evidence. Experimental results and comparisons on synthetic, chemometrics and neuroimaging data demonstrate remarkable performance of our models for recovering ground-truth of multilinear rank and missing entries.

preprint2014arXiv

Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination

CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. The existing CP algorithms require the tensor rank to be manually specified, however, the determination of tensor rank remains a challenging problem especially for CP rank. In addition, existing approaches do not take into account uncertainty information of latent factors, as well as missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and the appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm, which scales linearly with data size. Our method is characterized as a tuning parameter-free approach, which can effectively infer underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth of CP rank and prevent the overfitting problem, even when a large amount of entries are missing. Moreover, the results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.

preprint2014arXiv

Uplink Contention Based SCMA for 5G Radio Access

Fifth generation (5G) wireless networks are expected to support very diverse applications and terminals. Massive connectivity with a large number of devices is an important requirement for 5G networks. Current LTE system is not able to efficiently support massive connectivity, especially on the uplink (UL). Among the issues arise due to massive connectivity is the cost of signaling overhead and latency. In this paper, an uplink contention-based sparse code multiple access (SCMA) design is proposed as a solution. First, the system design aspects of the proposed multiple-access scheme are described. The SCMA parameters can be adjusted to provide different levels of overloading, thus suitable to meet the diverse traffic connectivity requirements. In addition, the system-level evaluations of a small packet application scenario are provided for contention-based UL SCMA. SCMA is compared to OFDMA in terms of connectivity and drop rate under a tight latency requirement. The simulation results demonstrate that contention-based SCMA can provide around 2.8 times gain over contention-based OFDMA in terms of supported active users. The uplink contention-based SCMA scheme can be a promising technology for 5G wireless networks for data transmission with low signaling overhead, low delay, and support of massive connectivity.

preprint2013arXiv

Spatial-Spectral Boosting Analysis for Stroke Patients' Motor Imagery EEG in Rehabilitation Training

Current studies about motor imagery based rehabilitation training systems for stroke subjects lack an appropriate analytic method, which can achieve a considerable classification accuracy, at the same time detects gradual changes of imagery patterns during rehabilitation process and disinters potential mechanisms about motor function recovery. In this study, we propose an adaptive boosting algorithm based on the cortex plasticity and spectral band shifts. This approach models the usually predetermined spatial-spectral configurations in EEG study into variable preconditions, and introduces a new heuristic of stochastic gradient boost for training base learners under these preconditions. We compare our proposed algorithm with commonly used methods on datasets collected from 2 months' clinical experiments. The simulation results demonstrate the effectiveness of the method in detecting the variations of stroke patients' EEG patterns. By chronologically reorganizing the weight parameters of the learned additive model, we verify the spatial compensatory mechanism on impaired cortex and detect the changes of accentuation bands in spectral domain, which may contribute important prior knowledge for rehabilitation practice.

preprint2012arXiv

Higher-Order Partial Least Squares (HOPLS): A Generalized Multi-Linear Regression Method

A new generalized multilinear regression model, termed the Higher-Order Partial Least Squares (HOPLS), is introduced with the aim to predict a tensor (multiway array) $\tensor{Y}$ from a tensor $\tensor{X}$ through projecting the data onto the latent space and performing regression on the corresponding latent variables. HOPLS differs substantially from other regression models in that it explains the data by a sum of orthogonal Tucker tensors, while the number of orthogonal loadings serves as a parameter to control model complexity and prevent overfitting. The low dimensional latent space is optimized sequentially via a deflation operation, yielding the best joint subspace approximation for both $\tensor{X}$ and $\tensor{Y}$. Instead of decomposing $\tensor{X}$ and $\tensor{Y}$ individually, higher order singular value decomposition on a newly defined generalized cross-covariance tensor is employed to optimize the orthogonal loadings. A systematic comparison on both synthetic data and real-world decoding of 3D movement trajectories from electrocorticogram (ECoG) signals demonstrate the advantages of HOPLS over the existing methods in terms of better predictive ability, suitability to handle small sample sizes, and robustness to noise.

Liqing Zhang

What is connected

Connect this record

See the researcher in context

Building this map preview

33 published item(s)

DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport

Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation

High-Quality 3D Head Reconstruction from Any Single Portrait Image

PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Few-shot Image Generation Using Discrete Content Representation

From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation

From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

High-Resolution Image Harmonization via Collaborative Dual Transformations

Human-centric Image Cropping with Partition-aware and Content-preserving Features

OPA: Object Placement Assessment Dataset

Shadow Generation for Composite Image in Real-world Scenes

Spatial Transformation for Image Composition via Correspondence Learning

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation

XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

Beyond without Forgetting: Multi-Task Learning for Classification with Disjoint Datasets

Context-aware Feature Generation for Zero-shot Semantic Segmentation

F2GAN: Fusing-and-Filling GAN for Few-shot Image Generation

Hard Pixel Mining for Depth Privileged Semantic Segmentation

Image Harmonization Dataset iHarmony4: HCOCO, HAdobe5k, HFlickr, and Hday2night

Learning from Web Data with Self-Organizing Memory Module

MatchingGAN: Matching-based Few-shot Image Generation

Prediction defaults for networked-guarantee loans

Visual analytics for networked-guarantee loans risk management

Zero-Shot Sketch-Based Image Retrieval with Structure-aware Asymmetric Disentanglement

Tensor Ring Decomposition

Bayesian Robust Tensor Factorization for Incomplete Multiway Data

Bayesian Sparse Tucker Models for Dimension Reduction and Tensor Completion

Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination

Uplink Contention Based SCMA for 5G Radio Access

Spatial-Spectral Boosting Analysis for Stroke Patients' Motor Imagery EEG in Rehabilitation Training

Higher-Order Partial Least Squares (HOPLS): A Generalized Multi-Linear Regression Method