Researcher profile

Ping Liu

Ping Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2026arXiv

Closed-Form Linear-Probe Dataset Distillation for Pre-trained Vision Models

Dataset distillation compresses a large training set into a small synthetic set that preserves downstream training utility. While most existing methods target training networks from scratch, modern visual transfer learning often uses frozen pre-trained encoders followed by lightweight linear probing. Existing distillation methods for this setting either unroll iterative linear-probe updates with trajectory-based gradient matching, or rely on closed-form formulations originally designed for from-scratch training with neural-tangent-kernel (NTK) approximations. Neither route exploits the fact that frozen-feature linear probing admits a closed-form solution determined directly by the pre-trained features themselves, with no infinite-width approximation and no inner-loop trajectory. We propose Closed-Form Linear-Probe Dataset Distillation (CLP-DD), a bilevel formulation that computes the linear probe induced by the synthetic set with a sample-space kernel ridge solver. The synthetic images are then updated by evaluating this induced classifier on real features through a temperature-scaled softmax cross-entropy, where the classifier columns act as learned class anchors in feature space. We further show that the choice of outer objective is decisive: pairing the closed-form inner solver with a standard MSE outer loss substantially underperforms trajectory-based methods, while the discriminative outer loss closes most of the gap. On ImageNet-100 with four pre-trained backbones, CLP-DD substantially improves over LGM without DSA and approaches LGM with DSA at a fraction of the computational cost. On ImageNet-1K, CLP-DD matches or surpasses LGM with DSA on three of four backbones while running roughly $14\times$ faster and using less than one-eighth of the GPU memory.

preprint2024arXiv

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance

Existing text-to-image editing methods tend to excel either in rigid or non-rigid editing but encounter challenges when combining both, resulting in misaligned outputs with the provided text prompts. In addition, integrating reference images for control remains challenging. To address these issues, we present a versatile image editing framework capable of executing both rigid and non-rigid edits, guided by either textual prompts or reference images. We leverage a dual-path injection scheme to handle diverse editing scenarios and introduce an integrated self-attention mechanism for fusion of appearance and structural information. To mitigate potential visual artifacts, we further employ latent fusion techniques to adjust intermediate latents. Compared to previous work, our approach represents a significant advance in achieving precise and versatile image editing. Comprehensive experiments validate the efficacy of our method, showcasing competitive or superior results in text-based editing and appearance transfer tasks, encompassing both rigid and non-rigid settings.

preprint2022arXiv

A measurement decoupling based fast algorithm for super-resolving point sources with multi-cluster structure

We consider the problem of resolving closely spaced point sources in one dimension from their Fourier data in a bounded domain. Classical subspace methods (e.g., MUSIC algorithm, Matrix Pencil method, etc.) show great superiority in resolving closely spaced sources, but their computational cost is usually heavy. This is especially the case for point sources with a multi-cluster structure which requires processing large-sized data matrix resulted from highly sampled measurements. To address this issue, we propose a fast algorithm termed D-MUSIC, based on a measurement decoupling strategy. We demonstrate theoretically that for point sources with a known cluster structure, their measurement can be decoupled into local measurements of each of the clusters by solving a system of linear equations that are obtained by using a multipole basis. We further develop a subsampled MUSIC algorithm to detect the cluster structure and utilize it to decouple the global measurement. In the end, the MUSIC algorithm was applied to each local measurement to resolve point sources therein. Compared to the standard MUSIC algorithm, the proposed algorithm has comparable super-resolving capability while having a much lower computational complexity.

preprint2022arXiv

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

The expensive annotation cost is notoriously known as the main constraint for the development of the point cloud semantic segmentation technique. Active learning methods endeavor to reduce such cost by selecting and labeling only a subset of the point clouds, yet previous attempts ignore the spatial-structural diversity of the selected samples, inducing the model to select clustered candidates with similar shapes in a local area while missing other representative ones in the global environment. In this paper, we propose a new 3D region-based active learning method to tackle this problem. Dubbed SSDR-AL, our method groups the original point clouds into superpoints and incrementally selects the most informative and representative ones for label acquisition. We achieve the selection mechanism via a graph reasoning network that considers both the spatial and structural diversities of superpoints. To deploy SSDR-AL in a more practical scenario, we design a noise-aware iterative labeling strategy to confront the "noisy annotation" problem introduced by the previous "dominant labeling" strategy in superpoints. Extensive experiments on two point cloud benchmarks demonstrate the effectiveness of SSDR-AL in the semantic segmentation task. Particularly, SSDR-AL significantly outperforms the baseline method and reduces the annotation cost by up to 63.0% and 24.0% when achieving 90% performance of fully supervised learning, respectively.

preprint2022arXiv

Dynamic super-resolution in particle tracking problems

Particle tracking in biological imaging is concerned with reconstructing the trajectories, locations, or velocities of the targeting particles. The standard approach of particle tracking consists of two steps: first reconstructing statically the source locations in each time step, and second applying tracking techniques to obtain the trajectories and velocities. In contrast, the dynamic reconstruction seeks to simultaneously recover the source locations and velocities from all frames, which enjoys certain advantages. In this paper, we provide a rigorous mathematical analysis for the resolution limit of reconstructing source number, locations, and velocities by general dynamical reconstruction in particle tracking problems, by which we demonstrate the possibility of achieving super-resolution for the dynamic reconstruction. We show that when the location-velocity pairs of the particles are separated beyond certain distances (the resolution limits), the number of particles and the location-velocity pair can be stably recovered. The resolution limits are related to the cut-off frequency of the imaging system, signal-to-noise ratio, and the sparsity of the source. By these estimates, we also derive a stability result for a sparsity-promoting dynamic reconstruction. In addition, we further show that the reconstruction of velocities has a better resolution limit which improves constantly as the particles moving. This result is derived by an observation that the inherent cut-off frequency for the velocity recovery can be viewed as the total observation time multiplies the cut-off frequency of the imaging system, which may lead to a better resolution limit as compared to the one for each diffraction-limited frame. It is anticipated that this observation can inspire new reconstruction algorithms that improve the resolution of particle tracking in practice.

preprint2022arXiv

Filter Pruning by Switching to Neighboring CNNs with Good Attributes

Filter pruning is effective to reduce the computational costs of neural networks. Existing methods show that updating the previous pruned filter would enable large model capacity and achieve better performance. However, during the iterative pruning process, even if the network weights are updated to new values, the pruning criterion remains the same. In addition, when evaluating the filter importance, only the magnitude information of the filters is considered. However, in neural networks, filters do not work individually, but they would affect other filters. As a result, the magnitude information of each filter, which merely reflects the information of an individual filter itself, is not enough to judge the filter importance. To solve the above problems, we propose Meta-attribute-based Filter Pruning (MFP). First, to expand the existing magnitude information based pruning criteria, we introduce a new set of criteria to consider the geometric distance of filters. Additionally, to explicitly assess the current state of the network, we adaptively select the most suitable criteria for pruning via a meta-attribute, a property of the neural network at the current state. Experiments on two image classification benchmarks validate our method. For ResNet-50 on ILSVRC-2012, we could reduce more than 50% FLOPs with only 0.44% top-5 accuracy loss.

preprint2022arXiv

Nearly optimal resolution estimate for the two-dimensional super-resolution and a new algorithm for direction of arrival estimation with uniform rectangular array

In this paper, we develop a new technique to obtain nearly optimal estimates of the computational resolution limits introduced in Appl. Comput. Harmon. Anal. 56 (2022) 402-446; IEEE Trans. Inf. Theory 67(7) (2021) 4812-4827; Inverse Probl. 37(10) (2021) 104001 for two-dimensional super-resolution problems. Our main contributions are fivefold: (i) Our work improves the resolution estimate for number detection and location recovery in two-dimensional super-resolution problems to nearly optimal; (ii) As a consequence, we derive a stability result for a sparsity-promoting algorithm in two-dimensional super-resolution problems (or Direction of Arrival problems (DOA)). The stability result exhibits the optimal performance of sparsity promoting in solving such problems; (iii) Our techniques pave the way for improving the estimate for resolution limits in higher-dimensional super-resolutions to nearly optimal; (iv) Inspired by these new techniques, we propose a new coordinate-combination-based model order detection algorithm for two-dimensional DOA estimation and theoretically demonstrate its optimal performance, and (v) we also propose a new coordinate-combination-based MUSIC algorithm for super-resolving sources in two-dimensional DOA estimation. It has excellent performance and enjoys many advantages compared to the conventional DOA algorithms. The coordinate-combination idea seems to be a promising way for multi-dimensional DOA estimation.

preprint2022arXiv

Pro-UIGAN: Progressive Face Hallucination from Occluded Thumbnails

In this paper, we study the task of hallucinating an authentic high-resolution (HR) face from an occluded thumbnail. We propose a multi-stage Progressive Upsampling and Inpainting Generative Adversarial Network, dubbed Pro-UIGAN, which exploits facial geometry priors to replenish and upsample (8*) the occluded and tiny faces (16*16 pixels). Pro-UIGAN iteratively (1) estimates facial geometry priors for low-resolution (LR) faces and (2) acquires non-occluded HR face images under the guidance of the estimated priors. Our multi-stage hallucination network super-resolves and inpaints occluded LR faces in a coarse-to-fine manner, thus reducing unwanted blurriness and artifacts significantly. Specifically, we design a novel cross-modal transformer module for facial priors estimation, in which an input face and its landmark features are formulated as queries and keys, respectively. Such a design encourages joint feature learning across the input facial and landmark features, and deep feature correspondences will be discovered by attention. Thus, facial appearance features and facial geometry priors are learned in a mutual promotion manner. Extensive experiments demonstrate that our Pro-UIGAN achieves visually pleasing HR faces, reaching superior performance in downstream tasks, i.e., face alignment, face parsing, face recognition and expression classification, compared with other state-of-the-art (SotA) methods.

preprint2021arXiv

A mathematical theory of computational resolution limit in one dimension

Given an image generated by the convolution of point sources with a band-limited function, the deconvolution problem is to reconstruct the source number, positions, and amplitudes. This problem arises from many important applications in imaging and signal processing. It is well-known that it is impossible to resolve the sources when they are close enough in practice. Rayleigh investigated this problem and formulated a resolution limit, the so-called Rayleigh limit, for the case of two sources with identical amplitudes. On the other hand, many numerical experiments demonstrate that a stable recovery of the sources is possible even if the sources are separated below the Rayleigh limit. In this paper, a mathematical theory for the deconvolution problem in one dimension is developed. The theory addresses the issue when one can recover the source number exactly from noisy data. The key is a new concept "computational resolution limit" which is defined to be the minimum separation distance between the sources such that exact recovery of the source number is possible. This new resolution limit is determined by the signal-to-noise ratio and the sparsity of sources, in addition to the cutoff frequency of the image. Quantitative bounds for this limit is derived, which reveal the importance of the sparsity as well as the signal-to-noise ratio to the recovery problem. The stability for recovering the source positions is also analyzed under a condition on their separation distances. Moreover, a singular-value-thresholding algorithm is proposed to recover the source number of a cluster of closely spaced point sources and to verify our theoretical results on the computational resolution limit. The results are based on a multipole expansion method and a non-linear approximation theory in Vandermonde space.

preprint2021arXiv

Generative Transition Mechanism to Image-to-Image Translation via Encoded Transformation

In this paper, we revisit the Image-to-Image (I2I) translation problem with transition consistency, namely the consistency defined on the conditional data mapping between each data pairs. Explicitly parameterizing each data mappings with a transition variable $t$, i.e., $x \overset{t(x,y)}{\mapsto}y$, we discover that existing I2I translation models mainly focus on maintaining consistency on results, e.g., image reconstruction or attribute prediction, named result consistency in our paper. This restricts their generalization ability to generate satisfactory results with unseen transitions in the test phase. Consequently, we propose to enforce both result consistency and transition consistency for I2I translation, to benefit the problem with a closer consistency between the input and output. To benefit the generalization ability of the translation model, we propose transition encoding to facilitate explicit regularization of these two {kinds} of consistencies on unseen transitions. We further generalize such explicitly regularized consistencies to distribution-level, thus facilitating a generalized overall consistency for I2I translation problems. With the above design, our proposed model, named Transition Encoding GAN (TEGAN), can poss superb generalization ability to generate realistic and semantically consistent translation results with unseen transitions in the test phase. It also provides a unified understanding of the existing GAN-based I2I transition models with our explicitly modeling of the data mapping, i.e., transition. Experiments on four different I2I translation tasks demonstrate the efficacy and generality of TEGAN.

preprint2020arXiv

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

We aim at the problem named One-Shot Unsupervised Domain Adaptation. Unlike traditional Unsupervised Domain Adaptation, it assumes that only one unlabeled target sample can be available when learning to adapt. This setting is realistic but more challenging, in which conventional adaptation approaches are prone to failure due to the scarce of unlabeled target data. To this end, we propose a novel Adversarial Style Mining approach, which combines the style transfer module and task-specific module into an adversarial manner. Specifically, the style transfer module iteratively searches for harder stylized images around the one-shot target sample according to the current learning state, leading the task model to explore the potential styles that are difficult to solve in the almost unseen target domain, thus boosting the adaptation performance in a data-scarce scenario. The adversarial learning framework makes the style transfer module and task-specific module benefit each other during the competition. Extensive experiments on both cross-domain classification and segmentation benchmarks verify that ASM achieves state-of-the-art adaptation performance under the challenging one-shot setting.

preprint2020arXiv

Face Hallucination with Finishing Touches

Obtaining a high-quality frontal face image from a low-resolution (LR) non-frontal face image is primarily important for many facial analysis applications. However, mainstreams either focus on super-resolving near-frontal LR faces or frontalizing non-frontal high-resolution (HR) faces. It is desirable to perform both tasks seamlessly for daily-life unconstrained face images. In this paper, we present a novel Vivid Face Hallucination Generative Adversarial Network (VividGAN) for simultaneously super-resolving and frontalizing tiny non-frontal face images. VividGAN consists of coarse-level and fine-level Face Hallucination Networks (FHnet) and two discriminators, i.e., Coarse-D and Fine-D. The coarse-level FHnet generates a frontal coarse HR face and then the fine-level FHnet makes use of the facial component appearance prior, i.e., fine-grained facial components, to attain a frontal HR face image with authentic details. In the fine-level FHnet, we also design a facial component-aware module that adopts the facial geometry guidance as clues to accurately align and merge the frontal coarse HR face and prior information. Meanwhile, two-level discriminators are designed to capture both the global outline of a face image as well as detailed facial characteristics. The Coarse-D enforces the coarsely hallucinated faces to be upright and complete while the Fine-D focuses on the fine hallucinated ones for sharper details. Extensive experiments demonstrate that our VividGAN achieves photo-realistic frontal HR faces, reaching superior performance in downstream tasks, i.e., face recognition and expression classification, compared with other state-of-the-art methods.

preprint2020arXiv

Federated Generative Adversarial Learning

This work studies training generative adversarial networks under the federated learning setting. Generative adversarial networks (GANs) have achieved advancement in various real-world applications, such as image editing, style transfer, scene generations, etc. However, like other deep learning models, GANs are also suffering from data limitation problems in real cases. To boost the performance of GANs in target tasks, collecting images as many as possible from different sources becomes not only important but also essential. For example, to build a robust and accurate bio-metric verification system, huge amounts of images might be collected from surveillance cameras, and/or uploaded from cellphones by users accepting agreements. In an ideal case, utilize all those data uploaded from public and private devices for model training is straightforward. Unfortunately, in the real scenarios, this is hard due to a few reasons. At first, some data face the serious concern of leakage, and therefore it is prohibitive to upload them to a third-party server for model training; at second, the images collected by different kinds of devices, probably have distinctive biases due to various factors, $\textit{e.g.}$, collector preferences, geo-location differences, which is also known as "domain shift". To handle those problems, we propose a novel generative learning scheme utilizing a federated learning framework. Following the configuration of federated learning, we conduct model training and aggregation on one center and a group of clients. Specifically, our method learns the distributed generative models in clients, while the models trained in each client are fused into one unified and versatile model in the center. We perform extensive experiments to compare different federation strategies, and empirically examine the effectiveness of federation under different levels of parallelism and data skewness.

preprint2020arXiv

Fermi liquid behavior and colossal magnetoresistance in layered MoOCl2

A characteristic of a Fermi liquid is the T^2 dependence of its resistivity, sometimes referred to as the Baber law. However, for most metals, this behavior is only restricted to very low temperatures, usually below 20 K. Here, we experimentally demonstrate that for the single-crystal van der Waals layered material MoOCl2, the Baber law holds in a wide temperature range up to ~120 K, indicating that the electron-electron scattering plays a dominant role in this material. Combining with the specific heat measurement, we find that the modified Kadowaki-Woods ratio of the material agrees well with many other strongly correlated metals. Furthermore, in the magneto-transport measurement, a colossal magneto-resistance is observed, which reaches ~350% at 9 T and displays no sign of saturation. With the help of first-principles calculations, we attribute this behavior to the presence of open orbits on the Fermi surface. We also suggest that the dominance of electron-electron scattering is related to an incipient charge density wave state of the material. Our results establish MoOCl2 as a strongly correlated metal and shed light on the underlying physical mechanism, which may open a new path for exploring the effects of electron-electron interaction in van der Waals layered structures.

preprint2020arXiv

Pseudo-Labeling for Small Lesion Detection on Diabetic Retinopathy Images

Diabetic retinopathy (DR) is a primary cause of blindness in working-age people worldwide. About 3 to 4 million people with diabetes become blind because of DR every year. Diagnosis of DR through color fundus images is a common approach to mitigate such problem. However, DR diagnosis is a difficult and time consuming task, which requires experienced clinicians to identify the presence and significance of many small features on high resolution images. Convolutional Neural Network (CNN) has proved to be a promising approach for automatic biomedical image analysis recently. In this work, we investigate lesion detection on DR fundus images with CNN-based object detection methods. Lesion detection on fundus images faces two unique challenges. The first one is that our dataset is not fully labeled, i.e., only a subset of all lesion instances are marked. Not only will these unlabeled lesion instances not contribute to the training of the model, but also they will be mistakenly counted as false negatives, leading the model move to the opposite direction. The second challenge is that the lesion instances are usually very small, making them difficult to be found by normal object detectors. To address the first challenge, we introduce an iterative training algorithm for the semi-supervised method of pseudo-labeling, in which a considerable number of unlabeled lesion instances can be discovered to boost the performance of the lesion detector. For the small size targets problem, we extend both the input size and the depth of feature pyramid network (FPN) to produce a large CNN feature map, which can preserve the detail of small lesions and thus enhance the effectiveness of the lesion detector. The experimental results show that our proposed methods significantly outperform the baselines.

preprint2020arXiv

Unsupervised Eyeglasses Removal in the Wild

Eyeglasses removal is challenging in removing different kinds of eyeglasses, e.g., rimless glasses, full-rim glasses and sunglasses, and recovering appropriate eyes. Due to the large visual variants, the conventional methods lack scalability. Most existing works focus on the frontal face images in the controlled environment, such as the laboratory, and need to design specific systems for different eyeglass types. To address the limitation, we propose a unified eyeglass removal model called Eyeglasses Removal Generative Adversarial Network (ERGAN), which could handle different types of glasses in the wild. The proposed method does not depend on the dense annotation of eyeglasses location but benefits from the large-scale face images with weak annotations. Specifically, we study the two relevant tasks simultaneously, i.e., removing and wearing eyeglasses. Given two facial images with and without eyeglasses, the proposed model learns to swap the eye area in two faces. The generation mechanism focuses on the eye area and invades the difficulty of generating a new face. In the experiment, we show the proposed method achieves a competitive removal quality in terms of realism and diversity. Furthermore, we evaluate ERGAN on several subsequent tasks, such as face verification and facial expression recognition. The experiment shows that our method could serve as a pre-processing method for these tasks.

preprint2019arXiv

Very Long Natural Scenery Image Prediction by Outpainting

Comparing to image inpainting, image outpainting receives less attention due to two challenges in it. The first challenge is how to keep the spatial and content consistency between generated images and original input. The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input. To solve the two problems, we devise some innovative modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our designed encoder-decoder structure. By this design, our network can generate highly realistic outpainting prediction effectively and efficiently. Other than that, our method can generate new images with very long sizes while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collect a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset have demonstrated the efficacy of our proposed network. The code and dataset are available from https://github.com/z-x-yang/NS-Outpainting.