Researcher profile

Weihao Xia

Weihao Xia contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2022arXiv

Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Image quality assessment (IQA) algorithm aims to quantify the human perception of image quality. Unfortunately, there is a performance drop when assessing the distortion images generated by generative adversarial network (GAN) with seemingly realistic texture. In this work, we conjecture that this maladaptation lies in the backbone of IQA models, where patch-level prediction methods use independent image patches as input to calculate their scores separately, but lack spatial relationship modeling among image patches. Therefore, we propose an Attention-based Hybrid Image Quality Assessment Network (AHIQ) to deal with the challenge and get better performance on the GAN-based IQA task. Firstly, we adopt a two-branch architecture, including a vision transformer (ViT) branch and a convolutional neural network (CNN) branch for feature extraction. The hybrid architecture combines interaction information among image patches captured by ViT and local texture details from CNN. To make the features from shallow CNN more focused on the visually salient region, a deformable convolution is applied with the help of semantic information from the ViT branch. Finally, we use a patch-wise score prediction module to obtain the final score. The experiments show that our model outperforms the state-of-the-art methods on four standard IQA datasets and AHIQ ranked first on the Full Reference (FR) track of the NTIRE 2022 Perceptual Image Quality Assessment Challenge.

preprint2022arXiv

High-fidelity GAN Inversion with Padding Space

Inverting a Generative Adversarial Network (GAN) facilitates a wide range of image editing tasks using pre-trained generators. Existing methods typically employ the latent space of GANs as the inversion space yet observe the insufficient recovery of spatial details. In this work, we propose to involve the padding space of the generator to complement the latent space with spatial information. Concretely, we replace the constant padding (e.g., usually zeros) used in convolution layers with some instance-aware coefficients. In this way, the inductive bias assumed in the pre-trained model can be appropriately adapted to fit each individual image. Through learning a carefully designed encoder, we manage to improve the inversion quality both qualitatively and quantitatively, outperforming existing alternatives. We then demonstrate that such a space extension barely affects the native GAN manifold, hence we can still reuse the prior knowledge learned by GANs for various downstream applications. Beyond the editing tasks explored in prior arts, our approach allows a more flexible image manipulation, such as the separate control of face contour and facial details, and enables a novel editing manner where users can customize their own manipulations highly efficiently.

preprint2022arXiv

Identity-guided Face Generation with Multi-modal Contour Conditions

Recent face generation methods have tried to synthesize faces based on the given contour condition, like a low-resolution image or sketch. However, the problem of identity ambiguity remains unsolved, which usually occurs when the contour is too vague to provide reliable identity information (e.g., when its resolution is extremely low). Thus feasible solutions of image restoration could be infinite. In this work, we propose a novel framework that takes the contour and an extra image specifying the identity as the inputs, where the contour can be of various modalities, including the low-resolution image, sketch, and semantic label map. Concretely, we propose a novel dual-encoder architecture, in which an identity encoder extracts the identity-related feature, accompanied by a main encoder to obtain the rough contour information and further fuse all the information together. The encoder output is iteratively fed into a pre-trained StyleGAN generator until getting a satisfying result. To the best of our knowledge, this is the first work that achieves identity-guided face generation conditioned on multi-modal contour images. Moreover, our method can produce photo-realistic results with 1024$\times$1024 resolution.

preprint2022arXiv

Learning Quality-aware Dynamic Memory for Video Object Segmentation

Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames and their masks as memory are helpful to segment target objects in videos. However, they mainly focus on better matching between the current frame and the memory frames without explicitly paying attention to the quality of the memory. Therefore, frames with poor segmentation masks are prone to be memorized, which leads to a segmentation mask error accumulation problem and further affect the segmentation performance. In addition, the linear increase of memory frames with the growth of frame number also limits the ability of the models to handle long videos. To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames to prevent the error accumulation problem. Then, we combine the segmentation quality with temporal consistency to dynamically update the memory bank to improve the practicability of the models. Without any bells and whistles, our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks. Moreover, extensive experiments demonstrate that the proposed Quality Assessment Module (QAM) can be applied to memory-based methods as generic plugins and significantly improves performance. Our source code is available at https://github.com/workforai/QDMN.

preprint2020arXiv

Cooperative Semantic Segmentation and Image Restoration in Adverse Environmental Conditions

Most state-of-the-art semantic segmentation approaches only achieve high accuracy in good conditions. In practically-common but less-discussed adverse environmental conditions, their performance can decrease enormously. Existing studies usually cast the handling of segmentation in adverse conditions as a separate post-processing step after signal restoration, making the segmentation performance largely depend on the quality of restoration. In this paper, we propose a novel deep-learning framework to tackle semantic segmentation and image restoration in adverse environmental conditions in a holistic manner. The proposed approach contains two components: Semantically-Guided Adaptation, which exploits semantic information from degraded images to refine the segmentation; and Exemplar-Guided Synthesis, which restores images from semantic label maps given degraded exemplars as the guidance. Our method cooperatively leverages the complementarity and interdependence of low-level restoration and high-level segmentation in adverse environmental conditions. Extensive experiments on various datasets demonstrate that our approach can not only improve the accuracy of semantic segmentation with degradation cues, but also boost the perceptual quality and structural similarity of image restoration with semantic guidance.

preprint2020arXiv

Domain Fingerprints for No-reference Image Quality Assessment

Human fingerprints are detailed and nearly unique markers of human identity. Such a unique and stable fingerprint is also left on each acquired image. It can reveal how an image was degraded during the image acquisition procedure and thus is closely related to the quality of an image. In this work, we propose a new no-reference image quality assessment (NR-IQA) approach called domain-aware IQA (DA-IQA), which for the first time introduces the concept of domain fingerprint to the NR-IQA field. The domain fingerprint of an image is learned from image collections of different degradations and then used as the unique characteristics to identify the degradation sources and assess the quality of the image. To this end, we design a new domain-aware architecture, which enables simultaneous determination of both the distortion sources and the quality of an image. With the distortion in an image better characterized, the image quality can be more accurately assessed, as verified by extensive experiments, which show that the proposed DA-IQA performs better than almost all the compared state-of-the-art NR-IQA methods.