Source author record

Guoping Qiu

Guoping Qiu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision; eess.IV; eess.SP; Graphics; Machine Learning; Computational Engineering, Finance, and Science; Computational Geometry; math.DG; Multimedia; Robotics

Catalog footprint

17 works, 10 topics, 4 close collaborators


preprint, 2022, arXiv

Restoration of User Videos Shared on Social Media

User videos shared on social media platforms usually suffer from degradations caused by unknown proprietary processing procedures, which means that their visual quality is poorer than that of the originals. This paper presents a new general video restoration framework for the restoration of user videos shared on social media platforms. In contrast to most deep learning-based video restoration methods that perform end-to-end mapping, where feature extraction is mostly treated as a black box, in the sense that what role a feature plays is often unknown, our new method, termed Video restOration through adapTive dEgradation Sensing (VOTES), introduces the concept of a degradation feature map (DFM) to explicitly guide the video restoration process. Specifically, for each video frame, we first adaptively estimate its DFM to extract features representing the difficulty of restoring its different regions. We then feed the DFM to a convolutional neural network (CNN) to compute hierarchical degradation features to modulate an end-to-end video restoration backbone network, such that more attention is paid explicitly to potentially more difficult to restore areas, which in turn leads to enhanced restoration performance. We will explain the design rationale of the VOTES framework and present extensive experimental results to show that the new VOTES method outperforms various state-of-the-art techniques both quantitatively and qualitatively. In addition, we contribute a large scale real-world database of user videos shared on different social media platforms. Codes and datasets are available at https://github.com/luohongming/VOTES.git

preprint, 2021, arXiv

A Discrete Scheme for Computing Image's Weighted Gaussian Curvature

Weighted Gaussian curvature is an important measurement for images. However, its conventional computation scheme has low performance and low accuracy, and requires the input image to be second-order differentiable. To tackle these three issues, we propose a novel discrete computation scheme for the weighted Gaussian curvature. Our scheme does not require second-order differentiability. Moreover, our scheme is more accurate, has a smaller support region, and is computationally more efficient than the conventional schemes. Therefore, our scheme holds promise for a large range of applications where the weighted Gaussian curvature is needed, for example, image smoothing, cartoon texture decomposition, and optical flow estimation.
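For context, a sketch of why a derivative-based scheme needs second-order differentiability: if the image is treated as a graph surface z = I(x, y), the standard (unweighted) Gaussian curvature is written in terms of first and second partial derivatives. The weighting used in the paper is not stated in this abstract, so it is omitted here; the symbols I and K and the subscript notation for partial derivatives are illustrative choices, not taken from the source.

```latex
K = \frac{I_{xx}\, I_{yy} - I_{xy}^{2}}{\left(1 + I_{x}^{2} + I_{y}^{2}\right)^{2}}
```

The numerator involves I_xx, I_yy and I_xy, which is where the differentiability requirement comes from.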

preprint, 2021, arXiv

Gaussian Curvature Filter on 3D Meshes

Minimizing the Gaussian curvature of meshes can play a fundamental role in 3D mesh processing. However, there is a lack of computationally efficient and robust Gaussian curvature optimization methods. In this paper, we present a simple yet effective method that can efficiently reduce Gaussian curvature for 3D meshes. We first present the mathematical foundation of our method. Then, we introduce a simple and robust implicit Gaussian curvature optimization method named Gaussian Curvature Filter (GCF). GCF implicitly minimizes Gaussian curvature without the need to explicitly calculate the Gaussian curvature itself. GCF is highly efficient and can be used in a large range of applications that involve Gaussian curvature. We conduct extensive experiments to demonstrate that GCF significantly outperforms state-of-the-art methods in minimizing Gaussian curvature and in geometric-feature-preserving smoothing on 3D meshes. The GCF program is available at https://github.com/tangwenming/GCF-filter.

preprint, 2021, arXiv

Quarter Laplacian Filter for Edge Aware Image Processing

This paper presents a quarter Laplacian filter that can preserve corners and edges during image smoothing. Its support region is $2\times2$, which is smaller than the $3\times3$ support region of Laplacian filter. Thus, it is more local. Moreover, this filter can be implemented via the classical box filter, leading to high performance for real time applications. Finally, we show its edge preserving property in several image processing tasks, including image smoothing, texture enhancement, and low-light image enhancement. The proposed filter can be adopted in a wide range of image processing applications.

preprint, 2021, arXiv

VHS to HDTV Video Translation using Multi-task Adversarial Learning

There is a large amount of valuable archive video in Video Home System (VHS) format. However, due to its analog nature, the quality is often poor. Compared to high-definition television (HDTV), VHS video not only has a dull color appearance but also has a lower resolution and often appears blurry. In this paper, we focus on the problem of translating VHS video to HDTV video and have developed a solution based on a novel unsupervised multi-task adversarial learning model. Inspired by the success of the generative adversarial network (GAN) and CycleGAN, we employ cycle consistency loss, adversarial loss and perceptual loss together to learn a translation model. An important innovation of our work is the incorporation of a super-resolution model and a color transfer model that together can solve the unsupervised multi-task problem. To our knowledge, this is the first work dedicated to the study of the relation between VHS and HDTV and the first computational solution to translate VHS to HDTV. We present experimental results to demonstrate the effectiveness of our solution qualitatively and quantitatively.

preprint, 2020, arXiv

Class-Aware Domain Adaptation for Improving Adversarial Robustness

Recent works have demonstrated that convolutional neural networks are vulnerable to adversarial examples, i.e., inputs to machine learning models that an attacker has intentionally designed to cause the models to make a mistake. To improve the adversarial robustness of neural networks, adversarial training has been proposed to train networks by injecting adversarial examples into the training data. However, adversarial training could overfit to a specific type of adversarial attack and also lead to a drop in standard accuracy on clean images. To this end, we propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training. Specifically, we propose to learn domain-invariant features for adversarial examples and clean images via a domain discriminator. Furthermore, we introduce a class-aware component into the discriminator to increase the discriminative power of the network for adversarial examples. We evaluate our newly proposed approach using multiple benchmark datasets. The results demonstrate that our method can significantly improve the state of the art in adversarial robustness for various attacks and maintain high performance on clean images.

preprint, 2020, arXiv

HLO: Half-kernel Laplacian Operator for Surface Smoothing

This paper presents a simple yet effective method for feature-preserving surface smoothing. Through analyzing the differential property of surfaces, we show that the conventional discrete Laplacian operator with uniform weights is not applicable to feature points at which the surface is non-differentiable and the second order derivatives do not exist. To overcome this difficulty, we propose a Half-kernel Laplacian Operator (HLO) as an alternative to the conventional Laplacian. Given a vertex v, HLO first finds all pairs of its neighboring vertices and divides each pair into two subsets (called half windows); then computes the uniform Laplacians of all such subsets and subsequently projects the computed Laplacians to the full-window uniform Laplacian to alleviate flipping and degeneration. The half window with least regularization energy is then chosen for v. We develop an iterative approach to apply HLO for surface denoising. Our method is conceptually simple and easy to use because it has a single parameter, i.e., the number of iterations for updating vertices. We show that our method can preserve features better than the popular uniform Laplacian-based denoising and it significantly alleviates the shrinkage artifact. Extensive experimental results demonstrate that HLO is better than or comparable to state-of-the-art techniques both qualitatively and quantitatively and that it is particularly good at handling meshes with high noise. We will make our source code publicly available.
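As a point of reference, the following is a minimal sketch of the conventional uniform-weight Laplacian that the abstract contrasts HLO against. It is not HLO itself: the half-window selection and projection steps are not reproduced. The function names, the `step` parameter and the toy five-vertex mesh are assumptions made for illustration.

```python
def uniform_laplacian(vertices, neighbors, i):
    """Uniform-weight Laplacian at vertex i: mean of its neighbors minus the vertex."""
    nbrs = neighbors[i]
    n = len(nbrs)
    return tuple(
        sum(vertices[j][k] for j in nbrs) / n - vertices[i][k]
        for k in range(3)
    )


def smooth(vertices, neighbors, iterations=1, step=1.0):
    """Move every vertex along its uniform Laplacian, `iterations` times."""
    for _ in range(iterations):
        # Compute all displacements first so the update is simultaneous.
        deltas = [uniform_laplacian(vertices, neighbors, i)
                  for i in range(len(vertices))]
        vertices = [
            tuple(v[k] + step * d[k] for k in range(3))
            for v, d in zip(vertices, deltas)
        ]
    return vertices


# Toy mesh (hypothetical): a sharp apex (vertex 0) above a ring of four vertices.
VERTICES = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
NEIGHBORS = [[1, 2, 3, 4], [0, 2, 4], [0, 1, 3], [0, 2, 4], [0, 1, 3]]

apex_after = smooth(VERTICES, NEIGHBORS, iterations=1)[0]
```

In this toy case a single full step pulls the apex down into the plane of its neighbors, which illustrates how the uniform operator can erase a sharp feature.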

preprint, 2020, arXiv

MiniNet: An extremely lightweight convolutional neural network for real-time unsupervised monocular depth estimation

Predicting depth from a single image is an attractive research topic since it provides one more dimension of information to enable machines to better perceive the world. Recently, deep learning has emerged as an effective approach to monocular depth estimation. As obtaining labeled data is costly, there is a recent trend to move from supervised learning to unsupervised learning to obtain monocular depth. However, most unsupervised learning methods capable of achieving high depth prediction accuracy require a deep network architecture that is too heavy and complex to run on embedded devices with limited storage and memory. To address this issue, we propose a new powerful network with a recurrent module to achieve the capability of a deep network while at the same time maintaining an extremely lightweight size for real-time, high-performance unsupervised monocular depth prediction from video sequences. In addition, a novel efficient upsample block is proposed to fuse the features from the associated encoder layer and recover the spatial size of features with a small number of model parameters. We validate the effectiveness of our approach via extensive experiments on the KITTI dataset. Our new model can run at a speed of about 110 frames per second (fps) on a single GPU, 37 fps on a single CPU, and 2 fps on a Raspberry Pi 3. Moreover, it achieves higher depth accuracy with nearly 33 times fewer model parameters than state-of-the-art models. To the best of our knowledge, this work is the first extremely lightweight neural network trained on monocular video sequences for real-time unsupervised monocular depth estimation, which opens up the possibility of implementing deep learning-based real-time unsupervised monocular depth prediction on low-cost embedded devices.

preprint, 2020, arXiv

PoseGAN: A Pose-to-Image Translation Framework for Camera Localization

Camera localization is a fundamental requirement in robotics and computer vision. This paper introduces a pose-to-image translation framework to tackle the camera localization problem. We present PoseGANs, a framework based on conditional generative adversarial networks (cGANs) for the implementation of pose-to-image translation. PoseGANs feature a number of innovations, including a distance-metric-based conditional discriminator to conduct camera localization and a pose estimation technique for generated camera images as a stronger constraint to improve camera localization performance. Compared with learning-based regression methods such as PoseNet, PoseGANs can achieve better performance with model sizes that are 70% smaller. In addition, PoseGANs introduce the view synthesis technique to establish the correspondence between the 2D images and the scene, i.e., given a pose, PoseGANs are able to synthesize its corresponding camera images. Furthermore, we demonstrate that PoseGANs differ in principle from structure-based localization and learning-based regressions for camera localization, and show that PoseGANs exploit the geometric structures to accomplish the camera localization task, and are therefore more stable than and superior to learning-based regressions, which rely on local texture features instead. In addition to camera localization and view synthesis, we also demonstrate that PoseGANs can be successfully used for other interesting applications such as moving object elimination and frame interpolation in video sequences.

preprint, 2016, arXiv

Automatic Visual Theme Discovery from Joint Image and Text Corpora

A popular approach to semantic image understanding is to manually tag images with keywords and then learn a mapping from visual features to keywords. Manually tagging images is a subjective process and the same or very similar visual contents are often tagged with different keywords. Furthermore, not all tags have the same descriptive power for visual contents, and the large vocabulary available from natural language could result in a very diverse set of keywords. In this paper, we propose an unsupervised visual theme discovery framework as a better (more compact, efficient and effective) alternative for the semantic representation of visual contents. We first show that tag-based annotation lacks consistency and compactness for describing visually similar contents. We then learn the visual similarity between tags based on the visual features of the images containing the tags. At the same time, we use a natural language processing technique (word embedding) to measure the semantic similarity between tags. Finally, we cluster tags into visual themes based on their visual similarity and semantic similarity measures using a spectral clustering algorithm. We conduct user studies to evaluate the effectiveness and rationality of the visual themes discovered by our unsupervised algorithm and obtain promising results. We then design three common computer vision tasks, example-based image search, keyword-based image search and image labelling, to explore potential applications of our visual theme discovery framework. In experiments, visual themes significantly outperform tags on semantic image understanding and achieve state-of-the-art performance in all three tasks. This again demonstrates the effectiveness and versatility of the proposed framework.

preprint, 2016, arXiv

Object Specific Deep Learning Feature and Its Application to Face Detection

We present a method for discovering and exploiting object specific deep learning features and use face detection as a case study. Motivated by the observation that certain convolutional channels of a Convolutional Neural Network (CNN) exhibit object specific responses, we seek to discover and exploit the convolutional channels of a CNN in which neurons are activated by the presence of specific objects in the input image. A method for explicitly fine-tuning a pre-trained CNN to induce an object specific channel (OSC) and systematically identifying it for the human face object has been developed. Based on the basic OSC features, we introduce a multi-resolution approach to constructing robust face heatmaps for fast face detection in unconstrained settings. We show that multi-resolution OSC can be used to develop state of the art face detectors which have the advantage of being simple and compact.

preprint, 2013, arXiv

Fast non parametric entropy estimation for spatial-temporal saliency method

This paper formulates bottom-up visual saliency as center-surround conditional entropy and presents a fast and efficient technique for the computation of such a saliency map. It is shown that the new saliency formulation is consistent with self-information-based saliency, decision-theoretic saliency and the Bayesian definition of surprise, but also faces the same significant computational challenge of estimating probability density in very high-dimensional spaces with limited samples. We have developed a fast and efficient nonparametric method to make the practical implementation of these types of saliency maps possible. By aligning pixels from the center and surround regions and treating their location coordinates as random variables, we use a k-d partitioning method to efficiently estimate the center-surround conditional entropy. We present experimental results on two publicly available eye-tracking still image databases and show that the new technique is competitive with state-of-the-art bottom-up saliency computational methods. We have also extended the technique to compute spatiotemporal visual saliency of video and evaluate the bottom-up spatiotemporal saliency against eye-tracking data on a video taken onboard a moving vehicle, with the driver's eyes being tracked by a head-mounted eye-tracker.

preprint, 2013, arXiv

Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling

Bottom-up saliency, an early stage of human visual attention, can be considered as a binary classification problem between centre and surround classes. The discriminant power of features for the classification is measured as the mutual information between distributions of image features and corresponding classes. As the estimated discrepancy depends strongly on the scale level considered, multi-scale structure and discriminant power are integrated by employing discrete wavelet features and a Hidden Markov Tree (HMT). With wavelet coefficients and Hidden Markov Tree parameters, quad-tree-like label structures are constructed and utilized in maximum a posteriori probability (MAP) estimation of hidden class variables at corresponding dyadic sub-squares. Then, a saliency value for each square block at each scale level is computed with the discriminant power principle. Finally, the final saliency map is integrated across multiple scales by an information maximization rule. Both standard quantitative tools such as NSS, LCC and AUC and qualitative assessments are used for evaluating the proposed multi-scale discriminant saliency (MDIS) method against the well-known information-based approach AIM on its released image collection with eye-tracking data. Simulation results are presented and analysed to verify the validity of MDIS as well as to point out its limitations for further research.
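Of the quantitative tools named above, NSS (Normalized Scanpath Saliency) is the simplest to sketch: z-score the saliency map, then average the normalized values at the fixated pixels. The snippet below is a minimal illustration under stated assumptions, not the evaluation code used in the paper. It uses the population standard deviation (implementations differ on this), and the function name, the (row, column) fixation format and the toy map are hypothetical.

```python
from statistics import mean, pstdev


def nss(saliency, fixations):
    """Mean z-scored saliency over fixated pixels.

    saliency:  2D list of numbers (rows of the saliency map).
    fixations: iterable of (row, col) pixel coordinates of eye fixations.
    """
    values = [v for row in saliency for v in row]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    scores = [(saliency[r][c] - mu) / sigma for r, c in fixations]
    return mean(scores)


# Toy 2x2 map with a single salient pixel at (1, 1).
TOY_MAP = [[0, 0], [0, 4]]
score_on_peak = nss(TOY_MAP, [(1, 1)])   # fixation lands on the salient pixel
score_off_peak = nss(TOY_MAP, [(0, 0)])  # fixation lands on the background
```

A positive score means fixations fall on above-average saliency; a score near zero corresponds to chance-level agreement.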

preprint, 2013, arXiv

Multi-scale Visual Attention & Saliency Modelling with Decision Theory

Bottom-up saliency, an early stage of human visual processing, behaves like a binary classification between interest and null hypotheses. Its discriminant power, the mutual information between image features and the class distribution, is closely related to the saliency value through the well-known centre-surround theory. As classification accuracy depends strongly on window size, the discriminant saliency (power) varies with the sampling scale. Estimating discriminant power in a multi-scale framework requires integration with the wavelet transform, followed by estimating the statistical discrepancy between two consecutive scales (centre-surround windows) with a Hidden Markov Tree (HMT) model. Finally, multi-scale discriminant saliency (MDIS) maps are combined by the maximum information rule to synthesize a final saliency map. All MDIS maps are evaluated with standard quantitative tools (NSS, LCC, AUC) on N. Bruce's database, with eye-tracking locations as ground truth, and are also assessed qualitatively by visual examination of individual cases. To evaluate MDIS against the well-known AIM saliency method, simulations are described in detail, and several conclusions are drawn for further research directions.

preprint, 2013, arXiv

Multiscale Discriminant Saliency for Visual Attention

Bottom-up saliency, an early stage of human visual attention, can be considered as a binary classification problem between center and surround classes. The discriminant power of features for the classification is measured as the mutual information between the features and the distribution of the two classes. The estimated discrepancy between the two feature classes depends strongly on the scale levels considered; therefore, multi-scale structure and discriminant power are integrated by employing discrete wavelet features and a Hidden Markov Tree (HMT). With wavelet coefficients and Hidden Markov Tree parameters, quad-tree-like label structures are constructed and utilized in maximum a posteriori probability (MAP) estimation of hidden class variables at corresponding dyadic sub-squares. Then, a saliency value for each dyadic square at each scale level is computed with the discriminant power principle and the MAP. Finally, the final saliency map is integrated across multiple scales by an information maximization rule. Both standard quantitative tools such as NSS, LCC and AUC and qualitative assessments are used for evaluating the proposed multiscale discriminant saliency method (MDIS) against the well-known information-based saliency method AIM on its Bruce database with eye-tracking data. Simulation results are presented and analyzed to verify the validity of MDIS as well as to point out its disadvantages for further research.

preprint, 2013, arXiv

Supervised Learning and Anti-learning of Colorectal Cancer Classes and Survival Rates from Cellular Biology Parameters

In this paper, we describe a dataset relating to the cellular and physical conditions of patients who are operated upon to remove colorectal tumours. This data provides a unique insight into immunological status at the point of tumour removal, tumour classification and post-operative survival. Attempts are made to learn relationships between attributes (physical and immunological) and the resulting tumour stage and survival. Results for conventional machine learning approaches can be considered poor, especially for predicting tumour stages for the most important types of cancer. This poor performance is further investigated and compared with a synthetic dataset based on the logical exclusive-OR function, and it is shown that there is a significant level of 'anti-learning' present in all supervised methods used and that this can be explained by the high-dimensional, complex and sparsely representative dataset. For predicting the stage of cancer from the immunological attributes, anti-learning approaches outperform a range of popular algorithms.
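The exclusive-OR connection can be illustrated with a toy case in which a classifier scores below chance on held-out points. The sketch below runs leave-one-out validation of a 1-nearest-neighbour classifier on the four-point XOR truth table. This is an assumption-laden illustration, not the paper's synthetic dataset or method: the abstract only says the synthetic data is based on XOR, and the four-point table, the choice of 1-NN and all names here are hypothetical.

```python
# Four-point XOR truth table: ((x1, x2), label).
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query`."""
    def squared_distance(item):
        (x1, x2), _label = item
        return (x1 - query[0]) ** 2 + (x2 - query[1]) ** 2

    _point, label = min(train, key=squared_distance)
    return label


def leave_one_out_accuracy(data):
    """Hold out each point in turn, train on the rest, and score the prediction."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_neighbor_predict(train, x) == y:
            correct += 1
    return correct / len(data)


accuracy = leave_one_out_accuracy(XOR_DATA)
print(accuracy)  # 0.0, i.e. worse than the 0.5 expected from guessing
```

Every held-out point has its nearest remaining neighbours in the opposite class, so the classifier is wrong each time; inverting its predictions would be correct each time, which is the intuition behind an anti-learning approach.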

preprint, 2013, arXiv

Wavelet-based Scale Saliency

Both pixel-based scale saliency (PSS) and basis projection methods focus on multiscale analysis of data content and structure. Their theoretical relations and practical combination have been discussed previously. However, no models have been proposed since then for calculating scale saliency on basis-projected descriptors. This paper extends those ideas into mathematical models and implements them in wavelet-based scale saliency (WSS). While PSS uses pixel-value descriptors, WSS treats wavelet sub-bands as basis descriptors. The paper discusses different wavelet descriptors: discrete wavelet transform (DWT), wavelet packet transform (DWPT), quaternion wavelet transform (QWT) and best basis quaternion wavelet packet transform (QWPTBB). WSS saliency maps of different descriptors are generated and compared against other saliency methods by both quantitative and qualitative methods. Quantitative results (ROC curves, AUC values and NSS values) are collected from simulations on the Bruce and Kootstra image databases, with human eye-tracking data as ground truth. Furthermore, qualitative visual results of saliency maps are analyzed and compared against each other as well as against the eye-tracking data included in the databases.
