Source author record

Xiaolin Wu

Xiaolin Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision eess.IV Multimedia eess.SP Emerging Technologies Machine Learning

Catalog footprint

What is connected

16works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Data Acquisition and Preparation for Dual-reference Deep Learning of Image Super-Resolution

The performance of deep learning based image super-resolution (SR) methods depend on how accurately the paired low and high resolution images for training characterize the sampling process of real cameras. Low and high resolution (LR$\sim$HR) image pairs synthesized by degradation models (e.g., bicubic downsampling) deviate from those in reality; thus the synthetically-trained DCNN SR models work disappointingly when being applied to real-world images. To address this issue, we propose a novel data acquisition process to shoot a large set of LR$\sim$HR image pairs using real cameras. The images are displayed on an ultra-high quality screen and captured at different resolutions. The resulting LR$\sim$HR image pairs can be aligned at very high sub-pixel precision by a novel spatial-frequency dual-domain registration method, and hence they provide more appropriate training data for the learning task of super-resolution. Moreover, the captured HR image and the original digital image offer dual references to strengthen supervised learning. Experimental results show that training a super-resolution DCNN by our LR$\sim$HR dataset achieves higher image quality than training it by other datasets in the literature. Moreover, the proposed screen-capturing data collection process can be automated; it can be carried out for any target camera with ease and low cost, offering a practical way of tailoring the training of a DCNN SR model separately to each of the given cameras.

preprint2022arXiv

Deep Decoding of $\ell_\infty$-coded Light Field Images

To enrich the functionalities of traditional cameras, light field cameras record both the intensity and direction of light rays, so that images can be rendered with user-defined camera parameters via computations. The added capability and flexibility are gained at the cost of gathering typically more than $100\times$ greater amount of information than conventional images. To cope with this issue, several light field compression schemes have been introduced. However, their ways of exploiting correlations of multidimensional light field data are complex and are hence not suited for inexpensive light field cameras. In this work, we propose a novel $\ell_\infty$-constrained light-field image compression system that has a very low-complexity DPCM encoder and a CNN-based deep decoder. Targeting high-fidelity reconstruction, the CNN decoder capitalizes on the $\ell_\infty$-constraint and light field properties to remove the compression artifacts and achieves significantly better performance than existing state-of-the-art $\ell_2$-based light field compression methods.

preprint2022arXiv

Multi-modality Deep Restoration of Extremely Compressed Face Videos

Arguably the most common and salient object in daily video communications is the talking head, as encountered in social media, virtual classrooms, teleconferences, news broadcasting, talk shows, etc. When communication bandwidth is limited by network congestions or cost effectiveness, compression artifacts in talking head videos are inevitable. The resulting video quality degradation is highly visible and objectionable due to high acuity of human visual system to faces. To solve this problem, we develop a multi-modality deep convolutional neural network method for restoring face videos that are aggressively compressed. The main innovation is a new DCNN architecture that incorporates known priors of multiple modalities: the video-synchronized speech signal and semantic elements of the compression code stream, including motion vectors, code partition map and quantization parameters. These priors strongly correlate with the latent video and hence they are able to enhance the capability of deep learning to remove compression artifacts. Ample empirical evidences are presented to validate the superior performance of the proposed DCNN method on face videos over the existing state-of-the-art methods.

preprint2020arXiv

Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques

Single Image Super-Resolution (SISR) task refers to learn a mapping from low-resolution images to the corresponding high-resolution ones. This task is known to be extremely difficult since it is an ill-posed problem. Recently, Convolutional Neural Networks (CNNs) have achieved state of the art performance on SISR. However, the images produced by CNNs do not contain fine details of the images. Generative Adversarial Networks (GANs) aim to solve this issue and recover sharp details. Nevertheless, GANs are notoriously difficult to train. Besides that, they generate artifacts in the high-resolution images. In this paper, we have proposed a method in which CNNs try to align images in different spaces rather than only the pixel space. Such a space is designed using convex optimization techniques. CNNs are encouraged to learn high-frequency components of the images as well as low-frequency components. We have shown that the proposed method can recover fine details of the images and it is stable in the training process.

preprint2020arXiv

Challenge of Spatial Cognition for Deep Learning

Given the success of the deep convolutional neural networks (DCNNs) in applications of visual recognition and classification, it would be tantalizing to test if DCNNs can also learn spatial concepts, such as straightness, convexity, left/right, front/back, relative size, aspect ratio, polygons, etc., from varied visual examples of these concepts that are simple and yet vital for spatial reasoning. Much to our dismay, extensive experiments of the type of cognitive psychology demonstrate that the data-driven deep learning (DL) cannot see through superficial variations in visual representations and grasp the spatial concept in abstraction. The root cause of failure turns out to be the learning methodology, not the computational model of the neural network itself. By incorporating task-specific convolutional kernels, we are able to construct DCNNs for spatial cognition tasks that can generalize to input images not drawn from the same distribution of the training set. This work raises a precaution that without manually-incorporated priors or features DCCNs may fail spatial cognitive tasks at rudimentary level.

preprint2020arXiv

Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos

We propose a novel deep multi-modality neural network for restoring very low bit rate videos of talking heads. Such video contents are very common in social media, teleconferencing, distance education, tele-medicine, etc., and often need to be transmitted with limited bandwidth. The proposed CNN method exploits the correlations among three modalities, video, audio and emotion state of the speaker, to remove the video compression artifacts caused by spatial down sampling and quantization. The deep learning approach turns out to be ideally suited for the video restoration task, as the complex non-linear cross-modality correlations are very difficult to model analytically and explicitly. The new method is a video post processor that can significantly boost the perceptual quality of aggressively compressed talking head videos, while being fully compatible with all existing video compression standards.

preprint2020arXiv

Lossless Compression of Mosaic Images with Convolutional Neural Network Prediction

We present a CNN-based predictive lossless compression scheme for raw color mosaic images of digital cameras. This specialized application problem was previously understudied but it is now becoming increasingly important, because modern CNN methods for image restoration tasks (e.g., superresolution, low lighting enhancement, deblurring), must operate on original raw mosaic images to obtain the best possible results. The key innovation of this paper is a high-order nonlinear CNN predictor of spatial-spectral mosaic patterns. The deep learning prediction can model highly complex sample dependencies in spatial-spectral mosaic images more accurately and hence remove statistical redundancies more thoroughly than existing image predictors. Experiments show that the proposed CNN predictor achieves unprecedented lossless compression performance on camera raw images.

preprint2020arXiv

Near-lossless $\ell_\infty$-constrained Image Decompression via Deep Neural Network

Recently a number of CNN-based techniques were proposed to remove image compression artifacts. As in other restoration applications, these techniques all learn a mapping from decompressed patches to the original counterparts under the ubiquitous $\ell_\infty$ metric. However, this approach is incapable of restoring distinctive image details which may be statistical outliers but have high semantic importance (e.g., tiny lesions in medical images). To overcome this weakness, we propose to incorporate an $\ell_\infty$ fidelity criterion in the design of neural network so that no small, distinctive structures of the original image can be dropped or distorted. Experimental results demonstrate that the proposed method outperforms the state-of-the-art methods in $\ell_\infty$ error metric and perceptual quality, while being competitive in $\ell_2$ error metric as well. It can restore subtle image details that are otherwise destroyed or missed by other algorithms. Our research suggests a new machine learning paradigm of ultra high fidelity image compression that is ideally suited for applications in medicine, space, and sciences.

preprint2020arXiv

Rapid Whole Slide Imaging via Learning-based Two-shot Virtual Autofocusing

Whole slide imaging (WSI) is an emerging technology for digital pathology. The process of autofocusing is the main influence of the performance of WSI. Traditional autofocusing methods either are time-consuming due to repetitive mechanical motions, or require additional hardware and thus are not compatible to current WSI systems. In this paper, we propose the concept of \textit{virtual autofocusing}, which does not rely on mechanical adjustment to conduct refocusing but instead recovers in-focus images in an offline learning-based manner. With the initial focal position, we only perform two-shot imaging, in contrast traditional methods commonly need to conduct as many as 21 times image shooting in each tile scanning. Considering that the two captured out-of-focus images retain pieces of partial information about the underlying in-focus image, we propose a U-Net-inspired deep neural network based approach for fusing them into a recovered in-focus image. The proposed scheme is fast in tissue slides scanning, enabling a high-throughput generation of digital pathology images. Experimental results demonstrate that our scheme achieves satisfactory refocusing performance.

preprint2020arXiv

Ultra High Fidelity Image Compression with $\ell_\infty$-constrained Encoding and Deep Decoding

In many professional fields, such as medicine, remote sensing and sciences, users often demand image compression methods to be mathematically lossless. But lossless image coding has a rather low compression ratio (around 2:1 for natural images). The only known technique to achieve significant compression while meeting the stringent fidelity requirements is the methodology of $\ell_\infty$-constrained coding that was developed and standardized in nineties. We make a major progress in $\ell_\infty$-constrained image coding after two decades, by developing a novel CNN-based soft $\ell_\infty$-constrained decoding method. The new method repairs compression defects by using a restoration CNN called the $\ell_\infty\mbox{-SDNet}$ to map a conventionally decoded image to the latent image. A unique strength of the $\ell_\infty\mbox{-SDNet}$ is its ability to enforce a tight error bound on a per pixel basis. As such, no small distinctive structures of the original image can be dropped or distorted, even if they are statistical outliers that are otherwise sacrificed by mainstream CNN restoration methods. More importantly, this research ushers in a new image compression system of $\ell_\infty$-constrained encoding and deep soft decoding ($\ell_\infty\mbox{-ED}^2$). The $\ell_\infty \mbox{-ED}^2$ approach beats the best of existing lossy image compression methods (e.g., BPG, WebP, etc.) not only in $\ell_\infty$ but also in $\ell_2$ error metric and perceptual quality, for bit rates near the threshold of perceptually transparent reconstruction. Operationally, the new compression system is practical, with a low-complexity real-time encoder and a cascade decoder consisting of a fast initial decoder and an optional CNN soft decoder.

preprint2016arXiv

Automated Inference on Sociopsychological Impressions of Attractive Female Faces

This article is a sequel to our earlier work [25]. The main objective of our research is to explore the potential of supervised machine learning in face-induced social computing and cognition, riding on the momentum of much heralded successes of face processing, analysis and recognition on the tasks of biometric-based identification. We present a case study of automated statistical inference on sociopsychological perceptions of female faces controlled for race, attractiveness, age and nationality. Our empirical evidences point to the possibility of training machine learning algorithms, using example face images characterized by internet users, to predict perceptions of personality traits and demeanors.

preprint2016arXiv

Quality Adaptive Low-Rank Based JPEG Decoding with Applications

Small compression noises, despite being transparent to human eyes, can adversely affect the results of many image restoration processes, if left unaccounted for. Especially, compression noises are highly detrimental to inverse operators of high-boosting (sharpening) nature, such as deblurring and superresolution against a convolution kernel. By incorporating the non-linear DCT quantization mechanism into the formulation for image restoration, we propose a new sparsity-based convex programming approach for joint compression noise removal and image restoration. Experimental results demonstrate significant performance gains of the new approach over existing image restoration methods.

preprint2016arXiv

Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG Images

Given the prevalence of JPEG compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed DCT coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors---Laplacian prior for DCT coefficients, sparsity prior and graph-signal smoothness prior for image patches---to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error (MMSE) initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the over-complete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared to previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth (PWS) signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations noticeably.

preprint2013arXiv

Coded Acquisition of High Frame Rate Video

High frame video (HFV) is an important investigational tool in sciences, engineering and military. In ultra-high speed imaging, the obtainable temporal, spatial and spectral resolutions are limited by the sustainable throughput of in-camera mass memory, the lower bound of exposure time, and illumination conditions. In order to break these bottlenecks, we propose a new coded video acquisition framework that employs K > 2 conventional cameras, each of which makes random measurements of the 3D video signal in both temporal and spatial domains. For each of the K cameras, this multi-camera strategy greatly relaxes the stringent requirements in memory speed, shutter speed, and illumination strength. The recovery of HFV from these random measurements is posed and solved as a large scale l1 minimization problem by exploiting joint temporal and spatial sparsities of the 3D signal. Three coded video acquisition techniques of varied trade offs between performance and hardware complexity are developed: frame-wise coded acquisition, pixel-wise coded acquisition, and column-row-wise coded acquisition. The performances of these techniques are analyzed in relation to the sparsity of the underlying video signal. Simulations of these new HFV capture techniques are carried out and experimental results are reported.

preprint2012arXiv

Temporal Psychovisual Modulation: a new paradigm of information display

We report on a new paradigm of information display that greatly extends the utility and versatility of current optoelectronic displays. The main innovation is to let a display of high refresh rate optically broadcast so-called atom frames, which are designed through non-negative matrix factorization to form bases for a class of images, and different viewers perceive selfintended images by using display-synchronized viewing devices and their own human visual systems to fuse appropriately weighted atom frames. This work is essentially a scheme of temporal psychovisual modulation in visible spectrum, using an optoelectronic modulator coupled with a biological demodulator.

preprint2010arXiv

Image Deblurring and Super-resolution by Adaptive Sparse Domain Selection and Adaptive Regularization

As a powerful statistical image modeling technique, sparse representation has been successfully used in various image restoration applications. The success of sparse representation owes to the development of l1-norm optimization techniques, and the fact that natural images are intrinsically sparse in some domain. The image restoration quality largely depends on whether the employed sparse domain can represent well the underlying image. Considering that the contents can vary significantly across different images or different patches in a single image, we propose to learn various sets of bases from a pre-collected dataset of example image patches, and then for a given patch to be processed, one set of bases are adaptively selected to characterize the local sparse domain. We further introduce two adaptive regularization terms into the sparse representation framework. First, a set of autoregressive (AR) models are learned from the dataset of example image patches. The best fitted AR models to a given patch are adaptively selected to regularize the image local structures. Second, the image non-local self-similarity is introduced as another regularization term. In addition, the sparsity regularization parameter is adaptively estimated for better image restoration performance. Extensive experiments on image deblurring and super-resolution validate that by using adaptive sparse domain selection and adaptive regularization, the proposed method achieves much better results than many state-of-the-art algorithms in terms of both PSNR and visual perception.

Xiaolin Wu

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Data Acquisition and Preparation for Dual-reference Deep Learning of Image Super-Resolution

Deep Decoding of $\ell_\infty$-coded Light Field Images

Multi-modality Deep Restoration of Extremely Compressed Face Videos

Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques

Challenge of Spatial Cognition for Deep Learning

Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos

Lossless Compression of Mosaic Images with Convolutional Neural Network Prediction

Near-lossless $\ell_\infty$-constrained Image Decompression via Deep Neural Network

Rapid Whole Slide Imaging via Learning-based Two-shot Virtual Autofocusing

Ultra High Fidelity Image Compression with $\ell_\infty$-constrained Encoding and Deep Decoding

Automated Inference on Sociopsychological Impressions of Attractive Female Faces

Quality Adaptive Low-Rank Based JPEG Decoding with Applications

Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG Images

Coded Acquisition of High Frame Rate Video

Temporal Psychovisual Modulation: a new paradigm of information display

Image Deblurring and Super-resolution by Adaptive Sparse Domain Selection and Adaptive Regularization