Source author record

Xin Feng

Xin Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Artificial Intelligence, Biomolecules, eess.IV, Machine Learning

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

preprint, 2022, arXiv

Generative Memory-Guided Semantic Reasoning Model for Image Inpainting

Most existing methods for image inpainting focus on learning the intra-image priors from the known regions of the current input image to infer the content of the corrupted regions in the same image. While such methods perform well on images with small corrupted regions, they struggle with images that have large corrupted areas, owing to two potential limitations: 1) such methods tend to overfit to each individual training pair of images, since they rely solely on the intra-image prior knowledge learned from the limited known area; 2) the inter-image prior knowledge about the general distribution patterns of visual semantics, which can be transferred across images sharing similar semantics, is not exploited. In this paper, we propose the Generative Memory-Guided Semantic Reasoning Model (GM-SRM), which not only learns the intra-image priors from the known regions, but also distills the inter-image reasoning priors to infer the content of the corrupted regions. In particular, the proposed GM-SRM first pre-learns a generative memory from the whole training data to capture the semantic distribution patterns in a global view. The learned memory is then leveraged to retrieve the matching inter-image priors for the current corrupted image to perform semantic reasoning during image inpainting. While the intra-image priors guarantee pixel-level content consistency, the inter-image priors support high-level semantic reasoning, which is particularly effective for inferring semantic content in large corrupted areas. Extensive experiments on the Paris Street View, CelebA-HQ, and Places2 benchmarks demonstrate that our GM-SRM outperforms state-of-the-art methods for image inpainting in terms of both visual quality and quantitative metrics.
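The retrieval step described in this abstract can be illustrated with a generic soft memory lookup. This is a minimal sketch, not the GM-SRM mechanism: the abstract does not specify how the generative memory is stored or queried, so the cosine-similarity scoring, the `temperature` parameter, and the function names `cosine` and `retrieve_prior` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_prior(query, memory, temperature=0.1):
    """Blend memory slots with softmax weights given by similarity to the query.

    `memory` is a hypothetical list of slot vectors; in a real model these
    would be learned, which this sketch does not attempt.
    """
    scores = [cosine(query, slot) / temperature for slot in memory]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(memory[0])
    prior = [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]
    return prior, weights
```

A feature vector taken from the known region would serve as `query`; the returned blend stands in for the "matching inter-image prior" the abstract mentions.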

preprint, 2022, arXiv

Learning Generalizable Latent Representations for Novel Degradations in Super Resolution

Typical methods for blind image super-resolution (SR) focus on dealing with unknown degradations by directly estimating them or learning the degradation representations in a latent space. A potential limitation of these methods is that they assume the unknown degradations can be simulated by the integration of various handcrafted degradations (e.g., bicubic downsampling), which is not necessarily true. Real-world degradations can fall outside the scope that handcrafted degradations can simulate; we refer to these as novel degradations. In this work, we propose to learn a latent representation space for degradations, which can be generalized from handcrafted (base) degradations to novel degradations. The obtained representations for a novel degradation in this latent space are then leveraged to generate degraded images consistent with the novel degradation to compose paired training data for the SR model. Furthermore, we perform variational inference to match the posterior of degradations in the latent representation space with a prior distribution (e.g., a Gaussian distribution). Consequently, we are able to sample more high-quality representations for a novel degradation to augment the training data for the SR model. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness and advantages of our method for blind super-resolution with novel degradations.
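Matching a posterior to a Gaussian prior is commonly done by penalizing a KL divergence term and sampling with the reparameterization trick. The sketch below shows that standard construction for a diagonal Gaussian posterior and a standard normal prior; whether the paper uses exactly this form is an assumption, and the names `kl_to_standard_normal` and `sample_latent` are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def sample_latent(mu, log_var, rng):
    """Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


# Drawing several representations around one (made-up) degradation code.
rng = random.Random(0)
samples = [sample_latent([0.2, -0.1], [-1.0, -1.0], rng) for _ in range(4)]
```

Repeated calls to `sample_latent` correspond to the abstract's idea of sampling additional representations for a novel degradation to augment training data.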

preprint, 2022, arXiv

Learning Sequence Representations by Non-local Recurrent Neural Memory

The key challenge of sequence representation learning is to capture the long-range temporal dependencies. Typical methods for supervised sequence representation learning are built upon recurrent neural networks to capture temporal dependencies. One potential limitation of these methods is that they explicitly model only first-order information interactions between adjacent time steps in a sequence, so the high-order interactions between nonadjacent time steps are not fully exploited. This greatly limits the capability of modeling long-range temporal dependencies, since the temporal features learned by first-order interactions cannot be maintained for a long term due to temporal information dilution and gradient vanishing. To tackle this limitation, we propose the Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning, which performs non-local operations by means of a self-attention mechanism to learn full-order interactions within a sliding temporal memory block and models global interactions between memory blocks in a gated recurrent manner. Consequently, our model is able to capture long-range dependencies. In addition, the latent high-level features contained in high-order interactions can be distilled by our model. We validate the effectiveness and generalization of our NRNM on three types of sequence applications across different modalities, including sequence classification, step-wise sequential prediction and sequence similarity learning. Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
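The "full-order interactions within a sliding temporal memory block" can be pictured as self-attention applied to each window of time steps. The following is only a sketch of that one ingredient under simplifying assumptions: it uses identity query/key/value projections and omits the gated recurrent update between blocks entirely. The helper names `sliding_blocks` and `self_attention` are hypothetical.

```python
import math


def sliding_blocks(sequence, size, stride):
    """Split a sequence of feature vectors into overlapping windows."""
    return [
        sequence[start:start + size]
        for start in range(0, len(sequence) - size + 1, stride)
    ]


def self_attention(block):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Every time step attends to every other step in the block, so all
    pairwise interactions inside the window are modeled at once.
    """
    dim = len(block[0])
    outputs = []
    for query in block:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in block
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * value[i] for w, value in zip(weights, block))
            for i in range(dim)
        ])
    return outputs
```

In a full model the attended block would then update a memory state through learned gates, which is the part left out here.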

preprint, 2022, arXiv

ViT-P: Rethinking Data-efficient Vision Transformers from Locality

Recent advances in Transformers have brought new trust to computer vision tasks. However, on small datasets, Transformers are hard to train and perform worse than convolutional neural networks. We make vision transformers as data-efficient as convolutional neural networks by introducing a multi-focal attention bias. Inspired by the attention distance in a well-trained ViT, we constrain the self-attention of ViT to have a multi-scale localized receptive field. The size of the receptive field is adaptable during training so that the optimal configuration can be learned. We provide empirical evidence that properly constraining the receptive field can reduce the amount of training data needed for vision transformers. On CIFAR-100, our ViT-P Base model achieves state-of-the-art accuracy (83.16%) when trained from scratch. We also perform analysis on ImageNet to show that our method does not lose accuracy on large datasets.
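A localized receptive field for self-attention can be sketched as a mask over pairs of image patches, with a different radius per attention head to give several focal scales. This is an illustrative simplification: the abstract describes a bias whose receptive-field size adapts during training, whereas the sketch uses fixed hard radii. The Chebyshev distance and the names `local_attention_mask` and `multi_focal_masks` are assumptions.

```python
def local_attention_mask(height, width, radius):
    """Boolean mask over patch pairs on a height x width grid.

    mask[i][j] is True when patch j lies within `radius` of patch i,
    measured by Chebyshev distance (a square neighborhood).
    """
    coords = [(row, col) for row in range(height) for col in range(width)]
    return [
        [max(abs(r1 - r2), abs(c1 - c2)) <= radius for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def multi_focal_masks(height, width, radii):
    """One mask per head, each with its own receptive-field radius."""
    return [local_attention_mask(height, width, r) for r in radii]


# Three heads on a 3x3 patch grid: self only, 3x3 neighborhood, full grid.
masks = multi_focal_masks(3, 3, [0, 1, 2])
```

In an attention layer, positions where the mask is False would receive a large negative bias on their logits before the softmax, so each head attends only within its own neighborhood.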

preprint, 2014, arXiv

Persistent Homology for The Quantitative Prediction of Fullerene Stability

Persistent homology is a relatively new tool often used for qualitative analysis of intrinsic topological features in images and data originating from scientific and engineering applications. In this paper, we report novel quantitative predictions of the energy and stability of fullerene molecules, the first attempt to employ persistent homology in this context. The ground-state structures of a series of small fullerene molecules are first investigated with the standard Vietoris-Rips complex. We decipher all the barcodes, including both short-lived local bars and long-lived global bars arising from topological invariants, and associate them with fullerene structural details. By using accumulated bar lengths, we build quantitative models to correlate local and global Betti-2 bars respectively with the heat of formation and total curvature energies of fullerenes. It is found that the heat of formation energy is related to the local hexagonal cavities of small fullerenes, while the total curvature energies of fullerene isomers are associated with their sphericities, which are measured by the lengths of their long-lived Betti-2 bars. Excellent correlation coefficients (>0.94) between persistent homology predictions and those of quantum or curvature analysis have been observed. A correlation-matrix-based filtration is introduced to further verify our findings.
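The modeling step in this abstract, accumulating bar lengths and correlating them with an energy, can be sketched in a few lines. The barcodes themselves would come from a persistent homology computation that is not shown here. All numbers below are made-up toy values, not data from the paper, and the function names are hypothetical.

```python
import math


def accumulated_bar_length(bars):
    """Sum of (death - birth) over the bars of one barcode."""
    return sum(death - birth for birth, death in bars)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy Betti-2 barcodes for three hypothetical molecules (illustrative only).
toy_barcodes = [
    [(1.0, 1.5)],
    [(1.0, 1.8), (1.2, 1.6)],
    [(1.0, 2.4), (1.1, 1.9)],
]
toy_energies = [10.0, 14.0, 21.0]  # invented values, not measurements

features = [accumulated_bar_length(bars) for bars in toy_barcodes]
r = pearson(features, toy_energies)
```

With real barcodes and reference energies in place of the toy values, `r` is the kind of correlation coefficient the abstract reports.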
