Source author record

Yongwei Nie

Yongwei Nie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision

Catalog footprint

What is connected

4 works

1 topic

4 close collaborators

Preprint, 2022, arXiv

Diverse Human Motion Prediction via Gumbel-Softmax Sampling from an Auxiliary Space

Diverse human motion prediction aims to predict multiple possible future pose sequences from a sequence of observed poses. Previous approaches usually employ deep generative networks to model the conditional distribution of the data and then randomly sample outcomes from that distribution. Although different results can be obtained, they are usually the most likely ones and are not diverse enough. Recent work explicitly learns multiple modes of the conditional distribution via a deterministic network, which, however, can only cover a fixed number of modes within a limited range. In this paper, we propose a novel strategy for sampling very diverse results from an imbalanced multimodal distribution learned by a deep generative model. Our method generates an auxiliary space and makes random sampling from that space equivalent to diverse sampling from the target distribution. We propose a simple yet effective network architecture that implements this sampling strategy; it incorporates a Gumbel-Softmax coefficient matrix sampling method and an aggressive diversity-promoting hinge loss function. Extensive experiments demonstrate that our method significantly improves both the diversity and the accuracy of the samples compared with previous state-of-the-art sampling approaches. Code and pre-trained models are available at https://github.com/Droliven/diverse_sampling.
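The two ingredients named in this abstract can be illustrated in isolation. The following is a minimal sketch, not the authors' architecture: `gumbel_softmax` is the standard Gumbel-Softmax relaxation applied to a single row of logits, and `diversity_hinge` is a generic pairwise hinge penalty. The function names, the Euclidean distance, and the `margin` and `tau` parameters are assumptions made for illustration; the abstract does not specify the exact loss or how the coefficient matrix is used.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return one relaxed one-hot sample (non-negative weights summing to 1)."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def diversity_hinge(samples, margin):
    """Average hinge penalty over all pairs of samples closer than `margin`.

    Assumption: Euclidean distance between flattened samples; the paper's
    actual diversity loss may be defined differently.
    """
    loss, pairs = 0.0, 0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            distance = math.dist(samples[i], samples[j])
            loss += max(0.0, margin - distance)
            pairs += 1
    return loss / pairs if pairs else 0.0
```

Lower values of `tau` push the sampled weights toward a one-hot vector, while the hinge term is zero once every pair of samples is at least `margin` apart.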

Preprint, 2022, arXiv

Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction

This paper presents a high-quality human motion prediction method that accurately predicts future human poses given observed ones. Our method is based on the observation that a good initial guess of the future poses is very helpful in improving forecasting accuracy. This motivates us to propose a novel two-stage prediction framework, consisting of an init-prediction network that computes the initial guess and a formal-prediction network that predicts the target future poses based on that guess. More importantly, we extend this idea and design a multi-stage prediction framework in which each stage predicts an initial guess for the next stage, which brings further performance gains. To fulfill the prediction task at each stage, we propose a network comprising Spatial Dense Graph Convolutional Networks (S-DGCN) and Temporal Dense Graph Convolutional Networks (T-DGCN). Alternately executing the two networks helps extract spatiotemporal features over the global receptive field of the whole pose sequence. Together, these design choices make our method outperform previous approaches by large margins: 6%-7% on Human3.6M, 5%-10% on CMU-MoCap, and 13%-16% on 3DPW.
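The multi-stage idea, where each stage's output becomes the initial guess for the next stage, can be sketched as a simple chain of refinement functions. This is only an illustration of the control flow under stated assumptions: the first guess is taken to be the last observed pose repeated over the horizon, and `toy_velocity_stage` is a hypothetical stand-in for a learned S-DGCN/T-DGCN stage, which the abstract does not describe in enough detail to reproduce.

```python
from typing import Callable, List, Sequence

Pose = List[float]
Stage = Callable[[Sequence[Pose], List[Pose]], List[Pose]]


def repeat_last_pose(observed: Sequence[Pose], horizon: int) -> List[Pose]:
    """Assumed starting guess: copy the last observed pose for every future frame."""
    return [list(observed[-1]) for _ in range(horizon)]


def predict_multi_stage(
    observed: Sequence[Pose], horizon: int, stages: Sequence[Stage]
) -> List[Pose]:
    """Run the stages in order; each one refines the guess left by the previous one."""
    guess = repeat_last_pose(observed, horizon)
    for stage in stages:
        guess = stage(observed, guess)
    return guess


def toy_velocity_stage(
    observed: Sequence[Pose], guess: List[Pose], step: float = 0.5
) -> List[Pose]:
    """Hypothetical stage: move the guess part-way toward constant-velocity extrapolation."""
    last, prev = observed[-1], observed[-2]
    refined = []
    for t, pose in enumerate(guess, start=1):
        target = [l + t * (l - p) for l, p in zip(last, prev)]
        refined.append([g + step * (x - g) for g, x in zip(pose, target)])
    return refined
```

For example, `predict_multi_stage([[0.0], [1.0]], 2, [toy_velocity_stage, toy_velocity_stage])` applies the toy stage twice, and the second pass starts from the first pass's output rather than from the repeated last pose.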

Preprint, 2020, arXiv

Enhancing Underexposed Photos using Perceptually Bidirectional Similarity

Although remarkable progress has been made, existing methods for enhancing underexposed photos tend to produce visually unpleasing results because of artifacts such as color distortion, loss of detail, and uneven exposure. We observe that this is because they fail to ensure the perceptual consistency of visual information between the source underexposed image and its enhanced output. To obtain high-quality results free of these artifacts, we present a novel underexposed photo enhancement approach that maintains this perceptual consistency. We achieve this by proposing an effective criterion, referred to as perceptually bidirectional similarity, which explicitly describes how to ensure perceptual consistency. In particular, we adopt the Retinex theory and cast the enhancement problem as a constrained illumination estimation optimization, where we formulate perceptually bidirectional similarity as constraints on the illumination and solve for the illumination that recovers the desired artifact-free enhancement result. In addition, we describe a video enhancement framework that adopts the presented illumination estimation for handling underexposed videos. To this end, a probabilistic approach is introduced to propagate the illuminations of sampled keyframes to the entire video by solving a Bayesian maximum a posteriori problem. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods.
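The Retinex model behind this approach treats an image as the pixel-wise product of reflectance and illumination, so dividing by an estimated illumination brightens dark regions. The sketch below shows only that recovery step, with a naive illumination estimate (the per-pixel maximum over the RGB channels) and a `gamma` exponent that softens the correction. Both choices, the function names, and the assumption that pixel values lie in [0, 1] are illustrative; the constrained optimization with perceptually bidirectional similarity described in the abstract is not reproduced here.

```python
def estimate_illumination(image):
    """Naive illumination map: the brightest RGB channel at each pixel."""
    return [[max(pixel) for pixel in row] for row in image]


def enhance(image, gamma=0.6, eps=1e-3):
    """Divide each pixel by illumination**gamma and clip the result to [0, 1].

    `image` is a list of rows, each row a list of (r, g, b) tuples in [0, 1].
    """
    illumination = estimate_illumination(image)
    enhanced = []
    for row, light_row in zip(image, illumination):
        out_row = []
        for pixel, light in zip(row, light_row):
            # eps keeps the divisor positive for completely black pixels.
            scale = max(light, eps) ** gamma
            out_row.append(tuple(min(1.0, c / scale) for c in pixel))
        enhanced.append(out_row)
    return enhanced
```

With `gamma=1.0` the output is the raw reflectance estimate, which saturates the brightest channel of every pixel; smaller values of `gamma` apply a gentler correction.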

Preprint, 2020, arXiv

Understanding More about Human and Machine Attention in Deep Neural Networks

The human visual system can selectively attend to parts of a scene for quick perception, a biological mechanism known as human attention. Inspired by this, recent deep learning models encode attention mechanisms to focus on the most task-relevant parts of the input signal for further processing, which is called machine, neural, or artificial attention. Understanding the relation between human and machine attention is important for interpreting and designing neural networks. Many works claim that the attention mechanism offers an extra dimension of interpretability by explaining where neural networks look. However, recent studies demonstrate that artificial attention maps do not always coincide with common intuition. In view of this conflicting evidence, we present a systematic study on using artificial attention and human attention in neural network design. Using three example computer vision tasks, diverse representative backbones, well-known architectures, corresponding real human gaze data, and systematically conducted large-scale quantitative studies, we quantify the consistency between artificial attention and human visual attention. We also offer novel insights into existing artificial attention mechanisms by giving preliminary answers to several key questions about human and artificial attention. Overall, the results demonstrate that human attention can serve as a meaningful "ground truth" benchmark in attention-driven tasks, where the closer the artificial attention is to human attention, the better the performance; for higher-level vision tasks, the relationship varies case by case. For attention-driven tasks, it would be advisable to explicitly enforce a better alignment between artificial and human attention to boost performance; such alignment would also improve network explainability for higher-level computer vision tasks.
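One common way to quantify how closely an artificial attention map matches a human gaze map is the Pearson correlation coefficient between the two maps. The abstract does not say which consistency metrics the study uses, so the choice of this metric and the function name `pearson_cc` are assumptions; the sketch only shows what such a comparison could look like for two equally sized 2D maps.

```python
import math


def pearson_cc(map_a, map_b):
    """Pearson correlation between two 2D maps given as lists of rows.

    Returns a value in [-1, 1], or 0.0 when either map is constant.
    """
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("maps must be non-empty and have the same number of cells")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0
```

A score near 1 means the model attends to the same regions that people look at, while a score near 0 indicates no linear relationship between the two maps.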
