Source author record

Yunde Jia

Yunde Jia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Human-Computer Interaction Robotics

Catalog footprint

What is connected

7works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set

Recently, deep learning based 3D face reconstruction methods have shown promising results in both quality and efficiency.However, training deep neural networks typically requires a large volume of data, whereas face images with ground-truth 3D face shapes are scarce. In this paper, we propose a novel deep 3D face reconstruction approach that 1) leverages a robust, hybrid loss function for weakly-supervised learning which takes into account both low-level and perception-level information for supervision, and 2) performs multi-image face reconstruction by exploiting complementary information from different images for shape aggregation. Our method is fast, accurate, and robust to occlusion and large pose. We provide comprehensive experiments on three datasets, systematically comparing our method with fifteen recent methods and demonstrating its state-of-the-art performance.

preprint2020arXiv

Content-Aware Inter-Scale Cost Aggregation for Stereo Matching

Cost aggregation is a key component of stereo matching for high-quality depth estimation. Most methods use multi-scale processing to downsample cost volume for proper context information, but will cause loss of details when upsampling. In this paper, we present a content-aware inter-scale cost aggregation method that adaptively aggregates and upsamples the cost volume from coarse-scale to fine-scale by learning dynamic filter weights according to the content of the left and right views on the two scales. Our method achieves reliable detail recovery when upsampling through the aggregation of information across different scales. Furthermore, a novel decomposition strategy is proposed to efficiently construct the 3D filter weights and aggregate the 3D cost volume, which greatly reduces the computation cost. We first learn the 2D similarities via the feature maps on the two scales, and then build the 3D filter weights based on the 2D similarities from the left and right views. After that, we split the aggregation in a full 3D spatial-disparity space into the aggregation in 1D disparity space and 2D spatial space. Experiment results on Scene Flow dataset, KITTI2015 and Middlebury demonstrate the effectiveness of our method.

preprint2020arXiv

Deep 3D Portrait from a Single Image

In this paper, we present a learning-based approach for recovering the 3D geometry of human head from a single portrait image. Our method is learned in an unsupervised manner without any ground-truth 3D data. We represent the head geometry with a parametric 3D face model together with a depth map for other head regions including hair and ear. A two-step geometry learning scheme is proposed to learn 3D head reconstruction from in-the-wild face images, where we first learn face shape on single images using self-reconstruction and then learn hair and ear geometry using pairs of images in a stereo-matching fashion. The second step is based on the output of the first to not only improve the accuracy but also ensure the consistency of overall head geometry. We evaluate the accuracy of our method both in 3D and with pose manipulation tasks on 2D images. We alter pose based on the recovered geometry and apply a refinement network trained with adversarial learning to ameliorate the reprojected images and translate them to the real image domain. Extensive evaluations and comparison with previous methods show that our new method can produce high-fidelity 3D head geometry and head pose manipulation results.

preprint2020arXiv

Video Captioning Using Weak Annotation

Video captioning has shown impressive progress in recent years. One key reason of the performance improvements made by existing methods lie in massive paired video-sentence data, but collecting such strong annotation, i.e., high-quality sentences, is time-consuming and laborious. It is the fact that there now exist an amazing number of videos with weak annotation that only contains semantic concepts such as actions and objects. In this paper, we investigate using weak annotation instead of strong annotation to train a video captioning model. To this end, we propose a progressive visual reasoning method that progressively generates fine sentences from weak annotations by inferring more semantic concepts and their dependency relationships for video captioning. To model concept relationships, we use dependency trees that are spanned by exploiting external knowledge from large sentence corpora. Through traversing the dependency trees, the sentences are generated to train the captioning model. Accordingly, we develop an iterative refinement algorithm that refines sentences via spanning dependency trees and fine-tunes the captioning model using the refined sentences in an alternative training manner. Experimental results demonstrate that our method using weak annotation is very competitive to the state-of-the-art methods using strong annotation.

preprint2016arXiv

A Low-Cost Tele-Presence Wheelchair System

This paper presents the architecture and implementation of a tele-presence wheelchair system based on tele-presence robot, intelligent wheelchair, and touch screen technologies. The tele-presence wheelchair system consists of a commercial electric wheelchair, an add-on tele-presence interaction module, and a touchable live video image based user interface (called TIUI). The tele-presence interaction module is used to provide video-chatting for an elderly or disabled person with the family members or caregivers, and also captures the live video of an environment for tele-operation and semi-autonomous navigation. The user interface developed in our lab allows an operator to access the system anywhere and directly touch the live video image of the wheelchair to push it as if he/she did it in the presence. This paper also discusses the evaluation of the user experience.

preprint2016arXiv

Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration

The Iterative Closest Point (ICP) algorithm is one of the most widely used methods for point-set registration. However, being based on local iterative optimization, ICP is known to be susceptible to local minima. Its performance critically relies on the quality of the initialization and only local optimality is guaranteed. This paper presents the first globally optimal algorithm, named Go-ICP, for Euclidean (rigid) registration of two 3D point-sets under the L2 error metric defined in ICP. The Go-ICP method is based on a branch-and-bound (BnB) scheme that searches the entire 3D motion space SE(3). By exploiting the special structure of SE(3) geometry, we derive novel upper and lower bounds for the registration error function. Local ICP is integrated into the BnB scheme, which speeds up the new method while guaranteeing global optimality. We also discuss extensions, addressing the issue of outlier robustness. The evaluation demonstrates that the proposed method is able to produce reliable registration results regardless of the initialization. Go-ICP can be applied in scenarios where an optimal solution is desirable or where a good initialization is not always available.

preprint2016arXiv

Telepresence Interaction by Touching Live Video Images

This paper presents a telepresence interaction framework based on touchscreen and telepresence-robot technologies. The core of the framework is a new user interface, Touchable live video Image based User Interface, called TIUI. The TIUI allows a remote operator to not just drive the telepresence robot but operate and interact with real objects by touching their live video images on a pad with finger touch gestures. We implemented a telepresence interaction system which is composed of a telepresence robot and tele-interactive objects located in a local space, the TIUI of a pad located in a remote space, and the wireless networks connecting the two spaces. Our system can be a perfect embodiment of a remote operator to do most of daily living tasks, such as opening a door, drawing a curtain, pushing a wheelchair, and other like tasks. The evaluation and demonstration results show the effectiveness and promising applications of our system.

Yunde Jia

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set

Content-Aware Inter-Scale Cost Aggregation for Stereo Matching

Deep 3D Portrait from a Single Image

Video Captioning Using Weak Annotation

A Low-Cost Tele-Presence Wheelchair System

Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration

Telepresence Interaction by Touching Live Video Images