Source author record

Ming-Ting Sun

Ming-Ting Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Multimedia, Artificial Intelligence, Computation and Language, Machine Learning

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators

Preprint, 2020, arXiv

Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation

Supervised deep learning with pixel-wise training labels has achieved great success on multi-person part segmentation. However, pixel-level data labeling is very expensive. To address this problem, researchers have explored using synthetic data to avoid manual labeling. Although it is easy to generate labels for synthetic data, the results are much worse than those obtained with real data and manual labeling. The performance degradation is mainly due to the domain gap, i.e., the discrepancy in pixel value statistics between real and synthetic data. In this paper, we observe that real and synthetic humans both have a skeleton (pose) representation, and we find that skeletons can effectively bridge the synthetic and real domains during training. Our proposed approach takes advantage of the rich and realistic variations of the real data and the easily obtainable labels of the synthetic data to learn multi-person part segmentation on real images without any human-annotated labels. Through experiments, we show that without any human labeling, our method performs comparably to several state-of-the-art approaches that require human labeling on the Pascal-Person-Parts and COCO-DensePose datasets. On the other hand, if part labels are also available for the real images during training, our method outperforms the supervised state-of-the-art methods by a large margin. We further demonstrate the generalizability of our method by predicting novel keypoints in real images where no real data labels are available for novel keypoint detection. Code and pre-trained models are available at https://github.com/kevinlin311tw/CDCL-human-part-segmentation

Preprint, 2020, arXiv

Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes

Nonparametric approaches have shown promising results on reconstructing a 3D human mesh from a single monocular image. Unlike previous approaches that use a parametric human model such as the skinned multi-person linear model (SMPL) and attempt to regress the model parameters, nonparametric approaches relax the heavy reliance on the parametric space. However, existing nonparametric methods require ground truth meshes as their regression target for each vertex, and obtaining ground truth mesh labels is very expensive. In this paper, we propose a novel approach to learn human mesh reconstruction without any ground truth meshes. This is made possible by introducing two new terms into the loss function of a graph convolutional neural network (Graph CNN). The first term is the Laplacian prior, which acts as a regularizer on the reconstructed mesh. The second term is the part segmentation loss, which forces the projected region of the reconstructed mesh to match the part segmentation. Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.
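
As a rough illustration of what a Laplacian regularizer on a mesh can look like, the sketch below penalizes each vertex for its squared distance from the centroid of its neighbors. This is a minimal example assuming uniform (unweighted) Laplacian coordinates; the abstract does not state which Laplacian formulation the paper uses, and the names `build_neighbors` and `laplacian_loss` are hypothetical. In a training loop the same quantity would be computed on tensors so that gradients can flow to the predicted vertices.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def build_neighbors(num_vertices: int, faces: Sequence[Face]) -> Dict[int, Set[int]]:
    """Collect, for every vertex, the vertices it shares a triangle edge with."""
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(num_vertices)}
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors


def laplacian_loss(vertices: Sequence[Vertex], faces: Sequence[Face]) -> float:
    """Mean squared distance between each vertex and its neighbor centroid.

    Assumption: uniform weights. A smooth surface gives a small value,
    while an isolated spike gives a larger one.
    """
    neighbors = build_neighbors(len(vertices), faces)
    total = 0.0
    for i, v in enumerate(vertices):
        nbrs = neighbors[i]
        if not nbrs:
            continue  # isolated vertex: nothing to compare against
        centroid: List[float] = [
            sum(vertices[j][k] for j in nbrs) / len(nbrs) for k in range(3)
        ]
        total += sum((v[k] - centroid[k]) ** 2 for k in range(3))
    return total / len(vertices)


# Toy mesh: a unit square split into four triangles around a center vertex.
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
flat = corners + [(0.5, 0.5, 0.0)]    # center lies in the plane
spiked = corners + [(0.5, 0.5, 1.0)]  # center pulled out of the plane

flat_loss = laplacian_loss(flat, faces)
spiked_loss = laplacian_loss(spiked, faces)
```

On this toy mesh the spiked configuration receives a higher penalty than the flat one, which is the behavior a smoothness regularizer is meant to provide when no ground truth mesh is available to supervise individual vertices.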

Preprint, 2020, arXiv

Learning to Generate Multiple Style Transfer Outputs for an Input Sentence

Text style transfer refers to the task of rephrasing a given text in a different style. While various methods have been proposed to advance the state of the art, they often assume the transfer output follows a delta distribution, and thus their models cannot generate different style transfer results for a given input text. To address this limitation, we propose a one-to-many text style transfer framework. In contrast to prior works that learn a one-to-one mapping that converts an input sentence to one output sentence, our approach learns a one-to-many mapping that can convert an input sentence to multiple different output sentences while preserving the input content. This is achieved by applying adversarial training with a latent decomposition scheme. Specifically, we decompose the latent representation of the input sentence into a style code that captures the language style variation and a content code that encodes the language style-independent content. We then combine the content code with the style code to generate a style transfer output. By combining the same content code with a different style code, we generate a different style transfer output. Extensive experimental results with comparisons to several text style transfer approaches on multiple public datasets using a diverse set of performance metrics validate the effectiveness of the proposed approach.

Preprint, 2016, arXiv

Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive, and little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels.

Preprint, 2015, arXiv

A Computation Control Motion Estimation Method for Complexity-Scalable Video Coding

In this paper, a new Computation-Control Motion Estimation (CCME) method is proposed which can perform Motion Estimation (ME) adaptively under different computation or power budgets while keeping high coding performance. We first propose a new class-based method to measure the Macroblock (MB) importance where MBs are classified into different classes and their importance is measured by combining their class information as well as their initial matching cost information. Based on the new MB importance measure, a complete CCME framework is then proposed to allocate computation for ME. The proposed method performs ME in a one-pass flow. Experimental results demonstrate that the proposed method can allocate computation more accurately than previous methods and thus has better performance under the same computation budget.

Preprint, 2015, arXiv

A Fast Sub-Pixel Motion Estimation Algorithm for H.264/AVC Video Coding

Motion Estimation (ME) is one of the most time-consuming parts of video coding. The use of multiple partition sizes in H.264/AVC makes it even more complicated than ME in conventional video coding standards. It is important to develop fast and effective sub-pixel ME algorithms since (a) the computation overhead of sub-pixel ME has become relatively significant while the complexity of integer-pixel search has been greatly reduced by fast algorithms, and (b) reducing sub-pixel search points can greatly reduce the computation for sub-pixel interpolation. In this paper, a novel fast sub-pixel ME algorithm is proposed which performs a 'rough' sub-pixel search before the partition selection, and performs a 'precise' sub-pixel search for the best partition. By reducing the search load for the large number of non-best partitions, the computation complexity of sub-pixel search can be greatly decreased. Experimental results show that our method can reduce the sub-pixel search points by more than 50% compared to existing fast sub-pixel ME methods with negligible quality degradation.
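
The control flow described above, a rough sub-pixel search for every partition followed by a precise search only for the winning partition, can be sketched on toy data as follows. Everything concrete here is an assumption made for illustration: the abstract does not say how the rough and precise searches are defined, so this sketch uses a half-pel grid for the rough stage and a quarter-pel grid for the precise stage, and the cost functions are made-up stand-ins for a real matching cost. The names `two_stage_search`, `best_offset` and `toy_costs` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Offset = Tuple[float, float]
CostFn = Callable[[Offset], float]

# Assumed rough stage: the 3x3 half-pel grid around the integer-pel position.
HALF_PEL: List[Offset] = [
    (dx * 0.5, dy * 0.5) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
]


def quarter_pel_around(center: Offset) -> List[Offset]:
    """Assumed precise stage: the 3x3 quarter-pel grid around a given offset."""
    cx, cy = center
    return [(cx + dx * 0.25, cy + dy * 0.25) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def best_offset(cost: CostFn, candidates: List[Offset]) -> Tuple[float, Offset]:
    """Evaluate every candidate once and return (lowest cost, its offset)."""
    return min((cost(c), c) for c in candidates)


def two_stage_search(
    partition_costs: Dict[str, CostFn],
) -> Tuple[str, Offset, float, int]:
    """Return (best partition, refined offset, refined cost, cost evaluations)."""
    evaluations = 0
    rough: Dict[str, Tuple[float, Offset]] = {}
    for name, cost in partition_costs.items():
        rough[name] = best_offset(cost, HALF_PEL)  # rough search for all partitions
        evaluations += len(HALF_PEL)

    # Partition selection uses only the rough results.
    best_name = min(rough, key=lambda name: rough[name][0])

    # The precise search runs for the selected partition only.
    candidates = quarter_pel_around(rough[best_name][1])
    refined_cost, refined_offset = best_offset(partition_costs[best_name], candidates)
    evaluations += len(candidates)
    return best_name, refined_offset, refined_cost, evaluations


# Made-up quadratic costs standing in for a real block-matching cost.
toy_costs: Dict[str, CostFn] = {
    "16x16": lambda o: 10.0 + (o[0] - 0.25) ** 2 + o[1] ** 2,
    "8x8": lambda o: 5.0 + (o[0] - 0.75) ** 2 + (o[1] + 0.5) ** 2,
}

partition, offset, cost, evaluations = two_stage_search(toy_costs)
```

In this toy setup the precise stage is skipped for the partition that loses the selection, so 27 cost evaluations are spent instead of the 36 that would be needed if both stages ran for both partitions. The saving grows with the number of candidate partitions, which is the effect the abstract attributes to reducing the search load for non-best partitions.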

Preprint, 2015, arXiv

Activity Recognition Using A Combination of Category Components And Local Models for Video Surveillance

This paper presents a novel approach for automatic recognition of human activities for video surveillance applications. We propose to represent an activity by a combination of category components, and demonstrate that this approach offers the flexibility to add new activities to the system and the ability to deal with the problem of building models for activities lacking training data. To improve the recognition accuracy, a Confident-Frame-based Recognition algorithm is also proposed, where the video frames with high confidence for recognizing an activity are used as a specialized local model to help classify the remainder of the video frames. Experimental results show the effectiveness of the proposed approach.

Preprint, 2015, arXiv

Group Event Detection with a Varying Number of Group Members for Video Surveillance

This paper presents a novel approach for automatic recognition of group activities for video surveillance applications. We propose to use a group representative to handle the recognition with a varying number of group members, and use an Asynchronous Hidden Markov Model (AHMM) to model the relationship between people. Furthermore, we propose a group activity detection algorithm which can handle both symmetric and asymmetric group activities, and demonstrate that this approach enables the detection of hierarchical interactions between people. Experimental results show the effectiveness of our approach.

Preprint, 2015, arXiv

Macroblock Classification Method for Video Applications Involving Motions

In this paper, a macroblock classification method is proposed for various video processing applications involving motion. Based on the analysis of the Motion Vector field in the compressed video, we propose to classify the Macroblocks of each video frame into different classes and use this class information to describe the frame content. We demonstrate that this low-computation-complexity method can efficiently capture the characteristics of the frame. Based on the proposed macroblock classification, we further propose algorithms for different video processing applications, including shot change detection, motion discontinuity detection, and outlier rejection for global motion estimation. Experimental results demonstrate that the methods based on the proposed approach can work effectively on these applications.

Preprint, 2015, arXiv

Region-Based Rate-Control for H.264/AVC for Low Bit-Rate Applications

Rate-control plays an important role in video coding. However, in conventional rate-control algorithms, the number and position of Macroblocks (MBs) inside one basic unit for rate-control are inflexible and predetermined. The different characteristics of the MBs are not fully considered. Also, there is no overall optimization of the coding of basic units. This paper proposes a new region-based rate-control scheme for H.264/AVC to improve the coding efficiency. The inter-frame information is explored to objectively divide one frame into multiple regions based on their rate-distortion behaviors. MBs with similar characteristics are classified into the same region, and the entire region, instead of a single MB or a group of contiguous MBs, is treated as a basic unit for rate-control. A linear rate-quantization stepsize model and a linear distortion-quantization stepsize model are proposed to accurately describe the rate-distortion characteristics of the region-based basic units. Moreover, based on these linear models, an overall optimization model is proposed to obtain suitable Quantization Parameters (QPs) for the region-based basic units. Experimental results demonstrate that the proposed region-based rate-control approach can achieve better subjective and objective quality than conventional rate-control approaches by performing the rate-control adaptively with the content.
