Source author record

Haoming Chen

Haoming Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Computation and Language Distributed, Parallel, and Cluster Computing Graphics

Catalog footprint

What is connected

5works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Aurora:Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we successfully construct the Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora." To assess the performance of Aurora, we utilize three widely recognized benchmark tests: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to Mixtral-8x7B sparse Mixture-of-Experts model. This work is pioneering in the execution of instruction fine-tuning on a sparse expert-mixed model, marking a significant breakthrough in enhancing the capabilities of this model architecture. Our code, data and model are publicly available at https://github.com/WangRongsheng/Aurora

preprint2022arXiv

2D Human Pose Estimation: A Survey

Human pose estimation aims at localizing human anatomical keypoints or body parts in the input data (e.g., images, videos, or signals). It forms a crucial component in enabling machines to have an insightful understanding of the behaviors of humans, and has become a salient problem in computer vision and related fields. Deep learning techniques allow learning feature representations directly from the data, significantly pushing the performance boundary of human pose estimation. In this paper, we reap the recent achievements of 2D human pose estimation methods and present a comprehensive survey. Briefly, existing approaches put their efforts in three directions, namely network architecture design, network training refinement, and post processing. Network architecture design looks at the architecture of human pose estimation models, extracting more robust features for keypoint recognition and localization. Network training refinement tap into the training of neural networks and aims to improve the representational ability of models. Post processing further incorporates model-agnostic polishing strategies to improve the performance of keypoint detection. More than 200 research contributions are involved in this survey, covering methodological frameworks, common benchmark datasets, evaluation metrics, and performance comparisons. We seek to provide researchers with a more comprehensive and systematic review on human pose estimation, allowing them to acquire a grand panorama and better identify future directions.

preprint2022arXiv

A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning

Recently, Deep Neural Networks (DNNs) have recorded great success in handling medical and other complex classification tasks. However, as the sizes of a DNN model and the available dataset increase, the training process becomes more complex and computationally intensive, which usually takes a longer time to complete. In this work, we have proposed a generic full end-to-end hybrid parallelization approach combining both model and data parallelism for efficiently distributed and scalable training of DNN models. We have also proposed a Genetic Algorithm based heuristic resources allocation mechanism (GABRA) for optimal distribution of partitions on the available GPUs for computing performance optimization. We have applied our proposed approach to a real use case based on 3D Residual Attention Deep Neural Network (3D-ResAttNet) for efficient Alzheimer Disease (AD) diagnosis on multiple GPUs. The experimental evaluation shows that the proposed approach is efficient and scalable, which achieves almost linear speedup with little or no differences in accuracy performance when compared with the existing non-parallel DNN models.

preprint2022arXiv

Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation

Multi-frame human pose estimation has long been a compelling and fundamental problem in computer vision. This task is challenging due to fast motion and pose occlusion that frequently occur in videos. State-of-the-art methods strive to incorporate additional visual evidences from neighboring frames (supporting frames) to facilitate the pose estimation of the current frame (key frame). One aspect that has been obviated so far, is the fact that current methods directly aggregate unaligned contexts across frames. The spatial-misalignment between pose features of the current frame and neighboring frames might lead to unsatisfactory results. More importantly, existing approaches build upon the straightforward pose estimation loss, which unfortunately cannot constrain the network to fully leverage useful information from neighboring frames. To tackle these problems, we present a novel hierarchical alignment framework, which leverages coarse-to-fine deformations to progressively update a neighboring frame to align with the current frame at the feature level. We further propose to explicitly supervise the knowledge extraction from neighboring frames, guaranteeing that useful complementary cues are extracted. To achieve this goal, we theoretically analyzed the mutual information between the frames and arrived at a loss that maximizes the task-relevant mutual information. These allow us to rank No.1 in the Multi-frame Person Pose Estimation Challenge on benchmark dataset PoseTrack2017, and obtain state-of-the-art performance on benchmarks Sub-JHMDB and Pose-Track2018. Our code is released at https://github. com/Pose-Group/FAMI-Pose, hoping that it will be useful to the community.

preprint2015arXiv

On Intra Prediction for Screen Content Video Coding

Screen content coding (SCC) is becoming increasingly important in various applications, such as desktop sharing, video conferencing, and remote education. When compared to natural camera- captured content, screen content has different characteristics, in particular sharper edges. In this paper, we propose a novel intra prediction scheme for screen content video. In the proposed scheme, bilinear interpolation in angular intra prediction in HEVC is selectively replaced by nearest-neighbor intra prediction to preserve the sharp edges in screen content video. We present three different variants of the proposed nearest neighbor prediction algorithm: two implicit methods where both the encoder, and the decoder derive whether to perform nearest neighbor prediction or not based on either (a) the sum of the absolute difference, or (b) the difference between the boundary pixels from which prediction is performed; and another variant where Rate-Distortion-Optimization (RDO) search is performed at the encoder to decide whether or not to use the nearest neighbor interpolation, and explicitly signaled to the decoder. We also discuss the various underlying trade-offs in terms of the complexity of the three variants. All the three proposed variants provide significant gains over HEVC, and simulation results show that average gains of 3.3% BD-bitrate in Intra-frame coding are achieved by the RDO variant for screen content video. To the best of our knowledge, this is the first paper that 1) points out current HEVC intra prediction scheme with bilinear interpolation does not work efficiently for screen content video and 2) uses different filters adaptively in the HEVC intra prediction interpolation.

Haoming Chen

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Aurora:Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

2D Human Pose Estimation: A Survey

A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning

Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation

On Intra Prediction for Screen Content Video Coding