Source author record

Yuyao Huang

Yuyao Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Computation and Language; Computer Vision; eess.SP; Machine Learning

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


Preprint, 2025, arXiv

Training Report of TeleChat3-MoE

TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, trained end-to-end on an Ascend NPU cluster. This technical report mainly presents the underlying training infrastructure that enables reliable and efficient scaling to frontier model sizes. We detail systematic methodologies for operator-level and end-to-end numerical accuracy verification, ensuring consistency across hardware platforms and distributed parallelism strategies. Furthermore, we introduce a suite of performance optimizations, including interleaved pipeline scheduling, attention-aware data scheduling for long-sequence training, hierarchical and overlapped communication for expert parallelism, and DVM-based operator fusion. A systematic parallelization framework, leveraging analytical estimation and integer linear programming, is also proposed to optimize multi-dimensional parallelism configurations. Additionally, we present methodological approaches to cluster-level optimizations, addressing host- and device-bound bottlenecks during large-scale training tasks. These infrastructure advancements yield significant throughput improvements and near-linear scaling on clusters comprising thousands of devices, providing a robust foundation for large-scale language model development on hardware ecosystems.

Preprint, 2020, arXiv

DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction

For both driving safety and efficiency, automated vehicles should be able to predict the behavior of surrounding traffic participants in a complex dynamic environment. Trajectory prediction is the key to accomplishing this task. Although many researchers have worked on this topic, it remains challenging. One important and inherent factor is the multi-modality of vehicle motion: because driving behaviors differ under the same conditions, vehicle trajectory prediction should also be multi-modal. Existing approaches to multi-modal trajectory prediction have shortcomings, such as requiring explicit modal labels or multiple forward propagations caused by sampling. In this work, we focus on overcoming these issues by pointing out the dual levels of multi-modal characteristics in vehicle motion and proposing the dual-level stochastic multiple choice learning method (DsMCL for short). This method does not require modal labels and can produce a comprehensive probabilistic multi-modal trajectory prediction in a single forward propagation. In experiments on the NGSIM and HighD datasets, our method yields significant improvements on several trajectory prediction frameworks and achieves state-of-the-art performance.

Preprint, 2020, arXiv

SPFCN: Select and Prune the Fully Convolutional Networks for Real-time Parking Slot Detection

For vehicles equipped with an automatic parking system, the accuracy and speed of parking slot detection are crucial. However, high accuracy is usually obtained at the price of low speed or expensive computation equipment, both of which are sensitive issues for many car manufacturers. In this paper, we propose a detector based on a convolutional neural network (CNN) that achieves faster speed and a smaller model size while maintaining accuracy. To achieve the optimal balance, we developed a strategy to select the best receptive fields and prune the redundant channels automatically after each training epoch. The proposed model is capable of jointly detecting corners and line features of parking slots while running efficiently in real time on average processors. The model has a frame rate of about 30 FPS on a 2.3 GHz CPU core, yielding a parking slot corner localization error of 1.51 ± 2.14 cm (std. err.) and a slot detection accuracy of 98%, generally satisfying the speed and accuracy requirements of on-board mobile terminals.
