Researcher profile

Yuyao Huang

Yuyao Huang contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 15 (Unverified), Verification L1, Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2025, arXiv

Training Report of TeleChat3-MoE

TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, trained end-to-end on an Ascend NPU cluster. This technical report mainly presents the underlying training infrastructure that enables reliable and efficient scaling to frontier model sizes. We detail systematic methodologies for operator-level and end-to-end numerical accuracy verification, ensuring consistency across hardware platforms and distributed parallelism strategies. Furthermore, we introduce a suite of performance optimizations, including interleaved pipeline scheduling, attention-aware data scheduling for long-sequence training, hierarchical and overlapped communication for expert parallelism, and DVM-based operator fusion. A systematic parallelization framework, leveraging analytical estimation and integer linear programming, is also proposed to optimize multi-dimensional parallelism configurations. Additionally, we present methodological approaches to cluster-level optimizations, addressing host- and device-bound bottlenecks during large-scale training tasks. These infrastructure advancements yield significant throughput improvements and near-linear scaling on clusters comprising thousands of devices, providing a robust foundation for large-scale language model development on hardware ecosystems.

Preprint, 2020, arXiv

DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction

For both driving safety and efficiency, automated vehicles should be able to predict the behavior of surrounding traffic participants in a complex dynamic environment. Trajectory prediction is the key to this task. Although many researchers have worked on this topic, it remains challenging. One important and inherent factor is the multi-modality of vehicle motion: because driving behaviors differ under the same conditions, the prediction of vehicle trajectories should also be multi-modal. Existing work on multi-modal trajectory prediction has shortcomings, such as requiring explicit modal labels or needing multiple forward propagations because of sampling. In this work, we address these issues by identifying the dual levels of multi-modal characteristics in vehicle motion and proposing the dual-level stochastic multiple choice learning method (DsMCL for short). This method does not require modal labels and can produce a comprehensive probabilistic multi-modal trajectory prediction in a single forward propagation. In experiments on the NGSIM and HighD datasets, our method significantly improves several trajectory prediction frameworks and achieves state-of-the-art performance.

Preprint, 2020, arXiv

SPFCN: Select and Prune the Fully Convolutional Networks for Real-time Parking Slot Detection

For vehicles equipped with an automatic parking system, the accuracy and speed of parking slot detection are crucial. However, high accuracy usually comes at the price of low speed or expensive computation equipment, both of which are sensitive issues for many car manufacturers. In this paper, we propose a detector based on convolutional neural networks (CNNs) that achieves faster speed and a smaller model size while maintaining accuracy. To achieve the optimal balance, we developed a strategy to select the best receptive fields and prune redundant channels automatically after each training epoch. The proposed model is capable of jointly detecting corners and line features of parking slots while running efficiently in real time on average processors. The model has a frame rate of about 30 FPS on a 2.3 GHz CPU core, yielding a parking slot corner localization error of 1.51 ± 2.14 cm (std. err.) and a slot detection accuracy of 98%, generally satisfying the speed and accuracy requirements of on-board mobile terminals.