Source author record

Junhao Hu

Junhao Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.PR; Computation and Language; math.NA; Numerical Analysis; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; eess.AS; Machine Learning; Software Engineering; Sound

Catalog footprint

What is connected

7 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing training frameworks built around a static execution model. We present DynaTrain, a distributed training system for sub-second, online reconfiguration across arbitrary multi-dimensional parallelism. At its core, we propose a Virtual Parameter Space (VPS) abstraction that unifies all distributed training states under one logical coordinate space, turning any parallelism configuration into a deterministic mapping and collapsing complex transitions into manageable geometric intersections. On top of VPS, a state routing-and-transition layer executes rank-local transfers under a memory-aware, deadlock-free schedule, and an Elastic Device Manager overlaps new-world construction with ongoing training to mask topology-change cost. On dense and MoE models up to 235B parameters, DynaTrain reconfigures a 70B dense model in under 2s and a 235B MoE model in 4.36s, outperforming state-of-the-art checkpoint-based and elastic systems by up to three orders of magnitude while preserving correctness.
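The "geometric intersections" idea in this abstract can be illustrated with a minimal sketch. It assumes a much simpler setting than the paper describes: a one-dimensional parameter space, sharded into contiguous near-even ranges, with one range per rank. The helper names `shard_ranges` and `transfer_plan` are hypothetical and are not taken from DynaTrain.

```python
def shard_ranges(total, world_size):
    """Split [0, total) into contiguous, near-even half-open ranges, one per rank."""
    base, extra = divmod(total, world_size)
    ranges, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def transfer_plan(total, old_world, new_world):
    """Return (src_rank, dst_rank, start, stop) for every non-empty overlap
    between an old shard and a new shard (assumed 1-D layout, not the real VPS)."""
    plan = []
    for src, (s0, s1) in enumerate(shard_ranges(total, old_world)):
        for dst, (d0, d1) in enumerate(shard_ranges(total, new_world)):
            lo, hi = max(s0, d0), min(s1, d1)  # interval intersection
            if lo < hi:
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # Resharding 12 elements from 2 ranks to 3 ranks.
    for move in transfer_plan(12, 2, 3):
        print(move)
```

Because each layout is a deterministic function of the rank, every rank could compute the same plan locally in this toy setting. Scheduling, memory limits, and multi-dimensional layouts are outside the scope of the sketch.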

preprint · 2026 · arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
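The hybrid attention layout in this abstract can be sketched as follows. The sketch assumes that the 5:1 ratio means five SWA layers for each global-attention layer, that the global layer closes each group of six, and that the 128-token window counts the current token. None of these details is stated in the abstract, and the function names are hypothetical.

```python
def layer_kinds(num_layers, swa_per_global=5):
    """Return 'swa' or 'global' per layer; one global layer closes each group
    (the placement within the group is an assumption)."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def can_attend(query_pos, key_pos, kind, window=128):
    """Causal visibility of key_pos from query_pos for the given layer kind."""
    if key_pos > query_pos:
        return False  # causal: never look at future tokens
    if kind == "global":
        return True   # global layers see the whole prefix
    return query_pos - key_pos < window  # SWA: only the last `window` tokens


if __name__ == "__main__":
    print(layer_kinds(12))
    print(can_attend(200, 72, "swa"), can_attend(200, 73, "swa"))
```

Under these assumptions, an SWA layer only needs to keep keys and values for the most recent 128 positions, while a global layer keeps them for the full context.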

preprint · 2025 · arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities and can generate highly realistic talk shows, recitations, livestreams, and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio), and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

preprint · 2023 · arXiv

Practitioners' Expectations on Code Completion

Code completion has become a common practice for programmers during their daily programming activities. It aims to automatically predict the next tokens or lines that programmers are likely to use. A good code completion tool can substantially save keystrokes and improve programming efficiency. Recently, various techniques for code completion have been proposed for use in practice. However, it is still unclear what practitioners' expectations on code completion are and whether existing research has met their demands. To fill the gap, we perform an empirical study by first interviewing 15 practitioners and then surveying 599 practitioners from 18 IT companies about their expectations on code completion. We then compare the practitioners' demands with current research by conducting a literature review of papers on code completion published in premier publication venues from 2012 to 2022. Based on the comparison, we highlight directions in which researchers could invest effort to develop code completion techniques that meet practitioners' expectations.

preprint · 2020 · arXiv

Strong convergence rate of the truncated Euler-Maruyama method for stochastic differential delay equations with Poisson jumps

In this paper, we study a class of super-linear stochastic differential delay equations with Poisson jumps (SDDEwPJs). The convergence and the rate of convergence of the truncated Euler-Maruyama numerical solutions to SDDEwPJs are investigated under the generalized Khasminskii-type condition.

preprint · 2020 · arXiv

The Strong Convergence and Stability of Explicit Approximations for Nonlinear Stochastic Delay Differential Equations

This paper focuses on explicit approximations for nonlinear stochastic delay differential equations (SDDEs). Under a weakly local Lipschitz condition and some other suitable conditions, a generic truncated Euler-Maruyama (TEM) scheme for SDDEs is proposed, whose numerical solutions are bounded and converge to the exact solutions in the qth moment for q>0. Furthermore, a convergence rate of order 1/2 is obtained. Under the Khasminskii-type condition, a more precise TEM scheme is given, whose numerical solutions are exponentially stable in mean square and with probability 1 (P-1). Finally, several numerical experiments are carried out to illustrate our results.
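As a rough illustration of the truncation idea behind a TEM scheme, here is a minimal sketch for a scalar SDDE dX(t) = f(X(t), X(t - tau)) dt + g(X(t), X(t - tau)) dW(t). It makes several assumptions: the delay tau is a multiple of the step size, and the truncation radius is passed in directly rather than derived from the step size as a full TEM construction would do. The names `truncate` and `tem_path` are hypothetical, and this is not the specific scheme from the paper.

```python
import math
import random


def truncate(x, radius):
    """Project x onto the interval [-radius, radius]."""
    return max(-radius, min(radius, x))


def tem_path(drift, diffusion, history, tau, dt, steps, radius, rng):
    """Simulate one path; coefficients are evaluated at truncated arguments.

    Assumes tau is a multiple of dt. `history(t)` gives X(t) for t in [-tau, 0].
    """
    lag = round(tau / dt)  # delay expressed in grid steps
    # Grid values on [-tau, 0]; the last entry is X(0).
    path = [history(-(lag - k) * dt) for k in range(lag + 1)]
    for _ in range(steps):
        x = truncate(path[-1], radius)              # current state, truncated
        x_delay = truncate(path[-1 - lag], radius)  # delayed state, truncated
        dw = rng.gauss(0.0, math.sqrt(dt))          # Brownian increment
        path.append(path[-1] + drift(x, x_delay) * dt + diffusion(x, x_delay) * dw)
    return path


if __name__ == "__main__":
    # Illustrative super-linear drift; the coefficients are made up for the demo.
    rng = random.Random(0)
    path = tem_path(lambda x, y: -x**3 + y, lambda x, y: 0.5 * y,
                    lambda t: 1.0, tau=1.0, dt=0.01, steps=100,
                    radius=10.0, rng=rng)
    print(len(path), path[-1])
```

Only the arguments of the coefficients are truncated, not the stored state, so super-linear terms such as x**3 are never evaluated outside the chosen radius.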

preprint · 2014 · arXiv

Almost Sure Asymptotic Stability for Regime-Switching Diffusions

In this paper, we discuss the long-time behavior of sample paths for a wide range of regime-switching diffusions. First, almost sure asymptotic stability is studied (i) for regime-switching diffusions with finite state spaces by the Perron-Frobenius theorem and, in the case of a reversible Markov chain, via the principal eigenvalue approach; (ii) for regime-switching diffusions with countable state spaces by means of a finite partition trick and M-matrix theory. We then apply our theory to study the stabilization of linear switching models. Several examples are given to demonstrate our theory.
