Source author record

Yudong Wang

Yudong Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Artificial Intelligence, Computation and Language, eess.AS, Sound, hep-ex, hep-ph, Machine Learning, math.NT, math.OC, physics.class-ph, physics.optics

Catalog footprint

What is connected

8 works

11 topics

4 close collaborators

preprint · 2026 · arXiv

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for reasoning in Large Language Models. However, optimizing solely for final-answer correctness often drives models into aimless, verbose exploration, where they rely on exhaustive trial-and-error tactics rather than structured planning to reach solutions. While heuristic constraints like length penalties can reduce verbosity, they often truncate essential reasoning steps, creating a difficult trade-off between efficiency and verification. In this paper, we argue that discriminative capability is a prerequisite for efficient generation: by learning to distinguish valid solutions, a model can internalize a guidance signal that prunes the search space. We propose JudgeRLVR, a two-stage judge-then-generate paradigm. In the first stage, we train the model to judge solution responses with verifiable answers. In the second stage, we fine-tune the same model with vanilla generating RLVR initialized from the judge. Compared to vanilla RLVR using the same math-domain training data, JudgeRLVR achieves a better quality-efficiency trade-off for Qwen3-30B-A3B: on in-domain math, it delivers about +3.7 points average accuracy gain with -42% average generation length; on out-of-domain benchmarks, it delivers about +4.5 points average accuracy improvement, demonstrating enhanced generalization.

preprint · 2026 · arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
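The hybrid attention layout described in the abstract can be illustrated with a minimal sketch. It assumes that the 5:1 ratio means five Sliding Window Attention layers for every global-attention layer, that each group of six layers ends with the global one, and that the 128-token window includes the current token. The abstract does not confirm any of these details, and the names `layer_schedule` and `can_attend` are hypothetical.

```python
from typing import List

WINDOW = 128          # sliding-window size in tokens (from the abstract)
SWA_PER_GLOBAL = 5    # 5:1 hybrid ratio, read here as 5 SWA layers per global layer


def layer_schedule(num_layers: int) -> List[str]:
    """Label each layer 'swa' or 'global'.

    Assumption: every group of six layers is five SWA layers followed by
    one global layer; the report's actual ordering may differ.
    """
    period = SWA_PER_GLOBAL + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def can_attend(kind: str, query: int, key: int, window: int = WINDOW) -> bool:
    """Return True if position `query` may attend to position `key`."""
    if key > query:                 # causal masking: no access to future tokens
        return False
    if kind == "global":            # global layers see the whole prefix
        return True
    return query - key < window     # assumption: the window counts the current token


schedule = layer_schedule(12)
print(schedule)
print(can_attend("swa", 200, 73), can_attend("swa", 200, 72), can_attend("global", 200, 0))
```

Under these assumptions, an SWA layer at position 200 sees positions 73 through 200, while a global layer sees the full prefix. That is why interleaving the two keeps most layers cheap without losing long-range access.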

preprint · 2026 · arXiv

UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models

The development of audio foundation models has accelerated rapidly since the emergence of GPT-4o. However, the lack of comprehensive evaluation has become a critical bottleneck for further progress in the field, particularly in audio generation. Current audio evaluation faces three major challenges: (1) audio evaluation lacks a unified framework, with datasets and code scattered across various sources, hindering fair and efficient cross-model comparison; (2) audio codecs, as a key component of audio foundation models, lack a widely accepted and holistic evaluation methodology; (3) existing speech benchmarks are heavily reliant on English, making it challenging to objectively assess models' performance on Chinese. To address the first issue, we introduce UltraEval-Audio, a unified evaluation framework for audio foundation models, specifically designed for both audio understanding and generation tasks. UltraEval-Audio features a modular architecture, supporting 10 languages and 14 core task categories, while seamlessly integrating 24 mainstream models and 36 authoritative benchmarks. To enhance research efficiency, the framework provides a one-command evaluation feature, accompanied by real-time public leaderboards. For the second challenge, UltraEval-Audio adopts a novel comprehensive evaluation scheme for audio codecs, evaluating performance across three key dimensions: semantic accuracy, timbre fidelity, and acoustic quality. To address the third issue, we propose two new Chinese benchmarks, SpeechCMMLU and SpeechHSK, designed to assess Chinese knowledge proficiency and language fluency. We hope that UltraEval-Audio will provide both academia and industry with a transparent, efficient, and fair platform for comparing audio models. Our code, benchmarks, and leaderboards are available at https://github.com/OpenBMB/UltraEval-Audio.

preprint · 2025 · arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities and can generate highly realistic talk shows, recitations, livestreaming and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio) and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

preprint · 2022 · arXiv

Analysis of $B_s\to\phi\nu\bar{\nu}$ at CEPC

The rare $b\to s\nu\bar{\nu}$ decays are sensitive to contributions of new physics (NP) and helpful to resolve the puzzle of multiple $B$ flavor anomalies. In this work, we propose to study the $b\to s\nu\bar{\nu}$ transition at a future lepton collider operating at the $Z$ pole through the $B_s \to \phi\nu\bar{\nu}$ decay. Using the $B_s\to\phi$ decay form factors from lattice simulations, we first update the SM prediction of $\mathrm{BR}(B_s \to \phi\nu\bar{\nu})_{\mathrm{SM}}=(9.93\pm 0.72)\times 10^{-6}$ and the corresponding $\phi$ longitudinal polarization fraction $F_{L,{\mathrm{SM}}}=0.53\pm 0.04$. Our analysis uses the full CEPC simulation samples with a net statistic of $\mathcal{O}(10^9)$ $Z$ decays. Precise $\phi$ and $B_s$ reconstructions are used to suppress backgrounds. The results show that $\mathrm{BR}(B_s \to \phi\nu\bar{\nu})$ can be measured with a statistical uncertainty of $\mathcal{O}(\%)$ and an $S/B$ ratio of $\mathcal{O}(1)$ at the CEPC. The quality measures for the event reconstruction are also derived. By combining the measurement of $\mathrm{BR}(B_s \to \phi\nu\bar{\nu})$ and $F_L$, the constraints on the effective theory couplings at low energy are given.

preprint · 2016 · arXiv

Optimal targeting of nonlinear chaotic systems using a novel evolutionary computing strategy

Directing chaotic systems to given targets is a substantial and well-developed research topic in nonlinear science, and it can be formulated as a class of multi-modal constrained numerical optimization problems with multi-dimensional decision variables. This investigation examines the feasibility of applying a novel population-based metaheuristic, teaching-learning-based optimization, to direct the orbits of discrete chaotic dynamical systems towards a desired target region. In this strategy, several consecutive control steps of small bounded perturbations rapidly direct the chaotic series towards the optimal neighborhood of the desired target, where a conventional controller is effective for chaos control. Working with the dynamics of the well-known Hénon and Ushio discrete chaotic systems, we assess the effectiveness and efficiency of the optimal control technique based on teaching-learning-based optimization, and we discuss the impact of the core parameters on performance. Furthermore, possible engineering applications of directing chaotic orbits are discussed.
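The targeting problem can be made concrete with a small sketch that treats a sequence of bounded perturbations as the decision variables and the final distance to the target as the cost. The sketch does not implement teaching-learning-based optimization. It substitutes plain random search, purely to show the shape of the optimization problem. The map parameters, the perturbation bound `U_MAX`, the additive form of the control, and the choice of the fixed point as the target are all assumptions, not details taken from the paper.

```python
import math
import random
from typing import List, Tuple

A, B = 1.4, 0.3        # classical Hénon parameters (assumed, not from the abstract)
U_MAX = 0.01           # hypothetical bound on each control perturbation
STEPS = 8              # hypothetical number of consecutive control steps


def henon_step(x: float, y: float, u: float = 0.0) -> Tuple[float, float]:
    """One step of the Hénon map with an additive perturbation u on x."""
    return 1.0 - A * x * x + y + u, B * x


def final_distance(controls: List[float], start: Tuple[float, float],
                   target: Tuple[float, float]) -> float:
    """Squared distance to the target after applying the control sequence."""
    x, y = start
    for u in controls:
        x, y = henon_step(x, y, u)
    dx, dy = x - target[0], y - target[1]
    return dx * dx + dy * dy


def random_search(start, target, steps=STEPS, trials=2000, seed=0):
    """Stand-in optimizer: keep the best of many random bounded control sequences."""
    rng = random.Random(seed)
    best = [0.0] * steps                        # baseline: no control at all
    best_cost = final_distance(best, start, target)
    for _ in range(trials):
        cand = [rng.uniform(-U_MAX, U_MAX) for _ in range(steps)]
        cost = final_distance(cand, start, target)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


# Target: the unstable fixed point of the unperturbed map (x = 1 - A x^2 + B x).
x_star = (-(1.0 - B) + math.sqrt((1.0 - B) ** 2 + 4.0 * A)) / (2.0 * A)
TARGET = (x_star, B * x_star)
START = (0.0, 0.0)

best, best_cost = random_search(START, TARGET)
print("uncontrolled cost:", final_distance([0.0] * STEPS, START, TARGET))
print("best cost found:  ", best_cost)
```

A population-based metaheuristic would replace `random_search` while keeping the same cost function and bounds. Because the search starts from the zero-control baseline, the returned cost can never be worse than leaving the system uncontrolled.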

preprint · 2014 · arXiv

Experimental demonstration of a multiphysics cloak: manipulating heat flux and electric current simultaneously

In past years, triggered by their successful realizations in electromagnetics, invisible cloaks have experienced rapid development and have been widely pursued in many different fields, though so far only for a single physical system. In this letter we made an unprecedented experimental attempt to show a multidisciplinary framework designed on the basis of two different physical equations. The proposed structure has the exceptional capability to simultaneously control two different physical phenomena according to the predetermined evolution scenarios. As a proof of concept, we implemented an electric-thermal bifunctional device that can guide both electric current and heat flux "across" a strong "scatterer" (air cavity) and restore their original diffusion directions as if nothing exists along the paths, thus rendering dual cloaking effects for objects placed inside the cavity. This bifunctional cloaking performance is also numerically verified for a point-source nonuniform excitation. Our results and the fabrication technique presented here will help broaden the current research scope for multiple disciplines and may pave a prominent way to manipulate multiple flows and create new functional devices, e.g., for on-chip applications.

preprint · 2013 · arXiv

On the Lacunarity of some eta-products

Lacunarity is an interesting property of a formal series. We say a series is lacunary if "almost all" of its coefficients are zero. In this article we consider the lacunarity of some eta-products of the form $\eta(z)^2\eta(bz)^2$, and prove that they are lacunary if and only if $b$ is 1, 2, 3, 4 or 16. We then write them as linear combinations of some CM forms.
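As a numerical illustration of what lacunarity means here, the sketch below expands $\prod_{n\ge 1}(1-q^n)^2(1-q^{bn})^2$ as a power series and reports the fraction of nonzero coefficients in growing prefixes. It relies on the standard identity $\eta(z) = q^{1/24}\prod_{n\ge 1}(1-q^n)$, so that $\eta(z)^2\eta(bz)^2$ equals this product times $q^{(1+b)/12}$, a shift that does not change which coefficients vanish. The function names and the truncation length are illustrative choices, and a finite computation like this can only suggest a trend, not prove anything.

```python
from typing import List


def eta_product_coeffs(b: int, n_terms: int) -> List[int]:
    """Coefficients of prod_{n>=1} (1 - q^n)^2 (1 - q^(b*n))^2 up to q^(n_terms-1)."""
    coeffs = [0] * n_terms
    coeffs[0] = 1

    def multiply_by_one_minus_q_pow(k: int) -> None:
        # In-place multiplication by (1 - q^k); descending order reads old values.
        for i in range(n_terms - 1, k - 1, -1):
            coeffs[i] -= coeffs[i - k]

    for n in range(1, n_terms):
        for k in (n, n, b * n, b * n):
            if k < n_terms:          # higher powers cannot affect the truncation
                multiply_by_one_minus_q_pow(k)
    return coeffs


def nonzero_fraction(coeffs: List[int], prefix: int) -> float:
    """Share of nonzero coefficients among the first `prefix` terms."""
    return sum(1 for c in coeffs[:prefix] if c != 0) / prefix


N_TERMS = 600  # illustrative truncation length
for b in (1, 2, 3, 4, 5, 16):
    series = eta_product_coeffs(b, N_TERMS)
    print(b, [round(nonzero_fraction(series, p), 3) for p in (150, 300, 600)])
```

For a lacunary series the reported fraction should keep drifting towards zero as the prefix grows, while for a non-lacunary one it should level off at a positive value. The value `b = 5` is included only as a comparison case outside the list in the abstract.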
