Source author record

Hailin Zhang

Hailin Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory; math.IT; Artificial Intelligence; Computation and Language; Machine Learning; Computer Vision; Distributed, Parallel, and Cluster Computing; eess.AS; Sound

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators

preprint, 2026, arXiv

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for reasoning in Large Language Models. However, optimizing solely for final-answer correctness often drives models into aimless, verbose exploration, where they rely on exhaustive trial-and-error tactics rather than structured planning to reach solutions. While heuristic constraints like length penalties can reduce verbosity, they often truncate essential reasoning steps, creating a difficult trade-off between efficiency and verification. In this paper, we argue that discriminative capability is a prerequisite for efficient generation: by learning to distinguish valid solutions, a model can internalize a guidance signal that prunes the search space. We propose JudgeRLVR, a two-stage judge-then-generate paradigm. In the first stage, we train the model to judge solution responses with verifiable answers. In the second stage, we fine-tune the same model with vanilla generating RLVR initialized from the judge. Compared to vanilla RLVR using the same math-domain training data, JudgeRLVR achieves a better quality-efficiency trade-off for Qwen3-30B-A3B: on in-domain math, it delivers about +3.7 points average accuracy gain with -42% average generation length; on out-of-domain benchmarks, it delivers about +4.5 points average accuracy improvement, demonstrating enhanced generalization.
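The two stages described above can be pictured through their reward signals. The following is a minimal sketch, assuming exact-match answer checking and a boolean judge verdict; the function names, the string comparison, and the 0/1 reward values are illustrative assumptions, not details taken from the paper.

```python
def generation_reward(predicted_answer: str, reference_answer: str) -> float:
    """Stage-2 style reward (assumed): 1.0 when the final answer matches the reference."""
    return 1.0 if predicted_answer.strip() == reference_answer.strip() else 0.0


def judge_reward(judge_verdict: bool, candidate_answer: str, reference_answer: str) -> float:
    """Stage-1 style reward (assumed): 1.0 when the judge's verdict agrees with
    the verifiable correctness of the candidate solution's final answer."""
    candidate_is_correct = generation_reward(candidate_answer, reference_answer) == 1.0
    return 1.0 if judge_verdict == candidate_is_correct else 0.0
```

In this sketch the judge is rewarded for rejecting a wrong candidate just as much as for accepting a right one, which is one plausible way to make the judging task verifiable.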

preprint, 2026, arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
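The hybrid attention layout can be illustrated with a small sketch. It assumes that the 5:1 ratio means five sliding-window layers followed by one global layer in each block, and that the 128-token window includes the query token itself; both points, along with the function names and the layer count used in the example, are assumptions rather than details confirmed by the abstract.

```python
def layer_pattern(num_layers: int, swa_per_global: int = 5) -> list[str]:
    """Label each layer 'swa' or 'global', assuming blocks of
    `swa_per_global` sliding-window layers followed by one global layer."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def can_attend(query_pos: int, key_pos: int, kind: str, window: int = 128) -> bool:
    """Causal attention rule: global layers see every earlier token,
    sliding-window layers see only the most recent `window` tokens."""
    if key_pos > query_pos:
        return False  # never attend to future tokens
    if kind == "global":
        return True
    return query_pos - key_pos < window


# Example: a hypothetical 12-layer stack has 10 sliding-window and 2 global layers.
pattern = layer_pattern(12)
```

Under these assumptions, only the global layers need to keep the full key-value history, while each sliding-window layer keeps at most 128 tokens.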

preprint, 2025, arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities, capable of generating highly realistic talk shows, recitations, livestreaming and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio) and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

preprint, 2022, arXiv

Confidence-Aware Multi-Teacher Knowledge Distillation

Knowledge distillation was initially introduced to utilize additional supervision from a single teacher model for student model training. To boost student performance, some recent variants attempt to exploit diverse knowledge sources from multiple teachers. However, existing studies mainly integrate knowledge from diverse sources by averaging over multiple teacher predictions or combining them using various other label-free strategies, which may mislead the student in the presence of low-quality teacher predictions. To tackle this problem, we propose Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD), which adaptively assigns sample-wise reliability for each teacher prediction with the help of ground-truth labels, with those teacher predictions close to one-hot labels assigned large weights. Besides, CA-MKD incorporates intermediate layers to stabilize the knowledge transfer process. Extensive experiments show that our CA-MKD consistently outperforms all compared state-of-the-art methods across various teacher-student architectures.
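The idea of label-aware teacher weighting can be sketched as follows. This is a minimal illustration, assuming each teacher's reliability on a sample is derived from its cross-entropy against the ground-truth label and normalized so that the weights sum to one; the exact weighting formula and the function names are assumptions, since the abstract does not state them.

```python
import math


def cross_entropy(probs: list[float], label: int) -> float:
    """Cross-entropy of one prediction against a one-hot ground-truth label."""
    return -math.log(max(probs[label], 1e-12))


def teacher_weights(teacher_probs: list[list[float]], label: int) -> list[float]:
    """Per-sample teacher weights (assumed form): a lower cross-entropy gives a
    larger weight, and the weights sum to 1. Requires at least two teachers."""
    losses = [cross_entropy(p, label) for p in teacher_probs]
    shift = max(losses)  # subtract the max for numerical stability
    exps = [math.exp(loss - shift) for loss in losses]
    total = sum(exps)
    k = len(losses)
    return [(1.0 - e / total) / (k - 1) for e in exps]


def aggregate(teacher_probs: list[list[float]], label: int) -> list[float]:
    """Weighted average of the teacher predictions, used as the soft target."""
    weights = teacher_weights(teacher_probs, label)
    num_classes = len(teacher_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, teacher_probs))
        for c in range(num_classes)
    ]
```

With three teachers predicting `[0.9, 0.05, 0.05]`, `[0.4, 0.3, 0.3]`, and `[0.1, 0.8, 0.1]` for a sample whose true class is 0, the first teacher receives the largest weight and the third the smallest, in contrast to plain averaging, which would weight them equally.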

preprint, 2022, arXiv

Fog Based Computation Offloading for Swarm of Drones

Because a swarm of drones has limited computing resources, it is difficult to handle computation-intensive tasks locally, so cloud-based computation offloading is widely adopted. However, for services that require low latency and high reliability, the cloud-based solution is unsuitable because of the slow response time caused by long-distance data transmission. To solve this problem, we introduce fog computing into a swarm of drones (FCSD). Focusing on latency- and reliability-sensitive scenarios, we formulate latency and reliability as the constraints of the optimization problem. To enhance the practicality of the FCSD system, we take the energy consumption of FCSD as the objective function, reducing energy consumption as far as possible while satisfying the latency and reliability requirements of the task. Furthermore, a heuristic algorithm based on a genetic algorithm is designed to perform optimal task allocation in the FCSD system. The simulation results validate that the proposed fog-based computation offloading with the heuristic algorithm can complete the computing task effectively with minimal energy consumption under the latency and reliability requirements.

preprint, 2022, arXiv

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years, generally with elaborately designed knowledge representations, which in turn increase the difficulty of model development and interpretation. In contrast, we empirically show that a simple knowledge distillation technique is enough to significantly narrow down the teacher-student performance gap. We directly reuse the discriminative classifier from the pre-trained teacher model for student inference and train a student encoder through feature alignment with a single $\ell_2$ loss. In this way, the student model is able to achieve exactly the same performance as the teacher model provided that their extracted features are perfectly aligned. An additional projector is developed to help the student encoder match with the teacher classifier, which renders our technique applicable to various teacher and student architectures. Extensive experiments demonstrate that our technique achieves state-of-the-art results at the modest cost of compression ratio due to the added projector.
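The claim that perfectly aligned features give teacher-identical predictions can be checked with a toy sketch. It assumes a linear projector and a linear teacher classifier represented as plain nested lists; the matrices, dimensions, and function names are hypothetical and chosen only for illustration.

```python
def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def l2_alignment_loss(projected: list[float], teacher_feat: list[float]) -> float:
    """Squared l2 distance between projected student features and teacher features."""
    return sum((a - b) ** 2 for a, b in zip(projected, teacher_feat))


def student_logits(student_feat, projector, teacher_classifier):
    """Student inference: project the student features into the teacher's
    feature space, then apply the reused (frozen) teacher classifier."""
    return matvec(teacher_classifier, matvec(projector, student_feat))


# Toy setup: 2-dimensional student features, 3-dimensional teacher features.
projector = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_classifier = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
student_feat = [1.0, 2.0]
teacher_feat = [1.0, 2.0, 3.0]

loss = l2_alignment_loss(matvec(projector, student_feat), teacher_feat)
teacher_out = matvec(teacher_classifier, teacher_feat)
student_out = student_logits(student_feat, projector, teacher_classifier)
```

Here the alignment loss is zero, so the student and teacher logits coincide. In training, only the student encoder and the projector would be updated to drive this loss down, while the classifier stays fixed.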

preprint, 2013, arXiv

Distributed Linear Convolutional Space-Time Coding for Two-Relay Full-Duplex Asynchronous Cooperative Networks

In this paper, a two-relay full-duplex asynchronous cooperative network with the amplify-and-forward (AF) protocol is considered. We propose two distributed space-time coding schemes for the cases with and without cross-talks, respectively. In the first case, each relay can receive the signal sent by the other through the cross-talk link. We first study the feasibility of cross-talk cancellation in this network and show that the cross-talk interference cannot be removed well. For this reason, we design space-time codes by utilizing the cross-talk signals instead of removing them. In the other case, the self-coding is realized individually through the loop channel at each relay node and the signals from the two relay nodes form a space-time code. The achievable cooperative diversity of both cases is investigated and the conditions to achieve full cooperative diversity are presented. Simulation results verify the theoretical analysis.

preprint, 2013, arXiv

Distributed Space-Time Coding for Full-Duplex Asynchronous Cooperative Communications

In this paper, we propose two distributed linear convolutional space-time coding (DLC-STC) schemes for full-duplex (FD) asynchronous cooperative communications. The DLC-STC Scheme 1 is for the case of complete loop channel cancellation, which achieves full asynchronous cooperative diversity. The DLC-STC Scheme 2 is for the case of partial loop channel cancellation and amplifying, where some loop signals are used as the self-coding instead of being treated as interference to be directly cancelled. We show this scheme can achieve full asynchronous cooperative diversity. We then evaluate the performance of the two schemes when loop channel information is not accurate and present an amplifying factor control method for the DLC-STC Scheme 2 to improve its performance with inaccurate loop channel information. Simulation results show that the DLC-STC Scheme 1 outperforms the DLC-STC Scheme 2 and the delay diversity scheme if perfect or high-quality loop channel information is available at the relay, while the DLC-STC Scheme 2 achieves better performance if the loop channel information is imperfect.

preprint, 2012, arXiv

An Improved WBF Algorithm for Higher-Speed Decoding of LDPC Codes

Due to the speed limitation of the conventional bit-selection strategy in existing weighted bit flipping algorithms, a high-speed LDPC decoder cannot be realized. To solve this problem, we propose a fast weighted bit flipping (FWBF) algorithm. Specifically, based on the stochastic error bitmap of the received vector, a partially parallel bit-selection strategy is adopted to lower the delay of choosing the bit to flip. Because of its partially parallel structure, the novel strategy can be well incorporated into the LDPC decoder [1]. The analysis of the decoding delay demonstrates that the decoding speed can be greatly improved by adopting the proposed FWBF algorithm. Further, simulation results verify the validity of the proposed algorithm.
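For context, the baseline that this abstract sets out to speed up can be sketched as a textbook serial weighted bit flipping decoder, in which each iteration scans every bit and flips only the one with the largest metric. This is not the proposed FWBF algorithm; the flipping metric used here is the commonly cited one, and the small parity-check matrix and received values are illustrative assumptions.

```python
def syndrome(H: list[list[int]], bits: list[int]) -> list[int]:
    """Parity-check results: 1 marks an unsatisfied check."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def wbf_decode(H: list[list[int]], y: list[float], max_iters: int = 20) -> list[int]:
    """Serial weighted bit flipping on soft values y (BPSK: bit 0 -> +1, bit 1 -> -1)."""
    bits = [1 if v < 0 else 0 for v in y]  # hard decisions
    # Reliability of each check: the smallest |y_n| among the bits it covers.
    weight = [min(abs(y[n]) for n, h in enumerate(row) if h) for row in H]
    for _ in range(max_iters):
        s = syndrome(H, bits)
        if not any(s):
            break  # every check is satisfied
        # Metric per bit: unsatisfied checks add their weight, satisfied ones subtract it.
        metric = [
            sum((2 * s[m] - 1) * weight[m] for m, row in enumerate(H) if row[n])
            for n in range(len(y))
        ]
        flip = max(range(len(y)), key=metric.__getitem__)  # serial arg-max selection
        bits[flip] ^= 1
    return bits


# Toy example: a (7,4) Hamming parity-check matrix and an all-zero codeword
# received with one unreliable, wrong-signed value at position 2.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
received = [1.0, 0.9, -0.2, 1.1, 0.8, 1.2, 0.7]
decoded = wbf_decode(H, received)
```

The `max(...)` over all bit positions is the step whose delay grows with block length, which is presumably the bottleneck that a partially parallel selection strategy targets.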

preprint, 2011, arXiv

Full Duplex Wireless Communications for Cognitive Radio Networks

As a key component of cognitive radio networks (CRNs), dynamic spectrum access needs to be carefully designed to minimize the interference and delay to the primary (licensed) users. One of the main challenges in dynamic spectrum access is to determine when the secondary (unlicensed) users can use the spectrum. In particular, when the secondary user is using the spectrum, if the primary user becomes active to use the spectrum, it is usually hard for the secondary user to detect the primary user instantaneously, thus causing unexpected interference and delay to primary users. The secondary user cannot detect the presence of primary users instantaneously because it is unable to sense the spectrum while it is transmitting. To solve this problem, we propose the full duplex wireless communications scheme for CRNs. In particular, we employ the Antennas Cancellation (AC), the RF Interference Cancellation (RIC), and the Digital Interference Cancellation (DIC) techniques for secondary users so that the secondary user can scan for active primary users while it is transmitting. Once it detects the presence of primary users, the secondary user releases the spectrum instantaneously to avoid interference and delay to primary users. We analyze the packet loss rate of primary users in wireless full duplex CRNs and compare it with the packet loss rate of primary users in wireless half duplex CRNs. Our analyses and simulations show that with our developed wireless full duplex CRNs, the packet loss rate of primary users can be significantly decreased compared with that under half duplex CRNs.

preprint, 2011, arXiv

On-Demand Based Wireless Resources Trading for Green Communications

The purpose of Green Communications is to reduce the energy consumption of the communication system as much as possible without compromising the quality of service (QoS) for users. An effective approach for Green Wireless Communications is the On-Demand strategy, which scales power consumption with the volume and location of user demand. Applying the On-Demand Communications model, we propose a novel scheme, Wireless Resource Trading, which characterizes the trading relationship among different wireless resources for a given number of performance metrics. According to the wireless resource trading relationship, different wireless resources can be consumed for the same set of performance metrics. Therefore, to minimize the energy consumption for given performance metrics, we can trade other types of wireless resources for the energy resource under the demanded performance metrics. Based on the wireless resource trading relationship, we derive the optimal energy-bandwidth and energy-time wireless resource trading relationships for green wireless communications. We also develop an adaptive trading strategy by using different bandwidths or different delays for different transmission distances with available bandwidths and acceptable delay bounds in wireless networks. Our simulations show that the energy consumption of wireless networks can be significantly reduced with our proposed wireless resources trading scheme.
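The energy-bandwidth trade can be illustrated with the Shannon capacity of an additive white Gaussian noise channel. This is a minimal sketch, assuming an ideal AWGN link with a normalized noise power spectral density and no circuit or path-loss terms; it is not the paper's derivation, and the function name and parameters are illustrative.

```python
import math


def energy_per_bit(rate_bps: float, bandwidth_hz: float, noise_psd: float = 1.0) -> float:
    """Minimum received energy per bit needed to sustain `rate_bps` over an AWGN
    channel of width `bandwidth_hz`, from R = B * log2(1 + P / (N0 * B))."""
    power = noise_psd * bandwidth_hz * (2 ** (rate_bps / bandwidth_hz) - 1)
    return power / rate_bps


# Doubling the bandwidth at a fixed rate lowers the energy needed per bit.
narrow = energy_per_bit(1e6, 1e6)
wide = energy_per_bit(1e6, 2e6)
```

In this idealized model, spending more bandwidth always buys some energy saving, but the saving flattens out as the energy per bit approaches `noise_psd * math.log(2)`, so bandwidth can only be traded for energy up to a point.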
