Source author record

Yixin Yang

Yixin Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Computer Vision, math.FA, Artificial Intelligence, eess.AS, eess.SP, math.CV, math.DS, math.SP, Sound

Catalog footprint

What is connected

7 works

10 topics

4 close collaborators


preprint · 2026 · arXiv

A geometric approach to the compressed shift operator on the Hardy space over the bidisk

This paper studies the compressed shift operator $S_z$ on the Hardy space over the bidisk via a geometric approach. We calculate the spectrum and essential spectrum of $S_z$ on the Beurling-type quotient modules induced by rational inner functions, and give a complete characterization of when $S_z^*$ is a Cowen-Douglas operator. We then extend the notion of a Cowen-Douglas operator to that of a generalized Cowen-Douglas operator and show that $S_z^*$ is a generalized Cowen-Douglas operator. Moreover, we establish a connection between the reducibility of the Hermitian holomorphic vector bundle induced by the kernel spaces and the reducibility of the generalized Cowen-Douglas operator. Using the geometric approach, we also study the reducing subspaces of $S_z$ on certain polynomial quotient modules.

preprint · 2026 · arXiv

IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations

Suspect face generation remains a technical challenge in crime investigations. Traditional sketch-drawing workflows suffer from low efficiency and quality, while diffusion-based approaches still face intrinsic limitations: conditional ambiguity in text-to-image models and sampling variance in one-shot generation. We propose IdentiFace, a novel diffusion-based framework for identifiable suspect face generation that addresses these issues through (1) a multi-modal input design that strengthens conditional control and (2) an iterative generation pipeline that enables adjustment of identifiable features. We also contribute a facial identity loss and two task-specific datasets. Comprehensive experiments on synthetic datasets and in real-world scenarios indicate that IdentiFace outperforms existing methods, especially in identity retrieval, and shows strong potential for practical applications.

preprint · 2026 · arXiv

InstructAV2AV: Instruction-Guided Audio-Video Joint Editing

Recent diffusion-based methods have achieved impressive progress in video content manipulation. However, they typically ignore the accompanying audio, leaving the audio disjointed from the edited results. In this paper, we propose InstructAV2AV, the first end-to-end framework for instruction-guided audio-video joint editing. We first develop a scalable data synthesis pipeline and construct InsAVE-80K, the first large-scale audio-video editing dataset with high-quality source-to-target pairs. With this data foundation, we adapt an audio-video generation backbone to leverage its robust priors. We concatenate the audio-video input with noisy latent codes to anchor the source context, propose the source-instruction gated attention to improve instruction following and content preservation, and introduce a two-stage training strategy to effectively transfer these pre-trained priors. Extensive experiments demonstrate that InstructAV2AV outperforms state-of-the-art methods across 11 metrics spanning three aspects on two evaluation sets, highlighting its potential for controllable content creation. Project page: https://hjzheng.net/projects/InstructAV2AV/.

preprint · 2026 · arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, using a 128-token sliding window and a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), using a native 32k context length that is subsequently extended to 256k. To scale post-training compute efficiently, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm, in which domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense, token-level rewards that enable the student model to master the teachers' expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2 despite using only 1/2 and 1/3 of their total parameters, respectively. At inference time, repurposing MTP as a draft model for speculative decoding lets MiMo-V2-Flash reach an acceptance length of up to 3.6 and a decoding speedup of up to 2.6x with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
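As a rough illustration of how a 5:1 hybrid of sliding-window and global attention might be laid out, and why it shrinks the key-value cache, the Python sketch below makes two assumptions. First, each group of six layers has five SWA layers followed by one global layer. Second, a 128-token window covers the query token plus the 127 preceding tokens. The abstract specifies neither the ordering nor the window convention, and the helper names (`layer_pattern`, `can_attend`, `kv_cache_tokens`) are hypothetical.

```python
def layer_pattern(num_layers, ratio=5):
    """Assumed layout: `ratio` SWA layers followed by one global layer, repeated."""
    return ["global" if (i + 1) % (ratio + 1) == 0 else "swa" for i in range(num_layers)]


def can_attend(q, k, kind, window=128):
    """Causal attention mask entry: may query position q attend to key position k?"""
    if k > q:  # causal: never attend to future tokens
        return False
    if kind == "global":
        return True
    # SWA (assumed convention): the query token plus the previous window - 1 tokens
    return q - k < window


def kv_cache_tokens(seq_len, pattern, window=128):
    """Total cached key/value positions, summed over layers, for one sequence."""
    return sum(min(seq_len, window) if kind == "swa" else seq_len for kind in pattern)
```

For six layers at a 32k context, the SWA layers each cache only 128 positions. The sketch therefore counts 5 × 128 + 32,768 = 33,408 cached positions instead of 6 × 32,768 = 196,608.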

preprint · 2026 · arXiv

Spectral dynamics for the infinite dihedral group and the lamplighter group

For a tuple $A=(A_0,A_1,\dots,A_n)$ of elements in a Banach algebra $\mathfrak{B}$, its projective (joint) spectrum $p(A)$ is the set of points $z\in \mathbb{P}^n$ such that $A(z)=z_0A_0+z_1A_1+\cdots+z_nA_n$ is not invertible. If $\mathfrak{B}$ is the group $C^*$-algebra of a discrete group $G$ generated by $A_0, A_1,\dots, A_n$ with a representation $\rho$, then $p(A)$ is an invariant of (weak) equivalence for $\rho$. In [BY], B. Goldberg and R. Yang proved that the Julia set $\mathcal{J}(F)$ of the induced rational map $F$ for the infinite dihedral group $D_\infty$ is the union of the projective spectrum and the extended indeterminacy set $E_F$. However, $E_F$ is complicated. To obtain a sharper relationship between the projective spectrum and the Julia set, we replace the pencil $A_\pi(z)=z_0+z_1\pi(a)+z_2\pi(t)$ with the extended pencil $A_\pi(z)=z_0+z_1\pi(a)+z_2\pi(t)+z_3\pi(at)$, where $\pi$ is the Koopman representation. Using the method of operator recursions, we show that $p(A_\pi)=\mathcal{J}(F)$. We further study the spectral dynamics of the lamplighter group $\mathcal{L}$ and prove that $\mathcal{J}(Q)=E_Q$, where $Q$ is the rational map associated with $\mathcal{L}$.
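To make the definition of $p(A)$ concrete, here is a minimal finite-dimensional sketch. It assumes $\mathfrak{B}$ is the matrix algebra $M_k(\mathbb{C})$, in which $A(z)$ fails to be invertible exactly when its smallest singular value is zero. The sketch only illustrates the definition. It is not the operator-recursion method of the paper, and the group $C^*$-algebra setting there is infinite-dimensional. The function names are hypothetical.

```python
import numpy as np


def pencil(A, z):
    """Evaluate A(z) = z_0 A_0 + ... + z_n A_n for square matrices A_j."""
    return sum(zj * Aj for zj, Aj in zip(z, A))


def in_projective_spectrum(A, z, tol=1e-12):
    """True if A(z) is (numerically) non-invertible, i.e. [z] lies in p(A).

    In M_k(C), a matrix is singular iff its smallest singular value is 0;
    the relative tolerance makes the test insensitive to rescaling z.
    """
    s = np.linalg.svd(pencil(A, z), compute_uv=False)  # sorted descending
    return s[-1] <= tol * max(1.0, s[0])
```

Take $A_0=I$ and $A_1=\mathrm{diag}(1,2)$. Then $A(z)=\mathrm{diag}(z_0+z_1,\,z_0+2z_1)$, so $p(A)$ consists of the two points $[1:-1]$ and $[2:-1]$ of $\mathbb{P}^1$. Because the test is homogeneous in $z$, any nonzero rescaling of a representative gives the same answer.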

preprint · 2025 · arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans can generalize to new audio tasks from only a few examples or simple instructions. GPT-3 showed that scaling next-token prediction pretraining enables strong generalization in text, and we believe this paradigm applies equally to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance among open-source models on both speech intelligence and audio understanding benchmarks. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates strong speech continuation, generating highly realistic talk shows, recitations, livestreams, and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio), and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

preprint · 2020 · arXiv

GLRT-based Detection in Bistatic Sonar under Strong Direct Blast with Multipath Propagation

Direct blast is strong interference in bistatic sonar and is difficult to suppress because both the blast and the target signals undergo multipath propagation. This study proposes a detection scheme based on the generalized likelihood ratio test (GLRT) that operates in the frequency domain of the received signals. In the multipath environment, the unknown parameters are estimated using maximum likelihood estimation and the Weighted Fourier Transform and Relaxation method. The theoretical distributions of the test statistics for known and unknown noise power are derived, and the detection probability is determined. Detector performance degrades by 4 dB when the noise power is estimated by maximum likelihood. Simulations show the effectiveness of the detector in a forward-scattering detection configuration with a low signal-to-direct-blast ratio. The sensitivity of the detector to several factors is discussed, and the detector is shown to be robust.
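For readers new to GLRT detection, the following sketch shows a textbook single-snapshot case, not the paper's frequency-domain detector. The setting is a known complex waveform $s$ with unknown complex amplitude in circular white Gaussian noise of known variance $\sigma^2$. Substituting the maximum likelihood amplitude estimate into the likelihood ratio gives the statistic $T=|s^H x|^2/(\sigma^2\|s\|^2)$. Under $H_0$ this statistic is exponentially distributed with unit mean, so a false-alarm probability $P_{FA}$ fixes the threshold at $-\ln P_{FA}$. The sketch does not model direct blast or multipath, which are the focus of the paper, and the function names are hypothetical.

```python
import numpy as np


def glrt_statistic(x, s, noise_var):
    """Log-GLR for H1: x = a*s + w vs H0: x = w, with unknown complex a.

    w ~ CN(0, noise_var * I). Plugging the ML estimate a_hat = s^H x / ||s||^2
    into the likelihood ratio gives T = |s^H x|^2 / (noise_var * ||s||^2).
    x may be a single snapshot of shape (N,) or a batch of shape (M, N).
    """
    proj = x @ np.conj(s)  # s^H x for each snapshot
    return np.abs(proj) ** 2 / (noise_var * np.real(np.vdot(s, s)))


def threshold(pfa):
    """Under H0, T ~ Exp(1), so P(T > gamma) = exp(-gamma)."""
    return -np.log(pfa)
```

In the paper's unknown-noise-power case, the variance would itself be estimated, which the abstract reports costs 4 dB for its detector. This sketch assumes the variance is known.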

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint