Researcher profile

Luoyuan Zhang

Luoyuan Zhang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
2 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet they still remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which mitigates these gaps by real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real-time, while also exhibiting proactive behaviors such as issuing reminders or comments based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computation efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices with less than 12GB RAM cost.

Preprint, 2026, arXiv

WaveDiffusion: Joint Latent Diffusion for Physically Consistent Seismic and Velocity Generation

Full Waveform Inversion (FWI) is a critical technique in subsurface imaging, aiming to reconstruct high-resolution subsurface properties from surface measurements. Acoustic FWI involves two physical modalities, seismic waveforms and velocity maps, which are governed by the acoustic wave equation. Prior works primarily focus on the inverse problem, modeling the relationship between seismic and velocity as an image-to-image translation task. In this work, we study their relationship from a generative perspective. Our aim is to explore and characterize the latent space structure, and identify latent vectors that generate seismic-velocity pairs consistent with the governing partial differential equation (PDE). Specifically, we model seismic and velocity data jointly from a shared latent space via a diffusion process. In experiments, we find that diffusion progressively refines arbitrary latent vectors into ones that yield approximately physics-consistent seismic-velocity pairs, even without explicit physics constraints. This provides empirical evidence of PDE-consistency in latent diffusion, where sampling is biased toward PDE-valid solutions. In latent space, satisfying the acoustic wave equation can be approximated through sampling and gradient descent. We formalize this physics-consistent latent modeling task and quantify it through extensive experiments. On large-scale OpenFWI benchmarks, our approach produces high-fidelity, diverse, and physically consistent seismic-velocity pairs, demonstrating the potential of a data-driven latent diffusion for physically consistent generation in a complex scientific domain.