Source author record

Luoyuan Zhang

Luoyuan Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computation and Language, physics.geo-ph

Catalog footprint

What is connected

2 works

2 topics

4 close collaborators

Preprint, 2026, arXiv

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet these models remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which mitigates these gaps through real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real time, while also exhibiting proactive behaviors such as issuing reminders or comments based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computational efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices with less than 12 GB of RAM.

Preprint, 2026, arXiv

WaveDiffusion: Joint Latent Diffusion for Physically Consistent Seismic and Velocity Generation

Full Waveform Inversion (FWI) is a critical technique in subsurface imaging, aiming to reconstruct high-resolution subsurface properties from surface measurements. Acoustic FWI involves two physical modalities, seismic waveforms and velocity maps, which are governed by the acoustic wave equation. Prior works primarily focus on the inverse problem, modeling the relationship between seismic and velocity data as an image-to-image translation task. In this work, we study their relationship from a generative perspective. Our aim is to explore and characterize the latent space structure, and to identify latent vectors that generate seismic-velocity pairs consistent with the governing partial differential equation (PDE). Specifically, we model seismic and velocity data jointly from a shared latent space via a diffusion process. In experiments, we find that diffusion progressively refines arbitrary latent vectors into ones that yield approximately physics-consistent seismic-velocity pairs, even without explicit physics constraints. This provides empirical evidence of PDE-consistency in latent diffusion, where sampling is biased toward PDE-valid solutions. In latent space, satisfying the acoustic wave equation can be approximated through sampling and gradient descent. We formalize this physics-consistent latent modeling task and quantify it through extensive experiments. On large-scale OpenFWI benchmarks, our approach produces high-fidelity, diverse, and physically consistent seismic-velocity pairs, demonstrating the potential of data-driven latent diffusion for physically consistent generation in a complex scientific domain.
