Source author record

Kaiyuan Tan

Kaiyuan Tan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning

Catalog footprint

What is connected

2works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure

Identifying latent state representations and dynamics is essential when direct modeling in observation space is infeasible, particularly under partial and high-dimensional observations. In such settings, representation learning and physics-aware modeling are inherently coupled. We study this problem for latent port-Hamiltonian systems, a structured class encompassing both conservative and dissipative dynamics. We propose a two-stage identify-then-project framework. First, a contrastive teacher learns continuous-time latent dynamics from partial observations. Then, a student projects the identified teacher representation and dynamics onto a port-Hamiltonian submanifold via a learned affine chart, yielding a physically consistent realization. As a conceptual counterfactual, we also consider a single-stage variant that jointly learns latent identification and port-Hamiltonian structure, but find it to be less reliable, motivating the proposed two-stage teacher-student framework. We show theoretically that affine projection is the natural bridge between the affine gauge of contrastive latent identification and the port-Hamiltonian systems. Empirically, we demonstrate that the proposed two-stage approach preserves the teacher's dynamics while enforcing physical structure, and performs more reliably than the single-stage alternative, particularly in dissipative regimes and high-dimensional visual settings.

preprint2026arXiv

Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

This report presents a unified technical system addressing the two core capabilities of world models for autonomous driving: world representation and world generation. For world representation, we propose WorldRec, a feed-forward reconstruction architecture driven by sparse scene queries. WorldRec initializes structured queries in 3D space, leveraging them to aggregate cross-view, cross-temporal features, thereby naturally enforcing spatial consistency across frames and yielding compact yet high-fidelity 3D Gaussian scene representations. For world generation, we propose WorldGen, a two-stage training framework of bidirectional pretraining followed by causal fine-tuning through three progressive stages (Teacher Forcing, ODE distillation, and DMD), enabling high-quality online causal video generation in as few as 4 denoising steps. Building on both modules, we further introduce the JWM, which deeply integrates WorldRec and WorldGen to achieve synergistic gains in generation stability, cross-frame consistency, and visual fidelity, providing a solid foundation for closed-loop simulation, data synthesis, and end-to-end training in autonomous driving.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint