Researcher profile

Yutian Tao

Yutian Tao contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified | Verification L1 | Unclaimed author
3 works
0 followers
8 topics
4 close collaborators

Published work

3 published items

Preprint | 2026 | arXiv

All-Optical Deep Learning with Quantum Nonlinearity

The rapid scaling of deep neural networks comes at the cost of unsustainable power consumption. While optical neural networks offer an alternative, their capabilities remain constrained by the lack of efficient optical nonlinearities. To address this, we propose an all-optical deep learning architecture by embedding quantum emitters in inverse-designed nanophotonic structures. Due to their saturability, quantum emitters exhibit exceptionally strong nonlinearity compared with conventional materials. Using physics-aware training, we demonstrate that the proposed architecture can solve complex tasks, including nonlinear classification and reinforcement learning, which have not been realized in all-optical neural networks. To enable fair comparison across different platforms, we introduce a framework that quantitatively links nonlinearity to a network's expressive power. Analysis shows that our quantum activation, operating below nW/μm^2 intensity, reduces the power budget by seven orders of magnitude. System-level estimates show that the optical power required for large language models scales sublinearly with model size, enabling watt-level operation. Our results indicate that quantum nanophotonics provides a route toward sustainable AI inference.

Preprint | 2026 | arXiv

Learning Visual Feature-Based World Models via Residual Latent Action

World models predict future transitions from observations and actions. Existing works predominantly focus on image generation only. Visual feature-based world models, on the other hand, predict future visual features instead of raw video pixels, offering a promising alternative that is more efficient and less prone to hallucination. However, current feature-based approaches rely on direct regression, which leads to blurry or collapsed predictions in complex interactions, while generative modeling in high-dimensional feature spaces still remains challenging. In this work, we discover that a new type of latent action representation, which we refer to as *Residual Latent Action* (RLA), can be easily learned from DINO residuals. We also show that RLA is predictive, generalizable, and encodes temporal progression. Building on RLA, we propose *RLA World Model* (RLA-WM), which predicts RLA values via flow matching. RLA-WM outperforms both state-of-the-art feature-based and video-diffusion world models on simulation and real-world datasets, while being orders of magnitude faster than video diffusion. Furthermore, we develop two robot learning techniques that use RLA-WM to improve policy learning. The first one is a minimalist world action model with RLA that learns from actionless demonstration videos. The second one is the first visual RL framework trained entirely inside a world model learned from offline videos only, using a video-aligned reward and no online interactions or handcrafted rewards. Project page: https://mlzxy.github.io/rla-wm

Preprint | 2020 | arXiv

Optimized Processing of Localized Collisions in Projective Dynamics

We present a method for the efficient processing of contact and collision in volumetric elastic models simulated using the Projective Dynamics paradigm. Our approach enables interactive simulation of tetrahedral meshes with more than half a million elements, provided that the model satisfies two fundamental properties: the region of the model's surface that is susceptible to collision events needs to be known in advance, and the simulation degrees of freedom associated with that surface region should be limited to a small fraction (e.g. 5%) of the total simulation nodes. Despite this conscious delineation of scope, our hypotheses hold true for common animation subjects, such as simulated models of the human face and parts of the body. In such scenarios, a partial Cholesky factorization can abstract away the behavior of the collision-safe subset of the face into the Schur Complement matrix with respect to the collision-prone region. We demonstrate how fast and accurate updates of penalty-based collision terms can be incorporated into this representation, and solved with high efficiency on the GPU. We also demonstrate the opportunity to iterate a partial update of the element rotations, akin to a selective application of the local step, specifically on the smaller collision-prone region without explicitly paying the cost associated with the rest of the simulation mesh. We demonstrate efficient and robust interactive simulation in detailed models from animation and medical applications.
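
The Schur complement idea mentioned in this abstract can be illustrated with a small dense example. This is only a sketch under stated assumptions, not the authors' implementation: it uses NumPy on the CPU rather than a sparse partial Cholesky factorization on the GPU, and the names `schur_condense`, `solve_with_penalty`, `prone`, `K`, and `f` are hypothetical. It assumes a symmetric positive definite system matrix `A`, a known set of collision-prone degrees of freedom, and a penalty stiffness `K` with force `f` acting only on those degrees of freedom.

```python
import numpy as np

def schur_condense(A, b, prone):
    """Eliminate the collision-safe DOFs of the SPD system A x = b.

    Returns the Schur complement S and condensed right-hand side g on the
    collision-prone DOFs, plus the pieces needed to recover the safe DOFs.
    """
    prone = np.asarray(prone)
    safe = np.setdiff1d(np.arange(A.shape[0]), prone)
    A_ss = A[np.ix_(safe, safe)]
    A_sp = A[np.ix_(safe, prone)]
    A_pp = A[np.ix_(prone, prone)]

    # Factor the safe block once; it does not change when contacts change.
    L = np.linalg.cholesky(A_ss)

    def solve_ss(rhs):
        # Solve A_ss z = rhs using the Cholesky factor L (A_ss = L L^T).
        return np.linalg.solve(L.T, np.linalg.solve(L, rhs))

    S = A_pp - A_sp.T @ solve_ss(A_sp)
    g = b[prone] - A_sp.T @ solve_ss(b[safe])
    return S, g, safe, solve_ss, A_sp

def solve_with_penalty(A, b, prone, K, f):
    """Solve the full system with penalty terms added on the prone DOFs only."""
    prone = np.asarray(prone)
    S, g, safe, solve_ss, A_sp = schur_condense(A, b, prone)
    # Only this small system depends on the current collision state.
    x_p = np.linalg.solve(S + K, g + f)
    # Back-substitute to recover the collision-safe DOFs.
    x_s = solve_ss(b[safe] - A_sp @ x_p)
    x = np.empty(A.shape[0])
    x[prone] = x_p
    x[safe] = x_s
    return x
```

In this sketch the expensive factorization of the collision-safe block is independent of `K` and `f`, so a change in contact state only requires re-solving the small system on the collision-prone region.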