Source author record

Yutian Tao

Yutian Tao appears in the imported research catalog. Authorship, coauthor, and topic links are available, although the profile has not yet been claimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computer Vision, Graphics, Machine Learning, physics.app-ph, physics.comp-ph, physics.optics, Robotics

Catalog footprint

What is connected

3 works

8 topics

4 close collaborators


Preprint · 2026 · arXiv

All-Optical Deep Learning with Quantum Nonlinearity

The rapid scaling of deep neural networks comes at the cost of unsustainable power consumption. While optical neural networks offer an alternative, their capabilities remain constrained by the lack of efficient optical nonlinearities. To address this, we propose an all-optical deep learning architecture by embedding quantum emitters in inverse-designed nanophotonic structures. Due to their saturability, quantum emitters exhibit exceptionally strong nonlinearity compared with conventional materials. Using physics-aware training, we demonstrate that the proposed architecture can solve complex tasks, including nonlinear classification and reinforcement learning, which have not been realized in all-optical neural networks. To enable fair comparison across different platforms, we introduce a framework that quantitatively links nonlinearity to a network's expressive power. Analysis shows that our quantum activation, operating below nW/μm² intensity, reduces the power budget by seven orders of magnitude. System-level estimates show that the optical power required for large language models scales sublinearly with model size, enabling watt-level operation. Our results indicate that quantum nanophotonics provides a route toward sustainable AI inference.
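The activation described above relies on saturation: a medium that strongly attenuates weak light but becomes nearly transparent at high intensity. The sketch below uses the textbook thin saturable-absorber model, in which absorption scales as α(I) = α₀ / (1 + I/I_sat). It is a simplified stand-in, not the quantum-emitter model used in the paper, and the function name, optical depth, and saturation intensity are hypothetical parameters in arbitrary units.

```python
import math


def saturable_activation(i_in: float, optical_depth: float = 3.0, i_sat: float = 1.0) -> float:
    """Output intensity of a thin saturable absorber (simplified, hypothetical model).

    Absorption falls as alpha(I) = alpha0 / (1 + I / I_sat), evaluated here at the
    input intensity only, so weak inputs are strongly attenuated while strong
    inputs pass almost unattenuated.
    """
    if i_in < 0:
        raise ValueError("intensity must be non-negative")
    # Effective optical depth shrinks as the input approaches and exceeds I_sat.
    transmission = math.exp(-optical_depth / (1.0 + i_in / i_sat))
    return i_in * transmission
```

With these parameters, transmission rises from about e^(-3) ≈ 5% at low intensity toward 100% at high intensity, so the input-output curve bends upward: an intensity-dependent response that purely linear optics cannot provide.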

Preprint · 2026 · arXiv

Learning Visual Feature-Based World Models via Residual Latent Action

World models predict future transitions from observations and actions. Existing works focus predominantly on image generation. Visual feature-based world models, on the other hand, predict future visual features instead of raw video pixels, offering a promising alternative that is more efficient and less prone to hallucination. However, current feature-based approaches rely on direct regression, which leads to blurry or collapsed predictions in complex interactions, while generative modeling in high-dimensional feature spaces remains challenging. In this work, we discover that a new type of latent action representation, which we refer to as *Residual Latent Action* (RLA), can be easily learned from DINO residuals. We also show that RLA is predictive, generalizable, and encodes temporal progression. Building on RLA, we propose *RLA World Model* (RLA-WM), which predicts RLA values via flow matching. RLA-WM outperforms both state-of-the-art feature-based and video-diffusion world models on simulation and real-world datasets, while being orders of magnitude faster than video diffusion. Furthermore, we develop two robot learning techniques that use RLA-WM to improve policy learning. The first is a minimalist world action model with RLA that learns from actionless demonstration videos. The second is the first visual RL framework trained entirely inside a world model learned from offline videos only, using a video-aligned reward and no online interactions or handcrafted rewards. Project page: https://mlzxy.github.io/rla-wm
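The abstract states that RLA is learned from DINO residuals and that RLA-WM predicts RLA values via flow matching. The sketch below illustrates only two generic ingredients, under stated assumptions: the residual between consecutive per-frame feature vectors (the raw signal a learned RLA encoder would compress; the encoder itself is not shown), and a linear-interpolation flow-matching training pair (noise to target) of the kind used in rectified-flow-style models. The function names and the choice of interpolation path are hypothetical, not taken from the paper.

```python
import random
from typing import List, Tuple

Vector = List[float]


def residual_latent_actions(features: List[Vector]) -> List[Vector]:
    """Residuals between consecutive frame features: r_t = f_{t+1} - f_t.

    Hypothetical stand-in: the paper learns RLA *from* such residuals; here we
    only compute the residuals themselves.
    """
    return [[b - a for a, b in zip(f_t, f_next)] for f_t, f_next in zip(features, features[1:])]


def flow_matching_pair(target: Vector, rng: random.Random) -> Tuple[float, Vector, Vector]:
    """Sample (t, x_t, velocity) for a linear-path flow-matching regression target.

    x_t interpolates from Gaussian noise (t = 0) to the target (t = 1); the
    regression target for a velocity network is the constant path velocity.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in target]
    t = rng.random()
    x_t = [(1.0 - t) * n + t * x for n, x in zip(noise, target)]
    velocity = [x - n for n, x in zip(noise, target)]
    return t, x_t, velocity
```

With a perfect velocity field, moving from `x_t` by (1 - t) · `velocity` lands exactly on the target, which is what makes this regression target convenient to integrate at inference time.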

Preprint · 2020 · arXiv

Optimized Processing of Localized Collisions in Projective Dynamics

We present a method for the efficient processing of contact and collision in volumetric elastic models simulated using the Projective Dynamics paradigm. Our approach enables interactive simulation of tetrahedral meshes with more than half a million elements, provided that the model satisfies two fundamental properties: the region of the model's surface that is susceptible to collision events needs to be known in advance, and the simulation degrees of freedom associated with that surface region should be limited to a small fraction (e.g., 5%) of the total simulation nodes. Despite this conscious delineation of scope, our hypotheses hold true for common animation subjects, such as simulated models of the human face and parts of the body. In such scenarios, a partial Cholesky factorization can abstract away the behavior of the collision-safe subset of the face into the Schur complement matrix with respect to the collision-prone region. We demonstrate how fast and accurate updates of penalty-based collision terms can be incorporated into this representation, and solved with high efficiency on the GPU. We also demonstrate the opportunity to iterate a partial update of the element rotations, akin to a selective application of the local step, specifically on the smaller collision-prone region without explicitly paying the cost associated with the rest of the simulation mesh. We demonstrate efficient and robust interactive simulation in detailed models from animation and medical applications.
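The partial-factorization idea can be illustrated on a small dense system. The sketch below assumes a symmetric positive-definite system matrix `K` and an array of collision-prone DOF indices. It factors the collision-safe block once with Cholesky, forms the Schur complement on the collision-prone DOFs, and then, per step, solves only the small condensed system with an added penalty matrix `P` before back-substituting for the safe DOFs. The function names and the dense NumPy representation are hypothetical; the paper's GPU implementation and its penalty and rotation updates are not reproduced here.

```python
import numpy as np


def precompute(K: np.ndarray, prone: np.ndarray) -> dict:
    """Factor the collision-safe block once and form the Schur complement
    on the collision-prone DOFs (hypothetical helper)."""
    safe = np.setdiff1d(np.arange(K.shape[0]), prone)
    K_ss = K[np.ix_(safe, safe)]
    K_sc = K[np.ix_(safe, prone)]
    K_cc = K[np.ix_(prone, prone)]
    L = np.linalg.cholesky(K_ss)      # K_ss = L L^T, computed once
    Y = np.linalg.solve(L, K_sc)      # Y = L^{-1} K_sc
    S = K_cc - Y.T @ Y                # S = K_cc - K_cs K_ss^{-1} K_sc
    return {"safe": safe, "prone": prone, "L": L, "Y": Y, "K_sc": K_sc, "S": S}


def step_solve(pre: dict, b: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Solve (K + P applied to the prone block) x = b using precomputed data."""
    safe, prone, L = pre["safe"], pre["prone"], pre["L"]
    z = np.linalg.solve(L, b[safe])                 # z = L^{-1} b_s
    b_c = b[prone] - pre["Y"].T @ z                 # condensed right-hand side
    x = np.zeros(b.shape[0])
    x[prone] = np.linalg.solve(pre["S"] + P, b_c)   # small system; only P changes
    w = np.linalg.solve(L, b[safe] - pre["K_sc"] @ x[prone])
    x[safe] = np.linalg.solve(L.T, w)               # back-substitute the safe DOFs
    return x
```

Only `S + P` changes when the collision penalties change, so the per-step cost is dominated by a solve whose size equals the number of collision-prone DOFs, plus triangular solves with the precomputed factor. In production code, `scipy.linalg.solve_triangular` would replace the general `np.linalg.solve` calls on `L`.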