Source author record

Wenlong Huang

Wenlong Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Robotics, Machine Learning, Computation and Language, physics.plasm-ph

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation

Humans anticipate, from a glance and a contemplated action of their bodies, how the 3D world will respond, a capability that is equally vital for robotic manipulation. We introduce PointWorld, a large pre-trained 3D world model that unifies state and action in a shared 3D space as 3D point flows: given one or a few RGB-D images and a sequence of low-level robot action commands, PointWorld forecasts per-pixel displacements in 3D that respond to the given actions. By representing actions as 3D point flows instead of embodiment-specific action spaces (e.g., joint positions), this formulation directly conditions on physical geometries of robots while seamlessly integrating learning across embodiments. To train our 3D world model, we curate a large-scale dataset spanning real and simulated robotic manipulation in open-world environments, enabled by recent advances in 3D vision and simulated environments, totaling about 2M trajectories and 500 hours across a single-arm Franka and a bimanual humanoid. Through rigorous, large-scale empirical studies of backbones, action representations, learning objectives, partial observability, data mixtures, domain transfers, and scaling, we distill design principles for large-scale 3D world modeling. With a real-time (0.1s) inference speed, PointWorld can be efficiently integrated in the model-predictive control (MPC) framework for manipulation. We demonstrate that a single pre-trained checkpoint enables a real-world Franka robot to perform rigid-body pushing, deformable and articulated object manipulation, and tool use, without requiring any demonstrations or post-training and all from a single image captured in-the-wild. Project website at https://point-world.github.io/.

preprint · 2025 · arXiv

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Generative video modeling has emerged as a compelling tool to zero-shot reason about plausible physical interactions for open-world manipulation. Yet, it remains a challenge to translate such human-led motions into the low-level actions demanded by robotic systems. We observe that given an initial image and task instruction, these models excel at synthesizing sensible object motions. Thus, we introduce Dream2Flow, a framework that bridges video generation and robotic control through 3D object flow as an intermediate representation. Our method reconstructs 3D object motions from generated videos and formulates manipulation as object trajectory tracking. By separating the state changes from the actuators that realize those changes, Dream2Flow overcomes the embodiment gap and enables zero-shot guidance from pre-trained video models to manipulate objects of diverse categories, including rigid, articulated, deformable, and granular. Through trajectory optimization or reinforcement learning, Dream2Flow converts reconstructed 3D object flow into executable low-level commands without task-specific demonstrations. Simulation and real-world experiments highlight 3D object flow as a general and scalable interface for adapting video generation models to open-world robotic manipulation. Videos and visualizations are available at https://dream2flow.github.io/.

preprint · 2022 · arXiv

Inner Monologue: Embodied Reasoning through Planning with Language Models

Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real table top rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world.

preprint · 2022 · arXiv

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. "make breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into mid-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models. Website at https://huangwl18.github.io/language-planner

preprint · 2020 · arXiv

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control

Reinforcement learning is typically concerned with learning control policies tailored to a particular agent. We investigate whether there exists a single global policy that can generalize to control a wide variety of agent morphologies -- ones in which even dimensionality of state and action spaces changes. We propose to express this global policy as a collection of identical modular neural networks, dubbed Shared Modular Policies (SMP), that correspond to each of the agent's actuators. Every module is only responsible for controlling its corresponding actuator and receives information from only its local sensors. In addition, messages are passed between modules, propagating information between distant modules. We show that a single modular policy can successfully generate locomotion behaviors for several planar agents with different skeletal structures such as monopod hoppers, quadrupeds, bipeds, and generalize to variants not seen during training -- a process that would normally require training and manual hyperparameter tuning for each morphology. We observe that a wide variety of drastically diverse locomotion styles across morphologies, as well as centralized coordination, emerge via message passing between decentralized modules purely from the reinforcement learning objective. Videos and code at https://huangwl18.github.io/modular-rl/

preprint · 2020 · arXiv

Plasma flow evolution in response to resonant magnetic perturbation in a tokamak

Externally applied non-axisymmetric magnetic fields such as error field and resonant magnetic perturbation (RMP) are known to influence the plasma momentum transport and flow evolution through plasma response in a tokamak, whereas the evolution of plasma response itself strongly depends on the plasma flow as well. The nonlinear interaction between the two has been captured in the conventional error field theory with a "no-slip" condition, which has been recently extended to allow the "free-slip" condition. For comparison with simulations, we solve for the nonlinear plasma response and flow evolution driven by a single-helicity RMP in a tokamak, using the full resistive MHD model in the initial-value code NIMROD. Time evolution of the parallel (to ${\mathbf k}$) flow or "slip frequency" profile and its asymptotic steady state obtained from the NIMROD simulations are compared with both conventional and extended nonlinear response theories. Here ${\mathbf k}$ is the wave vector of the propagating island. Good agreement with the extended theory with the "free-slip" condition has been achieved for the parallel flow profile evolution in response to RMP in all resistive regimes, whereas the difference from the conventional theory with the "no-slip" condition tends to diminish as the plasma resistivity approaches zero.

preprint · 2019 · arXiv

Analytical model of plasma response to external magnetic perturbation in absence of no-slip condition

Recent simulation and experimental results suggest that the magnetic island and flow on the resonant surface often do not satisfy the "no-slip" condition in the steady state. A new theoretical model of the nonlinear plasma response to external magnetic perturbation in the absence of the no-slip condition is proposed. The model is composed of the equations for the evolution of both the width and phase of the magnetic island due to forced reconnection driven by the external magnetic perturbation, and the force-balance equation for the plasma flow. When the island width is much less than the resistive layer width, the island growth is governed by the linear Hahm-Kulsrud-Taylor solution in the presence of time-dependent plasma flow. In the other regime, when the island width is much larger than the resistive layer width, the evolution of both island width and phase can be described using the Rutherford theory. The island solution is used to construct the quasi-linear electromagnetic force, which, together with the viscous force, contributes to the nonlinear variation in plasma flow. The no-slip condition assumed in the conventional error field theory is not imposed here, where the island oscillation frequency depends on, but does not necessarily equal, the plasma flow frequency at the rational surface.
