Researcher profile

Kevin Yang

Kevin Yang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
9 works
0 followers
9 topics
4 close collaborators

Published work

9 published items

preprint | 2026 | arXiv

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

We introduce Lucid-XR, a generative data engine for creating diverse and realistic-looking multi-modal data to train real-world robotic systems. At the core of Lucid-XR is vuer, a web-based physics simulation environment that runs directly on the XR headset, enabling internet-scale access to immersive, latency-free virtual interactions without requiring specialized equipment. The complete system integrates on-device physics simulation with human-to-robot pose retargeting. Data collected is further amplified by a physics-guided video generation pipeline steerable via natural language specifications. We demonstrate zero-shot transfer of robot visual policies to unseen, cluttered, and badly lit evaluation environments, after training entirely on Lucid-XR's synthetic data. We include examples across dexterous manipulation tasks that involve soft materials, loosely bound particles, and rigid body contact. Project website: https://lucidxr.github.io

preprint | 2026 | arXiv

WildSci: Advancing Scientific Reasoning from In-the-Wild Literature

Recent progress in large language model (LLM) reasoning has focused on domains like mathematics and coding, where abundant high-quality data and objective evaluation metrics are readily available. In contrast, progress in LLM reasoning models remains limited in scientific domains such as medicine and materials science due to limited dataset coverage and the inherent complexity of open-ended scientific questions. To address these challenges, we introduce WildSci, a new dataset of domain-specific science questions automatically synthesized from peer-reviewed literature, covering 9 scientific disciplines and 26 subdomains. By framing complex scientific reasoning tasks in a multiple-choice format, we enable scalable training with well-defined reward signals. We further apply reinforcement learning to finetune models on these data and analyze the resulting training dynamics, including domain-specific performance changes, response behaviors, and generalization trends. Experiments on a suite of scientific benchmarks demonstrate the effectiveness of our dataset and approach. We release WildSci to enable scalable and sustainable research in scientific reasoning, available at https://huggingface.co/datasets/JustinTX/WildSci.
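
The multiple-choice framing mentioned above lends itself to a simple, checkable reward. The following is a minimal sketch of such a reward, not the WildSci implementation: the "Answer: <letter>" response format, the letter range A-J, and the function names `extract_choice` and `multiple_choice_reward` are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed (hypothetical) response format: the model states "Answer: <letter>".
_ANSWER_PATTERN = re.compile(r"Answer:\s*\(?([A-J])\b\)?", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the last answer letter stated in the response, or None."""
    matches = _ANSWER_PATTERN.findall(response)
    return matches[-1].upper() if matches else None


def multiple_choice_reward(response: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted letter equals the gold label, else 0.0."""
    choice = extract_choice(response)
    return 1.0 if choice is not None and choice == gold.upper() else 0.0
```

Because the reward depends only on the final stated letter, it can be computed automatically for any number of sampled responses, which is what makes this kind of signal convenient for reinforcement learning.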

preprint | 2023 | arXiv

Non-Stationary KPZ equation from ASEP with slow bonds

We prove that the height functions for a class of non-integrable and non-stationary particle systems converge to the KPZ equation, thereby making progress on the universality of the KPZ equation. The models herein are ASEP [4] with a mesoscopic family of slow bonds; thus we partially extend [16] to non-stationary models and add to the almost empty set of non-integrable, non-stationary interacting particle systems for which universality is established. To do this, we further develop the strategy of [41, 42] and introduce a method to establish a novel principle, which we call local hydrodynamics, that builds upon the classical hydrodynamic limits of [30].

preprint | 2022 | arXiv

Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation

We introduce a novel setup for low-resource task-oriented semantic parsing which incorporates several constraints that may arise in real-world scenarios: (1) lack of similar datasets/models from a related domain, (2) inability to sample useful logical forms directly from a grammar, and (3) privacy requirements for unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches involving generating a set of structured canonical utterances corresponding to logical forms, before simulating corresponding natural language and filtering the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we observe 33% relative improvement over a non-data-augmented baseline in top-1 match.

preprint | 2022 | arXiv

Automated Crossword Solving

We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles. Our system works by generating answer candidates for each crossword clue using neural question answering models and then combining loopy belief propagation with local search to find full puzzle solutions. Compared to existing approaches, our system improves exact puzzle accuracy from 71% to 82% on crosswords from The New York Times and obtains 99.9% letter accuracy on themeless puzzles. Additionally, in 2021, a hybrid of our system and the existing Dr.Fill system outperformed all human competitors for the first time at the American Crossword Puzzle Tournament. To facilitate research on question answering and crossword solving, we analyze our system's remaining errors and release a dataset of over six million question-answer pairs.

preprint | 2022 | arXiv

KPZ Equation from non-simple variations on open ASEP

This paper has two main goals. The first is universality of the KPZ equation for fluctuations of dynamic interfaces associated to interacting particle systems in the presence of an open boundary. We consider generalizations of the open-ASEP from [Corwin-Shen '16, Parekh '17] but admitting non-simple interactions both at the boundary and within the bulk of the particle system. These variations on open-ASEP are not integrable models, similar to the long-range variations on ASEP considered in [Dembo-Tsai '15, Y '20]. We establish the KPZ equation with the appropriate Robin boundary conditions as scaling limits for height function fluctuations associated to these non-integrable models, providing further evidence for the aforementioned universality of the KPZ equation. We specialize to compact domains and address non-compact domains in a second paper. The procedure that we employ to establish the aforementioned theorem is the second main point of this paper. Invariant measures in the presence of boundary interactions generally lack reasonable descriptions. Thus, global analyses done through the invariant measure, including the theory of energy solutions in [Goncalves-Jara '14, Goncalves-Jara '17, Goncalves-Jara-Sethuraman '15], are immediately obstructed. To circumvent this obstruction, we appeal to the almost entirely local nature of the analysis in [Y '20].

preprint | 2022 | arXiv

Learning Space Partitions for Path Planning

Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space partitioning strategies independent of the reward function to be optimized. Recently, LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this paper, we develop a novel formal regret analysis for when and why such an adaptive region partitioning scheme works. We also propose a new path planning method LaP3 which improves the function value estimation within each sub-region, and uses a latent representation of the search space. Empirically, LaP3 outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima, and shows benefits when plugged into the planning components of model-based RL such as PETS. These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 39% on average across 9 tasks, and in molecular design by up to 0.4 on properties on a 0-1 scale. Code is available at https://github.com/yangkevin2/neurips2021-lap3.
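
To illustrate the remark above that greedy methods like CEM may get trapped in local maxima, here is a minimal one-dimensional sketch of the cross-entropy method. It is not LaP3 and is not taken from the linked repository; the function names, the default hyperparameters, and the `two_peaks` reward are hypothetical choices for illustration only.

```python
import math
import random
from typing import Callable


def cem_maximize(
    reward: Callable[[float], float],
    mean: float,
    std: float,
    iterations: int = 30,
    population: int = 50,
    elite: int = 10,
    seed: int = 0,
    min_std: float = 1e-3,
) -> float:
    """Illustrative 1D cross-entropy method; returns the final sampling mean."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    for _ in range(iterations):
        # Sample candidates from the current Gaussian search distribution.
        samples = [rng.gauss(mean, std) for _ in range(population)]
        # Greedy step: keep only the highest-reward candidates.
        elites = sorted(samples, key=reward, reverse=True)[:elite]
        # Refit the Gaussian to the elites, which narrows the search region.
        mean = sum(elites) / elite
        variance = sum((x - mean) ** 2 for x in elites) / elite
        std = max(math.sqrt(variance), min_std)
    return mean


def two_peaks(x: float) -> float:
    """Hypothetical reward: a local peak near x = -2 and a higher peak near x = 2."""
    return math.exp(-((x + 2.0) ** 2)) + 2.0 * math.exp(-((x - 2.0) ** 2))


# Started near the lower peak with a narrow spread, the search is very unlikely
# to sample the region around the higher peak, so it settles on the local one.
trapped = cem_maximize(two_peaks, mean=-2.0, std=0.3)
```

Since each refit concentrates the sampling distribution around the current elites, the search region only shrinks toward whatever peak it already covers. That is the greedy behaviour a reward-sensitive partition of the search space is meant to counteract.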

preprint | 2022 | arXiv

Stochastic Burgers Equation via Energy Solutions from Non-Stationary Particle Systems

We prove that the stochastic Burgers equation, which is related to the Kardar-Parisi-Zhang/KPZ equation via weak derivative, is a "critical" scaling limit for density fluctuations for a family of non-integrable and non-stationary interacting particle systems. The models we consider cannot be linearized by a microscopic Cole-Hopf transform or studied directly by the existing energy solution theory of Goncalves-Jara '14. We develop a novel method based on comparison to stationary models, a technique that has not yet been applied to universality of the KPZ equation and nontrivially expands the set of models for which universality is confirmed. We also study crossover fluctuations and prove, for the first time, the full transition and phase diagram from Gaussian to KPZ fluctuations for non-stationary interacting particle systems, which has not even been done yet for integrable models. We emphasize that the method developed herein applies to a general class of models/particle systems, but we restrict to a class of zero-range systems whose non-stationary versions have received widespread interest but had not been treated in the context of KPZ until now, as well as a class of non-simple exclusion processes that we comment on.

preprint | 2020 | arXiv

Uncertainty Quantification Using Neural Networks for Molecular Property Prediction

Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources. The need for UQ is especially acute for neural models, which are becoming increasingly standard yet are challenging to interpret. While several approaches to UQ have been proposed in the literature, there is no clear consensus on the comparative performance of these models. In this paper, we study this question in the context of regression tasks. We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics. Our experiments show that none of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets. While we believe these results show that existing UQ methods are not sufficient for all common use-cases and demonstrate the benefits of further research, we conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.