Source author record

Kevin Yang

Kevin Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, Unclaimed source record

Artificial Intelligence, Computation and Language, math.PR, Machine Learning, math-ph, math.MP, physics.gen-ph, Robotics, Computer Vision, math.NT, Quantitative Methods

Catalog footprint

12 works

11 topics

4 close collaborators

preprint, 2026, arXiv

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

We introduce Lucid-XR, a generative data engine for creating diverse and realistic-looking multi-modal data to train real-world robotic systems. At the core of Lucid-XR is vuer, a web-based physics simulation environment that runs directly on the XR headset, enabling internet-scale access to immersive, latency-free virtual interactions without requiring specialized equipment. The complete system integrates on-device physics simulation with human-to-robot pose retargeting. Data collected is further amplified by a physics-guided video generation pipeline steerable via natural language specifications. We demonstrate zero-shot transfer of robot visual policies to unseen, cluttered, and badly lit evaluation environments, after training entirely on Lucid-XR's synthetic data. We include examples across dexterous manipulation tasks that involve soft materials, loosely bound particles, and rigid body contact. Project website: https://lucidxr.github.io

preprint, 2026, arXiv

WildSci: Advancing Scientific Reasoning from In-the-Wild Literature

Recent progress in large language model (LLM) reasoning has focused on domains like mathematics and coding, where abundant high-quality data and objective evaluation metrics are readily available. In contrast, progress in LLM reasoning models remains limited in scientific domains such as medicine and materials science due to limited dataset coverage and the inherent complexity of open-ended scientific questions. To address these challenges, we introduce WildSci, a new dataset of domain-specific science questions automatically synthesized from peer-reviewed literature, covering 9 scientific disciplines and 26 subdomains. By framing complex scientific reasoning tasks in a multiple-choice format, we enable scalable training with well-defined reward signals. We further apply reinforcement learning to finetune models on these data and analyze the resulting training dynamics, including domain-specific performance changes, response behaviors, and generalization trends. Experiments on a suite of scientific benchmarks demonstrate the effectiveness of our dataset and approach. We release WildSci to enable scalable and sustainable research in scientific reasoning, available at https://huggingface.co/datasets/JustinTX/WildSci.

preprint, 2023, arXiv

Non-Stationary KPZ equation from ASEP with slow bonds

We prove that the height functions for a class of non-integrable and non-stationary particle systems converge to the KPZ equation, thereby making progress on the universality of the KPZ equation. The models herein are ASEP [4] with a mesoscopic family of slow bonds; thus we partially extend [16] to non-stationary models and add to the almost empty set of non-integrable, non-stationary interacting particle systems for which universality is established. To do this, we develop further the strategy of [41, 42] and introduce a method to establish a novel principle that builds upon the classical hydrodynamic limits of [30] and that we call local hydrodynamics.

preprint, 2022, arXiv

Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation

We introduce a novel setup for low-resource task-oriented semantic parsing which incorporates several constraints that may arise in real-world scenarios: (1) lack of similar datasets/models from a related domain, (2) inability to sample useful logical forms directly from a grammar, and (3) privacy requirements for unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches involving generating a set of structured canonical utterances corresponding to logical forms, before simulating corresponding natural language and filtering the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we observe 33% relative improvement over a non-data-augmented baseline in top-1 match.

preprint, 2022, arXiv

Automated Crossword Solving

We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles. Our system works by generating answer candidates for each crossword clue using neural question answering models and then combines loopy belief propagation with local search to find full puzzle solutions. Compared to existing approaches, our system improves exact puzzle accuracy from 71% to 82% on crosswords from The New York Times and obtains 99.9% letter accuracy on themeless puzzles. Additionally, in 2021, a hybrid of our system and the existing Dr.Fill system outperformed all human competitors for the first time at the American Crossword Puzzle Tournament. To facilitate research on question answering and crossword solving, we analyze our system's remaining errors and release a dataset of over six million question-answer pairs.

preprint, 2022, arXiv

KPZ Equation from non-simple variations on open ASEP

This paper has two main goals. The first is universality of the KPZ equation for fluctuations of dynamic interfaces associated to interacting particle systems in the presence of open boundary. We consider generalizations of the open-ASEP from [Corwin-Shen '16, Parekh '17] but admitting non-simple interactions both at the boundary and within the bulk of the particle system. These variations on open-ASEP are not integrable models, similar to the long-range variations on ASEP considered in [Dembo-Tsai '15, Y '20]. We establish the KPZ equation with the appropriate Robin boundary conditions as scaling limits for height function fluctuations associated to these non-integrable models, providing further evidence for the aforementioned universality of the KPZ equation. We specialize to compact domains and address non-compact domains in a second paper. The procedure that we employ to establish the aforementioned theorem is the second main point of this paper. Invariant measures in the presence of boundary interactions generally lack reasonable descriptions. Thus, global analyses done through the invariant measure, including the theory of energy solutions in [Goncalves-Jara '14, Goncalves-Jara '17, Goncalves-Jara-Sethuraman '15], are immediately obstructed. To circumvent this obstruction, we appeal to the almost entirely local nature of the analysis in [Y '20].

preprint, 2022, arXiv

Learning Space Partitions for Path Planning

Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space partitioning strategies independent of the reward function to be optimized. Recently, LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this paper, we develop a novel formal regret analysis for when and why such an adaptive region partitioning scheme works. We also propose a new path planning method LaP3 which improves the function value estimation within each sub-region, and uses a latent representation of the search space. Empirically, LaP3 outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima, and shows benefits when plugged into the planning components of model-based RL such as PETS. These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 39% on average across 9 tasks, and in molecular design by up to 0.4 on properties on a 0-1 scale. Code is available at https://github.com/yangkevin2/neurips2021-lap3.

preprint, 2022, arXiv

Stochastic Burgers Equation via Energy Solutions from Non-Stationary Particle Systems

We prove that the stochastic Burgers equation, which is related to the Kardar-Parisi-Zhang (KPZ) equation via a weak derivative, is a "critical" scaling limit for density fluctuations for a family of non-integrable and non-stationary interacting particle systems. The models we consider cannot be linearized by a microscopic Cole-Hopf transform or studied directly by the existing energy solution theory of Goncalves-Jara '14. We develop a novel method based on comparison to stationary models, a technique that has not yet been applied to universality of the KPZ equation and nontrivially expands the set of models for which universality is confirmed. We also study crossover fluctuations and prove, for the first time, the full transition and phase diagram from Gaussian to KPZ fluctuations for non-stationary interacting particle systems, which has not even been done yet for integrable models. We emphasize that the method developed herein applies to a general class of models/particle systems, but we restrict to a class of zero-range systems whose non-stationary versions have received widespread interest but had not been treated in the context of KPZ until now, as well as a class of non-simple exclusion processes that we comment on.

preprint, 2020, arXiv

Uncertainty Quantification Using Neural Networks for Molecular Property Prediction

Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources. The need for UQ is especially acute for neural models, which are becoming increasingly standard yet are challenging to interpret. While several approaches to UQ have been proposed in the literature, there is no clear consensus on the comparative performance of these models. In this paper, we study this question in the context of regression tasks. We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics. Our experiments show that none of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets. While we believe these results show that existing UQ methods are not sufficient for all common use-cases and demonstrate the benefits of further research, we conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.

preprint, 2016, arXiv

Some Results in the Theory of Low-lying Zeros: Determining the 1-level density, identifying the group symmetry and the arithmetic of moments of Satake parameters

While Random Matrix Theory has successfully modeled many quantities of families of L-functions, it frequently cannot see the family's arithmetic. In some situations this requires an extended theory that inserts arithmetic factors depending on the family, while in other cases these factors result in contributions which vanish in the limit, and are thus not detected. We review the general theory associated to one of the most important statistics, the n-level density of zeros near the central point. According to the Katz-Sarnak density conjecture, to each family of L-functions there is a corresponding symmetry group such that the behavior of zeros near the central point as the conductors tend to infinity agrees with the behavior of eigenvalues near 1 as the matrix size tends to infinity. We show how these calculations are done, emphasizing the techniques, methods and obstructions to improving the results, by considering in full detail a family of Dirichlet characters. We then describe how we may associate a symmetry constant to each family, and how to determine the symmetry group of a compound family in terms of the symmetries of the constituents. These calculations explain the remarkable universality of behavior, where the main terms are independent of the arithmetic (only the first two moments of the Satake parameters contribute in the limit; similar to the Central Limit Theorem, the higher moments are only felt in the rate of convergence). We end by exploring lower order terms in families of elliptic curves. We present evidence supporting a conjecture that the average second moment in one-parameter families without complex multiplication has, when appropriately viewed, a negative bias, and end with a discussion of the consequences of this bias on the distribution of low-lying zeros, in particular relations between such a bias and the observed excess rank in families.

preprint, 2016, arXiv

The mutual energy current interpretation for quantum mechanics

Quantum physics has the probability interpretation. From what we know of light, a wave always spreads out, and hence the electron wave should also spread out. That means the electron wave beam should, like the light wave beam, diverge from the source. When the electron is received by an atom, we thought the wave collapses. The place of collapse depends on the probability calculated from the square of the absolute value of the wave function. A recent discovery tells us that light is not just a wave; it is a combination of waves, the retarded potential and the advanced potential. These two potentials together produce the mutual energy current, also referred to as the M-current. Another light energy current is the P-current, related to the Poynting vector. We found that the P-current does not carry any energy for light. The contribution of the P-current to energy transfer can be omitted. The light energy is transferred only by the M-current. The beam of the M-current is not like the beam of the P-current, which diverges from the source; instead, the M-current beam first diverges from a source and then converges to a sink. Since the M-current at the place where it is received is localized at one electron, the concept of wave function collapse is unnecessary. The probabilistic results for light arise because we have used the P-current to roughly calculate the M-current. We think that if Schrödinger had known today's light theory, he would surely also have built his wave theory for quantum mechanics along the lines of the new light theory with the M-current. Hence we claim that the M-current theory is not only suitable for light but can also be applied to quantum physics. This means all particles are M-currents. The M-current is composed of not only the retarded wave but also the advanced wave. The M-current is an inner product of a retarded wave and an advanced wave.

preprint, 2015, arXiv

The modified Poynting theorem and the concept of mutual energy

The goal of this article is to derive the reciprocity theorem and the mutual energy theorem from the Poynting theorem instead of from Maxwell's equations. The Poynting theorem is generalized to the modified Poynting theorem. In the modified Poynting theorem the electromagnetic field is a superposition of different electromagnetic fields, including the retarded potential, the advanced potential, and a time-offset field. The media parameters epsilon (permittivity) and mu (permeability) can also differ between the fields. The concept of mutual energy is introduced, which is the difference between the total energy and the self-energy. A mixed mutual energy theorem is derived. We derive the mutual energy from the Fourier domain. We obtain the time-reversed mutual energy theorem and the mutual energy theorem. Then we derive the mutual energy theorem in the time domain. The instantaneous modified mutual energy theorem is derived. Applying a time-offset transform and a time integral to the instantaneous modified mutual energy theorem, the time-correlation modified mutual energy theorem is obtained. Assuming there are two electromagnetic fields, one a retarded potential and one an advanced potential, the convolution reciprocity theorem can be derived. Corresponding to the modified time-correlation mutual energy theorem and the time-convolution reciprocity theorem, there are, in the Fourier domain, the modified mutual energy theorem and the Lorentz reciprocity theorem. Hence all mutual energy theorems and reciprocity theorems are put in one framework based on the concept of mutual energy. Three new complementary theorems are derived. The inner product is introduced for two different electromagnetic fields in both the time domain and the Fourier domain for application to wave expansion.
