Researcher profile

Yuxiao Chen

Yuxiao Chen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
16works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

16 published item(s)

preprint2026arXiv

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. We introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajectory planning for complex driving scenarios. Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a vision-language model pre-trained for Physical AI, with a diffusion-based trajectory decoder that generates dynamically feasible trajectories in real time; (3) a multi-stage training strategy using supervised fine-tuning to elicit reasoning and reinforcement learning (RL) to enforce reasoning-action consistency and optimize reasoning quality. AR1 achieves up to a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline, with a 35% reduction in close encounter rate in closed-loop simulation. RL post-training improves reasoning quality by 45% and reasoning-action consistency by 37%. Model scaling from 0.5B to 7B parameters shows consistent improvements. On-vehicle road tests confirm real-time performance (99 ms latency) and successful urban deployment. By bridging interpretable reasoning with precise control, AR1 demonstrates a practical path towards Level 4 autonomous driving. Model weights are available at https://huggingface.co/nvidia/Alpamayo-R1-10B with inference code at https://github.com/NVlabs/alpamayo.

preprint2026arXiv

Enhancing Rare Codes via Probability-Biased Directed Graph Attention for Long-Tail ICD Coding

Automated international classification of diseases (ICD) coding aims to assign multiple disease codes to clinical documents and plays a critical role in healthcare informatics. However, its performance is hindered by the extreme long-tail distribution of the ICD ontology, where a few common codes dominate while thousands of rare codes have very few examples. To address this issue, we propose a Probability-Biased Directed Graph Attention model (ProBias) that partitions codes into common and rare sets and allows information to flow only from common to rare codes. Edge weights are determined by conditional co-occurrence probabilities, which guide the attention mechanism to enrich rare-code representations with clinically related signals. To provide higher-quality semantic representations as model inputs, we further employ large language models to generate enriched textual descriptions for ICD codes, offering external clinical context that complements statistical co-occurrence signals. Applied to automated ICD coding, our approach significantly improves the representation and prediction of rare codes, achieving state-of-the-art performance on three benchmark datasets. In particular, we observe substantial gains in macro-averaged F1 score, a key metric for long-tail classification.

preprint2026arXiv

Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR

As LLMs become credible readers of earnings calls, investor-relations Q\&A, guidance, and disclosure language, supervised financial NLP benchmarks increasingly function as decision evidence for model selection and deployment. A hidden assumption is that gold labels make such evidence objective. This assumption breaks down when the benchmark ruler itself is sensitive to rubric wording, metric choice, or aggregation policy. We study this measurement risk on Japanese Financial Implicit-Commitment Recognition (JF-ICR; a pinned 253-item test split x 4 frontier LLMs x 5 rubrics x 3 temperatures x 5 ordinal metrics). Three findings follow. First, rubric wording materially changes model-assigned labels: R2--R3 agreement ranges from 70.0% to 83.4%, with the dominant movement near the +1 / 0 implicit-commitment boundary. This pattern is consistent with a pragmatic-boundary interpretation, but is not a validated linguistic-causality claim because the present rubric variants confound semantics, examples, and verbosity. Second, not every metric remains informative under the JF-ICR class distribution. Within-one accuracy is too easy because near misses receive credit and the majority class dominates; worst-class accuracy is too noisy because the rarest class has only two examples. Exact accuracy, macro-F1, and weighted \k{appa} are therefore the identifiable metrics under our operational rule. Third, ranking claims become more defensible only after this metric-identifiability audit: Bradley--Terry, Borda, and Ranked Pairs agree on the identifiable metric subset, while the full five-metric sweep produces disagreement on the closest pair. The contribution is not a new leaderboard, but a reporting discipline for supervised financial benchmarks whose gold labels exist and whose evaluation ruler still requires governance.

preprint2025arXiv

Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning

Recent reasoning-augmented Vision-Language-Action (VLA) models have improved the interpretability of end-to-end autonomous driving by generating intermediate reasoning traces. Yet these models primarily describe what they perceive and intend to do, rarely questioning whether their planned actions are safe or appropriate. This work introduces Counterfactual VLA (CF-VLA), a self-reflective VLA framework that enables the model to reason about and revise its planned actions before execution. CF-VLA first generates time-segmented meta-actions that summarize driving intent, and then performs counterfactual reasoning conditioned on both the meta-actions and the visual context. This step simulates potential outcomes, identifies unsafe behaviors, and outputs corrected meta-actions that guide the final trajectory generation. To efficiently obtain such self-reflective capabilities, we propose a rollout-filter-label pipeline that mines high-value scenes from a base (non-counterfactual) VLA's rollouts and labels counterfactual reasoning traces for subsequent training rounds. Experiments on large-scale driving datasets show that CF-VLA improves trajectory accuracy by up to 17.6%, enhances safety metrics by 20.5%, and exhibits adaptive thinking: it only enables counterfactual reasoning in challenging scenarios. By transforming reasoning traces from one-shot descriptions to causal self-correction signals, CF-VLA takes a step toward self-reflective autonomous driving agents that learn to think before they act.

preprint2022arXiv

Balancing Efficiency and Comfort in Robot-Assisted Bite Transfer

Robot-assisted feeding in household environments is challenging because it requires robots to generate trajectories that effectively bring food items of varying shapes and sizes into the mouth while making sure the user is comfortable. Our key insight is that in order to solve this challenge, robots must balance the efficiency of feeding a food item with the comfort of each individual bite. We formalize comfort and efficiency as heuristics to incorporate in motion planning. We present an approach based on heuristics-guided bi-directional Rapidly-exploring Random Trees (h-BiRRT) that selects bite transfer trajectories of arbitrary food item geometries and shapes using our developed bite efficiency and comfort heuristics and a learned constraint model. Real-robot evaluations show that optimizing both comfort and efficiency significantly outperforms a fixed-pose based method, and users preferred our method significantly more than that of a method that maximizes only user comfort. Videos and Appendices are found on our website: https://sites.google.com/view/comfortbitetransfer-icra22/home.

preprint2022arXiv

BITS: Bi-level Imitation for Traffic Simulation

Simulation is the key to scaling up validation and verification for robotic systems such as autonomous vehicles. Despite advances in high-fidelity physics and sensor simulation, a critical gap remains in simulating realistic behaviors of road users. This is because, unlike simulating physics and graphics, devising first principle models for human-like behaviors is generally infeasible. In this work, we take a data-driven approach and propose a method that can learn to generate traffic behaviors from real-world driving logs. The method achieves high sample efficiency and behavior diversity by exploiting the bi-level hierarchy of driving behaviors by decoupling the traffic simulation problem into high-level intent inference and low-level driving behavior imitation. The method also incorporates a planning module to obtain stable long-horizon behaviors. We empirically validate our method, named Bi-level Imitation for Traffic Simulation (BITS), with scenarios from two large-scale driving datasets and show that BITS achieves balanced traffic simulation performance in realism, diversity, and long-horizon stability. We also explore ways to evaluate behavior realism and introduce a suite of evaluation metrics for traffic simulation. Finally, as part of our core contributions, we develop and open source a software tool that unifies data formats across different driving datasets and converts scenes from existing datasets into interactive simulation environments. For additional information and videos, see https://sites.google.com/view/nvr-bits2022/home

preprint2022arXiv

Distributionally Robust Decision Making Leveraging Conditional Distributions

Distributionally robust optimization (DRO) is a powerful tool for decision making under uncertainty. It is particularly appealing because of its ability to leverage existing data. However, many practical problems call for decision-making with some auxiliary information, and DRO in the context of conditional distribution is not straightforward. We propose a conditional kernel distributionally robust optimization (CKDRO) method that enables robust decision making under conditional distributions through kernel DRO and the conditional mean operator in the reproducing kernel Hilbert space (RKHS). In particular, we consider problems where there is a correlation between the unknown variable y and an auxiliary observable variable x. Given past data of the two variables and a queried auxiliary variable, CKDRO represents the conditional distribution P(y|x) as the conditional mean operator in the RKHS space and quantifies the ambiguity set in the RKHS as well, which depends on the size of the dataset as well as the query point. To justify the use of RKHS, we demonstrate that the ambiguity set defined in RKHS can be viewed as a ball under a metric that is similar to the Wasserstein metric. The DRO is then dualized and solved via a finite dimensional convex program. The proposed CKDRO approach is applied to a generation scheduling problem and shows that the result of CKDRO is superior to common benchmarks in terms of quality and robustness.

preprint2022arXiv

Interaction-Dynamics-Aware Perception Zones for Obstacle Detection Safety Evaluation

To enable safe autonomous vehicle (AV) operations, it is critical that an AV's obstacle detection module can reliably detect obstacles that pose a safety threat (i.e., are safety-critical). It is therefore desirable that the evaluation metric for the perception system captures the safety-criticality of objects. Unfortunately, existing perception evaluation metrics tend to make strong assumptions about the objects and ignore the dynamic interactions between agents, and thus do not accurately capture the safety risks in reality. To address these shortcomings, we introduce an interaction-dynamics-aware obstacle detection evaluation metric by accounting for closed-loop dynamic interactions between an ego vehicle and obstacles in the scene. By borrowing existing theory from optimal control theory, namely Hamilton-Jacobi reachability, we present a computationally tractable method for constructing a ``safety zone'': a region in state space that defines where safety-critical obstacles lie for the purpose of defining safety metrics. Our proposed safety zone is mathematically complete, and can be easily computed to reflect a variety of safety requirements. Using an off-the-shelf detection algorithm from the nuScenes detection challenge leaderboard, we demonstrate that our approach is computationally lightweight, and can better capture safety-critical perception errors than a baseline approach.

preprint2022arXiv

Onboard Safety Guarantees for Racing Drones: High-speed Geofencing with Control Barrier Functions

This paper details the theory and implementation behind practically ensuring safety of remotely piloted racing drones. We demonstrate robust and practical safety guarantees on a 7" racing drone at speeds exceeding 100 km/h, utilizing only online computations on a 10 gram micro-controller. To achieve this goal, we utilize the framework of control barrier functions (CBFs) which give guaranteed safety encoded as forward set invariance. To make this methodology practically applicable, we present an implicitly defined CBF which leverages backup controllers to enable gradient-free evaluations that ensure safety. The method applied to hardware results in smooth, minimally conservative alterations of the pilots' desired inputs, enabling them to push the limits of their drone without fear of crashing. Moreover, the method works in conjunction with the preexisting flight controller, resulting in unaltered flight when there are no nearby safety risks. Additional benefits include safety and stability of the drone when losing line-of-sight or in the event of radio failure.

preprint2022arXiv

Robust Disturbance Rejection for Robotic Bipedal Walking: System-Level-Synthesis with Step-to-step Dynamics Approximation

We present a stepping stabilization control that addresses external push disturbances on bipedal walking robots. The stepping control is synthesized based on the step-to-step (S2S) dynamics of the robot that is controlled to have an approximately constant center of mass (COM) height. We first learn a linear S2S dynamics with bounded model discrepancy from the undisturbed walking behaviors of the robot, where the walking step size is taken as the control input to the S2S dynamics. External pushes are then considered as disturbances to the learned S2S (L-S2S) dynamics. We then apply the system-level-synthesis (SLS) approach on the disturbed L-S2S dynamics to robustly stabilize the robot to the desired walking while satisfying the kinematic constraints of the robot. We successfully realize the proposed approach on the walking of the bipedal robot AMBER and Cassie subject to push disturbances, showing that the approach is general, effective, and computationally-efficient for robust disturbance rejection.

preprint2022arXiv

ScePT: Scene-consistent, Policy-based Trajectory Predictions for Planning

Trajectory prediction is a critical functionality of autonomous systems that share environments with uncontrolled agents, one prominent example being self-driving vehicles. Currently, most prediction methods do not enforce scene consistency, i.e., there are a substantial amount of self-collisions between predicted trajectories of different agents in the scene. Moreover, many approaches generate individual trajectory predictions per agent instead of joint trajectory predictions of the whole scene, which makes downstream planning difficult. In this work, we present ScePT, a policy planning-based trajectory prediction model that generates accurate, scene-consistent trajectory predictions suitable for autonomous system motion planning. It explicitly enforces scene consistency and learns an agent interaction policy that can be used for conditional prediction. Experiments on multiple real-world pedestrians and autonomous vehicle datasets show that ScePT} matches current state-of-the-art prediction accuracy with significantly improved scene consistency. We also demonstrate ScePT's ability to work with a downstream contingency planner.

preprint2020arXiv

Control Barrier Functions for Sampled-Data Systems with Input Delays

This paper considers the general problem of transitioning theoretically safe controllers to hardware. Concretely, we explore the application of control barrier functions (CBFs) to sampled-data systems: systems that evolve continuously but whose control actions are computed in discrete time-steps. While this model formulation is less commonly used than its continuous counterpart, it more accurately models the reality of most control systems in practice, making the safety guarantees more impactful. In this context, we prove robust set invariance with respect to zero-order hold controllers as well as state uncertainty, without the need to explicitly compute any control invariant sets. It is then shown that this formulation can be exploited to address input delays in this system, with the result being CBF constraints that are affine in the input. The results are demonstrated in a high-fidelity simulation of an unstable Segway robotic system in real-time.

preprint2020arXiv

Counter-example Guided Learning of Bounds on Environment Behavior

There is a growing interest in building autonomous systems that interact with complex environments. The difficulty associated with obtaining an accurate model for such environments poses a challenge to the task of assessing and guaranteeing the system's performance. We present a data-driven solution that allows for a system to be evaluated for specification conformance without an accurate model of the environment. Our approach involves learning a conservative reactive bound of the environment's behavior using data and specification of the system's desired behavior. First, the approach begins by learning a conservative reactive bound on the environment's actions that captures its possible behaviors with high probability. This bound is then used to assist verification, and if the verification fails under this bound, the algorithm returns counter-examples to show how failure occurs and then uses these to refine the bound. We demonstrate the applicability of the approach through two case-studies: i) verifying controllers for a toy multi-robot system, and ii) verifying an instance of human-robot interaction during a lane-change maneuver given real-world human driving data.

preprint2020arXiv

Data-driven Characterization of Human Interaction for Model-based Control of Powered Prostheses

This paper proposes a data-driven method for powered prosthesis control that achieves stable walking without the need for additional sensors on the human. The key idea is to extract the nominal gait and the human interaction information from motion capture data, and reconstruct the walking behavior with a dynamic model of the human-prosthesis system. The walking behavior of a human wearing a powered prosthesis is obtained through motion capture, which yields the limb and joint trajectories. Then a nominal trajectory is obtained by solving a gait optimization problem designed to reconstruct the walking behavior observed by motion capture. Moreover, the interaction force profiles between the human and the prosthesis are recovered by simulating the model following the recorded gaits, which are then used to construct a force tube that covers all the interaction force profiles. Finally, a robust Control Lyapunov Function (CLF) Quadratic Programming (QP) controller is designed to guarantee the convergence to the nominal trajectory under all possible interaction forces within the tube. Simulation results show this controller's improved tracking performance with a perturbed force profile compared to other control methods with less model information.

preprint2020arXiv

Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge

Cross-modal knowledge distillation deals with transferring knowledge from a model trained with superior modalities (Teacher) to another model trained with weak modalities (Student). Existing approaches require paired training examples exist in both modalities. However, accessing the data from superior modalities may not always be feasible. For example, in the case of 3D hand pose estimation, depth maps, point clouds, or stereo images usually capture better hand structures than RGB images, but most of them are expensive to be collected. In this paper, we propose a novel scheme to train the Student in a Target dataset where the Teacher is unavailable. Our key idea is to generalize the distilled cross-modal knowledge learned from a Source dataset, which contains paired examples from both modalities, to the Target dataset by modeling knowledge as priors on parameters of the Student. We name our method "Cross-Modal Knowledge Generalization" and demonstrate that our scheme results in competitive performance for 3D hand pose estimation on standard benchmark datasets.

preprint2020arXiv

Safety-Critical Control Synthesis for network systems with Control Barrier Functions and Assume-Guarantee Contracts

This paper aims at the safety-critical control synthesis of network systems such that the satisfaction of the safety constraints can be guaranteed. To handle the large state dimension of such systems, an assume-guarantee contract is used to break the large synthesis problem into smaller subproblems. Parameterized signal temporal logic (pSTL) is used to formally describe the behaviors of the subsystems, which we use as the template for the contract. We show that robust control invariant sets (RCIs) for the subsystems can be composed to form a robust control invariant set for the whole network system under a valid assume-guarantee contract. An epigraph algorithm is proposed to solve for a contract that is valid, ---an approach that has linear complexity for sparse networks, which leads to a robust control invariant set for the whole network system. Implemented with control barrier function (CBF), the state of each subsystem is guaranteed to stay within the safe set. Furthermore, we propose a contingency tube Model Predictive Control approach based on the RCI, which is capable of handling severe contingencies, including topology changes of the network. A power grid example is used to demonstrate the proposed method. The simulation result includes both set point control and contingency recovery, and the safety constraint is always satisfied.