Source author record

Yanping Zhao

Yanping Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence physics.ins-det physics.plasm-ph Robotics

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

ACSAC: Adaptive Chunk Size Actor-Critic with Causal Transformer Q-Network

Long-horizon, sparse-reward tasks pose a fundamental challenge for reinforcement learning, since single-step TD learning suffers from bootstrapping error accumulation across successive Bellman updates. Actor-critic methods with action chunking address this by operating over temporally extended actions, which reduce the effective horizon, enable fast value backups, and support temporally consistent exploration. However, existing methods rely on a fixed chunk size and therefore cannot adaptively balance reactivity against temporal consistency. A large fixed chunk size reduces responsiveness to new observations, while a small one produces incoherent motions, forcing task-specific tuning of the chunk size. To address this limitation, we propose Adaptive Chunk Size Actor-Critic (ACSAC). ACSAC leverages a causal Transformer critic to evaluate expected returns for action chunks of different sizes. At each chunk boundary, it adaptively selects the chunk size that maximizes the expected return, supporting flexible, state-dependent chunk sizes without task-specific tuning. We prove that the ACSAC Bellman operator is a contraction whose unique fixed point is the action-value function of the adaptive policy. Experiments on OGBench demonstrate that ACSAC achieves state-of-the-art performance on long-horizon, sparse-reward manipulation tasks across both offline RL and offline-to-online RL settings.

preprint2026arXiv

Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning

Offline reinforcement learning (RL) faces a critical challenge of overestimating the value of out-of-distribution (OOD) actions. Existing methods mitigate this issue by penalizing unseen samples, yet they fail to accurately identify OOD actions and may suppress beneficial exploration beyond the behavioral support. Although several methods have been proposed to differentiate OOD samples with distinct properties, they typically rely on restrictive assumptions about the data distribution and remain limited in discrimination ability. To address this problem, we propose DOSER (Diffusion-based OOD Detection and Selective Regularization), a novel framework that goes beyond uniform penalization. DOSER trains two diffusion models to capture the behavior policy and state distribution, using single-step denoising reconstruction error as a reliable OOD indicator. During policy optimization, it further distinguishes between beneficial and detrimental OOD actions by evaluating predicted transitions, selectively suppressing risky actions while encouraging exploration of high-potential ones. Theoretically, we prove that DOSER is a $γ$-contraction and therefore admits a unique fixed point with bounded value estimates. We further provide an asymptotic performance guarantee relative to the optimal policy under model approximation and OOD detection errors. Across extensive offline RL benchmarks, DOSER consistently attains superior performance to prior methods, especially on suboptimal datasets.

preprint2016arXiv

Ion cyclotron resonance heating systems upgrade toward high power and CW operations in WEST

The design of the WEST (Tungsten-W Environment in Steady-state Tokamak) Ion cyclotron resonance heating antennas is based on a previously tested conjugate-T Resonant Double Loops prototype equipped with internal vacuum matching capacitors. The design and construction of three new WEST ICRH antennas are being carried out in close collaboration with ASIPP, within the framework of the Associated Laboratory in the fusion field between IRFM and ASIPP. The coupling performance to the plasma and the load-tolerance have been improved, while adding Continuous Wave operation capability by introducing water cooling in the entire antenna. On the generator side, the operation class of the high power tetrodes is changed from AB to B in order to allow high power operation (up to 3 MW per antenna) under higher VSWR (up to 2:1). Reliability of the generators is also improved by increasing the cavity breakdown voltage. The control and data acquisition system is also upgraded in order to resolve and react on fast events, such as ELMs. A new optical arc detection system comes in reinforcement of the V r /V f and SHAD systems.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint