Source author record

Yuxuan Chen

Yuxuan Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Computer Vision cond-mat.mtrl-sci Computation and Language Artificial Intelligence cond-mat.mes-hall eess.IV Information Theory Machine Learning math-ph math.IT math.MP Multimedia Robotics

Catalog footprint

What is connected

13 works

13 topics

4 close collaborators

preprint, 2026, arXiv

Dynamic Execution Commitment of Vision-Language-Action Models

Vision-Language-Action (VLA) models predominantly adopt action chunking, i.e., predicting and committing to a short horizon of consecutive low-level actions in a single forward pass, to amortize the inference cost of large-scale backbones and reduce per-step latency. However, committing these multi-step predictions to real-world execution requires balancing success rate against inference efficiency, a decision typically governed by fixed execution horizons tuned per task. Such heuristics ignore the state-dependent nature of predictive reliability, leading to brittle performance in dynamic or out-of-distribution settings. In this paper, we introduce A3, an Adaptive Action Acceptance mechanism that reframes dynamic execution commitment as a self-speculative prefix verification problem. A3 first computes a trajectory-wise consensus score of actions via group sampling, then selects a representative draft and prioritizes downstream verification. Specifically, it enforces: (1) consensus-ordered conditional invariance, which validates low-consensus actions by judging whether they remain consistent when re-decoded conditioned on high-consensus actions; and (2) prefix-closed sequential consistency, which guarantees physical rollout integrity by accepting only the longest continuous sequence of verified actions starting from the beginning. Consequently, the execution horizon emerges as the longest verifiable prefix satisfying both internal model logic and sequential execution constraints. Experiments across diverse VLA models and benchmarks demonstrate that A3 eliminates the need for manual horizon tuning while achieving a superior trade-off between execution robustness and inference throughput.
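The prefix-closed acceptance rule described in this abstract can be illustrated with a minimal sketch. It assumes per-action verification flags have already been computed by some verifier; the function name `longest_verified_prefix` and the example flags are hypothetical, and this is not the authors' implementation of A3.

```python
from typing import Sequence


def longest_verified_prefix(verified: Sequence[bool]) -> int:
    """Return how many actions from the start of the chunk are verified.

    Counting stops at the first unverified action, so later verified
    actions are never executed out of order.
    """
    horizon = 0
    for ok in verified:
        if not ok:
            break
        horizon += 1
    return horizon


# Hypothetical verification flags for an 8-step action chunk.
flags = [True, True, True, False, True, True, False, True]
horizon = longest_verified_prefix(flags)
print(horizon)  # 3: only the first three actions would be executed
```

Under this rule the execution horizon is an output of verification rather than a tuned constant, which is the point the abstract makes about avoiding fixed horizons.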

preprint, 2026, arXiv

Engineering Favorable Propagation: Near-Field IRS Deployment for Spatial Multiplexing

In intelligent reflecting surface (IRS)-assisted multiple-input multiple-output (MIMO) systems, a strong line-of-sight (LoS) link is required to compensate for the severe cascaded path loss. However, such a link renders the effective channel highly rank-deficient and fundamentally limits spatial multiplexing. To overcome this limitation, this paper leverages the large aperture of sparse arrays to harness near-field spherical wavefronts, and establishes a deterministic deployment criterion that strategically positions the IRS in the near field of a base station (BS). This placement exploits the spherical wavefronts of the BS-IRS link to engineer decorrelated channels, thereby fundamentally overcoming the rank-deficiency issue in far-field cascaded channels. Based on a physical channel model for the sparse BS array and the IRS, we characterize the rank properties and inter-user correlation of the cascaded BS-IRS-user channel. We further derive a closed-form favorable propagation metric that reveals how the sparse array geometry and the IRS position can be tuned to reduce inter-user channel correlation. The resulting geometry-driven deployment rule provides a simple guideline for creating a favorable propagation environment with enhanced effective degrees of freedom. The favorable channel statistics induced by our deployment criterion enable a low-complexity maximum ratio transmission (MRT) precoding scheme. This serves as the foundation for an efficient algorithm that jointly optimizes the IRS phase shifts and power allocation based solely on long-term statistical channel state information (CSI). Simulation results validate the effectiveness of our deployment criterion and demonstrate that our optimization framework achieves significant performance gains over benchmark schemes.
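As a rough illustration of what channel correlation between two users means, the sketch below computes the normalized inner product of two channel vectors. It is not the closed-form metric derived in the paper: it assumes a plain far-field uniform linear array, and the helper names `steering_vector` and `normalized_correlation` as well as the array size and angles are hypothetical.

```python
import cmath
import math


def steering_vector(n_antennas, spacing_wavelengths, angle_rad):
    """Far-field (plane-wave) response of a uniform linear array."""
    return [
        cmath.exp(-2j * math.pi * spacing_wavelengths * k * math.sin(angle_rad))
        for k in range(n_antennas)
    ]


def normalized_correlation(h1, h2):
    """|h1^H h2| / (||h1|| ||h2||): 1 means parallel, 0 means orthogonal."""
    inner = sum(a.conjugate() * b for a, b in zip(h1, h2))
    norm1 = math.sqrt(sum(abs(a) ** 2 for a in h1))
    norm2 = math.sqrt(sum(abs(b) ** 2 for b in h2))
    return abs(inner) / (norm1 * norm2)


# Assumed setup: 8 antennas at half-wavelength spacing.
h_a = steering_vector(8, 0.5, 0.0)
h_b = steering_vector(8, 0.5, 0.0)              # same direction as h_a
h_c = steering_vector(8, 0.5, math.asin(0.25))  # direction chosen to be orthogonal

same_direction = normalized_correlation(h_a, h_b)
orthogonal_direction = normalized_correlation(h_a, h_c)
print(round(same_direction, 6), round(orthogonal_direction, 6))
```

Two users seen from the same direction give a correlation of 1, which is the rank-deficient case; a correlation near 0 corresponds to the decorrelated channels that favorable propagation aims for.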

preprint, 2026, arXiv

External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images

We externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting mammographic breast density from breast ultrasound exams on an independent cohort. The external validation set comprised 2,000 ultrasound exams, including 500 cancer cases defined by an initial negative exam (BI-RADS 1 or 2) followed by a cancer diagnosis within 6 months to 10 years, and 1,500 negative controls matched by manufacturer and study year. Performance was measured using patient-level AUROC across four density categories: A (fatty), B (scattered), C (heterogeneous), and D (extremely dense). As a downstream assessment, we also evaluated 10-year risk prediction by incorporating age and AI-derived density into the Tyrer-Cuzick model and comparing performance against a reference model using age and mammography-reported density. All three models performed best in extremely dense breasts (AUROC 0.868-0.899), with strong performance in fatty (0.814-0.838) and scattered density (0.764-0.799), and lower performance in heterogeneously dense breasts (0.699-0.729). DenseNet121 achieved the highest overall performance (micro-averaged AUROC 0.885), and performance across categories was comparable between internal and external testing. For risk modeling, age combined with AI-derived density yielded a lower AUROC than age combined with mammography-reported density (0.541 vs. 0.570; p = 0.23), with no statistically significant difference. These findings indicate that deep learning models generalize well to external data with different racial composition for breast density assessment. While performance is strongest in extremely dense breasts, heterogeneously dense remains more challenging, highlighting the need for targeted optimization.

preprint, 2026, arXiv

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

Evaluating object removal in images and videos remains challenging because the task is inherently one-to-many, yet existing metrics frequently disagree with human perception. Full-reference metrics reward copy-paste behaviors over genuine erasure; no-reference metrics suffer from systematic biases such as favoring blurry results; and global temporal metrics are insensitive to localized artifacts within edited regions. To address these limitations, we propose RC (Removal Coherence), a pair of perception-aligned metrics: RC-S, which measures spatial coherence via sliding-window feature comparison between masked and background regions, and RC-T, which measures temporal consistency via distribution tracking within shared restored regions across adjacent frames. To validate RC and support community benchmarking, we further introduce PROVE-Bench, a two-tier real-world benchmark comprising PROVE-M, an 80-video paired dataset with motion augmentation, and PROVE-H, a 100-video challenging subset without ground truth. Together, RC metrics and PROVE-Bench form the PROVE (Perceptual RemOVal cohErence) evaluation framework for visual media. Experiments across diverse image and video benchmarks demonstrate that RC achieves substantially stronger alignment with human judgments than existing evaluation protocols. The code for the RC metrics and PROVE-Bench is publicly available at: https://github.com/xiaomi-research/prove/.

preprint, 2022, arXiv

A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition

Pre-trained language models (PLM) are effective components of few-shot named entity recognition (NER) approaches when augmented with continued pre-training on task-specific out-of-domain data or fine-tuning on in-domain data. However, their performance in low-resource scenarios, where such data is not available, remains an open question. We introduce an encoder evaluation framework, and use it to systematically compare the performance of state-of-the-art pre-trained representations on the task of low-resource NER. We analyze a wide range of encoders pre-trained with different strategies, model architectures, intermediate-task fine-tuning, and contrastive learning. Our experimental results across ten benchmark NER datasets in English and German show that encoder performance varies significantly, suggesting that the choice of encoder for a specific low-resource scenario needs to be carefully evaluated.

preprint, 2022, arXiv

Creating a Nanoscale Lateral Heterojunction in a Semiconductor Monolayer with a Large Built-in Potential

The ability to engineer atomically thin nanoscale lateral heterojunctions (HJs) is critical to lay the foundation for future two-dimensional (2D) device technology. However, the traditional approach to creating a heterojunction by direct growth of a heterostructure of two different materials constrains the available band offsets, and it is still unclear if large built-in potentials are attainable for 2D materials. The electronic properties of atomically thin semiconducting transition metal dichalcogenides (TMDs) are not static, and their exciton binding energy and quasiparticle band gap depend strongly on the proximal environment. Recent studies have shown that this effect can be harnessed to engineer the lateral band profile of monolayer TMDs to create a heterojunction. Here we demonstrate the synthesis of a nanoscale lateral heterojunction in monolayer MoSe2 by intercalating Se at the interface of a hBN/Ru(0001) substrate. The Se intercalation creates a spatially abrupt modulation of the local hBN/Ru work function, which is imprinted directly onto an overlying MoSe2 monolayer to create a large built-in potential of 0.83 eV. We spatially resolve the MoSe2 band profile and work function using scanning tunneling spectroscopy to map out the nanoscale depletion region. The Se intercalation also modifies the dielectric environment, influencing the local band gap renormalization and increasing the MoSe2 band gap by ~0.26 eV. This work illustrates that environmental proximity engineering provides a robust method to indirectly manipulate the band profile of 2D materials outside the limits of their intrinsic properties, providing avenues for future device design.

preprint, 2022, arXiv

SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos

Self-supervised methods have significantly closed the gap with end-to-end supervised learning for image classification. In the case of human action videos, however, where both appearance and motion are significant factors of variation, this gap remains significant. One of the key reasons for this is that sampling pairs of similar video clips, a required step for many self-supervised contrastive learning methods, is currently done conservatively to avoid false positives. A typical assumption is that similar clips only occur temporally close within a single video, leading to insufficient examples of motion similarity. To mitigate this, we propose SLIC, a clustering-based self-supervised contrastive learning method for human action videos. Our key contribution is that we improve upon the traditional intra-video positive sampling by using iterative clustering to group similar video instances. This enables our method to leverage pseudo-labels from the cluster assignments to sample harder positives and negatives. SLIC outperforms state-of-the-art video retrieval baselines by +15.4% on top-1 recall on UCF101 and by +5.7% when directly transferred to HMDB51. With end-to-end finetuning for action classification, SLIC achieves 83.2% top-1 accuracy (+0.8%) on UCF101 and 54.5% on HMDB51 (+1.6%). SLIC is also competitive with the state-of-the-art in action classification after self-supervised pretraining on Kinetics400.

preprint, 2022, arXiv

Why only Micro-F1? Class Weighting of Measures for Relation Classification

Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes, where existing schemes are extremes, and two new intermediate schemes. We show that reporting results of different weighting schemes better highlights strengths and weaknesses of a model.
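To make the difference between the two extreme weighting schemes concrete, here is a small sketch that computes micro- and macro-averaged F1 on an imbalanced toy example. The labels and predictions are invented for illustration, the function names are hypothetical, and the intermediate schemes introduced in the paper are not reproduced here.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro-F1, macro-F1) for single-label classification."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then average with equal class weights.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Invented imbalanced data: the model always predicts the frequent class "a".
gold = ["a"] * 8 + ["b"] * 2
pred = ["a"] * 10
micro, macro = micro_macro_f1(gold, pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.444
```

The micro score hides the complete failure on the rare class, while the macro score exposes it, which is why reporting more than one weighting scheme is informative.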

preprint, 2021, arXiv

PiP: Planning-informed Trajectory Prediction for Autonomous Driving

It is critical to predict the motion of surrounding vehicles for self-driving planning, especially in a socially compliant and flexible way. However, future prediction is challenging due to the interaction and uncertainty in driving behaviors. We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting. Our approach differs from traditional prediction, which is based only on historical information and is decoupled from planning. By informing the prediction process with the planning of the ego vehicle, our method achieves state-of-the-art performance in multi-agent forecasting on highway datasets. Moreover, our approach enables a novel pipeline that couples prediction and planning by conditioning PiP on multiple candidate trajectories of the ego vehicle, which is highly beneficial for autonomous driving in interactive scenarios.

preprint, 2016, arXiv

Band gap renormalization and work function tuning in MoSe2/hBN/Ru(0001) heterostructures

Here we report the successful growth of MoSe2 on single-layer hexagonal boron nitride (hBN) on a Ru(0001) substrate using molecular beam epitaxy. We investigated the electronic structure of MoSe2 using scanning tunneling microscopy and spectroscopy. Surprisingly, we found that the quasi-particle gap of MoSe2 on hBN/Ru is about 0.25 eV smaller than that on graphene or graphite substrates. We attribute this result to the strong interaction between hBN and Ru, which causes residual metallic screening from the substrate. The surface of MoSe2 exhibits a Moiré pattern that replicates the Moiré pattern of hBN/Ru. In addition, the electronic structure and the work function of MoSe2 are modulated electrostatically with an amplitude of ~0.13 eV. Most interestingly, this electrostatic modulation is spatially in phase with the Moiré pattern of hBN on Ru(0001), whose surface also exhibits a work function modulation of the same amplitude.

preprint, 2015, arXiv

Examples of Complete Solvability of 2D Classical Superintegrable Systems

Classical (maximal) superintegrable systems in $n$ dimensions are Hamiltonian systems with $2n-1$ independent constants of the motion, globally defined, the maximum number possible. They are very special because they can be solved algebraically. In this paper we show explicitly, mostly through examples of 2nd order superintegrable systems in 2 dimensions, how the trajectories can be determined in detail using rather elementary algebraic, geometric and analytic methods applied to the closed quadratic algebra of symmetries of the system. We treat a family of 2nd order degenerate systems: oscillator analogies on Darboux, nonzero constant curvature, and flat spaces, related to one another via contractions, and obeying Kepler's laws. Then we treat two 2nd order nondegenerate systems, an analogy of a caged Coulomb problem on the 2-sphere and its contraction to a Euclidean space caged Coulomb problem. In all cases the symmetry algebra structure provides detailed information about the trajectories. An interesting example is the occurrence of "metronome orbits", trajectories confined to an arc rather than a loop, which are indicated clearly from the structure equations but might be overlooked using more traditional methods. We also treat the Post-Winternitz system, an example of a classical 4th order superintegrable system that cannot be solved using separation of variables. Finally we treat a superintegrable system, related to the addition theorem for elliptic functions, whose constants of the motion are only rational in the momenta, a system of special interest because its constants of the motion generate a closed polynomial algebra. This paper contains many new results but we have tried to present most of the materials in a fashion that is easily accessible to nonexperts, in order to provide entrée to superintegrability theory.
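As a standard textbook illustration of the $2n-1$ count (not taken from the paper itself), consider the isotropic oscillator in the flat plane, $n=2$, with frequency $\omega$ as an assumed parameter. It admits $2n-1=3$ functionally independent constants of the motion, for example $H$, $L$ and $A_1$ below, and a fourth constant $A_2$ that is tied to them by a polynomial relation:

$$
\begin{aligned}
H   &= \tfrac12\left(p_x^2+p_y^2\right)+\tfrac12\,\omega^2\left(x^2+y^2\right), \\
L   &= x\,p_y - y\,p_x, \\
A_1 &= \tfrac12\left(p_x^2-p_y^2\right)+\tfrac12\,\omega^2\left(x^2-y^2\right), \\
A_2 &= p_x p_y+\omega^2 x y, \\
H^2 &= A_1^2+A_2^2+\omega^2 L^2 .
\end{aligned}
$$

The last line is the kind of algebraic relation among symmetries that lets trajectories be read off without integrating the equations of motion: fixing $H$ and $L$ confines $(A_1, A_2)$ to a circle.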

preprint, 2015, arXiv

Probing critical point energies of transition metal dichalcogenides: surprising indirect gap of single layer $SL-WSe_2$

Understanding quasiparticle band structures of transition metal dichalcogenides (TMDs) is critical for technological advances of these materials for atomic layer electronics and photonics. Although theoretical calculations to date have shown qualitatively similar features, there exist subtle differences which can lead to important consequences in the device characteristics. For example, most calculations have shown that all single layer (SL) TMDs have direct band gaps, while some have shown that $SL-WSe_2$ has an indirect gap. Moreover, there are large variations in the reported quasiparticle gaps, corresponding to large variations in exciton binding energies. By using a comprehensive form of scanning tunneling spectroscopy, we have revealed detailed quasiparticle electronic structures in TMDs, including the quasi-particle gaps, critical point energy locations and their origins in the Brillouin Zones (BZs). We show that $SL-WSe_2$ actually has an indirect quasi-particle gap with the conduction band minimum located at the Q point (instead of K), although the two states are nearly degenerate. Its implications for optical properties are discussed. We have further observed rich quasi-particle electronic structures of TMDs as a function of atomic structures and spin-orbital couplings.

preprint, 2015, arXiv

Visualizing Band Offsets and Edge States in Bilayer-Monolayer Transition Metal Dichalcogenides Lateral Heterojunction

Semiconductor heterostructures are fundamental building blocks for many important device applications. The emergence of two-dimensional semiconductors opens up a new realm for creating heterostructures. As the bandgaps of transition metal dichalcogenide thin films have sensitive layer dependence, it is natural to create lateral heterojunctions using the same materials with different thicknesses. Using scanning tunneling microscopy and spectroscopy, here we show the real-space image of electronic structures across the bilayer-monolayer interface in MoSe2 and WSe2. Most bilayer-monolayer heterojunctions are found to have a zigzag-oriented interface, and the band alignment of such atomically sharp heterojunctions is of type-I with a well-defined interface mode which acts as a narrower-gap quantum wire. The ability to utilize such commonly existing thickness terraces as lateral heterojunctions is a crucial addition to the tool set for device applications based on atomically thin transition metal dichalcogenides, with the advantage of easy and flexible implementation.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2605.05082:author:1:yuxuan-chen

Imported May 20, 2026. Synced May 20, 2026.

arxiv, confidence 95%

external id: arxiv:2605.11567:author:3:yuxuan-chen

Imported May 20, 2026. Synced May 20, 2026.

arxiv, confidence 95%

external id: arxiv:2605.14534:author:5:yuxuan-chen

Imported May 20, 2026. Synced May 20, 2026.

4 works

Chih-Kang Shih

3 works

Chendong Zhang

2 works

Chi-Ruei Pan

2 works

Christoph Alt
