Researcher profile

Yuxuan Sun

Yuxuan Sun contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
17 works
0 followers
12 topics
4 close collaborators

Published work

17 published items

preprint, 2026, arXiv

Computer Use at the Edge of the Statistical Precipice

Evaluating Computer Use Agents (CUAs) in interactive environments is fraught with methodological pitfalls that the field has yet to systematically address. We show that a 1MB replay script that blindly executes a recorded action sequence without ever observing the screen outperforms frontier models on prominent static benchmarks, and prove that its expected success rate is exactly equal to the source agent's pass@k in deterministic environments. We trace this and other failures to two root causes: non-principled environment design (static, unsandboxed, or unreliably verified environments) and non-principled evaluation methodology (naive aggregation and misuse of pass@k for stateful UI interactions). To address the first, we propose PRISM, five design principles for CUA environments (privileged verification, realistic environments, integrity-checked configurations, sandboxed execution, and multifactorial variability) and instantiate them in DigiWorld, a benchmark of 15 realistic sandboxed mobile applications able to evaluate agents in over 3.2 million verified unique configurations. To address the second, we develop an aggregation framework pairing Wilson score intervals with hierarchical bootstrap, producing confidence intervals that correctly account for the nested structure of CUA benchmarks, as we empirically demonstrate. Altogether, we show that principled environment design and rigorous evaluation methodology are not optional refinements but prerequisites for meaningful CUA research.
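
For readers unfamiliar with the Wilson score interval named in this abstract, the following is a minimal sketch of the standard textbook formula for a single binomial proportion. It is not the paper's aggregation framework: the hierarchical bootstrap step is omitted, and the function name `wilson_interval` and the default `z = 1.96` (roughly a 95% interval) are assumptions made for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (hypothetical helper).

    z is the normal quantile; 1.96 is an assumed default (about 95% coverage).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    # The centre is pulled from p_hat towards 1/2, which keeps the
    # interval sensible when p_hat is 0 or 1.
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials))
        / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

Unlike the naive normal-approximation interval, this interval never collapses to zero width when an agent succeeds on none or all of a small number of episodes.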

preprint, 2026, arXiv

GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference

The recent growth of on-device Large Language Model (LLM) inference has driven significant interest in device-edge collaborative LLM inference. As a promising architecture, Speculative Decoding (SD) is increasingly adopted, where a lightweight draft model rapidly generates candidate tokens to be verified by a powerful target model. However, a fundamental challenge lies in achieving per-token resource scheduling to effectively adapt the SD paradigm to resource-constrained edge environments. This paper proposes a Generative Entropy- and Lyapunov-based Adaptive Token Offloading framework, named GELATO, to maximize decoding throughput under energy constraints in a device-edge collaborative SD system. Specifically, an outer drift-plus-penalty loop makes online decisions to establish a reference drafting budget, managing the long-term energy-throughput trade-off. Further, a nested entropy-driven generation mechanism executes early exiting to adapt to per-token dynamic generative uncertainty. Theoretical analysis establishes a rigorous performance bound on long-term throughput for GELATO. Extensive evaluations demonstrate that GELATO achieves a globally optimal trade-off, outperforming state-of-the-art distributed SD architectures by 64.98% in token throughput and reducing energy consumption by 47.47% under resource-constrained environments, while preserving LLM decoding quality.
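
The drift-plus-penalty loop mentioned in this abstract follows a general Lyapunov optimization pattern, which can be sketched in a few lines. The code below is a generic illustration only, not GELATO itself: the action set of (reward, energy) pairs, the `budget` parameter, the weight `V`, and the function name are all assumptions, and the entropy-driven early exit is not modelled.

```python
def drift_plus_penalty(actions, budget, V, slots):
    """Generic drift-plus-penalty loop with one virtual energy queue.

    actions: list of (reward, energy) pairs available in every slot (assumed).
    budget:  long-term average energy budget per slot (assumed).
    V:       weight trading reward against queue drift (assumed).
    Returns (average reward, average energy, final virtual queue length).
    """
    queue = 0.0
    total_reward = 0.0
    total_energy = 0.0
    for _ in range(slots):
        # Per-slot decision: maximise V * reward - queue * energy.
        reward, energy = max(actions, key=lambda a: V * a[0] - queue * a[1])
        # The virtual queue grows when the slot spends more than the budget.
        queue = max(queue + energy - budget, 0.0)
        total_reward += reward
        total_energy += energy
    return total_reward / slots, total_energy / slots, queue
```

Because the queue update implies that the summed energy overshoot never exceeds the final queue length, the average energy is bounded by `budget + queue / slots`, which is the usual argument for long-term constraint satisfaction in this family of methods.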

preprint, 2026, arXiv

Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission

Device-edge collaborative inference with Deep Neural Networks (DNNs) faces fundamental trade-offs among accuracy, latency and energy consumption. Current scheduling exhibits two drawbacks: a granularity mismatch between coarse, task-level decisions and fine-grained, packet-level channel dynamics, and insufficient awareness of per-task complexity. Consequently, scheduling solely at the task level leads to inefficient resource utilization. This paper proposes a novel ENergy-ACcuracy Hierarchical optimization framework for split Inference, named ENACHI, that jointly optimizes task- and packet-level scheduling to maximize accuracy under energy and delay constraints. A two-tier Lyapunov-based framework is developed for ENACHI, with a progressive transmission technique further integrated to enhance adaptivity. At the task level, an outer drift-plus-penalty loop makes online decisions for DNN partitioning and bandwidth allocation, and establishes a reference power budget to manage the long-term energy-accuracy trade-off. At the packet level, an uncertainty-aware progressive transmission mechanism is employed to adaptively manage per-sample task complexity. This is integrated with a nested inner control loop implementing a novel reference-tracking policy, which dynamically adjusts per-slot transmit power to adapt to fluctuating channel conditions. Experiments on the ImageNet dataset demonstrate that ENACHI outperforms state-of-the-art benchmarks under varying deadlines and bandwidths, achieving a 43.12% gain in inference accuracy with a 62.13% reduction in energy consumption under stringent deadlines, and exhibits high scalability by maintaining stable energy consumption in congested multi-user scenarios.

preprint, 2026, arXiv

Timeliness-Oriented Scheduling and Resource Allocation in Multi-Region Collaborative Perception

Collaborative perception (CP) is a critical technology in applications like autonomous driving and smart cities. It involves the sharing and fusion of information among sensors to overcome the limitations of individual perception, such as blind spots and range limitations. However, CP faces two primary challenges. First, due to the dynamic nature of the environment, the timeliness of the transmitted information is critical to perception performance. Second, with limited computational power at the sensors and constrained wireless bandwidth, the communication volume must be carefully designed to ensure feature representations are both effective and sufficient. This work studies the dynamic scheduling problem in a multi-region CP scenario, and presents a Timeliness-Aware Multi-region Prioritized (TAMP) scheduling algorithm to trade off perception accuracy and communication resource usage. Timeliness reflects the utility of information that decays as time elapses, which is manifested by the perception performance in CP tasks. We propose an empirical penalty function that maps the joint impact of Age of Information (AoI) and communication volume to perception performance. Aiming to minimize this timeliness-oriented penalty in the long term, and recognizing that scheduling decisions have a cumulative effect on subsequent system states, we propose the TAMP scheduling algorithm. TAMP is a Lyapunov-based optimization policy that decomposes the long-term average objective into a per-slot prioritization problem, balancing the scheduling worth against resource cost. We validate our algorithm in both intersection and corridor scenarios with the real-world Roadside Cooperative perception (RCooper) dataset. Extensive simulations demonstrate that TAMP outperforms the best-performing baseline, achieving an Average Precision (AP) improvement of up to 27% across various configurations.

preprint, 2024, arXiv

Digital-SC: Digital Semantic Communication with Adaptive Network Split and Learned Non-Linear Quantization

Semantic communication, an intelligent communication paradigm that aims to transmit useful information in the semantic domain, is facilitated by deep learning techniques. Robust semantic features can be learned and transmitted in an analog fashion, but this poses new challenges to hardware, protocol, and encryption. In this paper, we propose a digital semantic communication system, which consists of an encoding network deployed on a resource-limited device and a decoding network deployed at the edge. To acquire better semantic representation for digital transmission, a novel non-linear quantization module is proposed to efficiently quantize semantic features with trainable quantization levels. Additionally, structured pruning is incorporated to reduce the dimension of the transmitted features. We also introduce a semantic learning loss (SLL) function to reduce semantic error. To adapt to various channel conditions and inputs under constraints of communication and computing resources, a policy network is designed to adaptively choose the split point and the dimension of the transmitted semantic features. Experiments using the CIFAR-10 and ImageNet datasets for image classification are employed to evaluate the proposed digital semantic communication network, and ablation studies are conducted to assess the proposed quantization module, structured pruning and SLL.

preprint, 2023, arXiv

Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction

In this work we give a case study of an embodied machine-learning (ML) powered agent that improves itself via interactions with crowd-workers. The agent consists of a set of modules, some of which are learned, and others heuristic. While the agent is not "end-to-end" in the ML sense, end-to-end interaction is a vital part of the agent's learning mechanism. We describe how the design of the agent works together with the design of multiple annotation interfaces to allow crowd-workers to assign credit to module errors from end-to-end interactions, and to label data for individual modules. Over multiple rounds of automated human-agent interaction, credit assignment, data annotation, and model re-training and re-deployment, we demonstrate agent improvement.

preprint, 2023, arXiv

On the Characterization of Alternating Groups by Codegrees

Let $G$ be a finite group and $\mathrm{Irr}(G)$ the set of all irreducible complex characters of $G$. Define the codegree of $χ\in \mathrm{Irr}(G)$ as $\mathrm{cod}(χ):=\frac{|G:\mathrm{ker}(χ) |}{χ(1)}$ and denote by $\mathrm{cod}(G):=\{\mathrm{cod}(χ) \mid χ\in \mathrm{Irr}(G)\}$ the codegree set of $G$. Let $\mathrm{A}_n$ be an alternating group of degree $n \ge 5$. In this paper, we show that $\mathrm{A}_n$ is determined up to isomorphism by $\mathrm{cod}(\mathrm{A}_n)$.
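
The codegree definition above can be illustrated with the smallest case, $\mathrm{A}_5$. The sketch below is only a worked example, not part of the paper's argument. It relies on the standard facts that $|\mathrm{A}_5| = 60$, that its irreducible character degrees are 1, 3, 3, 4, 5, and that every nontrivial irreducible character of a non-abelian simple group has trivial kernel. The function name `codegree_set_simple` is hypothetical.

```python
def codegree_set_simple(order, degrees):
    """Codegree set of a non-abelian simple group (hypothetical helper).

    order:   the group order |G|.
    degrees: the degrees chi(1) of its irreducible complex characters.
    Assumes G is non-abelian simple, so the only character of degree 1
    is the trivial one and all other characters are faithful.
    """
    codegrees = set()
    for degree in degrees:
        if degree == 1:
            # Trivial character: ker(chi) = G, so |G : ker(chi)| / chi(1) = 1.
            codegrees.add(1)
        else:
            # Faithful character: ker(chi) = 1, so cod(chi) = |G| / chi(1).
            if order % degree != 0:
                raise ValueError("a character degree must divide the group order")
            codegrees.add(order // degree)
    return codegrees


print(sorted(codegree_set_simple(60, [1, 3, 3, 4, 5])))  # [1, 12, 15, 20]
```

So $\mathrm{cod}(\mathrm{A}_5) = \{1, 12, 15, 20\}$, and the theorem states that this kind of set identifies $\mathrm{A}_n$ among all finite groups.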

preprint, 2023, arXiv

On the Characterization of Sporadic Simple Groups by Codegrees

Let $G$ be a finite group and $\mathrm{Irr}(G)$ the set of all irreducible complex characters of $G$. Define the codegree of $χ\in \mathrm{Irr}(G)$ as $\mathrm{cod}(χ):=\frac{|G:\mathrm{ker}(χ) |}{χ(1)}$ and denote by $\mathrm{cod}(G):=\{\mathrm{cod}(χ) \mid χ\in \mathrm{Irr}(G)\}$ the codegree set of $G$. Let $H$ be one of the $26$ sporadic simple groups. In this paper, we show that $H$ is determined up to isomorphism by $\mathrm{cod}(H)$.

preprint, 2022, arXiv

Adaptive actuation of magnetic soft robots using deep reinforcement learning

Magnetic soft robots have attracted growing interest due to their unique advantages in terms of untethered actuation and excellent controllability. However, finding the required magnetization patterns or magnetic fields to achieve the desired functions of these robots is quite challenging in many cases. No unified framework for design has been proposed yet, and existing methods mainly rely on manual heuristics, which struggle to handle the high complexity of the desired robotic motion. Here, we develop an intelligent method to solve the related inverse-design problems, implemented by introducing a novel simulation platform for magnetic soft robots based on Cosserat rod models and a deep reinforcement learning framework based on TD3. We demonstrate that magnetic soft robots with different magnetization patterns can learn to move without human guidance in simulations, and effective magnetic fields can be autonomously generated that can then be applied directly to real magnetic soft robots in an open-loop way.

preprint, 2022, arXiv

Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology

When designing a diagnostic model for a clinical application, it is crucial to guarantee the robustness of the model with respect to a wide range of image corruptions. Herein, an easy-to-use benchmark is established to evaluate how deep neural networks perform on corrupted pathology images. Specifically, corrupted images are generated by injecting nine types of common corruptions into validation images. In addition, two classification metrics and one ranking metric are designed to evaluate the prediction and confidence performance under corruption. Evaluating on the two resulting benchmark datasets, we find that (1) a variety of deep neural network models suffer from a significant accuracy decrease (double the error on clean images) and unreliable confidence estimation on corrupted images; and (2) there is a low correlation between the validation and test errors, while replacing the validation set with our benchmark can increase the correlation. Our code is available at https://github.com/superjamessyx/robustness_benchmark.

preprint, 2022, arXiv

IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment. The primary goal of the competition is to approach the problem of how to develop interactive embodied agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. Understanding the complexity of the challenge, we split it into sub-tasks to make it feasible for participants. This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community: Natural Language Understanding and Generation (NLU/G) and Reinforcement Learning (RL). Therefore, the suggested challenge can bring two communities together to approach one of the crucial challenges in AI. Another critical aspect of the challenge is the dedication to perform a human-in-the-loop evaluation as a final evaluation for the agents developed by contestants.

preprint, 2022, arXiv

Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021

Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment. The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. Understanding the complexity of the challenge, we split it into sub-tasks to make it feasible for participants.

preprint, 2022, arXiv

NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO and SAR imagery. EO and SAR sensors each possess different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show significant improvement of more than 15% accuracy from our current baselines for each track of the competition.

preprint, 2022, arXiv

Time-Correlated Sparsification for Efficient Over-the-Air Model Aggregation in Wireless Federated Learning

Federated edge learning (FEEL) is a promising distributed machine learning (ML) framework to drive edge intelligence applications. However, due to the dynamic wireless environments and the resource limitations of edge devices, communication becomes a major bottleneck. In this work, we propose time-correlated sparsification with hybrid aggregation (TCS-H) for communication-efficient FEEL, which jointly exploits the power of model compression and over-the-air computation. By exploiting the temporal correlations among model parameters, we construct a global sparsification mask, which is identical across devices, and thus enables efficient model aggregation over-the-air. Each device further constructs a local sparse vector to explore its own important parameters, which are aggregated via digital communication with orthogonal multiple access. We further design device scheduling and power allocation algorithms for TCS-H. Experimental results show that, under limited communication resources, TCS-H can achieve significantly higher accuracy compared to the conventional top-K sparsification with orthogonal model aggregation, with both i.i.d. and non-i.i.d. data distributions.
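
The conventional top-K sparsification that this abstract uses as its baseline keeps only the K largest-magnitude entries of an update vector. The sketch below illustrates that baseline only, not TCS-H: the global time-correlated mask and over-the-air aggregation are not modelled, and the function name `top_k_sparsify` is an assumption.

```python
def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries of vector and zero the rest.

    Hypothetical helper illustrating plain top-K sparsification.
    Ties are broken in favour of the earlier index.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank indices by absolute value, largest first (sorted is stable).
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(ranked[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(vector)]


print(top_k_sparsify([0.1, -3.0, 2.0, 0.5], 2))  # [0.0, -3.0, 2.0, 0.0]
```

Since each device picks its own K indices, the surviving positions generally differ between devices, which is why the abstract contrasts this baseline with a mask that is identical across devices.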

preprint, 2021, arXiv

Coded Computation across Shared Heterogeneous Workers with Communication Delay

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and its assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to workers for parallel computation. The goal is to minimize the communication plus computation delay of the slowest task. We propose worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks of either a single master or multiple masters, respectively. Then, the non-convex delay minimization problem is solved by employing a Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms can reduce the task completion delay compared to the benchmarks, and observe that dedicated and fractional worker assignment policies have different scopes of applications.

preprint, 2021, arXiv

droidlet: modular, heterogenous, multi-modal agents

In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale. But most of these systems are: (a) isolated (perception, speech, or language only); (b) trained on static datasets. On the other hand, in the field of robotics, large-scale learning has always been difficult. Supervision is hard to gather and real world physical interactions are expensive. In this work we introduce and open-source droidlet, a modular, heterogeneous agent architecture and platform. It allows us to exploit both large-scale static datasets in perception and language and sophisticated heuristics often used in robotics; and provides tools for interactive annotation. Furthermore, it brings together perception, language and action onto one platform, providing a path towards agents that learn from the richness of real world interactions.

preprint, 2020, arXiv

Distributed Task Replication for Vehicular Edge Computing: Performance Analysis and Learning-based Algorithm

In a vehicular edge computing (VEC) system, vehicles can share their surplus computation resources to provide cloud computing services. The highly dynamic environment of the vehicular network makes it challenging to guarantee the task offloading delay. To this end, we introduce task replication to the VEC system, where the replicas of a task are offloaded to multiple vehicles at the same time, and the task is completed upon the first response among replicas. First, the impact of the number of task replicas on the offloading delay is characterized, and the optimal number of task replicas is approximated in closed form. Based on the analytical result, we design a learning-based task replication algorithm (LTRA) with combinatorial multi-armed bandit theory, which works in a distributed manner and can automatically adapt itself to the dynamics of the VEC system. A realistic traffic scenario is used to evaluate the delay performance of the proposed algorithm. Results show that, under our simulation settings, LTRA with an optimized number of task replicas can reduce the average offloading delay by over 30% compared to the benchmark without task replication, and at the same time can improve the task completion ratio from 97% to 99.6%.