Source author record

Ye Yu

Ye Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Networking and Internet Architecture physics.optics Computation and Language cond-mat.mtrl-sci Distributed, Parallel, and Cluster Computing Hardware Architecture physics.app-ph physics.atom-ph physics.class-ph quant-ph

Catalog footprint

What is connected

10works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

Recent advances in LLM agents enable systems that autonomously refine workflows, accumulate reusable skills, self-train their underlying models, and maintain persistent memory. However, we show that such self-evolution is often non-monotonic: adapting to new task distributions can progressively degrade previously acquired capabilities across all major evolution channels. We identify this phenomenon as \emph{capability erosion under self-evolution} and show that it consistently emerges across workflow, skill, model, and memory evolution. To mitigate this issue, we propose \emph{Capability-Preserving Evolution} (CPE), a general stabilization principle that constrains destructive capability drift during continual adaptation. Across all four evolution dimensions, CPE consistently improves retained capability stability while preserving adaptation performance. For example, in workflow evolution, CPE improves retained simple-task performance from 41.8\% to 52.8\% under GPT-5.1 optimization while simultaneously achieving stronger complex-task adaptation. Our findings suggest that stable long-horizon self-evolving agents require not only acquiring new capabilities, but also explicitly preserving previously learned ones during continual adaptation.

preprint2022arXiv

Asymmetric topological pumping in nonparaxial photonics

Topological photonics was initially inspired by the quantum-optical analogy between the Schrödinger equation for an electron wavefunction and the paraxial equation for a light beam. Here, we reveal an unexpected phenomenon in topological pumping observed in arrays of nonparaxial optical waveguides where the quantum-optical analogy becomes invalid. We predict theoretically and demonstrate experimentally an asymmetric topological pumping when the injected field transfers from one side of the waveguide array to the other side whereas the reverse process is unexpectedly forbidden. Our finding could open an avenue for exploring topological photonics that enables nontrivial topological phenomena and designs in photonics driven by nonparaxiality.

preprint2022arXiv

BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

Video Object Segmentation (VOS) is fundamental to video understanding. Transformer-based methods show significant performance improvement on semi-supervised VOS. However, existing work faces challenges segmenting visually similar objects in close proximity of each other. In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries. This calibrated optical flow is then employed in our novel bilateral attention, which computes the correspondence between the query and reference frames in the neighboring bilateral space considering both motion and appearance. Extensive experiments validate the effectiveness of BATMAN architecture by outperforming all existing state-of-the-art on all four popular VOS benchmarks: Youtube-VOS 2019 (85.0%), Youtube-VOS 2018 (85.3%), DAVIS 2017Val/Testdev (86.2%/82.2%), and DAVIS 2016 (92.5%).

preprint2022arXiv

GateHUB: Gated History Unit with Background Suppression for Online Action Detection

Online action detection is the task of predicting the action as soon as it happens in a streaming video. A major challenge is that the model does not have access to the future and has to solely rely on the history, i.e., the frames observed so far, to make predictions. It is therefore important to accentuate parts of the history that are more informative to the prediction of the current frame. We present GateHUB, Gated History Unit with Background Suppression, that comprises a novel position-guided gated cross-attention mechanism to enhance or suppress parts of the history as per how informative they are for current frame prediction. GateHUB further proposes Future-augmented History (FaH) to make history features more informative by using subsequently observed frames when available. In a single unified framework, GateHUB integrates the transformer's ability of long-range temporal modeling and the recurrent model's capacity to selectively encode relevant information. GateHUB also introduces a background suppression objective to further mitigate false positive background frames that closely resemble the action frames. Extensive validation on three benchmark datasets, THUMOS, TVSeries, and HDD, demonstrates that GateHUB significantly outperforms all existing methods and is also more efficient than the existing best work. Furthermore, a flow-free version of GateHUB is able to achieve higher or close accuracy at 2.8x higher frame rate compared to all existing methods that require both RGB and optical flow information for prediction.

preprint2021arXiv

Outdoor inverse rendering from a single image using multiview self-supervision

In this paper we show how to perform scene-level inverse rendering to recover shape, reflectance and lighting from a single, uncontrolled image using a fully convolutional neural network. The network takes an RGB image as input, regresses albedo, shadow and normal maps from which we infer least squares optimal spherical harmonic lighting coefficients. Our network is trained using large uncontrolled multiview and timelapse image collections without ground truth. By incorporating a differentiable renderer, our network can learn from self-supervision. Since the problem is ill-posed we introduce additional supervision. Our key insight is to perform offline multiview stereo (MVS) on images containing rich illumination variation. From the MVS pose and depth maps, we can cross project between overlapping views such that Siamese training can be used to ensure consistent estimation of photometric invariants. MVS depth also provides direct coarse supervision for normal map estimation. We believe this is the first attempt to use MVS supervision for learning inverse rendering. In addition, we learn a statistical natural illumination prior. We evaluate performance on inverse rendering, normal map estimation and intrinsic image decomposition benchmarks.

preprint2020arXiv

Beyond Adiabatic Elimination in Topological Floquet Engineering

In quantum mechanics, adiabatic elimination is a standard tool that produces a low-lying reduced Hamiltonian for a relevant subspace of states, incorporating effects of its coupling to states with much higher energy. Suppose this powerful elimination approach is applied to quasi-energy states in periodically-driven systems, a critical question then arises that the violation of the adiabatic condition caused by driven forces challenges such a presence of spectral reduction in the non-equilibrium driven system. Here, both theoretically and experimentally, we newly reported two kinds of driven-induced eliminations universal in topologically-protected Floquet systems. We named them "quasi-adiabatic elimination" and "high-frequency-limited elimination", in terms of different driven frequencies that deny the underlying requirement for the adiabatic condition. Both two non-adiabatic eliminations are observed in our recently developed microwave Floquet simulator, a programmable test platform composed of periodically-bending ultrathin metallic coupled corrugated waveguides. Through the near-field imaging on our simulator, the mechanisms between the adiabatic and driven-induced eliminations are revealed, indicating the ubiquitous spectral decomposition for tailoring and manipulating Floquet states with quasi-energies. Finally, we hope our findings may open up profound and applicable possibilities for further developing Floquet engineering in periodically-driven systems, ranging from condensed matter physics to photonics.

preprint2020arXiv

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high computational complexity of DNNs often necessitates extremely fast and efficient hardware. The problem gets worse as the size of neural networks grows exponentially. As a result, customized hardware accelerators have been developed to accelerate DNN processing without sacrificing model accuracy. However, previous accelerator design studies have not fully considered the characteristics of the target applications, which may lead to sub-optimal architecture designs. On the other hand, new DNN models have been developed for better accuracy, but their compatibility with the underlying hardware accelerator is often overlooked. In this article, we propose an application-driven framework for architectural design space exploration of DNN accelerators. This framework is based on a hardware analytical model of individual DNN operations. It models the accelerator design task as a multi-dimensional optimization problem. We demonstrate that it can be efficaciously used in application-driven accelerator architecture design. Given a target DNN, the framework can generate efficient accelerator design solutions with optimized performance and area. Furthermore, we explore the opportunity to use the framework for accelerator configuration optimization under simultaneous diverse DNN applications. The framework is also capable of improving neural network models to best fit the underlying hardware resources.

preprint2020arXiv

SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference

CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring dataflow styles that exploit computational parallelism. However, potential performance speedup from sparsity has not been adequately addressed. The computation and memory footprint of CNNs can be significantly reduced if sparsity is exploited in network evaluations. To take advantage of sparsity, some accelerator designs explore sparsity encoding and evaluation on CNN accelerators. However, sparsity encoding is just performed on activation or weight and only in inference. It has been shown that activation and weight also have high sparsity levels during training. Hence, sparsity-aware computation should also be considered in training. To further improve performance and energy efficiency, some accelerators evaluate CNNs with limited precision. However, this is limited to the inference since reduced precision sacrifices network accuracy if used in training. In addition, CNN evaluation is usually memory-intensive, especially in training. In this paper, we propose SPRING, a SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and inference. SPRING supports both CNN training and inference. It uses a binary mask scheme to encode sparsities in activation and weight. It uses the stochastic rounding algorithm to train CNNs with reduced precision without accuracy loss. To alleviate the memory bottleneck in CNN evaluation, especially in training, SPRING uses an efficient monolithic 3D NVM interface to increase memory bandwidth. Compared to GTX 1080 Ti, SPRING achieves 15.6X, 4.2X and 66.0X improvements in performance, power reduction, and energy efficiency, respectively, for CNN training, and 15.5X, 4.5X and 69.1X improvements for inference.

preprint2016arXiv

SICS: Secure In-Cloud Service Function Chaining

There is an increasing trend that enterprises outsource their network functions to the cloud for lower cost and ease of management. However, network function outsourcing brings threats to the privacy of enterprises since the cloud is able to access the traffic and rules of in-cloud network functions. Current tools for secure network function outsourcing either incur large performance overhead or do not support real-time updates. In this paper, we present SICS, a secure service function chain outsourcing framework. SICS encrypts each packet header and use a label for in-cloud rule matching, which enables the cloud to perform its functionalities correctly with minimum header information leakage. Evaluation results show that SICS achieves higher throughput, faster construction and update speed, and lower resource overhead at both enterprise and cloud sides, compared to existing solutions.

preprint2014arXiv

Distributed Collaborative Monitoring in Software Defined Networks

We propose a Distributed and Collaborative Monitoring system, DCM, with the following properties. First, DCM allow switches to collaboratively achieve flow monitoring tasks and balance measurement load. Second, DCM is able to perform per-flow monitoring, by which different groups of flows are monitored using different actions. Third, DCM is a memory-efficient solution for switch data plane and guarantees system scalability. DCM uses a novel two-stage Bloom filters to represent monitoring rules using small memory space. It utilizes the centralized SDN control to install, update, and reconstruct the two-stage Bloom filters in the switch data plane. We study how DCM performs two representative monitoring tasks, namely flow size counting and packet sampling, and evaluate its performance. Experiments using real data center and ISP traffic data on real network topologies show that DCM achieves highest measurement accuracy among existing solutions given the same memory budget of switches.

Ye Yu

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

Asymmetric topological pumping in nonparaxial photonics

BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

GateHUB: Gated History Unit with Background Suppression for Online Action Detection

Outdoor inverse rendering from a single image using multiview self-supervision

Beyond Adiabatic Elimination in Topological Floquet Engineering

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference

SICS: Secure In-Cloud Service Function Chaining

Distributed Collaborative Monitoring in Software Defined Networks