Source author record

Pengfei Chen

Pengfei Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Software Engineering; astro-ph.SR; Computer Vision; Distributed, Parallel, and Cluster Computing; Machine Learning; Artificial Intelligence; astro-ph.GA; astro-ph.IM; eess.IV; eess.SP; eess.SY; math.AP; Other Computer Science; Performance; Systems and Control

Catalog footprint

What is connected

16 works

15 topics

4 close collaborators

preprint, 2026, arXiv

A Survey on Failure Analysis and Fault Injection in AI Systems

The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ensure resilience and reliability. Despite the importance of these techniques, a comprehensive review of FA and FI methodologies in AI systems is still lacking. This study fills this gap by presenting a detailed survey of existing FA and FI approaches across six layers of AI systems. We systematically analyze 160 papers and repositories to answer three research questions: (1) what are the prevalent failures in AI systems, (2) what types of faults can current FI tools simulate, and (3) what gaps exist between the simulated faults and real-world failures. Our findings reveal a taxonomy of AI system failures, assess the capabilities of existing FI tools, and highlight discrepancies between real-world and simulated failures. Moreover, this survey contributes to the field by providing a framework for fault diagnosis, evaluating the state-of-the-art in FI, and identifying areas for improvement in FI techniques to enhance the resilience of AI systems.

preprint, 2026, arXiv

Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios

Segment Anything Model (SAM), known for its remarkable zero-shot segmentation capabilities, has garnered significant attention in the community. Nevertheless, its performance is challenged when dealing with what we refer to as visually non-salient scenarios, where there is low contrast between the foreground and background. In these cases, existing methods often cannot capture accurate contours and fail to produce promising segmentation results. In this paper, we propose Visually Non-Salient SAM (VNS-SAM), aiming to enhance SAM's perception of visually non-salient scenarios while preserving its original zero-shot generalizability. We achieve this by effectively exploiting SAM's low-level features through two designs: Mask-Edge Token Interactive decoder and Non-Salient Feature Mining module. These designs help the SAM decoder gain a deeper understanding of non-salient characteristics with only marginal parameter increments and computational requirements. The additional parameters of VNS-SAM can be optimized within 4 hours, demonstrating its feasibility and practicality. In terms of data, we established VNS-SEG, a unified dataset for various VNS scenarios, with more than 35K images, in contrast to previous single-task adaptations. It is designed to make the model learn more robust VNS features and comprehensively benchmark the model's segmentation performance and generalizability on VNS scenarios. Extensive experiments across various VNS segmentation tasks demonstrate the superior performance of VNS-SAM, particularly under zero-shot settings, highlighting its potential for broad real-world applications. Codes and datasets are publicly available at https://guangqian-guo.github.io/VNS-SAM.

preprint, 2026, arXiv

LIDL: LLM Integration Defect Localization via Knowledge Graph-Enhanced Multi-Agent Analysis

LLM-integrated software, which embeds or interacts with large language models (LLMs) as functional components, exhibits probabilistic and context-dependent behaviors that fundamentally differ from those of traditional software. This shift introduces a new category of integration defects that arise not only from code errors but also from misaligned interactions among LLM-specific artifacts, including prompts, API calls, configurations, and model outputs. However, existing defect localization techniques are ineffective at identifying these LLM-specific integration defects because they fail to capture cross-layer dependencies across heterogeneous artifacts, cannot exploit incomplete or misleading error traces, and lack semantic reasoning capabilities for identifying root causes. To address these challenges, we propose LIDL, a multi-agent framework for defect localization in LLM-integrated software. LIDL (1) constructs a code knowledge graph enriched with LLM-aware annotations that represent interaction boundaries across source code, prompts, and configuration files, (2) fuses three complementary sources of error evidence inferred by LLMs to surface candidate defect locations, and (3) applies context-aware validation that uses counterfactual reasoning to distinguish true root causes from propagated symptoms. We evaluate LIDL on 146 real-world defect instances collected from 105 GitHub repositories and 16 agent-based systems. The results show that LIDL significantly outperforms five state-of-the-art baselines across all metrics, achieving a Top-3 accuracy of 0.64 and a MAP of 0.48, which represents a 64.1% improvement over the best-performing baseline. Notably, LIDL achieves these gains while reducing cost by 92.5%, demonstrating both high accuracy and cost efficiency.
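The two metrics quoted above, Top-3 accuracy and MAP, can be sketched with their common textbook definitions. This is only an illustration: the abstract does not spell out the exact evaluation protocol, and the file names and the `evaluate` helper below are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def top_k_hit(ranked: Sequence[str], truth: Set[str], k: int) -> bool:
    """True if any ground-truth location is among the first k candidates."""
    return any(loc in truth for loc in ranked[:k])


def average_precision(ranked: Sequence[str], truth: Set[str]) -> float:
    """Average precision of one ranked candidate list against its truth set."""
    hits = 0
    precision_sum = 0.0
    for rank, loc in enumerate(ranked, start=1):
        if loc in truth:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    return precision_sum / len(truth) if truth else 0.0


def evaluate(cases: List[Tuple[Sequence[str], Set[str]]], k: int = 3):
    """Return (Top-k accuracy, MAP) averaged over all defect instances."""
    top_k = sum(top_k_hit(r, t, k) for r, t in cases) / len(cases)
    mean_ap = sum(average_precision(r, t) for r, t in cases) / len(cases)
    return top_k, mean_ap


# Hypothetical defect instances: ranked candidate files and the true defect file.
cases = [
    (["prompt.txt", "client.py", "config.yaml"], {"client.py"}),
    (["agent.py", "tools.py", "prompt.txt", "config.yaml"], {"config.yaml"}),
]
top3, mean_ap = evaluate(cases, k=3)
print(f"Top-3 accuracy = {top3:.2f}, MAP = {mean_ap:.3f}")
```

In this toy input the first instance is found at rank 2 and the second only at rank 4, so the second counts as a miss for Top-3 but still contributes a small average precision to MAP.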

preprint, 2022, arXiv

Calibration procedures for the CHASE/HIS science data

The Hα line is an important optical line in solar observations containing the information from the photosphere to the chromosphere. To study the mechanisms of solar eruptions and the plasma dynamics in the lower atmosphere, the Chinese Hα Solar Explorer (CHASE) was launched into a Sun-synchronous orbit on October 14, 2021. The scientific payload of the CHASE satellite is the Hα Imaging Spectrograph (HIS). The CHASE/HIS acquires, for the first time, seeing-free Hα spectroscopic observations with high spectral and temporal resolutions. It consists of two observational modes. The raster scanning mode provides full-Sun or region-of-interest spectra at Hα (6559.7-6565.9 Å) and Fe I (6567.8-6570.6 Å) wavebands. The continuum imaging mode obtains full-Sun photospheric images at around 6689 Å. In this paper, we present detailed calibration procedures for the CHASE/HIS science data, including the dark-field and flat-field correction, slit image curvature correction, wavelength and intensity calibration, and coordinate transformation. The higher-level data products can be directly used for scientific research.

preprint, 2022, arXiv

Constructing Large-Scale Real-World Benchmark Datasets for AIOps

Recently, AIOps (Artificial Intelligence for IT Operations) has been well studied in academia and industry to enable automated and effective software service management. Plenty of efforts have been dedicated to AIOps, including anomaly detection, root cause localization, incident management, etc. However, most existing works are evaluated on private datasets, so their generality and real performance cannot be guaranteed. The lack of public large-scale real-world datasets has prevented researchers and engineers from enhancing the development of AIOps. To tackle this dilemma, in this work, we introduce three public real-world, large-scale datasets about AIOps, mainly aiming at KPI anomaly detection, root cause localization on multi-dimensional data, and failure discovery and diagnosis. More importantly, we held three competitions in 2018/2019/2020 based on these datasets, attracting thousands of teams to participate. In the future, we will continue to publish more datasets and hold competitions to promote the development of AIOps further.

preprint, 2022, arXiv

Object Localization under Single Coarse Point Supervision

Point-based object localization (POL), which pursues high-performance object sensing under low-cost data annotation, has attracted increased attention. However, the point annotation mode inevitably introduces semantic variance due to the inconsistency of annotated points. Existing POL methods rely heavily on accurate key-point annotations, which are difficult to define. In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points. To this end, we propose a coarse point refinement (CPR) approach, which to the best of our knowledge is the first attempt to alleviate semantic variance from the perspective of the algorithm. CPR constructs point bags, selects semantic-correlated points, and produces semantic center points through multiple instance learning (MIL). In this way, CPR defines a weakly supervised evolution procedure, which ensures training a high-performance object localizer under coarse point supervision. Experimental results on COCO, DOTA and our proposed SeaPerson dataset validate the effectiveness of the CPR approach. The dataset and code will be available at https://github.com/ucas-vg/PointTinyBenchmark/.

preprint, 2022, arXiv

Robust Medical Image Classification from Noisy Labeled Data with Global and Local Representation Guided Co-training

Deep neural networks have achieved remarkable success in a wide variety of natural image and medical image computing tasks. However, these achievements rely on accurately annotated training data. When noisy-labeled images are encountered, the network training procedure suffers, leading to a sub-optimal classifier. This problem is even more severe in the medical image analysis field, as the annotation quality of medical images heavily relies on the expertise and experience of annotators. In this paper, we propose a novel collaborative training paradigm with global and local representation learning for robust medical image classification from noisy-labeled data to combat the lack of high-quality annotated medical data. Specifically, we employ the self-ensemble model with a noisy label filter to efficiently select the clean and noisy samples. Then, the clean samples are trained by a collaborative training strategy to eliminate the disturbance from imperfectly labeled samples. Notably, we further design a novel global and local representation learning scheme to implicitly regularize the networks to utilize noisy samples in a self-supervised manner. We evaluated our proposed robust learning strategy on four public medical image classification datasets with three types of label noise, i.e., random noise, computer-generated label noise, and inter-observer variability noise. Our method outperforms other methods for learning from noisy labels, and we also conducted extensive experiments to analyze each component of our method.

preprint, 2022, arXiv

The Chinese Hα Solar Explorer (CHASE) mission: An overview

The Chinese Hα Solar Explorer (CHASE), dubbed "Xihe" - Goddess of the Sun, was launched on October 14, 2021 as the first solar space mission of China National Space Administration (CNSA). The CHASE mission is designed to test a newly developed satellite platform and to acquire the spectroscopic observations in the Hα waveband. The Hα Imaging Spectrograph (HIS) is the scientific payload of the CHASE satellite. It consists of two observational modes: raster scanning mode and continuum imaging mode. The raster scanning mode obtains full-Sun or region-of-interest spectral images from 6559.7 to 6565.9 Å and from 6567.8 to 6570.6 Å with 0.024 Å pixel spectral resolution and 1 minute temporal resolution. The continuum imaging mode obtains photospheric images in continuum around 6689 Å with the full width at half maximum of 13.4 Å. The CHASE mission will advance our understanding of the dynamics of solar activity in the photosphere and chromosphere. In this paper, we present an overview of the CHASE mission including the scientific objectives, HIS instrument overview, data calibration flow, and first results of on-orbit observations.
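As a rough arithmetic check on the numbers quoted above, the sketch below estimates how many spectral samples each raster-scan window would contain at 0.024 Å per pixel. It assumes uniform sampling across each window, which the abstract does not state; the actual detector format may differ, and the helper name `spectral_samples` is hypothetical.

```python
PIXEL_STEP_ANGSTROM = 0.024  # spectral sampling per pixel, from the abstract

# Raster-scan wavelength windows in Angstrom, from the abstract.
WINDOWS = {
    "H-alpha": (6559.7, 6565.9),
    "Fe I": (6567.8, 6570.6),
}


def spectral_samples(lo: float, hi: float, step: float = PIXEL_STEP_ANGSTROM) -> float:
    """Approximate number of samples spanning [lo, hi], assuming uniform sampling."""
    return (hi - lo) / step


for name, (lo, hi) in WINDOWS.items():
    width = hi - lo
    print(f"{name}: width {width:.1f} A, about {spectral_samples(lo, hi):.0f} samples")
```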

preprint, 2021, arXiv

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), providing an explicit two-step understanding of the value estimation process: 1) first foresee the latent future, 2) and then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, inducing a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm, consisting of a convolutional model to learn compact trajectory representation from past experiences, a conditional variational auto-encoder to predict the latent future dynamics and a convex return model that evaluates trajectory representation. In experiments, we empirically demonstrate the effectiveness of our approach for both off-policy and on-policy RL in several OpenAI Gym continuous control tasks as well as a few challenging variants with delayed reward.

preprint, 2020, arXiv

Integrated Traffic Simulation-Prediction System using Neural Networks with Application to the Los Angeles International Airport Road Network

Transportation networks are highly complex and the design of efficient traffic management systems is difficult due to lack of adequate measured data and accurate predictions of the traffic states. Traffic simulation models can capture the complex dynamics of transportation networks by using limited available traffic data and can help central traffic authorities in their decision-making, if appropriate input is fed into the simulator. In this paper, we design an integrated simulation-prediction system which estimates the Origin-Destination (OD) matrix of a road network using only flow rate information and predicts the behavior of the road network in different simulation scenarios. The proposed system includes an optimization-based OD matrix generation method, a Neural Network (NN) model trained to predict OD matrices via the pattern of traffic flow and a microscopic traffic simulator with a Dynamic Traffic Assignment (DTA) scheme to predict the behavior of the transportation system. We test the proposed system on the road network of the central terminal area (CTA) of the Los Angeles International Airport (LAX), which demonstrates that the integrated traffic simulation-prediction system can be used to simulate the effects of several real world scenarios such as lane closures, curbside parking and other changes. The model is an effective tool for learning the impact and possible benefits of changes in the network and for analyzing scenarios at a very low cost without disrupting the network.

preprint, 2020, arXiv

Inviscid Limit for the Free-Boundary problems of MHD Equations with or without Surface Tension

In this paper, we investigate the convergence rates of inviscid limits for the free-boundary problems of the incompressible magnetohydrodynamics (MHD) with or without surface tension in $\mathbb{R}^3$, where the magnetic field is identically constant on the surface and outside of the domain. First, we establish the vorticity, the normal derivatives and the regularity structure of the solutions, and develop a priori co-norm estimates including time derivatives by the vorticity system. Second, we obtain two independent sufficient conditions for the existence of strong vorticity layers: (I) the limit of the difference between the initial MHD vorticity of velocity or magnetic field and that of the ideal MHD equations is nonzero. (II) The cross product of tangential projection on the free surface of the ideal MHD strain tensor of velocity or magnetic field with the normal vector of the free surface is nonzero. Otherwise, the vorticity layer is weak. Third, we prove high order convergence rates of tangential derivatives and the first order normal derivative in standard Sobolev space, where the convergence rates depend on the ideal MHD boundary value.

preprint, 2015, arXiv

CHAOS: Accurate and Realtime Detection of Aging-Oriented Failure Using Entropy

Even well-designed software systems suffer from chronic performance degradation, also named "software aging", due to internal (e.g. software bugs) and external (e.g. resource exhaustion) impairments. These chronic problems often fly under the radar of software monitoring systems before causing severe impacts (e.g. system failure). Detecting these problems in time to prevent a system crash is therefore a challenging issue. Although a large number of approaches have been proposed to solve this issue, their accuracy and effectiveness are still far from satisfactory because the aging indicators they adopt are insufficient. In this paper, we present a novel entropy-based aging indicator, Multidimensional Multi-scale Entropy (MMSE). MMSE employs the complexity embedded in runtime performance metrics to indicate software aging and leverages multi-scale and multi-dimension integration to tolerate system fluctuations. Via theoretical proof and experimental evaluation, we demonstrate that MMSE satisfies Stability, Monotonicity and Integration, properties that we conjecture an ideal aging indicator should have. Based upon MMSE, we develop three failure detection approaches encapsulated in a proof-of-concept named CHAOS. The experimental evaluations in a Video on Demand (VoD) system and in a real-world production system, AntVision, show that CHAOS can detect the failure-prone state with extraordinarily high accuracy and a near 0 Ahead-Time-To-Failure (ATTF). Compared to previous approaches, CHAOS improves the detection accuracy by about 5 times and reduces the ATTF by as much as 3 orders of magnitude. In addition, CHAOS is lightweight enough to satisfy the realtime requirement.
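To give a feel for what an entropy-based complexity indicator computes, here is a minimal sketch of the classic single-metric multiscale sample entropy (coarse-graining followed by sample entropy). It is not the paper's MMSE, which additionally integrates multiple dimensions; the function names, the default parameters `m=2` and `r_factor=0.2`, and the synthetic input are assumptions made for illustration.

```python
import math
import random
import statistics
from typing import List, Sequence


def coarse_grain(series: Sequence[float], scale: int) -> List[float]:
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Sample entropy with template length m and absolute tolerance r."""
    n = len(series)

    def count_matches(length: int) -> int:
        # Use the same n - m start positions for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates.
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no template pairs match
    return -math.log(a / b)


def multiscale_entropy(series: Sequence[float], scales: Sequence[int],
                       m: int = 2, r_factor: float = 0.2) -> List[float]:
    """Sample entropy of the coarse-grained series at each scale."""
    # Tolerance is fixed from the original series, as in the classic method.
    tolerance = r_factor * statistics.pstdev(series)
    return [sample_entropy(coarse_grain(series, s), m, tolerance) for s in scales]


# Synthetic stand-in for a runtime metric such as CPU utilization.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
profile = multiscale_entropy(noise, scales=[1, 2, 4])
print(profile)
```

The pairwise template comparison is quadratic in the series length, so a realtime detector would need a windowed or approximate variant; the sketch favors readability over speed.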

preprint, 2015, arXiv

Making Availability as a Service in the Clouds

Cloud computing has achieved great success in the modern IT industry as an excellent computing paradigm due to its flexible management and elastic resource sharing. To date, cloud computing holds an irreplaceable position in our socioeconomic system and influences almost every aspect of our daily life. However, it is still in its infancy, and many problems remain. Besides the hotly debated security problem, availability is also an urgent issue. With the limited power of the availability mechanisms provided in present cloud platforms, we can hardly get detailed availability information about current applications, such as the root causes of availability problems, mean time to failure, etc. Thus a new mechanism based on deep availability analysis is necessary and beneficial. Following the prevalent terminology 'XaaS', this paper proposes a new win-win concept for cloud users and providers termed 'Availability as a Service' (abbreviated as 'AaaS'). The aim of 'AaaS' is to provide comprehensive and aim-specific runtime availability analysis services for cloud users by integrating plenty of data-driven and model-driven approaches. To illustrate this concept, we realize a prototype named 'EagleEye' with all features of 'AaaS'. By subscribing to the corresponding services in 'EagleEye', cloud users can get specific availability information about their applications deployed on the cloud platform. We envision that this new kind of service will be merged into the cloud management mechanism in the near future.

preprint, 2014, arXiv

Bio-inspired Mechanism and Model Exploration of Software Aging

Software systems situated in a network environment may experience performance degradation, decreased availability and even crashes during long-running operation, a phenomenon called software aging. This phenomenon has been studied for more than 15 years, but most of the literature has treated software as a black box, and as far as we know none of it has uncovered a fundamental and widely accepted mechanism of software aging. By comparing the characteristics of biological aging and software aging, we find some interesting common points and bridge the gap between these two seemingly unrelated phenomena. The free radical aging theory from biological studies is also applicable to exploring the mechanism and model of software aging. This paper identifies a concept in software aging, named 'software free radical', that is equivalent to the free radical in biological aging. In our study, the accumulation of 'software free radicals' is a root cause of software aging. Using the free radical modeling methodology from biological aging, we give a model describing the kinetics of software aging based on feedback loops. Although this paper does not give sufficient theoretical proof of the modeling method, the practical results show that the feedback loop model can describe the kinetics of software aging precisely. To further validate the aging mechanism, we propose several software rejuvenation strategies focusing on cleaning the 'software free radicals'. The results show that software aging can be mitigated effectively by strengthening the negative feedback loop or weakening the positive feedback loop. This paper is the first attempt to answer the question 'How does software age?' through interdisciplinary studies. Leveraging the conclusions in this paper, people can design better software systems or keep their systems at a high performance level during long-running operation.

preprint, 2014, arXiv

Scaling Relations for Galaxies Prior to Reionization

The first galaxies in the Universe are the building blocks of all observed galaxies. We present scaling relations for galaxies forming at redshifts $z \ge 15$ when reionization is just beginning. We utilize the "Rarepeak" cosmological radiation hydrodynamics simulation that captures the complete star formation history in over 3,300 galaxies, starting with massive Population III stars that form in dark matter halos as small as ~$10^6 M_\odot$. We make various correlations between the bulk halo quantities, such as virial, gas, and stellar masses and metallicities and their respective accretion rates, quantifying a variety of properties of the first galaxies up to halo masses of $10^9 M_\odot$. Galaxy formation is not solely relegated to atomic cooling halos with virial temperatures greater than $10^4$ K, where we find a dichotomy in galaxy properties between halos above and below this critical mass scale. Halos below the atomic cooling limit have a stellar mass -- halo mass relationship $\log M_\star \simeq 3.5 + 1.3\log(M_{\rm vir} / 10^7 M_\odot)$. We find a non-monotonic relationship between metallicity and halo mass for the smallest galaxies. Their initial star formation events enrich the interstellar medium and subsequent star formation to a median of $10^{-2} Z_\odot$ and $10^{-1.5} Z_\odot$, respectively, in halos of total mass $10^7 M_\odot$ that is then diluted by metal-poor inflows, well beyond Population III pre-enrichment levels of $10^{-3.5} Z_\odot$. The scaling relations presented here can be employed in models of reionization, galaxy formation and chemical evolution in order to consider these galaxies forming prior to reionization.
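The stellar mass to halo mass relation quoted above can be evaluated directly. The sketch below assumes base-10 logarithms and masses in solar masses, neither of which is stated in the abstract, and the function name `stellar_mass` is hypothetical. The relation is quoted only for halos below the atomic cooling limit, so the sample halo masses are kept near the $10^7 M_\odot$ pivot.

```python
import math


def stellar_mass(m_vir: float) -> float:
    """Stellar mass in Msun from log M_star = 3.5 + 1.3 log(M_vir / 1e7 Msun).

    Assumes base-10 logarithms and m_vir given in solar masses.
    """
    log_m_star = 3.5 + 1.3 * math.log10(m_vir / 1e7)
    return 10.0 ** log_m_star


# Illustrative halo masses near the pivot of the relation.
for m_vir in (1e7, 3e7):
    print(f"M_vir = {m_vir:.1e} Msun -> M_star ~ {stellar_mass(m_vir):.2e} Msun")
```

At the pivot mass the logarithmic term vanishes, so the relation gives $\log M_\star = 3.5$, roughly $3.2 \times 10^3 M_\odot$ of stars in a $10^7 M_\odot$ halo. The slope of 1.3 means a tenfold increase in halo mass raises the stellar mass by a factor of about 20.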

preprint, 2012, arXiv

Observation and Simulation of Longitudinal Oscillations of an Active Region Prominence

Filament longitudinal oscillations have been observed on the solar disk in Hα. We intend to find an example of the longitudinal oscillations of a prominence, where the magnetic dip can be seen directly, and examine the restoring force of such oscillations. We carry out a multiwavelength data analysis of the active region prominence oscillations above the western limb on 2007 February 8. In addition, we perform a one-dimensional hydrodynamic simulation of the longitudinal oscillations. The high-resolution observations by Hinode/SOT indicate that the prominence, seen as a concave-inward shape in lower-resolution Extreme Ultraviolet (EUV) images, actually consists of many concave-outward threads, which is indicative of the existence of magnetic dips. After being injected into the dip region, a bulk of prominence material started to oscillate for more than 3.5 hours, with the period being 52 min. The oscillation decayed with time, with the decay timescale being 133 min. Our hydrodynamic simulation reproduces the oscillation period well, but the damping timescale in the simulation is 1.5 times as long as in the observations. The results clearly show the prominence longitudinal oscillations around the dip of the prominence, and our study suggests that the restoring force of the longitudinal oscillations might be gravity. Radiation and heat conduction are insufficient to explain the decay of the oscillations. Other mechanisms, such as wave leakage and mass accretion, have to be considered. The possible relation between the longitudinal oscillations and the later eruption of a prominence thread, as well as a coronal mass ejection (CME), is also discussed.
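The period and decay figures quoted above can be put side by side with a simple damped-oscillation sketch. It assumes an exponentially damped cosine and reads "decay timescale" as an e-folding time; the abstract specifies neither the functional form nor the initial phase, and the function names are hypothetical.

```python
import math

PERIOD_MIN = 52.0                      # observed period, from the abstract
DECAY_OBS_MIN = 133.0                  # observed decay timescale, from the abstract
DECAY_SIM_MIN = 1.5 * DECAY_OBS_MIN    # simulated damping is 1.5 times longer
DURATION_MIN = 3.5 * 60.0              # oscillation lasted more than 3.5 hours


def envelope(t_min: float, decay_min: float) -> float:
    """Fraction of the initial amplitude remaining after t_min minutes."""
    return math.exp(-t_min / decay_min)


def displacement(t_min: float, decay_min: float = DECAY_OBS_MIN) -> float:
    """Normalized displacement of an exponentially damped cosine (assumed form)."""
    return envelope(t_min, decay_min) * math.cos(2.0 * math.pi * t_min / PERIOD_MIN)


cycles = DURATION_MIN / PERIOD_MIN
print(f"cycles in 3.5 h: {cycles:.2f}")
print(f"remaining amplitude, observed damping:  {envelope(DURATION_MIN, DECAY_OBS_MIN):.3f}")
print(f"remaining amplitude, simulated damping: {envelope(DURATION_MIN, DECAY_SIM_MIN):.3f}")
```

Under these assumptions the oscillation completes about four cycles in 3.5 hours, and the simulated damping leaves noticeably more amplitude at the end than the observed one, which is the gap the abstract attributes to mechanisms beyond radiation and heat conduction.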
