Researcher profile

Yinghui Wu

Yinghui Wu contributes to research discovery and scholarly infrastructure.

Researcher, Affiliation not imported, Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified, Verification L1, Unclaimed author
4 works
0 followers
6 topics
4 close collaborators

Published work

4 published items

preprint, 2024, arXiv

View-based Explanations for Graph Neural Networks

Generating explanations for graph neural networks (GNNs) has been studied to understand their behavior in analytical tasks such as graph classification. Existing approaches aim to understand the overall results of GNNs rather than providing explanations for specific class labels of interest, and may return explanation structures that are neither easy to access nor directly queryable. We propose GVEX, a novel paradigm that generates Graph Views for EXplanation. (1) We design a two-tier explanation structure called explanation views. An explanation view consists of a set of graph patterns and a set of induced explanation subgraphs. Given a database G of multiple graphs and a specific class label l assigned by a GNN-based classifier M, it concisely describes the fraction of G that best explains why l is assigned by M. (2) We propose quality measures and formulate an optimization problem to compute optimal explanation views for GNN explanation. We show that the problem is $\Sigma_2^P$-hard. (3) We present two algorithms. The first one follows an explain-and-summarize strategy that first generates high-quality explanation subgraphs which best explain GNNs in terms of feature influence maximization, and then performs a summarization step to generate patterns. We show that this strategy provides an approximation ratio of 1/2. Our second algorithm makes a single pass over an input node stream in batches to incrementally maintain explanation views, with an anytime quality guarantee of 1/4-approximation. Using real-world benchmark data, we experimentally demonstrate the effectiveness, efficiency, and scalability of GVEX. Through case studies, we showcase the practical applications of GVEX.

preprint, 2022, arXiv

Graph Neural Network and Koopman Models for Learning Networked Dynamics: A Comparative Study on Power Grid Transients Prediction

Continuous monitoring of the spatio-temporal dynamic behavior of critical infrastructure networks, such as power systems, is a challenging but important task. In particular, accurate and timely prediction of the (electro-mechanical) transient dynamic trajectories of the power grid is necessary for early detection of any instability and prevention of catastrophic failures. Existing approaches for the prediction of dynamic trajectories either rely on the availability of accurate physical models of the system, use computationally expensive time-domain simulations, or are applicable only to local prediction problems (e.g., a single generator). In this paper, we report the application of two broad classes of data-driven learning models (along with their algorithmic implementation and performance evaluation) in predicting transient trajectories in power networks using only streaming measurements and the network topology as input. One class of models is based on Koopman operator theory, which allows for capturing the nonlinear dynamic behavior via an infinite-dimensional linear operator. The other class of models is based on graph convolutional neural networks, which are adept at capturing the inherent spatio-temporal correlations within the power network. Transient dynamic datasets for training and testing the models are synthesized by simulating a wide variety of load change events in the IEEE 68-bus system, categorized by the load change magnitudes, as well as by the degree of connectivity and the distance to nearest generator nodes. The results confirm that the proposed predictive models can successfully predict the post-disturbance transient evolution of the system with a high level of accuracy.
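To make the Koopman idea in this abstract concrete, here is a minimal sketch of the simplest finite-dimensional approximation: fitting a linear operator K by least squares so that x[t+1] ≈ K x[t] on consecutive snapshots (the basic form of dynamic mode decomposition). It is an illustration under stated assumptions, not the paper's model: the state is two-dimensional, the raw state is the only observable (so only linear dynamics are recovered, whereas Koopman methods lift the state into nonlinear observables), and the names `fit_linear_operator`, `step`, and `TRUE_K` are hypothetical.

```python
def fit_linear_operator(snapshots):
    """Least-squares fit of a 2x2 matrix K with x[t+1] ≈ K x[t]."""
    # Accumulate G = sum(x x^T) and C = sum(y x^T) over consecutive pairs (x, y).
    g = [[0.0, 0.0], [0.0, 0.0]]
    c = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(snapshots, snapshots[1:]):
        for r in range(2):
            for s in range(2):
                g[r][s] += x[r] * x[s]
                c[r][s] += y[r] * x[s]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not span the state space")
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # Normal-equation solution: K = C G^{-1}.
    return [[sum(c[r][k] * g_inv[k][s] for k in range(2)) for s in range(2)]
            for r in range(2)]


def step(k, x):
    """Advance the state one step with the operator k."""
    return [k[0][0] * x[0] + k[0][1] * x[1],
            k[1][0] * x[0] + k[1][1] * x[1]]


# Synthetic data (an assumption for illustration): a damped rotation.
TRUE_K = [[0.9, 0.2], [-0.2, 0.9]]
trajectory = [[1.0, 0.0]]
for _ in range(20):
    trajectory.append(step(TRUE_K, trajectory[-1]))

K_HAT = fit_linear_operator(trajectory)
```

Because the synthetic trajectory is exactly linear, `K_HAT` matches `TRUE_K` up to floating-point error; on real measurements the fit would only be an approximation whose quality depends on the chosen observables.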

preprint, 2022, arXiv

Temporal Graph Functional Dependencies [Extended Version]

Data dependencies have been extended to graphs to characterize topological and value constraints. Existing data dependencies are defined to capture inconsistencies in static graphs. Nevertheless, inconsistencies may occur over evolving graphs and only for certain time periods. The need for capturing such inconsistencies in temporal graphs is evident in anomaly detection and predictive dynamic network analysis. This paper introduces a class of data dependencies called Temporal Graph Functional Dependencies (TGFDs). TGFDs generalize functional dependencies to temporal graphs as a sequence of graph snapshots that are induced by time intervals, and enforce both topological constraints and attribute value dependencies that must be satisfied by these snapshots. (1) We establish the complexity results for the satisfiability and implication problems of TGFDs. (2) We propose a sound and complete axiomatization system for TGFDs. (3) We also present efficient parallel algorithms to detect inconsistencies in temporal graphs as violations of TGFDs. The algorithms exploit data and temporal locality induced by time intervals, and use incremental pattern matching and load balancing strategies to enable feasible error detection in large temporal graphs. Using real datasets, we experimentally verify that our algorithms achieve lower runtimes compared to existing baselines, while improving the accuracy over error detection using existing graph data constraints, e.g., GFDs and GTARs, with 55% and 74% gains in F1-score, respectively.
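The core check behind a time-bounded functional dependency can be sketched in a few lines. The example below is a simplified illustration, not the paper's algorithm: it assumes graph pattern matching has already produced a flat list of matches, each a `(timestamp, attributes)` pair, and it flags pairs of matches that lie within a time gap, agree on the left-hand-side attributes, and disagree on the right-hand side. The function name `find_violations`, the `max_gap` parameter, and the sample data are hypothetical.

```python
from itertools import combinations


def find_violations(matches, lhs, rhs, max_gap):
    """Return index pairs (i, j) that violate lhs -> rhs within max_gap time units.

    matches: list of (timestamp, attrs) where attrs is a dict of attribute values.
    """
    violations = []
    for (i, (t1, a1)), (j, (t2, a2)) in combinations(enumerate(matches), 2):
        if abs(t1 - t2) > max_gap:
            continue  # the pair falls outside the time window
        same_lhs = all(a1[k] == a2[k] for k in lhs)
        same_rhs = all(a1[k] == a2[k] for k in rhs)
        if same_lhs and not same_rhs:
            violations.append((i, j))
    return violations


# Hypothetical matches: a player should belong to one team within 3 time units.
MATCHES = [
    (1, {"player": "p1", "team": "A"}),
    (2, {"player": "p1", "team": "B"}),
    (9, {"player": "p1", "team": "C"}),
    (2, {"player": "p2", "team": "A"}),
]

print(find_violations(MATCHES, ["player"], ["team"], max_gap=3))  # [(0, 1)]
```

This pairwise scan is quadratic in the number of matches; the abstract's emphasis on temporal locality and incremental matching is about avoiding exactly this kind of exhaustive comparison on large graphs.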

preprint, 2020, arXiv

Vamsa: Automated Provenance Tracking in Data Science Scripts

There has recently been a lot of ongoing research in the areas of fairness, bias and explainability of machine learning (ML) models due to the self-evident or regulatory requirements of various ML applications. We make the following observation: All of these approaches require a robust understanding of the relationship between ML models and the data used to train them. In this work, we introduce the ML provenance tracking problem: the fundamental idea is to automatically track which columns in a dataset have been used to derive the features/labels of an ML model. We discuss the challenges in capturing such information in the context of Python, the most common language used by data scientists. We then present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the users' code. Using 26K real data science scripts, we verify the effectiveness of Vamsa in terms of coverage and performance. We also evaluate Vamsa's accuracy on a smaller subset of manually labeled data. Our analysis shows that Vamsa's precision and recall range from 90.4% to 99.1% and its latency is on the order of milliseconds for average-size scripts. Drawing from our experience in deploying ML models in production, we also present an example in which Vamsa helps automatically identify models that are affected by data corruption issues.
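The provenance problem described in this abstract, finding which dataset columns a script reads without modifying the script, can be illustrated with a toy static analysis built on Python's standard `ast` module. This is only a sketch under stated assumptions and not how Vamsa works internally: it assumes Python 3.9 or later, assumes the data frame is bound to a single known variable name, and only recognizes string-literal subscripts such as `df["col"]` and `df[["a", "b"]]`. The function name `columns_read`, the file name `heart.csv`, and the column names are hypothetical.

```python
import ast


def columns_read(source: str, frame_name: str = "df") -> set[str]:
    """Collect string-literal column names subscripted from `frame_name`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id == frame_name):
            continue
        key = node.slice
        # Single column: df["col"]
        if isinstance(key, ast.Constant) and isinstance(key.value, str):
            found.add(key.value)
        # Column list: df[["a", "b"]]
        elif isinstance(key, ast.List):
            for elt in key.elts:
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str):
                    found.add(elt.value)
    return found


# The script is only parsed, never executed, so pandas need not be installed.
SCRIPT = '''
import pandas as pd
df = pd.read_csv("heart.csv")
X = df[["age", "chol"]]
y = df["target"]
'''

print(sorted(columns_read(SCRIPT)))  # ['age', 'chol', 'target']
```

A sketch like this misses aliased variables, attribute access such as `df.age`, and columns selected through library calls, which hints at why the abstract treats provenance extraction from real scripts as a challenging problem.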