Researcher profile

Wenyuan Yu

Wenyuan Yu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
7 topics
4 close collaborators

Published work

6 published items

preprint | 2026 | arXiv

Revisiting Graph Analytics Benchmark

The rise of graph analytics platforms has led to the development of various benchmarks for evaluating and comparing platform performance. However, existing benchmarks often fall short of fully assessing performance due to limitations in core algorithm selection, data generation processes (and the corresponding synthetic datasets), as well as the neglect of API usability evaluation. To address these shortcomings, we propose a novel graph analytics benchmark. First, we select eight core algorithms by extensively reviewing both academic and industrial settings. Second, we design an efficient and flexible data generator and produce eight new synthetic datasets as the default datasets for our benchmark. Lastly, we introduce a multi-level large language model (LLM)-based framework for API usability evaluation, the first of its kind in graph analytics benchmarks. We conduct comprehensive experimental evaluations on existing platforms (GraphX, PowerGraph, Flash, Grape, Pregel+, Ligra and G-thinker). The experimental results demonstrate the superiority of our proposed benchmark.

preprint | 2023 | arXiv

Unicron: Economizing Self-Healing LLM Training at Scale

Training large-scale language models is increasingly critical in various domains, but it is hindered by frequent failures, leading to significant time and economic costs. Current failure recovery methods in cloud-based settings inadequately address the diverse and complex scenarios that arise, focusing narrowly on erasing downtime for individual tasks without considering the overall cost impact on a cluster. We introduce Unicron, a workload manager designed for efficient self-healing in large-scale language model training. Unicron optimizes the training process by minimizing failure-related costs across multiple concurrent tasks within a cluster. Its key features include in-band error detection for real-time error identification without extra overhead, a dynamic cost-aware plan generation mechanism for optimal reconfiguration, and an efficient transition strategy to reduce downtime during state changes. Deployed on a 128-GPU distributed cluster, Unicron demonstrates up to a 1.9x improvement in training efficiency over state-of-the-art methods, significantly reducing failure recovery costs and enhancing the reliability of large-scale language model training.

preprint | 2022 | arXiv

A Coronal Mass Ejection and Magnetic Ejecta Observed In Situ by STEREO-A and Wind at 55° Angular Separation

We present an analysis of in situ and remote-sensing measurements of a coronal mass ejection (CME) that erupted on 2021 February 20 and impacted both the Solar TErrestrial RElations Observatory (STEREO)-A and the Wind spacecraft, which were separated longitudinally by 55°. Measurements on 2021 February 24 at both spacecraft are consistent with the passage of a magnetic ejecta (ME), making this one of the widest reported multi-spacecraft ME detections. The CME is associated with a low-inclined and wide filament eruption from the Sun's southern hemisphere, which propagates between STEREO-A and Wind around E34. At STEREO-A, the measurements indicate the passage of a moderately fast (~425 km s⁻¹) shock-driving ME, occurring 2 to 3 days after the end of a high speed stream (HSS). At Wind, the measurements show a faster (~490 km s⁻¹) and much shorter ME, preceded by neither a shock nor a sheath, and occurring inside the back portion of the HSS. The ME orientation measured at both spacecraft is consistent with a passage close to the legs of a curved flux rope. The short duration of the ME observed at Wind and the difference in the suprathermal electron pitch-angle data between the two spacecraft are the only results that do not satisfy common expectations. We discuss the consequence of these measurements on our understanding of the CME shape and extent and the lack of clear signatures of the interaction between the CME and the HSS.

preprint | 2022 | arXiv

Banyan: A Scoped Dataflow Engine for Graph Query Service

Graph query services (GQS) are widely used today to interactively answer graph traversal queries on large-scale graph data. Existing graph query engines focus largely on optimizing the latency of a single query. This ignores significant challenges posed by GQS, including fine-grained control and scheduling during query execution, as well as performance isolation and load balancing at various levels, from across users down to within a single query. To tackle these control and scheduling challenges, we propose a novel scoped dataflow for modeling graph traversal queries, which explicitly exposes concurrent execution and control of any subquery to the finest granularity. We implemented Banyan, an engine based on the scoped dataflow model for GQS. Banyan focuses on scaling up the performance on a single machine, and provides the ability to easily scale out. Extensive experiments on multiple benchmarks show that Banyan improves performance by up to three orders of magnitude over state-of-the-art graph query engines, while providing performance isolation and load balancing.

preprint | 2022 | arXiv

DMCS: Density Modularity based Community Search

Community Search, or finding a connected subgraph (known as a community) containing the given query nodes in a social network, is a fundamental problem. Most of the existing community search models only focus on the internal cohesiveness of a community. However, a high-quality community often has high modularity, which means dense connections inside communities and sparse connections to the nodes outside the community. In this paper, we conduct a pioneering study on searching for a community with high modularity. We point out that while modularity has been widely used in community detection (without query nodes), it has, surprisingly, not been adopted for community search, and its application in community search (related to query nodes) brings new challenges. We address these challenges by designing a new graph modularity function named Density Modularity. To the best of our knowledge, this is the first work on the community search problem using graph modularity. Community search based on density modularity, termed DMCS, aims to find a community in a social network that contains all the query nodes and has high density modularity. We prove that the DMCS problem is NP-hard. To efficiently address DMCS, we present new algorithms that run in time log-linear in the graph size. We conduct extensive experimental studies in real-world and synthetic networks, which offer insights into the efficiency and effectiveness of our algorithms. In particular, our algorithm achieves up to 8.5 times higher accuracy in terms of NMI than baseline algorithms.

preprint | 2022 | arXiv

On the utility of flux rope models for CME magnetic structure below 30 R☉

We present a comprehensive analysis of the three-dimensional magnetic flux rope structure generated during the Lynch et al. (2019) magnetohydrodynamic (MHD) simulation of a global-scale, 360 degree-wide streamer blowout coronal mass ejection (CME) eruption. We create both fixed and moving synthetic spacecraft to generate time series of the MHD variables through different regions of the flux rope CME. Our moving spacecraft trajectories are derived from the spatial coordinates of Parker Solar Probe's past encounters 7 and 9 and future encounter 23. Each synthetic time series through the simulation flux rope ejecta is fit with three different in-situ flux rope models commonly used to characterize the large-scale, coherent magnetic field rotations observed in a significant fraction of interplanetary CMEs (ICMEs). We present each of the in-situ flux rope model fits to the simulation data and discuss the similarities and differences between the model fits and the MHD simulation's flux rope spatial orientations, field strengths and rotations, expansion profiles, and magnetic flux content. We compare in-situ model properties to those calculated with the MHD data for both classic bipolar and unipolar ICME flux rope configurations as well as more problematic profiles such as those with a significant radial component to the flux rope axis orientation or profiles obtained with large impact parameters. We find general agreement among the in-situ flux rope fitting results for the classic profiles and much more variation among results for the problematic profiles. We also examine the force-free assumption for a subset of the flux rope models and quantify properties of the Lorentz force within MHD ejecta intervals. We conclude that the in-situ flux rope models are generally a decent approximation to the field structure, but all the caveats associated with in-situ flux rope models will still apply...