Source author record

Pinghui Wang

Pinghui Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Social and Information Networks, physics.soc-ph, Artificial Intelligence, Computation and Language, Machine Learning, Computation, cs.CY, Data Structures and Algorithms, Information Theory, math.IT, math.ST, Statistics Theory

Catalog footprint

What is connected

17 works

12 topics

4 close collaborators


Preprint, 2026, arXiv

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structural self-improvement. Existing surveys cover individual agent capabilities, multi-agent collaboration, or agent self-evolution separately, leaving the causal dependencies among them unexamined. This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. For each stage, we provide systematic taxonomies and formally characterize the dependencies between adjacent stages, revealing how each stage both depends on and constrains the next. Beyond synthesizing existing work, we identify open challenges at stage boundaries and propose a cross-stage research agenda for closed-loop multi-agent systems capable of continuously diagnosing failures, reorganizing structures, and refining agent behaviors, extending current coordination frameworks toward more self-organizing forms of collective intelligence. By bridging these previously fragmented research threads, this survey aims to offer both a systematic reference and a conceptual roadmap toward autonomous, self-improving multi-agent intelligence.

Preprint, 2022, arXiv

"Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs

Multi-action dialog policy (MADP), which generates multiple atomic dialog actions per turn, has been widely applied in task-oriented dialog systems to provide expressive and efficient system responses. Existing MADP models usually imitate action combinations from the labeled multi-action dialog samples. Due to data limitations, they generalize poorly toward unseen dialog flows. While interactive learning and reinforcement learning algorithms can be applied to incorporate external data sources of real users and user simulators, they take significant manual effort to build and suffer from instability. To address these issues, we propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics to enhance multi-action prediction. Our PEDP method employs model-based planning to conceive what to express before deciding the current response, by simulating single-action dialogs. Experimental results on the MultiWOZ dataset demonstrate that our fully supervised learning-based method achieves a solid task success rate of 90.6%, a 3% improvement over state-of-the-art methods.

Preprint, 2020, arXiv

Distinguish Confusing Law Articles for Legal Judgment Prediction

Legal Judgment Prediction (LJP) is the task of automatically predicting a law case's judgment results given a text describing its facts, which has excellent prospects in judicial assistance systems and convenient services for the public. In practice, confusing charges are frequent, because law cases applicable to similar law articles are easily misjudged. To address this issue, the existing method relies heavily on domain experts, which hinders its application in different law systems. In this paper, we present an end-to-end model, LADAN, to solve the task of LJP. To distinguish confusing charges, we propose a novel graph neural network to automatically learn subtle differences between confusing law articles and design a novel attention mechanism that fully exploits the learned differences to extract compelling discriminative features from fact descriptions. Experiments conducted on real-world datasets demonstrate the superiority of our LADAN.

Preprint, 2020, arXiv

Fast Generating A Large Number of Gumbel-Max Variables

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or more generally a nonnegative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element $i$ (or a Gumbel-Max variable $i$) in proportion to its positive weight $v_i$, the Gumbel-Max Trick first computes a Gumbel random variable $g_i$ for each positive weight element $i$, and then samples the element $i$ with the largest value of $g_i+\ln v_i$. Recently, applications including similarity estimation and graph embedding require generating $k$ independent Gumbel-Max variables from high dimensional vectors. However, this is computationally expensive for a large $k$ (e.g., hundreds or even thousands) when using the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, that reduces the time complexity from $O(kn^+)$ to $O(k \ln k + n^+)$, where $n^+$ is the number of positive elements in the vector of interest. Instead of computing $k$ independent Gumbel random variables directly, we find that there exists a technique to generate these variables in descending order. Using this technique, our method FastGM computes variables $g_i+\ln v_i$ for all positive elements $i$ in descending order. As a result, FastGM significantly reduces the computation time because we can stop computing Gumbel random variables for many elements, especially those with small weights. Experiments on a variety of real-world datasets show that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional expenses.
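The traditional Gumbel-Max Trick described in this abstract can be sketched as follows. This is a minimal illustration of the $O(kn^+)$ baseline only, not the FastGM algorithm; the function names `gumbel_max_sample` and `gumbel_max_sketch` and the `seed` parameter are hypothetical, chosen for this example.

```python
import math
import random


def gumbel_max_sample(weights, rng):
    """Return one index drawn in proportion to its positive weight.

    Returns None when no element has a positive weight.
    """
    best_index, best_score = None, -math.inf
    for i, v in enumerate(weights):
        if v <= 0:
            continue  # only positive-weight elements take part
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        g = -math.log(-math.log(u))  # standard Gumbel random variable g_i
        score = g + math.log(v)      # g_i + ln v_i
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def gumbel_max_sketch(weights, k, seed=0):
    """Generate k independent Gumbel-Max variables the traditional way.

    Each of the k draws touches every positive element, hence O(k * n+).
    """
    rng = random.Random(seed)
    return [gumbel_max_sample(weights, rng) for _ in range(k)]
```

For example, `gumbel_max_sketch([1.0, 0.0, 3.0], k=5)` returns five indices, each either 0 or 2, with index 2 roughly three times as likely as index 0.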

Preprint, 2019, arXiv

MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions

Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become attractive. They represent structured entities as graphs and then extract features from each individual graph using graph convolution operations. However, these methods have some limitations: i) their networks only extract features from a fixed-size subgraph structure (i.e., a fixed-size receptive field) of each node, and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present MR-GNN, an end-to-end graph neural network with the following features: i) it uses a multi-resolution based architecture to extract node features from different neighborhoods of each node, and ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize local features of each graph and extract the interaction features between pairwise graphs. Experiments conducted on real-world datasets show that MR-GNN improves on the predictions of state-of-the-art methods.

Preprint, 2016, arXiv

A Fast Sampling Method of Exploring Graphlet Degrees of Large Directed and Undirected Graphs

Exploring small connected and induced subgraph patterns (CIS patterns, or graphlets) has recently attracted considerable attention. Despite recent efforts on computing the number of instances of a specific graphlet in a large graph (i.e., the total number of CISes isomorphic to the graphlet), little attention has been paid to characterizing a node's graphlet degree, i.e., the number of CISes isomorphic to the graphlet that include the node, which is an important metric for analyzing complex networks such as social and biological networks. Similar to global graphlet counting, it is challenging to compute node graphlet degrees for a large graph due to the combinatorial nature of the problem. Unfortunately, previous methods of computing global graphlet counts are not suited to solving this problem. In this paper we propose sampling methods to estimate node graphlet degrees for undirected and directed graphs, and analyze the error of our estimates. To the best of our knowledge, we are the first to study this problem and give a fast scalable solution. We conduct experiments on a variety of real-world datasets that demonstrate that our methods accurately and efficiently estimate node graphlet degrees for graphs with millions of edges.

Preprint, 2016, arXiv

A General Framework for Estimating Graphlet Statistics via Random Walk

Graphlets are induced subgraph patterns and have been frequently applied to characterize the local topology structures of graphs across various domains, e.g., online social networks (OSNs) and biological networks. Discovering and computing graphlet statistics are highly challenging. First, the massive size of real-world graphs makes the exact computation of graphlets extremely expensive. Second, the graph topology may not be readily available, so one has to resort to web crawling using the available application programming interfaces (APIs). In this work, we propose a general and novel framework to estimate graphlet statistics of "any size". Our framework is based on collecting samples through consecutive steps of random walks. We derive an analytical bound on the sample size (via the Chernoff-Hoeffding technique) to guarantee the convergence of our unbiased estimator. To further improve the accuracy, we introduce two novel optimization techniques to reduce the lower bound on the sample size. Experimental evaluations demonstrate that our methods outperform the state-of-the-art method by up to an order of magnitude in terms of both accuracy and time cost.

Preprint, 2015, arXiv

Minfer: Inferring Motif Statistics From Sampled Edges

Characterizing motif (i.e., locally connected subgraph patterns) statistics is important for understanding complex networks such as online social networks and communication networks. Previous work made the strong assumption that the graph topology of interest is known, and that the dataset either fits into main memory or is stored on disk such that it is not expensive to obtain all neighbors of any given node. In practice, researchers have to deal with the situation where the graph topology is unknown, either because the graph is dynamic, or because it is expensive to collect and store all topological and meta information on disk. Hence, what is available to researchers is only a snapshot of the graph generated by sampling edges from the graph at random, which we call a "RESampled graph". Clearly, a RESampled graph's motif statistics may be quite different from those of the underlying original graph. To address this challenge, we propose a framework and implement a system called Minfer, which can take the given RESampled graph and accurately infer the underlying graph's motif statistics. We also use Fisher information to bound the error of our estimates. Experiments using large scale datasets show our method to be accurate.
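To see why a RESampled graph's motif statistics differ from the original's, consider triangles. Assuming each edge is kept independently with probability p (an assumption for this sketch; the abstract only says edges are sampled at random), a triangle survives only if all three of its edges are kept, which happens with probability p^3. Dividing the sampled triangle count by p^3 therefore gives a simple unbiased estimate. The sketch below shows this naive scaling baseline; it is not Minfer's inference method, and all function names are hypothetical.

```python
import random
from itertools import combinations


def sample_edges(edges, p, rng):
    """Keep each edge independently with probability p (assumed sampling model)."""
    return [e for e in edges if rng.random() < p]


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as an edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for nbrs in adj.values():
        # Each connected neighbor pair closes one triangle at this node.
        total += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return total // 3  # every triangle was counted once per corner


def scaled_triangle_estimate(sampled_edges, p):
    """Naive estimate of the original triangle count: sampled count / p^3."""
    return count_triangles(sampled_edges) / p ** 3
```

A single estimate can be far from the truth when p is small, because few triangles survive sampling; only the average over many independent samples converges to the true count.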

Preprint, 2015, arXiv

Moss: A Scalable Tool for Efficiently Sampling and Counting 4- and 5-Node Graphlets

Counting the frequencies of 3-, 4-, and 5-node undirected motifs (also known as graphlets) is widely used for understanding complex networks such as social and biological networks. However, it is a great challenge to compute these metrics for a large graph due to the intensive computation. Despite recent efforts to count triangles (i.e., 3-node undirected motif counting), little attention has been given to developing scalable tools that can be used to characterize 4- and 5-node motifs. In this paper, we develop computationally efficient methods to sample and count 4- and 5-node undirected motifs. Our methods provide unbiased estimators of motif frequencies, and we derive simple and exact formulas for the variances of the estimators. Moreover, our methods are designed to fit vertex-centric programming models, so they can be easily applied to current graph computing systems such as Pregel and GraphLab. We conduct experiments on a variety of real-world datasets, and experimental results show that our methods are several orders of magnitude faster than the state-of-the-art methods under the same estimation errors.

Preprint, 2015, arXiv

Tracking Triadic Cardinality Distributions for Burst Detection in Social Activity Streams

In everyday life, we often observe unusually frequent interactions among people before or during important events, e.g., we receive/send more greetings from/to our friends on Christmas Day than usual. We also observe that some videos suddenly go viral through people's sharing in online social networks (OSNs). Do these seemingly different phenomena share a common structure? All these phenomena are associated with sudden surges of user activities in networks, which we call "bursts" in this work. We find that the emergence of a burst is accompanied by the formation of triangles in networks. This finding motivates us to propose a new method to detect bursts in OSNs. We first introduce a new measure, "triadic cardinality distribution", corresponding to the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, within a network. We demonstrate that this distribution changes when a burst occurs, and is naturally immune to spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can reliably detect bursts in OSNs. To avoid handling massive activity data generated by OSN users, we design an efficient sample-estimate solution to estimate the triadic cardinality distribution from sampled data. Extensive experiments conducted on real data demonstrate the usefulness of this triadic cardinality distribution and the effectiveness of our sample-estimate solution.
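The measure defined in this abstract, the fraction of nodes having each possible number of triangles, can be computed exactly on a small graph as sketched below. This is a brute-force illustration under the assumption of a static, undirected edge list; it is not the paper's sample-estimate solution, and the function name is hypothetical. Nodes that appear in no edge are not represented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def triadic_cardinality_distribution(edges):
    """Map each triangle count c to the fraction of nodes in exactly c triangles."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # Triadic cardinality of a node: number of triangles it belongs to,
    # i.e. the number of pairs of its neighbors that are themselves connected.
    cardinality = {
        node: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for node, nbrs in adj.items()
    }

    counts = Counter(cardinality.values())
    n = len(cardinality)
    return {c: counts[c] / n for c in sorted(counts)}
```

For the edge list `[(1, 2), (2, 3), (1, 3), (3, 4)]`, nodes 1, 2 and 3 each belong to one triangle and node 4 to none, so the result is `{0: 0.25, 1: 0.75}`.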

Preprint, 2014, arXiv

Design of Efficient Sampling Methods on Hybrid Social-Affiliation Networks

Graph sampling via crawling has become increasingly popular and important for measuring various characteristics of large scale complex networks. While powerful, it is known to be challenging when the graph is loosely connected or disconnected, which slows down the convergence of random walks and can cause poor estimation accuracy. In this work, we observe that the graph under study, called the target graph, usually does not exist in isolation. In many situations, the target graph is related to an auxiliary graph and an affiliation graph, and the target graph becomes well connected when we view it from the perspective of these three graphs together, which we call a hybrid social-affiliation graph in this paper. When directly sampling the target graph is difficult or inefficient, we can indirectly sample it efficiently with the assistance of the other two graphs. We design three sampling methods on such a hybrid social-affiliation network. Experiments conducted on both synthetic and real datasets demonstrate the effectiveness of our proposed methods.

Preprint, 2014, arXiv

Efficiently Estimating Motif Statistics of Large Networks

Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and online social networks (OSNs). Nowadays the massive size of some critical networks -- often stored in already overloaded relational databases -- effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no pre-computation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known data sets, and show that for the same accuracy, our algorithms require an order of magnitude fewer queries (samples) than the current state-of-the-art algorithms.

Preprint, 2013, arXiv

A Peep on the Interplays between Online Video Websites and Online Social Networks

Many online video websites provide shortcut links to facilitate sharing videos to other websites, especially online social networks (OSNs). Such video sharing behavior greatly changes the interplays between the two types of websites. For example, users in OSNs may watch and re-share videos shared by their friends from online video websites, and this can also boost the popularity of videos in online video websites and attract more people to watch and share them. Characterizing these interplays can provide great insights for understanding the relationships among online video websites, OSNs, ISPs and so on. In this paper we conduct empirical experiments to study the interplays between video sharing websites and OSNs using three totally different data sources: online video websites, OSNs, and campus network traffic. We find that: a) there are many factors that can affect the external sharing probability of videos in online video websites; b) the popularity of a video itself in online video websites can greatly impact its popularity in OSNs, and videos in Renren and Qzone (the top two most popular Chinese OSNs) usually attract more viewers than in Sina and Tencent Weibo (the top two most popular Chinese microblogs), which indicates the different natures of the two kinds of OSNs; c) the analysis based on real traffic data illustrates that 10% of video flows are related to OSNs, and they account for 25% of traffic generated by all videos.

Preprint, 2013, arXiv

Practical Characterization of Large Networks Using Neighborhood Information

Characterizing large online social networks (OSNs) through node querying is a challenging task. OSNs often impose severe constraints on the query rate, hence limiting the sample size to a small fraction of the total network. Various ad-hoc subgraph sampling methods have been proposed, but many of them give biased estimates and no theoretical basis for their accuracy. In this work, we focus on developing sampling methods for OSNs where querying a node also reveals partial structural information about its neighbors. Our methods are optimized for NoSQL graph databases (if the database can be accessed directly), or utilize the Web APIs available on most major OSNs for graph sampling. We show that our sampling method has provable convergence guarantees on being an unbiased estimator, and it is more accurate than current state-of-the-art methods. We characterize metrics such as node label density estimation and edge label density estimation, two of the most fundamental network characteristics from which other network characteristics can be derived. We evaluate our methods on-the-fly over several live networks using their native APIs. Our simulation studies over a variety of offline datasets show that by including neighborhood information, our method drastically (4-fold) reduces the number of samples required to achieve the same estimation accuracy as state-of-the-art methods.

Preprint, 2013, arXiv

Sampling Content Distributed Over Graphs

Despite recent efforts to estimate topology characteristics of large graphs (e.g., online social networks and peer-to-peer networks), little attention has been given to developing a formal methodology to characterize the vast amount of content distributed over these networks. Due to the large scale nature of these networks, exhaustive enumeration of this content is computationally prohibitive. In this paper, we show how one can obtain content properties by sampling only a small fraction of vertices. We first show that when sampling is naively applied, this can produce a huge bias in content statistics (e.g., average number of content duplications). To remove this bias, one may use maximum likelihood estimation to estimate content characteristics. However, our experimental results show that one needs to sample most vertices in the graph to obtain accurate statistics using such a method. To address this challenge, we propose two efficient estimators: special copy estimator (SCE) and weighted copy estimator (WCE) to measure content characteristics using available information in sampled contents. SCE uses the special content copy indicator to compute the estimate, while WCE derives the estimate based on meta-information in sampled vertices. We perform experiments to show WCE and SCE are cost effective and also "asymptotically unbiased". Our methodology provides a new tool for researchers to efficiently query content distributed in large scale networks.

Preprint, 2013, arXiv

Social Sensor Placement in Large Scale Networks: A Graph Sampling Perspective

Sensor placement for the purpose of detecting/tracking news outbreaks and preventing rumor spreading is a challenging problem in a large scale online social network (OSN). This problem is a kind of subset selection problem: choosing a small set of items from a large population so as to maximize some prespecified set function. However, it is known to be NP-complete. Existing heuristics are very costly, especially for modern OSNs, which usually contain hundreds of millions of users. This paper aims to design methods to find good solutions that can well trade off efficiency and accuracy. We first show that it is possible to obtain a high quality solution with a probabilistic guarantee from a "candidate set" of the underlying social network. By exploring this candidate set, one can increase the efficiency of placing social sensors. We also present how this candidate set can be obtained using "graph sampling", which has an advantage over previous methods of not requiring prior knowledge of the complete network topology. Experiments carried out on two real datasets demonstrate not only the accuracy and efficiency of our approach, but also its effectiveness in detecting and predicting news outbreaks.

Preprint, 2012, arXiv

On Set Size Distribution Estimation and the Characterization of Large Networks via Sampling

In this work we study the set size distribution estimation problem, where elements are randomly sampled from a collection of non-overlapping sets and we seek to recover the original set size distribution from the samples. This problem has applications to capacity planning and network theory, among other areas. Examples of real-world applications include characterizing in-degree distributions in large graphs and uncovering TCP/IP flow size distributions on the Internet. We demonstrate that it is hard to estimate the original set size distribution. The recoverability of original set size distributions presents a sharp threshold with respect to the fraction of elements that remain in the sets. If this fraction remains below a threshold, typically half of the elements in power-law and heavier-than-exponential-tailed distributions, then the original set size distribution is unrecoverable. We also discuss practical implications of our findings.
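The sampling setup in this abstract can be simulated in a few lines. The sketch below assumes each element is kept independently with probability p (the abstract only says elements are randomly sampled) and reports the set size distribution an observer would see; sets that lose all their elements vanish from the sample. It illustrates the forward sampling process only, not any estimator from the paper, and the function name is hypothetical.

```python
import random
from collections import Counter


def observed_size_distribution(set_sizes, p, seed=0):
    """Simulate element sampling and return the observed set size distribution.

    set_sizes: original sizes of the non-overlapping sets.
    p: probability that each element is kept (assumed sampling model).
    """
    rng = random.Random(seed)
    observed = Counter()
    for size in set_sizes:
        kept = sum(1 for _ in range(size) if rng.random() < p)
        if kept > 0:  # sets with no sampled element are invisible
            observed[kept] += 1
    total = sum(observed.values())
    if total == 0:
        return {}
    return {s: c / total for s, c in sorted(observed.items())}
```

With p = 1 the observed distribution equals the original one; as p decreases, large sets appear smaller and small sets disappear altogether, which is the distortion the estimation problem has to undo.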
