Researcher profile

Shenghua Liu

Shenghua Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Learning node embeddings via summary graphs: a brief theoretical analysis

Graph representation learning plays an important role in many graph mining applications, but learning embeddings of large-scale graphs remains a problem. Recent works try to improve scalability via graph summarization -- i.e., they learn embeddings on a smaller summary graph, and then restore the node embeddings of the original graph. However, all existing works depend on heuristic designs and lack theoretical analysis. Different from existing works, we contribute an in-depth theoretical analysis of three specific embedding learning methods based on introduced kernel matrix, and reveal that learning embeddings via graph summarization is actually learning embeddings on a approximate graph constructed by the configuration model. We also give analysis about approximation error. To the best of our knowledge, this is the first work to give theoretical analysis of this approach. Furthermore, our analysis framework gives interpretation of some existing methods and provides great insights for future work on this problem.

preprint2022arXiv

MonLAD: Money Laundering Agents Detection in Transaction Streams

Given a stream of money transactions between accounts in a bank, how can we accurately detect money laundering agent accounts and suspected behaviors in real-time? Money laundering agents try to hide the origin of illegally obtained money by dispersive multiple small transactions and evade detection by smart strategies. Therefore, it is challenging to accurately catch such fraudsters in an unsupervised manner. Existing approaches do not consider the characteristics of those agent accounts and are not suitable to the streaming settings. Therefore, we propose MonLAD and MonLAD-W to detect money laundering agent accounts in a transaction stream by keeping track of their residuals and other features; we devise AnoScore algorithm to find anomalies based on the robust measure of statistical deviation. Experimental results show that MonLAD outperforms the state-of-the-art baselines on real-world data and finds various suspicious behavior patterns of money laundering. Additionally, several detected suspected accounts have been manually-verified as agents in real money laundering scenario.

preprint2022arXiv

Multi-scale Anomaly Detection for Big Time Series of Industrial Sensors

Given a multivariate big time series, can we detect anomalies as soon as they occur? Many existing works detect anomalies by learning how much a time series deviates away from what it should be in the reconstruction framework. However, most models have to cut the big time series into small pieces empirically since optimization algorithms cannot afford such a long series. The question is raised: do such cuts pollute the inherent semantic segments, like incorrect punctuation in sentences? Therefore, we propose a reconstruction-based anomaly detection method, MissGAN, iteratively learning to decode and encode naturally smooth time series in coarse segments, and finding out a finer segment from low-dimensional representations based on HMM. As a result, learning from multi-scale segments, MissGAN can reconstruct a meaningful and robust time series, with the help of adversarial regularization and extra conditional states. MissGAN does not need labels or only needs labels of normal instances, making it widely applicable. Experiments on industrial datasets of real water network sensors show our MissGAN outperforms the baselines with scalability. Besides, we use a case study on the CMU Motion dataset to demonstrate that our model can well distinguish unexpected gestures from a given conditional motion.

preprint2020arXiv

Summarizing graphs using the configuration model

Given a large graph, how can we summarize it with fewer nodes and edges while maintaining its key properties, such as spectral property? Although graphs play more and more important roles in many real-world applications, the growth of their size presents great challenges to graph analysis. As a solution, graph summarization, which aims to find a compact representation that preserves the important properties of a given graph, has received much attention, and numerous algorithms have been developed for it. However, most of the algorithms adopt the uniform reconstruction scheme, which is based on an unrealistic assumption that edges are uniformly distributed. In this work, we propose a novel and realistic reconstruction scheme, which preserves the degree of nodes, and we develop an efficient graph summarization algorithm called DPGS based on the Minimum Description Length principle. We theoretically analyze the difference between the original and summary graphs from a spectral perspective, and we perform extensive experiments on multiple real-world datasets. The results show that DPGS yields compact representation that preserves the essential properties of the original graph.