Source author record

My T. Thai

My T. Thai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Social and Information Networks Data Structures and Algorithms Machine Learning Cryptography and Security physics.soc-ph Artificial Intelligence Computer Science and Game Theory cs.CY eess.SY Human-Computer Interaction quant-ph Systems and Control

Catalog footprint

What is connected

19works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

QUPID: A Partitioned Quantum Neural Network for Anomaly Detection in Smart Grid

Smart grid infrastructures have revolutionized energy distribution, but their day-to-day operations require robust anomaly detection methods to counter risks associated with cyber-physical threats and system faults potentially caused by natural disasters, equipment malfunctions, and cyber attacks. Conventional machine learning (ML) models are effective in several domains, yet they struggle to represent the complexities observed in smart grid systems. Furthermore, traditional ML models are highly susceptible to adversarial manipulations, making them increasingly unreliable for real-world deployment. Quantum ML (QML) provides a unique advantage, utilizing quantum-enhanced feature representations to model the intricacies of the high-dimensional nature of smart grid systems while demonstrating greater resilience to adversarial manipulation. In this work, we propose QUPID, a partitioned quantum neural network (PQNN) that outperforms traditional state-of-the-art ML models in anomaly detection. We extend our model to R-QUPID that even maintains its performance when including differential privacy (DP) for enhanced robustness. Moreover, our partitioning framework addresses a significant scalability problem in QML by efficiently distributing computational workloads, making quantum-enhanced anomaly detection practical in large-scale smart grid environments. Our experimental results across various scenarios exemplifies the efficacy of QUPID and R-QUPID to significantly improve anomaly detection capabilities and robustness compared to traditional ML approaches.

preprint2026arXiv

Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders

Learning hierarchical features in Sparse Autoencoders (SAEs) is essential for capturing the structured nature of real-world data and mitigating issues like feature absorption or splitting. Existing works attempt to identify hierarchical relationships within independent feature sets by relying on activation coverage, the assumption that child feature should only activate when its parent feature activates. However, we demonstrate that this condition alone is insufficient; that is, it often produces false positives where parent and child concepts are semantically unrelated. To address this, we introduce a novel reconstruction condition that enforces a deeper functional link between hierarchical levels. By combining both activation and reconstruction constraints, we propose the Tree SAE, a model designed to learn hierarchical structures directly from within the feature set. Our results demonstrate that Tree SAEs significantly surpass the existing SAEs at learning hierarchical pairs while maintaining competitive performance to the state-of-the-art on several key benchmarks. Finally, we demonstrate the practical utility of our Tree SAE in mapping the geometry of child feature subspaces and uncovering the complex hierarchical concept structures encoded within large language models.

preprint2023arXiv

Investigating the Dynamics of Social Norm Emergence within Online Communities

Although the effects of the social norm on mitigating misinformation are identified, scant knowledge exists about the patterns of social norm emergence, such as the patterns and variations of social tipping in online communities with diverse characteristics. Accordingly, this study investigates the features of social tipping in online communities and examines the correlations between the tipping features and characteristics of online communities. Taking the side effects of COVID-19 vaccination as the case topic, we first track the patterns of tipping features in 100 online communities, which are detected using Louvain Algorithm from the aggregated communication network on Twitter between May 2020 and April 2021. Then, we use multi-variant linear regression to explore the correlations between tipping features and community characteristics. We find that social tipping in online communities can sustain for two to four months and lead to a 50% increase in populations who accept the normative belief in online communities. The regression indicates that the duration of social tipping is positively related to the community populations and original acceptance of social norms, while the correlation between the tipping duration and the degrees among community members is negative. Additionally, the network modularity and original acceptance of social norms have negative relationships with the extent of social tipping, while the degree and betweenness centrality can have significant positive relationships with the extent of tipping. Our findings shed light on more precise normative interventions on misinformation in digital environments as it offers preliminary evidence about the timing and mechanism of social norm emergence.

preprint2022arXiv

An Explainer for Temporal Graph Neural Networks

Temporal graph neural networks (TGNNs) have been widely used for modeling time-evolving graph-related tasks due to their ability to capture both graph topology dependency and non-linear temporal dynamic. The explanation of TGNNs is of vital importance for a transparent and trustworthy model. However, the complex topology structure and temporal dependency make explaining TGNN models very challenging. In this paper, we propose a novel explainer framework for TGNN models. Given a time series on a graph to be explained, the framework can identify dominant explanations in the form of a probabilistic graphical model in a time period. Case studies on the transportation domain demonstrate that the proposed approach can discover dynamic dependency structures in a road network for a time period.

preprint2022arXiv

Blockchain-based Secure Client Selection in Federated Learning

Despite the great potential of Federated Learning (FL) in large-scale distributed learning, the current system is still subject to several privacy issues due to the fact that local models trained by clients are exposed to the central server. Consequently, secure aggregation protocols for FL have been developed to conceal the local models from the server. However, we show that, by manipulating the client selection process, the server can circumvent the secure aggregation to learn the local models of a victim client, indicating that secure aggregation alone is inadequate for privacy protection. To tackle this issue, we leverage blockchain technology to propose a verifiable client selection protocol. Owing to the immutability and transparency of blockchain, our proposed protocol enforces a random selection of clients, making the server unable to control the selection process at its discretion. We present security proofs showing that our protocol is secure against this attack. Additionally, we conduct several experiments on an Ethereum-like blockchain to demonstrate the feasibility and practicality of our solution.

preprint2022arXiv

Efficient Algorithms for Monotone Non-Submodular Maximization with Partition Matroid Constraint

In this work, we study the problem of monotone non-submodular maximization with partition matroid constraint. Although a generalization of this problem has been studied in literature, our work focuses on leveraging properties of partition matroid constraint to (1) propose algorithms with theoretical bound and efficient query complexity; and (2) provide better analysis on theoretical performance guarantee of some existing techniques. We further investigate those algorithms' performance in two applications: Boosting Influence Spread and Video Summarization. Experiments show our algorithms return comparative results to the state-of-the-art algorithms while taking much fewer queries.

preprint2022arXiv

FastHare: Fast Hamiltonian Reduction for Large-scale Quantum Annealing

Quantum annealing (QA) that encodes optimization problems into Hamiltonians remains the only near-term quantum computing paradigm that provides sufficient many qubits for real-world applications. To fit larger optimization instances on existing quantum annealers, reducing Hamiltonians into smaller equivalent Hamiltonians provides a promising approach. Unfortunately, existing reduction techniques are either computationally expensive or ineffective in practice. To this end, we introduce a novel notion of non-separable~group, defined as a subset of qubits in a Hamiltonian that obtains the same value in optimal solutions. We develop a theoretical framework on non-separability accordingly and propose FastHare, a highly efficient reduction method. FastHare, iteratively, detects and merges non-separable groups into single qubits. It does so within a provable worst-case time complexity of only $O(αn^2)$, for some user-defined parameter $α$. Our extensive benchmarks for the feasibility of the reduction are done on both synthetic Hamiltonians and 3000+ instances from the MQLIB library. The results show FastHare outperforms the roof duality, the implemented reduction method in D-Wave's SDK library, with 3.6x higher average reduction ratio. It demonstrates a high level of effectiveness with an average of 62% qubits saving and 0.3s processing time, advocating for Hamiltonian reduction as an inexpensive necessity for QA.

preprint2022arXiv

Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning

In this paper, we show that the process of continually learning new tasks and memorizing previous tasks introduces unknown privacy risks and challenges to bound the privacy loss. Based upon this, we introduce a formal definition of Lifelong DP, in which the participation of any data tuples in the training set of any tasks is protected, under a consistently bounded DP protection, given a growing stream of tasks. A consistently bounded DP means having only one fixed value of the DP privacy budget, regardless of the number of tasks. To preserve Lifelong DP, we propose a scalable and heterogeneous algorithm, called L2DP-ML with a streaming batch training, to efficiently train and continue releasing new versions of an L2M model, given the heterogeneity in terms of data sizes and the training order of tasks, without affecting DP protection of the private training set. An end-to-end theoretical analysis and thorough evaluations show that our mechanism is significantly better than baseline approaches in preserving Lifelong DP. The implementation of L2DP-ML is available at: https://github.com/haiphanNJIT/PrivateDeepLearning.

preprint2022arXiv

SaPHyRa: A Learning Theory Approach to Ranking Nodes in Large Networks

Ranking nodes based on their centrality stands a fundamental, yet, challenging problem in large-scale networks. Approximate methods can quickly estimate nodes' centrality and identify the most central nodes, but the ranking for the majority of remaining nodes may be meaningless. For example, ranking for less-known websites in search queries is known to be noisy and unstable. To this end, we investigate a new node ranking problem with two important distinctions: a) ranking quality, rather than the centrality estimation quality, as the primary objective; and b) ranking only nodes of interest, e.g., websites that matched search criteria. We propose Sample space Partitioning Hypothesis Ranking, or SaPHyRa, that transforms node ranking into a hypothesis ranking in machine learning. This transformation maps nodes' centrality to the expected risks of hypotheses, opening doors for theoretical machine learning (ML) tools. The key of SaPHyRa is to partition the sample space into exact and approximate subspaces. The exact subspace contains samples related to the nodes of interest, increasing both estimation and ranking qualities. The approximate space can be efficiently sampled with ML-based techniques to provide theoretical guarantees on the estimation error. Lastly, we present SaPHyRa_bc, an illustration of SaPHyRa on ranking nodes' betweenness centrality (BC). By combining a novel bi-component sampling, a 2-hop sample partitioning, and improved bounds on the Vapnik-Chervonenkis dimension, SaPHyRa_bc can effectively rank any node subset in BC. Its performance is up to 200x faster than state-of-the-art methods in approximating BC, while its rank correlation to the ground truth is improved by multifold.

preprint2020arXiv

A Blockchain-based Iterative Double Auction Protocol using Multiparty State Channels

Although the iterative double auction has been widely used in many different applications, one of the major problems in its current implementations is that they rely on a trusted third party to handle the auction process. This imposes the risk of single point of failures, monopoly, and bribery. In this paper, we aim to tackle this problem by proposing a novel decentralized and trustless framework for iterative double auction based on blockchain. Our design adopts the smart contract and state channel technologies to enable a double auction process among parties that do not need to trust each other, while minimizing the blockchain transactions. In specific, we propose an extension to the original concept of state channels that can support multiparty computation. Then we provide a formal development of the proposed framework and prove the security of our design against adversaries. Finally, we develop a proof-of-concept implementation of our framework using Elixir and Solidity, on which we conduct various experiments to demonstrate its feasibility and practicality.

preprint2020arXiv

Auditing the Sensitivity of Graph-based Ranking with Visual Analytics

Graph mining plays a pivotal role across a number of disciplines, and a variety of algorithms have been developed to answer who/what type questions. For example, what items shall we recommend to a given user on an e-commerce platform? The answers to such questions are typically returned in the form of a ranked list, and graph-based ranking methods are widely used in industrial information retrieval settings. However, these ranking algorithms have a variety of sensitivities, and even small changes in rank can lead to vast reductions in product sales and page hits. As such, there is a need for tools and methods that can help model developers and analysts explore the sensitivities of graph ranking algorithms with respect to perturbations within the graph structure. In this paper, we present a visual analytics framework for explaining and exploring the sensitivity of any graph-based ranking algorithm by performing perturbation-based what-if analysis. We demonstrate our framework through three case studies inspecting the sensitivity of two classic graph-based ranking algorithms (PageRank and HITS) as applied to rankings in political news media and social networks.

preprint2020arXiv

c-Eval: A Unified Metric to Evaluate Feature-based Explanations via Perturbation

In many modern image-classification applications, understanding the cause of model's prediction can be as critical as the prediction's accuracy itself. Various feature-based local explanations generation methods have been designed to give us more insights on the decision of complex classifiers. Nevertheless, there is no consensus on evaluating the quality of different explanations. In response to this lack of comprehensive evaluation, we introduce the c-Eval metric and its corresponding framework to quantify the feature-based local explanation's quality. Given a classifier's prediction and the corresponding explanation on that prediction, c-Eval is the minimum-distortion perturbation that successfully alters the prediction while keeping the explanation's features unchanged. We then demonstrate how c-Eval can be computed using some modifications on existing adversarial generation libraries. To show that c-Eval captures the importance of input's features, we establish the connection between c-Eval and the features returned by explainers in affine and nearly-affine classifiers. We then introduce the c-Eval plot, which not only displays a strong connection between c-Eval and explainers' quality, but also helps automatically determine explainer's parameters. Since the generation of c-Eval relies on adversarial generation, we provide a demo of c-Eval on adversarial-robust models and show that the metric is applicable in those models. Finally, extensive experiments of explainers on different datasets are conducted to support the adoption of c-Eval in evaluating explainers' performance.

preprint2020arXiv

Cost-aware Targeted Viral Marketing: Approximation with Less Samples

Cost-aware Targeted Viral Marketing (CTVM), a generalization of Influence Maximization (IM), has received a lot of attentions recently due to its commercial values. Previous approximation algorithms for this problem required a large number of samples to ensure approximate guarantee. In this paper, we propose an efficient approximation algorithm which uses fewer samples but provides the same theoretical guarantees based on generating and using important samples in its operation. Experiments on real social networks show that our proposed method outperforms the state-of-the-art algorithm which provides the same approximation ratio in terms of the number of required samples and running time.

preprint2020arXiv

Length-Bounded Paths Interdiction in Continuous Domain for Network Performance Assessment

Studying on networked systems, in which a communication between nodes is functional if their distance under a given metric is lower than a pre-defined threshold, has received significant attention recently. In this work, we propose a metric to measure network resilience on guaranteeing the pre-defined performance constraint. This metric is investigated under an optimization problem, namely \textbf{Length-bounded Paths Interdiction in Continuous Domain} (cLPI), which aims to identify a minimum set of nodes whose changes cause routing paths between nodes become undesirable for the network service. We show the problem is NP-hard and propose a framework by designing two oracles, \textit{Threshold Blocking} (TB) and \textit{Critical Path Listing} (CPL), which communicate back and forth to construct a feasible solution to cLPI with theoretical bicriteria approximation guarantees. Based on this framework, we propose two solutions for each oracle. Each combination of one solution to \tb and one solution to \cpl gives us a solution to cLPI. The bicriteria guarantee of our algorithms allows us to control the solutions's trade-off between the returned size and the performance accuracy. New insights into the advantages of each solution are further discussed via experimental analysis.

preprint2020arXiv

Scalable Differential Privacy with Certified Robustness in Adversarial Learning

In this paper, we aim to develop a scalable algorithm to preserve differential privacy (DP) in adversarial learning for deep neural networks (DNNs), with certified robustness to adversarial examples. By leveraging the sequential composition theory in DP, we randomize both input and latent spaces to strengthen our certified robustness bounds. To address the trade-off among model utility, privacy loss, and robustness, we design an original adversarial objective function, based on the post-processing property in DP, to tighten the sensitivity of our model. A new stochastic batch training is proposed to apply our mechanism on large DNNs and datasets, by bypassing the vanilla iterative batch-by-batch training in DP DNNs. An end-to-end theoretical analysis and evaluations show that our mechanism notably improves the robustness and scalability of DP DNNs.

preprint2020arXiv

Threat from being Social: Vulnerability Analysis of Social Network Coupled Smart Grid

Social Networks (SNs) have been gradually applied by utility companies as an addition to smart grid and are proved to be helpful in smoothing load curves and reducing energy usage. However, SNs also bring in new threats to smart grid: misinformation in SNs may cause smart grid users to alter their demand, resulting in transmission line overloading and in turn leading to catastrophic impact to the grid. In this paper, we discuss the interdependency in the social network coupled smart grid and focus on its vulnerability. That is, how much can the smart grid be damaged when misinformation related to it diffuses in SNs? To analytically study the problem, we propose the Misinformation Attack Problem in Social-Smart Grid (MAPSS) that identifies the top critical nodes in the SN, such that the smart grid can be greatly damaged when misinformation propagates from those nodes. This problem is challenging as we have to incorporate the complexity of the two networks concurrently. Nevertheless, we propose a technique that can explicitly take into account information diffusion in SN, power flow balance and cascading failure in smart grid integratedly when evaluating node criticality, based on which we propose various strategies in selecting the most critical nodes. Also, we introduce controlled load shedding as a protection strategy to reduce the impact of cascading failure. The effectiveness of our algorithms are demonstrated by experiments on IEEE bus test cases as well as the Pegase data set.

preprint2016arXiv

Least Cost Influence Maximization Across Multiple Social Networks

Recently in Online Social Networks (OSNs), the Least Cost Influence (LCI) problem has become one of the central research topics. It aims at identifying a minimum number of seed users who can trigger a wide cascade of information propagation. Most of existing literature investigated the LCI problem only based on an individual network. However, nowadays users often join several OSNs such that information could be spread across different networks simultaneously. Therefore, in order to obtain the best set of seed users, it is crucial to consider the role of overlapping users under this circumstances. In this article, we propose a unified framework to represent and analyze the influence diffusion in multiplex networks. More specifically, we tackle the LCI problem by mapping a set of networks into a single one via lossless and lossy coupling schemes. The lossless coupling scheme preserves all properties of original networks to achieve high quality solutions, while the lossy coupling scheme offers an attractive alternative when the running time and memory consumption are of primary concern. Various experiments conducted on both real and synthesized datasets have validated the effectiveness of the coupling schemes, which also provide some interesting insights into the process of influence propagation in multiplex networks.

preprint2016arXiv

Network Clustering via Maximizing Modularity: Approximation Algorithms and Theoretical Limits

Many social networks and complex systems are found to be naturally divided into clusters of densely connected nodes, known as community structure (CS). Finding CS is one of fundamental yet challenging topics in network science. One of the most popular classes of methods for this problem is to maximize Newman's modularity. However, there is a little understood on how well we can approximate the maximum modularity as well as the implications of finding community structure with provable guarantees. In this paper, we settle definitely the approximability of modularity clustering, proving that approximating the problem within any (multiplicative) positive factor is intractable, unless P = NP. Yet we propose the first additive approximation algorithm for modularity clustering with a constant factor. Moreover, we provide a rigorous proof that a CS with modularity arbitrary close to maximum modularity QOPT might bear no similarity to the optimal CS of maximum modularity. Thus even when CS with near-optimal modularity are found, other verification methods are needed to confirm the significance of the structure.

preprint2011arXiv

Finding Community Structure with Performance Guarantees in Complex Networks

Many networks including social networks, computer networks, and biological networks are found to divide naturally into communities of densely connected individuals. Finding community structure is one of fundamental problems in network science. Since Newman's suggestion of using \emph{modularity} as a measure to qualify the goodness of community structures, many efficient methods to maximize modularity have been proposed but without a guarantee of optimality. In this paper, we propose two polynomial-time algorithms to the modularity maximization problem with theoretical performance guarantees. The first algorithm comes with a \emph{priori guarantee} that the modularity of found community structure is within a constant factor of the optimal modularity when the network has the power-law degree distribution. Despite being mainly of theoretical interest, to our best knowledge, this is the first approximation algorithm for finding community structure in networks. In our second algorithm, we propose a \emph{sparse metric}, a substantially faster linear programming method for maximizing modularity and apply a rounding technique based on this sparse metric with a \emph{posteriori approximation guarantee}. Our experiments show that the rounding algorithm returns the optimal solutions in most cases and are very scalable, that is, it can run on a network of a few thousand nodes whereas the LP solution in the literature only ran on a network of at most 235 nodes.

My T. Thai

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

QUPID: A Partitioned Quantum Neural Network for Anomaly Detection in Smart Grid

Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders

Investigating the Dynamics of Social Norm Emergence within Online Communities

An Explainer for Temporal Graph Neural Networks

Blockchain-based Secure Client Selection in Federated Learning

Efficient Algorithms for Monotone Non-Submodular Maximization with Partition Matroid Constraint

FastHare: Fast Hamiltonian Reduction for Large-scale Quantum Annealing

Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning

SaPHyRa: A Learning Theory Approach to Ranking Nodes in Large Networks

A Blockchain-based Iterative Double Auction Protocol using Multiparty State Channels

Auditing the Sensitivity of Graph-based Ranking with Visual Analytics

c-Eval: A Unified Metric to Evaluate Feature-based Explanations via Perturbation

Cost-aware Targeted Viral Marketing: Approximation with Less Samples

Length-Bounded Paths Interdiction in Continuous Domain for Network Performance Assessment

Scalable Differential Privacy with Certified Robustness in Adversarial Learning

Threat from being Social: Vulnerability Analysis of Social Network Coupled Smart Grid

Least Cost Influence Maximization Across Multiple Social Networks

Network Clustering via Maximizing Modularity: Approximation Algorithms and Theoretical Limits

Finding Community Structure with Performance Guarantees in Complex Networks