Source author record

Hung Nguyen

Hung Nguyen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Artificial Intelligence; Computer Vision; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Machine Learning; math.PR; Networking and Internet Architecture; Neural and Evolutionary Computing; nlin.CD; Social and Information Networks

Catalog footprint

What is connected

9 works

10 topics

4 close collaborators

Preprint | 2024 | arXiv

Federated Learning for distribution skewed data using sample weights

One of the most challenging issues in federated learning is that the data are often not independent and identically distributed (non-IID). Clients are expected to contribute the same type of data, drawn from one global distribution. However, data are often collected in different ways from different sources, so the data distributions of individual clients may differ from the underlying global distribution. This creates a weight divergence issue and reduces federated learning performance. This work focuses on improving federated learning performance when the data distribution is skewed across clients. The main idea is to use sample weights to bring each client's distribution closer to the global distribution, so that the model converges faster and reaches higher accuracy. We start from the fundamental concept of empirical risk minimization and theoretically derive a solution for adjusting the distribution skewness using sample weights. To determine the sample weights, we implicitly exchange density information by leveraging a neural network-based density estimation model, MADE. The clients' data distributions can then be adjusted without exposing their raw data. Our experimental results on three real-world datasets show that the proposed method not only improves federated learning accuracy but also significantly reduces communication costs compared with the other methods evaluated.
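A minimal sketch of the sample-weighting idea, assuming a discrete feature space where both the global and the client densities are known exactly. The paper estimates densities with MADE instead, and that step is not shown. Each client sample is reweighted by the density ratio p_global(x) / p_client(x), so the weighted empirical risk on one client approximates the risk under the global distribution. The function names `sample_weights` and `weighted_risk` are hypothetical.

```python
from collections import Counter


def sample_weights(client_samples, global_density):
    """Density-ratio weights w(x) = p_global(x) / p_client(x).

    Assumes discrete samples and a known global density (a hypothetical
    stand-in for the MADE-based density estimates described above).
    """
    n = len(client_samples)
    client_density = {x: c / n for x, c in Counter(client_samples).items()}
    return [global_density[x] / client_density[x] for x in client_samples]


def weighted_risk(losses, weights):
    """Weighted empirical risk: (1/n) * sum_i w_i * loss_i."""
    return sum(w * loss for w, loss in zip(weights, losses)) / len(losses)


# A skewed client: 75% "a", 25% "b", while the global split is 50/50.
client = ["a", "a", "a", "b"]
weights = sample_weights(client, {"a": 0.5, "b": 0.5})
losses = [1.0 if x == "a" else 3.0 for x in client]
print(weighted_risk(losses, weights))  # ~2.0, the risk under the 50/50 split
```

Over-represented samples get weights below 1 and under-represented ones above 1, so on this toy client the weighted risk matches the global expectation 0.5 * 1 + 0.5 * 3 rather than the skewed local average.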

Preprint | 2023 | arXiv

Defending Active Directory by Combining Neural Network based Dynamic Program and Evolutionary Diversity Optimisation

Active Directory (AD) is the default security management system for Windows domain networks. We study a Stackelberg game model between one attacker and one defender on an AD attack graph. The attacker initially has access to a set of entry nodes and can expand this set by strategically exploring edges. Every edge has a detection rate and a failure rate. The attacker aims to maximize their chance of successfully reaching the destination before being detected. The defender's task is to block a fixed number of edges to decrease the attacker's chance of success. We show that the problem is #P-hard and, therefore, intractable to solve exactly. We convert the attacker's problem into an exponential-sized dynamic program that is approximated by a neural network (NN). Once trained, the NN provides an efficient fitness function for the defender's Evolutionary Diversity Optimisation (EDO). The emphasis on diversity in the defender's solutions provides a diverse set of training samples, which improves the training accuracy of our NN for modelling the attacker. We alternate between NN training and EDO. Experimental results show that for the R500 graph, our proposed EDO-based defense is less than 1% away from the optimal defense.
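To make the attacker/defender interaction concrete, here is a brute-force toy on a tiny graph. It relies on simplifying assumptions that are not the paper's model. The attacker commits to a single simple path, and each edge is crossed independently with probability (1 - detection) * (1 - failure). The defender exhaustively tries every set of k blocked edges. Enumeration like this does not scale, which is the motivation for the NN-approximated dynamic program and EDO search described above. The graph, node names and functions are hypothetical.

```python
from itertools import combinations


def best_attack(edges, entries, target, blocked):
    """Highest success probability over simple paths from any entry node.

    Toy model (assumption): crossing edge (u, v) succeeds with probability
    (1 - detection) * (1 - failure), independently of other edges.
    """
    adj = {}
    for (u, v), (detect, fail) in edges.items():
        if (u, v) not in blocked:
            adj.setdefault(u, []).append((v, (1 - detect) * (1 - fail)))

    best = 0.0

    def dfs(node, prob, seen):
        nonlocal best
        if node == target:
            best = max(best, prob)
            return
        for nxt, p in adj.get(node, []):
            if nxt not in seen:
                dfs(nxt, prob * p, seen | {nxt})

    for start in entries:
        dfs(start, 1.0, {start})
    return best


def best_defense(edges, entries, target, k):
    """Try every set of k edges to block; keep the one that hurts the attacker most."""
    return min(
        combinations(edges, k),
        key=lambda cut: best_attack(edges, entries, target, set(cut)),
    )


# Hypothetical AD-like graph: edge -> (detection rate, failure rate).
edges = {
    ("user", "host"): (0.1, 0.0),
    ("host", "admin"): (0.0, 0.2),
    ("user", "share"): (0.5, 0.0),
    ("share", "admin"): (0.0, 0.0),
}
print(best_attack(edges, ["user"], "admin", set()))  # about 0.72, via host
print(best_defense(edges, ["user"], "admin", 1))     # (('user', 'host'),)
```

Blocking either edge on the user → host → admin route forces the attacker onto the riskier share route (success 0.5). `min` returns the first of these tied cuts.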

Preprint | 2023 | arXiv

Defending SDN against packet injection attacks using deep learning

The (logically) centralised architecture of software-defined networks (SDNs) makes them an easy target for packet injection attacks. In these attacks, the attacker injects malicious packets into the SDN to affect the services and performance of the SDN controller and to overflow the capacity of the SDN switches. Such attacks have been shown to ultimately stop the network from functioning in real time, leading to network breakdowns. There has been significant work on detecting and defending against similar DoS attacks in non-SDN networks, but techniques for detecting and protecting SDNs against packet injection attacks are still in their infancy. Furthermore, many of the proposed solutions have been shown to be easily bypassed by simple modifications to the attacking packets or by altering the attack profile. In this paper, we develop novel Graph Convolutional Neural Network models and algorithms for grouping network nodes/users into security classes by learning from network data. We start with two simple classes: nodes that engage in suspicious packet injection attacks and nodes that do not. Based on these classes, we then partition the network into separate segments with different security policies using distributed Ryu controllers in an SDN network. We show in experiments on an emulated SDN that our detection solution outperforms alternative approaches, with above 99% detection accuracy on various types (both old and new) of injection attacks. More importantly, our mitigation solution maintains the continuous operation of non-compromised nodes while isolating compromised/suspicious nodes in real time. All code and data are publicly available for reproducibility of our results.
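The abstract does not spell out the network architecture. As a hedged illustration, here is one propagation step of the widely used graph convolutional layer of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), written in plain Python for a tiny graph. In a detection setting, node features might be per-host traffic statistics, and a final layer would output one score per class (suspicious vs. not). The graph, features and weights here are made up.

```python
import math


def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W).

    adj: n x n 0/1 adjacency (list of lists), features: n x f, weight: f x o.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features: norm @ features.
    agg = [
        [sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(len(features[0]))]
        for i in range(n)
    ]
    # Linear transform followed by ReLU: relu(agg @ weight).
    return [
        [max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
         for o in range(len(weight[0]))]
        for i in range(n)
    ]


# Hypothetical 3-node path graph 0 - 1 - 2 with one scalar feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [4.0]]
weight = [[1.0, -1.0]]  # 1 input feature -> 2 outputs (one score per class)
print(gcn_layer(adj, features, weight))
```

Stacking such layers lets each node's class score depend on its multi-hop neighbourhood. A trained model would learn `weight` instead of fixing it by hand.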

Preprint | 2023 | arXiv

Performance of Distributed File Systems on Cloud Computing Environment: An Evaluation for Small-File Problem

Various performance characteristics of distributed file systems have been well studied. However, the performance efficiency of distributed file systems on small-file problems in scenarios involving complex machine learning algorithms has not been well addressed. In addition, the demand for unified storage for big data processing and high-performance computing has become crucial. Hence, developing a solution that combines high-performance computing and big data with shared storage is very important. This paper focuses on the performance efficiency of distributed file systems with small-file datasets. We propose an architecture combining both high-performance computing and big data with shared storage, and perform a series of experiments to investigate the performance of these distributed file systems. The experimental results confirm the applicability of the proposed architecture for complex machine learning algorithms.

Preprint | 2022 | arXiv

Space Time Recurrent Memory Network

Transformers have recently become popular for learning and inference in the spatial-temporal domain. However, their performance relies on storing and applying attention to the feature tensor of each frame in a video. Hence, their space and time complexity increase linearly as the video length grows, which can be very costly for long videos. We propose a novel visual memory network architecture for learning and inference in the spatial-temporal domain. We maintain a fixed set of memory slots in our memory network and propose an algorithm based on Gumbel-Softmax to learn an adaptive strategy for updating this memory. Finally, this architecture is benchmarked on video object segmentation (VOS) and video prediction. We demonstrate that our memory architecture achieves state-of-the-art results, outperforming transformer-based methods on VOS and other recent methods on video prediction while maintaining a constant memory capacity independent of the sequence length.
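A minimal sketch of how Gumbel-Softmax can turn per-slot scores into a differentiable choice of which memory slot to overwrite, while the number of slots stays fixed. How the slot scores are produced and the convex-combination write rule are assumptions made for illustration. They are not the paper's exact update. The function names are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    eps = 1e-12
    # Clamp the uniform sample away from 0 and 1 so both logs are defined.
    noise = [-math.log(-math.log(min(max(rng.random(), eps), 1 - eps))) for _ in logits]
    z = [(logit + g) / tau for logit, g in zip(logits, noise)]
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def update_memory(memory, new_feature, slot_logits, tau, rng):
    """Soft write (assumption): slot_i <- (1 - y_i) * slot_i + y_i * new_feature.

    The number of slots never changes, so memory cost does not grow with
    the number of frames processed.
    """
    y = gumbel_softmax(slot_logits, tau, rng)
    return [
        [(1 - w) * old + w * new for old, new in zip(slot, new_feature)]
        for slot, w in zip(memory, y)
    ]


rng = random.Random(0)
memory = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # 3 fixed slots, 2-dim features
for frame in range(100):                       # hypothetical stream of frame features
    memory = update_memory(memory, [5.0, 5.0], [0.1, 2.0, -1.0], 0.5, rng)
print(len(memory))  # 3
```

A lower temperature `tau` pushes the weights toward a hard one-hot choice of slot. A higher value spreads each write over several slots.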

Preprint | 2020 | arXiv

The Generalized Langevin Equation with a power-law memory in a nonlinear potential well

The generalized Langevin equation (GLE) is a stochastic integro-differential equation that has been used to describe the velocity of microparticles in viscoelastic fluids. In this work, we consider the large-time asymptotic properties of a Markovian approximation to the GLE in the presence of a wide class of external potential wells. The qualitative behavior of the GLE is largely determined by its memory kernel $K$, which summarizes the delayed response of the fluid medium to the particle's past movement. When $K$ can be expressed as a finite sum of exponentials, it has been shown that long-term time-averaged properties of the position and velocity do not depend on $K$ at all. In certain applications, however, it is important to consider the GLE with a power-law memory kernel. Using the fact that infinite sums of exponentials can have power-law tails, we study the infinite-dimensional version of the Markovian GLE in a potential well. In the case where the memory kernel $K$ is integrable (i.e., in the asymptotically diffusive regime), we are able to extend previous results and show that there is a unique stationary distribution for the GLE system and that the long-term statistics of the position and velocity do not depend on $K$. However, when $K$ is not integrable (i.e., in the asymptotically subdiffusive regime), we are able to show the existence of an invariant probability measure, but uniqueness remains an open question. In particular, the method of asymptotic coupling used in the integrable case to show uniqueness does not apply when $K$ fails to be integrable.
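For orientation, here is a common way of writing the GLE, together with the facts about sums of exponentials that the abstract relies on. Constants and normalisations (mass, friction, temperature) differ between papers. Treat this as a generic form in units with $k_B T = 1$, not as the paper's exact system.

```latex
% Generic GLE for position x(t) and velocity v(t) in a potential U
m\,\dot v(t) = -\gamma\, v(t) - U'(x(t)) - \int_0^t K(t-s)\, v(s)\, ds
               + \sqrt{2\gamma}\,\dot W(t) + F(t),
\qquad \mathbb{E}[F(t)F(s)] = K(|t-s|).

% Kernel as a (possibly infinite) sum of exponentials, with c_k, \lambda_k > 0
K(t) = \sum_{k} c_k\, e^{-\lambda_k t},
\qquad \int_0^\infty K(t)\,dt = \sum_{k} \frac{c_k}{\lambda_k}
\quad (\text{integrable case: this sum is finite}).

% Why infinite sums of exponentials can have power-law tails (\alpha > 0, t > 0):
t^{-\alpha} = \frac{1}{\Gamma(\alpha)} \int_0^\infty \lambda^{\alpha-1} e^{-\lambda t}\, d\lambda .
```

Discretising the last integral over $\lambda$ yields a sum of exponentials whose tail roughly follows $t^{-\alpha}$. A kernel decaying like $t^{-\alpha}$ is integrable at large $t$ only when $\alpha > 1$, which separates the diffusive and subdiffusive regimes mentioned above.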

Preprint | 2016 | arXiv

Evaluating Marijuana-Related Tweets On Twitter

This paper studies marijuana-related tweets on the social network Twitter. We collected more than 300,000 marijuana-related tweets during November 2016. Our text-mining-based algorithms and data analysis reveal some interesting patterns, including: (i) users' attitudes (e.g., positive or negative) can be characterized by the presence of outer links in a tweet; (ii) 67% of users post their messages from mobile phones, while many users publish their messages using third-party automatic posting services; and (iii) the number of tweets during weekends is much higher than during weekdays. Our data also show the impact of political events, such as the U.S. presidential election and state marijuana legalization votes, on the frequency of marijuana-related tweets.

Preprint | 2016 | arXiv

Joint Network Coding and Machine Learning for Error-prone Wireless Broadcast

Reliably broadcasting data to multiple receivers over lossy wireless channels is challenging due to the heterogeneity of wireless link conditions. Automatic Repeat-reQuest (ARQ)-based retransmission schemes are bandwidth-inefficient due to data duplication at receivers. Network coding (NC) has been shown to be a promising technique for improving network bandwidth efficiency by combining multiple lost data packets for retransmission. However, it is challenging to accurately determine which lost packets should be combined because the feedback channels are disrupted. This paper proposes an adaptive data encoding scheme at the transmitter that joins network coding and machine learning (NCML) for retransmission of lost packets. The proposed NCML extracts important features from historical feedback signals received by the transmitter to train a classifier. The constructed classifier is then used to predict the states of transmitted data packets at different receivers from their corrupted feedback signals, enabling effective data mixing. We have conducted extensive simulations to corroborate the efficiency of our proposed approach. The simulation results show that our machine learning algorithm can be trained efficiently and accurately: on average, the proposed NCML correctly classifies 90% of the states of transmitted data packets at different receivers. It achieves significant bandwidth gains compared with the ARQ- and NC-based schemes across different transmission terrains, power levels, and transmitter-receiver distances.
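The bandwidth saving from network coding comes from XOR-mixing lost packets, so that a single retransmission repairs several receivers at once. The sketch below shows that mechanism with a greedy packet-selection heuristic. It is a simple illustration, not NCML itself. The predicted loss sets are supplied directly rather than produced by a trained classifier, packets are assumed to have equal length, and the greedy rule and function names are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets, ids):
    """XOR together the packets whose ids are listed (all the same length)."""
    coded = bytes(len(packets[ids[0]]))  # all-zero packet, the XOR identity
    for i in ids:
        coded = xor_bytes(coded, packets[i])
    return coded


def decode(coded, known, ids):
    """Recover the one packet in `ids` that this receiver is missing."""
    missing = [i for i in ids if i not in known]
    if len(missing) != 1:
        raise ValueError("receiver must be missing exactly one mixed packet")
    for i in ids:
        if i in known:
            coded = xor_bytes(coded, known[i])
    return missing[0], coded


def greedy_mix(predicted_lost):
    """Pick packet ids so that no receiver misses more than one of them.

    `predicted_lost` maps receiver -> set of packet ids it is believed to
    have lost (in NCML this belief would come from the trained classifier).
    """
    chosen = set()
    for pid in sorted(set().union(*predicted_lost.values())):
        if all(len((chosen | {pid}) & lost) <= 1 for lost in predicted_lost.values()):
            chosen.add(pid)
    return sorted(chosen)


packets = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
lost = {"r1": {0}, "r2": {1}, "r3": {0, 2}}
ids = greedy_mix(lost)          # [0, 1]
coded = encode(packets, ids)    # one retransmission instead of two
print(decode(coded, {1: b"BBBB", 2: b"CCCC"}, ids))  # (0, b'AAAA')
```

If the predicted loss states are wrong, a receiver may end up missing two of the mixed packets and cannot decode. This is why the accuracy of the state classifier matters for the bandwidth gain.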

Preprint | 2014 | arXiv

Numerical Polynomial Homotopy Continuation Method to Locate All The Power Flow Solutions

The manuscript addresses the problem of finding all solutions of the power flow equations or other similar nonlinear systems of algebraic equations. This problem arises naturally in a number of power systems contexts, most importantly in direct methods for transient stability analysis and in voltage stability assessment. We introduce a novel form of homotopy continuation method called the numerical polynomial homotopy continuation (NPHC) method, which is mathematically guaranteed to find all the solutions without ever encountering a bifurcation. The method is based on embedding the real form of the power flow equations in complex space and tracking the generally unphysical solutions with complex values of the real and imaginary parts of the voltage. The solutions converge to the physical real form at the end of the homotopy. The so-called $\gamma$-trick rigorously ensures that all the paths are well-behaved, so, unlike other continuation approaches, no special handling of bifurcations is necessary. The method is embarrassingly parallelizable and can be applied to reasonably large systems. We demonstrate the technique by analysis of several standard test cases up to the 14-bus system. Finally, we discuss possible strategies for scaling the method to large systems and propose several applications for transient stability analysis and voltage stability assessment.
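Here is a one-variable sketch of homotopy continuation with the $\gamma$-trick. It starts from g(x) = x^d - 1, whose roots are known, and deforms g into the target polynomial f via H(x, t) = (1 - t) γ g(x) + t f(x). Each root is tracked with an Euler predictor and a Newton corrector. For all but finitely many choices of a complex γ on the unit circle, the paths stay nonsingular for t < 1, and that is the point of the trick. This is a textbook total-degree homotopy on a made-up univariate polynomial, not the NPHC implementation or a real power-flow system. It assumes f has a nonzero leading coefficient and simple roots, and the step count, γ value and function names are assumptions.

```python
import cmath


def poly(coeffs, x):
    """Evaluate a polynomial given highest-degree-first coefficients (Horner)."""
    value = 0j
    for c in coeffs:
        value = value * x + c
    return value


def deriv(coeffs):
    """Coefficients of the derivative, highest degree first."""
    d = len(coeffs) - 1
    return [c * (d - i) for i, c in enumerate(coeffs[:-1])]


def all_roots(f, gamma=cmath.exp(0.7j), steps=400, newton_iters=3):
    """Track every root of H(x,t) = (1-t)*gamma*(x^d - 1) + t*f(x) from t=0 to t=1."""
    d = len(f) - 1
    g = [1] + [0] * (d - 1) + [-1]            # start system x^d - 1
    df, dg = deriv(f), deriv(g)
    roots = []
    for k in range(d):                         # each path is independent
        x = cmath.exp(2j * cmath.pi * k / d)   # d-th roots of unity solve g(x) = 0
        for s in range(steps):
            t0, t1 = s / steps, (s + 1) / steps
            # Euler predictor: dx/dt = -H_t / H_x.
            h_x = (1 - t0) * gamma * poly(dg, x) + t0 * poly(df, x)
            h_t = poly(f, x) - gamma * poly(g, x)
            x -= (t1 - t0) * h_t / h_x
            # Newton corrector at the new value of t.
            for _ in range(newton_iters):
                h = (1 - t1) * gamma * poly(g, x) + t1 * poly(f, x)
                h_x = (1 - t1) * gamma * poly(dg, x) + t1 * poly(df, x)
                x -= h / h_x
        roots.append(x)
    return roots


# Toy target f(x) = x^4 - 5x^2 + 4 = (x^2 - 1)(x^2 - 4), with roots -2, -1, 1, 2.
for r in all_roots([1, 0, -5, 0, 4]):
    print(round(r.real, 6), round(r.imag, 6))
```

The outer loop over start roots has no shared state, so the paths can be distributed across workers. This is the sense in which such methods are embarrassingly parallelizable. The paths move through complex values and only land on the real solutions at t = 1.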
