Source author record

Huy Nguyen

Huy Nguyen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

25 works

21 topics

4 close collaborators

preprint · 2026 · arXiv

FIBER: A Differentially Private Optimizer with Filter-Aware Innovation Bias Correction

Differentially private (DP) training protects individual examples by adding noise to gradients, but the injected noise interacts nontrivially with adaptive optimizers. Recent DP methods temporally filter privatized gradients to reduce variance; however, filtering also changes the DP noise statistics seen by AdamW's second-moment accumulator. As a result, bias corrections derived for unfiltered DP noise, such as subtracting $\sigma_w^2$, can become miscalibrated when filtering is present. We propose FiBeR, a DP optimizer designed for temporally filtered privatized gradients. FiBeR (i) performs denoising in innovation space by filtering the residual stream and integrating it to form the filtered gradient estimate, (ii) decouples the two-point observation geometry from the innovation gain to enable independent tuning, and (iii) introduces a filter-aware second-moment calibration that subtracts the attenuated DP noise contribution $A(\omega)\,\sigma_w^2$, where $A(\omega)$ is derived in closed form for the innovation filter and can be computed for general stable linear filters. Across vision and language benchmarks, FiBeR consistently demonstrates substantial improvements in the performance of DP optimizers, surpassing state-of-the-art results under equivalent privacy constraints on multiple tasks.
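
To illustrate why filtering changes the noise level that a second-moment estimate sees, here is a minimal sketch. It is not FiBeR's attenuation factor, which the abstract does not spell out. It assumes a first-order exponential moving average as a hypothetical stand-in filter, and the function names are illustrative. For white noise of variance $\sigma_w^2$ passed through a stable linear filter, the output variance is $\sigma_w^2$ times the sum of squared impulse-response coefficients; for the EMA with coefficient beta that sum is (1 - beta) / (1 + beta).

```python
def ema_impulse_response(beta, n_taps):
    """Impulse response of the EMA y_t = beta * y_{t-1} + (1 - beta) * x_t.

    The EMA is an assumed stand-in filter, not the innovation filter
    described in the abstract.
    """
    return [(1.0 - beta) * beta ** k for k in range(n_taps)]


def white_noise_gain(impulse_response):
    """Variance gain of a linear filter driven by white noise.

    For unit-variance white noise the output variance equals the sum of
    squared impulse-response coefficients.
    """
    return sum(h * h for h in impulse_response)


def ema_noise_gain_closed_form(beta):
    """Closed form of the EMA gain: (1 - beta)^2 / (1 - beta^2)."""
    return (1.0 - beta) / (1.0 + beta)


if __name__ == "__main__":
    beta = 0.9
    numeric = white_noise_gain(ema_impulse_response(beta, 2000))
    print(numeric, ema_noise_gain_closed_form(beta))
```

Under this assumption, a correction that subtracts the full $\sigma_w^2$ after filtering would over-subtract by a factor of roughly 1 / gain, which is the kind of miscalibration the abstract describes.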

preprint · 2022 · arXiv

Entropic Gromov-Wasserstein between Gaussian Distributions

We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. When the metric is the inner product, which we refer to as inner product Gromov-Wasserstein (IGW), we demonstrate that the optimal transportation plans of entropic IGW and its unbalanced variant are (unbalanced) Gaussian distributions. Via an application of von Neumann's trace inequality, we obtain closed-form expressions for the entropic IGW between these Gaussian distributions. Finally, we consider an entropic inner product Gromov-Wasserstein barycenter of multiple Gaussian distributions. We prove that the barycenter is a Gaussian distribution when the entropic regularization parameter is small. We further derive a closed-form expression for the covariance matrix of the barycenter.

preprint · 2022 · arXiv

Generative Adversarial Networks and Image-Based Malware Classification

For efficient malware removal, determination of malware threat levels, and damage estimation, malware family classification plays a critical role. In this paper, we extract features from malware executable files and represent them as images using various approaches. We then focus on Generative Adversarial Networks (GAN) for multiclass classification and compare our GAN results to other popular machine learning techniques, including Support Vector Machine (SVM), XGBoost, and Restricted Boltzmann Machines (RBM). We find that the AC-GAN discriminator is generally competitive with other machine learning techniques. We also evaluate the utility of the GAN generative model for adversarial attacks on image-based malware detection. While AC-GAN generated images are visually impressive, we find that they are easily distinguished from real malware images using any of several learning techniques. This result indicates that our GAN generated images would be of little value in adversarial attacks.

preprint · 2022 · arXiv

Hybrid III-V/SiGe solar cells on Si substrates and porous Si substrates

A tandem GaAsP/SiGe solar cell has been developed employing group-IV reverse buffer layers grown on silicon substrates with a subsurface porous layer. Reverse buffer layers facilitate a reduction in the threading dislocation density with limited thicknesses, but ease the appearance of cracks, as observed in previous designs grown on regular Si substrates. In this new design, a porous silicon layer has been incorporated close to the substrate surface. The ductility of this layer helps repress the propagation of cracks, diminishing the problems of low shunt resistance and thus improving solar cell performance. The first results of this new architecture are presented here.

preprint · 2022 · arXiv

On Label Shift in Domain Adaptation via Wasserstein Distance

We study the label shift problem between the source and target domains in general domain adaptation (DA) settings. We consider transformations transporting the target to source domains, which enable us to align the source and target examples. Through those transformations, we define the label shift between two domains via optimal transport and develop theory to investigate the properties of DA under various DA settings (e.g., closed-set, partial-set, open-set, and universal settings). Inspired by the developed theory, we propose Label and Data Shift Reduction via Optimal Transport (LDROT), which can mitigate the data and label shifts simultaneously. Finally, we conduct comprehensive experiments to verify our theoretical findings and compare LDROT with state-of-the-art baselines.

preprint · 2022 · arXiv

On Multimarginal Partial Optimal Transport: Equivalent Forms and Computational Complexity

We study the multimarginal partial optimal transport (POT) problem between $m$ discrete (unbalanced) measures with at most $n$ supports. We first prove that we can obtain two equivalent forms of the multimarginal POT problem in terms of the multimarginal optimal transport problem via novel extensions of the cost tensor. The first equivalent form is derived under the assumption that the total masses of the measures are sufficiently close, while the second equivalent form does not require any conditions on these masses but comes at the price of a more sophisticated extended cost tensor. Our proof techniques for obtaining these equivalent forms rely on novel procedures of moving mass in graph theory to push the transportation plan into appropriate regions. Finally, based on the equivalent forms, we develop an optimization algorithm, named ApproxMPOT, that builds upon the Sinkhorn algorithm for solving the entropic regularized multimarginal optimal transport. We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of the multimarginal POT problem with a computational complexity upper bound of the order $\tilde{\mathcal{O}}(m^3(n+1)^{m}/ \varepsilon^2)$, where $\varepsilon > 0$ stands for the desired tolerance.
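
For readers unfamiliar with the Sinkhorn algorithm that the abstract builds on, here is a minimal sketch of the standard two-marginal, balanced case. It is not ApproxMPOT, it does not handle the multimarginal or partial setting, and the function name, regularization value and iteration count are assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, a, b, reg=0.5, n_iters=500):
    """Entropic-regularized optimal transport between two discrete measures.

    cost: n x m list of lists, a: source masses (length n),
    b: target masses (length m). Returns the n x m transport plan.
    Two-marginal balanced case only; a sketch, not ApproxMPOT.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.3, 0.7], [0.6, 0.4])
    for row in plan:
        print(row)
```

Each iteration rescales the kernel so that the plan's row sums match the source masses and then its column sums match the target masses; the returned plan satisfies both approximately once the scalings stabilize.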

preprint · 2020 · arXiv

Development of a Robotic System for Automated Decaking of 3D-Printed Parts

With the rapid rise of 3D-printing as a competitive mass manufacturing method, manual "decaking" - i.e. removing the residual powder that sticks to a 3D-printed part - has become a significant bottleneck. Here, we introduce, for the first time to our knowledge, a robotic system for automated decaking of 3D-printed parts. Combining Deep Learning for 3D perception, smart mechanical design, motion planning, and force control for industrial robots, we developed a system that can automatically decake parts in a fast and efficient way. Through a series of decaking experiments performed on parts printed by a Multi Jet Fusion printer, we demonstrated the feasibility of robotic decaking for 3D-printing-based mass manufacturing.

preprint · 2020 · arXiv

Differentially private $k$-means clustering via exponential mechanism and max cover

We introduce a new $(\varepsilon_p, \delta_p)$-differentially private algorithm for the $k$-means clustering problem. Given a dataset in Euclidean space, the $k$-means clustering problem requires one to find $k$ points in that space such that the sum of squares of Euclidean distances between each data point and its closest respective point among the $k$ returned is minimised. Although there exist privacy-preserving methods with good theoretical guarantees to solve this problem [Balcan et al., 2017; Kaplan and Stemmer, 2018], in practice the additive error dictates the performance of these methods. By reducing the problem to a sequence of instances of maximum coverage on a grid, we are able to derive a new method that achieves lower additive error than previous works. For input datasets with cardinality $n$ and diameter $\Delta$, our algorithm has an $O(\Delta^2 (k \log^2 n \log(1/\delta_p)/\varepsilon_p + k\sqrt{d \log(1/\delta_p)}/\varepsilon_p))$ additive error whilst maintaining constant multiplicative error. We conclude with some experiments and find an improvement over previously implemented work for this problem.
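
The $k$-means objective described above can be written down directly. The sketch below only evaluates the cost of a given set of centers; it is non-private and is not the paper's algorithm, and the function name is an assumption.

```python
def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the closest center.

    points and centers are sequences of equal-length coordinate tuples.
    """
    total = 0.0
    for p in points:
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers
        )
    return total


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    print(kmeans_cost(pts, [(0.0, 0.0), (10.0, 0.0)]))
```

Approximation guarantees of the kind quoted in the abstract bound this cost by a multiplicative factor times the optimal cost plus an additive term.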

preprint · 2020 · arXiv

Differentially Private Decomposable Submodular Maximization

We study the problem of differentially private constrained maximization of decomposable submodular functions. A submodular function is decomposable if it takes the form of a sum of submodular functions. The special case of maximizing a monotone, decomposable submodular function under cardinality constraints is known as the Combinatorial Public Projects (CPP) problem [Papadimitriou et al., 2008]. Previous work by Gupta et al. [2010] gave a differentially private algorithm for the CPP problem. We extend this work by designing differentially private algorithms for both monotone and non-monotone decomposable submodular maximization under general matroid constraints, with competitive utility guarantees. We complement our theoretical bounds with experiments demonstrating empirical performance, which improves over the differentially private algorithms for the general case of submodular maximization and is close to the performance of non-private algorithms.
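
As a non-private point of reference, the sketch below runs the classical greedy algorithm on a decomposable monotone submodular function under a cardinality constraint. The objective is an assumed toy example (each agent contributes 1 if at least one selected item is in its liked set), and the names are hypothetical. The private algorithms in the paper are not reproduced here.

```python
def decomposable_value(selected, agent_sets):
    """f(S) = sum_i f_i(S), where f_i(S) = 1 if S intersects agent i's set.

    Each f_i is monotone submodular, so their sum is decomposable
    monotone submodular. Toy objective chosen for illustration.
    """
    chosen = set(selected)
    return sum(1 for liked in agent_sets if chosen & liked)


def greedy_max(ground_set, agent_sets, k):
    """Non-private greedy: add the item with the best total value, k times."""
    selected = []
    for _ in range(k):
        candidates = [e for e in ground_set if e not in selected]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda e: decomposable_value(selected + [e], agent_sets),
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    agents = [{"a"}, {"a", "b"}, {"c"}, {"a"}]
    picked = greedy_max(["a", "b", "c"], agents, 2)
    print(picked, decomposable_value(picked, agents))
```

A differentially private variant would have to randomize the selection step, since the exact argmax depends on individual agents' data.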

preprint · 2016 · arXiv

LOH and behold: Web-scale visual search, recommendation and clustering using Locally Optimized Hashing

We propose a novel hashing-based matching scheme, called Locally Optimized Hashing (LOH), based on a state-of-the-art quantization algorithm that can be used for efficient, large-scale search, recommendation, clustering, and deduplication. We show that matching with LOH only requires set intersections and summations to compute and so is easily implemented in generic distributed computing systems. We further show application of LOH to: a) large-scale search tasks where performance is on par with other state-of-the-art hashing approaches; b) large-scale recommendation where queries consisting of thousands of images can be used to generate accurate recommendations from collections of hundreds of millions of images; and c) efficient clustering with a graph-based algorithm that can be scaled to massive collections in a distributed environment or can be used for deduplication for small collections, like search results, performing better than traditional hashing approaches while only requiring a few milliseconds to run. In this paper we experiment on datasets of up to 100 million images, but in practice our system can scale to larger collections and can be used for other types of data that have a vector representation in a Euclidean space.

preprint · 2014 · arXiv

CPMC-Lab: A Matlab Package for Constrained Path Monte Carlo Calculations

We describe CPMC-Lab, a Matlab program for the constrained-path and phaseless auxiliary-field Monte Carlo methods. These methods have allowed applications ranging from the study of strongly correlated models, such as the Hubbard model, to ab initio calculations in molecules and solids. The present package implements the full ground-state constrained-path Monte Carlo (CPMC) method in Matlab with a graphical interface, using the Hubbard model as an example. The package can perform calculations in finite supercells in any dimensions, under periodic or twist boundary conditions. Importance sampling and all other algorithmic details of a total energy calculation are included and illustrated. This open-source tool allows users to experiment with various model and run parameters and visualize the results. It provides a direct and interactive environment to learn the method and study the code with minimal overhead for setup. Furthermore, the package can be easily generalized for auxiliary-field quantum Monte Carlo (AFQMC) calculations in many other models for correlated electron systems, and can serve as a template for developing a production code for AFQMC total energy calculations in real materials. Several illustrative studies are carried out in one- and two-dimensional lattices on total energy, kinetic energy, potential energy, and charge- and spin-gaps.

preprint · 2013 · arXiv

A Data-driven Study of Influences in Twitter Communities

This paper presents a quantitative study of Twitter, one of the most popular micro-blogging services, from the perspective of user influence. We crawl several datasets from the most active communities on Twitter and obtain 20.5 million user profiles, along with 420.2 million directed relations and 105 million tweets among the users. User influence scores are obtained from influence measurement services, Klout and PeerIndex. Our analysis reveals interesting findings, including non-power-law influence distribution, strong reciprocity among users in a community, the existence of homophily and hierarchical relationships in social influences. Most importantly, we observe that whether a user retweets a message is strongly influenced by the first of his followees who posted that message. To capture such an effect, we propose the first influencer (FI) information diffusion model and show through extensive evaluation that compared to the widely adopted independent cascade model, the FI model is more stable and more accurate in predicting influence spreads in Twitter communities.
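
The independent cascade model used as the baseline above can be sketched as follows. This is the standard textbook formulation with a single uniform activation probability, which is an assumption made for brevity; it is not the FI model, whose details the abstract does not give.

```python
import random


def independent_cascade(graph, seeds, prob, rng=None):
    """Simulate one run of the independent cascade model.

    graph: dict mapping a node to the list of nodes it can influence.
    Each newly activated node gets one chance to activate each inactive
    neighbour, succeeding with probability prob (uniform, by assumption).
    Returns the set of active nodes when the cascade stops.
    """
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in graph.get(node, []):
                if nbr not in active and rng.random() < prob:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["d"], "e": ["a"]}
    print(sorted(independent_cascade(g, ["a"], 0.5)))
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the influence spread of the seed set.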

preprint · 2013 · arXiv

On Budgeted Influence Maximization in Social Networks

Given a budget and arbitrary cost for selecting each node, the budgeted influence maximization (BIM) problem concerns selecting a set of seed nodes to disseminate some information that maximizes the total number of nodes influenced (termed the influence spread) in social networks at a total cost no more than the budget. Our proposed seed selection algorithm for the BIM problem guarantees an approximation ratio of $(1 - 1/\sqrt{e})$. The seed selection algorithm needs to calculate the influence spread of candidate seed sets, which is known to be #P-hard. Identifying the linkage between the computation of marginal probabilities in Bayesian networks and the influence spread, we devise efficient heuristic algorithms for the latter problem. Experiments using both large-scale social networks and synthetically generated networks demonstrate superior performance of the proposed algorithm with moderate computation costs. Moreover, synthetic datasets allow us to vary the network parameters and gain important insights on the impact of graph structures on the performance of different algorithms.

preprint · 2013 · arXiv

On Quality of Monitoring for Multi-channel Wireless Infrastructure Networks

Passive monitoring utilizing distributed wireless sniffers is an effective technique to monitor activities in wireless infrastructure networks for fault diagnosis, resource management and critical path analysis. In this paper, we introduce a quality of monitoring (QoM) metric defined by the expected number of active users monitored, and investigate the problem of maximizing QoM by judiciously assigning sniffers to channels based on the knowledge of user activities in a multi-channel wireless network. Two types of capture models are considered. The user-centric model assumes frame-level capturing capability of sniffers such that the activities of different users can be distinguished while the sniffer-centric model only utilizes the binary channel information (active or not) at a sniffer. For the user-centric model, we show that the implied optimization problem is NP-hard, but a constant approximation ratio can be attained via polynomial complexity algorithms. For the sniffer-centric model, we devise stochastic inference schemes to transform the problem into the user-centric domain, where we are able to apply our polynomial approximation algorithms. The effectiveness of our proposed schemes and algorithms is further evaluated using both synthetic data as well as real-world traces from an operational WLAN.

preprint · 2012 · arXiv

Binary is Good: A Binary Inference Framework for Primary User Separation in Cognitive Radio Networks

Primary user (PU) separation concerns distinguishing and characterizing primary users in cognitive radio (CR) networks. We argue the need for PU separation in the context of collaborative spectrum sensing and monitor selection. In this paper, we model the observations of monitors as boolean OR mixtures of underlying binary latent sources for PUs, and devise a novel binary inference algorithm for PU separation. Simulation results show that without prior knowledge regarding PUs' activities, the algorithm achieves high inference accuracy. An interesting implication of the proposed algorithm is the ability to effectively represent n independent binary sources via (correlated) binary vectors of logarithmic length.

preprint · 2012 · arXiv

On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation

We study classic streaming and sparse recovery problems using deterministic linear sketches, including $\ell_1/\ell_1$ and $\ell_\infty/\ell_1$ sparse recovery problems (the latter also being known as $\ell_1$-heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix $A \in \mathbb{R}^{m \times n}$ and a deterministic recovery/estimation procedure which work for all possible input vectors simultaneously. Our results improve upon existing work, the following being our main contributions:

* A proof that $\ell_\infty/\ell_1$ sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Our upper bound for the number of measurements is $m = O(\varepsilon^{-2} \min\{\log n, (\log n / \log(1/\varepsilon))^2\})$. We can also obtain fast sketching and recovery algorithms by making use of the Fast Johnson-Lindenstrauss transform. Both our running times and number of measurements improve upon previous work. We can also obtain better error guarantees than previous work in terms of a smaller tail of the input vector.

* A new lower bound for the number of linear measurements required to solve $\ell_1/\ell_1$ sparse recovery. We show $\Omega(k/\varepsilon^2 + k \log(n/k)/\varepsilon)$ measurements are required to recover an $x'$ with $\|x - x'\|_1 \le (1+\varepsilon)\|x_{\mathrm{tail}(k)}\|_1$, where $x_{\mathrm{tail}(k)}$ is $x$ projected onto all but its largest $k$ coordinates in magnitude.

* A tight bound of $m = \Theta(\varepsilon^{-2} \log(\varepsilon^2 n))$ on the number of measurements required to solve deterministic norm estimation, i.e., to recover $\|x\|_2 \pm \varepsilon\|x\|_1$.

For all the problems we study, tight bounds are already known for the randomized complexity from previous work, except in the case of l1/l1 sparse recovery, where a nearly tight bound is known. Our work thus aims to study the deterministic complexities of these problems.
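
The l1/l1 recovery guarantee stated above is easy to check for a candidate output. The sketch below computes the l1 norm of the tail of x (everything except its k largest-magnitude coordinates) and tests whether a recovered vector meets the (1 + eps) bound. It does not implement any sketching matrix or recovery procedure, and the function names are assumptions.

```python
def tail_l1(x, k):
    """l1 norm of x after zeroing its k largest coordinates in magnitude."""
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(magnitudes[k:])


def meets_l1_l1_guarantee(x, x_rec, k, eps):
    """Check ||x - x_rec||_1 <= (1 + eps) * ||x_tail(k)||_1."""
    error = sum(abs(a - b) for a, b in zip(x, x_rec))
    return error <= (1.0 + eps) * tail_l1(x, k)


if __name__ == "__main__":
    x = [5.0, -3.0, 1.0, 0.5]
    # Keeping only the two largest coordinates leaves an error equal to the tail.
    print(tail_l1(x, 2), meets_l1_l1_guarantee(x, [5.0, -3.0, 0.0, 0.0], 2, 0.1))
```

The best k-sparse approximation always meets the bound, since its error is exactly the tail norm.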

preprint · 2010 · arXiv

Application of Data Mining to Network Intrusion Detection: Classifier Selection Model

As network attacks have increased in number and severity over the past few years, intrusion detection systems (IDS) are increasingly becoming a critical component for securing the network. Due to large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviors, optimizing the performance of IDS has become an important open problem that is receiving more and more attention from the research community. The open question of whether certain algorithms perform better for certain attack classes constitutes the motivation for the work reported herein. In this paper, we evaluate the performance of a comprehensive set of classifier algorithms using the KDD99 dataset. Based on the evaluation results, the best algorithms for each attack category are chosen and two classifier algorithm selection models are proposed. The simulation result comparison indicates that noticeable performance improvement and real-time intrusion detection can be achieved when we apply the proposed models to detect different kinds of network attacks.

preprint · 2010 · arXiv

Binary Independent Component Analysis with OR Mixtures

Independent component analysis (ICA) is a computational method for separating a multivariate signal into subcomponents assuming the mutual statistical independence of the non-Gaussian source signals. The classical ICA framework usually assumes linear combinations of independent sources over the field of real-valued numbers $\mathbb{R}$. In this paper, we investigate binary ICA for OR mixtures (bICA), which can find applications in many domains including medical diagnosis, multi-cluster assignment, Internet tomography and network resource management. We prove that bICA is uniquely identifiable under the disjunctive generation model, and propose a deterministic iterative algorithm to determine the distribution of the latent random variables and the mixing matrix. The inverse problem of inferring the values of latent variables is also considered, along with noisy measurements. We conduct an extensive simulation study to verify the effectiveness of the proposed algorithm and present examples of real-world applications where bICA can be applied.
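
The disjunctive (OR mixture) generation model can be made concrete with a few lines of code. In this sketch each observation is the boolean OR of the latent binary sources selected by one row of a binary mixing matrix. It shows only the forward model, not the inference algorithm from the paper, and the names are assumptions.

```python
def or_mixture(mixing, sources):
    """Forward model of a boolean OR mixture.

    mixing: list of rows of 0/1 entries (one row per observation).
    sources: list of 0/1 latent source values.
    Observation i is 1 if any source j with mixing[i][j] == 1 is active.
    """
    return [int(any(g and s for g, s in zip(row, sources))) for row in mixing]


if __name__ == "__main__":
    mixing = [
        [1, 0, 1],  # observation 0 sees sources 0 and 2
        [0, 1, 0],  # observation 1 sees source 1
        [0, 0, 0],  # observation 2 sees nothing
    ]
    print(or_mixture(mixing, [0, 0, 1]))
```

Unlike a linear mixture, two active sources seen by the same observation still produce a 1 rather than a 2, which is why classical linear ICA does not apply directly.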

preprint · 2010 · arXiv

Binary Inference for Primary User Separation in Cognitive Radio Networks

Spectrum sensing has received much attention recently in cognitive radio (CR) network research: secondary users (SUs) constantly monitor channel conditions to detect the presence of primary users (PUs). In this paper, we go beyond spectrum sensing and introduce the PU separation problem, which concerns distinguishing and characterizing PUs in the context of collaborative spectrum sensing and monitor selection. The observations of monitors are modeled as boolean OR mixtures of underlying binary sources for PUs. We first justify the use of the binary OR mixture model as opposed to the traditional linear mixture model through simulation studies. Then we devise a novel binary inference algorithm for PU separation. Not only are PU-SU relationships revealed, but PUs' transmission statistics and activities at each time slot can also be inferred. Simulation results show that without any prior knowledge regarding PUs' activities, the algorithm achieves high inference accuracy even in the presence of noisy measurements.

preprint · 2010 · arXiv

Context Awareness Framework Based on Contextual Graph

Computing is becoming increasingly mobile and pervasive. One of the important steps in pervasive computing is context-awareness. Context-aware pervasive systems rely on information about the context and user preferences to adapt their behavior. However, context-aware applications do not always behave as users desire, and unexpected actions can leave users dissatisfied. To solve these problems, context-aware systems must provide mechanisms to adapt automatically when the context changes significantly. An interesting characteristic of a context is that its behavior depends on various aspects of the surrounding contexts. This paper uses contextual graphs to address the problem of the mutual relationships among contexts. We describe the most relevant work in this area, as well as ongoing research on developing a context-aware system for ubiquitous computing based on contextual graphs. The use of contextual graphs in context-awareness is expected to make it more effective for developers to build applications that need context reasoning.

preprint · 2010 · arXiv

Context Ontology Implementation for Smart Home

Context awareness is one of the important fields in ubiquitous computing. Smart Home, a specific instance of ubiquitous computing, provides every family with opportunities to enjoy the power of hi-tech home living. Because the relationships among user, activity and context data in the home environment are semantic, we apply ontology to model these relationships and then reason over them as semantic information. In this paper, we present the realization of a smart home's context-aware system based on ontology. We discuss the current challenges in realizing the ontology context base: collecting context information from heterogeneous sources (such as devices, agents and sensors) into the ontology, ontology management, ontology querying, and the issue of environment database explosion.

preprint · 2010 · arXiv

Development of a Context Aware Virtual Smart Home Simulator

Context awareness is the most important research area in ubiquitous computing. In particular, for smart homes, context awareness attempts to bring the best services to the home's inhabitants. However, implementation in a real environment is not easy and takes a long time when building from scratch. Thus, to support implementation in a real smart home, it is necessary to demonstrate what can be done in a simulator in which context information is created by virtual sensors instead of physical sensors. In this paper, we propose ISS, an Interactive Smart home Simulator system aimed at controlling and simulating the behavior of an intelligent house. The developed system aims to provide architects and designers with a useful simulation tool for understanding the interaction between environment and people, and the impact of embedded and pervasive technology on daily life. In this research, the smart house is considered as an environment made up of independent and distributed devices interacting to support the user's goals and tasks. Therefore, by using ISS, the developer can understand the relationships among the virtual home space, the surrounding environment, the user and home appliances.

preprint · 2010 · arXiv

How to Maximize User Satisfaction Degree in Multi-service IP Networks

Bandwidth allocation is a fundamental problem in communication networks. With current networks moving towards the Future Internet model, the problem is further intensified as network traffic demand far exceeds network bandwidth capability. Maintaining a certain user satisfaction degree therefore becomes a challenging research topic. In this paper, we deal with the problem by proposing BASMIN, a novel bandwidth allocation scheme that aims to maximize network users' happiness. We also define a new metric for evaluating network user satisfaction degree: network worth. A three-step evaluation process is then conducted to compare BASMIN's efficiency with three other popular bandwidth allocation schemes. Throughout the tests, we observed BASMIN's advantages over the others; we even found that one of the most widely used bandwidth allocation schemes is, in fact, not effective at all.

preprint · 2010 · arXiv

Network Anomaly Detection: Flow-based or Packet-based Approach?

One of the most critical tasks for a network administrator is to ensure system uptime and availability. For network security, anomaly detection systems, along with firewalls and intrusion prevention systems, are must-have tools. So far in the field of network anomaly detection, researchers have pursued two different approaches. The first is flow-based and usually relies on network elements to make so-called flow information available for analysis. The second is packet-based and directly analyzes data packet information to detect anomalies. This paper describes the main differences between the two approaches through an in-depth analysis. We try to answer the question of when and why one approach is better than the other. The answer is critical for network administrators choosing how to deploy a defense system, secure the network and ensure business continuity.

preprint · 2010 · arXiv

Network Traffic Anomalies Detection and Identification with Flow Monitoring

Network management and security is currently one of the most vibrant research areas, and within it research on detecting and identifying anomalies has attracted a lot of interest. Researchers are still struggling to find an effective and lightweight method for anomaly detection. In this paper, we propose a simple, robust method that detects anomalous network traffic based on flow monitoring. Our method works by monitoring four predefined metrics that capture the flow statistics of the network. To demonstrate the power of the new method, we built an application that detects network anomalies using it. The experimental results show that by using the four simple metrics from the flow data, we can not only effectively detect but also identify network traffic anomalies.
