Source author record

Anh Nguyen

Anh Nguyen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Neural and Evolutionary Computing Robotics Artificial Intelligence Human-Computer Interaction Social and Information Networks eess.AS eess.IV Information Retrieval Multimedia Sound Cryptography and Security cs.CY Distributed, Parallel, and Cluster Computing eess.SP Networking and Internet Architecture Quantitative Methods

Catalog footprint

What is connected

27works

18topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Multi-scale Coarse-to-fine Modeling for Test-time Human Motion Control

We present MSCoT, a multi-scale, coarse-to-fine model for test-time human motion synthesis and control. Unlike recent approaches that rely on multiple iterative denoising/token-prediction steps, or modules tailored for specific control signals, MSCoT discretizes motion into a multi-scale hierarchical representation and predicts the entire token sequence at each temporal scale in a coarse-to-fine fashion. Building on this coarse-to-fine paradigm, we propose an efficient multi-scale token guidance strategy that overcomes the challenge of discrete sampling and steers the token distribution towards the control goals, allowing for fast and flexible control. To address the limitations of a discrete codebook, a lightweight token refiner further adds continuous residuals to the discrete token embeddings and allows differentiable test-time refinement optimization to ensure precise alignment with the control objectives. MSCoT is able to produce quality motions, consistent with the control constraints, while offering substantially faster sampling than diffusion-based approaches. Experiments on popular benchmarks demonstrate state-of-the-art controllable text-to-motion generation performance of MSCoT over existing baselines, with better motion quality (48% FID improvement), higher control accuracy (-61% avg error), and $10 \times$ faster inference speed on HumanML3D.

preprint2022arXiv

Coarse-to-Fine Reasoning for Visual Question Answering

Bridging the semantic gap between image and question is an important step to improve the accuracy of the Visual Question Answering (VQA) task. However, most of the existing VQA methods focus on attention mechanisms or visual relations for reasoning the answer, while the features at different semantic levels are not fully utilized. In this paper, we present a new reasoning framework to fill the gap between visual features and semantic clues in the VQA task. Our method first extracts the features and predicates from the image and question. We then propose a new reasoning framework to effectively jointly learn these features and predicates in a coarse-to-fine manner. The intensively experimental results on three large-scale VQA datasets show that our proposed approach achieves superior accuracy comparing with other state-of-the-art methods. Furthermore, our reasoning framework also provides an explainable way to understand the decision of the deep neural network when predicting the answer.

preprint2022arXiv

Deep Federated Learning for Autonomous Driving

Autonomous driving is an active research topic in both academia and industry. However, most of the existing solutions focus on improving the accuracy by training learnable models with centralized large-scale data. Therefore, these methods do not take into account the user's privacy. In this paper, we present a new approach to learn autonomous driving policy while respecting privacy concerns. We propose a peer-to-peer Deep Federated Learning (DFL) approach to train deep architectures in a fully decentralized manner and remove the need for central orchestration. We design a new Federated Autonomous Driving network (FADNet) that can improve the model stability, ensure convergence, and handle imbalanced data distribution problems while is being trained with federated learning methods. Intensively experimental results on three datasets show that our approach with FADNet and DFL achieves superior accuracy compared with other recent methods. Furthermore, our approach can maintain privacy by not collecting user data to a central server.

preprint2022arXiv

DeepECMP: Predicting Extracellular Matrix Proteins using Deep Learning

Introduction: The extracellular matrix (ECM) is a networkof proteins and carbohydrates that has a structural and bio-chemical function. The ECM plays an important role in dif-ferentiation, migration and signaling. Several studies havepredicted ECM proteins using machine learning algorithmssuch as Random Forests, K-nearest neighbours and supportvector machines but is yet to be explored using deep learn-ing. Method: DeepECMP was developed using several previ-ously used ECM datasets, asymmetric undersampling andan ensemble of 11 feed-forward neural networks. Results: The performance of DeepECMP was 83.6% bal-anced accuracy which outperformed several algorithms. Inaddition, the pipeline of DeepECMP has been shown to behighly efficient. Conclusion: This paper is the first to focus on utilizingdeep learning for ECM prediction. Several limitations areovercome by DeepECMP such as computational expense,availability to the public and usability outside of the humanspecies

preprint2022arXiv

DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification

Face identification (FI) is ubiquitous and drives many high-stake decisions made by law enforcement. State-of-the-art FI approaches compare two images by taking the cosine similarity between their image embeddings. Yet, such an approach suffers from poor out-of-distribution (OOD) generalization to new types of images (e.g., when a query face is masked, cropped, or rotated) not included in the training set or the gallery. Here, we propose a re-ranking approach that compares two faces using the Earth Mover's Distance on the deep, spatial features of image patches. Our extra comparison stage explicitly examines image similarity at a fine-grained level (e.g., eyes to eyes) and is more robust to OOD perturbations and occlusions than traditional FI. Interestingly, without finetuning feature extractors, our method consistently improves the accuracy on all tested OOD queries: masked, cropped, rotated, and adversarial while obtaining similar results on in-distribution images.

preprint2022arXiv

Fine-Grained Visual Classification using Self Assessment Classifier

Extracting discriminative features plays a crucial role in the fine-grained visual classification task. Most of the existing methods focus on developing attention or augmentation mechanisms to achieve this goal. However, addressing the ambiguity in the top-k prediction classes is not fully investigated. In this paper, we introduce a Self Assessment Classifier, which simultaneously leverages the representation of the image and top-k prediction classes to reassess the classification results. Our method is inspired by continual learning with coarse-grained and fine-grained classifiers to increase the discrimination of features in the backbone and produce attention maps of informative areas on the image. In practice, our method works as an auxiliary branch and can be easily integrated into different architectures. We show that by effectively addressing the ambiguity in the top-k prediction classes, our method achieves new state-of-the-art results on CUB200-2011, Stanford Dog, and FGVC Aircraft datasets. Furthermore, our method also consistently improves the accuracy of different existing fine-grained classifiers with a unified setup.

preprint2022arXiv

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stake applications. Hundreds of papers have either proposed new feature attribution methods, discussed or harnessed these tools in their work. However, despite humans being the target end-users, most attribution methods were only evaluated on proxy automatic-evaluation metrics (Zhang et al. 2018; Zhou et al. 2016; Petsiuk et al. 2018). In this paper, we conduct the first user study to measure attribution map effectiveness in assisting humans in ImageNet classification and Stanford Dogs fine-grained classification, and when an image is natural or adversarial (i.e., contains adversarial perturbations). Overall, feature attribution is surprisingly not more effective than showing humans nearest training-set examples. On a harder task of fine-grained dog categorization, presenting attribution maps to humans does not help, but instead hurts the performance of human-AI teams compared to AI alone. Importantly, we found automatic attribution-map evaluation measures to correlate poorly with the actual human-AI team performance. Our findings encourage the community to rigorously test their methods on the downstream human-in-the-loop applications and to rethink the existing evaluation metrics.

preprint2022arXiv

The shape and simplicity biases of adversarially robust ImageNet-trained CNNs

Increasingly more similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years. Yet, vanilla CNNs often fall short in generalizing to adversarial or out-of-distribution (OOD) examples which humans demonstrate superior performance. Adversarial training is a leading learning algorithm for improving the robustness of CNNs on adversarial and OOD data; however, little is known about the properties, specifically the shape bias and internal features learned inside adversarially-robust CNNs. In this paper, we perform a thorough, systematic study to understand the shape bias and some internal mechanisms that enable the generalizability of AlexNet, GoogLeNet, and ResNet-50 models trained via adversarial training. We find that while standard ImageNet classifiers have a strong texture bias, their R counterparts rely heavily on shapes. Remarkably, adversarial training induces three simplicity biases into hidden neurons in the process of "robustifying" CNNs. That is, each convolutional neuron in R networks often changes to detecting (1) pixel-wise smoother patterns, i.e., a mechanism that blocks high-frequency noise from passing through the network; (2) more lower-level features i.e. textures and colors (instead of objects);and (3) fewer types of inputs. Our findings reveal the interesting mechanisms that made networks more adversarially robust and also explain some recent findings e.g., why R networks benefit from a much larger capacity (Xie et al. 2020) and can act as a strong image prior in image synthesis (Santurkar et al. 2019).

preprint2022arXiv

Understanding Public Opinion on Using Hydroxychloroquine for COVID-19 Treatment via Social Media

Hydroxychloroquine (HCQ) is used to prevent or treat malaria caused by mosquito bites. Recently, the drug has been suggested to treat COVID-19, but that has not been supported by scientific evidence. The information regarding the drug efficacy has flooded social networks, posting potential threats to the community by perverting their perceptions of the drug efficacy. This paper studies the reactions of social network users on the recommendation of using HCQ for COVID-19 treatment by analyzing the reaction patterns and sentiment of the tweets. We collected 164,016 tweets from February to December 2020 and used a text mining approach to identify social reaction patterns and opinion change over time. Our descriptive analysis identified an irregularity of the users' reaction patterns associated tightly with the social and news feeds on the development of HCQ and COVID-19 treatment. The study linked the tweets and Google search frequencies to reveal the viewpoints of local communities on the use of HCQ for COVID-19 treatment across different states. Further, our tweet sentiment analysis reveals that public opinion changed significantly over time regarding the recommendation of using HCQ for COVID-19 treatment. The data showed that high support in the early dates but it significantly declined in October. Finally, using the manual classification of 4,850 tweets by humans as our benchmark, our sentiment analysis showed that the Google Cloud Natural Language algorithm outperformed the Valence Aware Dictionary and sEntiment Reasoner in classifying tweets, especially in the sarcastic tweet group.

preprint2021arXiv

Speech Emotion Recognition using Semantic Information

Speech emotion recognition is a crucial problem manifesting in a multitude of applications such as human computer interaction and education. Although several advancements have been made in the recent years, especially with the advent of Deep Neural Networks (DNN), most of the studies in the literature fail to consider the semantic information in the speech signal. In this paper, we propose a novel framework that can capture both the semantic and the paralinguistic information in the signal. In particular, our framework is comprised of a semantic feature extractor, that captures the semantic information, and a paralinguistic feature extractor, that captures the paralinguistic information. Both semantic and paraliguistic features are then combined to a unified representation using a novel attention mechanism. The unified feature vector is passed through a LSTM to capture the temporal dynamics in the signal, before the final prediction. To validate the effectiveness of our framework, we use the popular SEWA dataset of the AVEC challenge series and compare with the three winning papers. Our model provides state-of-the-art results in the valence and liking dimensions.

preprint2021arXiv

WaNet -- Imperceptible Warping-based Backdoor Attack

With the thriving of deep learning and the widespread practice of using pre-trained networks, backdoor attacks have become an increasing security threat drawing many research interests in recent years. A third-party model can be poisoned in training to work well in normal conditions but behave maliciously when a trigger pattern appears. However, the existing backdoor attacks are all built on noise perturbation triggers, making them noticeable to humans. In this paper, we instead propose using warping-based triggers. The proposed backdoor outperforms the previous methods in a human inspection test by a wide margin, proving its stealthiness. To make such models undetectable by machine defenders, we propose a novel training mode, called the ``noise mode. The trained networks successfully attack and bypass the state-of-the-art defense methods on standard classification datasets, including MNIST, CIFAR-10, GTSRB, and CelebA. Behavior analyses show that our backdoors are transparent to network inspection, further proving this novel attack mechanism's efficiency.

preprint2020arXiv

Are there any 'object detectors' in the hidden layers of CNNs trained to identify objects or scenes?

Various methods of measuring unit selectivity have been developed with the aim of better understanding how neural networks work. But the different measures provide divergent estimates of selectivity, and this has led to different conclusions regarding the conditions in which selective object representations are learned and the functional relevance of these representations. In an attempt to better characterize object selectivity, we undertake a comparison of various selectivity measures on a large set of units in AlexNet, including localist selectivity, precision, class-conditional mean activity selectivity (CCMAS), network dissection,the human interpretation of activation maximization (AM) images, and standard signal-detection measures. We find that the different measures provide different estimates of object selectivity, with precision and CCMAS measures providing misleadingly high estimates. Indeed, the most selective units had a poor hit-rate or a high false-alarm rate (or both) in object classification, making them poor object detectors. We fail to find any units that are even remotely as selective as the 'grandmother cell' units reported in recurrent neural networks. In order to generalize these results, we compared selectivity measures on units in VGG-16 and GoogLeNet trained on the ImageNet or Places-365 datasets that have been described as 'object detectors'. Again, we find poor hit-rates and high false-alarm rates for object classification. We conclude that signal-detection measures provide a better assessment of single-unit selectivity compared to common alternative approaches, and that deep convolutional networks of image classification do not learn object detectors in their hidden layers.

preprint2020arXiv

Autonomous Navigation in Complex Environments with Deep Multimodal Fusion Network

Autonomous navigation in complex environments is a crucial task in time-sensitive scenarios such as disaster response or search and rescue. However, complex environments pose significant challenges for autonomous platforms to navigate due to their challenging properties: constrained narrow passages, unstable pathway with debris and obstacles, or irregular geological structures and poor lighting conditions. In this work, we propose a multimodal fusion approach to address the problem of autonomous navigation in complex environments such as collapsed cites, or natural caves. We first simulate the complex environments in a physics-based simulation engine and collect a large-scale dataset for training. We then propose a Navigation Multimodal Fusion Network (NMFNet) which has three branches to effectively handle three visual modalities: laser, RGB images, and point cloud data. The extensively experimental results show that our NMFNet outperforms recent state of the art by a fair margin while achieving real-time performance. We further show that the use of multiple modalities is essential for autonomous navigation in complex environments. Finally, we successfully deploy our network to both simulated and real mobile robots.

preprint2020arXiv

End-to-End Real-time Catheter Segmentation with Optical Flow-Guided Warping during Endovascular Intervention

Accurate real-time catheter segmentation is an important pre-requisite for robot-assisted endovascular intervention. Most of the existing learning-based methods for catheter segmentation and tracking are only trained on small-scale datasets or synthetic data due to the difficulties of ground-truth annotation. Furthermore, the temporal continuity in intraoperative imaging sequences is not fully utilised. In this paper, we present FW-Net, an end-to-end and real-time deep learning framework for endovascular intervention. The proposed FW-Net has three modules: a segmentation network with encoder-decoder architecture, a flow network to extract optical flow information, and a novel flow-guided warping function to learn the frame-to-frame temporal continuity. We show that by effectively learning temporal continuity, the network can successfully segment and track the catheters in real-time sequences using only raw ground-truth for training. Detailed validation results confirm that our FW-Net outperforms state-of-the-art techniques while achieving real-time performance.

preprint2020arXiv

Hybrid Data-Driven and Analytical Model for Kinematic Control of a Surgical Robotic Tool

Accurate kinematic models are essential for effective control of surgical robots. For tendon driven robots, which is common for minimally invasive surgery, intrinsic nonlinearities are important to consider. Traditional analytical methods allow to build the kinematic model of the system by making certain assumptions and simplifications on the nonlinearities. Machine learning techniques, instead, allow to recover a more complex model based on the acquired data. However, analytical models are more generalisable, but can be over-simplified; data-driven models, on the other hand, can cater for more complex models, but are less generalisable and the result is highly affected by the training dataset. In this paper, we present a novel approach to combining analytical and data-driven approaches to model the kinematics of nonlinear tendon-driven surgical robots. Gaussian Process Regression (GPR) is used for learning the data-driven model and the proposed method is tested on both simulated data and real experimental data.

preprint2020arXiv

Nonlinearity Compensation in a Multi-DoF Shoulder Sensing Exosuit for Real-Time Teleoperation

The compliant nature of soft wearable robots makes them ideal for complex multiple degrees of freedom (DoF) joints, but also introduce additional structural nonlinearities. Intuitive control of these wearable robots requires robust sensing to overcome the inherent nonlinearities. This paper presents a joint kinematics estimator for a bio-inspired multi-DoF shoulder exosuit capable of compensating the encountered nonlinearities. To overcome the nonlinearities and hysteresis inherent to the soft and compliant nature of the suit, we developed a deep learning-based method to map the sensor data to the joint space. The experimental results show that the new learning-based framework outperforms recent state-of-the-art methods by a large margin while achieving 12ms inference time using only a GPU-based edge-computing device. The effectiveness of our combined exosuit and learning framework is demonstrated through real-time teleoperation with a simulated NAO humanoid robot.

preprint2020arXiv

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Attribution methods can provide powerful insights into the reasons for a classifier's decision. We argue that a key desideratum of an explanation method is its robustness to input hyperparameters which are often randomly set or empirically tuned. High sensitivity to arbitrary hyperparameter choices does not only impede reproducibility but also questions the correctness of an explanation and impairs the trust of end-users. In this paper, we provide a thorough empirical study on the sensitivity of existing attribution methods. We found an alarming trend that many methods are highly sensitive to changes in their common hyperparameters e.g. even changing a random seed can yield a different explanation! Interestingly, such sensitivity is not reflected in the average explanation accuracy scores over the dataset as commonly reported in the literature. In addition, explanations generated for robust classifiers (i.e. which are trained to be invariant to pixel-wise perturbations) are surprisingly more robust than those generated for regular classifiers.

preprint2020arXiv

Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features

In this paper, we present a deep learning framework applied for Acoustic Scene Classification (ASC), the task of classifying scene contexts from environmental input sounds. An ASC system generally comprises of two main steps, referred to as front-end feature extraction and back-end classification. In the first step, an extractor is used to extract low-level features from raw audio signals. Next, the discriminative features extracted are fed into and classified by a classifier, reporting accuracy results. Aim to develop a robust framework applied for ASC, we address exited issues of both the front-end and back-end components in an ASC system, thus present three main contributions: Firstly, we carry out a comprehensive analysis of spectrogram representation extracted from sound scene input, thus propose the best multi-spectrogram combinations. In terms of back-end classification, we propose a novel join learning architecture using parallel convolutional recurrent networks, which is effective to learn spatial features and temporal sequences of spectrogram input. Finally, good experimental results obtained over benchmark datasets of IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 1, 2017 Task 1, 2018 Task 1A & 1B, LITIS Rouen prove our proposed framework general and robust for ASC task.

preprint2016arXiv

"420 Friendly": Revealing Marijuana Use via Craigslist Rental Ads

Recent studies have shown that information mined from Craigslist can be used for informing public health policy or monitoring risk behavior. This paper presents a text-mining method for conducting public health surveillance of marijuana use concerns in the U.S. using online classified ads in Craigslist. We collected more than 200 thousands of rental ads in the housing categories in Craigslist and devised text-mining methods for efficiently and accurately extract rental ads associated with concerns about the uses of marijuana in different states across the U.S. We linked the extracted ads to their geographic locations and computed summary statistics of the ads having marijuana use concerns. Our data is then compared with the State Marijuana Laws Map published by the U.S. government and marijuana related keywords search in Google to verify our collected data with respect to the demographics of marijuana use concerns. Our data not only indicates strong correlations between Craigslist ads, Google search and the State Marijuana Laws Map in states where marijuana uses are legal, but also reveals some hidden world of marijuana use concerns in other states where marijuana use is illegal. Our approach can be utilized as a marijuana surveillance tool for policy makers to develop public health policy and regulations.

preprint2016arXiv

Evaluating Marijuana-Related Tweets On Twitter

This paper studies marijuana-related tweets in social network Twitter. We collected more than 300,000 marijuana related tweets during November 2016 in our study. Our text-mining based algorithms and data analysis unveil some interesting patterns including: (i) users' attitudes (e.g., positive or negative) can be characterized by the existence of outer links in a tweet; (ii) 67% users use their mobile phones to post their messages while many users publish their messages using third-party automatic posting services; and (3) the number of tweets during weekends is much higher than during weekdays. Our data also showed the impact of the political events such as the U.S. presidential election or state marijuana legalization votes on the marijuana-related tweeting frequencies.

preprint2016arXiv

Joint Network Coding and Machine Learning for Error-prone Wireless Broadcast

Reliable broadcasting data to multiple receivers over lossy wireless channels is challenging due to the heterogeneity of the wireless link conditions. Automatic Repeat-reQuest (ARQ) based retransmission schemes are bandwidth inefficient due to data duplication at receivers. Network coding (NC) has been shown to be a promising technique for improving network bandwidth efficiency by combining multiple lost data packets for retransmission. However, it is challenging to accurately determine which lost packets should be combined together due to disrupted feedback channels. This paper proposes an adaptive data encoding scheme at the transmitter by joining network coding and machine learning (NCML) for retransmission of lost packets. Our proposed NCML extracts the important features from historical feedback signals received by the transmitter to train a classifier. The constructed classifier is then used to predict states of transmitted data packets at different receivers based on their corrupted feedback signals for effective data mixing. We have conducted extensive simulations to collaborate the efficiency of our proposed approach. The simulation results show that our machine learning algorithm can be trained efficiently and accurately. The simulation results show that on average the proposed NCML can correctly classify 90% of the states of transmitted data packets at different receivers. It achieves significant bandwidth gain compared with the ARQ and NC based schemes in different transmission terrains, power levels, and the distances between the transmitter and receivers.

preprint2016arXiv

Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks

We can better understand deep neural networks by identifying which features each of their neurons have learned to detect. To do so, researchers have created Deep Visualization techniques including activation maximization, which synthetically generates inputs (e.g. images) that maximally activate each neuron. A limitation of current techniques is that they assume each neuron detects only one type of feature, but we know that neurons can be multifaceted, in that they fire in response to many different types of features: for example, a grocery store class neuron must activate either for rows of produce or for a storefront. Previous activation maximization techniques constructed images without regard for the multiple different facets of a neuron, creating inappropriate mixes of colors, parts of objects, scales, orientations, etc. Here, we introduce an algorithm that explicitly uncovers the multiple facets of each neuron by producing a synthetic visualization of each of the types of images that activate a neuron. We also introduce regularization methods that produce state-of-the-art results in terms of the interpretability of images obtained by activation maximization. By separately synthesizing each type of image a neuron fires in response to, the visualizations have more appropriate colors and coherent global structure. Multifaceted feature visualization thus provides a clearer and more comprehensive description of the role of each neuron.

preprint2016arXiv

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

Deep neural networks (DNNs) have demonstrated state-of-the-art results on many pattern recognition tasks, especially vision classification problems. Understanding the inner workings of such computational brains is both fascinating basic science that is interesting in its own right - similar to why we study the human brain - and will enable researchers to further improve DNNs. One path to understanding how a neural network functions internally is to study what each of its neurons has learned to detect. One such method is called activation maximization (AM), which synthesizes an input (e.g. an image) that highly activates a neuron. Here we dramatically improve the qualitative state of the art of activation maximization by harnessing a powerful, learned prior: a deep generator network (DGN). The algorithm (1) generates qualitatively state-of-the-art synthetic images that look almost real, (2) reveals the features learned by each neuron in an interpretable way, (3) generalizes well to new datasets and somewhat well to different network architectures without requiring the prior to be relearned, and (4) can be considered as a high-quality generative method (in this case, by generating novel, creative, interesting, recognizable images).

preprint2015arXiv

3DTouch: A wearable 3D input device with an optical sensor and a 9-DOF inertial measurement unit

We present 3DTouch, a novel 3D wearable input device worn on the fingertip for 3D manipulation tasks. 3DTouch is designed to fill the missing gap of a 3D input device that is self-contained, mobile, and universally working across various 3D platforms. This paper presents a low-cost solution to designing and implementing such a device. Our approach relies on relative positioning technique using an optical laser sensor and a 9-DOF inertial measurement unit. 3DTouch is self-contained, and designed to universally work on various 3D platforms. The device employs touch input for the benefits of passive haptic feedback, and movement stability. On the other hand, with touch interaction, 3DTouch is conceptually less fatiguing to use over many hours than 3D spatial input devices. We propose a set of 3D interaction techniques including selection, translation, and rotation using 3DTouch. An evaluation also demonstrates the device's tracking accuracy of 1.10 mm and 2.33 degrees for subtle touch interaction in 3D space. Modular solutions like 3DTouch opens up a whole new design space for interaction techniques to further develop on.

preprint2015arXiv

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects, which we call "fooling images" (more generally, fooling examples). Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.

preprint2015arXiv

Understanding Neural Networks Through Deep Visualization

Recent years have produced great advances in training large, deep neural networks (DNNs), including notable successes in training convolutional neural networks (convnets) to recognize natural images. However, our understanding of how these models work, especially what computations they perform at intermediate layers, has lagged behind. Progress in the field will be further accelerated by the development of better tools for visualizing and interpreting neural nets. We introduce two such tools here. The first is a tool that visualizes the activations produced on each layer of a trained convnet as it processes an image or video (e.g. a live webcam stream). We have found that looking at live activations that change in response to user input helps build valuable intuitions about how convnets work. The second tool enables visualizing features at each layer of a DNN via regularized optimization in image space. Because previous versions of this idea produced less recognizable images, here we introduce several new regularization methods that combine to produce qualitatively clearer, more interpretable visualizations. Both tools are open source and work on a pre-trained convnet with minimal setup.

preprint2014arXiv

Low-cost Augmented Reality prototype for controlling network devices

With the evolution of mobile devices, and smart-phones in particular, comes the ability to create new experiences that enhance the way we see, interact, and manipulate objects, within the world that surrounds us. It is now possible to blend data from our senses and our devices in numerous ways that simply were not possible before using Augmented Reality technology. In a near future, when all of the office devices as well as your personal electronic gadgets are on a common wireless network, operating them using a universal remote controller would be possible. This paper presents an off-the-shelf, low-cost prototype that leverages the Augmented Reality technology to deliver a novel and interactive way of operating office network devices around using a mobile device. We believe this type of system may provide benefits to controlling multiple integrated devices and visualizing interconnectivity or utilizing visual elements to pass information from one device to another, or may be especially beneficial to control devices when interacting with them physically may be difficult or pose danger or harm.

Anh Nguyen

What is connected

Connect this record

See the researcher in context

Building this map preview

27 published item(s)

Multi-scale Coarse-to-fine Modeling for Test-time Human Motion Control

Coarse-to-Fine Reasoning for Visual Question Answering

Deep Federated Learning for Autonomous Driving

DeepECMP: Predicting Extracellular Matrix Proteins using Deep Learning

DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification

Fine-Grained Visual Classification using Self Assessment Classifier

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

The shape and simplicity biases of adversarially robust ImageNet-trained CNNs

Understanding Public Opinion on Using Hydroxychloroquine for COVID-19 Treatment via Social Media

Speech Emotion Recognition using Semantic Information

WaNet -- Imperceptible Warping-based Backdoor Attack

Are there any 'object detectors' in the hidden layers of CNNs trained to identify objects or scenes?

Autonomous Navigation in Complex Environments with Deep Multimodal Fusion Network

End-to-End Real-time Catheter Segmentation with Optical Flow-Guided Warping during Endovascular Intervention

Hybrid Data-Driven and Analytical Model for Kinematic Control of a Surgical Robotic Tool

Nonlinearity Compensation in a Multi-DoF Shoulder Sensing Exosuit for Real-Time Teleoperation

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features

"420 Friendly": Revealing Marijuana Use via Craigslist Rental Ads

Evaluating Marijuana-Related Tweets On Twitter

Joint Network Coding and Machine Learning for Error-prone Wireless Broadcast

Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

3DTouch: A wearable 3D input device with an optical sensor and a 9-DOF inertial measurement unit

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Understanding Neural Networks Through Deep Visualization

Low-cost Augmented Reality prototype for controlling network devices