Source author record

Rick Siow Mong Goh

Rick Siow Mong Goh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Neural and Evolutionary Computing Artificial Intelligence Distributed, Parallel, and Cluster Computing eess.IV Hardware Architecture Computation and Language Cryptography and Security Emerging Technologies Sound

Catalog footprint

What is connected

13works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images

Focusing on the complicated pathological features, such as blurred boundaries, severe scale differences between symptoms, background noise interference, etc., in the task of retinal edema lesions joint segmentation from OCT images and enabling the segmentation results more reliable. In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network, which can provide accurate segmentation results with reliability assessment. Specifically, aiming at improving the model's ability to learn the complex pathological features of retinal edema lesions in OCT images, we develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module of our newly designed. Meanwhile, to make the segmentation results more reliable, a novel uncertainty segmentation head based on the subjective logical evidential theory is introduced to generate the final segmentation results with a corresponding overall uncertainty evaluation score map. We conduct comprehensive experiments on the public database of AI-Challenge 2018 for retinal edema lesions segmentation, and the results show that our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches. The code will be released on: https://github.com/LooKing9218/ReliableRESeg.

preprint2022arXiv

A Resource-efficient Spiking Neural Network Accelerator Supporting Emerging Neural Encoding

Spiking neural networks (SNNs) recently gained momentum due to their low-power multiplication-free computing and the closer resemblance of biological processes in the nervous system of humans. However, SNNs require very long spike trains (up to 1000) to reach an accuracy similar to their artificial neural network (ANN) counterparts for large models, which offsets efficiency and inhibits its application to low-power systems for real-world use cases. To alleviate this problem, emerging neural encoding schemes are proposed to shorten the spike train while maintaining the high accuracy. However, current accelerators for SNN cannot well support the emerging encoding schemes. In this work, we present a novel hardware architecture that can efficiently support SNN with emerging neural encoding. Our implementation features energy and area efficient processing units with increased parallelism and reduced memory accesses. We verified the accelerator on FPGA and achieve 25% and 90% improvement over previous work in power consumption and latency, respectively. At the same time, high area efficiency allows us to scale for large neural network models. To the best of our knowledge, this is the first work to deploy the large neural network model VGG on physical FPGA-based neuromorphic hardware.

preprint2022arXiv

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization. Unfortunately, SAM s computational cost is roughly double that of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM s efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies-StochasticWeight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights in each iteration; in the latter, the SAM loss is optimized using only a judiciously selected subset of data that is sensitive to the sharpness. We provide theoretical explanations as to why these strategies perform well. We also show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers, while test accuracies are preserved or even improved.

preprint2021arXiv

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

Compiler frameworks are crucial for the widespread use of FPGA-based deep learning accelerators. They allow researchers and developers, who are not familiar with hardware engineering, to harness the performance attained by domain-specific logic. There exists a variety of frameworks for conventional artificial neural networks. However, not much research effort has been put into the creation of frameworks optimized for spiking neural networks (SNNs). This new generation of neural networks becomes increasingly interesting for the deployment of AI on edge devices, which have tight power and resource constraints. Our end-to-end framework E3NE automates the generation of efficient SNN inference logic for FPGAs. Based on a PyTorch model and user parameters, it applies various optimizations and assesses trade-offs inherent to spike-based accelerators. Multiple levels of parallelism and the use of an emerging neural encoding scheme result in an efficiency superior to previous SNN hardware implementations. For a similar model, E3NE uses less than 50% of hardware resources and 20% less power, while reducing the latency by an order of magnitude. Furthermore, scalability and generality allowed the deployment of the large-scale SNN models AlexNet and VGG.

preprint2021arXiv

Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

Natural Language Video Localization (NLVL) aims to locate a target moment from an untrimmed video that semantically corresponds to a text query. Existing approaches mainly solve the NLVL problem from the perspective of computer vision by formulating it as ranking, anchor, or regression tasks. These methods suffer from large performance degradation when localizing on long videos. In this work, we address the NLVL from a new perspective, i.e., span-based question answering (QA), by treating the input video as a text passage. We propose a video span localizing network (VSLNet), on top of the standard span-based QA framework (named VSLBase), to address NLVL. VSLNet tackles the differences between NLVL and span-based QA through a simple yet effective query-guided highlighting (QGH) strategy. QGH guides VSLNet to search for the matching video span within a highlighted region. To address the performance degradation on long videos, we further extend VSLNet to VSLNet-L by applying a multi-scale split-and-concatenation strategy. VSLNet-L first splits the untrimmed video into short clip segments; then, it predicts which clip segment contains the target moment and suppresses the importance of other segments. Finally, the clip segments are concatenated, with different confidences, to locate the target moment accurately. Extensive experiments on three benchmark datasets show that the proposed VSLNet and VSLNet-L outperform the state-of-the-art methods; VSLNet-L addresses the issue of performance degradation on long videos. Our study suggests that the span-based QA framework is an effective strategy to solve the NLVL problem.

preprint2020arXiv

EDCompress: Energy-Aware Model Compression for Dataflows

Edge devices demand low energy consumption, cost and small form factor. To efficiently deploy convolutional neural network (CNN) models on edge device, energy-aware model compression becomes extremely important. However, existing work did not study this problem well because the lack of considering the diversity of dataflow types in hardware architectures. In this paper, we propose EDCompress, an Energy-aware model compression method for various Dataflows. It can effectively reduce the energy consumption of various edge devices, with different dataflow types. Considering the very nature of model compression procedures, we recast the optimization process to a multi-step problem, and solve it by reinforcement learning algorithms. Experiments show that EDCompress could improve 20X, 17X, 37X energy efficiency in VGG-16, MobileNet, LeNet-5 networks, respectively, with negligible loss of accuracy. EDCompress could also find the optimal dataflow type for specific neural networks in terms of energy consumption, which can guide the deployment of CNN models on hardware systems.

preprint2020arXiv

Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations

Convolutional Neural Networks (CNNs) are known to be brittle under various image transformations, including rotations, scalings, and changes of lighting conditions. We observe that the features of a transformed image are drastically different from the ones of the original image. To make CNNs more invariant to transformations, we propose "Feature Lenses", a set of ad-hoc modules that can be easily plugged into a trained model (referred to as the "host model"). Each individual lens reconstructs the original features given the features of a transformed image under a particular transformation. These lenses jointly counteract feature distortions caused by various transformations, thus making the host model more robust without retraining. By only updating lenses, the host model is freed from iterative updating when facing new transformations absent in the training data; as feature semantics are preserved, downstream applications, such as classifiers and detectors, automatically gain robustness without retraining. Lenses are trained in a self-supervised fashion with no annotations, by minimizing a novel "Top-K Activation Contrast Loss" between lens-transformed features and original features. Evaluated on ImageNet, MNIST-rot, and CIFAR-10, Feature Lenses show clear advantages over baseline methods.

preprint2020arXiv

Privacy-preserving Weighted Federated Learning within Oracle-Aided MPC Framework

This paper studies privacy-preserving weighted federated learning within the oracle-aided multi-party computation (MPC) framework. The contribution of this paper mainly comprises the following three-fold: In the first fold, a new notion which we call weighted federated learning (wFL) is introduced and formalized inspired by McMahan et al.'s seminal paper. The weighted federated learning concept formalized in this paper differs from that presented in McMahan et al.'s paper since both addition and multiplication operations are executed over ciphers in our model while these operations are executed over plaintexts in McMahan et al.'s model. In the second fold, an oracle-aided MPC solution for computing weighted federated learning is formalized by decoupling the security of federated learning systems from that of underlying multi-party computations. Our decoupling formulation may benefit machine learning developers to select their best security practices from the state-of-the-art security tool sets; In the third fold, a concrete solution to the weighted federated learning problem is presented and analysed. The security of our implementation is guaranteed by the security composition theorem assuming that the underlying multiplication algorithm is secure against honest-but-curious adversaries.

preprint2020arXiv

Two-Phase Multi-Party Computation Enabled Privacy-Preserving Federated Learning

Countries across the globe have been pushing strict regulations on the protection of personal or private data collected. The traditional centralized machine learning method, where data is collected from end-users or IoT devices, so that it can discover insights behind real-world data, may not be feasible for many data-driven industry applications in light of such regulations. A new machine learning method, coined by Google as Federated Learning (FL) enables multiple participants to train a machine learning model collectively without directly exchanging data. However, recent studies have shown that there is still a possibility to exploit the shared models to extract personal or confidential data. In this paper, we propose to adopt Multi Party Computation (MPC) to achieve privacy-preserving model aggregation for FL. The MPC-enabled model aggregation in a peer-to-peer manner incurs high communication overhead with low scalability. To address this problem, the authors proposed to develop a two-phase mechanism by 1) electing a small committee and 2) providing MPC-enabled model aggregation service to a larger number of participants through the committee. The MPC enabled FL framework has been integrated in an IoT platform for smart manufacturing. It enables a set of companies to train high quality models collectively by leveraging their complementary data-sets on their own premises, without compromising privacy, model accuracy vis-a-vis traditional machine learning methods and execution efficiency in terms of communication cost and execution time.

preprint2016arXiv

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

Deep Neural Networks (DNN) have been successful in en- hancing noisy speech signals. Enhancement is achieved by learning a nonlinear mapping function from the features of the corrupted speech signal to that of the reference clean speech signal. The quality of predicted features can be improved by providing additional side channel information that is robust to noise, such as visual cues. In this paper we propose a novel deep learning model inspired by insights from human audio visual perception. In the proposed unified hybrid architecture, features from a Convolution Neural Network (CNN) that processes the visual cues and features from a fully connected DNN that processes the audio signal are integrated using a Bidirectional Long Short-Term Memory (BiLSTM) network. The parameters of the hybrid model are jointly learned using backpropagation. We compare the quality of enhanced speech from the hybrid models with those from traditional DNN and BiLSTM models.

preprint2016arXiv

Simple and Efficient Learning using Privileged Information

The Support Vector Machine using Privileged Information (SVM+) has been proposed to train a classifier to utilize the additional privileged information that is only available in the training phase but not available in the test phase. In this work, we propose an efficient solution for SVM+ by simply utilizing the squared hinge loss instead of the hinge loss as in the existing SVM+ formulation, which interestingly leads to a dual form with less variables and in the same form with the dual of the standard SVM. The proposed algorithm is utilized to leverage the additional web knowledge that is only available during training for the image categorization tasks. The extensive experimental results on both Caltech101 andWebQueries datasets show that our proposed method can achieve a factor of up to hundred times speedup with the comparable accuracy when compared with the existing SVM+ method.

preprint2016arXiv

Transfer Hashing with Privileged Information

Most existing learning to hash methods assume that there are sufficient data, either labeled or unlabeled, on the domain of interest (i.e., the target domain) for training. However, this assumption cannot be satisfied in some real-world applications. To address this data sparsity issue in hashing, inspired by transfer learning, we propose a new framework named Transfer Hashing with Privileged Information (THPI). Specifically, we extend the standard learning to hash method, Iterative Quantization (ITQ), in a transfer learning manner, namely ITQ+. In ITQ+, a new slack function is learned from auxiliary data to approximate the quantization error in ITQ. We developed an alternating optimization approach to solve the resultant optimization problem for ITQ+. We further extend ITQ+ to LapITQ+ by utilizing the geometry structure among the auxiliary data for learning more precise binary codes in the target domain. Extensive experiments on several benchmark datasets verify the effectiveness of our proposed approaches through comparisons with several state-of-the-art baselines.

preprint2013arXiv

Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

With the ease-of-programming, flexibility and yet efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed-computing, and has been extended to various architectures, e,g, multi-core CPUs, GPUs and FPGAs. In this work, we focus on optimizing the MapReduce framework on Xeon Phi, which is the latest product released by Intel based on the Many Integrated Core Architecture. To the best of our knowledge, this is the first work to optimize the MapReduce framework on the Xeon Phi. In our work, we utilize advanced features of the Xeon Phi to achieve high performance. In order to take advantage of the SIMD vector processing units, we propose a vectorization friendly technique for the map phase to assist the auto-vectorization as well as develop SIMD hash computation algorithms. Furthermore, we utilize MIMD hyper-threading to pipeline the map and reduce to improve the resource utilization. We also eliminate multiple local arrays but use low cost atomic operations on the global array for some applications, which can improve the thread scalability and data locality due to the coherent L2 caches. Finally, for a given application, our framework can either automatically detect suitable techniques to apply or provide guideline for users at compilation time. We conduct comprehensive experiments to benchmark the Xeon Phi and compare our optimized MapReduce framework with a state-of-the-art multi-core based MapReduce framework (Phoenix++). By evaluating six real-world applications, the experimental results show that our optimized framework is 1.2X to 38X faster than Phoenix++ for various applications on the Xeon Phi.

Rick Siow Mong Goh

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images

A Resource-efficient Spiking Neural Network Accelerator Supporting Emerging Neural Encoding

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

EDCompress: Energy-Aware Model Compression for Dataflows

Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations

Privacy-preserving Weighted Federated Learning within Oracle-Aided MPC Framework

Two-Phase Multi-Party Computation Enabled Privacy-Preserving Federated Learning

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

Simple and Efficient Learning using Privileged Information

Transfer Hashing with Privileged Information

Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor