Source author record

Alexander Wong

Alexander Wong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

61works

22topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

COVID-Net USPro: An Open-Source Explainable Few-Shot Deep Prototypical Network to Monitor and Detect COVID-19 Infection from Point-of-Care Ultrasound Images

As the Coronavirus Disease 2019 (COVID-19) continues to impact many aspects of life and the global healthcare systems, the adoption of rapid and effective screening methods to prevent further spread of the virus and lessen the burden on healthcare providers is a necessity. As a cheap and widely accessible medical image modality, point-of-care ultrasound (POCUS) imaging allows radiologists to identify symptoms and assess severity through visual inspection of the chest ultrasound images. Combined with the recent advancements in computer science, applications of deep learning techniques in medical image analysis have shown promising results, demonstrating that artificial intelligence-based solutions can accelerate the diagnosis of COVID-19 and lower the burden on healthcare professionals. However, the lack of a huge amount of well-annotated data poses a challenge in building effective deep neural networks in the case of novel diseases and pandemics. Motivated by this, we present COVID-Net USPro, an explainable few-shot deep prototypical network, that monitors and detects COVID-19 positive cases with high precision and recall from minimal ultrasound images. COVID-Net USPro achieves 99.65% overall accuracy, 99.7% recall and 99.67% precision for COVID-19 positive cases when trained with only 5 shots. The analytic pipeline and results were verified by our contributing clinician with extensive experience in POCUS interpretation, ensuring that the network makes decisions based on actual patterns.

preprint2022arXiv

AI-Powered Non-Contact In-Home Gait Monitoring and Activity Recognition System Based on mm-Wave FMCW Radar and Cloud Computing

In this paper, leveraging AI, cloud computing and radar technology, we create intelligent sensing that enables smarter applications to improve people's daily lives.

preprint2022arXiv

CellDefectNet: A Machine-designed Attention Condenser Network for Electroluminescence-based Photovoltaic Cell Defect Inspection

Photovoltaic cells are electronic devices that convert light energy to electricity, forming the backbone of solar energy harvesting systems. An essential step in the manufacturing process for photovoltaic cells is visual quality inspection using electroluminescence imaging to identify defects such as cracks, finger interruptions, and broken cells. A big challenge faced by industry in photovoltaic cell visual inspection is the fact that it is currently done manually by human inspectors, which is extremely time consuming, laborious, and prone to human error. While deep learning approaches holds great potential to automating this inspection, the hardware resource-constrained manufacturing scenario makes it challenging for deploying complex deep neural network architectures. In this work, we introduce CellDefectNet, a highly efficient attention condenser network designed via machine-driven design exploration specifically for electroluminesence-based photovoltaic cell defect detection on the edge. We demonstrate the efficacy of CellDefectNet on a benchmark dataset comprising of a diversity of photovoltaic cells captured using electroluminescence imagery, achieving an accuracy of ~86.3% while possessing just 410K parameters (~13$\times$ lower than EfficientNet-B0, respectively) and ~115M FLOPs (~12$\times$ lower than EfficientNet-B0) and ~13$\times$ faster on an ARM Cortex A-72 embedded processor when compared to EfficientNet-B0.

preprint2022arXiv

COVID-Net US-X: Enhanced Deep Neural Network for Detection of COVID-19 Patient Cases from Convex Ultrasound Imaging Through Extended Linear-Convex Ultrasound Augmentation Learning

As the global population continues to face significant negative impact by the on-going COVID-19 pandemic, there has been an increasing usage of point-of-care ultrasound (POCUS) imaging as a low-cost and effective imaging modality of choice in the COVID-19 clinical workflow. A major barrier with widespread adoption of POCUS in the COVID-19 clinical workflow is the scarcity of expert clinicians that can interpret POCUS examinations, leading to considerable interest in deep learning-driven clinical decision support systems to tackle this challenge. A major challenge to building deep neural networks for COVID-19 screening using POCUS is the heterogeneity in the types of probes used to capture ultrasound images (e.g., convex vs. linear probes), which can lead to very different visual appearances. In this study, we explore the impact of leveraging extended linear-convex ultrasound augmentation learning on producing enhanced deep neural networks for COVID-19 assessment, where we conduct data augmentation on convex probe data alongside linear probe data that have been transformed to better resemble convex probe data. Experimental results using an efficient deep columnar anti-aliased convolutional neural network designed via a machined-driven design exploration strategy (which we name COVID-Net US-X) show that the proposed extended linear-convex ultrasound augmentation learning significantly increases performance, with a gain of 5.1% in test accuracy and 13.6% in AUC.

preprint2022arXiv

COVID-Net UV: An End-to-End Spatio-Temporal Deep Neural Network Architecture for Automated Diagnosis of COVID-19 Infection from Ultrasound Videos

Besides vaccination, as an effective way to mitigate the further spread of COVID-19, fast and accurate screening of individuals to test for the disease is yet necessary to ensure public health safety. We propose COVID-Net UV, an end-to-end hybrid spatio-temporal deep neural network architecture, to detect COVID-19 infection from lung point-of-care ultrasound videos captured by convex transducers. COVID-Net UV comprises a convolutional neural network that extracts spatial features and a recurrent neural network that learns temporal dependence. After careful hyperparameter tuning, the network achieves an average accuracy of 94.44% with no false-negative cases for COVID-19 cases. The goal with COVID-Net UV is to assist front-line clinicians in the fight against COVID-19 via accelerating the screening of lung point-of-care ultrasound videos and automatic detection of COVID-19 positive cases.

preprint2022arXiv

FenceNet: Fine-grained Footwork Recognition in Fencing

Current data analysis for the Canadian Olympic fencing team is primarily done manually by coaches and analysts. Due to the highly repetitive, yet dynamic and subtle movements in fencing, manual data analysis can be inefficient and inaccurate. We propose FenceNet as a novel architecture to automate the classification of fine-grained footwork techniques in fencing. FenceNet takes 2D pose data as input and classifies actions using a skeleton-based action recognition approach that incorporates temporal convolutional networks to capture temporal information. We train and evaluate FenceNet on the Fencing Footwork Dataset (FFD), which contains 10 fencers performing 6 different footwork actions for 10-11 repetitions each (652 total videos). FenceNet achieves 85.4% accuracy under 10-fold cross-validation, where each fencer is left out as the test set. This accuracy is within 1% of the current state-of-the-art method, JLJA (86.3%), which selects and fuses features engineered from skeleton data, depth videos, and inertial measurement units. BiFenceNet, a variant of FenceNet that captures the "bidirectionality" of human movement through two separate networks, achieves 87.6% accuracy, outperforming JLJA. Since neither FenceNet nor BiFenceNet requires data from wearable sensors, unlike JLJA, they could be directly applied to most fencing videos, using 2D pose data as input extracted from off-the-shelf 2D human pose estimators. In comparison to JLJA, our methods are also simpler as they do not require manual feature engineering, selection, or fusion.

preprint2022arXiv

ICDBigBird: A Contextual Embedding Model for ICD Code Classification

The International Classification of Diseases (ICD) system is the international standard for classifying diseases and procedures during a healthcare encounter and is widely used for healthcare reporting and management purposes. Assigning correct codes for clinical procedures is important for clinical, operational, and financial decision-making in healthcare. Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks. However, these models have yet to achieve state-of-the-art results in the ICD classification task since one of their main disadvantages is that they can only process documents that contain a small number of tokens which is rarely the case with real patient notes. In this paper, we introduce ICDBigBird a BigBird-based model which can integrate a Graph Convolutional Network (GCN), that takes advantage of the relations between ICD codes in order to create 'enriched' representations of their embeddings, with a BigBird contextual model that can process larger documents. Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task as it outperforms the previous state-of-the-art models.

preprint2022arXiv

Improving Classification Model Performance on Chest X-Rays through Lung Segmentation

Chest radiography is an effective screening tool for diagnosing pulmonary diseases. In computer-aided diagnosis, extracting the relevant region of interest, i.e., isolating the lung region of each radiography image, can be an essential step towards improved performance in diagnosing pulmonary disorders. Methods: In this work, we propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations. Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets. The proposed pipeline is evaluated on Shenzhen Hospital (SH) data set for the segmentation module, and COVIDx data set for both segmentation and classification modules. Novel statistical analysis is conducted in addition to regular evaluation metrics for the segmentation module. Furthermore, the results of the optimized approach are analyzed with gradient-weighted class activation mapping (Grad-CAM) to investigate the rationale behind the classification decisions and to interpret its choices. Results and Conclusion: Different data sets, methods, and scenarios for each module of the proposed pipeline are examined for designing an optimized approach, which has achieved an accuracy of 0.946 in distinguishing abnormal CXR images (i.e., Pneumonia and COVID-19) from normal ones. Numerical and visual validations suggest that applying automated segmentation as a pre-processing step for classification improves the generalization capability and the performance of the classification models.

preprint2022arXiv

LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution

Lexical substitution is the task of generating meaningful substitutes for a word in a given textual context. Contextual word embedding models have achieved state-of-the-art results in the lexical substitution task by relying on contextual information extracted from the replaced word within the sentence. However, such models do not take into account structured knowledge that exists in external lexical databases. We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models that can identify highly accurate substitute candidates. This is achieved by combining contextual information with knowledge from structured lexical resources. Our approach involves: (i) introducing a novel mix-up embedding strategy in the creation of the input embedding of the target word through linearly interpolating the pair of the target input embedding and the average embedding of its probable synonyms; (ii) considering the similarity of the sentence-definition embeddings of the target word and its proposed candidates; and, (iii) calculating the effect of each substitution in the semantics of the sentence through a fine-tuned sentence similarity model. Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets that are widely used for lexical substitution tasks.

preprint2022arXiv

LightDefectNet: A Highly Compact Deep Anti-Aliased Attention Condenser Neural Network Architecture for Light Guide Plate Surface Defect Detection

Light guide plates are essential optical components widely used in a diverse range of applications ranging from medical lighting fixtures to back-lit TV displays. An essential step in the manufacturing of light guide plates is the quality inspection of defects such as scratches, bright/dark spots, and impurities. This is mainly done in industry through manual visual inspection for plate pattern irregularities, which is time-consuming and prone to human error and thus act as a significant barrier to high-throughput production. Advances in deep learning-driven computer vision has led to the exploration of automated visual quality inspection of light guide plates to improve inspection consistency, accuracy, and efficiency. However, given the cost constraints in visual inspection scenarios, the widespread adoption of deep learning-driven computer vision methods for inspecting light guide plates has been greatly limited due to high computational requirements. In this study, we explore the utilization of machine-driven design exploration with computational and "best-practices" constraints as well as L$_1$ paired classification discrepancy loss to create LightDefectNet, a highly compact deep anti-aliased attention condenser neural network architecture tailored specifically for light guide plate surface defect detection in resource-constrained scenarios. Experiments show that LightDetectNet achieves a detection accuracy of $\sim$98.2% on the LGPSDD benchmark while having just 770K parameters ($\sim$33$\times$ and $\sim$6.9$\times$ lower than ResNet-50 and EfficientNet-B0, respectively) and $\sim$93M FLOPs ($\sim$88$\times$ and $\sim$8.4$\times$ lower than ResNet-50 and EfficientNet-B0, respectively) and $\sim$8.8$\times$ faster inference speed than EfficientNet-B0 on an embedded ARM processor.

preprint2022arXiv

MAPLE-Edge: A Runtime Latency Predictor for Edge Devices

Neural Architecture Search (NAS) has enabled automatic discovery of more efficient neural network architectures, especially for mobile and embedded vision applications. Although recent research has proposed ways of quickly estimating latency on unseen hardware devices with just a few samples, little focus has been given to the challenges of estimating latency on runtimes using optimized graphs, such as TensorRT and specifically for edge devices. In this work, we propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware, where we train a regression network on architecture-latency pairs in conjunction with a hardware-runtime descriptor to effectively estimate latency on a diverse pool of edge devices. Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters that are widely available on all Linux kernels, while still achieving up to +49.6% accuracy gains against previous state-of-the-art baseline methods on optimized edge device runtimes, using just 10 measurements from an unseen target device. We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes by applying a trick of normalizing performance counters by the operator latency, in the measured hardware-runtime descriptor. Lastly, we show that for runtimes exhibiting lower than desired accuracy, performance can be boosted by collecting additional samples from the target device, with an extra 90 samples translating to gains of nearly +40%.

preprint2022arXiv

MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge

Deep neural network (DNN) latency characterization is a time-consuming process and adds significant cost to Neural Architecture Search (NAS) processes when searching for efficient convolutional neural networks for embedded vision applications. DNN Latency is a hardware dependent metric and requires direct measurement or inference on target hardware. A recently introduced latency estimation technique known as MAPLE predicts DNN execution time on previously unseen hardware devices by using hardware performance counters. Leveraging these hardware counters in the form of an implicit prior, MAPLE achieves state-of-the-art performance in latency prediction. Here, we propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency to better account for model stability and robustness. First, by identifying DNN architectures that exhibit a similar latency to each other, we can generate multiple virtual examples to significantly improve the accuracy over MAPLE. Secondly, the hardware specifications are used to determine the similarity between training and test hardware to emphasize training samples captured from comparable devices (domains) and encourages improved domain alignment. Experimental results using a convolution neural network NAS benchmark across different types of devices, including an Intel processor that is now used for embedded vision applications, demonstrate a 5% improvement over MAPLE and 9% over HELP. Furthermore, we include ablation studies to independently assess the benefits of virtual examples and hardware-based sample importance.

preprint2022arXiv

MAPLE: Microprocessor A Priori for Latency Estimation

Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption. As such, neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware dependent requiring the NAS algorithm to either measure or predict the architecture latency. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose Microprocessor A Priori for Latency Estimation MAPLE that does not rely on transfer learning or domain adaptation but instead generalizes to new hardware by incorporating a prior hardware characteristics during training. MAPLE takes advantage of a novel quantitative strategy to characterize the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. Moreover, the proposed MAPLE benefits from the tightly coupled I/O between the CPU and GPU and their dependency to predict DNN latency on GPUs while measuring microprocessor performance hardware counters from the CPU feeding the GPU hardware. Through this quantitative strategy as the hardware descriptor, MAPLE can generalize to new hardware via a few shot adaptation strategy where with as few as 3 samples it exhibits a 6% improvement over state-of-the-art methods requiring as much as 10 samples. Experimental results showed that, increasing the few shot adaptation samples to 10 improves the accuracy significantly over the state-of-the-art methods by 12%. Furthermore, it was demonstrated that MAPLE exhibiting 8-10% better accuracy, on average, compared to relevant baselines at any number of adaptation samples.

preprint2022arXiv

MetaGraspNet_v0: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis

There has been increasing interest in smart factories powered by robotics systems to tackle repetitive, laborious tasks. One impactful yet challenging task in robotics-powered smart factory applications is robotic grasping: using robotic arms to grasp objects autonomously in different settings. Robotic grasping requires a variety of computer vision tasks such as object detection, segmentation, grasp prediction, pick planning, etc. While significant progress has been made in leveraging of machine learning for robotic grasping, particularly with deep learning, a big challenge remains in the need for large-scale, high-quality RGBD datasets that cover a wide diversity of scenarios and permutations. To tackle this big, diverse data problem, we are inspired by the recent rise in the concept of metaverse, which has greatly closed the gap between virtual worlds and the physical world. Metaverses allow us to create digital twins of real-world manufacturing scenarios and to virtually create different scenarios from which large volumes of data can be generated for training models. In this paper, we present MetaGraspNet: a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis. The proposed dataset contains 100,000 images and 25 different object types and is split into 5 difficulties to evaluate object detection and segmentation model performance in different grasping scenarios. We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance in a manner that is more appropriate for robotic grasp applications compared to existing general-purpose performance metrics. Our benchmark dataset is available open-source on Kaggle, with the first phase consisting of detailed object detection, segmentation, layout annotations, and a layout-weighted performance metric script.

preprint2022arXiv

MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis

Autonomous bin picking poses significant challenges to vision-driven robotic systems given the complexity of the problem, ranging from various sensor modalities, to highly entangled object layouts, to diverse item properties and gripper types. Existing methods often address the problem from one perspective. Diverse items and complex bin scenes require diverse picking strategies together with advanced reasoning. As such, to build robust and effective machine-learning algorithms for solving this complex task requires significant amounts of comprehensive and high quality data. Collecting such data in real world would be too expensive and time prohibitive and therefore intractable from a scalability perspective. To tackle this big, diverse data problem, we take inspiration from the recent rise in the concept of metaverses, and introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis. The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper. We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulties and an unseen object set to evaluate different object and layout properties. Finally, we conduct extensive experiments showing that our proposed vacuum seal model and synthetic dataset achieves state-of-the-art performance and generalizes to real world use-cases.

preprint2022arXiv

Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation

In keypoint estimation tasks such as human pose estimation, heatmap-based regression is the dominant approach despite possessing notable drawbacks: heatmaps intrinsically suffer from quantization error and require excessive computation to generate and post-process. Motivated to find a more efficient solution, we propose to model individual keypoints and sets of spatially related keypoints (i.e., poses) as objects within a dense single-stage anchor-based detection framework. Hence, we call our method KAPAO (pronounced "Ka-Pow"), for Keypoints And Poses As Objects. KAPAO is applied to the problem of single-stage multi-person human pose estimation by simultaneously detecting human pose and keypoint objects and fusing the detections to exploit the strengths of both object representations. In experiments, we observe that KAPAO is faster and more accurate than previous methods, which suffer greatly from heatmap post-processing. The accuracy-speed trade-off is especially favourable in the practical setting when not using test-time augmentation. Source code: https://github.com/wmcnally/kapao.

preprint2022arXiv

Towards Generating Large Synthetic Phytoplankton Datasets for Efficient Monitoring of Harmful Algal Blooms

Climate change is increasing the frequency and severity of harmful algal blooms (HABs), which cause significant fish deaths in aquaculture farms. This contributes to ocean pollution and greenhouse gas (GHG) emissions since dead fish are either dumped into the ocean or taken to landfills, which in turn negatively impacts the climate. Currently, the standard method to enumerate harmful algae and other phytoplankton is to manually observe and count them under a microscope. This is a time-consuming, tedious and error-prone process, resulting in compromised management decisions by farmers. Hence, automating this process for quick and accurate HAB monitoring is extremely helpful. However, this requires large and diverse datasets of phytoplankton images, and such datasets are hard to produce quickly. In this work, we explore the feasibility of generating novel high-resolution photorealistic synthetic phytoplankton images, containing multiple species in the same image, given a small dataset of real images. To this end, we employ Generative Adversarial Networks (GANs) to generate synthetic images. We evaluate three different GAN architectures: ProjectedGAN, FastGAN, and StyleGANv2 using standard image quality metrics. We empirically show the generation of high-fidelity synthetic phytoplankton images using a training dataset of only 961 real images. Thus, this work demonstrates the ability of GANs to create large synthetic datasets of phytoplankton from small training datasets, accomplishing a key step towards sustainable systematic monitoring of harmful algal blooms.

preprint2022arXiv

Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography

Building AI models with trustworthiness is important especially in regulated areas such as healthcare. In tackling COVID-19, previous work uses convolutional neural networks as the backbone architecture, which has shown to be prone to over-caution and overconfidence in making decisions, rendering them less trustworthy -- a crucial flaw in the context of medical imaging. In this study, we propose a feature learning approach using Vision Transformers, which use an attention-based mechanism, and examine the representation learning capability of Transformers as a new backbone architecture for medical imaging. Through the task of classifying COVID-19 chest radiographs, we investigate into whether generalization capabilities benefit solely from Vision Transformers' architectural advances. Quantitative and qualitative evaluations are conducted on the trustworthiness of the models, through the use of "trust score" computation and a visual explainability technique. We conclude that the attention-based feature learning approach is promising in building trustworthy deep learning models for healthcare.

preprint2021arXiv

COVID-Net CT-2: Enhanced Deep Neural Networks for Detection of COVID-19 from Chest CT Images Through Bigger, More Diverse Learning

The COVID-19 pandemic continues to rage on, with multiple waves causing substantial harm to health and economies around the world. Motivated by the use of CT imaging at clinical institutes around the world as an effective complementary screening method to RT-PCR testing, we introduced COVID-Net CT, a neural network tailored for detection of COVID-19 cases from chest CT images as part of the open source COVID-Net initiative. However, one potential limiting factor is restricted quantity and diversity given the single nation patient cohort used. In this study, we introduce COVID-Net CT-2, enhanced deep neural networks for COVID-19 detection from chest CT images trained on the largest quantity and diversity of multinational patient cases in research literature. We introduce two new CT benchmark datasets, the largest comprising a multinational cohort of 4,501 patients from at least 15 countries. We leverage explainability to investigate the decision-making behaviour of COVID-Net CT-2, with the results for select cases reviewed and reported on by two board-certified radiologists with over 10 and 30 years of experience, respectively. The COVID-Net CT-2 neural networks achieved accuracy, COVID-19 sensitivity, PPV, specificity, and NPV of 98.1%/96.2%/96.7%/99%/98.8% and 97.9%/95.7%/96.4%/98.9%/98.7%, respectively. Explainability-driven performance validation shows that COVID-Net CT-2's decision-making behaviour is consistent with radiologist interpretation by leveraging correct, clinically relevant critical factors. The results are promising and suggest the strong potential of deep neural networks as an effective tool for computer-aided COVID-19 assessment. While not a production-ready solution, we hope the open-source, open-access release of COVID-Net CT-2 and benchmark datasets will continue to enable researchers, clinicians, and citizen data scientists alike to build upon them.

preprint2021arXiv

When Segmentation is Not Enough: Rectifying Visual-Volume Discordance Through Multisensor Depth-Refined Semantic Segmentation for Food Intake Tracking in Long-Term Care

Malnutrition is a multidomain problem affecting 54% of older adults in long-term care (LTC). Monitoring nutritional intake in LTC is laborious and subjective, limiting clinical inference capabilities. Recent advances in automatic image-based food estimation have not yet been evaluated in LTC settings. Here, we describe a fully automatic imaging system for quantifying food intake. We propose a novel deep convolutional encoder-decoder food network with depth-refinement (EDFN-D) using an RGB-D camera for quantifying a plate's remaining food volume relative to reference portions in whole and modified texture foods. We trained and validated the network on the pre-labelled UNIMIB2016 food dataset and tested on our two novel LTC-inspired plate datasets (689 plate images, 36 unique foods). EDFN-D performed comparably to depth-refined graph cut on IOU (0.879 vs. 0.887), with intake errors well below typical 50% (mean percent intake error: -4.2%). We identify how standard segmentation metrics are insufficient due to visual-volume discordance, and include volume disparity analysis to facilitate system trust. This system provides improved transparency, approximates human assessors with enhanced objectivity, accuracy, and precision while avoiding hefty semi-automatic method time requirements. This may help address short-comings currently limiting utility of automated early malnutrition detection in resource-constrained LTC and hospital settings.

preprint2020arXiv

COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-Ray Images

The COVID-19 pandemic continues to have a devastating effect on the health and well-being of the global population. A critical step in the fight against COVID-19 is effective screening of infected patients, with one of the key screening approaches being radiology examination using chest radiography. Motivated by this and inspired by the open source efforts of the research community, in this study we introduce COVID-Net, a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest X-ray (CXR) images that is open source and available to the general public. To the best of the authors' knowledge, COVID-Net is one of the first open source network designs for COVID-19 detection from CXR images at the time of initial release. We also introduce COVIDx, an open access benchmark dataset that we generated comprising of 13,975 CXR images across 13,870 patient patient cases, with the largest number of publicly available COVID-19 positive cases to the best of the authors' knowledge. Furthermore, we investigate how COVID-Net makes predictions using an explainability method in an attempt to not only gain deeper insights into critical factors associated with COVID cases, which can aid clinicians in improved screening, but also audit COVID-Net in a responsible and transparent manner to validate that it is making decisions based on relevant information from the CXR images. By no means a production-ready solution, the hope is that the open access COVID-Net, along with the description on constructing the open source COVIDx dataset, will be leveraged and build upon by both researchers and citizen data scientists alike to accelerate the development of highly accurate yet practical deep learning solutions for detecting COVID-19 cases and accelerate treatment of those who need it the most.

preprint2020arXiv

COVIDNet-CT: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest CT Images

The coronavirus disease 2019 (COVID-19) pandemic continues to have a tremendous impact on patients and healthcare systems around the world. In the fight against this novel disease, there is a pressing need for rapid and effective screening tools to identify patients infected with COVID-19, and to this end CT imaging has been proposed as one of the key screening methods which may be used as a complement to RT-PCR testing, particularly in situations where patients undergo routine CT scans for non-COVID-19 related reasons, patients with worsening respiratory status or developing complications that require expedited care, and patients suspected to be COVID-19-positive but have negative RT-PCR test results. Motivated by this, in this study we introduce COVIDNet-CT, a deep convolutional neural network architecture that is tailored for detection of COVID-19 cases from chest CT images via a machine-driven design exploration approach. Additionally, we introduce COVIDx-CT, a benchmark CT image dataset derived from CT imaging data collected by the China National Center for Bioinformation comprising 104,009 images across 1,489 patient cases. Furthermore, in the interest of reliability and transparency, we leverage an explainability-driven performance validation strategy to investigate the decision-making behaviour of COVIDNet-CT, and in doing so ensure that COVIDNet-CT makes predictions based on relevant indicators in CT images. Both COVIDNet-CT and the COVIDx-CT dataset are available to the general public in an open-source and open access manner as part of the COVID-Net initiative. While COVIDNet-CT is not yet a production-ready screening solution, we hope that releasing the model and dataset will encourage researchers, clinicians, and citizen data scientists alike to leverage and build upon them.

preprint2020arXiv

Deep Neural Network Perception Models and Robust Autonomous Driving Systems

This paper analyzes the robustness of deep learning models in autonomous driving applications and discusses the practical solutions to address that.

preprint2020arXiv

DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for Monocular Depth Estimation

Depth estimation is an active area of research in the field of computer vision, and has garnered significant interest due to its rising demand in a large number of applications ranging from robotics and unmanned aerial vehicles to autonomous vehicles. A particularly challenging problem in this area is monocular depth estimation, where the goal is to infer depth from a single image. An effective strategy that has shown considerable promise in recent years for tackling this problem is the utilization of deep convolutional neural networks. Despite these successes, the memory and computational requirements of such networks have made widespread deployment in embedded scenarios very challenging. In this study, we introduce DepthNet Nano, a highly compact self normalizing network for monocular depth estimation designed using a human machine collaborative design strategy, where principled network design prototyping based on encoder-decoder design principles are coupled with machine-driven design exploration. The result is a compact deep neural network with highly customized macroarchitecture and microarchitecture designs, as well as self-normalizing characteristics, that are highly tailored for the task of embedded depth estimation. The proposed DepthNet Nano possesses a highly efficient network architecture (e.g., 24X smaller and 42X fewer MAC operations than Alhashim et al. on KITTI), while still achieving comparable performance with state-of-the-art networks on the NYU-Depth V2 and KITTI datasets. Furthermore, experiments on inference speed and energy efficiency on a Jetson AGX Xavier embedded module further illustrate the efficacy of DepthNet Nano at different resolutions and power budgets (e.g., ~14 FPS and >0.46 images/sec/watt at 384 X 1280 at a 30W power budget on KITTI).

preprint2020arXiv

EmotionNet Nano: An Efficient Deep Convolutional Neural Network Design for Real-time Facial Expression Recognition

While recent advances in deep learning have led to significant improvements in facial expression classification (FEC), a major challenge that remains a bottleneck for the widespread deployment of such systems is their high architectural and computational complexities. This is especially challenging given the operational requirements of various FEC applications, such as safety, marketing, learning, and assistive living, where real-time requirements on low-cost embedded devices is desired. Motivated by this need for a compact, low latency, yet accurate system capable of performing FEC in real-time on low-cost embedded devices, this study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy, where human experience is combined with machine meticulousness and speed in order to craft a deep neural network design catered towards real-time embedded usage. Two different variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy. Experimental results using the CK+ facial expression benchmark dataset demonstrate that the proposed EmotionNet Nano networks demonstrated accuracies comparable to state-of-the-art in FEC networks, while requiring significantly fewer parameters (e.g., 23$\times$ fewer at a higher accuracy). Furthermore, we demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g. $>25$ FPS and $>70$ FPS at 15W and 30W, respectively) and high energy efficiency (e.g. $>1.7$ images/sec/watt at 15W) on an ARM embedded processor, thus further illustrating the efficacy of EmotionNet Nano for deployment on embedded devices.

preprint2020arXiv

Equivalence Relations for Computing Permutation Polynomials

We present a new technique for computing permutation polynomials based on equivalence relations. The equivalence relations are defined by expanded normalization operations and new functions that map permutation polynomials (PPs) to other PPs. Our expanded normalization applies to almost all PPs, including when the characteristic of the finite field divides the degree of the polynomial. The equivalence relations make it possible to reduce the size of the space, when doing an exhaustive search. As a result, we have been able to compute almost all permutation polynomials of degree $d$ at most 10 over $GF(q)$, where $q$ is at most 97. We have also been able to compute nPPs of degrees 11 and 12 in a few cases. The techniques apply to arbitrary $q$ and $d$. In addition, the equivalence relations allow the set all PPs for a given degree and a given field $GF(q)$ to be succinctly described by their representative nPPs. We give several tables at the end of the paper listing the representative nPPs (\ie the equivalence classes) for several values of $q$ and $d$. We also give several new lower bounds for $M(n,D)$, the maximum number of permutations on $n$ symbols with pairwise Hamming distance $D$, mostly derived from our results on PPs.

preprint2020arXiv

Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals' faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual's face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dual-use technologies that can enable mass surveillance. However, the impact of an individual's inclusion in training data on a derived system's ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers all major open source face recognition training datasets do not obtain informed consent from individuals during their collection.

preprint2020arXiv

Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness

While deep neural networks have been achieving state-of-the-art performance across a wide variety of applications, their vulnerability to adversarial attacks limits their widespread deployment for safety-critical applications. Alongside other adversarial defense approaches being investigated, there has been a very recent interest in improving adversarial robustness in deep neural networks through the introduction of perturbations during the training process. However, such methods leverage fixed, pre-defined perturbations and require significant hyper-parameter tuning that makes them very difficult to leverage in a general fashion. In this study, we introduce Learn2Perturb, an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks. More specifically, we introduce novel perturbation-injection modules that are incorporated at each layer to perturb the feature space and increase uncertainty in the network. This feature perturbation is performed at both the training and the inference stages. Furthermore, inspired by the Expectation-Maximization, an alternating back-propagation training algorithm is introduced to train the network and noise parameters consecutively. Experimental results on CIFAR-10 and CIFAR-100 datasets show that the proposed Learn2Perturb method can result in deep neural networks which are $4-7\%$ more robust on $l_{\infty}$ FGSM and PDG adversarial attacks and significantly outperforms the state-of-the-art against $l_2$ $C\&W$ attack and a wide range of well-known black-box attacks.

preprint2020arXiv

Quantization in Relative Gradient Angle Domain For Building Polygon Estimation

Building footprint extraction in remote sensing data benefits many important applications, such as urban planning and population estimation. Recently, rapid development of Convolutional Neural Networks (CNNs) and open-sourced high resolution satellite building image datasets have pushed the performance boundary further for automated building extractions. However, CNN approaches often generate imprecise building morphologies including noisy edges and round corners. In this paper, we leverage the performance of CNNs, and propose a module that uses prior knowledge of building corners to create angular and concise building polygons from CNN segmentation outputs. We describe a new transform, Relative Gradient Angle Transform (RGA Transform) that converts object contours from time vs. space to time vs. angle. We propose a new shape descriptor, Boundary Orientation Relation Set (BORS), to describe angle relationship between edges in RGA domain, such as orthogonality and parallelism. Finally, we develop an energy minimization framework that makes use of the angle relationship in BORS to straighten edges and reconstruct sharp corners, and the resulting corners create a polygon. Experimental results demonstrate that our method refines CNN output from a rounded approximation to a more clear-cut angular shape of the building footprint.

preprint2020arXiv

Squeeze-and-Attention Networks for Semantic Segmentation

The recent integration of attention mechanisms into segmentation networks improves their representational capabilities through a great emphasis on more informative features. However, these attention mechanisms ignore an implicit sub-task of semantic segmentation and are constrained by the grid structure of convolution kernels. In this paper, we propose a novel squeeze-and-attention network (SANet) architecture that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction. Specifically, the proposed SA modules impose pixel-group attention on conventional convolution by introducing an 'attention' convolutional channel, thus taking into account spatial-channel inter-dependencies in an efficient manner. The final segmentation results are produced by merging outputs from four hierarchical stages of a SANet to integrate multi-scale contexts for obtaining an enhanced pixel-wise prediction. Empirical experiments on two challenging public datasets validate the effectiveness of the proposed SANets, which achieves 83.2% mIoU (without COCO pre-training) on PASCAL VOC and a state-of-the-art mIoU of 54.4% on PASCAL Context.

preprint2020arXiv

TimeConvNets: A Deep Time Windowed Convolution Neural Network Design for Real-time Video Facial Expression Recognition

A core challenge faced by the majority of individuals with Autism Spectrum Disorder (ASD) is an impaired ability to infer other people's emotions based on their facial expressions. With significant recent advances in machine learning, one potential approach to leveraging technology to assist such individuals to better recognize facial expressions and reduce the risk of possible loneliness and depression due to social isolation is the design of computer vision-driven facial expression recognition systems. Motivated by this social need as well as the low latency requirement of such systems, this study explores a novel deep time windowed convolutional neural network design (TimeConvNets) for the purpose of real-time video facial expression recognition. More specifically, we explore an efficient convolutional deep neural network design for spatiotemporal encoding of time windowed video frame sub-sequences and study the respective balance between speed and accuracy. Furthermore, to evaluate the proposed TimeConvNet design, we introduce a more difficult dataset called BigFaceX, composed of a modified aggregation of the extended Cohn-Kanade (CK+), BAUM-1, and the eNTERFACE public datasets. Different variants of the proposed TimeConvNet design with different backbone network architectures were evaluated using BigFaceX alongside other network designs for capturing spatiotemporal information, and experimental results demonstrate that TimeConvNets can better capture the transient nuances of facial expressions and boost classification accuracy while maintaining a low inference time.

preprint2020arXiv

Understanding the temporal evolution of COVID-19 research through machine learning and natural language processing

The outbreak of the novel coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been continuously affecting human lives and communities around the world in many ways, from cities under lockdown to new social experiences. Although in most cases COVID-19 results in mild illness, it has drawn global attention due to the extremely contagious nature of SARS-CoV-2. Governments and healthcare professionals, along with people and society as a whole, have taken any measures to break the chain of transition and flatten the epidemic curve. In this study, we used multiple data sources, i.e., PubMed and ArXiv, and built several machine learning models to characterize the landscape of current COVID-19 research by identifying the latent topics and analyzing the temporal evolution of the extracted research themes, publications similarity, and sentiments, within the time-frame of January- May 2020. Our findings confirm the types of research available in PubMed and ArXiv differ significantly, with the former exhibiting greater diversity in terms of COVID-19 related issues and the latter focusing more on intelligent systems/tools to predict/diagnose COVID-19. The special attention of the research community to the high-risk groups and people with complications was also confirmed.

preprint2020arXiv

Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis

An ongoing major challenge in computer vision is the task of person re-identification, where the goal is to match individuals across different, non-overlapping camera views. While recent success has been achieved via supervised learning using deep neural networks, such methods have limited widespread adoption due to the need for large-scale, customized data annotation. As such, there has been a recent focus on unsupervised learning approaches to mitigate the data annotation issue; however, current approaches in literature have limited performance compared to supervised learning approaches as well as limited applicability for adoption in new environments. In this paper, we address the aforementioned challenges faced in person re-identification for real-world, practical scenarios by introducing a novel, unsupervised domain adaptation approach for person re-identification. This is accomplished through the introduction of: i) k-reciprocal tracklet Clustering for Unsupervised Domain Adaptation (ktCUDA) (for pseudo-label generation on target domain), and ii) Synthesized Heterogeneous RE-id Domain (SHRED) composed of large-scale heterogeneous independent source environments (for improving robustness and adaptability to a wide diversity of target environments). Experimental results across four different image and video benchmark datasets show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance when compared to existing state-of-the-art methods, as well as demonstrate better adaptability to different types of environments.

preprint2020arXiv

Vulnerability Under Adversarial Machine Learning: Bias or Variance?

Prior studies have unveiled the vulnerability of the deep neural networks in the context of adversarial machine learning, leading to great recent attention into this area. One interesting question that has yet to be fully explored is the bias-variance relationship of adversarial machine learning, which can potentially provide deeper insights into this behaviour. The notion of bias and variance is one of the main approaches to analyze and evaluate the generalization and reliability of a machine learning model. Although it has been extensively used in other machine learning models, it is not well explored in the field of deep learning and it is even less explored in the area of adversarial machine learning. In this study, we investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network and analyze how adversarial perturbations can affect the generalization of a network. We derive the bias-variance trade-off for both classification and regression applications based on two main loss functions: (i) mean squared error (MSE), and (ii) cross-entropy. Furthermore, we perform quantitative analysis with both simulated and real data to empirically evaluate consistency with the derived bias-variance tradeoffs. Our analysis sheds light on why the deep neural networks have poor performance under adversarial perturbation from a bias-variance point of view and how this type of perturbation would change the performance of a network. Moreover, given these new theoretical findings, we introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies (e.g., PGD) while providing a high success rate in fooling deep neural networks in lower perturbation magnitudes.

preprint2016arXiv

A spectral-spatial fusion model for robust blood pulse waveform extraction in photoplethysmographic imaging

Photoplethysmographic imaging is a camera-based solution for non-contact cardiovascular monitoring from a distance. This technology enables monitoring in situations where contact-based devices may be problematic or infeasible, such as ambulatory, sleep, and multi-individual monitoring. However, extracting the blood pulse waveform signal is challenging due to the unknown mixture of relevant (pulsatile) and irrelevant pixels in the scene. Here, we design and implement a signal fusion framework, FusionPPG, for extracting a blood pulse waveform signal with strong temporal fidelity from a scene without requiring anatomical priors (e.g., facial tracking). The extraction problem is posed as a Bayesian least squares fusion problem, and solved using a novel probabilistic pulsatility model that incorporates both physiologically derived spectral and spatial waveform priors to identify pulsatility characteristics in the scene. Experimental results show statistically significantly improvements compared to the FaceMeanPPG method ($p<0.001$) and DistancePPG ($p<0.001$) methods. Heart rates predicted using FusionPPG correlated strongly with ground truth measurements ($r^2=0.9952$). FusionPPG was the only method able to assess cardiac arrhythmia via temporal analysis.

preprint2016arXiv

Deep Quality: A Deep No-reference Quality Assessment System

Image quality assessment (IQA) continues to garner great interest in the research community, particularly given the tremendous rise in consumer video capture and streaming. Despite significant research effort in IQA in the past few decades, the area of no-reference image quality assessment remains a great challenge and is largely unsolved. In this paper, we propose a novel no-reference image quality assessment system called Deep Quality, which leverages the power of deep learning to model the complex relationship between visual content and the perceived quality. Deep Quality consists of a novel multi-scale deep convolutional neural network, trained to learn to assess image quality based on training samples consisting of different distortions and degradations such as blur, Gaussian noise, and compression artifacts. Preliminary results using the CSIQ benchmark image quality dataset showed that Deep Quality was able to achieve strong quality prediction performance (89% patch-level and 98% image-level prediction accuracy), being able to achieve similar performance as full-reference IQA methods.

preprint2016arXiv

Evolutionary Synthesis of Deep Neural Networks via Synaptic Cluster-driven Genetic Encoding

There has been significant recent interest towards achieving highly efficient deep neural network architectures. A promising paradigm for achieving this is the concept of evolutionary deep intelligence, which attempts to mimic biological evolution processes to synthesize highly-efficient deep neural networks over successive generations. An important aspect of evolutionary deep intelligence is the genetic encoding scheme used to mimic heredity, which can have a significant impact on the quality of offspring deep neural networks. Motivated by the neurobiological phenomenon of synaptic clustering, we introduce a new genetic encoding scheme where synaptic probability is driven towards the formation of a highly sparse set of synaptic clusters. Experimental results for the task of image classification demonstrated that the synthesized offspring networks using this synaptic cluster-driven genetic encoding scheme can achieve state-of-the-art performance while having network architectures that are not only significantly more efficient (with a ~125-fold decrease in synapses for MNIST) compared to the original ancestor network, but also tailored for GPU-accelerated machine learning applications.

preprint2016arXiv

Laser light-field fusion for wide-field lensfree on-chip phase contrast nanoscopy

Wide-field lensfree on-chip microscopy, which leverages holography principles to capture interferometric light-field encodings without lenses, is an emerging imaging modality with widespread interest given the large field-of-view compared to lens-based techniques. In this study, we introduce the idea of laser light-field fusion for lensfree on-chip phase contrast nanoscopy, where interferometric laser light-field encodings acquired using an on-chip setup with laser pulsations at different wavelengths are fused to produce marker-free phase contrast images of superior quality with resolving power more than five times below the pixel pitch of the sensor array and more than 40% beyond the diffraction limit. As a proof of concept, we demonstrate, for the first time, a wide-field lensfree on-chip instrument successfully detecting 300 nm particles, resulting in a numerical aperture of 1.1, across a large field-of-view of $\sim$ 30 mm$^2$ without any specialized or intricate sample preparation, or the use of synthetic aperture- or shift-based techniques.

preprint2016arXiv

Noise-Compensated, Bias-Corrected Diffusion Weighted Endorectal Magnetic Resonance Imaging via a Stochastically Fully-Connected Joint Conditional Random Field Model

Diffusion weighted magnetic resonance imaging (DW-MR) is a powerful tool in imaging-based prostate cancer screening and detection. Endorectal coils are commonly used in DW-MR imaging to improve the signal-to-noise ratio (SNR) of the acquisition, at the expense of significant intensity inhomogeneities (bias field) that worsens as we move away from the endorectal coil. The presence of bias field can have a significant negative impact on the accuracy of different image analysis tasks, as well as prostate tumor localization, thus leading to increased inter- and intra-observer variability. Retrospective bias correction approaches are introduced as a more efficient way of bias correction compared to the prospective methods such that they correct for both of the scanner and anatomy-related bias fields in MR imaging. Previously proposed retrospective bias field correction methods suffer from undesired noise amplification that can reduce the quality of bias-corrected DW-MR image. Here, we propose a unified data reconstruction approach that enables joint compensation of bias field as well as data noise in DW-MR imaging. The proposed noise-compensated, bias-corrected (NCBC) data reconstruction method takes advantage of a novel stochastically fully connected joint conditional random field (SFC-JCRF) model to mitigate the effects of data noise and bias field in the reconstructed MR data. The proposed NCBC reconstruction method was tested on synthetic DW-MR data, physical DW-phantom as well as real DW-MR data all acquired using endorectal MR coil. Both qualitative and quantitative analysis illustrated that the proposed NCBC method can achieve improved image quality when compared to other tested bias correction methods. As such, the proposed NCBC method may have potential as a useful retrospective approach for improving the consistency of image interpretations.

preprint2016arXiv

Non-contact hemodynamic imaging reveals the jugular venous pulse waveform

Cardiovascular monitoring is important to prevent diseases from progressing. The jugular venous pulse (JVP) waveform offers important clinical information about cardiac health, but is not routinely examined due to its invasive catheterisation procedure. Here, we demonstrate for the first time that the JVP can be consistently observed in a non-contact manner using a novel light-based photoplethysmographic imaging system, coded hemodynamic imaging (CHI). While traditional monitoring methods measure the JVP at a single location, CHI's wide-field imaging capabilities were able to observe the jugular venous pulse's spatial flow profile for the first time. The important inflection points in the JVP were observed, meaning that cardiac abnormalities can be assessed through JVP distortions. CHI provides a new way to assess cardiac health through non-contact light-based JVP monitoring, and can be used in non-surgical environments for cardiac assessment.

preprint2016arXiv

Resolution- and throughput-enhanced spectroscopy using high-throughput computational slit

There exists a fundamental tradeoff between spectral resolution and the efficiency or throughput for all optical spectrometers. The primary factors affecting the spectral resolution and throughput of an optical spectrometer are the size of the entrance aperture and the optical power of the focusing element. Thus far collective optimization of the above mentioned has proven difficult. Here, we introduce the concept of high-throughput computational slits (HTCS), a numerical technique for improving both the effective spectral resolution and efficiency of a spectrometer. The proposed HTCS approach was experimentally validated using an optical spectrometer configured with a 200 um entrance aperture, test, and a 50 um entrance aperture, control, demonstrating improvements in spectral resolution of the spectrum by ~ 50% over the control spectral resolution and improvements in efficiency of > 2 times over the efficiency of the largest entrance aperture used in the study while producing highly accurate spectra.

preprint2016arXiv

Scene Invariant Crowd Segmentation and Counting Using Scale-Normalized Histogram of Moving Gradients (HoMG)

The problem of automated crowd segmentation and counting has garnered significant interest in the field of video surveillance. This paper proposes a novel scene invariant crowd segmentation and counting algorithm designed with high accuracy yet low computational complexity in mind, which is key for widespread industrial adoption. A novel low-complexity, scale-normalized feature called Histogram of Moving Gradients (HoMG) is introduced for highly effective spatiotemporal representation of individuals and crowds within a video. Real-time crowd segmentation is achieved via boosted cascade of weak classifiers based on sliding-window HoMG features, while linear SVM regression of crowd-region HoMG features is employed for real-time crowd counting. Experimental results using multi-camera crowd datasets show that the proposed algorithm significantly outperform state-of-the-art crowd counting algorithms, as well as achieve very promising crowd segmentation results, thus demonstrating the efficacy of the proposed method for highly-accurate, real-time video-driven crowd analysis.

preprint2016arXiv

Spatial probabilistic pulsatility model for enhancing photoplethysmographic imaging systems

Photolethysmographic imaging (PPGI) is a widefield non-contact biophotonic technology able to remotely monitor cardiovascular function over anatomical areas. Though spatial context can provide increased physiological insight, existing PPGI systems rely on coarse spatial averaging with no anatomical priors for assessing arterial pulsatility. Here, we developed a continuous probabilistic pulsatility model for importance-weighted blood pulse waveform extraction. Using a data-driven approach, the model was constructed using a 23 participant sample with large demographic variation (11/12 female/male, age 11-60 years, BMI 16.4-35.1 kg$\cdot$m$^{-2}$). Using time-synchronized ground-truth waveforms, spatial correlation priors were computed and projected into a co-aligned importance-weighted Cartesian space. A modified Parzen-Rosenblatt kernel density estimation method was used to compute the continuous resolution-agnostic probabilistic pulsatility model. The model identified locations that consistently exhibited pulsatility across the sample. Blood pulse waveform signals extracted with the model exhibited significantly stronger temporal correlation ($W=35,p<0.01$) and spectral SNR ($W=31,p<0.01$) compared to uniform spatial averaging. Heart rate estimation was in strong agreement with true heart rate ($r^2=0.9619$, error $(μ,σ)=(0.52,1.69)$ bpm).

preprint2015arXiv

A Bayesian Residual Transform for Signal Processing

Multi-scale decomposition has been an invaluable tool for the processing of physiological signals. Much focus in multi-scale decomposition for processing such signals have been based on scale-space theory and wavelet transforms. In this study, we take a different perspective on multi-scale decomposition by investigating the feasibility of utilizing a Bayesian-based method for multi-scale signal decomposition called Bayesian Residual Transform (BRT) for the purpose of physiological signal processing. In BRT, a signal is modeled as the summation of residual signals, each characterizing information from the signal at different scales. A deep cascading framework is introduced as a realization of the BRT. Signal-to-noise ratio (SNR) analysis using electrocardiography (ECG) signals was used to illustrate the feasibility of using the BRT for suppressing noise in physiological signals. Results in this study show that it is feasible to utilize the BRT for processing physiological signals for tasks such as noise suppression.

preprint2015arXiv

A Deep-structured Conditional Random Field Model for Object Silhouette Tracking

In this work, we introduce a deep-structured conditional random field (DS-CRF) model for the purpose of state-based object silhouette tracking. The proposed DS-CRF model consists of a series of state layers, where each state layer spatially characterizes the object silhouette at a particular point in time. The interactions between adjacent state layers are established by inter-layer connectivity dynamically determined based on inter-frame optical flow. By incorporate both spatial and temporal context in a dynamic fashion within such a deep-structured probabilistic graphical model, the proposed DS-CRF model allows us to develop a framework that can accurately and efficiently track object silhouettes that can change greatly over time, as well as under different situations such as occlusion and multiple targets within the scene. Experiment results using video surveillance datasets containing different scenarios such as occlusion and multiple targets showed that the proposed DS-CRF approach provides strong object silhouette tracking performance when compared to baseline methods such as mean-shift tracking, as well as state-of-the-art methods such as context tracking and boosted particle filtering.

preprint2015arXiv

A deep-structured fully-connected random field model for structured inference

There has been significant interest in the use of fully-connected graphical models and deep-structured graphical models for the purpose of structured inference. However, fully-connected and deep-structured graphical models have been largely explored independently, leaving the unification of these two concepts ripe for exploration. A fundamental challenge with unifying these two types of models is in dealing with computational complexity. In this study, we investigate the feasibility of unifying fully-connected and deep-structured models in a computationally tractable manner for the purpose of structured inference. To accomplish this, we introduce a deep-structured fully-connected random field (DFRF) model that integrates a series of intermediate sparse auto-encoding layers placed between state layers to significantly reduce computational complexity. The problem of image segmentation was used to illustrate the feasibility of using the DFRF for structured inference in a computationally tractable manner. Results in this study show that it is feasible to unify fully-connected and deep-structured models in a computationally tractable manner for solving structured inference problems such as image segmentation.

preprint2015arXiv

A Weakly Supervised Learning Approach based on Spectral Graph-Theoretic Grouping

In this study, a spectral graph-theoretic grouping strategy for weakly supervised classification is introduced, where a limited number of labelled samples and a larger set of unlabelled samples are used to construct a larger annotated training set composed of strongly labelled and weakly labelled samples. The inherent relationship between the set of strongly labelled samples and the set of unlabelled samples is established via spectral grouping, with the unlabelled samples subsequently weakly annotated based on the strongly labelled samples within the associated spectral groups. A number of similarity graph models for spectral grouping, including two new similarity graph models introduced in this study, are explored to investigate their performance in the context of weakly supervised classification in handling different types of data. Experimental results using benchmark datasets as well as real EMG datasets demonstrate that the proposed approach to weakly supervised classification can provide noticeable improvements in classification performance, and that the proposed similarity graph models can lead to ultimate learning results that are either better than or on a par with existing similarity graph models in the context of spectral grouping for weakly supervised classification.

preprint2015arXiv

Bayesian-based aberration correction and numerical diffraction for improved lensfree on-chip microscopy of biological specimens

Lensfree on-chip microscopy is an emerging imaging technique that can be used to visualize and study biological specimens without the need for imaging lens systems. Important issues that can limit the performance of lensfree on-chip microscopy include interferometric aberrations, acquisition noise, and image reconstruction artifacts. In this study, we introduce a Bayesian-based method for performing aberration correction and numerical diffraction that accounts for all three of these issues to improve the effective numerical aperture (NA) and signal-to-noise ratio (SNR) of the reconstructed microscopic image. The proposed method was experimentally validated using the USAF resolution target as well as real waterborne Anabaena flos-aquae samples, demonstrating improvements in NA by ~25% over the standard method, and improvements in SNR of 2.3 dB and 3.8 dB in the reconstructed image when compared to the reconstructed images produced using the standard method and a maximum likelihood estimation method, respectively.

preprint2015arXiv

Bayesian-based deconvolution fluorescence microscopy using dynamically updated nonparametric nonstationary expectation estimates

Fluorescence microscopy is widely used for the study of biological specimens. Deconvolution can significantly improve the resolution and contrast of images produced using fluorescence microscopy; in particular, Bayesian-based methods have become very popular in deconvolution fluorescence microscopy. An ongoing challenge with Bayesian-based methods is in dealing with the presence of noise in low SNR imaging conditions. In this study, we present a Bayesian-based method for performing deconvolution using dynamically updated nonparametric nonstationary expectation estimates that can improve the fluorescence microscopy image quality in the presence of noise, without explicit use of spatial regularization.

preprint2015arXiv

Depth Compensated Spectral Domain Optical Coherence Tomography via Digital Compensation

Spectral Domain Optical Coherence Tomography (SD-OCT) is a well-known imaging modality which allows for \textit{in-vivo} visualization of the morphology of different biological tissues at cellular level resolutions. The overall SD-OCT imaging quality in terms of axial resolution and Signal-to-Noise Ratio (SNR) degrades with imaging depth, while the lateral resolution degrades with distance from the focal plane. This image quality degradation is due both to the design of the SD-OCT imaging system and the optical properties of the imaged object. Here, we present a novel Depth Compensated SD-OCT (DC-OCT) system that integrates a Depth Compensating Digital Signal Processing (DC-DSP) module to improve the overall imaging quality via digital compensation. The designed DC-DSP module can be integrated to any SD-OCT system and is able to simultaneously compensate for the depth-dependent loss of axial and lateral resolutions, depth-varying SNR, as well as sidelobe artifact for improved imaging quality. The integrated DC-DSP module is based on a unified Maximum a Posteriori (MAP) framework which incorporates a Stochastically Fully-connected Conditional Random Field (SF-CRF) model to produce tomograms with reduced speckle noise and artifact, as well as higher spatial resolution. The performance of our proposed DC-OCT system was tested on a USAF resolution target as well as different biological tissue samples, and the results demonstrate the potential of the proposed DC-OCT system for improving spatial resolution, contrast, and SNR.

preprint2015arXiv

Discovery Radiomics for Multi-Parametric MRI Prostate Cancer Detection

Prostate cancer is the most diagnosed form of cancer in Canadian men, and is the third leading cause of cancer death. Despite these statistics, prognosis is relatively good with a sufficiently early diagnosis, making fast and reliable prostate cancer detection crucial. As imaging-based prostate cancer screening, such as magnetic resonance imaging (MRI), requires an experienced medical professional to extensively review the data and perform a diagnosis, radiomics-driven methods help streamline the process and has the potential to significantly improve diagnostic accuracy and efficiency, and thus improving patient survival rates. These radiomics-driven methods currently rely on hand-crafted sets of quantitative imaging-based features, which are selected manually and can limit their ability to fully characterize unique prostate cancer tumour phenotype. In this study, we propose a novel \textit{discovery radiomics} framework for generating custom radiomic sequences tailored for prostate cancer detection. Discovery radiomics aims to uncover abstract imaging-based features that capture highly unique tumour traits and characteristics beyond what can be captured using predefined feature models. In this paper, we discover new custom radiomic sequencers for generating new prostate radiomic sequences using multi-parametric MRI data. We evaluated the performance of the discovered radiomic sequencer against a state-of-the-art hand-crafted radiomic sequencer for computer-aided prostate cancer detection with a feedforward neural network using real clinical prostate multi-parametric MRI data. Results for the discovered radiomic sequencer demonstrate good performance in prostate cancer detection and clinical decision support relative to the hand-crafted radiomic sequencer. The use of discovery radiomics shows potential for more efficient and reliable automatic prostate cancer detection.

preprint2015arXiv

Discovery Radiomics via StochasticNet Sequencers for Cancer Detection

Radiomics has proven to be a powerful prognostic tool for cancer detection, and has previously been applied in lung, breast, prostate, and head-and-neck cancer studies with great success. However, these radiomics-driven methods rely on pre-defined, hand-crafted radiomic feature sets that can limit their ability to characterize unique cancer traits. In this study, we introduce a novel discovery radiomics framework where we directly discover custom radiomic features from the wealth of available medical imaging data. In particular, we leverage novel StochasticNet radiomic sequencers for extracting custom radiomic features tailored for characterizing unique cancer tissue phenotype. Using StochasticNet radiomic sequencers discovered using a wealth of lung CT data, we perform binary classification on 42,340 lung lesions obtained from the CT scans of 93 patients in the LIDC-IDRI dataset. Preliminary results show significant improvement over previous state-of-the-art methods, indicating the potential of the proposed discovery radiomics framework for improving cancer screening and diagnosis.

preprint2015arXiv

Domain Adaptation and Transfer Learning in StochasticNets

Transfer learning is a recent field of machine learning research that aims to resolve the challenge of dealing with insufficient training data in the domain of interest. This is a particular issue with traditional deep neural networks where a large amount of training data is needed. Recently, StochasticNets was proposed to take advantage of sparse connectivity in order to decrease the number of parameters that needs to be learned, which in turn may relax training data size requirements. In this paper, we study the efficacy of transfer learning on StochasticNet frameworks. Experimental results show ~7% improvement on StochasticNet performance when the transfer learning is applied in training step.

preprint2015arXiv

Efficient Deep Feature Learning and Extraction via StochasticNets

Deep neural networks are a powerful tool for feature learning and extraction given their ability to model high-level abstractions in highly complex data. One area worth exploring in feature learning and extraction using deep neural networks is efficient neural connectivity formation for faster feature learning and extraction. Motivated by findings of stochastic synaptic connectivity formation in the brain as well as the brain's uncanny ability to efficiently represent information, we propose the efficient learning and extraction of features via StochasticNets, where sparsely-connected deep neural networks can be formed via stochastic connectivity between neurons. To evaluate the feasibility of such a deep neural network architecture for feature learning and extraction, we train deep convolutional StochasticNets to learn abstract features using the CIFAR-10 dataset, and extract the learned features from images to perform classification on the SVHN and STL-10 datasets. Experimental results show that features learned using deep convolutional StochasticNets, with fewer neural connections than conventional deep convolutional neural networks, can allow for better or comparable classification accuracy than conventional deep neural networks: relative test error decrease of ~4.5% for classification on the STL-10 dataset and ~1% for classification on the SVHN dataset. Furthermore, it was shown that the deep features extracted using deep convolutional StochasticNets can provide comparable classification accuracy even when only 10% of the training data is used for feature learning. Finally, it was also shown that significant gains in feature extraction speed can be achieved in embedded applications using StochasticNets. As such, StochasticNets allow for faster feature learning and extraction performance while facilitate for better or comparable accuracy performances.

preprint2015arXiv

Forming A Random Field via Stochastic Cliques: From Random Graphs to Fully Connected Random Fields

Random fields have remained a topic of great interest over past decades for the purpose of structured inference, especially for problems such as image segmentation. The local nodal interactions commonly used in such models often suffer the short-boundary bias problem, which are tackled primarily through the incorporation of long-range nodal interactions. However, the issue of computational tractability becomes a significant issue when incorporating such long-range nodal interactions, particularly when a large number of long-range nodal interactions (e.g., fully-connected random fields) are modeled. In this work, we introduce a generalized random field framework based around the concept of stochastic cliques, which addresses the issue of computational tractability when using fully-connected random fields by stochastically forming a sparse representation of the random field. The proposed framework allows for efficient structured inference using fully-connected random fields without any restrictions on the potential functions that can be utilized. Several realizations of the proposed framework using graph cuts are presented and evaluated, and experimental results demonstrate that the proposed framework can provide competitive performance for the purpose of image segmentation when compared to existing fully-connected and principled deep random field frameworks.

preprint2015arXiv

Lensfree Spectral Light-field Fusion Microscopy for Contrast- and Resolution-enhanced Imaging of Biological Specimens

A lensfree spectral light-field fusion microscopy (LSLFM) system is presented for enabling contrast- and resolution-enhanced imaging of biological specimens. LSLFM consists of a pulsed multispectral lensfree microscope for capturing interferometric light-field encodings at various wavelengths, and Bayesian-based fusion to reconstruct a fused object light-field from the encodings. By fusing unique object detail information captured at different wavelengths, LSLFM can achieve improved resolution, contrast, and signal-to-noise ratio (SNR) over a single-channel lensfree microscopy system. A five-channel LSLFM system was developed and quantitatively evaluated to validate the design. Experimental results demonstrated that the LSLFM system provided SNR improvements of 6-12 dB, as well as a six-fold improvement in the dispersion index (DI), over that achieved using a single-channel, resolution-enhancing lensfree deconvolution microscopy system or its multi-wavelength counterpart. Furthermore, the LSLFM system achieved an increase in numerical aperture (NA) of ~16% over a single-channel resolution-enhancing lensfree deconvolution microscopy system at the highest-resolution wavelength used in the study. Samples of Staurastrum paradoxum, a waterborne algae, and human corneal epithelial cells were imaged using the system to illustrate its potential for enhanced imaging of biological specimens.

preprint2015arXiv

Monte Carlo-based Noise Compensation in Coil Intensity Corrected Endorectal MRI

Background: Prostate cancer is one of the most common forms of cancer found in males making early diagnosis important. Magnetic resonance imaging (MRI) has been useful in visualizing and localizing tumor candidates and with the use of endorectal coils (ERC), the signal-to-noise ratio (SNR) can be improved. The coils introduce intensity inhomogeneities and the surface coil intensity correction built into MRI scanners is used to reduce these inhomogeneities. However, the correction typically performed at the MRI scanner level leads to noise amplification and noise level variations. Methods: In this study, we introduce a new Monte Carlo-based noise compensation approach for coil intensity corrected endorectal MRI which allows for effective noise compensation and preservation of details within the prostate. The approach accounts for the ERC SNR profile via a spatially-adaptive noise model for correcting non-stationary noise variations. Such a method is useful particularly for improving the image quality of coil intensity corrected endorectal MRI data performed at the MRI scanner level and when the original raw data is not available. Results: SNR and contrast-to-noise ratio (CNR) analysis in patient experiments demonstrate an average improvement of 11.7 dB and 11.2 dB respectively over uncorrected endorectal MRI, and provides strong performance when compared to existing approaches. Conclusions: A new noise compensation method was developed for the purpose of improving the quality of coil intensity corrected endorectal MRI data performed at the MRI scanner level. We illustrate that promising noise compensation performance can be achieved for the proposed approach, which is particularly important for processing coil intensity corrected endorectal MRI data performed at the MRI scanner level and when the original raw data is not available.

preprint2015arXiv

Non-contact transmittance photoplethysmographic imaging (PPGI) for long-distance cardiovascular monitoring

Photoplethysmography (PPG) devices are widely used for monitoring cardiovascular function. However, these devices require skin contact, which restrict their use to at-rest short-term monitoring using single-point measurements. Photoplethysmographic imaging (PPGI) has been recently proposed as a non-contact monitoring alternative by measuring blood pulse signals across a spatial region of interest. Existing systems operate in reflectance mode, of which many are limited to short-distance monitoring and are prone to temporal changes in ambient illumination. This paper is the first study to investigate the feasibility of long-distance non-contact cardiovascular monitoring at the supermeter level using transmittance PPGI. For this purpose, a novel PPGI system was designed at the hardware and software level using ambient correction via temporally coded illumination (TCI) and signal processing for PPGI signal extraction. Experimental results show that the processing steps yield a substantially more pulsatile PPGI signal than the raw acquired signal, resulting in statistically significant increases in correlation to ground-truth PPG in both short- ($p \in [<0.0001, 0.040]$) and long-distance ($p \in [<0.0001, 0.056]$) monitoring. The results support the hypothesis that long-distance heart rate monitoring is feasible using transmittance PPGI, allowing for new possibilities of monitoring cardiovascular function in a non-contact manner.

preprint2015arXiv

Numerical Demultiplexing of Color Image Sensor Measurements via Non-linear Random Forest Modeling

The simultaneous capture of imaging data at multiple wavelengths across the electromagnetic spectrum is highly challenging, requiring complex and costly multispectral image sensors. In this study, we introduce a comprehensive framework for performing simultaneous multispectral imaging using conventional image sensors with color filter arrays via numerical demultiplexing of the color image sensor measurements. A numerical forward model characterizing the formation of sensor measurements from light spectra hitting the sensor is constructed based on a comprehensive spectral characterization of the sensor. A numerical demultiplexer is then learned via non-linear random forest modeling based on the forward model. Given the learned numerical demultiplexer, one can then demultiplex simultaneously-acquired measurements made by the image sensor into reflectance intensities at discrete selectable wavelengths, resulting in a higher resolution reflectance spectrum. Simulation and real-world experimental results demonstrate the efficacy of such a method for simultaneous multispectral imaging.

preprint2015arXiv

Sparse Reconstruction of Compressive Sensing MRI using Cross-Domain Stochastically Fully Connected Conditional Random Fields

Magnetic Resonance Imaging (MRI) is a crucial medical imaging technology for the screening and diagnosis of frequently occurring cancers. However image quality may suffer by long acquisition times for MRIs due to patient motion, as well as result in great patient discomfort. Reducing MRI acquisition time can reduce patient discomfort and as a result reduces motion artifacts from the acquisition process. Compressive sensing strategies, when applied to MRI, have been demonstrated to be effective at decreasing acquisition times significantly by sparsely sampling the \emph{k}-space during the acquisition process. However, such a strategy requires advanced reconstruction algorithms to produce high quality and reliable images from compressive sensing MRI. This paper proposes a new reconstruction approach based on cross-domain stochastically fully connected conditional random fields (CD-SFCRF) for compressive sensing MRI. The CD-SFCRF introduces constraints in both \emph{k}-space and spatial domains within a stochastically fully connected graphical model to produce improved MRI reconstruction. Experimental results using T2-weighted (T2w) imaging and diffusion-weighted imaging (DWI) of the prostate show strong performance in preserving fine details and tissue structures in the reconstructed images when compared to other tested methods even at low sampling rates.

preprint2015arXiv

StochasticNet: Forming Deep Neural Networks via Stochastic Connectivity

Deep neural networks is a branch in machine learning that has seen a meteoric rise in popularity due to its powerful abilities to represent and model high-level abstractions in highly complex data. One area in deep neural networks that is ripe for exploration is neural connectivity formation. A pivotal study on the brain tissue of rats found that synaptic formation for specific functional connectivity in neocortical neural microcircuits can be surprisingly well modeled and predicted as a random formation. Motivated by this intriguing finding, we introduce the concept of StochasticNet, where deep neural networks are formed via stochastic connectivity between neurons. As a result, any type of deep neural networks can be formed as a StochasticNet by allowing the neuron connectivity to be stochastic. Stochastic synaptic formations, in a deep neural network architecture, can allow for efficient utilization of neurons for performing specific tasks. To evaluate the feasibility of such a deep neural network architecture, we train a StochasticNet using four different image datasets (CIFAR-10, MNIST, SVHN, and STL-10). Experimental results show that a StochasticNet, using less than half the number of neural connections as a conventional deep neural network, achieves comparable accuracy and reduces overfitting on the CIFAR-10, MNIST and SVHN dataset. Interestingly, StochasticNet with less than half the number of neural connections, achieved a higher accuracy (relative improvement in test error rate of ~6% compared to ConvNet) on the STL-10 dataset than a conventional deep neural network. Finally, StochasticNets have faster operational speeds while achieving better or similar accuracy performances.

Alexander Wong

What is connected

Connect this record

See the researcher in context

Building this map preview

61 published item(s)

COVID-Net USPro: An Open-Source Explainable Few-Shot Deep Prototypical Network to Monitor and Detect COVID-19 Infection from Point-of-Care Ultrasound Images

AI-Powered Non-Contact In-Home Gait Monitoring and Activity Recognition System Based on mm-Wave FMCW Radar and Cloud Computing

CellDefectNet: A Machine-designed Attention Condenser Network for Electroluminescence-based Photovoltaic Cell Defect Inspection

COVID-Net US-X: Enhanced Deep Neural Network for Detection of COVID-19 Patient Cases from Convex Ultrasound Imaging Through Extended Linear-Convex Ultrasound Augmentation Learning

COVID-Net UV: An End-to-End Spatio-Temporal Deep Neural Network Architecture for Automated Diagnosis of COVID-19 Infection from Ultrasound Videos

FenceNet: Fine-grained Footwork Recognition in Fencing

ICDBigBird: A Contextual Embedding Model for ICD Code Classification

Improving Classification Model Performance on Chest X-Rays through Lung Segmentation

LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution

LightDefectNet: A Highly Compact Deep Anti-Aliased Attention Condenser Neural Network Architecture for Light Guide Plate Surface Defect Detection

MAPLE-Edge: A Runtime Latency Predictor for Edge Devices

MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge

MAPLE: Microprocessor A Priori for Latency Estimation

MetaGraspNet_v0: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis

MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis

Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation

Towards Generating Large Synthetic Phytoplankton Datasets for Efficient Monitoring of Harmful Algal Blooms

Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography

COVID-Net CT-2: Enhanced Deep Neural Networks for Detection of COVID-19 from Chest CT Images Through Bigger, More Diverse Learning

When Segmentation is Not Enough: Rectifying Visual-Volume Discordance Through Multisensor Depth-Refined Semantic Segmentation for Food Intake Tracking in Long-Term Care

COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-Ray Images

COVIDNet-CT: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest CT Images

Deep Neural Network Perception Models and Robust Autonomous Driving Systems

DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for Monocular Depth Estimation

EmotionNet Nano: An Efficient Deep Convolutional Neural Network Design for Real-time Facial Expression Recognition

Equivalence Relations for Computing Permutation Polynomials

Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness

Quantization in Relative Gradient Angle Domain For Building Polygon Estimation

Squeeze-and-Attention Networks for Semantic Segmentation

TimeConvNets: A Deep Time Windowed Convolution Neural Network Design for Real-time Video Facial Expression Recognition

Understanding the temporal evolution of COVID-19 research through machine learning and natural language processing

Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis

Vulnerability Under Adversarial Machine Learning: Bias or Variance?

A spectral-spatial fusion model for robust blood pulse waveform extraction in photoplethysmographic imaging

Deep Quality: A Deep No-reference Quality Assessment System

Evolutionary Synthesis of Deep Neural Networks via Synaptic Cluster-driven Genetic Encoding

Laser light-field fusion for wide-field lensfree on-chip phase contrast nanoscopy

Noise-Compensated, Bias-Corrected Diffusion Weighted Endorectal Magnetic Resonance Imaging via a Stochastically Fully-Connected Joint Conditional Random Field Model

Non-contact hemodynamic imaging reveals the jugular venous pulse waveform

Resolution- and throughput-enhanced spectroscopy using high-throughput computational slit

Scene Invariant Crowd Segmentation and Counting Using Scale-Normalized Histogram of Moving Gradients (HoMG)

Spatial probabilistic pulsatility model for enhancing photoplethysmographic imaging systems

A Bayesian Residual Transform for Signal Processing

A Deep-structured Conditional Random Field Model for Object Silhouette Tracking

A deep-structured fully-connected random field model for structured inference

A Weakly Supervised Learning Approach based on Spectral Graph-Theoretic Grouping

Bayesian-based aberration correction and numerical diffraction for improved lensfree on-chip microscopy of biological specimens

Bayesian-based deconvolution fluorescence microscopy using dynamically updated nonparametric nonstationary expectation estimates

Depth Compensated Spectral Domain Optical Coherence Tomography via Digital Compensation

Discovery Radiomics for Multi-Parametric MRI Prostate Cancer Detection

Discovery Radiomics via StochasticNet Sequencers for Cancer Detection

Domain Adaptation and Transfer Learning in StochasticNets

Efficient Deep Feature Learning and Extraction via StochasticNets

Forming A Random Field via Stochastic Cliques: From Random Graphs to Fully Connected Random Fields

Lensfree Spectral Light-field Fusion Microscopy for Contrast- and Resolution-enhanced Imaging of Biological Specimens

Monte Carlo-based Noise Compensation in Coil Intensity Corrected Endorectal MRI

Non-contact transmittance photoplethysmographic imaging (PPGI) for long-distance cardiovascular monitoring

Numerical Demultiplexing of Color Image Sensor Measurements via Non-linear Random Forest Modeling

Sparse Reconstruction of Compressive Sensing MRI using Cross-Domain Stochastically Fully Connected Conditional Random Fields

StochasticNet: Forming Deep Neural Networks via Stochastic Connectivity