Topic overview

eess.IV

4094 works · 15787 researchers · 0 institutions

Topic snapshot

What this area looks like now

4094 works
15787 authors
0 experts visible
0 communities

Papers in this area

24 featured work(s)

preprint · 2019 · arXiv

Programmable Spectrometry -- Per-pixel Classification of Materials using Learned Spectral Filters

Many materials have distinct spectral profiles. This facilitates estimation of the material composition of a scene at each pixel by first acquiring its hyperspectral image, and subsequently filtering it using a bank of spectral profiles. This process is inherently wasteful since only a set of linear projections of the acquired measurements contributes to the classification task. We propose a novel programmable camera that is capable of producing images of a scene with an arbitrary spectral filter. We use this camera to optically implement the spectral filtering of the scene's hyperspectral image with the bank of spectral profiles needed to perform per-pixel material classification. This provides gains both in acquisition speed, since only the relevant measurements are acquired, and in signal-to-noise ratio, since we invariably avoid narrowband filters that are light inefficient. Given training data, we use a range of classical and modern techniques including SVMs and neural networks to identify the bank of spectral profiles that facilitate material classification. We verify the method in simulations on standard datasets as well as real data using a lab prototype of the camera.
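The "linear projections" mentioned in this abstract can be illustrated with a minimal sketch. It assumes each pixel is a short list of spectral samples and that the class is picked by the largest filter response; the argmax rule, the function name `classify_pixels`, and the toy filters are illustrative assumptions, not the paper's learned filters or its camera pipeline.

```python
def classify_pixels(pixels, filters):
    """Label each pixel by the spectral filter with the largest response.

    pixels:  list of spectra, one list of spectral samples per pixel.
    filters: list of spectral filters, each the same length as a spectrum.
    The argmax decision rule is an assumption made for illustration.
    """
    labels = []
    for spectrum in pixels:
        # One linear projection (dot product) per filter in the bank.
        responses = [
            sum(f * s for f, s in zip(spectral_filter, spectrum))
            for spectral_filter in filters
        ]
        labels.append(max(range(len(filters)), key=responses.__getitem__))
    return labels


# Hypothetical toy data: two filters over three spectral bands.
toy_filters = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
toy_pixels = [[0.9, 0.5, 0.1], [0.2, 0.5, 0.8]]
print(classify_pixels(toy_pixels, toy_filters))  # [0, 1]
```

Only one number per filter is needed at each pixel, which is why measuring the projections directly can replace acquiring the full hyperspectral cube.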

preprint · 2018 · arXiv

Unsupervised brain lesion segmentation from MRI using a convolutional autoencoder

Lesions that appear hyperintense in both Fluid Attenuated Inversion Recovery (FLAIR) and T2-weighted magnetic resonance images (MRIs) of the human brain are common in the brains of the elderly population and may be caused by ischemia or demyelination. Lesions are biomarkers for various neurodegenerative diseases, making accurate quantification of them important for both disease diagnosis and progression. Automatic lesion detection using supervised learning requires manually annotated images, which can often be impractical to acquire. Unsupervised lesion detection, on the other hand, does not require any manual delineation; however, these methods can be challenging to construct due to the variability in lesion load, placement of lesions, and voxel intensities. Here we present a novel approach to address this problem using a convolutional autoencoder, which learns to segment brain lesions as well as the white matter, gray matter, and cerebrospinal fluid by reconstructing FLAIR images as conical combinations of softmax layer outputs generated from the corresponding T1, T2, and FLAIR images. Some of the advantages of this model are that it accurately learns to segment lesions regardless of lesion load, and it can be used to quickly and robustly segment new images that were not in the training set. Comparisons with state-of-the-art segmentation methods evaluated on ground truth manual labels indicate that the proposed method works well for generating accurate lesion segmentations without the need for manual annotations.

preprint · 2019 · arXiv

Determining Image Sensor Temperature Using Dark Current

The state-of-the-art method for fingerprinting digital cameras focuses on the non-uniform output of an array of photodiodes due to the distinct construction of the PN junction when excited by photons. This photo-response non-uniformity (PRNU) noise has been shown to be effective but ignores knowledge of image sensor output under equilibrium states without excitation (dark current). The dark current response (DSN) traditionally has been deemed unsuitable as a source of fingerprinting as it is unstable across multiple variables including exposure time and temperature. As such it is currently ignored even though studies have shown it to be a viable method similar to that of PRNU. We hypothesise that DSN is not only a viable method for forensic identification but, through proper analysis of the thermal component, can lead to insights regarding the specific temperature at which an individual image under test was taken. We also show that digital filtering based on the discrete cosine transform, rather than the state-of-the-art wavelet filtering, gives a significant computational gain, albeit with some performance degradation. This approach is beneficial for triage purposes.

preprint · 2019 · arXiv

Reverse Engineering the Raspberry Pi Camera V2: A study of Pixel Non-Uniformity using a Scanning Electron Microscope

In this paper we reverse engineer the Sony IMX219PQ image sensor, otherwise known as the Raspberry Pi Camera v2.0. We provide a visual reference for pixel non-uniformity by analysing variations in transistor length, microlens optic system and in the photodiode. We use these measurements to demonstrate irregularities at the microscopic level and link this to the signal variation measured as pixel non-uniformity used for unique identification of discrete image sensors.

preprint · 2019 · arXiv

Deep Learning-Based Quantification of Pulmonary Hemosiderophages in Cytology Slides

Purpose: Exercise-induced pulmonary hemorrhage (EIPH) is a common syndrome in sport horses with negative impact on performance. Cytology of bronchoalveolar lavage fluid by use of a scoring system is considered the most sensitive diagnostic method. Macrophages are classified depending on the degree of cytoplasmic hemosiderin content. The current gold standard is manual grading, which is however monotonous and time-consuming. Methods: We evaluated state-of-the-art deep learning-based methods for single cell macrophage classification and compared them against the performance of nine cytology experts and evaluated inter- and intra-observer variability. Additionally, we evaluated object detection methods on a novel data set of 17 completely annotated cytology whole slide images (WSI) containing 78,047 hemosiderophages. Results: Our deep learning-based approach reached a concordance of 0.85, partially exceeding human expert concordance (0.68 to 0.86, $\mu = 0.73$, $\sigma = 0.04$). Intra-observer variability was high (0.68 to 0.88) and inter-observer concordance was moderate (Fleiss kappa = 0.67). Our object detection approach has a mean average precision of 0.66 over the five classes from the whole slide gigapixel image and a computation time of below two minutes. Conclusion: To mitigate the high inter- and intra-rater variability, we propose our automated object detection pipeline, enabling accurate, reproducible and quick EIPH scoring in WSI.
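The inter-observer figure quoted in this abstract is a Fleiss kappa. As a rough sketch of how that statistic is computed from a table of rating counts (the function name and the toy tables are hypothetical, and this is not the authors' evaluation code):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who put item i in category j.
    Every item is assumed to be rated by the same number of raters (>= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Per-item agreement: share of rater pairs that agree on that item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in proportions)

    return (observed - expected) / (1.0 - expected)


# Hypothetical example: three raters, two items, two categories.
print(fleiss_kappa([[3, 0], [0, 3]]))  # perfect agreement gives 1.0
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.67 sits between the two.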

preprint · 2019 · arXiv

Approaching Quantum Limited Super-Resolution Imaging without Prior Knowledge of the Object Location

A recently identified class of receivers which demultiplex an optical field into a set of orthogonal spatial modes prior to detection can surpass canonical diffraction limits on spatial resolution for simple incoherent imaging tasks. However, these mode-sorting receivers tend to exhibit high sensitivity to contextual nuisance parameters (e.g., the centroid of a clustered or extended object), raising questions on their viability in realistic imaging scenarios where little or no prior information about the scene is available. We propose a multi-stage passive imaging strategy which segments the total recording time between different physical measurements to build up the required prior information for near quantum-optimal imaging performance at sub-Rayleigh length scales. We show via Monte Carlo simulations that an adaptive two-stage scheme which dynamically allocates the total recording time between a traditional direct detection measurement and a binary mode-sorting receiver outperforms idealized direct detection alone for simple estimation tasks when no prior knowledge of the object centroid is available, achieving one to two orders of magnitude improvement in mean squared error. Our scheme can be generalized for more sophisticated imaging tasks with multiple parameters and minimal prior information.

preprint · 2019 · arXiv

Are Quantitative Features of Lung Nodules Reproducible at Different CT Acquisition and Reconstruction Parameters?

Consistency and duplicability in Computed Tomography (CT) output is essential to quantitative imaging for lung cancer detection and monitoring. This study of CT-detected lung nodules investigated the reproducibility of volume-, density-, and texture-based features (outcome variables) over routine ranges of radiation-dose, reconstruction kernel, and slice thickness. CT raw data of 23 nodules were reconstructed using 320 acquisition/reconstruction conditions (combinations of 4 doses, 10 kernels, and 8 thicknesses). Scans at 12.5%, 25%, and 50% of protocol dose were simulated; reduced-dose and full-dose data were reconstructed using conventional filtered back-projection and iterative-reconstruction kernels at a range of thicknesses (0.6-5.0 mm). Full-dose/B50f kernel reconstructions underwent expert segmentation for reference Region-Of-Interest (ROI) and nodule volume per thickness; each ROI was applied to 40 corresponding images (combinations of 4 doses and 10 kernels). Typical texture analysis metrics (including 5 histogram features, 13 Gray Level Co-occurrence Matrix, 5 Run Length Matrix, 2 Neighboring Gray-Level Dependence Matrix, and 2 Neighborhood Gray-Tone Difference Matrix) were computed per ROI. Reconstruction conditions resulting in no significant change in volume, density, or texture metrics were identified as "compatible pairs" for a given outcome variable. Our results indicate that as thickness increases, volumetric reproducibility decreases, while reproducibility of histogram- and texture-based features across different acquisition and reconstruction parameters improves. In order to achieve concomitant reproducibility of volumetric and radiomic results across studies, balanced standardization of the imaging acquisition parameters is required.

preprint · 2019 · arXiv

Lung segmentation on chest x-ray images in patients with severe abnormal findings using deep learning

Rationale and objectives: Several studies have evaluated the usefulness of deep learning for lung segmentation using chest x-ray (CXR) images with small- or medium-sized abnormal findings. Here, we built a database including both CXR images with severe abnormalities and experts' lung segmentation results, and aimed to evaluate our network's efficacy in lung segmentation from these images. Materials and Methods: For lung segmentation, CXR images from the Japanese Society of Radiological Technology (JSRT, N = 247) and Montgomery (N = 138) databases were included, and 65 additional images depicting severe abnormalities from a public database were evaluated and annotated by a radiologist, thereby adding lung segmentation results to these images. Baseline U-net was used to segment the lungs in images from the three databases. Subsequently, the U-net network architecture was automatically optimized for lung segmentation from CXR images using Bayesian optimization. Dice similarity coefficient (DSC) was calculated to confirm segmentation. Results: Our results demonstrated that using baseline U-net yielded poorer lung segmentation results in our database than those in the JSRT and Montgomery databases, implying that robust segmentation of lungs may be difficult because of severe abnormalities. The DSC values with baseline U-net for the JSRT, Montgomery and our databases were 0.979, 0.941, and 0.889, respectively, and with optimized U-net, 0.976, 0.973, and 0.932, respectively. Conclusion: For robust lung segmentation, the U-net architecture was optimized via Bayesian optimization, and our results demonstrate that the optimized U-net was more robust than baseline U-net in lung segmentation from CXR images with large-sized abnormalities.
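The Dice similarity coefficient used as the metric here is twice the overlap between two masks divided by their combined size. A minimal sketch over flattened binary masks follows; returning 1.0 when both masks are empty is an assumed convention, and the function name is illustrative.

```python
def dice_coefficient(predicted, reference):
    """Dice similarity coefficient between two flattened binary masks."""
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * overlap / total


# Hypothetical 4-pixel masks: one shared foreground pixel out of 2 + 1.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))  # 2 * 1 / 3
```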

preprint · 2019 · arXiv

Full-count PET Recovery from Low-count Image Using a Dilated Convolutional Neural Network

Positron Emission Tomography (PET) is an essential technique in many clinical applications that allows for quantitative imaging at the molecular level. This study aims to develop a denoising method using a novel dilated convolutional neural network to recover full-count images from low-count images. We adopted a similar hierarchical structure from the conventional uNet and incorporated dilated kernels in each convolution to allow the network to observe larger, and perhaps more robust, features within the image. Our dNet was trained alongside a uNet for comparison. Our 2.5D model used a training set (N=30) and testing set (N=5) that were obtained from an ongoing 18F-FDG study. Low-count PET data (10% count) were generated through Poisson thinning from the full listmode file. Both low-count PET and full-count PET were reconstructed with the OSEM algorithm. Objective imaging metrics including mean absolute percent error (MAPE), peak signal-to-noise ratio (PSNR) and structural similarity index metric (SSIM) were used to analyze the denoising methods. Both the uNet and our proposed dNet were successfully trained to synthesize full-count PET images from low-count PET images. Compared to low-count PET, both the uNet and dNet methods significantly improved MAPE, PSNR and SSIM. Our dNet also systematically outperformed uNet on all three metrics and across all testing subjects. This study proposed a novel approach of using dilated convolutions for recovering full-count PET images from low-count PET images. Our dNet significantly outperformed the well-established uNet and demonstrates great potential for denoising low-count PET images.
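Two steps in this abstract are easy to sketch: thinning a list of recorded events to a fraction of the counts, and scoring an image with PSNR. The version below assumes thinning keeps each event independently with a fixed probability and that the PSNR peak is the maximum of the reference image; both are illustrative choices, and the function names are hypothetical rather than taken from the study.

```python
import math
import random


def thin_events(events, keep_fraction, rng):
    """Keep each list-mode event independently with probability keep_fraction.

    Independent thinning of Poisson-distributed counts gives Poisson counts
    with a proportionally lower rate.
    """
    return [event for event in events if rng.random() < keep_fraction]


def psnr(reference, estimate):
    """Peak signal-to-noise ratio in dB between two flattened images.

    The peak is taken as the maximum of the reference (an assumption).
    """
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    peak = max(reference)
    return 10.0 * math.log10(peak ** 2 / mse)


rng = random.Random(0)
full_events = list(range(100_000))           # stand-in for list-mode events
low_count = thin_events(full_events, 0.10, rng)
print(len(low_count) / len(full_events))     # close to 0.10
print(psnr([0.0, 10.0], [1.0, 9.0]))         # mse = 1, peak = 10
```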

preprint · 2019 · arXiv

Deep Model with Siamese Network for Viability and Necrosis Tumor Assessment in Osteosarcoma

Osteosarcoma is the most common primary malignant bone tumor, which has high mortality due to easy lung metastasis. Osteosarcoma is a highly anaplastic, pleomorphic tumor with a variety of tumor cell morphology, including fusiform, oval, epithelial, lymphocyte-like, small round, transparent cells, etc. Due to the multiple patterns of osteosarcoma cell morphology, pathologists differ in their classification (viable tumor, necrotic tumor, non-tumor) of osteosarcoma. Therefore, automatic and accurate recognition algorithms can help pathologists greatly reduce time and improve diagnostic accuracy. In recent years, deep learning technology has made great progress in the field of natural images and medical images, and has achieved excellent results beyond human performance in classification. In this paper, we propose a Deep Model with Siamese Network (DS-Net) for automatic classification in Hematoxylin and Eosin (H&E) stained osteosarcoma histology images.

preprint · 2019 · arXiv

CLCI-Net: Cross-Level fusion and Context Inference Networks for Lesion Segmentation of Chronic Stroke

Segmenting stroke lesions from T1-weighted MR images is of great value for large-scale stroke rehabilitation neuroimaging analyses. Nevertheless, there are great challenges with this task, such as the large range of stroke lesion scales and the tissue intensity similarity. Although the well-known encoder-decoder convolutional neural network has made great achievements in medical image segmentation, it may fail to address these challenges due to insufficient use of multi-scale features and context information. To address these challenges, this paper proposes a Cross-Level fusion and Context Inference Network (CLCI-Net) for chronic stroke lesion segmentation from T1-weighted MR images. Specifically, a Cross-Level feature Fusion (CLF) strategy was developed to make full use of different scale features across different levels. By extending Atrous Spatial Pyramid Pooling (ASPP) with CLF, we have enriched multi-scale features to handle the different lesion sizes. In addition, convolutional long short-term memory (ConvLSTM) is employed to infer context information and thus capture fine structures to address the intensity similarity issue. The proposed approach was evaluated on an open-source dataset, the Anatomical Tracings of Lesions After Stroke (ATLAS), with the results showing that our network outperforms five state-of-the-art methods. We make our code and models available at https://github.com/YH0517/CLCI_Net.

preprint · 2019 · arXiv

D-UNet: a dimension-fusion U shape network for chronic stroke lesion segmentation

Assessing the location and extent of lesions caused by chronic stroke is critical for medical diagnosis, surgical planning, and prognosis. In recent years, with the rapid development of 2D and 3D convolutional neural networks (CNN), the encoder-decoder structure has shown great potential in the field of medical image segmentation. However, the 2D CNN ignores the 3D information of medical images, while the 3D CNN suffers from high computational resource demands. This paper proposes a new architecture called dimension-fusion-UNet (D-UNet), which combines 2D and 3D convolution innovatively in the encoding stage. The proposed architecture achieves a better segmentation performance than 2D networks, while requiring significantly less computation time in comparison to 3D networks. Furthermore, to alleviate the data imbalance issue between positive and negative samples for the network training, we propose a new loss function called Enhance Mixing Loss (EML). This function adds a weighted focal coefficient and combines two traditional loss functions. The proposed method has been tested on the ATLAS dataset and compared to three state-of-the-art methods. The results demonstrate that the proposed method achieves the best quality performance (DSC = 0.5349 ± 0.2763 and precision = 0.6331 ± 0.295).

preprint · 2019 · arXiv

KRISM --- Krylov Subspace-based Optical Computing of Hyperspectral Images

We present an adaptive imaging technique that optically computes a low-rank approximation of a scene's hyperspectral image, conceptualized as a matrix. Central to the proposed technique is the optical implementation of two measurement operators: a spectrally-coded imager and a spatially-coded spectrometer. By iterating between the two operators, we show that the top singular vectors and singular values of a hyperspectral image can be adaptively and optically computed with only a few iterations. We present an optical design that uses pupil plane coding for implementing the two operations and show several compelling results using a lab prototype to demonstrate the effectiveness of the proposed hyperspectral imager.
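Iterating between two measurement operators to find top singular vectors can be pictured with plain power iteration. In the sketch below the hyperspectral image is treated as a matrix `X` with one row per spatial pixel and one column per spectral band, so multiplying by `X` plays the role of a spectrally-coded image and multiplying by its transpose plays the role of a spatially-coded spectrum. This single-vector power iteration is a deliberately simplified stand-in, not the paper's Krylov subspace method, and all names are hypothetical.

```python
import math


def matvec(X, v):
    """Apply X to a spectral code v: one value per spatial pixel."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def rmatvec(X, u):
    """Apply X transposed to a spatial code u: one value per spectral band."""
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]


def normalize(v):
    """Return the unit vector along v and the original length of v."""
    length = math.sqrt(sum(a * a for a in v))
    return [a / length for a in v], length


def top_singular_triplet(X, iterations=50):
    """Estimate the leading singular triplet (u, sigma, v) of X."""
    v, _ = normalize([1.0] * len(X[0]))
    for _step in range(iterations):
        u, _ = normalize(matvec(X, v))    # "image" measurement
        v, _ = normalize(rmatvec(X, u))   # "spectrum" measurement
    u, sigma = normalize(matvec(X, v))
    return u, sigma, v


# Toy 2x2 matrix whose singular values are 3 and 1.
u, sigma, v = top_singular_triplet([[3.0, 0.0], [0.0, 1.0]])
print(round(sigma, 6))  # 3.0
```

Each loop pass uses only two matrix-vector products, which is the property that makes an optical implementation of the two operators attractive.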

preprint · 2019 · arXiv

Spatially Continuous and High-resolution Land Surface Temperature: A Review of Reconstruction and Spatiotemporal Fusion Techniques

Remotely sensed, spatially continuous and high spatiotemporal resolution (hereafter referred to as high resolution) land surface temperature (LST) is a key parameter for studying the thermal environment and has important applications in many fields. However, difficult atmospheric conditions, sensor malfunctioning and scanning gaps between orbits frequently introduce spatial discontinuities into satellite-retrieved LST products. For a single sensor, there is also a trade-off between temporal and spatial resolution and, therefore, it is impossible to obtain high temporal and spatial resolution simultaneously. In recent years the reconstruction and spatiotemporal fusion of LST products have become active research topics that aim at overcoming this limitation. They are two of the most investigated approaches in thermal remote sensing and attract increasing attention, which has resulted in a number of different algorithms. However, to the best of our knowledge, currently no review exists that expatiates and summarizes the available LST reconstruction and spatiotemporal fusion methods and algorithms. This paper introduces the principles and theories behind LST reconstruction and spatiotemporal fusion and provides an overview of the published research and algorithms. We summarized three kinds of reconstruction methods for missing pixels (spatial, temporal and spatiotemporal methods), two kinds of reconstruction methods for cloudy pixels (Satellite Passive Microwave (PMW)-based and Surface Energy Balance (SEB)-based methods) and three kinds of spatiotemporal fusion methods (weighted function-based, unmixing-based and hybrid methods). The review concludes by summarizing validation methods and by identifying some promising future research directions for generating spatially continuous and high resolution LST products.

preprint · 2019 · arXiv

On Space-spectrum Uncertainty Analysis for Coded Aperture Systems

We introduce and analyze the concept of space-spectrum uncertainty for certain commonly-used designs for spectrally programmable cameras. Our key finding states that it is impossible to simultaneously capture high-resolution spatial images while programming the spectrum at high resolution. This phenomenon arises due to a Fourier relationship between the aperture used for obtaining spectrum and its corresponding diffraction blur in the (spatial) image. We show that the product of spatial and spectral standard deviations is lower bounded by $\lambda/(4\pi\nu_0)$ femto square-meters, where $\nu_0$ is the density of grooves in the diffraction grating and $\lambda$ is the wavelength of light. Experiments with a lab prototype for simultaneously measuring spectrum and image validate our findings and their implications for spectral filtering.

preprint · 2019 · arXiv

Inverse Halftoning Through Structure-Aware Deep Convolutional Neural Networks

The primary issue in inverse halftoning is removing noisy dots on flat areas and restoring image structures (e.g., lines, patterns) on textured areas. Hence, a new structure-aware deep convolutional neural network that incorporates two subnetworks is proposed in this paper. One subnetwork is for image structure prediction while the other is for continuous-tone image reconstruction. First, to predict image structures, patch pairs comprising continuous-tone patches and the corresponding halftoned patches generated through digital halftoning are trained. Subsequently, gradient patches are generated by convolving gradient filters with the continuous-tone patches. The subnetwork for the image structure prediction is trained using the mini-batch gradient descent algorithm given the halftoned patches and gradient patches, which are fed into the input and loss layers of the subnetwork, respectively. Next, the predicted map including the image structures is stacked on the top of the input halftoned image through a fusion layer and fed into the image reconstruction subnetwork such that the entire network is trained adaptively to the image structures. The experimental results confirm that the proposed structure-aware network can remove noisy dot-patterns well on flat areas and restore details clearly on textured areas. Furthermore, it is demonstrated that the proposed method surpasses the conventional state-of-the-art methods based on deep convolutional neural networks and locally learned dictionaries.

preprint · 2020 · arXiv

A Physics-Guided Modular Deep-Learning Based Automated Framework for Tumor Segmentation in PET Images

The objective of this study was to develop a PET tumor-segmentation framework that addresses the challenges of limited spatial resolution, high image noise, and lack of clinical training data with ground-truth tumor boundaries in PET imaging. We propose a three-module PET-segmentation framework in the context of segmenting primary tumors in 3D FDG-PET images of patients with lung cancer on a per-slice basis. The first module generates PET images containing highly realistic tumors with known ground-truth using a new stochastic and physics-based approach, addressing lack of training data. The second module trains a modified U-net using these images, helping it learn the tumor-segmentation task. The third module fine-tunes this network using a small-sized clinical dataset with radiologist-defined delineations as surrogate ground-truth, helping the framework learn features potentially missed in simulated tumors. The framework's accuracy, generalizability to different scanners, sensitivity to partial volume effects (PVEs) and efficacy in reducing the number of training images were quantitatively evaluated using Dice similarity coefficient (DSC) and several other metrics. The framework yielded reliable performance in both simulated (DSC: 0.87 (95% CI: 0.86, 0.88)) and patient images (DSC: 0.73 (95% CI: 0.71, 0.76)), outperformed several widely used semi-automated approaches, accurately segmented relatively small tumors (smallest segmented cross-section was 1.83 cm$^2$), generalized across five PET scanners (DSC: 0.74), was relatively unaffected by PVEs, and required low training data (training with data from even 30 patients yielded DSC of 0.70). In conclusion, the proposed framework demonstrated the ability for reliable automated tumor delineation in FDG-PET images of patients with lung cancer.

preprint · 2019 · arXiv

Latent-Space Laplacian Pyramids for Adversarial Representation Learning with 3D Point Clouds

Constructing high-quality generative models for 3D shapes is a fundamental task in computer vision with diverse applications in geometry processing, engineering, and design. Despite the recent progress in deep generative modelling, synthesis of finely detailed 3D surfaces, such as high-resolution point clouds, from scratch has not been achieved with existing approaches. In this work, we propose to employ the latent-space Laplacian pyramid representation within a hierarchical generative model for 3D point clouds. We combine the recently proposed latent-space GAN and Laplacian GAN architectures to form a multi-scale model capable of generating 3D point clouds at increasing levels of detail. Our evaluation demonstrates that our model outperforms the existing generative models for 3D point clouds.

preprint · 2020 · arXiv

High signal-to-noise ratio reconstruction of low bit-depth optical coherence tomography using deep learning

Reducing the bit-depth is an effective approach to lower the cost of optical coherence tomography (OCT) systems and increase the transmission efficiency in data acquisition and telemedicine. However, a low bit-depth will lead to the degeneration of the detection sensitivity and thus reduce the signal-to-noise ratio (SNR) of OCT images. In this paper, we propose to use deep learning for the reconstruction of high SNR OCT images from the low bit-depth acquisition. Its feasibility was preliminarily evaluated by applying the proposed method to the quantized $3\sim8$-bit data from native 12-bit interference fringes. We employed a pixel-to-pixel generative adversarial network architecture in the low to high bit-depth OCT image transition. Retinal OCT data of a healthy subject from a homemade spectral-domain OCT system was used in the study. Extensive qualitative and quantitative results show this deep-learning-based approach could significantly improve the SNR of the low bit-depth OCT images, especially at the choroidal region. Superior similarity and SNR between the reconstructed images and the original 12-bit OCT images could be derived when the bit-depth $\geq 5$. This work demonstrates that the proper integration of OCT and deep learning could benefit the development of healthcare in low-resource settings.
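The quantization step described here, going from native 12-bit samples to 3 to 8 bits, can be sketched as dropping the least significant bits. The abstract does not say how the quantization was performed, so truncation by bit shift and the function names below are assumptions for illustration only.

```python
def reduce_bit_depth(sample, source_bits=12, target_bits=5):
    """Quantize an unsigned integer sample by dropping its low-order bits."""
    return sample >> (source_bits - target_bits)


def restore_scale(code, source_bits=12, target_bits=5):
    """Map a low bit-depth code back onto the original intensity scale."""
    return code << (source_bits - target_bits)


# A 12-bit sample of 1000 becomes the 5-bit code 7 and maps back to 896;
# the error is always smaller than one step of 2**(12 - 5) = 128.
code = reduce_bit_depth(1000)
print(code, restore_scale(code))  # 7 896
```

The information lost in that rounding is what the reconstruction network in the abstract is asked to recover.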

preprint · 2020 · arXiv

DeepFocus: a Few-Shot Microscope Slide Auto-Focus using a Sample Invariant CNN-based Sharpness Function

Autofocus (AF) methods are extensively used in biomicroscopy, for example to acquire timelapses, where the imaged objects tend to drift out of focus. AF algorithms determine an optimal distance by which to move the sample back into the focal plane. Current hardware-based methods require modifying the microscope, and image-based algorithms either rely on many images to converge to the sharpest position or need training data and models specific to each instrument and imaging configuration. Here we propose DeepFocus, an AF method we implemented as a Micro-Manager plugin, and characterize its convolutional neural network-based sharpness function, which we observed to be depth co-variant and sample-invariant. Sample invariance allows our AF algorithm to converge to an optimal axial position within as few as three iterations using a model trained once for use with a wide range of optical microscopes and a single instrument-dependent calibration stack acquisition of a flat (but arbitrary) textured object. From experiments carried out both on synthetic and experimental data, we observed an average precision, given 3 measured images, of 0.30 ± 0.16 micrometers with a 10x, NA 0.3 objective. We foresee that this performance and low image number will help limit photodamage during acquisitions with light-sensitive samples.

preprint · 2020 · arXiv

Boosting rare benthic macroinvertebrates taxa identification with one-class classification

Insect monitoring is crucial for understanding the consequences of rapid ecological changes, but taxa identification currently requires tedious manual expert work and cannot be scaled up efficiently. Deep convolutional neural networks (CNNs) provide a viable way to significantly increase the biomonitoring volumes. However, taxa abundances are typically very imbalanced and the amounts of training images for the rarest classes are simply too low for deep CNNs. As a result, the samples from the rare classes are often completely missed, while detecting them has biological importance. In this paper, we propose combining the trained deep CNN with one-class classifiers to improve the rare species identification. One-class classification models are traditionally trained with much fewer samples and they can provide a mechanism to indicate samples potentially belonging to the rare classes for human inspection. Our experiments confirm that the proposed approach may indeed support moving towards partial automation of the taxa identification task.

preprint · 2019 · arXiv

Flexible Architecture for Real-time Processing of Multiple Video Signals

Simultaneous processing of multiple video sources requires each pixel in a frame from a video source to be processed synchronously with the pixels at the same spatial positions in corresponding frames from the other video sources. However, simultaneous processing is challenging as corresponding frames from different video signals provided by multiple sources have time-varying delay because of the electrical and mechanical restrictions inside the video sources' hardware that cause deviation in the corresponding frame rates. Researchers overcome the aforementioned challenges either by utilizing ready-made video processing systems or designing and implementing a custom system tailored to their specific application. These video processing systems lack flexibility in handling different application requirements such as the required number of video sources and outputs, video standards, or frame rates of the input/output videos. In this paper, we present a design for a flexible simultaneous video processing architecture that is suitable for various applications. The proposed architecture is upgradeable to deal with multiple video standards, scalable to process/produce a variable number of input/output videos, and compatible with most video processors. Moreover, we present in detail the analog/digital mixed-signal and power distribution considerations used in designing the proposed architecture. As a case study application of the proposed flexible architecture, we utilized the architecture for a realization of a simultaneous video processing system that performs video fusion from visible and near-infrared video sources in real time. We make available the source files of the hardware design along with the bill of material (BOM) of the case study to be a reference for researchers who intend to design and implement simultaneous multi-video processing systems.

preprint · 2020 · arXiv

Express Wavenet -- a low parameter optical neural network with random shift wavelet pattern

Express Wavenet is an improved optical diffractive neural network. At each layer, it uses a wavelet-like pattern to modulate the phase of optical waves. For an input image with $n^2$ pixels, Express Wavenet reduces the number of parameters from $O(n^2)$ to $O(n)$. It needs only about one percent of the parameters, and the accuracy is still very high: on the MNIST dataset, it needs only 1229 parameters to reach an accuracy of 92%, while the standard optical network needs 125440 parameters. The random shift wavelets show the characteristics of optical networks more vividly, especially the vanishing gradient phenomenon in the training process. We present a modified expressway structure for this problem. Experiments verified the effect of the random shift wavelet and the expressway structure. Our work shows that an optical diffractive network can use far fewer parameters than other neural networks. The source codes are available at https://github.com/closest-git/ONNet.

preprint · 2020 · arXiv

Comparing Different Deep Learning Architectures for Classification of Chest Radiographs

Chest radiographs are among the most frequently acquired images in radiology and are often the subject of computer vision research. However, most of the models used to classify chest radiographs are derived from openly available deep neural networks, trained on large image datasets. These datasets routinely differ from chest radiographs in that they are mostly color images and contain several possible image classes, while radiographs are grayscale images and often contain fewer image classes. Therefore, very deep neural networks, which can represent more complex relationships in image features, might not be required for the comparatively simpler task of classifying grayscale chest radiographs. We compared fifteen different architectures of artificial neural networks regarding training time and performance on the openly available CheXpert dataset to identify the most suitable models for deep learning tasks on chest radiographs. We showed that smaller networks such as ResNet-34, AlexNet or VGG-16 have the potential to classify chest radiographs as precisely as deeper neural networks such as DenseNet-201 or ResNet-151, while being less computationally demanding.

People in this topic

12 visible researcher(s)