Topic overview

Computer Vision

30606 works73720 researchers

Computation and Language Graphics

Open map Browse papers

Map preview

Start with the graph, then narrow the list

30606works

73720researchers

Next steps

Use the topic as a working map

Open the full map for clusters, then return here to scan ranked papers and people.

Inspect nearby papers, researchers, institutions and communities without opening a separate graph page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2015arXiv

Live Video Synopsis for Multiple Cameras

Video surveillance cameras generate most of recorded video, and there is far more recorded video than operators can watch. Much progress has recently been made using summarization of recorded video, but such techniques do not have much impact on live video surveillance. We assume a camera hierarchy where a Master camera observes the decision-critical region, and one or more Slave cameras observe regions where past activity is important for making the current decision. We propose that when people appear in the live Master camera, the Slave cameras will display their past activities, and the operator could use past information for real-time decision making. The basic units of our method are action tubes, representing objects and their trajectories over time. Our object-based method has advantages over frame based methods, as it can handle multiple people, multiple activities for each person, and can address re-identification uncertainty.

preprint2015arXiv

Event Retrieval Using Motion Barcodes

We introduce a simple and effective method for retrieval of videos showing a specific event, even when the videos of that event were captured from significantly different viewpoints. Appearance-based methods fail in such cases, as appearances change with large changes of viewpoints. Our method is based on a pixel-based feature, "motion barcode", which records the existence/non-existence of motion as a function of time. While appearance, motion magnitude, and motion direction can vary greatly between disparate viewpoints, the existence of motion is viewpoint invariant. Based on the motion barcode, a similarity measure is developed for videos of the same event taken from very different viewpoints. This measure is robust to occlusions common under different viewpoints, and can be computed efficiently. Event retrieval is demonstrated using challenging videos from stationary and hand held cameras.

preprint2015arXiv

An Egocentric Look at Video Photographer Identity

Egocentric cameras are being worn by an increasing number of users, among them many security forces worldwide. GoPro cameras already penetrated the mass market, reporting substantial increase in sales every year. As head-worn cameras do not capture the photographer, it may seem that the anonymity of the photographer is preserved even when the video is publicly distributed. We show that camera motion, as can be computed from the egocentric video, provides unique identity information. The photographer can be reliably recognized from a few seconds of video captured when walking. The proposed method achieves more than 90% recognition accuracy in cases where the random success rate is only 3%. Applications can include theft prevention by locking the camera when not worn by its rightful owner. Searching video sharing services (e.g. YouTube) for egocentric videos shot by a specific photographer may also become possible. An important message in this paper is that photographers should be aware that sharing egocentric video will compromise their anonymity, even when their face is not visible.

preprint2016arXiv

p-DLA: A Predictive System Model for Onshore Oil and Gas Pipeline Dataset Classification and Monitoring - Part 1

With the rise in militant activity and rogue behaviour in oil and gas regions around the world, oil pipeline disturbances is on the increase leading to huge losses to multinational operators and the countries where such facilities exist. However, this situation can be averted if adequate predictive monitoring schemes are put in place. We propose in the first part of this paper, an artificial intelligence predictive monitoring system capable of predictive classification and pattern recognition of pipeline datasets. The predictive system is based on a highly sparse predictive Deviant Learning Algorithm (p-DLA) designed to synthesize a sequence of memory predictive clusters for eventual monitoring, control and decision making. The DLA (p-DLA) is compared with a popular machine learning algorithm, the Long Short-Term Memory (LSTM) which is based on a temporal version of the standard feed-forward back-propagation trained artificial neural networks (ANNs). The results of simulations study show impressive results and validates the sparse memory predictive approach which favours the sub-synthesis of a highly compressed and low dimensional knowledge discovery and information prediction sche

preprint2016arXiv

Automatic labeling of molecular biomarkers of whole slide immunohistochemistry images using fully convolutional networks

This paper addresses the problem of quantifying biomarkers in multi-stained tissues, based on color and spatial information. A deep learning based method that can automatically localize and quantify the cells expressing biomarker(s) in a whole slide image is proposed. The deep learning network is a fully convolutional network (FCN) whose input is the true RGB color image of a tissue and output is a map of the different biomarkers. The FCN relies on a convolutional neural network (CNN) that classifies each cell separately according to the biomarker it expresses. In this study, images of immunohistochemistry (IHC) stained slides were collected and used. More than 4,500 RGB images of cells were manually labeled based on the expressing biomarkers. The labeled cell images were used to train the CNN (obtaining an accuracy of 92% in a test set). The trained CNN is then extended to an FCN that generates a map of all biomarkers in the whole slide image acquired by the scanner (instead of classifying every cell image). To evaluate our method, we manually labeled all nuclei expressing different biomarkers in two whole slide images and used theses as the ground truth. Our proposed method for i

preprint2016arXiv

Video Ladder Networks

We present the Video Ladder Network (VLN) for efficiently generating future video frames. VLN is a neural encoder-decoder model augmented at all layers by both recurrent and feedforward lateral connections. At each layer, these connections form a lateral recurrent residual block, where the feedforward connection represents a skip connection and the recurrent connection represents the residual. Thanks to the recurrent connections, the decoder can exploit temporal summaries generated from all layers of the encoder. This way, the top layer is relieved from the pressure of modeling lower-level spatial and temporal details. Furthermore, we extend the basic version of VLN to incorporate ResNet-style residual blocks in the encoder and decoder, which help improving the prediction results. VLN is trained in self-supervised regime on the Moving MNIST dataset, achieving competitive results while having very simple structure and providing fast inference.

preprint2017arXiv

A robust approach for tree segmentation in deciduous forests using small-footprint airborne LiDAR data

This paper presents a non-parametric approach for segmenting trees from airborne LiDAR data in deciduous forests. Based on the LiDAR point cloud, the approach collects crown information such as steepness and height on-the-fly to delineate crown boundaries, and most importantly, does not require a priori assumptions of crown shape and size. The approach segments trees iteratively starting from the tallest within a given area to the smallest until all trees have been segmented. To evaluate its performance, the approach was applied to the University of Kentucky Robinson Forest, a deciduous closed-canopy forest with complex terrain and vegetation conditions. The approach identified 94% of dominant and co-dominant trees with a false detection rate of 13%. About 62% of intermediate, overtopped, and dead trees were also detected with a false detection rate of 15%. The overall segmentation accuracy was 77%. Correlations of the segmentation scores of the proposed approach with local terrain and stand metrics was not significant, which is likely an indication of the robustness of the approach as results are not sensitive to the differences in terrain and stand structures.

preprint2017arXiv

Image denoising using group sparsity residual and external nonlocal self-similarity prior

Nonlocal image representation has been successfully used in many image-related inverse problems including denoising, deblurring and deblocking. However, a majority of reconstruction methods only exploit the nonlocal self-similarity (NSS) prior of the degraded observation image, it is very challenging to reconstruct the latent clean image. In this paper we propose a novel model for image denoising via group sparsity residual and external NSS prior. To boost the performance of image denoising, the concept of group sparsity residual is proposed, and thus the problem of image denoising is transformed into one that reduces the group sparsity residual. Due to the fact that the groups contain a large amount of NSS information of natural images, we obtain a good estimation of the group sparse coefficients of the original image by the external NSS prior based on Gaussian Mixture model (GMM) learning and the group sparse coefficients of noisy image is used to approximate the estimation. Experimental results have demonstrated that the proposed method not only outperforms many state-of-the-art methods, but also delivers the best qualitative denoising results with finer details and less ringing

preprint2017arXiv

The Geodesic Distance between $\mathcal{G}_I^0$ Models and its Application to Region Discrimination

The $\mathcal{G}_I^0$ distribution is able to characterize different regions in monopolarized SAR imagery. It is indexed by three parameters: the number of looks (which can be estimated in the whole image), a scale parameter and a texture parameter. This paper presents a new proposal for feature extraction and region discrimination in SAR imagery, using the geodesic distance as a measure of dissimilarity between $\mathcal{G}_I^0$ models. We derive geodesic distances between models that describe several practical situations, assuming the number of looks known, for same and different texture and for same and different scale. We then apply this new tool to the problems of (i)~identifying edges between regions with different texture, and (ii)~quantify the dissimilarity between pairs of samples in actual SAR data. We analyze the advantages of using the geodesic distance when compared to stochastic distances.

preprint2017arXiv

Mixed one-bit compressive sensing with applications to overexposure correction for CT reconstruction

When a measurement falls outside the quantization or measurable range, it becomes saturated and cannot be used in classical reconstruction methods. For example, in C-arm angiography systems, which provide projection radiography, fluoroscopy, digital subtraction angiography, and are widely used for medical diagnoses and interventions, the limited dynamic range of C-arm flat detectors leads to overexposure in some projections during an acquisition, such as imaging relatively thin body parts (e.g., the knee). Aiming at overexposure correction for computed tomography (CT) reconstruction, we in this paper propose a mixed one-bit compressive sensing (M1bit-CS) to acquire information from both regular and saturated measurements. This method is inspired by the recent progress on one-bit compressive sensing, which deals with only sign observations. Its successful applications imply that information carried by saturated measurements is useful to improve recovery quality. For the proposed M1bit-CS model, alternating direction methods of multipliers is developed and an iterative saturation detection scheme is established. Then we evaluate M1bit-CS on one-dimensional signal recovery tasks. In s

preprint2017arXiv

Retrieving Similar X-Ray Images from Big Image Data Using Radon Barcodes with Single Projections

The idea of Radon barcodes (RBC) has been introduced recently. In this paper, we propose a content-based image retrieval approach for big datasets based on Radon barcodes. Our method (Single Projection Radon Barcode, or SP-RBC) uses only a few Radon single projections for each image as global features that can serve as a basis for weak learners. This is our most important contribution in this work, which improves the results of the RBC considerably. As a matter of fact, only one projection of an image, as short as a single SURF feature vector, can already achieve acceptable results. Nevertheless, using multiple projections in a long vector will not deliver anticipated improvements. To exploit the information inherent in each projection, our method uses the outcome of each projection separately and then applies more precise local search on the small subset of retrieved images. We have tested our method using IRMA 2009 dataset a with 14,400 x-ray images as part of imageCLEF initiative. Our approach leads to a substantial decrease in the error rate in comparison with other non-learning methods.

preprint2016arXiv

EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras (Extended Abstract)

Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort by possibly needed marker suits, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. We therefore propose a new method for real-time, marker-less and egocentric motion capture which estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras that are attached to a helmet or virtual-reality headset. It combines the strength of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a new automatically annotated and augmented dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, and also crowded scenes.

preprint2017arXiv

Global Hypothesis Generation for 6D Object Pose Estimation

This paper addresses the task of estimating the 6D pose of a known 3D object from a single RGB-D image. Most modern approaches solve this task in three steps: i) Compute local features; ii) Generate a pool of pose-hypotheses; iii) Select and refine a pose from the pool. This work focuses on the second step. While all existing approaches generate the hypotheses pool via local reasoning, e.g. RANSAC or Hough-voting, we are the first to show that global reasoning is beneficial at this stage. In particular, we formulate a novel fully-connected Conditional Random Field (CRF) that outputs a very small number of pose-hypotheses. Despite the potential functions of the CRF being non-Gaussian, we give a new and efficient two-step optimization procedure, with some guarantees for optimality. We utilize our global hypotheses generation procedure to produce results that exceed state-of-the-art for the challenging "Occluded Object Dataset".

preprint2017arXiv

Labeling Topics with Images using Neural Networks

Topics generated by topic models are usually represented by lists of $t$ terms or alternatively using short phrases and images. The current state-of-the-art work on labeling topics using images selects images by re-ranking a small set of candidates for a given topic. In this paper, we present a more generic method that can estimate the degree of association between any arbitrary pair of an unseen topic and image using a deep neural network. Our method has better runtime performance $O(n)$ compared to $O(n^2)$ for the current state-of-the-art method, and is also significantly more accurate.

preprint2016arXiv

Shape Estimation from Defocus Cue for Microscopy Images via Belief Propagation

In recent years, the usefulness of 3D shape estimation is being realized in microscopic or close-range imaging, as the 3D information can further be used in various applications. Due to limited depth of field at such small distances, the defocus blur induced in images can provide information about the 3D shape of the object. The task of `shape from defocus' (SFD), involves the problem of estimating good quality 3D shape estimates from images with depth-dependent defocus blur. While the research area of SFD is quite well-established, the approaches have largely demonstrated results on objects with bulk/coarse shape variation. However, in many cases, objects studied under microscopes often involve fine/detailed structures, which have not been explicitly considered in most methods. In addition, given that, in recent years, large data volumes are typically associated with microscopy related applications, it is also important for such SFD methods to be efficient. In this work, we provide an indication of the usefulness of the Belief Propagation (BP) approach in addressing these concerns for SFD. BP has been known to be an efficient combinatorial optimization approach, and has been e

preprint2016arXiv

Memory Efficient Multi-Scale Line Detector Architecture for Retinal Blood Vessel Segmentation

This paper presents a memory efficient architecture that implements the Multi-Scale Line Detector (MSLD) algorithm for real-time retinal blood vessel detection in fundus images on a Zynq FPGA. This implementation benefits from the FPGA parallelism to drastically reduce the memory requirements of the MSLD from two images to a few values. The architecture is optimized in terms of resource utilization by reusing the computations and optimizing the bit-width. The throughput is increased by designing fully pipelined functional units. The architecture is capable of achieving a comparable accuracy to its software implementation but 70x faster for low resolution images. For high resolution images, it achieves an acceleration by a factor of 323x.

preprint2017arXiv

Visualization Regularizers for Neural Network based Image Recognition

The success of deep neural networks is mostly due their ability to learn meaningful features from the data. Features learned in the hidden layers of deep neural networks trained in computer vision tasks have been shown to be similar to mid-level vision features. We leverage this fact in this work and propose the visualization regularizer for image tasks. The proposed regularization technique enforces smoothness of the features learned by hidden nodes and turns out to be a special case of Tikhonov regularization. We achieve higher classification accuracy as compared to existing regularizers such as the L2 norm regularizer and dropout, on benchmark datasets without changing the training computational complexity.

preprint2017arXiv

Challenges ahead Electron Microscopy for Structural Biology from the Image Processing point of view

Since the introduction of Direct Electron Detectors (DEDs), the resolution and range of macromolecules amenable to this technique has significantly widened, generating a broad interest that explains the well over a dozen reviews in top journal in the last two years. Similarly, the number of job offers to lead EM groups and/or coordinate EM facilities has exploded, and FEI (the main microscope manufacturer for Life Sciences) has received more than 100 orders of high-end electron microscopes by summer 2016. Strategic corporate movements are also happening, with very big players entering the market through key acquisitions (Thermo Fisher has recently bought FEI for \$4.2B), partly attracted by new Pharma interest in the field, now perceived to be in a position to impact structure-based drug design. The scientific perspectives are indeed extremely positive but, in these moments of well-founded generalized optimists, we want to make a reflection on some of the hurdles ahead us, since they certainly exist and they indeed limit the informational content of cryoEM projects. Here we focus on image processing aspects, particularly in the so-called area of Single Particle Analysis, discussing

preprint2015arXiv

EgoSampling: Fast-Forward and Stereo for Egocentric Videos

While egocentric cameras like GoPro are gaining popularity, the videos they capture are long, boring, and difficult to watch from start to end. Fast forwarding (i.e. frame sampling) is a natural choice for faster video browsing. However, this accentuates the shake caused by natural head motion, making the fast forwarded video useless. We propose EgoSampling, an adaptive frame sampling that gives more stable fast forwarded videos. Adaptive frame sampling is formulated as energy minimization, whose optimal solution can be found in polynomial time. In addition, egocentric video taken while walking suffers from the left-right movement of the head as the body weight shifts from one leg to another. We turn this drawback into a feature: Stereo video can be created by sampling the frames from the left most and right most head positions of each step, forming approximate stereo-pairs.

preprint2016arXiv

Action Recognition Based on Joint Trajectory Maps with Convolutional Neural Networks

Convolutional Neural Networks (ConvNets) have recently shown promising performance in many computer vision tasks, especially image-based recognition. How to effectively apply ConvNets to sequence-based data is still an open problem. This paper proposes an effective yet simple method to represent spatio-temporal information carried in $3D$ skeleton sequences into three $2D$ images by encoding the joint trajectories and their dynamics into color distribution in the images, referred to as Joint Trajectory Maps (JTM), and adopts ConvNets to learn the discriminative features for human action recognition. Such an image-based representation enables us to fine-tune existing ConvNets models for the classification of skeleton sequences without training the networks afresh. The three JTMs are generated in three orthogonal planes and provide complimentary information to each other. The final recognition is further improved through multiply score fusion of the three JTMs. The proposed method was evaluated on four public benchmark datasets, the large NTU RGB+D Dataset, MSRC-12 Kinect Gesture Dataset (MSRC-12), G3D Dataset and UTD Multimodal Human Action Dataset (UTD-MHAD) and achieved the state-

preprint2017arXiv

Meta-Unsupervised-Learning: A supervised approach to unsupervised learning

We introduce a new paradigm to investigate unsupervised learning, reducing unsupervised learning to supervised learning. Specifically, we mitigate the subjectivity in unsupervised decision-making by leveraging knowledge acquired from prior, possibly heterogeneous, supervised learning tasks. We demonstrate the versatility of our framework via comprehensive expositions and detailed experiments on several unsupervised problems such as (a) clustering, (b) outlier detection, and (c) similarity prediction under a common umbrella of meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to establish the theoretical foundations of our framework, and show that our framing of meta-clustering circumvents Kleinberg's impossibility theorem for clustering.

preprint2017arXiv

Peak-Piloted Deep Network for Facial Expression Recognition

Objective functions for training of deep networks for face-related recognition tasks, such as facial expression recognition (FER), usually consider each sample independently. In this work, we present a novel peak-piloted deep network (PPDN) that uses a sample with peak expression (easy sample) to supervise the intermediate feature responses for a sample of non-peak expression (hard sample) of the same type and from the same subject. The expression evolving process from non-peak expression to peak expression can thus be implicitly embedded in the network to achieve the invariance to expression intensities. A special purpose back-propagation procedure, peak gradient suppression (PGS), is proposed for network training. It drives the intermediate-layer feature responses of non-peak expression samples towards those of the corresponding peak expression samples, while avoiding the inverse. This avoids degrading the recognition capability for samples of peak expression due to interference from their non-peak expression counterparts. Extensive comparisons on two popular FER datasets, Oulu-CASIA and CK+, demonstrate the superiority of the PPDN over state-ofthe-art FER methods, as well as the

preprint2017arXiv

Towards a Deep Learning Framework for Unconstrained Face Detection

Robust face detection is one of the most important pre-processing steps to support facial expression analysis, facial landmarking, face recognition, pose estimation, building of 3D facial models, etc. Although this topic has been intensely studied for decades, it is still challenging due to numerous variants of face images in real-world scenarios. In this paper, we present a novel approach named Multiple Scale Faster Region-based Convolutional Neural Network (MS-FRCNN) to robustly detect human facial regions from images collected under various challenging conditions, e.g. large occlusions, extremely low resolutions, facial expressions, strong illumination variations, etc. The proposed approach is benchmarked on two challenging face detection databases, i.e. the Wider Face database and the Face Detection Dataset and Benchmark (FDDB), and compared against recent other face detection methods, e.g. Two-stage CNN, Multi-scale Cascade CNN, Faceness, Aggregate Chanel Features, HeadHunter, Multi-view Face Detection, Cascade CNN, etc. The experimental results show that our proposed approach consistently achieves highly competitive results with the state-of-the-art performance against other

preprint2016arXiv

Robust Structure from Motion in the Presence of Outliers and Missing Data

Structure from motion is an import theme in computer vision. Although great progress has been made both in theory and applications, most of the algorithms only work for static scenes and rigid objects. In recent years, structure and motion recovery of non-rigid objects and dynamic scenes have received a lot of attention. In this paper, the state-of-the-art techniques for structure and motion factorization of non-rigid objects are reviewed and discussed. First, an introduction of the structure from motion problem is presented, followed by a general formulation of non-rigid structure from motion. Second, an augmented affined factorization framework, by using homogeneous representation, is presented to solve the registration issue in the presence of outlying and missing data. Third, based on the observation that the reprojection residuals of outliers are significantly larger than those of inliers, a robust factorization strategy with outlier rejection is proposed by means of the reprojection residuals, followed by some comparative experimental evaluations. Finally, some future research topics in non-rigid structure from motion are discussed.

575 works