Source author record

Feras Dayoub

Feras Dayoub appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Robotics, Artificial Intelligence, Human-Computer Interaction, Machine Learning

Catalog footprint

What is connected

17 works

5 topics

4 close collaborators


preprint · 2022 · arXiv

Hyperdimensional Feature Fusion for Out-Of-Distribution Detection

We introduce powerful ideas from Hyperdimensional Computing into the challenging field of Out-of-Distribution (OOD) detection. In contrast to most existing work that performs OOD detection based on only a single layer of a neural network, we use similarity-preserving semi-orthogonal projection matrices to project the feature maps from multiple layers into a common vector space. By repeatedly applying the bundling operation ⊕, we create expressive class-specific descriptor vectors for all in-distribution classes. At test time, a simple and efficient cosine similarity calculation between descriptor vectors consistently identifies OOD samples with better performance than the current state-of-the-art. We show that the hyperdimensional fusion of multiple network layers is critical to achieving the best general performance.
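
A minimal sketch of the fusion idea described above, assuming per-layer features have already been pooled into vectors, that bundling (⊕) is element-wise addition, and that a semi-orthogonal projection can be drawn by Gram-Schmidt on random Gaussian vectors. `HDFFDetector`, `dim=256` and the helper names are hypothetical; this is not the authors' implementation.

```python
import math
import random


def semi_orthogonal(dim_out, dim_in, rng):
    """Return dim_in orthonormal column vectors of length dim_out (Gram-Schmidt on Gaussians)."""
    cols = []
    while len(cols) < dim_in:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim_out)]
        for c in cols:
            dot = sum(a * b for a, b in zip(v, c))
            v = [a - dot * b for a, b in zip(v, c)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            cols.append([a / norm for a in v])
    return cols


def project(cols, feature):
    """Map one layer's feature vector into the shared space: y = P @ feature."""
    out = [0.0] * len(cols[0])
    for weight, col in zip(feature, cols):
        for i, value in enumerate(col):
            out[i] += weight * value
    return out


def bundle(vectors):
    """Bundling (the ⊕ operation), modelled here as element-wise addition."""
    return [sum(values) for values in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


class HDFFDetector:
    def __init__(self, layer_dims, dim=256, seed=0):
        rng = random.Random(seed)
        # One fixed random projection per network layer.
        self.projections = [semi_orthogonal(dim, d, rng) for d in layer_dims]
        self.descriptors = {}

    def encode(self, layer_features):
        # Fuse several layers by projecting each one and bundling the results.
        return bundle([project(p, f) for p, f in zip(self.projections, layer_features)])

    def fit(self, samples):
        # samples: iterable of (label, [feature vector for each layer])
        per_class = {}
        for label, feats in samples:
            per_class.setdefault(label, []).append(self.encode(feats))
        self.descriptors = {c: bundle(v) for c, v in per_class.items()}

    def score(self, layer_features):
        # Higher means more in-distribution; threshold it to flag OOD inputs.
        query = self.encode(layer_features)
        return max(cosine(query, d) for d in self.descriptors.values())
```

Because each projection has orthonormal columns, it preserves inner products within a layer, so the cosine score mainly reflects how well the fused multi-layer pattern matches a class descriptor. The decision threshold on `score` would be dataset-dependent and is not shown.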

preprint · 2021 · arXiv

Class Anchor Clustering: a Loss for Distance-based Open Set Recognition

In open set recognition, deep neural networks encounter object classes that were unknown during training. Existing open set classifiers distinguish between known and unknown classes by measuring distance in a network's logit space, assuming that known classes cluster closer to the training data than unknown classes. However, this approach is applied post-hoc to networks trained with cross-entropy loss, which does not guarantee this clustering behaviour. To overcome this limitation, we introduce the Class Anchor Clustering (CAC) loss. CAC is a distance-based loss that explicitly trains known classes to form tight clusters around anchored class-dependent centres in the logit space. We show that training with CAC achieves state-of-the-art performance for distance-based open set classifiers on all six standard benchmark datasets, with a 15.2% AUROC increase on the challenging TinyImageNet, without sacrificing classification accuracy. We also show that our anchored class centres achieve higher open set performance than learnt class centres, particularly on object-based datasets and on datasets with large numbers of training classes.
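
One plausible form of a CAC-style objective, sketched under assumptions: anchors are fixed scaled one-hot vectors in logit space, the loss combines a tuple term (the true-class distance should be smaller than every other distance) with an anchor term (distance to the true anchor) weighted by `lam`, and the open set score is the distance to the nearest anchor. `magnitude=10.0` and `lam=0.1` are illustrative values, not settings taken from the paper.

```python
import math


def anchors(num_classes, magnitude=10.0):
    """Fixed class centres: a scaled one-hot vector per class (assumed anchor form)."""
    return [[magnitude if i == c else 0.0 for i in range(num_classes)] for c in range(num_classes)]


def distances(logits, centres):
    """Euclidean distance from a logit vector to every class centre."""
    return [math.dist(logits, c) for c in centres]


def cac_loss(logits, label, centres, lam=0.1):
    d = distances(logits, centres)
    # Tuple term: push the true-class distance below every other class distance.
    tuple_term = math.log1p(sum(math.exp(d[label] - d[j]) for j in range(len(d)) if j != label))
    # Anchor term: pull the logits directly towards the true class centre.
    anchor_term = d[label]
    return tuple_term + lam * anchor_term


def open_set_score(logits, centres):
    """Distance to the nearest anchor; larger values suggest an unknown class."""
    return min(distances(logits, centres))
```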

preprint · 2021 · arXiv

Online Monitoring of Object Detection Performance During Deployment

During deployment, an object detector is expected to operate at a performance level similar to that reported on its test dataset. However, when deployed onboard mobile robots that operate under varying and complex environmental conditions, the detector's performance can fluctuate and occasionally degrade severely without warning. Undetected, this can lead the robot to take unsafe and risky actions based on low-quality and unreliable object detections. We address this problem and introduce a cascaded neural network that monitors the performance of the object detector by predicting its mean average precision (mAP) over a sliding window of input frames. The proposed cascaded network exploits the internal features from the deep neural network of the object detector. We evaluate our proposed approach using different combinations of autonomous driving datasets and object detectors.

preprint · 2021 · arXiv

Semantics for Robotic Mapping, Perception and Interaction: A Survey

For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which concerns what the world "means" to a robot and is strongly tied to the question of how to represent that meaning. With humans and robots increasingly operating in the same world, the prospects of human-robot interaction also bring semantics and ontology of natural language into the picture. Driven by need, as well as by enablers like increasing availability of training data and computational resources, semantics is a rapidly growing research area in robotics. The field has received significant attention in the research literature to date, but most reviews and surveys have focused on particular aspects of the topic: the technical research issues regarding its use in specific robotic topics like mapping or segmentation, or its relevance to one particular application domain like autonomous driving. A new treatment is therefore required, and is also timely because so much relevant research has occurred since many of the key surveys were published. This survey therefore provides an overarching snapshot of where semantics in robotics stands today. We establish a taxonomy for semantics research in or relevant to robotics, split into four broad categories of activity in which semantics are extracted, used, or both. Within these broad categories we survey dozens of major topics, including fundamentals from the computer vision field and key robotics research areas utilizing semantics, such as mapping, navigation and interaction with the world. The survey also covers key practical considerations, including enablers like increased data availability and improved computational hardware, and major application areas where...

preprint · 2021 · arXiv

Semi-supervised Keypoint Localization

Knowledge about the locations of keypoints of an object in an image can assist in fine-grained classification and identification tasks, particularly for objects that exhibit large variations in pose that greatly influence their visual appearance, such as wild animals. However, supervised training of a keypoint detection network requires annotating a large image dataset for each animal species, which is a labor-intensive task. To reduce the need for labeled data, we propose to simultaneously learn keypoint heatmaps and pose-invariant keypoint representations in a semi-supervised manner, using a small set of labeled images along with a larger set of unlabeled images. Keypoint representations are learnt with a semantic keypoint consistency constraint that forces the keypoint detection network to learn similar features for the same keypoint across the dataset. Pose invariance is achieved by making the keypoint representations of an image and its augmented copies closer together in feature space. Our semi-supervised approach significantly outperforms previous methods on several benchmarks for human and animal body landmark localization.

preprint · 2021 · arXiv

VarifocalNet: An IoU-aware Dense Object Detector

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an IoU-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, which we call VarifocalNet, or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by about 2.0 AP with different backbones. Our best model, VFNet-X-1200 with Res2Net-101-DCN, achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors. Code is available at https://github.com/hyz-xmaster/VarifocalNet.
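
A sketch of a Varifocal-style loss for a single prediction, assuming a formulation where the target `q` is the IoU with the ground-truth box for positives and 0 for negatives, positives are weighted by `q` itself, and negatives are down-weighted by `alpha * p**gamma`. The defaults `alpha=0.75` and `gamma=2.0` are assumptions here; the official repository is the reference.

```python
import math


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Varifocal-style loss for one predicted IACS p against target q.

    q is the IoU between the predicted box and its ground truth for positives,
    and 0 for negatives.
    """
    p = min(max(p, eps), 1.0 - eps)  # keep the logs finite
    if q > 0:
        # Positives: binary cross-entropy with soft target q, weighted by q.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negatives: focal-style down-weighting of easy negatives by p**gamma.
    return -alpha * (p ** gamma) * math.log(1.0 - p)
```

The asymmetry is the point: easy negatives contribute little, while positives with high IoU targets receive the largest weight, which pushes well-localized boxes towards the top of the ranking.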

preprint · 2020 · arXiv

BenchBot: Evaluating Robotics Research in Photorealistic 3D Simulation and on Real Robots

We introduce BenchBot, a novel software suite for benchmarking the performance of robotics research across both photorealistic 3D simulations and real robot platforms. BenchBot provides a simple interface to the sensorimotor capabilities of a robot when solving robotics research problems; an interface that is consistent regardless of whether the target platform is simulated or a real robot. In this paper we outline the BenchBot system architecture, and explore the parallels between its user-centric design and an ideal research development process devoid of tangential robot engineering challenges. The paper describes the research benefits of using the BenchBot system, including: enhanced capacity to focus solely on research problems, direct quantitative feedback to inform research development, tools for deriving comprehensive performance characteristics, and submission formats which promote shareability and repeatability of research outcomes. BenchBot is publicly available (http://benchbot.org), and we encourage its use in the research community for comprehensively evaluating the simulated and real world performance of novel robotic algorithms.

preprint · 2020 · arXiv

Close-Proximity Underwater Terrain Mapping Using Learning-based Coarse Range Estimation

This paper presents a novel approach to underwater terrain mapping for Autonomous Underwater Vehicles (AUVs) operating in close proximity to complex 3D environments. The proposed methodology creates a probabilistic elevation map of the terrain using a monocular, learning-based scene range estimator as a sensor. This scene range estimator can filter out transient objects such as fish, as well as lighting variations. The mapping approach considers uncertainty in both the estimated scene range and the robot pose as the AUV moves through the environment. The resulting elevation map can be used for reactive path planning and obstacle avoidance, allowing robotic systems to approach the underwater terrain as closely as possible. The performance of our approach is evaluated in a simulated underwater environment by comparing the reconstructed terrain to ground truth reference maps, and is also demonstrated using AUV field data collected within a coral reef environment. The simulation and field results show that the proposed approach is feasible for obstacle detection and range estimation using a monocular camera in reef environments.
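
A minimal sketch of probabilistic elevation fusion, assuming each scene-range estimate has already been converted into a world-frame point `(x, y, z)` whose error is Gaussian with a variance equal to the sum of a range term and a pose term, and that grid cells are independent. `ElevationMap`, `Cell`, `range_var` and `pose_var` are hypothetical names, and the paper's exact uncertainty model may differ.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    height: float = 0.0
    variance: float = float("inf")  # unknown until the first observation


class ElevationMap:
    """Grid of independent per-cell Gaussian height estimates (hypothetical sketch)."""

    def __init__(self, resolution):
        self.resolution = resolution
        self.cells = {}

    def _key(self, x, y):
        return (int(x // self.resolution), int(y // self.resolution))

    def update(self, x, y, z, range_var, pose_var):
        # Treat range and pose errors as independent, so their variances add.
        meas_var = range_var + pose_var
        cell = self.cells.setdefault(self._key(x, y), Cell())
        if cell.variance == float("inf"):
            cell.height, cell.variance = z, meas_var
            return cell
        # One-dimensional Kalman update: variance-weighted blend of old and new.
        gain = cell.variance / (cell.variance + meas_var)
        cell.height += gain * (z - cell.height)
        cell.variance *= 1.0 - gain
        return cell
```

Each fused measurement shrinks the cell variance, so a planner can prefer cells whose height is both low-risk and well observed.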

preprint · 2020 · arXiv

Control of the Final-Phase of Closed-Loop Visual Grasping using Image-Based Visual Servoing

This paper considers the final approach phase of closed-loop visual grasping, where the RGB-D camera is no longer able to provide valid depth information. Many current robotic grasping controllers are not closed-loop and therefore fail for moving objects. Closed-loop grasp controllers based on RGB-D imagery can track a moving object, but fail when the sensor's minimum object distance is violated just before grasping. To overcome this, we propose the use of image-based visual servoing (IBVS) to guide the robot to the object-relative grasp pose using RGB information from the camera. IBVS robustly moves the camera to a goal pose defined implicitly in terms of an image-plane feature configuration. In this work, the goal image feature coordinates are predicted from RGB-D data to enable RGB-only tracking once depth data becomes unavailable; this enables more reliable grasping of previously unseen moving objects. Experimental results are provided.
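
The textbook IBVS control law for point features computes a camera velocity `v = -gain * pinv(L) @ e`, where `e` is the image-feature error and `L` stacks the standard 2x6 interaction matrix of each point. The sketch assumes normalised image coordinates and known feature depths (in the grasping setting these would have to come from the last valid RGB-D data); `ibvs_velocity` and `gain` are hypothetical, and the pseudo-inverse is computed through the normal equations to stay dependency-free.

```python
def interaction_matrix(x, y, z):
    """Standard 2x6 interaction matrix of a normalised image point at depth z."""
    return [
        [-1.0 / z, 0.0, x / z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / z, y / z, 1.0 + y * y, -x * y, -x],
    ]


def solve(a, b):
    """Solve the square system a @ v = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][k] * v[k] for k in range(r + 1, n))) / m[r][r]
    return v


def ibvs_velocity(current, goal, depths, gain=0.5):
    """Camera twist (vx, vy, vz, wx, wy, wz) = -gain * pinv(L) @ e."""
    rows, error = [], []
    for (x, y), (gx, gy), z in zip(current, goal, depths):
        rows.extend(interaction_matrix(x, y, z))
        error.extend([x - gx, y - gy])
    # For a tall, full-rank L, pinv(L) @ e equals (L^T L)^-1 L^T e.
    ltl = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    lte = [sum(r[i] * e for r, e in zip(rows, error)) for i in range(6)]
    return [-gain * v for v in solve(ltl, lte)]
```

For a generic set of four points `L` has full rank, so a uniform image-plane shift of all features is explained by a pure camera translation, and zero error yields zero velocity.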

preprint · 2020 · arXiv

Keypoint-Aligned Embeddings for Image Retrieval and Re-identification

Learning embeddings that are invariant to the pose of the object is crucial in visual image retrieval and re-identification. Existing approaches for person, vehicle, or animal re-identification suffer from high intra-class variance due to deformable shapes and different camera viewpoints. To overcome this limitation, we propose to align the image embedding with a predefined order of the keypoints. The proposed keypoint-aligned embeddings model (KAE-Net) learns part-level features via multi-task learning guided by keypoint locations. More specifically, KAE-Net extracts channels from a feature map activated by a specific keypoint by learning the auxiliary task of heatmap reconstruction for that keypoint. KAE-Net is compact, generic and conceptually simple. It achieves state-of-the-art performance on the CUB-200-2011, Cars196 and VeRi-776 benchmark datasets for retrieval and re-identification tasks.

preprint · 2020 · arXiv

Learning landmark guided embeddings for animal re-identification

Re-identification of individual animals in images can be ambiguous due to subtle variations in body markings between different individuals and the absence of constraints on the poses of animals in the wild. Person re-identification is a similar task, and it has been approached with a deep convolutional neural network (CNN) that learns discriminative embeddings for images of people. However, learning discriminative features for an individual animal is more challenging than for a person's appearance, due to the relatively small size of ecological datasets compared to labelled datasets of person identities. We propose to improve embedding learning by exploiting body landmark information explicitly. Body landmarks are provided to the input of a CNN as confidence heatmaps that can be obtained from a separate body landmark predictor. The model is encouraged to use the heatmaps by learning an auxiliary task of reconstructing the input heatmaps. Body landmarks guide a feature extraction network to learn the representation of a distinctive pattern and its position on the body. We evaluate the proposed method on a large synthetic dataset and a small real dataset. Our method outperforms the same model without body landmark input by 26% and 18% on the synthetic and the real datasets respectively. The method is robust to noise in input coordinates and can tolerate coordinate errors of up to 10% of the image size.
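
The abstract states that landmarks enter the CNN as confidence heatmaps. A common way to build such maps, assumed here rather than taken from the paper, is to render one isotropic Gaussian per landmark and scale its peak by the landmark predictor's confidence; `landmark_heatmaps` and `sigma` are hypothetical.

```python
import math


def landmark_heatmaps(landmarks, height, width, sigma=2.0):
    """Render one confidence map per landmark as an isotropic 2D Gaussian.

    landmarks: list of (x, y, confidence) in pixel coordinates; confidence scales the peak.
    Returns one height x width nested list per landmark, with values in [0, confidence].
    """
    maps = []
    for lx, ly, conf in landmarks:
        heatmap = [
            [conf * math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
            for y in range(height)
        ]
        maps.append(heatmap)
    return maps
```

The resulting maps could then be stacked with the image channels as extra CNN input channels.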

preprint · 2020 · arXiv

Probabilistic Object Detection: Definition and Evaluation

We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality measure (PDQ). Unlike AP-based measures, PDQ has no arbitrary thresholds and rewards spatial and label quality, as well as foreground/background separation quality, while explicitly penalising false positive and false negative detections. We contrast PDQ with the existing mAP and moLRP measures by evaluating state-of-the-art detectors and a Bayesian object detector based on Monte Carlo Dropout. Our experiments indicate that conventional object detectors tend to be spatially overconfident and thus perform poorly on the task of probabilistic object detection. Our paper aims to encourage the development of new object detection approaches that provide detections with accurately estimated spatial and label uncertainties, which are of critical importance for deployment on robots and embodied AI systems in the real world.
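
A simplified, PDQ-style pairwise quality for one detection and one ground-truth object, sketched under stated assumptions: spatial quality is `exp(-(L_FG + L_BG))`, where the foreground loss averages `-log p` over ground-truth pixels and the background loss sums `-log(1 - p)` over detection pixels outside the ground-truth box (both normalised by the object size), label quality is the probability assigned to the true class, and the two are combined by a geometric mean. The full measure also assigns detections to ground truth and aggregates over a dataset; that part is omitted, and this is not the reference implementation.

```python
import math


def pairwise_quality(prob_map, gt_pixels, gt_box, label_probs, gt_label, eps=1e-12):
    """Simplified PDQ-style pairwise quality for one detection/ground-truth pair.

    prob_map: dict {(x, y): probability that the pixel belongs to the detected object}
    gt_pixels: set of (x, y) pixels of the ground-truth object
    gt_box: (x0, y0, x1, y1) inclusive ground-truth bounding box
    label_probs: dict {class_name: probability}
    """
    x0, y0, x1, y1 = gt_box
    n = len(gt_pixels)
    # Foreground loss: penalise low probability on true object pixels.
    fg = -sum(math.log(max(prob_map.get(p, 0.0), eps)) for p in gt_pixels) / n
    # Background loss: penalise probability mass placed outside the ground-truth box.
    bg = -sum(
        math.log(max(1.0 - prob, eps))
        for (x, y), prob in prob_map.items()
        if not (x0 <= x <= x1 and y0 <= y <= y1)
    ) / n
    spatial = math.exp(-(fg + bg))
    label = label_probs.get(gt_label, 0.0)
    # Geometric mean: a pair scores well only if both qualities are high.
    return math.sqrt(spatial * label)
```

An overconfident detection that assigns high probability to pixels outside the object is penalised even when its label is correct, which mirrors the spatial overconfidence finding above.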

preprint · 2020 · arXiv

Robot Navigation in Unseen Spaces using an Abstract Map

Human navigation in built environments depends on symbolic spatial information which has unrealised potential to enhance robot navigation capabilities. Information sources such as labels, signs, maps, planners, spoken directions, and navigational gestures communicate a wealth of spatial information to the navigators of built environments; a wealth of information that robots typically ignore. We present a robot navigation system that uses the same symbolic spatial information employed by humans to purposefully navigate in unseen built environments with a level of performance comparable to humans. The navigation system uses a novel data structure called the abstract map to imagine malleable spatial models for unseen spaces from spatial symbols. Sensorimotor perceptions from a robot are then employed to provide purposeful navigation to symbolic goal locations in the unseen environment. We show how a dynamic system can be used to create malleable spatial models for the abstract map, and provide an open source implementation to encourage future work in the area of symbolic navigation. Symbolic navigation performance of humans and a robot is evaluated in a real-world built environment. The paper concludes with a qualitative analysis of human navigation strategies, providing further insights into how the symbolic navigation capabilities of robots in unseen built environments can be improved in the future.

preprint · 2020 · arXiv

The Robotic Vision Scene Understanding Challenge

Being able to explore an environment and understand the location and type of all objects therein is important for indoor robotic platforms that must interact closely with humans. However, it is difficult to evaluate progress in this area due to a lack of standardized testing, which is hard to establish because it requires active robot agency and perfect object ground truth. To help provide a standard for testing scene understanding systems, we present a new robot vision scene understanding challenge that uses simulation to enable repeatable experiments with active robot agency. We provide two challenging task types, three difficulty levels, five simulated environments and a new evaluation measure for evaluating 3D cuboid object maps. Our aim is to drive state-of-the-art research in scene understanding by enabling the evaluation and comparison of active robotic vision systems.
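
Comparing a 3D cuboid object map against ground truth needs some measure of overlap. As an illustrative building block only (the challenge's own evaluation measure is not reproduced here), this computes intersection over union for axis-aligned cuboids given as `(min_xyz, max_xyz)` corners; oriented cuboids would need a more involved intersection routine.

```python
def cuboid_volume(c):
    """Volume of an axis-aligned cuboid given as (min_xyz, max_xyz); 0 if empty."""
    (x0, y0, z0), (x1, y1, z1) = c
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a, b):
    """Intersection over union of two axis-aligned cuboids."""
    lo = tuple(max(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(min(p, q) for p, q in zip(a[1], b[1]))
    inter = cuboid_volume((lo, hi))
    union = cuboid_volume(a) + cuboid_volume(b) - inter
    return inter / union if union > 0 else 0.0
```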

preprint · 2020 · arXiv

What can robotics research learn from computer vision research?

The computer vision and robotics research communities are each strong. However, progress in computer vision has become turbo-charged in recent years due to big data, GPU computing, novel learning algorithms and a very effective research methodology. By comparison, progress in robotics seems slower. It is true that robotics came later to exploring the potential of learning; the advantages of learning over the well-established body of knowledge in dynamics, kinematics, planning and control are still being debated, although reinforcement learning seems to offer real potential. However, the rapid development of computer vision compared to robotics cannot be attributed only to the former's adoption of deep learning. In this paper, we argue that the gains in computer vision are due to research methodology: evaluation under strict constraints versus experiments, bold numbers versus videos.

preprint · 2015 · arXiv

On the Performance of ConvNet Features for Place Recognition

After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essential, and performance priorities can be different. This paper comprehensively evaluates and compares the utility of three state-of-the-art ConvNets on the problems of particular relevance to robot navigation, namely viewpoint invariance and condition invariance, and, for the first time, enables real-time place recognition performance using ConvNets with large maps by integrating a variety of existing (locality-sensitive hashing) and novel (semantic search space partitioning) optimization techniques. We present extensive experiments on four real-world datasets curated to evaluate each of the specific challenges in place recognition. The results demonstrate that speed-ups of two orders of magnitude can be achieved with minimal accuracy degradation, enabling real-time performance. We confirm that networks trained for semantic place categorization also perform better at (specific) place recognition when faced with severe appearance changes, and provide a reference for which networks and layers are optimal for different aspects of the place recognition problem.
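
Locality-sensitive hashing is one of the speed-ups mentioned. The sketch below uses random-hyperplane hashing, one standard LSH family for cosine similarity; whether the paper used this exact variant is an assumption. Each descriptor becomes a bit code, and candidate places are compared by Hamming distance instead of full floating-point distances; `HyperplaneLSH` and `num_bits` are hypothetical.

```python
import random


class HyperplaneLSH:
    """Random-hyperplane hashing: descriptors separated by a small angle get similar bit codes."""

    def __init__(self, dim, num_bits=64, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def code(self, descriptor):
        # One bit per hyperplane: which side of the plane the descriptor falls on.
        bits = 0
        for i, plane in enumerate(self.planes):
            if sum(p * d for p, d in zip(plane, descriptor)) >= 0.0:
                bits |= 1 << i
        return bits


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def best_match(query_code, database_codes):
    """Index of the stored place whose code is closest in Hamming distance."""
    return min(range(len(database_codes)), key=lambda i: hamming(query_code, database_codes[i]))
```

A real system would bucket codes in hash tables to avoid the linear scan in `best_match`; the scan is kept here for brevity.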

preprint · 2015 · arXiv

Place Categorization and Semantic Mapping on a Mobile Robot

In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot's behaviour during navigation tasks. The system is made available to the community as a ROS module.
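
A minimal discrete Bayes filter over place categories, illustrating how filtering adds temporal coherence to per-frame classifier outputs. It assumes a simple transition model (stay in the same category with probability `stay_prob`, otherwise move uniformly to another) and treats the classifier scores as likelihoods; the paper's domain-knowledge priors are not modelled, and the function name and categories are hypothetical.

```python
def bayes_filter_step(belief, likelihood, stay_prob=0.9):
    """One predict/update step of a discrete Bayes filter over place categories.

    belief, likelihood: dicts {category: probability}; belief is assumed to sum to 1.
    stay_prob: assumed probability that the robot remains in the same category
    between frames; the remainder is spread uniformly over the other categories.
    """
    k = len(belief)
    move = (1.0 - stay_prob) / (k - 1) if k > 1 else 0.0
    # Predict: apply the transition model, which encodes temporal coherence.
    predicted = {c: stay_prob * belief[c] + move * (1.0 - belief[c]) for c in belief}
    # Update: weight by how well each category explains the current frame.
    posterior = {c: predicted[c] * likelihood.get(c, 0.0) for c in predicted}
    total = sum(posterior.values())
    if total == 0.0:
        return predicted
    return {c: p / total for c, p in posterior.items()}
```

A single ambiguous frame after several confident ones shifts the belief without flipping it, which is the smoothing effect such a filter provides.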
