Source author record

Francesca Odone

Francesca Odone appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Robotics Artificial Intelligence Human-Computer Interaction

Catalog footprint

What is connected

7works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Efficient Unsupervised Learning for Plankton Images

Monitoring plankton populations in situ is fundamental to preserve the aquatic ecosystem. Plankton microorganisms are in fact susceptible of minor environmental perturbations, that can reflect into consequent morphological and dynamical modifications. Nowadays, the availability of advanced automatic or semi-automatic acquisition systems has been allowing the production of an increasingly large amount of plankton image data. The adoption of machine learning algorithms to classify such data may be affected by the significant cost of manual annotation, due to both the huge quantity of acquired data and the numerosity of plankton species. To address these challenges, we propose an efficient unsupervised learning pipeline to provide accurate classification of plankton microorganisms. We build a set of image descriptors exploiting a two-step procedure. First, a Variational Autoencoder (VAE) is trained on features extracted by a pre-trained neural network. We then use the learnt latent space as image descriptor for clustering. We compare our method with state-of-the-art unsupervised approaches, where a set of pre-defined hand-crafted features is used for clustering of plankton images. The proposed pipeline outperforms the benchmark algorithms for all the plankton datasets included in our analysis, providing better image embedding properties.

preprint2022arXiv

FasterVideo: Efficient Online Joint Object Detection And Tracking

Object detection and tracking in videos represent essential and computationally demanding building blocks for current and future visual perception systems. In order to reduce the efficiency gap between available methods and computational requirements of real-world applications, we propose to re-think one of the most successful methods for image object detection, Faster R-CNN, and extend it to the video domain. Specifically, we extend the detection framework to learn instance-level embeddings which prove beneficial for data association and re-identification purposes. Focusing on the computational aspects of detection and tracking, our proposed method reaches a very high computational efficiency necessary for relevant applications, while still managing to compete with recent and state-of-the-art methods as shown in the experiments we conduct on standard object tracking benchmarks

preprint2022arXiv

Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D Object Detection

Monocular 3D object detection continues to attract attention due to the cost benefits and wider availability of RGB cameras. Despite the recent advances and the ability to acquire data at scale, annotation cost and complexity still limit the size of 3D object detection datasets in the supervised settings. Self-supervised methods, on the other hand, aim at training deep networks relying on pretext tasks or various consistency constraints. Moreover, other 3D perception tasks (such as depth estimation) have shown the benefits of temporal priors as a self-supervision signal. In this work, we argue that the temporal consistency on the level of object poses, provides an important supervision signal given the strong prior on physical motion. Specifically, we propose a self-supervised loss which uses this consistency, in addition to render-and-compare losses, to refine noisy pose predictions and derive high-quality pseudo labels. To assess the effectiveness of the proposed method, we finetune a synthetically trained monocular 3D object detection model using the pseudo-labels that we generated on real data. Evaluation on the standard KITTI3D benchmark demonstrates that our method reaches competitive performance compared to other monocular self-supervised and supervised methods.

preprint2020arXiv

Action similarity judgment based on kinematic primitives

Understanding which features humans rely on -- in visually recognizing action similarity is a crucial step towards a clearer picture of human action perception from a learning and developmental perspective. In the present work, we investigate to which extent a computational model based on kinematics can determine action similarity and how its performance relates to human similarity judgments of the same actions. To this aim, twelve participants perform an action similarity task, and their performances are compared to that of a computational model solving the same task. The chosen model has its roots in developmental robotics and performs action classification based on learned kinematic primitives. The comparative experiment results show that both the model and human participants can reliably identify whether two actions are the same or not. However, the model produces more false hits and has a greater selection bias than human participants. A possible reason for this is the particular sensitivity of the model towards kinematic primitives of the presented actions. In a second experiment, human participants' performance on an action identification task indicated that they relied solely on kinematic information rather than on action semantics. The results show that both the model and human performance are highly accurate in an action similarity task based on kinematic-level features, which can provide an essential basis for classifying human actions.

preprint2015arXiv

Real-world Object Recognition with Off-the-shelf Deep Conv Nets: How Many Objects can iCub Learn?

The ability to visually recognize objects is a fundamental skill for robotics systems. Indeed, a large variety of tasks involving manipulation, navigation or interaction with other agents, deeply depends on the accurate understanding of the visual scene. Yet, at the time being, robots are lacking good visual perceptual systems, which often become the main bottleneck preventing the use of autonomous agents for real-world applications. Lately in computer vision, systems that learn suitable visual representations and based on multi-layer deep convolutional networks are showing remarkable performance in tasks such as large-scale visual recognition and image retrieval. To this regard, it is natural to ask whether such remarkable performance would generalize also to the robotic setting. In this paper we investigate such possibility, while taking further steps in developing a computational vision system to be embedded on a robotic platform, the iCub humanoid robot. In particular, we release a new dataset ({\sc iCubWorld28}) that we use as a benchmark to address the question: {\it how many objects can iCub recognize?} Our study is developed in a learning framework which reflects the typical visual experience of a humanoid robot like the iCub. Experiments shed interesting insights on the strength and weaknesses of current computer vision approaches applied in real robotic settings.

preprint2014arXiv

Real-time Automatic Emotion Recognition from Body Gestures

Although psychological research indicates that bodily expressions convey important affective information, to date research in emotion recognition focused mainly on facial expression or voice analysis. In this paper we propose an approach to realtime automatic emotion recognition from body movements. A set of postural, kinematic, and geometrical features are extracted from sequences 3D skeletons and fed to a multi-class SVM classifier. The proposed method has been assessed on data acquired through two different systems: a professionalgrade optical motion capture system, and Microsoft Kinect. The system has been assessed on a "six emotions" recognition problem, and using a leave-one-subject-out cross validation strategy, reached an overall recognition rate of 61.3% which is very close to the recognition rate of 61.9% obtained by human observers. To provide further testing of the system, two games were developed, where one or two users have to interact to understand and express emotions with their body.

preprint2013arXiv

iCub World: Friendly Robots Help Building Good Vision Data-Sets

In this paper we present and start analyzing the iCub World data-set, an object recognition data-set, we acquired using a Human-Robot Interaction (HRI) scheme and the iCub humanoid robot platform. Our set up allows for rapid acquisition and annotation of data with corresponding ground truth. While more constrained in its scopes -- the iCub world is essentially a robotics research lab -- we demonstrate how the proposed data-set poses challenges to current recognition systems. The iCubWorld data-set is publicly available. The data-set can be downloaded from: http://www.iit.it/en/projects/data-sets.html.

Francesca Odone

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Efficient Unsupervised Learning for Plankton Images

FasterVideo: Efficient Online Joint Object Detection And Tracking

Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D Object Detection

Action similarity judgment based on kinematic primitives

Real-world Object Recognition with Off-the-shelf Deep Conv Nets: How Many Objects can iCub Learn?

Real-time Automatic Emotion Recognition from Body Gestures

iCub World: Friendly Robots Help Building Good Vision Data-Sets