Source author record

Davide Alessandro Coccomini

Davide Alessandro Coccomini appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning physics.ao-ph

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection

Deepfake Generation Techniques are evolving at a rapid pace, making it possible to create realistic manipulated images and videos and endangering the serenity of modern society. The continual emergence of new and varied techniques brings with it a further problem to be faced, namely the ability of deepfake detection models to update themselves promptly in order to be able to identify manipulations carried out using even the most recent methods. This is an extremely complex problem to solve, as training a model requires large amounts of data, which are difficult to obtain if the deepfake generation method is too recent. Moreover, continuously retraining a network would be unfeasible. In this paper, we ask ourselves if, among the various deep learning techniques, there is one that is able to generalise the concept of deepfake to such an extent that it does not remain tied to one or more specific deepfake generation methods used in the training set. We compared a Vision Transformer with an EfficientNetV2 on a cross-forgery context based on the ForgeryNet dataset. From our experiments, It emerges that EfficientNetV2 has a greater tendency to specialize often obtaining better results on training methods while Vision Transformers exhibit a superior generalization ability that makes them more competent even on images generated with new methodologies.

preprint2022arXiv

Predicting Tornadoes days ahead with Machine Learning

Developing methods to predict disastrous natural phenomena is more important than ever, and tornadoes are among the most dangerous ones in nature. Due to the unpredictability of the weather, counteracting them is not an easy task and today it is mainly carried out by expert meteorologists, who interpret meteorological models. In this paper we propose a system for the early detection of a tornado, validating its effectiveness in a real-world context and exploiting meteorological data collection systems that are already widespread throughout the world. Our system was able to predict tornadoes with a maximum probability of 84% up to five days before the event on a novel dataset of more than 5000 tornadic and non-tornadic events. The dataset and the code to reproduce our results are available at: https://tinyurl.com/3brsfwpk

preprint2022arXiv

Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching

With the increased accessibility of web and online encyclopedias, the amount of data to manage is constantly increasing. In Wikipedia, for example, there are millions of pages written in multiple languages. These pages contain images that often lack the textual context, remaining conceptually floating and therefore harder to find and manage. In this work, we present the system we designed for participating in the Wikipedia Image-Caption Matching challenge on Kaggle, whose objective is to use data associated with images (URLs and visual data) to find the correct caption among a large pool of available ones. A system able to perform this task would improve the accessibility and completeness of multimedia content on large online encyclopedias. Specifically, we propose a cascade of two models, both powered by the recent Transformer model, able to efficiently and effectively infer a relevance score between the query image data and the captions. We verify through extensive experimentation that the proposed two-model approach is an effective way to handle a large pool of images and captions while maintaining bounded the overall computational complexity at inference time. Our approach achieves remarkable results, obtaining a normalized Discounted Cumulative Gain (nDCG) value of 0.53 on the private leaderboard of the Kaggle challenge.

Davide Alessandro Coccomini

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection

Predicting Tornadoes days ahead with Machine Learning

Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching