Source author record

Coloma Ballester

Coloma Ballester appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, eess.IV

Catalog footprint

What is connected

11 works

3 topics

4 close collaborators

preprint · 2026 · arXiv

TRIMMER: A New Paradigm for Video Summarization through Self-Supervised Reinforcement Learning

The rapid growth of video content across domains such as surveillance, education, and social media has made efficient content understanding increasingly critical. Video summarization addresses this challenge by generating concise yet semantically meaningful representations, but existing approaches often rely on expensive manual annotations, struggle to generalize across domains, and incur significant computational costs due to complex architectures. Moreover, unsupervised and weakly supervised methods typically underperform compared to supervised counterparts in capturing long-range temporal dependencies and semantic structure. In this work, we propose TRIMMER (Temporal Relative Information Maximization for Multi-objective Efficient Reinforcement), a novel self-supervised reinforcement learning framework for video summarization. TRIMMER operates in two stages: it first learns robust representations via self-supervised learning and then performs spatio-temporal decision making through reinforcement learning guided by information-theoretic reward functions. Unlike prior approaches that rely on similarity-based objectives, our method introduces entropy-based metrics to capture higher-order temporal dynamics and semantic diversity, while computing rewards directly over selected frame indices to improve computational efficiency. Extensive experiments on standard benchmarks demonstrate that TRIMMER achieves state-of-the-art performance among unsupervised and self-supervised methods, while remaining competitive with leading supervised approaches, highlighting its effectiveness for scalable and generalizable video summarization.

preprint · 2022 · arXiv

An Analysis of Generative Methods for Multiple Image Inpainting

Image inpainting refers to the restoration of an image with missing regions in a way that is not detectable by the observer. The inpainting regions can be of any size and shape. This is an ill-posed inverse problem that does not have a unique solution. In this work, we focus on learning-based image completion methods for multiple and diverse inpainting, whose goal is to provide a set of distinct solutions for a given damaged image. These methods capitalize on the probabilistic nature of certain generative models to sample various solutions that coherently restore the missing content. Throughout the chapter, we analyze the underlying theory and the recent proposals for multiple inpainting. To investigate the pros and cons of each method, we present quantitative and qualitative comparisons, on common datasets, regarding both the quality and the diversity of the set of inpainted solutions. Our analysis allows us to identify the most successful generative strategies in both inpainting quality and inpainting diversity. This task is closely related to the learning of an accurate probability distribution of images. Depending on the dataset in use, the challenges entailed in training such a model are discussed throughout the analysis.

preprint · 2022 · arXiv

Analysis of Different Losses for Deep Learning Image Colorization

Image colorization aims to add color information to a grayscale image in a realistic way. Recent methods mostly rely on deep learning strategies. While learning to automatically colorize an image, one can define well-suited objective functions related to the desired color output. Some of them are based on a specific type of error between the predicted image and the ground truth one, while other losses rely on the comparison of perceptual properties. But is the choice of the objective function that crucial, i.e., does it play an important role in the results? In this chapter, we aim to answer this question by analyzing the impact of the loss function on the estimated colorization results. To that end, we review the different losses and evaluation metrics that are used in the literature. We then train a baseline network with several of the reviewed objective functions: classic L1 and L2 losses, as well as more complex combinations such as Wasserstein GAN and VGG-based LPIPS loss. Quantitative results show that the models trained with VGG-based LPIPS provide overall slightly better results for most evaluation metrics. Qualitative results exhibit more vivid colors when training with Wasserstein GAN plus the L2 loss or again with the VGG-based LPIPS. Finally, the convenience of quantitative user studies is also discussed to overcome the difficulty of properly assessing colorized images, notably for the case of old archive photographs where no ground truth is available.
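As a rough illustration of the two classic losses named in this abstract, the sketch below computes mean L1 and mean L2 errors over flattened pixel values. The function names and the choice of mean (rather than sum) reduction are assumptions made for this example; the chapter's actual training code is not shown in this record.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth values."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and ground-truth values."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # Hypothetical chrominance values for two pixels.
    predicted = [0.0, 2.0]
    ground_truth = [1.0, 0.0]
    print(l1_loss(predicted, ground_truth))
    print(l2_loss(predicted, ground_truth))
```

Because L2 squares each difference, it penalizes a few large errors more heavily than L1 does, which is one reason the two losses can lead to different colorization results.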

preprint · 2022 · arXiv

Influence of Color Spaces for Deep Learning Image Colorization

Colorization is a process that converts a grayscale image into a color one that looks as natural as possible. Over the years this task has received a lot of attention. Existing colorization methods rely on different color spaces: RGB, YUV, Lab, etc. In this chapter, we aim to study their influence on the results obtained by training a deep neural network, to answer the question: "Is it crucial to choose the right color space in deep learning-based colorization?". First, we briefly summarize the literature and, in particular, deep learning-based methods. We then compare the results obtained with the same deep neural network architecture with RGB, YUV and Lab color spaces. Qualitative and quantitative analyses do not agree on which color space is better. We then show the importance of carefully designing the architecture and evaluation protocols depending on the types of images that are being processed and their specificities: strong/small contours, few/many objects, recent/archive images.
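To make the color-space comparison concrete, here is a minimal sketch of an RGB to YUV conversion. It assumes the analog BT.601 definition of YUV with inputs in [0, 1]; the abstract does not say which YUV variant the chapter uses, and the function name is hypothetical.

```python
from typing import Tuple


def rgb_to_yuv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel (components in [0, 1]) to YUV.

    Assumption: analog BT.601 weights. Y is the luminance, which is the
    grayscale input in colorization; U and V are the chrominance channels
    that a colorization network would have to predict.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


if __name__ == "__main__":
    # A neutral gray pixel carries no chrominance: U and V are (near) zero.
    print(rgb_to_yuv(0.5, 0.5, 0.5))
    # A saturated red pixel has a strongly positive V component.
    print(rgb_to_yuv(1.0, 0.0, 0.0))
```

In YUV and Lab the grayscale input already is one of the output channels, so only two channels remain to be estimated, whereas in RGB all three must be predicted.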

preprint · 2021 · arXiv

Automatic Flare Spot Artifact Detection and Removal in Photographs

Flare spot is one type of flare artifact caused by a number of conditions, frequently provoked by one or more high-luminance sources within or close to the camera field of view. When light rays coming from a high-luminance source reach the front element of a camera, they can produce intra-reflections within camera elements that emerge at the film plane, forming non-image information or flare on the captured image. Even though preventive mechanisms are used, artifacts can appear. In this paper, we propose a robust computational method to automatically detect and remove flare spot artifacts. Our contribution is threefold: firstly, we propose a characterization which is based on intrinsic properties that a flare spot is likely to satisfy; secondly, we define a new confidence measure able to select flare spots among the candidates; and, finally, a method to accurately determine the flare region is given. Then, the detected artifacts are removed by using exemplar-based inpainting. We show that our algorithm achieves top-tier quantitative and qualitative performance.

preprint · 2020 · arXiv

Always Look on the Bright Side of the Field: Merging Pose and Contextual Data to Estimate Orientation of Soccer Players

Although orientation has proven to be a key skill of soccer players in order to succeed in a broad spectrum of plays, body orientation is a still little-explored area in sports analytics research. Despite being an inherently ambiguous concept, player orientation can be defined as the projection (2D) of the normal vector placed in the center of the upper-torso of players (3D). This research presents a novel technique to obtain player orientation from monocular video recordings by mapping pose parts (shoulders and hips) in a 2D field by combining OpenPose with a super-resolution network, and merging the obtained estimation with contextual information (ball position). Results have been validated with player-held EPTS devices, obtaining a median error of 27 degrees per player. Moreover, three novel types of orientation maps are proposed in order to make raw orientation data easy to visualize and understand, thus allowing further analysis at team- or player-level.
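The geometric definition in this abstract (a 2D normal to the upper torso) can be sketched from two shoulder positions already projected onto the field. This is only an illustration under stated assumptions: field coordinates with x to the right and y up, a hypothetical function name, and shoulders alone. The paper's method also uses hips and ball position, which are not modeled here.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def torso_orientation_deg(left_shoulder: Point, right_shoulder: Point) -> float:
    """Facing direction in degrees, measured counterclockwise from the +x axis.

    Assumption: 2D field coordinates with x to the right and y up. The facing
    direction is taken as the left-to-right shoulder vector rotated by +90
    degrees, so a player whose right shoulder points east faces north.
    """
    dx = right_shoulder[0] - left_shoulder[0]
    dy = right_shoulder[1] - left_shoulder[1]
    if dx == 0 and dy == 0:
        raise ValueError("shoulders coincide; orientation is undefined")
    # Rotating (dx, dy) by +90 degrees gives the normal (-dy, dx).
    facing_x, facing_y = -dy, dx
    return math.degrees(math.atan2(facing_y, facing_x)) % 360.0


if __name__ == "__main__":
    # Left shoulder to the west, right shoulder to the east: facing north.
    print(torso_orientation_deg((-1.0, 0.0), (1.0, 0.0)))
```

An angular error such as the reported median of 27 degrees would then be the smallest absolute difference between an estimated angle of this kind and the reference angle, taken modulo 360.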

preprint · 2020 · arXiv

ChromaGAN: Adversarial Picture Colorization with Semantic Class Distribution

The colorization of grayscale images is an ill-posed problem, with multiple correct solutions. In this paper, we propose an adversarial learning colorization approach coupled with semantic information. A generative network is used to infer the chromaticity of a given grayscale image conditioned on semantic clues. This network is framed in an adversarial model that learns to colorize by incorporating perceptual and semantic understanding of color and class distributions. The model is trained via a fully self-supervised strategy. Qualitative and quantitative results show the capacity of the proposed method to colorize images in a realistic way, achieving state-of-the-art results.

preprint · 2020 · arXiv

History-based Anomaly Detector: an Adversarial Approach to Anomaly Detection

Anomaly detection is a difficult problem in many areas and has recently received a lot of attention. Classifying unseen data as anomalous is a challenging matter. Recently proposed methods rely on Generative Adversarial Networks (GANs) to estimate the normal data distribution and produce an anomaly score prediction for any given data. In this article, we propose a simple yet new adversarial method to tackle this problem, denoted as History-based anomaly detector (HistoryAD). It consists of a self-supervised model, trained to recognize 'normal' samples by comparing them to samples based on the training history of a previously trained GAN. Quantitative and qualitative results are presented evaluating its performance. We also present a comparison to several state-of-the-art methods for anomaly detection showing that our proposal achieves top-tier results on several datasets.

preprint · 2020 · arXiv

Using Player's Body-Orientation to Model Pass Feasibility in Soccer

Given a monocular video of a soccer match, this paper presents a computational model to estimate the most feasible pass at any given time. The method leverages offensive players' orientation (plus their location) and opponents' spatial configuration to compute the feasibility of pass events between players of the same team. Orientation data is gathered from body pose estimations that are properly projected onto the 2D game field; moreover, a geometrical solution is provided, through the definition of a feasibility measure, to determine which players are better oriented towards each other. After analyzing more than 6000 pass events, results show that, by including orientation as a feasibility measure, a robust computational model can be built, reaching more than 0.7 Top-3 accuracy. Finally, the combination of the orientation feasibility measure with the recently introduced Expected Possession Value metric is studied; promising results are obtained, thus showing that existing models can be refined by using orientation as a key feature. These models could help both coaches and analysts to have a better understanding of the game and to improve the players' decision-making process.
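The Top-3 accuracy reported above can be read as the fraction of pass events in which the actual receiver is among the three candidates ranked most feasible. The sketch below assumes this standard top-k definition; the function name, the player identifiers, and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[Hashable]],
    true_receivers: Sequence[Hashable],
    k: int = 3,
) -> float:
    """Fraction of events whose true receiver is among the k best-ranked candidates.

    Each entry of ranked_candidates lists teammates for one pass event,
    ordered from most to least feasible.
    """
    if len(ranked_candidates) != len(true_receivers) or len(true_receivers) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for ranking, receiver in zip(ranked_candidates, true_receivers)
        if receiver in ranking[:k]
    )
    return hits / len(true_receivers)


if __name__ == "__main__":
    # Three made-up pass events with shirt numbers as player identifiers.
    rankings = [[7, 9, 4, 2], [3, 5, 8, 1], [6, 2, 9, 4]]
    receivers = [4, 1, 6]
    print(top_k_accuracy(rankings, receivers, k=3))
```

In this made-up sample the true receiver falls inside the top three for two of the three events.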

preprint · 2016 · arXiv

A Computational Model for Amodal Completion

This paper presents a computational model to recover the most likely interpretation of the 3D scene structure from a planar image, where some objects may occlude others. The estimated scene interpretation is obtained by integrating some global and local cues and provides both the complete disoccluded objects that form the scene and their ordering according to depth. Our method first computes several distal scenes which are compatible with the proximal planar image. To compute these different hypothesized scenes, we propose a perceptually inspired object disocclusion method, which works by minimizing Euler's elastica as well as by incorporating the relatability of partially occluded contours and the convexity of the disoccluded objects. Then, to estimate the preferred scene, we rely on a Bayesian model and define probabilities taking into account the global complexity of the objects in the hypothesized scenes as well as the effort of bringing these objects into their relative position in the planar image, which is also measured by an Euler's elastica-based quantity. The model is illustrated with numerical experiments on both synthetic and real images, showing the ability of our model to reconstruct the occluded objects and the preferred perceptual order among them. We also present results on images of the Berkeley dataset with provided figure-ground ground-truth labeling.
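For reference, Euler's elastica is usually written as the curve energy below, where γ is a candidate completion of an occluded contour, κ its curvature, and s its arc length. The weights α and β are generic symbols assumed here; the abstract does not give the paper's exact functional, which also involves relatability and convexity terms.

```latex
E(\gamma) = \int_{\gamma} \left( \alpha + \beta\, \kappa(s)^{2} \right) \, ds,
\qquad \alpha, \beta > 0
```

The α term penalizes length and the β term penalizes bending, so minimizers tend to be short, smooth contours.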

preprint · 2016 · arXiv

FALDOI: A new minimization strategy for large displacement variational optical flow

We propose a large displacement optical flow method that introduces a new strategy to compute a good local minimum of any optical flow energy functional. The method requires a given set of discrete matches, which can be extremely sparse, and an energy functional which locally guides the interpolation from those matches. In particular, the matches are used to guide a structured coordinate-descent of the energy functional around these keypoints. It results in a two-step minimization method at the finest scale which is very robust to the inevitable outliers of the sparse matcher and able to capture large displacements of small objects. Its benefits over other variational methods that also rely on a set of sparse matches are its robustness against very few matches, high levels of noise and outliers. We validate our proposal using several optical flow variational models. The results consistently outperform the coarse-to-fine approaches and achieve good qualitative and quantitative performance on the standard optical flow benchmarks.
