Source author record

Thomas Gilles

Thomas Gilles appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Robotics, Artificial Intelligence, Computation and Language

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Enhanced Behavioral Cloning with Environmental Losses for Self-Driving Vehicles

Learned path planners have attracted research interest due to their ability to model human driving behavior and their rapid inference. Recent works on behavioral cloning show that simple imitation of expert observations is not sufficient to handle complex driving scenarios. Moreover, predictions that land outside drivable areas can lead to potentially dangerous situations. This paper proposes a set of loss functions, namely Social loss and Road loss, which account for modelling risky social interactions in path planning. These losses act as a repulsive scalar field that surrounds non-drivable areas. Predictions that land near these regions incur a higher training cost, which is minimized using backpropagation. This methodology provides additional environment feedback to the traditional supervised learning setup. We validated this approach on a large-scale urban driving dataset. The results show the agent learns to imitate human driving while exhibiting better safety metrics. Furthermore, the proposed methodology has positive effects on inference without the need to artificially generate unsafe driving examples. The explainability study suggests that the benefits obtained are associated with a higher relevance of non-drivable areas in the agent's decisions compared to classical behavioral cloning.
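The idea of a repulsive scalar field around non-drivable areas can be illustrated with a small sketch. This is not the formulation from the paper: the Gaussian decay, the `sigma` parameter, the cell-center representation of non-drivable areas, and the function names `repulsive_penalty` and `road_loss` are all assumptions chosen for illustration.

```python
import math

def repulsive_penalty(point, blocked_cells, sigma=1.0):
    """Penalty in (0, 1]: equal to 1 on a blocked cell, decaying with distance.

    Assumption: non-drivable areas are given as a list of (x, y) cell centers,
    and the field decays as a Gaussian of the distance to the nearest one.
    """
    px, py = point
    nearest_sq = min((px - bx) ** 2 + (py - by) ** 2 for bx, by in blocked_cells)
    return math.exp(-nearest_sq / (2.0 * sigma ** 2))

def road_loss(trajectory, blocked_cells, sigma=1.0):
    """Mean penalty over the predicted waypoints of one trajectory."""
    penalties = [repulsive_penalty(p, blocked_cells, sigma) for p in trajectory]
    return sum(penalties) / len(penalties)

# Hypothetical example: one blocked cell at the origin.
blocked = [(0.0, 0.0)]
near = road_loss([(0.5, 0.0), (0.5, 0.5)], blocked)
far = road_loss([(5.0, 0.0), (5.0, 0.5)], blocked)
print(near > far)  # waypoints closer to the blocked cell cost more
```

In a training setup such a term would typically be added to the imitation loss with a weighting factor, and it would need to be written with a differentiable tensor library for backpropagation; the pure-Python version above only shows the shape of the field.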

Preprint, 2022, arXiv

Information Extraction from Visually Rich Documents with Font Style Embeddings

Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents with approaches combining computer vision, natural language processing and layout representation. We propose to challenge the usage of computer vision in the case where both token style and visual representation are available (i.e., native PDF documents). Our experiments on three real-world complex datasets demonstrate that using an embedding based on token style attributes instead of a raw visual embedding in the LayoutLM model is beneficial. Depending on the dataset, such an embedding yields an improvement of 0.18% to 2.29% in the weighted F1-score with a decrease of 30.7% in the final number of trainable parameters of the model, leading to an improvement in both efficiency and effectiveness.

Preprint, 2022, arXiv

THOMAS: Trajectory Heatmap Output with learned Multi-Agent Sampling

In this paper, we propose THOMAS, a joint multi-agent trajectory prediction framework allowing for an efficient and consistent prediction of multi-agent multi-modal trajectories. We present a unified model architecture for simultaneous agent future heatmap estimation, in which we leverage hierarchical and sparse image generation for fast and memory-efficient inference. We propose a learnable trajectory recombination model that takes as input a set of predicted trajectories for each agent and outputs its consistent reordered recombination. This recombination module is able to realign the initially independent modalities so that they do not collide and are coherent with each other. We report our results on the Interaction multi-agent prediction challenge and rank first on the online test leaderboard.
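The consistency requirement mentioned above (recombined modalities should not collide) can be made concrete with a simple check. The sketch below is not the learned recombination module from the paper; it only counts agent pairs that come too close within one joint modality. The function names, the `min_gap` threshold, and the assumption that all trajectories are sampled at the same time steps are hypothetical.

```python
import math

def trajectories_collide(traj_a, traj_b, min_gap=2.0):
    """True if two agents come within min_gap of each other at a shared time step.

    Assumption: both trajectories are lists of (x, y) points sampled at the
    same time steps, so points are compared index by index.
    """
    return any(math.dist(a, b) < min_gap for a, b in zip(traj_a, traj_b))

def count_colliding_pairs(joint_modality, min_gap=2.0):
    """Number of agent pairs that collide within one joint prediction."""
    n = len(joint_modality)
    return sum(
        trajectories_collide(joint_modality[i], joint_modality[j], min_gap)
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical example: agents a and b cross paths, agent c stays far away.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
c = [(0.0, 10.0), (1.0, 10.0), (2.0, 10.0)]
print(count_colliding_pairs([a, b, c]))
```

A count of zero for every joint modality is one way to express "coherent with each other"; a recombination step would aim to reorder per-agent modalities so that this count goes down.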

Preprint, 2022, arXiv

Uncertainty estimation for Cross-dataset performance in Trajectory prediction

While a lot of work has been carried out on developing trajectory prediction methods, and various datasets have been proposed for benchmarking this task, little study has been done so far on the generalizability and the transferability of these methods across datasets. In this paper, we observe the performance of two of the latest state-of-the-art trajectory prediction methods across four different datasets (Argoverse, NuScenes, Interaction, Shifts). This analysis allows us to gain some insights into the generalizability properties of the most recent trajectory prediction models and to analyze which dataset is more representative of real driving scenes and therefore enables better transferability. Furthermore, we present a novel method to estimate prediction uncertainty and show how it could be used to achieve better performance across datasets.