Source author record

Rémi Flamary

Rémi Flamary appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.OC astro-ph.IM math.ST Statistics Theory Computation Computer Vision eess.SP Sound

Catalog footprint

What is connected

23works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Geometric Dictionary Learning of Dynamical Systems with Optimal Transport

Learning dynamical systems through operator-theoretic representations provides a powerful framework for analyzing complex dynamics, as spectral quantities such as eigenvalues and invariant structures encode characteristic time scales and long-term behavior. However, dynamical operators are typically estimated independently for each system, preventing the discovery of shared structure across related dynamics. To address this limitation, we posit that related dynamical systems lie near a low-dimensional manifold in spectral operator space. Based on this hypothesis, we introduce DOODL (Dynamical OperatOr Dictionary Learning), a framework that learns a dictionary of characteristic spectral dynamics whose combinations approximate this manifold and yield compact, interpretable embeddings of individual systems. Beyond representation learning, DOODL enables fast and interpretable operator estimation from short and partially observed trajectories by constraining the estimation to the learned operator manifold. Experiments on metastable Langevin dynamics and turbulent plasma simulations demonstrate that DOODL scales to highly complex multiscale regimes while capturing characteristic spectral structure governing the dynamics rather than merely fitting trajectories, achieving errors one to two orders of magnitude lower than independent operator estimation methods in challenging low-data regimes.

preprint2026arXiv

MSAlign: Aligning Molecule and Mass Spectra Foundation Models for Metabolite Identification

Accurately identifying metabolites i.e. small molecules from mass spectrometry data remains a core challenge in metabolomics, with broad applications in drug discovery, environmental analysis, and clinical research. We address the Molecule Retrieval task, which consists in recovering the chemical structure of a metabolite from its MS/MS spectrum given a set of candidate molecules. While the recent release of benchmark datasets such as MassSpecGym and Spectraverse has considerably accelerated the development of novel machine learning approaches, the complexity of data preprocessing pipelines and the lack of unified implementations make methods and results difficult to reproduce and compare. We make three contributions. First, we propose a unified framework encompassing recent approaches based on representation alignment and contrastive learning. Second, we introduce MSAlign, inspired by multimodal alignment in vision-language models, which learns a shared representation space by aligning two frozen foundation models (DreaMS for mass spectra and ChemBERTa for molecules) through lightweight MLP projections trained with a candidate-based contrastive objective. MSAlign is simple to implement, fast to train and consistently outperforms existing approaches across all benchmarks. Third, we investigate a long-standing evaluation problem: data splitting strategies in molecule retrieval implicitly trade off data leakage against domain shift. We formalize this tension by introducing a quantitative measure of distribution shift, and use it to evaluate splitting strategies in existing benchmarks. All datasets, splits, candidate sets, and a unified implementation of MSAlign and baselines are publicly released to support reproducible research.

preprint2022arXiv

Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters

This paper introduces a novel and generic framework to solve the flagship task of supervised labeled graph prediction by leveraging Optimal Transport tools. We formulate the problem as regression with the Fused Gromov-Wasserstein (FGW) loss and propose a predictive model relying on a FGW barycenter whose weights depend on inputs. First we introduce a non-parametric estimator based on kernel ridge regression for which theoretical results such as consistency and excess risk bound are proved. Next we propose an interpretable parametric model where the barycenter weights are modeled with a neural network and the graphs on which the FGW barycenter is calculated are additionally learned. Numerical experiments show the strength of the method and its ability to interpolate in the labeled graph space on simulated data and on a difficult metabolic identification problem where it can reach very good performance with very little engineering.

preprint2022arXiv

Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport

The problem of domain adaptation on an unlabeled target dataset using knowledge from multiple labelled source datasets is becoming increasingly important. A key challenge is to design an approach that overcomes the covariate and target shift both among the sources, and between the source and target domains. In this paper, we address this problem from a new perspective: instead of looking for a latent representation invariant between source and target domains, we exploit the diversity of source distributions by tuning their weights to the target task at hand. Our method, named Weighted Joint Distribution Optimal Transport (WJDOT), aims at finding simultaneously an Optimal Transport-based alignment between the source and target distributions and a re-weighting of the sources distributions. We discuss the theoretical aspects of the method and propose a conceptually simple algorithm. Numerical experiments indicate that the proposed method achieves state-of-the-art performance on simulated and real-life datasets.

preprint2022arXiv

Semi-relaxed Gromov-Wasserstein divergence with applications on graphs

Comparing structured objects such as graphs is a fundamental operation involved in many learning tasks. To this end, the Gromov-Wasserstein (GW) distance, based on Optimal Transport (OT), has proven to be successful in handling the specific nature of the associated objects. More specifically, through the nodes connectivity relations, GW operates on graphs, seen as probability measures over specific spaces. At the core of OT is the idea of conservation of mass, which imposes a coupling between all the nodes from the two considered graphs. We argue in this paper that this property can be detrimental for tasks such as graph dictionary or partition learning, and we relax it by proposing a new semi-relaxed Gromov-Wasserstein divergence. Aside from immediate computational benefits, we discuss its properties, and show that it can lead to an efficient graph dictionary learning algorithm. We empirically demonstrate its relevance for complex tasks on graphs such as partitioning, clustering and completion.

preprint2022arXiv

Sliding window strategy for convolutional spike sorting with Lasso : Algorithm, theoretical guarantees and complexity

Spike sorting is a class of algorithms used in neuroscience to attribute the time occurences of particular electric signals, called action potential or spike, to neurons. We rephrase this problem as a particular optimization problem : Lasso for convolutional models in high dimension. Lasso (i.e. least absolute shrinkage and selection operator) is a very generic tool in machine learning that help us to look for sparse solutions (here the time occurrences). However, for the size of the problem at hand in this neuroscience context, the classical Lasso solvers are failing. We present here a new and much faster algorithm. Making use of biological properties related to neurons, we explain how the particular structure of the problem allows several optimizations, leading to an algorithm with a temporal complexity which grows linearly with respect to the size of the recorded signal and can be performed online. Moreover the spatial separability of the initial problem allows to break it into subproblems, further reducing the complexity and making possible its application on the latest recording devices which comprise a large number of sensors. We provide several mathematical results: the size and numerical complexity of the subproblems can be estimated mathematically by using percolation theory. We also show under reasonable assumptions that the Lasso estimator retrieves the true time occurrences of the spikes {with large probability}. Finally the theoretical time complexity of the algorithm is given. Numerical simulations are also provided in order to illustrate the efficiency of our approach.

preprint2022arXiv

Template based Graph Neural Network with Optimal Transport Distances

Current Graph Neural Networks (GNN) architectures generally rely on two important components: node features embedding through message passing, and aggregation with a specialized form of pooling. The structural (or topological) information is implicitly taken into account in these two steps. We propose in this work a novel point of view, which places distances to some learnable graph templates at the core of the graph representation. This distance embedding is constructed thanks to an optimal transport distance: the Fused Gromov-Wasserstein (FGW) distance, which encodes simultaneously feature and structure dissimilarities by solving a soft graph-matching problem. We postulate that the vector of FGW distances to a set of template graphs has a strong discriminative power, which is then fed to a non-linear classifier for final predictions. Distance embedding can be seen as a new layer, and can leverage on existing message passing techniques to promote sensible feature representations. Interestingly enough, in our work the optimal set of template graphs is also learnt in an end-to-end fashion by differentiating through this layer. After describing the corresponding learning procedure, we empirically validate our claim on several synthetic and real life graph classification datasets, where our method is competitive or surpasses kernel and GNN state-of-the-art approaches. We complete our experiments by an ablation study and a sensitivity analysis to parameters.

preprint2021arXiv

Minibatch optimal transport distances; analysis and applications

Optimal transport distances have become a classic tool to compare probability distributions and have found many applications in machine learning. Yet, despite recent algorithmic developments, their complexity prevents their direct use on large scale datasets. To overcome this challenge, a common workaround is to compute these distances on minibatches i.e. to average the outcome of several smaller optimal transport problems. We propose in this paper an extended analysis of this practice, which effects were previously studied in restricted cases. We first consider a large variety of Optimal Transport kernels. We notably argue that the minibatch strategy comes with appealing properties such as unbiased estimators, gradients and a concentration bound around the expectation, but also with limits: the minibatch OT is not a distance. To recover some of the lost distance axioms, we introduce a debiased minibatch OT function and study its statistical and optimisation properties. Along with this theoretical analysis, we also conduct empirical experiments on gradient flows, generative adversarial networks (GANs) or color transfer that highlight the practical interest of this strategy.

preprint2021arXiv

Representation Transfer by Optimal Transport

Learning generic representations with deep networks requires massive training samples and significant computer resources. To learn a new specific task, an important issue is to transfer the generic teacher's representation to a student network. In this paper, we propose to use a metric between representations that is based on a functional view of neurons. We use optimal transport to quantify the match between two representations, yielding a distance that embeds some invariances inherent to the representation of deep networks. This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher. Our approach can be used in any learning context where representation transfer is applicable. We experiment here on two standard settings: inductive transfer learning, where the teacher's representation is transferred to a student network of same architecture for a new related task, and knowledge distillation, where the teacher's representation is transferred to a student of simpler architecture for the same task (model compression). Our approach also lends itself to solving new learning problems; we demonstrate this by showing how to directly transfer the teacher's representation to a simpler architecture student for a new related task.

preprint2021arXiv

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, {\em i.e.} minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated to domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.

preprint2016arXiv

Importance sampling strategy for non-convex randomized block-coordinate descent

As the number of samples and dimensionality of optimization problems related to statistics an machine learning explode, block coordinate descent algorithms have gained popularity since they reduce the original problem to several smaller ones. Coordinates to be optimized are usually selected randomly according to a given probability distribution. We introduce an importance sampling strategy that helps randomized coordinate descent algorithms to focus on blocks that are still far from convergence. The framework applies to problems composed of the sum of two possibly non-convex terms, one being separable and non-smooth. We have compared our algorithm to a full gradient proximal approach as well as to a randomized block coordinate algorithm that considers uniform sampling and cyclic block coordinate descent. Experimental evidences show the clear benefit of using an importance sampling strategy.

preprint2016arXiv

Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions

In this paper, we tackle the question of discovering an effective set of spatial filters to solve hyperspectral classification problems. Instead of fixing a priori the filters and their parameters using expert knowledge, we let the model find them within random draws in the (possibly infinite) space of possible filters. We define an active set feature learner that includes in the model only features that improve the classifier. To this end, we consider a fast and linear classifier, multiclass logistic classification, and show that with a good representation (the filters discovered), such a simple classifier can reach at least state of the art performances. We apply the proposed active set learner in four hyperspectral image classification problems, including agricultural and urban classification at different resolutions, as well as multimodal data. We also propose a hierarchical setting, which allows to generate more complex banks of features that can better describe the nonlinearities present in the data.

preprint2016arXiv

Optimal spectral transportation with application to music transcription

Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from a frequency bin to another as well as variations of timber can disproportionally harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically-related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turns gives ground to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data.

preprint2016arXiv

Optimal Transport for Domain Adaptation

Domain adaptation from one data space (or domain) to another is one of the most challenging tasks of modern data analytics. If the adaptation is done correctly, models built on a specific data space become more robust when confronted to data depicting the same semantic concepts (the classes), but observed by another observation system with its own specificities. Among the many strategies proposed to adapt a domain to another, finding a common representation has shown excellent properties: by finding a common representation for both domains, a single classifier can be effective in both and use labelled samples from the source domain to predict the unlabelled samples of the target domain. In this paper, we propose a regularized unsupervised optimal transportation model to perform the alignment of the representations in the source and target domains. We learn a transportation plan matching both PDFs, which constrains labelled samples in the source domain to remain close during transport. This way, we exploit at the same time the few labeled information in the source and the unlabelled distributions observed in both domains. Experiments in toy and challenging real visual adaptation examples show the interest of the method, that consistently outperforms state of the art approaches.

preprint2015arXiv

Distributed image reconstruction for very large arrays in radio astronomy

Current and future radio interferometric arrays such as LOFAR and SKA are characterized by a paradox. Their large number of receptors (up to millions) allow theoretically unprecedented high imaging resolution. In the same time, the ultra massive amounts of samples makes the data transfer and computational loads (correlation and calibration) order of magnitudes too high to allow any currently existing image reconstruction algorithm to achieve, or even approach, the theoretical resolution. We investigate here decentralized and distributed image reconstruction strategies which select, transfer and process only a fraction of the total data. The loss in MSE incurred by the proposed approach is evaluated theoretically and numerically on simple test cases.

preprint2015arXiv

Generalized conditional gradient: analysis of convergence and applications

The objectives of this technical report is to provide additional results on the generalized conditional gradient methods introduced by Bredies et al. [BLM05]. Indeed , when the objective function is smooth, we provide a novel certificate of optimality and we show that the algorithm has a linear convergence rate. Applications of this algorithm are also discussed.

preprint2015arXiv

Non-convex Regularizations for Feature Selection in Ranking With Sparse SVM

Feature selection in learning to rank has recently emerged as a crucial issue. Whereas several preprocessing approaches have been proposed, only a few works have been focused on integrating the feature selection into the learning process. In this work, we propose a general framework for feature selection in learning to rank using SVM with a sparse regularization term. We investigate both classical convex regularizations such as $\ell\_1$ or weighted $\ell\_1$ and non-convex regularization terms such as log penalty, Minimax Concave Penalty (MCP) or $\ell\_p$ pseudo norm with $p\textless{}1$. Two algorithms are proposed, first an accelerated proximal approach for solving the convex problems, second a reweighted $\ell\_1$ scheme to address the non-convex regularizations. We conduct intensive experiments on nine datasets from Letor 3.0 and Letor 4.0 corpora. Numerical results show that the use of non-convex regularizations we propose leads to more sparsity in the resulting models while prediction performance is preserved. The number of features is decreased by up to a factor of six compared to the $\ell\_1$ regularization. In addition, the software is publicly available on the web.

preprint2014arXiv

Mixed-norm Regularization for Brain Decoding

This work investigates the use of mixed-norm regularization for sensor selection in Event-Related Potential (ERP) based Brain-Computer Interfaces (BCI). The classification problem is cast as a discriminative optimization framework where sensor selection is induced through the use of mixed-norms. This framework is extended to the multi-task learning situation where several similar classification tasks related to different subjects are learned simultaneously. In this case, multi-task learning helps in leveraging data scarcity issue yielding to more robust classifiers. For this purpose, we have introduced a regularizer that induces both sensor selection and classifier similarities. The different regularization approaches are compared on three ERP datasets showing the interest of mixed-norm regularization in terms of sensor selection. The multi-task approaches are evaluated when a small number of learning examples are available yielding to significant performance improvements especially for subjects performing poorly.

preprint2014arXiv

Optimization of starshades: focal plane versus pupil plane

We search for the best possible transmission for an external occulter coronagraph that is dedicated to the direct observation of terrestrial exoplanets. We show that better observation conditions are obtained when the flux in the focal plane is minimized in the zone in which the exoplanet is observed, instead of the total flux received by the telescope. We describe the transmission of the occulter as a sum of basis functions. For each element of the basis, we numerically computed the Fresnel diffraction at the aperture of the telescope and the complex amplitude at its focus. The basis functions are circular disks that are linearly apodized over a few centimeters (truncated cones). We complemented the numerical calculation of the Fresnel diffraction for these functions by a comparison with pure circular discs (cylinder) for which an analytical expression, based on a decomposition in Lommel series, is available. The technique of deriving the optimal transmission for a given spectral bandwidth is a classical regularized quadratic minimization of intensities, but linear optimizations can be used as well. Minimizing the integrated intensity on the aperture of the telescope or for selected regions of the focal plane leads to slightly different transmissions for the occulter. For the focal plane optimization, the resulting residual intensity is concentrated behind the geometrical image of the occulter, in a blind region for the observation of an exoplanet, and the level of background residual starlight becomes very low outside this image. Finally, we provide a tolerance analysis for the alignment of the occulter to the telescope which also favors the focal plane optimization.This means that telescope offsets of a few decimeters do not strongly reduce the efficiency of the occulter.

preprint2011arXiv

Decoding finger movements from ECoG signals using switching linear models

One of the major challenges of ECoG-based Brain-Machine Interfaces is the movement prediction of a human subject. Several methods exist to predict an arm 2-D trajectory. The fourth BCI Competition gives a dataset in which the aim is to predict individual finger movements (5-D trajectory). The difficulty lies in the fact that there is no simple relation between ECoG signals and finger movement. We propose in this paper to decode finger flexions using switching models. This method permits to simplify the system as it is now described as an ensemble of linear models depending on an internal state. We show that an interesting accuracy prediction can be obtained by such a model.

preprint2011arXiv

Handling uncertainties in SVM classification

This paper addresses the pattern classification problem arising when available target data include some uncertainty information. Target data considered here is either qualitative (a class label) or quantitative (an estimation of the posterior probability). Our main contribution is a SVM inspired formulation of this problem allowing to take into account class label through a hinge loss as well as probability estimates using epsilon-insensitive cost function together with a minimum norm (maximum margin) objective. This formulation shows a dual form leading to a quadratic problem and allows the use of a representer theorem and associated kernel. The solution provided can be used for both decision and posterior probability estimation. Based on empirical evidence our method outperforms regular SVM in terms of probability predictions and classification performances.

preprint2011arXiv

Large margin filtering for signal sequence labeling

Signal Sequence Labeling consists in predicting a sequence of labels given an observed sequence of samples. A naive way is to filter the signal in order to reduce the noise and to apply a classification algorithm on the filtered samples. We propose in this paper to jointly learn the filter with the classifier leading to a large margin filtering for classification. This method allows to learn the optimal cutoff frequency and phase of the filter that may be different from zero. Two methods are proposed and tested on a toy dataset and on a real life BCI dataset from BCI Competition III.

preprint2010arXiv

Filtrage vaste marge pour l'étiquetage séquentiel à noyaux de signaux

We address in this paper the problem of multi-channel signal sequence labeling. In particular, we consider the problem where the signals are contaminated by noise or may present some dephasing with respect to their labels. For that, we propose to jointly learn a SVM sample classifier with a temporal filtering of the channels. This will lead to a large margin filtering that is adapted to the specificity of each channel (noise and time-lag). We derive algorithms to solve the optimization problem and we discuss different filter regularizations for automated scaling or selection of channels. Our approach is tested on a non-linear toy example and on a BCI dataset. Results show that the classification performance on these problems can be improved by learning a large margin filtering.

Rémi Flamary

What is connected

Connect this record

See the researcher in context

Building this map preview

23 published item(s)

Geometric Dictionary Learning of Dynamical Systems with Optimal Transport

MSAlign: Aligning Molecule and Mass Spectra Foundation Models for Metabolite Identification

Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters

Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport

Semi-relaxed Gromov-Wasserstein divergence with applications on graphs

Sliding window strategy for convolutional spike sorting with Lasso : Algorithm, theoretical guarantees and complexity

Template based Graph Neural Network with Optimal Transport Distances

Minibatch optimal transport distances; analysis and applications

Representation Transfer by Optimal Transport

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Importance sampling strategy for non-convex randomized block-coordinate descent

Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions

Optimal spectral transportation with application to music transcription

Optimal Transport for Domain Adaptation

Distributed image reconstruction for very large arrays in radio astronomy

Generalized conditional gradient: analysis of convergence and applications

Non-convex Regularizations for Feature Selection in Ranking With Sparse SVM

Mixed-norm Regularization for Brain Decoding

Optimization of starshades: focal plane versus pupil plane

Decoding finger movements from ECoG signals using switching linear models

Handling uncertainties in SVM classification

Large margin filtering for signal sequence labeling

Filtrage vaste marge pour l'étiquetage séquentiel à noyaux de signaux