Source author record

Mathilde Mougeot

Mathilde Mougeot appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · math.ST · Statistics Theory

Catalog footprint

What is connected

8 works

3 topics

4 close collaborators


preprint · 2022 · arXiv

Discrepancy-Based Active Learning for Domain Adaptation

The goal of the paper is to design active learning strategies which lead to domain adaptation under an assumption of Lipschitz functions. Building on previous work by Mansour et al. (2009), we adapt the concept of discrepancy distance between source and target distributions to restrict the maximization over the hypothesis class to a localized class of functions that label the source domain accurately. We derive generalization error bounds for such active learning strategies in terms of Rademacher averages and localized discrepancy for general loss functions that satisfy a regularity condition. A practical K-medoids algorithm that can handle large data sets is inferred from the theoretical bounds. Our numerical experiments show that the proposed algorithm is competitive with other state-of-the-art active learning techniques in the context of domain adaptation, in particular on large data sets of around one hundred thousand images.
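As a rough illustration of how a K-medoids-style criterion can drive query selection, the sketch below greedily picks target points to label. It assumes, purely for illustration, that labelled source points act as fixed medoids and that Euclidean distance on raw features is meaningful; neither detail is stated in the abstract. The function name `greedy_k_medoids_queries` and its arguments are hypothetical, and this is not the authors' implementation.

```python
import math

def greedy_k_medoids_queries(source, target, k):
    """Greedily pick k target indices to label (illustrative sketch only).

    Assumption: labelled source points behave as fixed medoids, and each step
    adds the target point that most reduces the summed distance from every
    target point to its nearest labelled point. `source` must be non-empty.
    """
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    # Distance from each target point to its nearest labelled source point.
    nearest = [min(dist(t, s) for s in source) for t in target]
    chosen = []
    for _ in range(min(k, len(target))):
        best_idx, best_cost = None, None
        for i, cand in enumerate(target):
            if i in chosen:
                continue
            # Total cost if `cand` were added to the labelled set.
            cost = sum(min(nearest[j], dist(t, cand))
                       for j, t in enumerate(target))
            if best_cost is None or cost < best_cost:
                best_idx, best_cost = i, cost
        chosen.append(best_idx)
        # Update nearest-labelled distances with the newly selected query.
        nearest = [min(nearest[j], dist(t, target[best_idx]))
                   for j, t in enumerate(target)]
    return chosen
```

With one source point at the origin and a cluster of target points far from it, the first query lands in the middle of that cluster, because that choice covers the most uncovered target mass. The greedy loop is quadratic in the number of target points per query, so a data set of the size mentioned in the abstract would need a faster variant.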

preprint · 2022 · arXiv

Fast and Accurate Importance Weighting for Correcting Sample Bias

Bias in datasets can be very detrimental to statistical estimation. In response to this problem, importance weighting methods have been developed to match a biased distribution to its corresponding unbiased target distribution. The seminal Kernel Mean Matching (KMM) method is still considered state of the art in this research field. However, one of its main drawbacks is the computational burden for large datasets. Building on previous works by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm which scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets and under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance compared to other importance weighting methods. The proposed approach appears to be the only one able to give relevant reweighting in a reasonable time for large datasets with up to two million data points.

preprint · 2022 · arXiv

Physics-informed neural networks for non-Newtonian fluid thermo-mechanical problems: an application to rubber calendering process

Physics-Informed Neural Networks (PINNs) have gained much attention in various fields of engineering thanks to their capability of incorporating physical laws into the models. However, the assessment of PINNs in industrial applications involving coupling between mechanical and thermal fields is still an active research topic. In this work, we present an application of PINNs to a non-Newtonian fluid thermo-mechanical problem which is often considered in the rubber calendering process. We demonstrate the effectiveness of PINNs when dealing with inverse and ill-posed problems, which are impractical to solve with classical numerical discretization methods. We study the impact of the placement of the sensors and the distribution of unsupervised points on the performance of PINNs in a problem of inferring hidden physical fields from partial data. We also investigate the capability of PINNs to identify unknown physical parameters from the measurements captured by sensors. The effect of noisy measurements is also considered throughout this work. The results of this paper demonstrate that, in the identification problem, PINNs can successfully estimate the unknown parameters using only the measurements on the sensors. In ill-posed problems where boundary conditions are not completely defined, even though the placement of the sensors and the distribution of unsupervised points have a great impact on PINN performance, we show that the algorithm is able to infer the hidden physics from local measurements.

preprint · 2020 · arXiv

Model family selection for classification using Neural Decision Trees

Model selection consists in comparing several candidate models according to a metric to be optimized. The process often involves a grid search or a similar procedure together with cross-validation, which can be time consuming and provides little information about the dataset itself. In this paper we propose a method to reduce the scope of exploration needed for the task. The idea is to quantify how far one would need to depart from trained instances of a given family, reference models (RMs) carrying 'rigid' decision boundaries (e.g. decision trees), in order to obtain an equivalent or better model. In our approach, this is realized by progressively relaxing the decision boundaries of the initial decision trees (the RMs) as long as this is beneficial in terms of performance measured on an analyzed dataset. More specifically, this relaxation is performed by making use of a neural decision tree, which is a neural network built from decision trees (DTs). The final model produced by our method carries non-linear decision boundaries. Measuring the performance of the final model and its agreement with its seeding RM can help the user decide which family of models to focus on.

preprint · 2013 · arXiv

Sloshing in the LNG shipping industry: risk modelling through multivariate heavy-tail analysis

In the liquefied natural gas (LNG) shipping industry, the phenomenon of sloshing can lead to the occurrence of very high pressures in the tanks of the vessel. The issue of modelling or estimating the probability of the simultaneous occurrence of such extremal pressures is now crucial from the risk assessment point of view. In this paper, heavy-tail modelling, widely used as a conservative approach to risk assessment and corresponding to a worst-case risk analysis, is applied to the study of sloshing. Multivariate heavy-tailed distributions are considered, with sloshing pressures investigated by means of small-scale replica tanks instrumented with $d > 1$ sensors. When attempting to fit such nonparametric statistical models, one naturally faces computational issues inherent in the phenomenon of dimensionality. The primary purpose of this article is to overcome this barrier by introducing a novel methodology. For $d$-dimensional heavy-tailed distributions, the structure of extremal dependence is entirely characterised by the angular measure, a positive measure on the intersection of a sphere with the positive orthant in $\mathbb{R}^d$. As $d$ increases, the mutual extremal dependence between variables becomes difficult to assess. Based on a spectral clustering approach, we show here how a low dimensional approximation to the angular measure may be found. The nonparametric method proposed for modelling sloshing has been successfully applied to pressure data. The parsimonious representation thus obtained proves to be very convenient for the simulation of multivariate heavy-tailed distributions, allowing for the implementation of Monte-Carlo simulation schemes in estimating the probability of failure. Besides confirming its performance on artificial data, the methodology has been implemented on a real data set specifically collected for risk assessment of sloshing in the LNG shipping industry.
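To make the angular measure more concrete, the sketch below computes the empirical angular components of the largest observations: each point is split into a radius and a direction on the unit sphere, and only the directions of the k largest radii are kept. The Euclidean norm, the choice of k, and the function name `extreme_angles` are assumptions made for illustration; the abstract does not specify them. Marginal standardisation, which such analyses normally require first, is omitted here.

```python
import math

def extreme_angles(points, k):
    """Return the directions (unit vectors) of the k largest observations.

    points: list of d-dimensional observations with non-negative coordinates.
    Assumptions: Euclidean norm as the radius, margins already standardised.
    """
    scored = []
    for p in points:
        radius = math.sqrt(sum(v * v for v in p))
        if radius > 0.0:
            # Polar decomposition: radius and point on the unit sphere.
            scored.append((radius, [v / radius for v in p]))
    # Keep the directions of the k observations with the largest radius.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [angle for _, angle in scored[:k]]
```

Directions that cluster near a coordinate axis indicate that extremes tend to occur at one sensor at a time, while directions in the interior of the orthant indicate simultaneous extremes. Grouping these directions is where a clustering step, such as the spectral clustering mentioned in the abstract, would come in.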

preprint · 2012 · arXiv

Grouping Strategies and Thresholding for High Dimensional Linear Models

The estimation problem in a high-dimensional regression model with structured sparsity is investigated. An algorithm using a two-step block thresholding procedure called GR-LOL is provided. Convergence rates are produced: they depend on simple coherence-type indices of the Gram matrix (easily checkable on the data) as well as sparsity assumptions on the model parameters, measured by a combination of $l_1$ within-block and $l_q$, $q<1$, between-block norms. The simplicity of the coherence indicator suggests ways to optimize the rates of convergence when the group structure is not naturally given by the problem and is unknown. In such a case, an auto-driven procedure is provided to determine the regressor groups (number and contents). An intensive practical study compares our grouping methods with the standard LOL algorithm. We prove that the grouping rarely deteriorates the results but can improve them very significantly. GR-LOL is also compared with group-Lasso procedures and exhibits very encouraging behavior. The results are quite impressive, especially when the GR-LOL algorithm is combined with a grouping pre-processing step.

preprint · 2011 · arXiv

A new selection method for high-dimensional instrumental setting: application to the Growth Rate convergence hypothesis

This paper investigates the problem of selecting variables in regression-type models for an "instrumental" setting. Our study is motivated by empirically verifying the conditional convergence hypothesis used in the economics literature concerning the growth rate. To avoid unnecessary discussion about the choice and the pertinence of instrumental variables, we embed the model in a very high dimensional setting. We propose a selection procedure with no optimization step called LOLA, for Learning Out of Leaders with Adaptation. LOLA is an auto-driven algorithm with two thresholding steps. The consistency of the procedure is proved under sparsity conditions, and simulations are conducted to illustrate the good practical performance of LOLA. The behavior of the algorithm is studied when instrumental variables are artificially added without a priori significant connection to the model. Using our algorithm, we provide a solution for modeling the link between the growth rate and the initial level of the gross domestic product and empirically prove the convergence hypothesis.

preprint · 2011 · arXiv

Learning Out of Leaders

This paper investigates the estimation problem in a regression-type model. To be able to deal with potential high dimensions, we provide a procedure with no optimization step called LOL, for Learning Out of Leaders. LOL is an auto-driven algorithm with two thresholding steps. A first adaptive thresholding helps to select leaders among the initial regressors in order to obtain a first reduction of dimensionality. Then a second thresholding is performed on the linear regression upon the leaders. The consistency of the procedure is investigated. Exponential bounds are obtained, leading to minimax and adaptive results for a wide class of sparse parameters, with (quasi) no restriction on the number p of possible regressors. An extensive computational experiment is conducted to emphasize the good practical performance of LOL.
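The two thresholding steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published procedure. The thresholds `t1` and `t2` are supplied by hand here, whereas the abstract describes them as adaptive. Leaders are chosen by absolute correlation with the response after normalising each column. The names `two_step_threshold` and `solve_linear` are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def two_step_threshold(columns, y, t1, t2):
    """columns: list of p non-zero regressors, each a list of n floats.

    Returns (leaders, coef), with coef on the scale of the input columns.
    """
    p = len(columns)
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    unit = [[v / nrm for v in col] for col, nrm in zip(columns, norms)]
    # Step 1: keep as leaders the regressors most correlated with y.
    corr = [abs(sum(u * v for u, v in zip(col, y))) for col in unit]
    leaders = [j for j in range(p) if corr[j] >= t1]
    coef = [0.0] * p
    if not leaders:
        return leaders, coef
    # Ordinary least squares restricted to the leaders (normal equations).
    gram = [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in leaders]
            for i in leaders]
    rhs = [sum(a * b for a, b in zip(unit[i], y)) for i in leaders]
    beta = solve_linear(gram, rhs)
    # Step 2: threshold the fitted coefficients, keeping only large ones.
    for j, b in zip(leaders, beta):
        b_orig = b / norms[j]
        if abs(b_orig) >= t2:
            coef[j] = b_orig
    return leaders, coef
```

On a small orthogonal design, a weak regressor can pass the first step as a leader and still be zeroed by the second. That is the intended division of labour: the first step reduces dimension cheaply, and the second cleans up after the regression. The sketch assumes the leaders are not collinear, since otherwise the normal equations are singular.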
