Source author record

Pierre Geurts

Pierre Geurts appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Computer Vision, eess.IV, Molecular Networks, Networking and Internet Architecture, Neurons and Cognition

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators


preprint, 2023, arXiv

Optimizing model-agnostic Random Subspace ensembles

This paper presents a model-agnostic ensemble approach for supervised learning. The proposed approach is based on a parametric version of Random Subspace, in which each base model is learned from a feature subset sampled according to a Bernoulli distribution. Parameter optimization is performed using gradient descent and is rendered tractable by using an importance sampling approach that circumvents frequent re-training of the base models after each gradient descent step. The degree of randomization in our parametric Random Subspace is thus automatically tuned through the optimization of the feature selection probabilities. This is an advantage over the standard Random Subspace approach, where the degree of randomization is controlled by a hyper-parameter. Furthermore, the optimized feature selection probabilities can be interpreted as feature importance scores. Our algorithm can also easily incorporate any differentiable regularization term to impose constraints on these importance scores.
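The importance sampling idea in this abstract can be illustrated with a minimal sketch. It assumes independent Bernoulli selection per feature and uses self-normalised weights. The function names and the probability values are hypothetical, and the paper's actual gradient estimator is not reproduced here. The point is only that masks sampled under old probabilities can be re-weighted to stand in for masks drawn under updated ones, so base models trained on those masks need not be re-trained.

```python
import random

def sample_mask(alpha, rng):
    """Draw a feature subset: feature j is kept with probability alpha[j]."""
    return [1 if rng.random() < a else 0 for a in alpha]

def mask_prob(mask, alpha):
    """Probability of a mask under independent Bernoulli(alpha[j]) draws."""
    p = 1.0
    for z, a in zip(mask, alpha):
        p *= a if z else (1.0 - a)
    return p

def importance_weights(masks, alpha_old, alpha_new):
    """Self-normalised weights p_new(mask) / p_old(mask) for existing masks."""
    raw = [mask_prob(m, alpha_new) / mask_prob(m, alpha_old) for m in masks]
    total = sum(raw)
    return [w / total for w in raw]

rng = random.Random(0)
alpha_old = [0.5, 0.5, 0.5]   # probabilities used when the masks were drawn
alpha_new = [0.9, 0.5, 0.1]   # hypothetical probabilities after some updates
masks = [sample_mask(alpha_old, rng) for _ in range(200)]
weights = importance_weights(masks, alpha_old, alpha_new)

# The re-weighted selection frequency of feature 0 approximates alpha_new[0]
# even though every mask was sampled under alpha_old.
estimated_freq = sum(w * m[0] for w, m in zip(weights, masks))
print(round(estimated_freq, 3))
```

In an ensemble, the same weights would be applied to the predictions of the base models attached to each mask. The weights become unreliable when the new probabilities drift far from the old ones, which is when fresh sampling is needed.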

preprint, 2022, arXiv

Distillation from heterogeneous unlabeled collections

Compressing deep networks is essential to expand their range of applications to constrained settings. The need for compression, however, often arises long after the model was trained, when the original data might no longer be available. On the other hand, unlabeled data, not necessarily related to the target task, is usually plentiful, especially in image classification tasks. In this work, we propose a scheme to leverage such samples to distill the knowledge learned by a large teacher network to a smaller student. The proposed technique relies on (i) preferentially sampling datapoints that appear related, and (ii) taking better advantage of the learning signal. We show that the former speeds up the student's convergence, while the latter boosts its performance, achieving performance close to what can be expected with the original data.

preprint, 2022, arXiv

Evaluating Local Explanations using White-box Models

Evaluating explanation techniques using human subjects is costly, time-consuming and can lead to subjectivity in the assessments. To evaluate the accuracy of local explanations, we require access to the true feature importance scores for a given instance. However, the prediction function of a model usually does not decompose into linear additive terms that indicate how much a feature contributes to the output. In this work, we suggest focusing instead on the log odds ratio (LOR) of the prediction function, which naturally decomposes into additive terms for logistic regression and naive Bayes. We demonstrate how we can benchmark different explanation techniques in terms of their similarity to the LOR scores based on our proposed approach. In the experiments, we compare prominent local explanation techniques and find that the performance of the techniques can depend on the underlying model, the dataset, which data point is explained, the normalization of the data and the similarity metric.
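For logistic regression, the additive decomposition mentioned above is easy to see: the log odds of the predicted probability equal the bias plus a sum of one term per feature. The sketch below assumes that the per-feature term is simply weight times feature value. The paper's exact LOR definition may differ (for example, it may be taken relative to a reference point), and the model weights, the instance and the candidate explanation are hypothetical values.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def lor_contributions(weights, x):
    """Additive per-feature terms of the log odds for logistic regression."""
    return [w * xj for w, xj in zip(weights, x)]

def cosine_similarity(a, b):
    """One possible similarity metric between two importance vectors."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return num / den

# Hypothetical model and instance.
weights = [1.5, -2.0, 0.5]
bias = -0.25
x = [2.0, 1.0, 4.0]

contribs = lor_contributions(weights, x)
log_odds = bias + sum(contribs)
p = sigmoid(log_odds)

# The log odds recovered from the predicted probability match the sum of terms.
recovered = math.log(p / (1.0 - p))

# Score a hypothetical explanation against the reference contributions.
candidate_explanation = [2.5, -1.0, 2.5]
score = cosine_similarity(contribs, candidate_explanation)
print(contribs, round(score, 3))
```

An explanation technique would then be ranked by how close its scores are to `contribs` under the chosen metric. As the abstract notes, the outcome can depend on that metric.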

preprint, 2020, arXiv

Multi-task pre-training of deep neural networks for digital pathology

In this work, we investigate multi-task learning as a way of pre-training models for classification tasks in digital pathology. It is motivated by the fact that many small and medium-size datasets have been released by the community over the years, whereas there is no large-scale dataset similar to ImageNet in the domain. We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images. Then, we propose a simple architecture and training scheme for creating a transferable model, and a robust evaluation and selection protocol in order to evaluate our method. Depending on the target task, we show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance. Fine-tuning improves performance over feature extraction and is able to compensate for the lack of specificity of ImageNet features, as both pre-training sources yield comparable performance.

preprint, 2016, arXiv

Context-dependent feature analysis with random forests

Feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in some specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importances framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and the relevance of our framework for highlighting context-dependent variables are illustrated on both artificial and real datasets.

preprint, 2014, arXiv

Bridging physiological and evolutionary time scales in a gene regulatory network

Gene regulatory networks (GRN) govern phenotypic adaptations and reflect the trade-offs between physiological responses and evolutionary adaptation that act at different time scales. To identify patterns of molecular function and genetic diversity in GRNs, we studied the drought response of the common sunflower, Helianthus annuus, and how the underlying GRN is related to its evolution. We examined the responses of 32,423 expressed sequences to drought and to abscisic acid and selected 145 co-expressed transcripts. We characterized their regulatory relationships in nine kinetic studies based on different hormones. From this, we inferred a GRN by meta-analyses of a Gaussian graphical model and a random forest algorithm and studied the genetic differentiation among populations (FST) at nodes. We identified two main hubs in the network that transport nitrate in guard cells. This suggests that nitrate transport is a critical aspect of sunflower physiological response to drought. We observed that differentiation of the network genes in elite sunflower cultivars is correlated with their position and connectivity. This systems biology approach combined molecular data at different time scales and identified important physiological processes. At the evolutionary level, we propose that network topology could influence responses to human selection and possibly adaptation to dry environments.

preprint, 2014, arXiv

Cerebral functional connectivity periodically (de)synchronizes with anatomical constraints

This paper studies the link between resting-state functional connectivity (FC), measured by the correlations of the fMRI BOLD time courses, and structural connectivity (SC), estimated through fiber tractography. Instead of a static analysis based on the correlation between SC and the FC averaged over the entire fMRI time series, we propose a dynamic analysis, based on the time evolution of the correlation between SC and a suitably windowed FC. Assessing the statistical significance of the time series against random phase permutations, our data show a pronounced peak of significance for time window widths around 20-30 TR (40-60 s). Using the appropriate window width, we show that FC patterns oscillate between phases of high modularity, primarily shaped by anatomy, and phases of low modularity, primarily shaped by inter-network connectivity. Building upon recent results in dynamic FC, this emphasizes the potential role of SC as a transitory architecture between different highly connected resting-state FC patterns. Finally, we show that networks implicated in consciousness-related processes, such as the default mode network (DMN), contribute more to these brain-level fluctuations than other networks, such as the motor or somatosensory networks. This suggests that the fluctuations between FC and SC are capturing mind-wandering effects.
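The dynamic analysis described above, correlating SC with an FC computed inside a sliding window, can be sketched as follows. This is a minimal illustration on synthetic data: the four regions, the random time courses, the SC values and the window width of 25 samples are all hypothetical, and the significance test against random phase permutations is not included.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def windowed_fc(series, start, width):
    """FC inside one window, as a vector over region pairs (i < j)."""
    n = len(series)
    return [
        pearson(series[i][start:start + width], series[j][start:start + width])
        for i in range(n) for j in range(i + 1, n)
    ]

def sc_fc_timecourse(series, sc_vector, width, step=1):
    """Correlation between SC and windowed FC as the window slides in time."""
    length = len(series[0])
    return [
        pearson(sc_vector, windowed_fc(series, s, width))
        for s in range(0, length - width + 1, step)
    ]

rng = random.Random(1)
n_regions, n_samples = 4, 60
series = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_regions)]
sc_vector = [0.8, 0.1, 0.0, 0.5, 0.2, 0.9]  # one SC value per region pair (i < j)

timecourse = sc_fc_timecourse(series, sc_vector, width=25)
print(len(timecourse))
```

With 60 samples and a width of 25, the window can start at 36 positions, so the resulting time course has 36 values. Repeating the computation for several widths is how one would look for the width at which the SC-FC fluctuations are most pronounced.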

preprint, 2014, arXiv

Classifying pairs with trees for supervised biological network inference

Networks are ubiquitous in biology, and computational approaches have been widely investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the exploitation of tree-based ensemble methods in the context of these two approaches for biological network inference. We first formalize the problem of network inference as classification of pairs, unifying in the process homogeneous and bipartite graphs and discussing two main sampling schemes. We then present the global and the local approaches, extending the latter for the prediction of interactions between two unseen network nodes, and discuss their specializations to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments are carried out with these methods on various biological networks, and they clearly show that these methods are competitive with existing methods.
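The difference between the global and the local approaches lies mainly in how the training set is built. The sketch below shows one plausible construction for a small homogeneous graph. It assumes that a pair is represented by concatenating the feature vectors of its two nodes and that all interactions among the listed nodes are known. The helper names, node features and adjacency matrix are hypothetical, and no classifier is trained.

```python
def build_global_dataset(features, adjacency):
    """Global approach: one dataset over ordered pairs of distinct nodes.

    Each input is the concatenation of the two node feature vectors and the
    label is the corresponding adjacency entry.
    """
    n = len(features)
    inputs, labels = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            inputs.append(features[i] + features[j])
            labels.append(adjacency[i][j])
    return inputs, labels

def build_local_dataset(features, adjacency, node):
    """Local approach: a separate dataset for one node.

    Inputs are the features of the other nodes and labels say whether each
    of them interacts with `node`.
    """
    inputs, labels = [], []
    for j in range(len(features)):
        if j == node:
            continue
        inputs.append(features[j])
        labels.append(adjacency[node][j])
    return inputs, labels

# Hypothetical node features and known interactions for three nodes.
features = [[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]]
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

global_inputs, global_labels = build_global_dataset(features, adjacency)
local_inputs, local_labels = build_local_dataset(features, adjacency, node=1)
print(len(global_inputs), len(local_inputs))
```

Either dataset can then be passed to any classifier, such as a tree-based ensemble. The global model sees pairs directly, while the local approach needs one model per node.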

preprint, 2012, arXiv

DMFSGD: A Decentralized Matrix Factorization Algorithm for Network Distance Prediction

Knowledge of end-to-end network distances is essential to many Internet applications. As active probing of all pairwise distances is infeasible in large-scale networks, a natural idea is to measure a few pairs and to predict the other ones without actually measuring them. This paper formulates the distance prediction problem as matrix completion, where unknown entries of an incomplete matrix of pairwise distances are to be predicted. The problem is solvable because strong correlations among network distances exist and cause the constructed distance matrix to be low rank. The new formulation circumvents the well-known drawbacks of existing approaches based on Euclidean embedding. A new algorithm, called Decentralized Matrix Factorization by Stochastic Gradient Descent (DMFSGD), is proposed to solve the network distance prediction problem. By letting network nodes exchange messages with each other, the algorithm is fully decentralized and only requires each node to collect and to process local measurements, with neither explicit matrix constructions nor special nodes such as landmarks and central servers. In addition, we comprehensively compared matrix factorization and Euclidean embedding to demonstrate the suitability of the former for network distance prediction. We further studied the incorporation of a robust loss function and of non-negativity constraints. Extensive experiments on various publicly available datasets of network delays show not only the scalability and the accuracy of our approach but also its usability in real Internet applications.
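The matrix completion view can be illustrated with a small simulation of stochastic gradient descent on per-node factors. This is a sketch under stated assumptions, not the DMFSGD update rules from the paper: it uses a plain squared loss with L2 regularization, a synthetic rank-3 distance matrix, and hypothetical values for the learning rate, the regularization strength and the number of neighbours. Each node i keeps two vectors, x[i] and y[i], and the distance from i to j is estimated as the dot product of x[i] and y[j].

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgd_step(x_i, y_j, d_ij, lr, reg):
    """One update from a single measurement d_ij, using only local vectors."""
    err = dot(x_i, y_j) - d_ij
    new_x = [(1 - lr * reg) * a - lr * err * b for a, b in zip(x_i, y_j)]
    new_y = [(1 - lr * reg) * b - lr * err * a for a, b in zip(x_i, y_j)]
    return new_x, new_y

def mean_abs_error(x, y, dist):
    """Average absolute prediction error over all pairs of distinct nodes."""
    n = len(dist)
    errs = [abs(dot(x[i], y[j]) - dist[i][j])
            for i in range(n) for j in range(n) if i != j]
    return sum(errs) / len(errs)

rng = random.Random(0)
n, rank = 20, 3

# Synthetic low-rank "distance" matrix (an assumption made for the demo).
u = [[rng.random() for _ in range(rank)] for _ in range(n)]
v = [[rng.random() for _ in range(rank)] for _ in range(n)]
dist = [[dot(u[i], v[j]) for j in range(n)] for i in range(n)]

# Each node only ever measures distances to a fixed set of 10 neighbours.
neighbours = [rng.sample([j for j in range(n) if j != i], 10) for i in range(n)]

x = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
y = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
initial_error = mean_abs_error(x, y, dist)

for _ in range(20000):
    i = rng.randrange(n)
    j = rng.choice(neighbours[i])  # node i probes neighbour j
    x[i], y[j] = sgd_step(x[i], y[j], dist[i][j], lr=0.05, reg=0.01)

final_error = mean_abs_error(x, y, dist)
print(round(initial_error, 3), round(final_error, 3))
```

No node ever builds the full matrix: every update touches only the vectors of the two nodes involved in one measurement, which is what makes a decentralized implementation possible. The error is evaluated over all pairs, including those never measured, to reflect the prediction setting.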
