Source author record

Giorgio Corani

Giorgio Corani appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Artificial Intelligence, Methodology, math.ST, physics.data-an, Populations and Evolution, Quantitative Methods, Statistics Theory

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators

preprint · 2016 · arXiv

Learning Bounded Treewidth Bayesian Networks with Thousands of Variables

We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. We propose a novel algorithm for this task, able to scale to large domains and large treewidths. Our approach consistently outperforms the state of the art on data sets with up to ten thousand variables.

preprint · 2016 · arXiv

Statistical comparison of classifiers through Bayesian hierarchical modelling

Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (NHST). Yet NHST suffers from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We propose a Bayesian hierarchical model which jointly analyzes the cross-validation results obtained by two classifiers on multiple data sets. It returns the posterior probability of the accuracies of the two classifiers being practically equivalent or significantly different. A further strength of the hierarchical model is that, by jointly analyzing the results obtained on all data sets, it reduces the estimation error compared to the usual approach of averaging the cross-validation results obtained on a given data set.

preprint · 2015 · arXiv

Should we really use post-hoc tests based on mean-ranks?

The statistical comparison of multiple algorithms over multiple data sets is fundamental in machine learning. This is typically carried out by the Friedman test. When the Friedman test rejects the null hypothesis, multiple comparisons are carried out to establish which differences among algorithms are significant. The multiple comparisons are usually performed using the mean-ranks test. The aim of this technical note is to discuss the inconsistencies of the mean-ranks post-hoc test, with the goal of discouraging its use in machine learning as well as in medicine, psychology, etc. We show that the outcome of the mean-ranks test depends on the pool of algorithms originally included in the experiment. In other words, the outcome of the comparison between algorithms A and B also depends on the performance of the other algorithms included in the original experiment. This can lead to paradoxical situations. For instance, the difference between A and B could be declared significant if the pool comprises algorithms C, D, E and not significant if the pool comprises algorithms F, G, H. To overcome these issues, we suggest instead performing the multiple comparison using a test whose outcome depends only on the two algorithms being compared, such as the sign test or the Wilcoxon signed-rank test.
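The pool dependence described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the accuracies, the pool members and the helper names (`mean_ranks`, `sign_test_p`, `rank_gap`) are all hypothetical, chosen only to show that the mean-rank gap between A and B moves with the other algorithms in the pool, while a two-sided sign test on A versus B alone does not.

```python
from math import comb


def mean_ranks(scores):
    """Mean rank per algorithm (1 = best); ties share the average rank.

    scores maps an algorithm name to its accuracies, one per data set.
    """
    names = list(scores)
    n_sets = len(next(iter(scores.values())))
    totals = {name: 0.0 for name in names}
    for i in range(n_sets):
        row = {name: scores[name][i] for name in names}
        for name in names:
            better = sum(1 for other in names if row[other] > row[name])
            tied = sum(1 for other in names if row[other] == row[name])  # counts itself
            totals[name] += better + (tied + 1) / 2
    return {name: totals[name] / n_sets for name in names}


def sign_test_p(a, b):
    """Two-sided sign test p-value for paired results; ties are dropped."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical accuracies on four data sets; A beats B by 0.02 everywhere.
a = [0.90, 0.85, 0.80, 0.75]
b = [0.88, 0.83, 0.78, 0.73]

# Pool 1: the other algorithms always fall between A and B.
pool_between = {
    "A": a,
    "B": b,
    "C": [0.89, 0.84, 0.79, 0.74],
    "D": [0.885, 0.835, 0.785, 0.735],
}
# Pool 2: the other algorithms are always worse than both A and B.
pool_outside = {
    "A": a,
    "B": b,
    "F": [0.60, 0.55, 0.50, 0.45],
    "G": [0.50, 0.45, 0.40, 0.35],
}


def rank_gap(pool):
    """Difference in mean rank between B and A within the given pool."""
    ranks = mean_ranks(pool)
    return ranks["B"] - ranks["A"]


gap_between = rank_gap(pool_between)  # 3.0: A ranks 1st, B ranks 4th
gap_outside = rank_gap(pool_outside)  # 1.0: A ranks 1st, B ranks 2nd
p_value = sign_test_p(a, b)           # 0.125 in both cases: only A and B matter
```

Since a mean-ranks post-hoc statistic is built on this gap, the same A-versus-B results yield a different statistic in each pool, whereas the pairwise test sees identical inputs.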

preprint · 2014 · arXiv

Credal Model Averaging for classification: representing prior ignorance and expert opinions

Bayesian model averaging (BMA) is the state-of-the-art approach for overcoming model uncertainty. Yet, especially on small data sets, the results yielded by BMA might be sensitive to the prior over the models. Credal Model Averaging (CMA) addresses this problem by substituting the single prior over the models with a set of priors (credal set). Such an approach solves the problem of how to choose the prior over the models and automates sensitivity analysis. We discuss various CMA algorithms for building an ensemble of logistic regressors characterized by different sets of covariates. We show how CMA can be appropriately tuned to the case in which one is prior-ignorant and to the case in which domain knowledge is instead available. CMA detects prior-dependent instances, namely instances in which a different class is more probable depending on the prior over the models. On such instances CMA suspends judgment, returning multiple classes. We thoroughly compare different BMA and CMA variants on a real case study, predicting the presence of Alpine marmot burrows in an Alpine valley. We find that BMA is almost a random guesser on the instances recognized as prior-dependent by CMA.
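The idea of a prior-dependent instance can be shown with a toy sketch. This is not the CMA algorithm from the paper: it assumes only two models, a binary class, and a credal set given by an interval on the prior probability of the first model. All names and numbers (`bma_prob`, `credal_predict`, the marginal likelihoods and per-model predictions) are hypothetical. With two models the averaged probability is monotone in the prior, so checking the two endpoints of the interval is enough.

```python
def bma_prob(prior_m1, marg_lik, pred):
    """Probability of class 1 under a two-model Bayesian model average.

    prior_m1: prior probability of model 1 (model 2 gets 1 - prior_m1).
    marg_lik: (marginal likelihood of model 1, marginal likelihood of model 2).
    pred:     (P(class 1 | x, model 1), P(class 1 | x, model 2)).
    """
    w1 = prior_m1 * marg_lik[0]
    w2 = (1.0 - prior_m1) * marg_lik[1]
    return (w1 * pred[0] + w2 * pred[1]) / (w1 + w2)


def credal_predict(prior_lo, prior_hi, marg_lik, pred):
    """Return the set of classes that are most probable under some prior
    in the interval [prior_lo, prior_hi] (assumed to lie strictly in (0, 1))."""
    probs = [bma_prob(p, marg_lik, pred) for p in (prior_lo, prior_hi)]
    lo, hi = min(probs), max(probs)
    if lo > 0.5:
        return {1}      # class 1 wins under every prior in the set
    if hi < 0.5:
        return {0}      # class 0 wins under every prior in the set
    return {0, 1}       # prior-dependent: suspend judgment


# Hypothetical instance where the two models disagree and the data
# support them equally: the winning class flips with the prior.
ambiguous = credal_predict(0.1, 0.9, (1.0, 1.0), (0.8, 0.3))  # {0, 1}

# Hypothetical instance where both models favour class 1.
safe = credal_predict(0.1, 0.9, (1.0, 1.0), (0.8, 0.6))       # {1}
```

A single-prior BMA would always commit to one class on the first instance, even though that choice rests entirely on the prior.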

preprint · 2012 · arXiv

Credal Classification based on AODE and compression coefficients

Bayesian model averaging (BMA) is an approach to average over alternative models; yet, it usually gets excessively concentrated around the single most probable model, therefore achieving only sub-optimal classification performance. The compression-based approach (Boulle, 2007) overcomes this problem, averaging over the different models by applying a logarithmic smoothing over the models' posterior probabilities. This approach has shown excellent performance when applied to ensembles of naive Bayes classifiers. AODE is another ensemble of models with high performance (Webb, 2005), based on a collection of non-naive classifiers (called SPODE) whose probabilistic predictions are aggregated by simple arithmetic mean. Aggregating the SPODEs via BMA rather than by arithmetic mean deteriorates the performance; instead, we aggregate the SPODEs via the compression coefficients and we show that the resulting classifier obtains a slight but consistent improvement over AODE. However, an important issue in any Bayesian ensemble of models is the arbitrariness in the choice of the prior over the models. We address this problem by the paradigm of credal classification, namely by substituting the unique prior with a set of priors. Credal classifiers automatically recognize the prior-dependent instances, namely the instances whose most probable class varies when different priors are considered; in these cases, credal classifiers remain reliable by returning a set of classes rather than a single class. We thus develop the credal version of both the BMA-based and the compression-based ensemble of SPODEs, substituting the single prior over the models with a set of priors. Experiments show that both credal classifiers provide higher classification reliability than their determinate counterparts; moreover, the compression-based credal classifier compares favorably to previous credal classifiers.

preprint · 2011 · arXiv

Improving parameter learning of Bayesian nets from incomplete data

This paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can also be implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as the optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.