Researcher profile

Rafael Izbicki

Rafael Izbicki contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2026arXiv

Trustworthy scientific inference with generative models

Generative artificial intelligence (AI) excels at producing complex data structures (text, images, videos) by learning patterns from training examples. Across scientific disciplines, researchers are now applying generative models to "inverse problems" to directly predict hidden parameters from observed data along with measures of uncertainty. While these predictive or posterior-based methods can handle intractable likelihoods and large-scale studies, they can also produce biased or overconfident conclusions even without model misspecifications. We present a solution with Frequentist-Bayes (FreB), a mathematically rigorous protocol that reshapes AI-generated posterior probability distributions into (locally valid) confidence regions that consistently include true parameters with the expected probability, while achieving minimum size when training and target data align. We demonstrate FreB's effectiveness by tackling diverse case studies in the physical sciences: identifying unknown sources under dataset shift, reconciling competing theoretical models, and mitigating selection bias and systematics in observational studies. By providing validity guarantees with interpretable diagnostics, FreB enables trustworthy scientific inference across fields where direct likelihood evaluation remains impossible or prohibitively expensive.

preprint2022arXiv

A new LDA formulation with covariates

The Latent Dirichlet Allocation (LDA) model is a popular method for creating mixed-membership clusters. Despite having been originally developed for text analysis, LDA has been used for a wide range of other applications. We propose a new formulation for the LDA model which incorporates covariates. In this model, a negative binomial regression is embedded within LDA, enabling straight-forward interpretation of the regression coefficients and the analysis of the quantity of cluster-specific elements in each sampling units (instead of the analysis being focused on modeling the proportion of each cluster, as in Structural Topic Models). We use slice sampling within a Gibbs sampling algorithm to estimate model parameters. We rely on simulations to show how our algorithm is able to successfully retrieve the true parameter values and the ability to make predictions for the abundance matrix using the information given by the covariates. The model is illustrated using real data sets from three different areas: text-mining of Coronavirus articles, analysis of grocery shopping baskets, and ecology of tree species on Barro Colorado Island (Panama). This model allows the identification of mixed-membership clusters in discrete data and provides inference on the relationship between covariates and the abundance of these clusters.

preprint2022arXiv

Detecting Distributional Differences in Labeled Sequence Data with Application to Tropical Cyclone Satellite Imagery

Our goal is to quantify whether, and if so how, spatio-temporal patterns in tropical cyclone (TC) satellite imagery signal an upcoming rapid intensity change event. To address this question, we propose a new nonparametric test of association between a time series of images and a series of binary event labels. We ask whether there is a difference in distribution between (dependent but identically distributed) 24-h sequences of images preceding an event versus a non-event. By rewriting the statistical test as a regression problem, we leverage neural networks to infer modes of structural evolution of TC convection that are representative of the lead-up to rapid intensity change events. Dependencies between nearby sequences are handled by a bootstrap procedure that estimates the marginal distribution of the label series. We prove that type I error control is guaranteed as long as the distribution of the label series is well-estimated, which is made easier by the extensive historical data for binary TC event labels. We show empirical evidence that our proposed method identifies archetypes of infrared imagery associated with elevated rapid intensification risk, typically marked by deep or deepening core convection over time. Such results provide a foundation for improved forecasts of rapid intensification.

preprint2022arXiv

Logical coherence in Bayesian simultaneous three-way hypothesis tests

This paper studies whether Bayesian simultaneous three-way hypothesis tests can be logically coherent. Two types of results are obtained. First, under the standard error-wise constant loss, only for a limited set of models can a Bayes simultaneous test be logically coherent. Second, if more general loss functions are used, then it is possible to obtain Bayes simultaneous tests that are always logically coherent. An explicit example of such a loss function is provided.

preprint2022arXiv

Model interpretation using improved local regression with variable importance

A fundamental question on the use of ML models concerns the explanation of their predictions for increasing transparency in decision-making. Although several interpretability methods have emerged, some gaps regarding the reliability of their explanations have been identified. For instance, most methods are unstable (meaning that they give very different explanations with small changes in the data), and do not cope well with irrelevant features (that is, features not related to the label). This article introduces two new interpretability methods, namely VarImp and SupClus, that overcome these issues by using local regressions fits with a weighted distance that takes into account variable importance. Whereas VarImp generates explanations for each instance and can be applied to datasets with more complex relationships, SupClus interprets clusters of instances with similar explanations and can be applied to simpler datasets where clusters can be found. We compare our methods with state-of-the art approaches and show that it yields better explanations according to several metrics, particularly in high-dimensional problems with irrelevant features, as well as when the relationship between features and target is non-linear.

preprint2022arXiv

Re-calibrating Photometric Redshift Probability Distributions Using Feature-space Regression

Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift -- i.e., the fraction of times the true redshift falls between two limits $z_{1}$ and $z_{2}$ should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs. Though we focus on an example from astrophysics, our method can produce PDFs which are calibrated at all locations in feature space for any use case.

preprint2020arXiv

MeLIME: Meaningful Local Explanation for Machine Learning Models

Most state-of-the-art machine learning algorithms induce black-box models, preventing their application in many sensitive domains. Hence, many methodologies for explaining machine learning models have been proposed to address this problem. In this work, we introduce strategies to improve local explanations taking into account the distribution of the data used to train the black-box models. We show that our approach, MeLIME, produces more meaningful explanations compared to other techniques over different ML models, operating on various types of data. MeLIME generalizes the LIME method, allowing more flexible perturbation sampling and the use of different local interpretable models. Additionally, we introduce modifications to standard training algorithms of local interpretable models fostering more robust explanations, even allowing the production of counterfactual examples. To show the strengths of the proposed approach, we include experiments on tabular data, images, and text; all showing improved explanations. In particular, MeLIME generated more meaningful explanations on the MNIST dataset than methods such as GuidedBackprop, SmoothGrad, and Layer-wise Relevance Propagation. MeLIME is available on https://github.com/tiagobotari/melime.

preprint2020arXiv

Wisdom of the crowds forecasting the 2018 FIFA Men's World Cup

The FIFA Men's World Cup Tournament (WCT) is the most important football (soccer) competition, attracting worldwide attention. A popular practice among football fans in Brazil is to organize contests in which each participant informs guesses on the final score of each match. The participants are then ranked according to some scoring rule. Inspired by these contests, we created a website to hold an online contest, in which participants were asked for their probabilities on the outcomes of upcoming matches of the WCT. After each round of the tournament, the ranking of all users based on a proper scoring rule were published. This paper studies the performance of some methods intended to extract the wisdom of the crowds, which are aggregated forecasts that uses some or all of the forecasts available. The later methods are compared to simpler forecasting strategies as well as to statistical prediction models. Our findings corroborate the hypothesis that, at least for sporting events, the wisdom of the crowds offers a competitive forecasting strategy. Specifically, some of these strategies were able to achieve high scores in our contest.

preprint2019arXiv

Conditional Density Estimation Tools in Python and R with Applications to Photometric Redshifts and Likelihood-Free Cosmological Inference

It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require a characterization of the full uncertainty landscape of the parameters of interest given observed data. However, most machine learning (ML) or training-based methods with open-source software target point prediction or classification, and hence fall short in quantifying uncertainty in complex regression and parameter inference settings. As an alternative to methods that focus on predicting the response (or parameters) $\mathbf{y}$ from features $\mathbf{x}$, we provide nonparametric conditional density estimation (CDE) tools for approximating and validating the entire probability density function (PDF) $\mathrm{p}(\mathbf{y}|\mathbf{x})$ of $\mathbf{y}$ given (i.e., conditional on) $\mathbf{x}$. As there is no one-size-fits-all CDE method, the goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric CDE and method assessment which can accommodate different types of settings and be easily fit to the problem at hand. Specifically, we introduce four CDE software packages in $\texttt{Python}$ and $\texttt{R}$ based on ML prediction methods adapted and optimized for CDE: $\texttt{NNKCDE}$, $\texttt{RFCDE}$, $\texttt{FlexCode}$, and $\texttt{DeepCDE}$. Furthermore, we present the $\texttt{cdetools}$ package, which includes functions for computing a CDE loss function for tuning and assessing the quality of individual PDFs, along with diagnostic functions. We provide sample code in $\texttt{Python}$ and $\texttt{R}$ as well as examples of applications to photometric redshift estimation and likelihood-free cosmological inference via CDE.

preprint2019arXiv

The NN-Stacking: Feature weighted linear stacking through neural networks

Stacking methods improve the prediction performance of regression models. A simple way to stack base regressions estimators is by combining them linearly, as done by \citet{breiman1996stacked}. Even though this approach is useful from an interpretative perspective, it often does not lead to high predictive power. We propose the NN-Stacking method (NNS), which generalizes Breiman's method by allowing the linear parameters to vary with input features. This improvement enables NNS to take advantage of the fact that distinct base models often perform better at different regions of the feature space. Our method uses neural networks to estimate the stacking coefficients. We show that while our approach keeps the interpretative features of Breiman's method at a local level, it leads to better predictive power, especially in datasets with large sample sizes.

preprint2019arXiv

Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations

Complex phenomena in engineering and the sciences are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to estimate an approximate likelihood or fit a fast emulator model for efficient statistical inference; such surrogate models include Gaussian synthetic likelihoods and more recently neural density estimators such as autoregressive models and normalizing flows. To date, however, there is no consistent way of quantifying the quality of such a fit. Here we propose a statistical framework that can distinguish any arbitrary misspecified model from the target likelihood, and that in addition can identify with statistical confidence the regions of parameter as well as feature space where the fit is inadequate. Our validation method applies to settings where simulations are extremely costly and generated in batches or "ensembles" at fixed locations in parameter space. At the heart of our approach is a two-sample test that quantifies the quality of the fit at fixed parameter values, and a global test that assesses goodness-of-fit across simulation parameters. While our general framework can incorporate any test statistic or distance metric, we specifically argue for a new two-sample test that can leverage any regression method to attain high power and provide diagnostics in complex data settings.

preprint2017arXiv

Comparing probabilistic predictive models applied to football

We propose two Bayesian multinomial-Dirichlet models to predict the final outcome of football (soccer) matches and compare them to three well-known models regarding their predictive power. All the models predicted the full-time results of 1710 matches of the first division of the Brazilian football championship and the comparison used three proper scoring rules, the proportion of errors and a calibration assessment. We also provide a goodness of fit measure. Our results show that multinomial-Dirichlet models are not only competitive with standard approaches, but they are also well calibrated and present reasonable goodness of fit.