Researcher profile

Emmanuel Bacry

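Emmanuel Bacry's listed work covers multifractal and rough volatility modeling, multivariate Hawkes processes, and machine learning and data infrastructure for healthcare claims databases.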

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 19 - Unverified · Verification L1 · Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

Preprint · 2026 · arXiv

From rough to multifractal multidimensional volatility: A multidimensional Log S-fBM model

We introduce the multivariate Log S-fBM model (mLog S-fBM), extending the univariate framework proposed by Wu et al. to the multidimensional setting. We define the multidimensional Stationary fractional Brownian motion (mS-fBM), whose marginals follow S-fBM dynamics and which has a specific cross-covariance structure. It is parametrized by a correlation scale $T$, marginal-specific intermittency parameters and Hurst exponents, and their multidimensional counterparts: the co-intermittency matrix and the co-Hurst matrix. The mLog S-fBM is constructed by modeling volatility components as exponentials of the mS-fBM, which preserves the dependence structure of the Gaussian core. We show that the model is well-defined for any co-Hurst matrix with entries in $[0, \frac{1}{2})$, and that it supports vanishing co-Hurst parameters, bridging the rough volatility and multifractal regimes. We extend the small-intermittency approximation technique to the multivariate setting to develop an efficient Generalized Method of Moments (GMM) calibration procedure that estimates the cross-covariance parameters for each pair of marginals. We validate it on synthetic data and apply it to S&P 500 market data to model stock return fluctuations. The diagonal entries of the estimated stock Hurst matrix, which correspond to single-stock log-volatility Hurst exponents, are close to 0, indicating multifractal behavior. The off-diagonal co-Hurst entries are close to the Hurst exponent of the S&P 500 index ($H \approx 0.12$), and the off-diagonal co-intermittency entries agree with univariate intermittency estimates.
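To make the bridge between regimes concrete, the sketch below assumes, by analogy with the univariate log S-fBM, that the log-volatility cross-covariance of a pair (i, j) is (λ_ij²/(2H_ij))(T^{2H_ij} − |τ|^{2H_ij}) for |τ| < T and 0 beyond T. This pairwise form, the symmetry requirement on the co-Hurst matrix, and the names `co_covariance` and `check_co_hurst` are illustrative assumptions, not the paper's definition of the mS-fBM. As H_ij → 0 the power-law form tends to λ_ij² ln(T/|τ|), the log-normal multifractal covariance, so a single function can handle every entry in [0, 1/2).

```python
import math
from typing import List


def co_covariance(tau: float, lam2_ij: float, h_ij: float, T: float) -> float:
    """Hypothetical cross-covariance of log-volatility components i and j.

    Assumed form (by analogy with the univariate log S-fBM):
      rough regime, 0 < h_ij < 1/2: (lam2_ij / (2 h_ij)) * (T**(2 h_ij) - |tau|**(2 h_ij))
      multifractal regime, h_ij = 0: lam2_ij * ln(T / |tau|)
      and 0 once |tau| >= T (correlation scale).
    """
    if not 0.0 <= h_ij < 0.5:
        raise ValueError("co-Hurst entries must lie in [0, 1/2)")
    tau = abs(tau)
    if tau >= T:
        return 0.0
    if h_ij == 0.0:
        # Logarithmic (MRM-type) covariance, singular at tau = 0.
        return math.inf if tau == 0.0 else lam2_ij * math.log(T / tau)
    # Power-law (rough) covariance.
    return lam2_ij / (2.0 * h_ij) * (T ** (2.0 * h_ij) - tau ** (2.0 * h_ij))


def check_co_hurst(H: List[List[float]]) -> None:
    """Check a co-Hurst matrix: square, symmetric (assumed), entries in [0, 1/2)."""
    n = len(H)
    for i in range(n):
        if len(H[i]) != n:
            raise ValueError("co-Hurst matrix must be square")
        for j in range(n):
            if not 0.0 <= H[i][j] < 0.5:
                raise ValueError(f"entry ({i}, {j}) lies outside [0, 1/2)")
            if H[i][j] != H[j][i]:
                raise ValueError("co-Hurst matrix must be symmetric")
```

For example, `co_covariance(0.5, 0.02, 1e-6, 1.0)` and `co_covariance(0.5, 0.02, 0.0, 1.0)` agree to about six decimal places, illustrating the continuity between a very small co-Hurst parameter and the purely multifractal case.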

Preprint · 2022 · arXiv

From Rough to Multifractal volatility: the log S-fBM model

We introduce a family of random measures $M_{H,T}(dt)$, called log S-fBM, such that, for $H>0$, $M_{H,T}(dt) = e^{\omega_{H,T}(t)}\,dt$, where $\omega_{H,T}(t)$ is a Gaussian process that can be viewed as a stationary version of an $H$-fractional Brownian motion. Moreover, as $H \to 0$, $M_{H,T}(dt) \to \widetilde{M}_{T}(dt)$ (in the weak sense), where $\widetilde{M}_{T}(dt)$ is the celebrated log-normal multifractal random measure (MRM). This model therefore covers, within a single framework, the two popular classes of multifractal ($H = 0$) and rough volatility ($0 < H < 1/2$) models. We discuss the main properties of the log S-fBM and address the associated estimation issues. In particular, we show that estimating $H$ directly from the scaling properties of $\ln(M_{H,T}([t, t+\tau]))$ at fixed $\tau$ can lead to strongly overestimating $H$. We propose a better GMM estimation method, which we show to be valid in the high-frequency asymptotic regime. When it is applied to a large set of empirical volatility data, we observe that stock indices have values around $H = 0.1$, whereas individual stocks have values of $H$ that can be very close to $0$ and are thus well described by an MRM. We also provide evidence that, unlike the log-volatility variance $\nu^2$, whose estimation appears unreliable (although it is widely used in the rough volatility literature), the estimation of the so-called "intermittency coefficient" $\lambda^2$, the product of $\nu^2$ and the Hurst exponent $H$, appears far more reliable and yields values that seem universal across all individual stocks on the one hand and across all stock indices on the other.
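The construction $M_{H,T}(dt) = e^{\omega_{H,T}(t)}\,dt$ can be made concrete with a small simulation. The sketch assumes the S-fBM covariance C(τ) = (λ²/(2H))(T^{2H} − |τ|^{2H}) for |τ| < T and 0 otherwise. This form is our assumption, chosen because its variogram is ν²|τ|^{2H} with ν² = λ²/H (so λ² = ν²H, as in the abstract) and because it tends to the MRM covariance λ² ln(T/|τ|) as H → 0. It draws ω on a regular grid via a Cholesky factorization and approximates $M_{H,T}([t_k, t_k + \Delta t])$ by $e^{\omega(t_k)}\Delta t$. The function names and the normalization E[e^ω] = 1 are illustrative choices, not the paper's simulator or its GMM estimator.

```python
import math
import random
from typing import List


def sfbm_cov(tau: float, lam2: float, H: float, T: float) -> float:
    """Assumed S-fBM covariance: (lam2 / (2H)) * (T^{2H} - |tau|^{2H}) for |tau| < T, else 0."""
    tau = abs(tau)
    if tau >= T:
        return 0.0
    return lam2 / (2.0 * H) * (T ** (2.0 * H) - tau ** (2.0 * H))


def cholesky(C: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = C (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def simulate_log_sfbm(n: int, dt: float, lam2: float, H: float, T: float,
                      seed: int = 0) -> List[float]:
    """Approximate volatility masses M([t_k, t_k + dt]) ~ exp(omega(t_k)) * dt on n grid points."""
    C = [[sfbm_cov((i - j) * dt, lam2, H, T) for j in range(n)] for i in range(n)]
    for i in range(n):
        C[i][i] += 1e-12  # small jitter for numerical stability
    L = cholesky(C)
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = -0.5 * sfbm_cov(0.0, lam2, H, T)  # assumed normalization: E[exp(omega)] = 1
    omega = [mean + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    return [math.exp(w) * dt for w in omega]
```

Because each mass is an exponential of a Gaussian value, the simulated volatility masses are always positive. Evaluating `sfbm_cov` with a tiny H (for example 1e-6) recovers λ² ln(T/|τ|) to high accuracy, which mirrors the weak convergence to the MRM stated above.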

Preprint · 2020 · arXiv

SCALPEL3: a scalable open-source library for healthcare claims databases

This article introduces SCALPEL3, a scalable open-source framework for studies involving Large Observational Databases (LODs). Its design eases medical observational studies through abstractions for concept extraction, high-level cohort manipulation, and the production of data formats compatible with machine learning libraries. SCALPEL3 has been used successfully on the SNDS database (see Tuppin et al. (2017)), a huge healthcare claims database that handles the reimbursement of almost all French citizens. SCALPEL3 focuses on scalability, easy interactive analysis, and helpers for data-flow analysis to accelerate studies performed on LODs. It consists of three open-source libraries based on Apache Spark. SCALPEL-Flattening denormalizes the LOD (only SNDS for now) by sequentially joining its tables into a single big table. SCALPEL-Extraction provides fast concept extraction from a big table such as the one produced by SCALPEL-Flattening. Finally, SCALPEL-Analysis supports interactive cohort manipulation, monitoring statistics of cohort flows, and building datasets for use with machine learning libraries. The first two provide a Scala API, while the last provides a Python API that can be used in an interactive environment. Our code is available on GitHub. SCALPEL3 has been used to extract complex concepts for studies such as Morel et al. (2017), and for studies of 14.5 million patients observed over three years (more than 15 billion healthcare events and roughly 15 terabytes of data) in less than 49 minutes on a small 15-node HDFS cluster. SCALPEL3 provides precise interactive control of data processing through legible code, which helps build fully reproducible studies and improves the maintainability and auditability of studies performed on LODs.
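The flatten-then-extract pipeline can be illustrated without Spark. The toy below is hypothetical: tables are lists of Python dicts, and the names `left_join`, `flatten`, `extract_concept`, `event_id`, `drug_code` and `act_code` are placeholders, not SCALPEL3's Scala or Python API and not SNDS column names. It shows how sequential left joins denormalize satellite tables into one big table (a one-to-many match multiplies rows) and how a concept can then be extracted by filtering that table.

```python
from typing import Dict, Iterable, List, Set

Row = Dict[str, object]


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Left-join two tables on `key`; emits one output row per matching right row."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out: List[Row] = []
    for row in left:
        matches = index.get(row[key], [])
        if not matches:
            out.append(dict(row))  # keep unmatched rows, as in an SQL LEFT JOIN
        for match in matches:
            merged = dict(row)
            # Right-hand columns overwrite same-named left columns (toy behavior).
            merged.update({k: v for k, v in match.items() if k != key})
            out.append(merged)
    return out


def flatten(central: List[Row], others: Iterable[List[Row]], key: str) -> List[Row]:
    """Denormalize by joining each satellite table onto the central table, one after another."""
    flat = central
    for table in others:
        flat = left_join(flat, table, key)
    return flat


def extract_concept(flat: List[Row], column: str, codes: Set[str]) -> List[Row]:
    """Keep the rows of the flat table whose `column` holds one of `codes`."""
    return [row for row in flat if row.get(column) in codes]
```

A real LOD would be distributed across a cluster and the joins planned by Spark; the point here is only the shape of the data flow, in which extraction operates on the single flat table rather than on the original normalized schema.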

Preprint · 2020 · arXiv

Sparse and low-rank multivariate Hawkes processes

We consider the problem of unveiling the implicit network structure of node interactions (such as user interactions in a social network) based only on high-frequency timestamps. Our inference is based on minimizing the least-squares loss associated with a multivariate Hawkes model, penalized by the $\ell_1$ and trace norms of the interaction tensor. We provide a first theoretical analysis of this problem that includes both sparsity-inducing and low-rank-inducing penalizations. This result relies on a new data-driven concentration inequality for matrix martingales in continuous time with observable variance; this inequality is of independent interest and has a broad range of possible applications, since it extends to matrix martingales earlier results restricted to the scalar case. A consequence of our analysis is the construction of sharply tuned $\ell_1$ and trace-norm penalizations that lead to a data-driven scaling of the variability of information available for each user. Numerical experiments illustrate the significant improvements achieved by using such data-driven penalizations.
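The penalized least-squares objective can be sketched for a small network. This is a minimal sketch under stated assumptions: a single exponential kernel β e^{−βt} with known decay `beta` (so the interaction tensor reduces to a matrix `A`), the standard least-squares contrast (1/T) Σ_i [∫₀ᵀ λ_i(t)² dt − 2 Σ_{t_k ∈ N_i} λ_i(t_k⁻)] with the integral approximated by the midpoint rule, and a single uniform ℓ1 weight instead of the paper's data-driven weights. The trace-norm term is omitted because it requires the singular values of `A`. All function names are hypothetical.

```python
import math
from typing import List

Events = List[List[float]]  # events[j] = sorted timestamps of node j


def intensity(i: int, t: float, events: Events, mu: List[float],
              A: List[List[float]], beta: float) -> float:
    """Left-limit intensity lambda_i(t-) with an assumed exponential kernel.

    lambda_i(t) = mu_i + sum_j A[i][j] * sum_{s in events[j], s < t} beta * exp(-beta * (t - s))
    """
    total = mu[i]
    for j, times in enumerate(events):
        for s in times:
            if s < t:
                total += A[i][j] * beta * math.exp(-beta * (t - s))
    return total


def least_squares_loss(events: Events, mu: List[float], A: List[List[float]],
                       beta: float, horizon: float, n_grid: int = 2000) -> float:
    """(1/T) * sum_i [ int_0^T lambda_i(t)^2 dt - 2 * sum_{t_k in events[i]} lambda_i(t_k-) ]."""
    dt = horizon / n_grid
    loss = 0.0
    for i in range(len(events)):
        # Midpoint-rule approximation of the integral of lambda_i^2 over [0, T].
        integral = dt * sum(
            intensity(i, (k + 0.5) * dt, events, mu, A, beta) ** 2 for k in range(n_grid)
        )
        jumps = sum(intensity(i, t, events, mu, A, beta) for t in events[i])
        loss += integral - 2.0 * jumps
    return loss / horizon


def penalized_objective(events: Events, mu: List[float], A: List[List[float]],
                        beta: float, horizon: float, l1_weight: float) -> float:
    """Least-squares loss plus a uniform l1 penalty on A (trace-norm term omitted)."""
    l1 = sum(abs(a) for row in A for a in row)
    return least_squares_loss(events, mu, A, beta, horizon) + l1_weight * l1
```

With `A` set to zero the intensities are constant, and the loss reduces to (1/T) Σ_i (μ_i² T − 2 μ_i N_i), where N_i is the number of events of node i; this gives a quick sanity check. An optimizer (for instance, a proximal gradient method) would then minimize `penalized_objective` over `mu` and `A`.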

Preprint · 2020 · arXiv

ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data

This paper considers the problem of modeling and predicting a long-term and "blurry" relapse that occurs after a medical act, such as surgery. The relapse is observed only indirectly, in a "blurry" fashion, through longitudinal drug prescriptions over a long period after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions), to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events observed through a claims-only database. ZiMM ED is applied to a "non-clinical" claims database that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the patient's age. This setting is more challenging than one in which bedside clinical signals are available. Our motivation for using such a non-clinical claims database is its population-wide exhaustiveness, compared with clinical electronic health records coming from a single hospital or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with a history of between 1.5 and 5 years. We consider a long-term (18-month) relapse (urination problems that persist despite surgery), which is blurry because it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves on several baselines, including both non-deep-learning and deep-learning approaches, and that it makes it possible to work on such a dataset with minimal preprocessing.
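To illustrate what a zero-inflated mixture of multinomials means, the sketch below computes the log-likelihood of one count vector (for example, hypothetical counts of urination-drug reimbursements per time bin after surgery) under a generic version of such a model. Its assumptions are our own: with probability `pi0` the vector is identically zero (no relapse signal); otherwise a component k is drawn with probability `weights[k]`, and the counts follow Multinomial(n, `probs[k]`) with the total n taken as observed. This is not the paper's exact ZiMM parametrization, and it omits the ZiMM ED encoder-decoder that would produce these parameters per patient.

```python
import math
from typing import Sequence


def log_multinomial(x: Sequence[int], p: Sequence[float]) -> float:
    """log Multinomial(x; n = sum(x), p); assumes p[k] > 0 wherever x[k] > 0."""
    out = math.lgamma(sum(x) + 1)
    for xk, pk in zip(x, p):
        if xk > 0:
            out += xk * math.log(pk) - math.lgamma(xk + 1)
    return out


def zimm_log_likelihood(x: Sequence[int], pi0: float, weights: Sequence[float],
                        probs: Sequence[Sequence[float]]) -> float:
    """Log-likelihood of one count vector under a generic zero-inflated mixture of multinomials.

    Zero vectors come only from the zero-inflation part (probability pi0);
    non-zero vectors come from the mixture part (probability 1 - pi0).
    """
    if sum(x) == 0:
        return math.log(pi0)
    comps = [math.log(w) + log_multinomial(x, p) for w, p in zip(weights, probs)]
    top = max(comps)  # log-sum-exp for numerical stability
    mix = top + math.log(sum(math.exp(c - top) for c in comps))
    return math.log1p(-pi0) + mix
```

For instance, with `pi0 = 0.2`, two equally weighted components with bin probabilities `[0.9, 0.1]` and `[0.1, 0.9]`, and counts `[2, 0]`, the likelihood is 0.8 × (0.5 × 0.81 + 0.5 × 0.01) = 0.328, so the function returns log 0.328.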