Source author record

Francesca Ieva

Francesca Ieva appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Applications, Methodology, Computation, Computation and Language, eess.IV, Machine Learning, Quantitative Methods

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

A neighbour selection approach for identifying differential networks in conditional functional graphical models

Estimation of brain functional connectivity from EEG data is of great importance both for medical research and diagnosis. It involves quantifying the conditional dependencies among the activity of different brain areas from the time-varying electric field recorded by sensors placed outside the scalp. These dependencies may vary within and across individuals and be influenced by covariates such as age, mental status, or disease severity. Motivated by this problem, we propose a novel neighbour selection approach based on functional-on-functional regression for the characterization of conditional Gaussian functional graphical models. We provide a fully automated, data-driven procedure for inferring conditional dependence structures among observed functional variables. In particular, pairwise interactions are directly identified and allowed to vary as a function of covariates, enabling covariate-specific modulation of connectivity patterns. Our proposed method accommodates an arbitrary number of continuous and discrete covariates. Moreover, unlike existing methods for direct estimation of differential graphical models, the proposed approach yields directly interpretable coefficients, allowing discrimination between covariate-induced increases and decreases in interaction strength. The methodology is evaluated through extensive simulation studies and an application to experimental EEG data. The results demonstrate clear advantages over existing approaches, including higher estimation accuracy and substantially reduced computational cost, especially in high-dimensional settings.

preprint · 2026 · arXiv

Penalized Likelihood Optimization for Adaptive Neighborhood Clustering in Time-to-Event Data with Group-Level Heterogeneity

The identification of patient subgroups with comparable event-risk dynamics plays a key role in supporting informed decision-making in clinical research. In such settings, it is important to account for the inherent dependence that arises when individuals are nested within higher-level units, such as hospitals. Existing survival models account for group-level heterogeneity through frailty terms but do not uncover latent patient subgroups, while most clustering methods ignore hierarchical structure and are not estimated jointly with survival outcomes. In this work, we introduce a new framework that simultaneously performs patient clustering and shared-frailty survival modeling through a penalized likelihood approach. The proposed methodology adaptively learns a patient-to-patient similarity matrix via a modified version of spectral clustering, enabling cluster formation directly from estimated risk profiles while accounting for group membership. A simulation study highlights the proposed model's ability to recover latent clusters and to correctly estimate hazard parameters. We apply our method to a large cohort of heart-failure patients hospitalized with COVID-19 between 2020 and 2021 in the Lombardy region (Italy), identifying clinically meaningful subgroups characterized by distinct risk profiles and highlighting the role of respiratory comorbidities and hospital-level variability in shaping mortality outcomes. This framework provides a flexible and interpretable tool for risk-based patient stratification in hierarchical data settings.

preprint · 2025 · arXiv

Automatic identification of diagnosis from hospital discharge letters via weakly-supervised Natural Language Processing

Identifying patient diagnoses from discharge letters is essential to enable large-scale cohort selection and epidemiological research, but traditional supervised approaches rely on extensive manual annotation, which is often impractical for large textual datasets. In this study, we present a novel weakly-supervised Natural Language Processing pipeline designed to classify Italian discharge letters without requiring manual labelling. After extracting diagnosis-related sentences, the method leverages a transformer-based model with an additional pre-training on Italian medical documents to generate semantic embeddings. A two-level clustering procedure is applied to these embeddings, and the resulting clusters are mapped to the diseases of interest to derive weak labels for a subset of data, eventually used to train a transformer-based classifier. We evaluate the approach on a real-world case study on bronchiolitis in a corpus of 33,176 Italian discharge letters of children admitted to 44 emergency rooms or hospitals in the Veneto Region between 2017 and 2020. The pipeline achieves an area under the curve (AUC) of 77.68% ($\pm 4.30\%$) and an F1-score of 78.14% ($\pm 4.89\%$) against manual annotations. Its performance surpasses other unsupervised methods and approaches fully supervised models, maintaining robustness to cluster selection and promising generalizability across different disease types. It allows saving approximately 3 minutes of expert time per discharge letter, resulting in more than 1,500 hours for a dataset like ours. This study demonstrates the feasibility of a weakly-supervised strategy for identifying diagnoses from Italian discharge letters. The pipeline achieves strong performance, is adaptable to various diseases, and offers a scalable solution for clinical text classification, reducing the need for manual annotation while maintaining reliable accuracy.

preprint · 2023 · arXiv

Imaging-based representation and stratification of intra-tumor Heterogeneity via tree-edit distance

Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to have an impact on daily clinical routine. To this end, imaging texture analysis is growing rapidly, holding the promise of serving as a surrogate for histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and provide an exhaustive and easily readable summary of disease spread. We exploit this novel patient representation to perform cancer subtyping using a hierarchical clustering technique. For this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Cluster interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms approaches in the current literature. Ultimately, the proposed method outlines a general analysis framework that would allow knowledge to be extracted from routinely acquired patient imaging data and provide insights for effective treatment planning.

preprint · 2022 · arXiv

Mining and evaluation of patients' diagnostic therapeutic paths through state sequences analysis

The concept of care pathways is increasingly being used to enhance the quality of care and to optimize the use of resources for health care. Nevertheless, recommendations regarding the sequence of care are mostly based on consensus-based decisions, as there is a lack of evidence on effective treatment sequences. In a real-world setting, classical statistical tools have proved insufficient to adequately capture a phenomenon with such high variability and have to be integrated with novel data mining techniques capable of identifying patterns in complex data structures. Data-driven techniques can potentially support the empirical identification of effective care sequences by extracting them from data collected routinely. The purpose of this study is to perform sequence analysis to identify different patterns of treatment and to assess which are the most efficient in preventing adverse events. The clinical application that motivated the study of this method concerns several problems frequently encountered in the quality of care provided in the mental health field. In particular, we analyzed administrative data provided by Regione Lombardia related to all the beneficiaries of the National Health Service with a diagnosis of schizophrenia from 2015 to 2018 resident in Lombardy, a region of northern Italy. This methodology considers the patient's therapeutic path as a conceptual unit, i.e., a sequence, composed of a succession of different states that can describe the patient's longitudinal status. This kind of information, such as the common patterns of care that allowed us to profile patients by risk, can provide health policymakers with an opportunity to plan optimal and individualized patient care by allocating appropriate resources, analyzing trends in the health status of a population, and finding the risk factors that can be leveraged to prevent the decline of mental health status at the population level.