Source author record

Tobia Boschi

Tobia Boschi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Applications, Artificial Intelligence, Computation, Computation and Language, Machine Learning, physics.soc-ph

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

Preprint, 2025, arXiv

Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation

SDForger is a flexible and efficient framework for generating high-quality multivariate time series using LLMs. Leveraging a compact data representation, SDForger provides synthetic time series generation from a few samples and low-computation fine-tuning of any autoregressive LLM. Specifically, the framework transforms univariate and multivariate signals into tabular embeddings, which are then encoded into text and used to fine-tune the LLM. At inference, new textual embeddings are sampled and decoded into synthetic time series that retain the original data's statistical properties and temporal dynamics. Across a diverse range of datasets, SDForger outperforms existing generative models in many scenarios, both in similarity-based evaluations and downstream forecasting tasks. By enabling textual conditioning in the generation process, SDForger paves the way for multimodal modeling and the streamlined integration of time series with textual information. The model is open-sourced at https://github.com/IBM/fms-dgt/tree/main/fms_dgt/public/databuilders/time_series.
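The abstract's embed, encode-as-text, and decode steps can be illustrated with a minimal sketch. This is not the SDForger implementation: the PCA-style basis, the "value_i is x" text template, and every function name below are assumptions chosen for illustration, and the LLM fine-tuning and sampling step is omitted entirely.

```python
import numpy as np

# Hypothetical helpers; none of these names come from the SDForger code base.

def fit_basis(series, k):
    """Fit a k-component PCA-style basis to series of shape (n_samples, length)."""
    mean = series.mean(axis=0)
    _, _, vt = np.linalg.svd(series - mean, full_matrices=False)
    return mean, vt[:k]

def embed(series, mean, basis):
    """Project each series onto the basis, giving one tabular row per sample."""
    return (series - mean) @ basis.T

def to_text(row, decimals=4):
    """Encode one embedding row as text (assumed template)."""
    return ", ".join(f"value_{i} is {x:.{decimals}f}" for i, x in enumerate(row))

def from_text(text):
    """Parse a text row produced by to_text back into a numeric vector."""
    return np.array([float(part.split(" is ")[1]) for part in text.split(", ")])

def reconstruct(embedding, mean, basis):
    """Map an embedding (or a batch of embeddings) back to the time domain."""
    return embedding @ basis + mean
```

In a pipeline of this shape, the strings from `to_text` would serve as fine-tuning text, and text sampled from the model would pass through `from_text` and `reconstruct` to produce a synthetic series.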

Preprint, 2023, arXiv

Contrasting pre-vaccine COVID-19 waves in Italy through Functional Data Analysis

We use data from 107 Italian provinces to characterize and compare mortality patterns in the first two COVID-19 epidemic waves, which occurred prior to the introduction of vaccines. We also associate these patterns with mobility, timing of government restrictions, and socio-demographic, infrastructural, and environmental covariates. Notwithstanding limitations in the accuracy and reliability of publicly available data, we are able to exploit information in curves and shapes through Functional Data Analysis techniques. Specifically, we document differences in magnitude and variability between the two waves; while both were characterized by a co-occurrence of 'exponential' and 'mild' mortality patterns, the second spread much more broadly and asynchronously through the country. Moreover, we find evidence of a significant positive association between local mobility and mortality in both epidemic waves and corroborate the effectiveness of timely restrictions in curbing mortality. The techniques we describe could capture additional signals of interest if applied, for instance, to data on cases and positivity rates. However, we show that the quality of such data, at least in the case of Italian provinces, was too poor to support meaningful analyses.

Preprint, 2020, arXiv

An Efficient Semi-smooth Newton Augmented Lagrangian Method for Elastic Net

Feature selection is an important and active research area in statistics and machine learning. The Elastic Net is often used to perform selection when the features present non-negligible collinearity or practitioners wish to incorporate additional known structure. In this article, we propose a new Semi-smooth Newton Augmented Lagrangian Method to efficiently solve the Elastic Net in ultra-high dimensional settings. Our new algorithm exploits both the sparsity induced by the Elastic Net penalty and the sparsity due to the second order information of the augmented Lagrangian. This greatly reduces the computational cost of the problem. Using simulations on both synthetic and real datasets, we demonstrate that our approach outperforms its best competitors by at least an order of magnitude in terms of CPU time. We also apply our approach to a Genome Wide Association Study on childhood obesity.
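For readers unfamiliar with the Elastic Net, the sketch below writes out one common parameterization of its objective together with the closed-form proximal operator of its penalty. It is not the paper's Semi-smooth Newton Augmented Lagrangian algorithm, and the scaling of `lam1` and `lam2` is an assumption that may differ from the authors' convention.

```python
import numpy as np

def elastic_net_objective(X, y, beta, lam1, lam2):
    """0.5 * ||y - X beta||^2 + lam1 * ||beta||_1 + 0.5 * lam2 * ||beta||^2."""
    residual = y - X @ beta
    return (0.5 * residual @ residual
            + lam1 * np.abs(beta).sum()
            + 0.5 * lam2 * beta @ beta)

def prox_elastic_net(v, lam1, lam2, step=1.0):
    """Proximal operator of step * (lam1 * ||.||_1 + 0.5 * lam2 * ||.||^2).

    Soft-thresholding sets small entries exactly to zero, then the ridge
    term shrinks the surviving entries uniformly.
    """
    shrunk = np.sign(v) * np.maximum(np.abs(v) - step * lam1, 0.0)
    return shrunk / (1.0 + step * lam2)
```

The exact zeros produced by the soft-thresholding step are the penalty-induced sparsity that the abstract says the algorithm exploits.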