Source author record

Federico Nanni

Federico Nanni appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Digital Libraries Information Retrieval Artificial Intelligence Computational Engineering, Finance, and Science cs.CY Human-Computer Interaction Programming Languages

Catalog footprint

What is connected

6works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

AI-Assisted Authoring for Transparent, Data-Driven Documents

We introduce _transparent documents_, interactive web-based scholarly articles which allow readers to explore the relationship to the underlying data by hovering over fragments of text, and present an LLM-based tool for authoring transparent documents, building on recent developments in data provenance for general-purpose programming languages. As a target platform, our implementation uses Fluid, an open source programming language with a provenance-tracking runtime. Our agent-based tool supports a human author during the creation of transparent documents, identifying fragments of text which can be computed from data, such as numerical values selected from records or computed by aggregations like sum and mean, comparatives and superlatives like _better than_ and _largest_, trend-adjectives like _growing_, and similar quantitative or semi-quantitative phrases, and then attempts to synthesise a suitable Fluid query over the data which generates the target string. The resulting expression is inserted into the article's web page, turning the static text fragment into an interactable data-driven element able to reveal the data that underwrites the natural language claim. We evaluate our approach on a subset of SciGen, an open source dataset consisting of tables from scientific articles and their corresponding descriptions, which we extend with hand-generated counterfactual test cases to evaluate how well machine-generated expressions generalise. Our results show that gpt4o is often able to synthesise compound expressions extensionally compatible with our gold solutions.

preprint2022arXiv

Identifying Moments of Change from Longitudinal User Text

Identifying changes in individuals' behaviour and mood, as observed via content shared on online platforms, is increasingly gaining importance. Most research to-date on this topic focuses on either: (a) identifying individuals at risk or with a certain mental health condition given a batch of posts or (b) providing equivalent labels at the post level. A disadvantage of such work is the lack of a strong temporal component and the inability to make longitudinal assessments following an individual's trajectory and allowing timely interventions. Here we define a new task, that of identifying moments of change in individuals on the basis of their shared content online. The changes we consider are sudden shifts in mood (switches) or gradual mood progression (escalations). We have created detailed guidelines for capturing moments of change and a corpus of 500 manually annotated user timelines (18.7K posts). We have developed a variety of baseline models drawing inspiration from related tasks and show that the best performance is obtained through context aware sequential modelling. We also introduce new metrics for capturing rare events in temporal windows.

preprint2021arXiv

Event-based Access to Historical Italian War Memoirs

The progressive digitization of historical archives provides new, often domain specific, textual resources that report on facts and events which have happened in the past; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. This is based on the semantic notions of events, participants and roles. We evaluate quantitatively each of the key-steps of our approach and provide a graph-based representation of the extracted knowledge, which allows to move between a Close and a Distant Reading of the collection.

preprint2020arXiv

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Recognizing toponyms and resolving them to their real-world referents is required for providing advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a toponym previously recognized. While it has traditionally received little attention in the research community, it has been shown that candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a flexible deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several new realistic datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors). We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.

preprint2016arXiv

Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA

In order to create a corpus exploration method providing topics that are easier to interpret than standard LDA topic models, here we propose combining two techniques called Entity linking and Labeled LDA. Our method identifies in an ontology a series of descriptive labels for each document in a corpus. Then it generates a specific topic for each label. Having a direct relation between topics and labels makes interpretation easier; using an ontology as background knowledge limits label ambiguity. As our topics are described with a limited number of clear-cut labels, they promote interpretability, and this may help quantitative evaluation. We illustrate the potential of the approach by applying it in order to define the most relevant topics addressed by each party in the European Parliament's fifth mandate (1999-2004).

preprint2016arXiv

Reconstructing a website's lost past: Methodological issues concerning the history of www.unibo.it

This paper describes how born digital primary sources could be used to reconstruct the recent history of scientific institutions. The case study is an analysis of the first 25 years online of the University of Bologna. The focus of this work is primarily methodological: several different issues are presented, starting with the fact that the University of Bologna website has been excluded for thirteen years from the Internet Archive's Wayback Machine, and possible solutions are proposed and applied. The article is organised in three parts: in the first one, some of the fundamental concepts on web archives and the preservation of born digital sources are introduced. Then the reconstruction of the University of Bologna web's past is presented. Finally the future of this research is described, presenting a specific case study in which the historian's craft will be challenged by a completely different issue, namely the large amount of data available in the university digital library.

Federico Nanni

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

AI-Assisted Authoring for Transparent, Data-Driven Documents

Identifying Moments of Change from Longitudinal User Text

Event-based Access to Historical Italian War Memoirs

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA

Reconstructing a website's lost past: Methodological issues concerning the history of www.unibo.it