Source author record

Andrea Tacchella

Andrea Tacchella appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

physics.soc-ph, Computation and Language, Artificial Intelligence, cs.CY, physics.comp-ph, physics.data-an, q-fin.EC, q-fin.GN, Social and Information Networks

Catalog footprint

What is connected

5 works

9 topics

4 close collaborators

preprint, 2026, arXiv

Anticipating Innovation Using Large Language Models

Forecasting innovation, understood as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We show that this signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts first technological combinations. TechToken also improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.
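The abstract does not spell out how context similarity is computed. As a minimal sketch, assuming it behaves like cosine similarity between the embedding vectors of two IPC codes, the pairs of codes can be ranked by how close their embeddings are. The embedding values below are hypothetical toy numbers, and `cosine_similarity` and `rank_pairs` are illustrative names that do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_pairs(embeddings):
    """Return (similarity, code_a, code_b) for every pair, most similar first."""
    codes = sorted(embeddings)
    pairs = [
        (cosine_similarity(embeddings[a], embeddings[b]), a, b)
        for i, a in enumerate(codes)
        for b in codes[i + 1:]
    ]
    return sorted(pairs, reverse=True)


# Hypothetical 3-dimensional embeddings for three IPC subclasses.
# Real embeddings would come from a trained model and have many more dimensions.
embeddings = {
    "H04L": [0.9, 0.1, 0.3],
    "G06F": [0.8, 0.2, 0.4],
    "A01B": [-0.2, 0.9, 0.1],
}

for similarity, a, b in rank_pairs(embeddings):
    print(f"{a} - {b}: {similarity:.3f}")
```

Under this assumption, pairs near the top of the ranking that have never been combined in a patent would be the candidates for a first technological combination.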

preprint, 2022, arXiv

A Bayesian approach to translators' reliability assessment

Translation Quality Assessment (TQA) is a process conducted by human translators and is widely used, both for estimating the performance of (increasingly used) Machine Translation, and for finding an agreement between translation providers and their customers. While translation scholars are aware of the importance of having a reliable way to conduct the TQA process, it seems that there is limited literature that tackles the issue of reliability with a quantitative approach. In this work, we consider TQA as a complex process from the point of view of the physics of complex systems and approach the reliability issue within the Bayesian paradigm. Using a dataset of translation quality evaluations (in the form of error annotations), produced entirely by the Professional Translation Service Provider Translated SRL, we compare two Bayesian models that parameterise the following features involved in the TQA process: the translation difficulty, the characteristics of the translators involved in producing the translation, and those of the reviewers assessing its quality. We validate the models in an unsupervised setting and show that it is possible to get meaningful insights into translators even with just one review per translation; subsequently, we extract information like translators' skills and reviewers' strictness, as well as their consistency in their respective roles. Using this, we show that the reliability of reviewers cannot be taken for granted even in the case of expert translators: a translator's expertise can induce a cognitive bias when reviewing a translation produced by another translator. The most expert translators, however, are characterised by the highest level of consistency, both in translating and in assessing the translation quality.

preprint, 2016, arXiv

The Build-Up of Diversity in Complex Ecosystems

Diversity is a fundamental feature of ecosystems, even when the concept of ecosystem is extended to sociology or economics. Diversity can be understood as the count of different items, animals, or, more generally, interactions. There are two classes of stylized facts that emerge when diversity is taken into account. The first is diversity explosions: evolutionary radiations in biology and the process of escaping 'poverty traps' in economics are two well-known examples. The second is nestedness: entities with a very diverse set of interactions are the only ones that interact with more specialized ones. In a single sentence: specialists interact with generalists. Nestedness is observed in a variety of bipartite networks of interactions: biogeographic, macroeconomic and mutualistic, to name a few. This indicates that entities diversify following a pattern. Since they appear in such very different systems, these two stylized facts indicate that the build-up of diversity is driven by a fundamental probabilistic mechanism, and here we sketch its minimal features. We show how the contraction of a random tripartite network, which is maximally entropic in all its degree distributions but one, can reproduce stylized facts of real data with great accuracy, which is qualitatively lost when that degree distribution is changed. We base our reasoning on the combinatorial picture that the nodes on one layer of these bipartite networks can be described as combinations of a number of fundamental building blocks. The stylized facts of diversity that we observe in real systems can be explained with an extreme heterogeneity (a scale-free distribution) in the number of meaningful combinations in which each building block is involved. We show that if the usefulness of the building blocks has a scale-free distribution, then maximally entropic baskets of building blocks will give rise to very rich behaviors.
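The abstract does not state the contraction rule. As a minimal sketch, assume an entity is linked to an item whenever it holds every building block that the item requires. Under that assumption, contracting the middle layer of a tripartite network (entities, building blocks, items) yields a bipartite network, and a small hand-built example already shows a nested pattern. The function name `contract` and the toy data are hypothetical, and the example is deterministic rather than random.

```python
def contract(entity_blocks, item_blocks):
    """Contract the building-block layer of a tripartite network.

    entity_blocks maps each entity to the set of building blocks it holds.
    item_blocks maps each item to the set of building blocks it requires.
    Assumed rule: an entity is linked to an item if it holds all required blocks.
    """
    return {
        entity: {item for item, needed in item_blocks.items() if needed <= owned}
        for entity, owned in entity_blocks.items()
    }


# Hypothetical toy data: three entities and three items over blocks x, y, z.
entity_blocks = {"A": {"x", "y", "z"}, "B": {"x", "y"}, "C": {"x"}}
item_blocks = {"p1": {"x"}, "p2": {"x", "y"}, "p3": {"x", "y", "z"}}

links = contract(entity_blocks, item_blocks)
for entity in sorted(links):
    print(entity, sorted(links[entity]))
```

In this toy case the item set of each less diversified entity is contained in the item set of the more diversified ones, which is the nested structure described in the abstract.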

preprint, 2014, arXiv

How the Taxonomy of Products Drives the Economic Development of Countries

We introduce an algorithm able to reconstruct the relevant network structure on which the time evolution of country-product bipartite networks takes place. The significant links are obtained by selecting the largest values of the projected matrix. We first perform a number of tests of this filtering procedure on synthetic cases and a toy model. Then we analyze the bipartite network constituted by countries and exported products, using two databases for a total of almost 50 years. It is then possible to build a hierarchically directed network, in which the taxonomy of products emerges in a natural way. We study the influence of the structure of this taxonomy network on countries' development; in particular, guided by an example taken from the industrialization of South Korea, we link the structure of the taxonomy network to the empirical temporal connections between product activations, finding that the most relevant edges for countries' development are the ones suggested by our network. These results suggest paths in the product space which are easier to achieve, and so can drive countries' policies in the industrialization process.
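The filtering step can be illustrated with a minimal sketch. It assumes the projected matrix is the plain product-product co-occurrence count of a binary country-product matrix, and that "selecting the largest values" means keeping, for each product, the single strongest link. The abstract does not give the normalization or the selection rule, so both are assumptions, and `project_products` and `strongest_links` are hypothetical names.

```python
def project_products(M):
    """Product-product projection of a binary country-product matrix.

    P[p][q] counts the countries that export both product p and product q.
    The diagonal is left at zero.
    """
    n_products = len(M[0])
    P = [[0] * n_products for _ in range(n_products)]
    for row in M:
        for p in range(n_products):
            if not row[p]:
                continue
            for q in range(n_products):
                if q != p and row[q]:
                    P[p][q] += 1
    return P


def strongest_links(P):
    """For each product, keep only the link with the largest projected value.

    Ties go to the lowest product index, because max() returns the first
    maximal element it meets. Products with no co-occurrences get no link.
    """
    links = {}
    for p, row in enumerate(P):
        best = max(range(len(row)), key=lambda q: row[q])
        if row[best] > 0:
            links[p] = best
    return links


# Hypothetical toy matrix: 3 countries (rows) by 4 products (columns).
M = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

P = project_products(M)
print(P)
print(strongest_links(P))
```

The retained links form a sparse directed network over products, which is the kind of object a hierarchical taxonomy could be read from.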

preprint, 2012, arXiv

A network analysis of countries' export flows: firm grounds for the building blocks of the economy

In this paper we analyze the bipartite network of countries and products from UN data on country production. We define the country-country and product-product projected networks and introduce a novel method of filtering information based on elements' similarity. As a result we find that country clustering reveals unexpected socio-geographic links among the most competing countries. On the same footing, product clustering can be efficiently used for a bottom-up classification of produced goods. Furthermore, we mathematically reformulate the "reflections method" introduced by Hidalgo and Hausmann as a fixpoint problem; this formulation highlights some conceptual weaknesses of the approach. To overcome this issue, we introduce an alternative methodology (based on biased Markov chains) that allows countries to be ranked in a conceptually consistent way. Our analysis uncovers a strong non-linear interaction between the diversification of a country and the ubiquity of its products, thus suggesting the possible need of moving towards more efficient and direct non-linear fixpoint algorithms to rank countries and products in the global market.
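For readers unfamiliar with the reflections method, the following is a minimal sketch of its iteration on a binary country-product matrix. It starts from diversification (products per country) and ubiquity (countries per product) and repeatedly replaces each country's value with the average value of its products, and vice versa. The function name `reflections` and the toy matrix are hypothetical, and the sketch assumes every country has at least one product and every product at least one country, so no division by zero occurs. It does not implement the alternative Markov-chain method proposed in the paper.

```python
def reflections(M, steps):
    """Iterate the method of reflections for a given number of steps.

    M is a binary matrix with countries as rows and products as columns.
    Returns (k_c, k_p): one value per country and one per product.
    With steps=0 these are diversification and ubiquity.
    """
    n_c, n_p = len(M), len(M[0])
    k_c0 = [sum(row) for row in M]  # diversification
    k_p0 = [sum(M[c][p] for c in range(n_c)) for p in range(n_p)]  # ubiquity
    k_c, k_p = k_c0[:], k_p0[:]
    for _ in range(steps):
        # Each country takes the mean of its products' previous values.
        new_c = [
            sum(M[c][p] * k_p[p] for p in range(n_p)) / k_c0[c]
            for c in range(n_c)
        ]
        # Each product takes the mean of its countries' previous values.
        new_p = [
            sum(M[c][p] * k_c[c] for c in range(n_c)) / k_p0[p]
            for p in range(n_p)
        ]
        k_c, k_p = new_c, new_p
    return k_c, k_p


# Hypothetical nested toy matrix: 3 countries by 3 products.
M = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]

print(reflections(M, 0))
print(reflections(M, 1))
```

Because every step is an average, the values of all countries tend towards a common number as the iteration proceeds, which gives some intuition for why a fixpoint reading of the method exposes weaknesses.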