Researcher profile

Mirco Musolesi

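Mirco Musolesi's listed work spans language-model decoding and interpretability, spatial network planning, multi-agent reinforcement learning, and topic modelling of retail data.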

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 19 (Unverified) · Verification L1 · Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

Preprint · 2026 · arXiv

DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the most common strategies either consider only the most probable tokens, which reduces output diversity, or increase the likelihood of unlikely tokens, which compromises output accuracy and correctness. In this paper, we propose DiffSampling, a new decoding method that leverages a mathematical analysis of the token probability distribution to ensure the generation of contextually appropriate text. In particular, the difference between consecutive, sorted probabilities can be used to truncate incorrect tokens. We also propose two variations of the method that aim to correct subtle inconsistencies of common sampling strategies. Experiments on four different text-generation tasks demonstrate that our approach consistently performs at least on par with the existing methods it builds upon in terms of quality, despite sampling from a larger set of tokens.
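The truncation idea in this abstract can be illustrated with a short sketch. It assumes that the cut is placed at the largest drop between consecutive sorted probabilities, which is the most negative value of the discrete derivative. The function names diff_cut and sample_diff_cut are hypothetical, and this is not the authors' implementation.

```python
import random


def diff_cut(probs: list[float]) -> list[int]:
    """Return the token indices kept after truncating the distribution.

    Assumed cut rule: sort probabilities in descending order and cut at the
    largest drop between consecutive values (the most negative entry of the
    discrete derivative), keeping every token before the drop.
    """
    # Token indices ordered from most to least probable (sort is stable).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if len(order) < 2:
        return order
    sorted_p = [probs[i] for i in order]
    # Discrete derivative of the sorted distribution (all values <= 0).
    diffs = [sorted_p[k + 1] - sorted_p[k] for k in range(len(sorted_p) - 1)]
    # Position of the sharpest drop; tokens up to and including it are kept.
    cut = min(range(len(diffs)), key=lambda k: diffs[k])
    return order[: cut + 1]


def sample_diff_cut(probs: list[float], rng: random.Random) -> int:
    """Sample one token index from the renormalized truncated distribution."""
    kept = diff_cut(probs)
    total = sum(probs[i] for i in kept)
    return rng.choices(kept, weights=[probs[i] / total for i in kept], k=1)[0]


probs = [0.05, 0.4, 0.35, 0.1, 0.1]
print(diff_cut(probs))  # [1, 2]: the big drop is between 0.35 and 0.1
```

Unlike a fixed top-k, this rule keeps a variable number of tokens. A flat head of the distribution survives whole, and a single dominant token is kept alone.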

Preprint · 2026 · arXiv

Emergent Semantic Role Understanding in Language Models

Understanding how linguistic structure emerges in language models is central to interpreting what these systems learn from data and how much supervision they truly require. In particular, semantic role understanding ("who did what to whom") is a core component of meaning representation, yet it remains unclear whether it arises from pre-training alone or depends on task-specific fine-tuning. To answer this question, we freeze decoder-only transformers and train linear probes to extract semantic roles, using probe performance to infer whether role information is already encoded during pre-training or learned during adaptation. Across model scales, we find that frozen representations contain substantial semantic role information, with performance improving but not fully matching fine-tuned models. This indicates that semantic role understanding emerges only partially from pre-training alone. We show that semantic role structure emerges from language modeling objectives, but its internal implementation shifts toward more distributed representations as model scale increases.

Preprint · 2022 · arXiv

Planning Spatial Networks with Monte Carlo Tree Search

We tackle the problem of goal-directed graph construction: given a starting graph, a budget of modifications, and a global objective function, the aim is to find a set of edges whose addition to the graph achieves the maximum improvement in the objective (e.g., communication efficiency). This problem emerges in many networks of great importance for society, such as transportation and critical infrastructure networks. We identify two significant shortcomings of present methods. Firstly, they focus exclusively on network topology while ignoring spatial information; however, in many real-world networks, nodes are embedded in space, which yields different global objectives and governs the range and density of realizable connections. Secondly, existing RL methods scale poorly to large networks due to the high cost of training a model and the scaling factors of the action space and global objectives. In this work, we formulate this problem as a deterministic MDP. We adopt the Monte Carlo Tree Search framework for planning in this domain, prioritizing the optimality of final solutions over the speed of policy evaluation. We propose several improvements over the standard UCT algorithm for this family of problems, addressing their single-agent nature, the trade-off between the costs of edges and their contribution to the objective, and an action space linear in the number of nodes. We demonstrate the suitability of this approach for improving the global efficiency and attack resilience of a variety of synthetic and real-world networks, including Internet backbone networks and metro systems. On the largest networks tested, our approach improves these metrics by 24% compared to UCT, and it scales better than previous methods.
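The abstract builds on the standard UCT rule for deciding which child node to descend into during tree search. Below is a minimal sketch of that baseline selection and backpropagation step, which is UCB1 applied to tree nodes. It assumes rollout rewards normalized to [0, 1]. The Node class, its fields, and the default exploration constant sqrt(2) are illustrative, and the paper's proposed modifications to UCT are not reproduced here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; each child is one candidate edge addition."""
    visits: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node


def uct_score(parent: Node, child: Node, c: float = math.sqrt(2)) -> float:
    """Mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # always try unvisited actions first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_action(parent: Node, c: float = math.sqrt(2)):
    """Pick the child action that maximizes the UCT score."""
    return max(parent.children, key=lambda a: uct_score(parent, parent.children[a], c))


def backpropagate(path: list, reward: float) -> None:
    """Add a rollout's reward to every node on the selected path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


root = Node(visits=10)
root.children = {"a": Node(visits=5, total_reward=4.0), "b": Node(visits=5, total_reward=1.0)}
print(select_action(root))  # "a": same exploration bonus, higher mean reward
```

Because the exploration bonus shrinks as a child's visit count grows, search effort gradually shifts toward the edges with the best observed improvement in the objective.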

Preprint · 2021 · arXiv

Cooperation and Reputation Dynamics with Reinforcement Learning

Creating incentives for cooperation is a challenge in natural and artificial systems. One potential answer is reputation, whereby agents trade the immediate cost of cooperation for the future benefits of having a good reputation. Game theoretical models have shown that specific social norms can make cooperation stable, but how agents can learn to establish effective reputation mechanisms on their own is less understood. We use a simple model of reinforcement learning to show that reputation mechanisms generate two coordination problems: agents need to learn how to coordinate on the meaning of existing reputations and collectively agree on a social norm to assign reputations to others based on their behavior. These coordination problems exhibit multiple equilibria, some of which effectively establish cooperation. When we train agents with a standard Q-learning algorithm in an environment with reputation mechanisms, convergence to undesirable equilibria is widespread. We propose two mechanisms to alleviate this: (i) seeding a proportion of the system with fixed agents that steer others towards good equilibria; and (ii) intrinsic rewards based on the idea of introspection, i.e., augmenting agents' rewards by an amount proportional to the performance of their own strategy against themselves. A combination of these simple mechanisms is successful in stabilizing cooperation, even in a fully decentralized version of the problem where agents learn to use and assign reputations simultaneously. We show how our results relate to the literature in Evolutionary Game Theory, and discuss implications for artificial, human and hybrid systems, where reputations can be used as a way to establish trust and cooperation.
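The sketch below shows one way an introspection bonus could be folded into a tabular Q-learning update. It assumes a stateless, bandit-style repeated prisoner's dilemma between two learners and omits reputations and social norms entirely. The payoff values, the weight beta, and all function names are hypothetical, chosen only to illustrate "augmenting agents' rewards by an amount proportional to the performance of their own strategy against themselves".

```python
import random

# Hypothetical prisoner's-dilemma payoffs: PAYOFF[(my_action, other_action)].
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
ACTIONS = ("C", "D")


def introspective_reward(my_action: str, other_action: str, beta: float) -> float:
    """Game payoff plus beta times the payoff of my action played against itself."""
    extrinsic = PAYOFF[(my_action, other_action)]
    intrinsic = PAYOFF[(my_action, my_action)]  # assumed introspection term
    return extrinsic + beta * intrinsic


def q_update(q: dict, action: str, reward: float, alpha: float = 0.1) -> None:
    """Stateless Q-learning update: move Q(action) toward the observed reward."""
    q[action] += alpha * (reward - q[action])


def choose(q: dict, epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[a])


def train_pair(beta: float, steps: int = 5000, epsilon: float = 0.1, seed: int = 0):
    """Two independent learners repeatedly play the game against each other."""
    rng = random.Random(seed)
    q1 = {a: 0.0 for a in ACTIONS}
    q2 = {a: 0.0 for a in ACTIONS}
    for _ in range(steps):
        a1, a2 = choose(q1, epsilon, rng), choose(q2, epsilon, rng)
        q_update(q1, a1, introspective_reward(a1, a2, beta))
        q_update(q2, a2, introspective_reward(a2, a1, beta))
    return q1, q2


q1, q2 = train_pair(beta=2.0)
print(q1, q2)
```

With these illustrative payoffs, any beta above 1 makes cooperating the better reply to either opponent action (for example, 3 + 3·beta > 5 + beta against a cooperator). The bonus therefore changes which equilibrium independent learners settle into.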

Preprint · 2021 · arXiv

Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability

Understanding the shopping motivations behind market baskets has high commercial value in the grocery retail industry. Analyzing shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while keeping outcomes interpretable. Latent Dirichlet Allocation (LDA) provides a suitable framework to process grocery transactions and to discover a broad representation of customers' shopping motivations. However, summarizing the posterior distribution of an LDA model is challenging, while individual LDA draws may not be coherent and cannot capture topic uncertainty. Moreover, the evaluation of LDA models is dominated by model-fit measures, which may not adequately capture qualitative aspects such as the interpretability and stability of topics. In this paper, we introduce a clustering methodology that post-processes posterior LDA draws to summarize the entire posterior distribution and identify semantic modes represented as recurrent topics. Our approach is an alternative to standard label-switching techniques and provides a single posterior summary set of topics, as well as associated measures of uncertainty. Furthermore, we establish a more holistic definition for model evaluation, which assesses topic models based not only on their likelihood but also on their coherence, distinctiveness and stability. By means of a survey, we set thresholds for the interpretation of topic coherence and topic similarity in the domain of grocery retail data. We demonstrate that selecting recurrent topics through our clustering methodology not only improves model likelihood but also improves qualitative aspects of LDA such as interpretability and stability. We illustrate our methods on an example from a large UK supermarket chain.
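The abstract does not spell out the clustering algorithm, so the following is only a generic illustration of the idea of pooling topics across posterior draws and keeping the recurrent ones. It is not the paper's method. It assumes a greedy, cosine-similarity threshold clustering with running centroids. The threshold, the min_draws rule, and all function names are hypothetical.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two topic-word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recurrent_topics(draws, threshold=0.9, min_draws=2):
    """Cluster topics pooled from several posterior draws (greedy, hypothetical).

    draws: list of draws; each draw is a list of topic-word probability vectors.
    Returns the mean vector of every cluster whose members come from at least
    `min_draws` distinct draws, i.e. topics that recur across the posterior.
    """
    clusters = []  # each cluster: {"members": [vectors], "draws": {draw ids}}
    for d, topics in enumerate(draws):
        for topic in topics:
            best, best_sim = None, threshold
            for c in clusters:
                n = len(c["members"])
                centroid = [sum(col) / n for col in zip(*c["members"])]
                sim = cosine(topic, centroid)
                if sim >= best_sim:  # most similar cluster above the threshold
                    best, best_sim = c, sim
            if best is None:
                clusters.append({"members": [topic], "draws": {d}})
            else:
                best["members"].append(topic)
                best["draws"].add(d)
    summary = []
    for c in clusters:
        if len(c["draws"]) >= min_draws:
            n = len(c["members"])
            summary.append([sum(col) / n for col in zip(*c["members"])])
    return summary


draws = [
    [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]],    # draw 0
    [[0.85, 0.15, 0.0], [0.3, 0.4, 0.3]],  # draw 1
]
print(recurrent_topics(draws))  # only the topic seen in both draws survives
```

The number of distinct draws a cluster spans serves here as a crude recurrence and stability signal. The paper's own uncertainty measures may be defined differently.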