Source author record

Bruno Lepri

Bruno Lepri appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

cs.CY, Social and Information Networks, physics.soc-ph, Machine Learning, Artificial Intelligence, Computer Vision, Human-Computer Interaction, Computation and Language, Multiagent Systems, Applications, Computational Complexity, Cryptography and Security, Digital Libraries, Information Theory, math.IT, physics.data-an, Robotics

Catalog footprint

What is connected

33 works

17 topics

4 close collaborators

preprint 2026 arXiv

Generative AI collective behavior needs an interactionist paradigm

In this article, we argue that understanding the collective behavior of agents based on large language models (LLMs) is an essential area of inquiry, with important implications in terms of risks and benefits, impacting us as a society at many levels. We claim that the distinctive nature of LLMs--namely, their initialization with extensive pre-trained knowledge and implicit social priors, together with their capability of adaptation through in-context learning--motivates the need for an interactionist paradigm consisting of alternative theoretical foundations, methodologies, and analytical tools, in order to systematically examine how prior knowledge and embedded values interact with social context to shape emergent phenomena in multi-agent generative AI systems. We propose and discuss four directions that we consider crucial for the development and deployment of LLM-based collectives, focusing on theory, methods, and trans-disciplinary dialogue.

preprint 2026 arXiv

Graph Hierarchical Recurrence for Long-Range Generalization

Graph Neural Networks (GNNs) and Graph Transformers (GTs) are now a fundamental paradigm for graph learning, combining the representation-learning capabilities of deep models with the sample efficiency induced by their inductive biases. Despite their effectiveness, a large body of work has shown that these models still face fundamental limitations in tasks that require capturing correlations between distant regions of a graph. To address this issue, we introduce Graph Hierarchical Recurrence (GHR), a novel framework that operates jointly on the input graph and on a hierarchical abstraction obtained through pooling. We also show that the limitations of existing models are even more pronounced in out-of-range generalization, where test instances involve interactions over distances longer than those observed during training. By contrast, despite its simple design, GHR provides three key advantages: strong performance on long-range dependencies, improved out-of-range generalization, and high parameter efficiency. To corroborate these claims, we show that across a broad set of long-range benchmarks, GHR consistently outperforms existing graph models while using as little as 1% of the parameters of current state-of-the-art models. These results suggest a complementary direction to the current trend of scaling architectures to obtain graph foundation models, indicating that increased model capacity alone may not be sufficient for generalization.

preprint 2026 arXiv

LLMberjack: Guided Trimming of Debate Trees for Multi-Party Conversation Creation

We present LLMberjack, a platform for creating multi-party conversations starting from existing debates, originally structured as reply trees. The system offers an interactive interface that visualizes discussion trees and enables users to construct coherent linearized dialogue sequences while preserving participant identity and discourse relations. It integrates optional large language model (LLM) assistance to support automatic editing of the messages and speakers' descriptions. We demonstrate the platform's utility by showing how tree visualization facilitates the creation of coherent, meaningful conversation threads and how LLM support enhances output quality while reducing human effort. The tool is open-source and designed to promote transparent and reproducible workflows to create multi-party conversations, addressing a lack of resources of this type.

preprint 2026 arXiv

Mirror, Mirror on the Wall: Can VLM Agents Tell Who They Are at All?

In the animal kingdom, mirror self-recognition is a canonical probe of higher-order cognition, emerging only in some species. We ask whether an analogous functional capability emerges in embodied vision-language model (VLM) agents: can they recognize themselves in a mirror? We introduce a controlled 3D benchmark where a first-person VLM agent must infer a hidden body attribute from its reflection and select the matching target, while avoiding self-other misattribution. To separate mirror-grounded self-identification from shortcuts, we test mirror removal, misleading cues, and occluded reflections. We also evaluate the decision process through mirror seeking, temporal ordering, self-attribution, and reasoning-action consistency. Our experiments show that mirror-based self-identification emerges mainly in stronger VLMs. These models can use reflected evidence for action, whereas weaker models often inspect the mirror but fail to extract self-relevant information or misattribute their reflection. Language-vision conflict further shows that self-referential language alone is not evidence of grounded self-identification. Overall, mirror-based evaluation provides a diagnostic for whether embodied self-grounding is causally rooted in perception and action rather than priors, prompt compliance, or confabulation.

preprint 2022 arXiv

A Framework for Verifiable and Auditable Federated Anomaly Detection

Federated Learning is an emerging approach to manage cooperation between a group of agents for the solution of Machine Learning tasks, with the goal of improving each agent's performance without disclosing any data. In this paper we present a novel algorithmic architecture that tackles this problem in the particular case of Anomaly Detection (or classification of rare events), a setting where typical applications often comprise data with sensitive information, but where the scarcity of anomalous examples encourages collaboration. We show how Random Forests can be used as a tool for the development of accurate classifiers with an effective insight-sharing mechanism that does not break the data integrity. Moreover, we explain how the new architecture can be readily integrated in a blockchain infrastructure to ensure the verifiable and auditable execution of the algorithm. Furthermore, we discuss how this work may set the basis for a more general approach for the design of federated ensemble-learning methods beyond the specific task and architecture discussed in this paper.

preprint 2022 arXiv

ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation

Recently, there has been an increasing interest in image editing methods that employ pre-trained unconditional image generators (e.g., StyleGAN). However, applying these methods to translate images to multiple visual domains remains challenging. Existing works often do not preserve the domain-invariant part of the image (e.g., the identity in human face translations), usually do not handle multiple domains, or do not allow for multi-modal translations. This work proposes an implicit style function (ISF) to straightforwardly achieve multi-modal and multi-domain image-to-image translation from pre-trained unconditional generators. The ISF manipulates the semantics of an input latent code so that the image generated from it lies in the desired visual domain. Our results in human face and animal manipulations show significant improvements over the baselines. Our model enables cost-effective multi-modal unsupervised image-to-image translations at high resolution using pre-trained unconditional GANs. The code and data are available at: https://github.com/yhlleo/stylegan-mmuit.

preprint 2022 arXiv

Modeling International Mobility using Roaming Cell Phone Traces during COVID-19 Pandemic

Most of the studies related to human mobility are focused on intra-country mobility. However, there are many scenarios (e.g., spreading diseases, migration) in which timely data on international commuters are vital. Mobile phones represent a unique opportunity to monitor international mobility flows in a timely manner and with proper spatial aggregation. This work proposes using roaming data generated by mobile phones to model incoming and outgoing international mobility. We use the gravity and radiation models to capture mobility flows before and during the introduction of non-pharmaceutical interventions. However, traditional models have some limitations: for instance, mobility restrictions are not explicitly captured and may play a crucial role. To overcome such limitations, we propose the COVID Gravity Model (CGM), namely an extension of the traditional gravity model that is tailored for the pandemic scenario. The proposed approach outperforms the traditional models in terms of accuracy by 126.9% for incoming mobility and by 63.9% when modeling outgoing mobility flows.
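
For readers unfamiliar with the traditional gravity model mentioned above, the following is a minimal sketch of its textbook form, in which the flow between two places grows with their "masses" (for example, populations) and decays with distance. The function name, parameter names, default exponents, and input values are illustrative assumptions; this is not the CGM extension, whose form the abstract does not specify.

```python
def gravity_flow(mass_origin, mass_dest, distance, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Textbook gravity model: T_ij = k * m_i^alpha * m_j^beta / d_ij^gamma.

    All parameter defaults are illustrative assumptions, not fitted values.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * (mass_origin ** alpha) * (mass_dest ** beta) / (distance ** gamma)


# Hypothetical inputs: two countries with masses 2.0 and 8.0, at distance 4.0.
near = gravity_flow(2.0, 8.0, 4.0)
# With gamma = 2, doubling the distance divides the estimated flow by four.
far = gravity_flow(2.0, 8.0, 8.0)
print(near, far)
```

In practice the constant and exponents would be fitted to observed flows; the defaults here only make the distance decay easy to see.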

preprint 2022 arXiv

Reprogramming FairGANs with Variational Auto-Encoders: A New Transfer Learning Model

Fairness-aware GANs (FairGANs) exploit the mechanisms of Generative Adversarial Networks (GANs) to impose fairness on the generated data, freeing them from both disparate impact and disparate treatment. Given the model's advantages and performance, we introduce a novel learning framework to transfer a pre-trained FairGAN to other tasks. This reprogramming process has the goal of maintaining the FairGAN's main targets of data utility, classification utility, and data fairness, while widening its applicability and ease of use. In this paper we present the technical extensions required to adapt the original architecture to this new framework (and in particular the use of Variational Auto-Encoders), and discuss the benefits, trade-offs, and limitations of the new model.

preprint 2022 arXiv

Trajectory Test-Train Overlap in Next-Location Prediction Datasets

Next-location prediction, consisting of forecasting a user's location given their historical trajectories, has important implications in several fields, such as urban planning, geo-marketing, and disease spreading. Several predictors have been proposed in the last few years to address it, including last-generation ones based on deep learning. This paper tests the generalization capability of these predictors on public mobility datasets, stratifying the datasets by whether the trajectories in the test set also appear fully or partially in the training set. We consistently discover a severe problem of trajectory overlapping in all analyzed datasets, highlighting that predictors memorize trajectories while having limited generalization capacities. We thus propose a methodology to rerank the outputs of the next-location predictors based on spatial mobility patterns. With these techniques, we significantly improve the predictors' generalization capability, with a relative improvement in accuracy of up to 96.15% on the trajectories that cannot be memorized (i.e., low overlap with the training set).
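
One way to make the idea of test-train trajectory overlap concrete is sketched below. It scores a test trajectory by the largest share of its locations that can be matched, in order, within any single training trajectory, using the longest common subsequence. The choice of this measure, the function names, and the toy data are assumptions made for illustration; the abstract does not state the exact overlap measure used by the authors.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two location sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def max_overlap(test_traj, train_trajs):
    """Largest share of test_traj matched by any single training trajectory."""
    if not test_traj:
        return 0.0
    best = max((lcs_length(test_traj, t) for t in train_trajs), default=0)
    return best / len(test_traj)


# Hypothetical training set of location-ID sequences.
train = [["A", "B", "C", "D"], ["X", "Y"]]
print(max_overlap(["A", "B", "C", "D"], train))  # fully contained in training
print(max_overlap(["A", "Q", "C", "Z"], train))  # partially overlapping
print(max_overlap(["M", "N"], train))            # no overlap
```

A score like this could then be used to stratify the test set into high-overlap and low-overlap groups before measuring accuracy on each.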

preprint 2021 arXiv

A Survey on Deep Learning for Human Mobility

The study of human mobility is crucial due to its impact on several aspects of our society, such as disease spreading, urban planning, well-being, pollution, and more. The proliferation of digital mobility data, such as phone records, GPS traces, and social media posts, combined with the predictive power of artificial intelligence, triggered the application of deep learning to human mobility. Existing surveys focus on single tasks, data sources, mechanistic or traditional machine learning approaches, while a comprehensive description of deep learning solutions is missing. This survey provides a taxonomy of mobility tasks, a discussion on the challenges related to each task and how deep learning may overcome the limitations of traditional models, a description of the most relevant solutions to the mobility tasks described above and the relevant challenges for the future. Our survey is a guide to the leading deep learning solutions to next-location prediction, crowd flow prediction, trajectory generation, and flow generation. At the same time, it helps deep learning scientists and practitioners understand the fundamental concepts and the open challenges of the study of human mobility.

preprint 2021 arXiv

Detecting discriminatory risk through data annotation based on Bayesian inferences

Thanks to the increasing growth of computational power and data availability, research in machine learning has advanced with tremendous rapidity. Nowadays, the majority of automatic decision-making systems are based on data. However, it is well known that machine learning systems can present problematic results if they are built on partial or incomplete data. In fact, in recent years several studies have found a convergence of issues related to the ethics and transparency of these systems in the process of data collection and how they are recorded. Although the process of rigorous data collection and analysis is fundamental in the model design, this step is still largely overlooked by the machine learning community. For this reason, we propose a method of data annotation based on Bayesian statistical inference that aims to warn about the risk of discriminatory results of a given data set. In particular, our method aims to deepen knowledge and promote awareness about the sampling practices employed to create the training set, highlighting that the probability of success or failure conditioned on minority membership is given by the structure of the data available. We empirically test our system on three datasets commonly accessed by the machine learning community and we investigate the risk of racial discrimination.
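
As a rough illustration of estimating the probability of success conditioned on group membership from the structure of a data set, the sketch below computes a Beta-Binomial posterior mean of the positive-outcome rate for each group. The uniform Beta(1, 1) prior, the function names, and the toy records are assumptions chosen for illustration; the abstract does not specify the authors' actual annotation procedure.

```python
def posterior_mean(successes, total, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior."""
    return (successes + prior_a) / (total + prior_a + prior_b)


def outcome_rates_by_group(records):
    """records: iterable of (group, outcome) pairs with outcome in {0, 1}."""
    counts = {}
    for group, outcome in records:
        successes, total = counts.get(group, (0, 0))
        counts[group] = (successes + outcome, total + 1)
    return {g: posterior_mean(s, n) for g, (s, n) in counts.items()}


# Hypothetical data set: (group label, positive outcome flag).
records = [
    ("minority", 1), ("minority", 0), ("minority", 0),
    ("majority", 1), ("majority", 1), ("majority", 1), ("majority", 0),
]
rates = outcome_rates_by_group(records)
print(rates)
# A large gap between groups could be flagged as a warning on the data set.
print(rates["majority"] - rates["minority"])
```

With very few records per group the prior dominates the estimate, which is one reason a Bayesian treatment can be more cautious than raw frequencies on small samples.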

preprint 2020 arXiv

Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach

Manipulating visual attributes of images through human-written text is a very challenging task. On the one hand, models have to learn the manipulation without the ground truth of the desired output. On the other hand, models have to deal with the inherent ambiguity of natural language. Previous research usually requires either the user to describe all the characteristics of the desired image or to use richly-annotated image captioning datasets. In this work, we propose a novel unsupervised approach, based on image-to-image translation, that alters the attributes of a given image through a command-like sentence such as "change the hair color to black". Contrary to state-of-the-art approaches, our model requires neither a human-annotated dataset nor a textual description of all the attributes of the desired image, but only of those that have to be modified. Our proposed model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description, before generating a new image from the content and the modified attribute representation. Because text might be inherently ambiguous (blond hair may refer to different shades of blond, e.g. golden, icy, sandy), our method generates multiple stochastic versions of the same translation. Experiments show that the proposed model achieves promising performance on two large-scale public datasets: CelebA and CUB. We believe our approach will pave the way to new avenues of research combining textual and speech commands with visual attributes.

preprint 2020 arXiv

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large-scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy-preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion, and in turn this enables both contact tracing and the early detection of outbreak hotspots on a more fine-grained geographic scale. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

preprint 2020 arXiv

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling

Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images. Recent studies have shown remarkable success for multiple domains but they suffer from two main limitations: they are either built from several two-domain mappings that are required to be learned independently, or they generate low-diversity results, a problem known as mode collapse. To overcome these limitations, we propose a method named GMM-UNIT, which is based on a content-attribute disentangled representation where the attribute space is fitted with a GMM. Each GMM component represents a domain, and this simple assumption has two prominent advantages. First, it can be easily extended to most multi-domain and multi-modal image-to-image translation tasks. Second, the continuous domain encoding allows for interpolation between domains and for extrapolation to unseen domains and translations. Additionally, we show how GMM-UNIT can be constrained down to different methods in the literature, meaning that GMM-UNIT is a unifying framework for unsupervised image-to-image translation.
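
The idea that each GMM component represents a domain, and that a continuous encoding permits interpolation and extrapolation between domains, can be illustrated with a toy sketch. The two-dimensional attribute space, the component means, the shared standard deviation, and all names below are hypothetical; this is not the GMM-UNIT architecture, only the sampling and mixing step that the abstract describes in words.

```python
import random


def sample_attribute(mean, std, rng):
    """Draw an attribute code from an isotropic Gaussian component."""
    return [rng.gauss(m, std) for m in mean]


def interpolate(mean_a, mean_b, t):
    """Blend two domain means; t in [0, 1] interpolates, t > 1 extrapolates."""
    return [(1 - t) * a + t * b for a, b in zip(mean_a, mean_b)]


# Hypothetical component means for two visual domains in a 2-D attribute space.
domain_means = {"domain_a": [0.0, 0.0], "domain_b": [2.0, 4.0]}

rng = random.Random(0)  # fixed seed so repeated runs draw the same codes
code = sample_attribute(domain_means["domain_a"], 0.1, rng)
halfway = interpolate(domain_means["domain_a"], domain_means["domain_b"], 0.5)
beyond = interpolate(domain_means["domain_a"], domain_means["domain_b"], 1.5)
print(code, halfway, beyond)
```

Sampling several codes from the same component would correspond to multi-modal outputs within one domain, while moving the mean between components corresponds to moving between domains.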

preprint 2020 arXiv

Learning Mobility Flows from Urban Features with Spatial Interaction Models and Neural Networks

A fundamental problem of interest to policy makers, urban planners, and other stakeholders involved in urban development projects is assessing the impact of planning and construction activities on mobility flows. This is a challenging task due to the different spatial, temporal, social, and economic factors influencing urban mobility flows. These flows, along with the influencing factors, can be modelled as attributed graphs with both node and edge features characterising locations in a city and the various types of relationships between them. In this paper, we address the problem of assessing origin-destination (OD) car flows between a location of interest and every other location in a city, given their features and the structural characteristics of the graph. We propose three neural network architectures, including graph neural networks (GNN), and conduct a systematic comparison between the proposed methods and state-of-the-art spatial interaction models, their modifications, and machine learning approaches. The objective of the paper is to address the practical problem of estimating potential flow between an urban development project location and other locations in the city, where the features of the project location are known in advance. We evaluate the performance of the models on a regression task using a custom data set of attributed car OD flows in London. We also visualise the model performance by showing the spatial distribution of flow residuals across London.

preprint 2020 arXiv

Mobile phone data and COVID-19: Missing an opportunity?

This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is only scarcely used, although its value in similar epidemics has been proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups at national and regional level, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who jointly put their work at the service of the global effort to combat the COVID-19 pandemic.

preprint 2020 arXiv

Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation

Image-to-image translation aims to learn a mapping that transforms an image from one visual domain to another. Recent works assume that image descriptors can be disentangled into a domain-invariant content representation and a domain-specific style representation. Thus, translation models seek to preserve the content of source images while changing the style to a target visual domain. However, synthesizing new images is extremely challenging, especially in multi-domain translations, as the network has to compose content and style to generate reliable and diverse images in multiple domains. In this paper we propose the use of an image retrieval system to assist the image-to-image translation task. First, we train an image-to-image translation model to map images to multiple domains. Then, we train an image retrieval model using real and generated images to find images similar to a query one in content but in a different domain. Finally, we exploit the image retrieval system to fine-tune the image-to-image translation model and generate higher quality images. Our experiments show the effectiveness of the proposed solution and highlight the contribution of the retrieval network, which can benefit from additional unlabeled data and help image-to-image translation models in the presence of scarce data.

preprint 2020 arXiv

Segregated interactions in urban and online space

Urban income segregation is a widespread phenomenon that challenges societies across the globe. Classical studies on segregation have largely focused on the geographic distribution of residential neighborhoods rather than on patterns of social behaviors and interactions. In this study, we analyze segregation in economic and social interactions by observing credit card transactions and Twitter mentions among thousands of individuals in three culturally different metropolitan areas. We show that segregated interaction is amplified relative to the expected effects of geographic segregation in terms of both purchase activity and online communication. Furthermore, we find that segregation increases with difference in socio-economic status but is asymmetric for purchase activity, i.e., the amount of interaction from poorer to wealthier neighborhoods is larger than vice versa. Our results provide novel insights into the understanding of behavioral segregation in human interactions with significant socio-political and economic implications.

preprint 2020 arXiv

Socio-economic, built environment, and mobility conditions associated with crime: A study of multiple cities

Nowadays, 23% of the world population lives in multi-million cities. In these metropolises, criminal activity is much higher and more violent than in either small cities or rural areas. Thus, understanding what factors influence urban crime in big cities is a pressing need. Mainstream studies analyse crime records through historical panel data or analysis of historical patterns combined with ecological factors and exploratory mapping. More recently, machine learning methods have provided informed crime prediction over time. However, previous studies have focused on a single city at a time, considering only a limited number of factors (such as socio-economic characteristics) and often at large spatial units. Hence, our understanding of the factors influencing crime across cultures and cities is very limited. Here we propose a Bayesian model to explore how crime is related not only to socio-economic factors but also to the built environment (e.g. land use) and mobility characteristics of neighbourhoods. To that end, we integrate multiple open data sources with mobile phone traces and compare how the different factors correlate with crime in diverse cities, namely Boston, Bogotá, Los Angeles and Chicago. We find that the combined use of socio-economic conditions, mobility information and physical characteristics of the neighbourhood effectively explains the emergence of crime, and improves the performance of the traditional approaches. However, we show that the socio-ecological factors of neighbourhoods relate to crime very differently from one city to another. Thus there is clearly no "one fits all" model.

preprint 2020 arXiv

Uncovering socioeconomic gaps in mobility reduction during the COVID-19 pandemic using location data

Using smartphone location data from Colombia, Mexico, and Indonesia, we investigate how non-pharmaceutical policy interventions intended to mitigate the spread of the COVID-19 pandemic impact human mobility. In all three countries, we find that following the implementation of mobility restriction measures, human movement decreased substantially. Importantly, we also uncover large and persistent differences in mobility reduction between wealth groups: on average, users in the top decile of wealth reduced their mobility up to twice as much as users in the bottom decile. For decision-makers seeking to efficiently allocate resources to response efforts, these findings highlight that smartphone location data can be leveraged to tailor policies to the needs of specific socioeconomic groups, especially the most vulnerable.

preprint 2020 arXiv

Understanding individual behaviour: from virtual to physical patterns

As "Big Data" has become pervasive, an increasing amount of research has connected the dots between human behaviour in the offline and online worlds. Consequently, researchers have exploited these new findings to create models that better predict different aspects of human life and recommend future behaviour. To date, however, we do not yet fully understand the similarities and differences of human behaviour in these virtual and physical worlds. Here, we analyse and discuss the mobility and application usage of 400,000 individuals over eight months. We find an astonishing similarity between people's mobility in the physical space and how they move from app to app in smartphones. Our data shows that individuals use and visit a finite number of apps and places, but they keep exploring over time. In particular, two distinct profiles of individuals emerge: those that keep changing places and services, and those that are stable over time, named as "explorers" and "keepers". We see these findings as crucial to enrich a discussion for the potentials and the challenges of building human-centric AI systems, which might leverage recent results in Computational Social Science.

preprint 2019 arXiv

Urban Swarms: A new approach for autonomous waste management

Modern cities are growing ecosystems that face new challenges due to the increasing population demands. One of the many problems they face nowadays is waste management, which has become a pressing issue requiring new solutions. Swarm robotics systems have been attracting an increasing amount of attention in the past years and they are expected to become one of the main driving factors for innovation in the field of robotics. The research presented in this paper explores the feasibility of a swarm robotics system in an urban environment. By using bio-inspired foraging methods such as multi-place foraging and stigmergy-based navigation, a swarm of robots is able to improve the efficiency and autonomy of the urban waste management system in a realistic scenario. To achieve this, a diverse set of simulation experiments was conducted using real-world GIS data and implementing different garbage collection scenarios driven by robot swarms. Results presented in this research show that the proposed system outperforms current approaches. Moreover, results not only show the efficiency of our solution, but also give insights about how to design and customize these systems.

preprint 2016 arXiv

Are Safer Looking Neighborhoods More Lively? A Multimodal Investigation into Urban Life

Policy makers, urban planners, architects, sociologists, and economists are interested in creating urban areas that are both lively and safe. But are the safety and liveliness of neighborhoods independent characteristics? Or are they just two sides of the same coin? In a world where people avoid unsafe looking places, neighborhoods that look unsafe will be less lively, and will fail to harness the natural surveillance of human activity. But in a world where the preference for safe looking neighborhoods is small, the connection between the perception of safety and liveliness will be either weak or nonexistent. In this paper we explore the connection between the levels of activity and the perception of safety of neighborhoods in two major Italian cities by combining mobile phone data (as a proxy for activity or liveliness) with scores of perceived safety estimated using a Convolutional Neural Network trained on a dataset of Google Street View images scored using a crowdsourced visual perception survey. We find that: (i) safer looking neighborhoods are more active than what is expected from their population density, employee density, and distance to the city centre; and (ii) that the correlation between appearance of safety and activity is positive, strong, and significant, for females and people over 50, but negative for people under 30, suggesting that the behavioral impact of perception depends on the demographic of the population. Finally, we use occlusion techniques to identify the urban features that contribute to the appearance of safety, finding that greenery and street facing windows contribute to a positive appearance of safety (in agreement with Oscar Newman's defensible space theory). These results suggest that urban appearance modulates levels of human activity and, consequently, a neighborhood's rate of natural surveillance.

preprint 2016 arXiv

The Death and Life of Great Italian Cities: A Mobile Phone Data Perspective

The Death and Life of Great American Cities was written in 1961 and is now one of the most influential books in city planning. In it, Jane Jacobs proposed four conditions that promote life in a city. However, these conditions have not been empirically tested until recently. This is mainly because it is hard to collect data about "city life". The city of Seoul recently collected pedestrian activity through surveys at an unprecedented scale, with an effort spanning more than a decade, allowing researchers to conduct the first study successfully testing Jacobs's conditions. In this paper, we identify a valuable alternative to the lengthy and costly collection of activity survey data: mobile phone data. We extract human activity from such data, collect land use and socio-demographic information from the Italian Census and Open Street Map, and test the four conditions in six Italian cities. Although these cities are very different from the places for which Jacobs's conditions were spelled out (i.e., great American cities) and from the places in which they were recently tested (i.e., the Asian city of Seoul), we find those conditions to be indeed associated with urban life in Italy as well. Our methodology promises to have a great impact on urban studies, not least because, if replicated, it will make it possible to test Jacobs's theories at scale.

preprint2016arXiv

The Tyranny of Data? The Bright and Dark Sides of Data-Driven Decision-Making for Social Good

The unprecedented availability of large-scale human behavioral data is profoundly changing the world we live in. Researchers, companies, governments, financial institutions, non-governmental organizations and also citizen groups are actively experimenting, innovating and adapting algorithmic decision-making tools to understand global patterns of human behavior and provide decision support to tackle problems of societal importance. In this chapter, we focus our attention on social good decision-making algorithms, that is, algorithms strongly influencing decision-making and resource optimization of public goods, such as public health, safety, access to finance and fair employment. Through an analysis of specific use cases and approaches, we highlight both the positive opportunities that are created through data-driven algorithmic decision-making, and the potential negative consequences that practitioners should be aware of and address in order to truly realize the potential of this emergent field. We elaborate on the need for these algorithms to provide transparency and accountability, preserve privacy and be tested and evaluated in context, by means of living lab approaches involving citizens. Finally, we turn to the requirements which would make it possible to leverage the predictive power of data-driven human behavior analysis while ensuring transparency, accountability, and civic participation.

preprint2015arXiv

Beyond Contagion: Reality Mining Reveals Complex Patterns of Social Influence

Contagion, a concept from epidemiology, has long been used to characterize social influence on people's behavior and affective (emotional) states. While it has revealed many useful insights, it is not clear whether the contagion metaphor is sufficient to fully characterize the complex dynamics of psychological states in a social context. Using wearable sensors that capture daily face-to-face interaction, combined with three daily experience sampling surveys, we collected the most comprehensive data set of personality and emotion dynamics of an entire community of work. From this high-resolution data about actual (rather than self-reported) face-to-face interaction, a complex picture emerges where contagion (that can be seen as adaptation of behavioral responses to the behavior of other people) cannot fully capture the dynamics of transitory states. We found that social influence has two opposing effects on states: adaptation effects that go beyond mere contagion, and complementarity effects whereby individuals' behaviors tend to complement the behaviors of others. Surprisingly, these effects can exhibit completely different directions depending on the stable personality or emotional dispositions (stable traits) of target individuals. Our findings provide a foundation for richer models of social dynamics, and have implications for organizational engineering and workplace well-being.

preprint2015arXiv

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and the presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

preprint2014arXiv

Daily Stress Recognition from Mobile Phone Data, Weather Conditions and Individual Traits

Research has proven that stress reduces quality of life and causes many diseases. For this reason, several researchers devised stress detection systems based on physiological parameters. However, these systems require that obtrusive sensors are continuously carried by the user. In our paper, we propose an alternative approach providing evidence that daily stress can be reliably recognized based on behavioral metrics, derived from the user's mobile phone activity and from additional indicators, such as the weather conditions (data pertaining to transitory properties of the environment) and the personality traits (data concerning permanent dispositions of individuals). Our multifactorial statistical model, which is person-independent, obtains an accuracy score of 72.28% for a 2-class daily stress recognition problem. The model is efficient to implement for most multimedia applications due to its highly reduced, low-dimensional feature space (32d). Moreover, we identify and discuss the indicators which have strong predictive power.

preprint2014arXiv

Generalized Compression Dictionary Distance as Universal Similarity Measure

We present a new similarity measure based on information-theoretic measures which is superior to Normalized Compression Distance for clustering problems and inherits the useful properties of conditional Kolmogorov complexity. We show that Normalized Compression Dictionary Size and Normalized Compression Dictionary Entropy are computationally more efficient, as the need to perform the compression itself is eliminated. Also, they scale linearly with exponential vector size growth and are content independent. We show that normalized compression dictionary distance is compressor independent, if limited to lossless compressors, which gives space for optimizations and implementation speed improvement for real-time and big data applications. The introduced measure is applicable for machine learning tasks of parameter-free unsupervised clustering, supervised learning such as classification and regression, feature selection, and is applicable for big data problems with an order-of-magnitude speed increase.
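
For orientation, the baseline that this abstract compares against, Normalized Compression Distance, can be sketched in a few lines. This is not the paper's dictionary-based measure, whose definition is not given here. It is the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib assumed as the compressor C; the function names and sample inputs are illustrative only.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input; stands in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Baseline Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs (assumptions): a highly repetitive text and
# seeded pseudo-random bytes of the same length.
text = b"the quick brown fox jumps over the lazy dog " * 20
noise = bytes(random.Random(0).getrandbits(8) for _ in range(len(text)))

near = ncd(text, text)   # identical inputs: distance close to 0
far = ncd(text, noise)   # unrelated inputs: distance close to 1
print(f"NCD(text, text)  = {near:.3f}")
print(f"NCD(text, noise) = {far:.3f}")
```

Note that this baseline runs the compressor three times per pair, which is the cost the abstract says the dictionary-based variants avoid.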

preprint2014arXiv

Money Walks: A Human-Centric Study on the Economics of Personal Mobile Data

In the context of a myriad of mobile apps which collect personally identifiable information (PII) and a prospective marketplace of personal data, we investigate a user-centric monetary valuation of mobile PII. During a 6-week-long user study in a living lab deployment with 60 participants, we collected their daily valuations of 4 categories of mobile PII (communication, e.g. phone calls made/received; applications, e.g. time spent on different apps; location; and media, e.g. photos taken) at three levels of complexity (individual data points, aggregated statistics and processed, i.e. meaningful interpretations of the data). In order to obtain honest valuations, we employ a reverse second-price auction mechanism. Our findings show that the most sensitive and valued category of personal information is location. We report statistically significant associations between actual mobile usage, personal dispositions, and bidding behavior. Finally, we outline key implications for the design of mobile services and future markets of personal data.
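
The reverse second-price auction named in this abstract can be sketched as follows, under the usual textbook definition: the lowest bidder wins and is paid the second-lowest bid, so stating one's true valuation is the best strategy. The study's own rules (tie-breaking, currency, how bids were collected) are not given here, so the function name, the tie rule and the sample bids are all assumptions.

```python
def reverse_second_price(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winner, payment) for a reverse second-price auction.

    The participant asking the least wins and receives the second-lowest
    ask. Ties go to whichever bid was entered first (an assumption; the
    abstract does not say how ties were handled).
    """
    if len(bids) < 2:
        raise ValueError("at least two bids are needed to set a price")
    ranked = sorted(bids.items(), key=lambda item: item[1])  # stable sort
    winner = ranked[0][0]    # lowest ask wins
    payment = ranked[1][1]   # winner is paid the second-lowest ask
    return winner, payment


# Hypothetical asks for one day's location data.
example_bids = {"a": 5.0, "b": 2.0, "c": 3.5}
winner, payment = reverse_second_price(example_bids)
print(winner, payment)
```

Because the payment does not depend on the winner's own bid, lowering an ask below one's true valuation only risks selling at a loss, which is why the mechanism is used to elicit honest valuations.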

preprint2014arXiv

Once Upon a Crime: Towards Crime Prediction from Demographics and Mobile Data

In this paper, we present a novel approach to predict crime in a geographic space from multiple data sources, in particular mobile phone and demographic data. The main contribution of the proposed approach lies in using aggregated and anonymized human behavioral data derived from mobile network activity to tackle the crime prediction problem. While previous research efforts have used either background historical knowledge or offender profiling, our findings support the hypothesis that aggregated human behavioral data captured from the mobile network infrastructure, in combination with basic demographic information, can be used to predict crime. In our experimental results with real crime data from London, we obtain an accuracy of almost 70% when predicting whether a specific area in the city will be a crime hotspot or not. Moreover, we provide a discussion of the implications of our findings for data-driven crime analysis.

preprint2012arXiv

Automatic Prediction Of Small Group Performance In Information Sharing Tasks

In this paper, we describe a novel approach, based on Markov jump processes, to model small group conversational dynamics and to predict small group performance. More precisely, we estimate conversational events such as turn-taking, backchannels, and turn-transitions at the micro-level (1-minute windows) and then we bridge the micro-level behavior and the macro-level performance. We tested our approach with a cooperative task, the Information Sharing task, and we verified the relevance of micro-level interaction dynamics in determining a good group performance (e.g. higher speaking turns rate and more balanced participation among group members).

preprint2012arXiv

Do Linguistic Style and Readability of Scientific Abstracts affect their Virality?

Reactions to textual content posted in an online social network show different dynamics depending on the linguistic style and readability of the submitted content. Do similar dynamics exist for responses to scientific articles? Our intuition, supported by previous research, suggests that the success of a scientific article depends on its content, rather than on its linguistic style. In this article, we examine a corpus of scientific abstracts and three forms of associated reactions: article downloads, citations, and bookmarks. Through a class-based psycholinguistic analysis and readability indices tests, we show that certain stylistic and readability features of abstracts clearly concur in determining the success and viral capability of a scientific article.

33 published item(s)

Generative AI collective behavior needs an interactionist paradigm

Graph Hierarchical Recurrence for Long-Range Generalization

LLMberjack: Guided Trimming of Debate Trees for Multi-Party Conversation Creation

Mirror, Mirror on the Wall: Can VLM Agents Tell Who They Are at All?

A Framework for Verifiable and Auditable Federated Anomaly Detection

ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation

Modeling International Mobility using Roaming Cell Phone Traces during COVID-19 Pandemic

Reprogramming FairGANs with Variational Auto-Encoders: A New Transfer Learning Model

Trajectory Test-Train Overlap in Next-Location Prediction Datasets

A Survey on Deep Learning for Human Mobility

Detecting discriminatory risk through data annotation based on Bayesian inferences

Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling

Learning Mobility Flows from Urban Features with Spatial Interaction Models and Neural Networks

Mobile phone data and COVID-19: Missing an opportunity?

Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation

Segregated interactions in urban and online space

Socio-economic, built environment, and mobility conditions associated with crime: A study of multiple cities

Uncovering socioeconomic gaps in mobility reduction during the COVID-19 pandemic using location data

Understanding individual behaviour: from virtual to physical patterns

Urban Swarms: A new approach for autonomous waste management

Are Safer Looking Neighborhoods More Lively? A Multimodal Investigation into Urban Life

The Death and Life of Great Italian Cities: A Mobile Phone Data Perspective

The Tyranny of Data? The Bright and Dark Sides of Data-Driven Decision-Making for Social Good

Beyond Contagion: Reality Mining Reveals Complex Patterns of Social Influence

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

Daily Stress Recognition from Mobile Phone Data, Weather Conditions and Individual Traits

Generalized Compression Dictionary Distance as Universal Similarity Measure

Money Walks: A Human-Centric Study on the Economics of Personal Mobile Data

Once Upon a Crime: Towards Crime Prediction from Demographics and Mobile Data

Automatic Prediction Of Small Group Performance In Information Sharing Tasks

Do Linguistic Style and Readability of Scientific Abstracts affect their Virality?