Source author record

Lewis Mitchell

Lewis Mitchell appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Social and Information Networks, physics.soc-ph, physics.data-an, nlin.CD, physics.ao-ph, Computation and Language, cs.CY, Information Theory, math.IT, Applications, cond-mat.dis-nn, physics.geo-ph, Populations and Evolution

Catalog footprint

22 works, 13 topics, 4 close collaborators


Preprint, 2023, arXiv

The entropy rate of Linear Additive Markov Processes

This work derives a theoretical value for the entropy rate of a Linear Additive Markov Process (LAMP), an expressive model able to generate sequences with a given autocorrelation structure. A first-order Markov chain generates new values by conditioning on the current state. The LAMP model instead takes the transition state from the sequence's history according to some distribution, which does not have to be bounded. The LAMP model captures complex relationships and long-range dependencies in data with expressibility similar to that of a higher-order Markov process. While a higher-order Markov process has a polynomial parameter space, a LAMP model is characterised only by a probability distribution and the transition matrix of an underlying first-order Markov chain. We prove that the theoretical entropy rate of a LAMP is equal to the theoretical entropy rate of the underlying first-order Markov chain. This surprising result is explained by the randomness of the process that selects the LAMP transition state. It provides a tool to model complex dependencies in data while retaining useful theoretical results. We use the LAMP model to estimate the entropy rate of the LastFM, BrightKite, Wikispeedia and Reuters-21578 datasets. We compare estimates calculated using frequency-based probability estimates, a first-order Markov model and the LAMP model, and consider two approaches to ensuring that the transition matrix is irreducible. In most cases the LAMP entropy rates are lower than those of the alternatives, suggesting that the LAMP model is better at accommodating structural dependencies in the processes.
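The abstract's central result is that a LAMP has the same entropy rate as its underlying first-order chain. The Python sketch below illustrates it by computing the standard Markov-chain entropy rate, H = -Σ_i π_i Σ_j P_ij log2 P_ij. It also generates a LAMP-style sequence by drawing a lag and transitioning from the state that far back in the history. The function names, the finite lag distribution and the clamping of lags longer than the history are assumptions made for illustration. This is not the paper's code.

```python
import math
import random


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for pi = pi P, assuming P (a list of rows) is
    row-stochastic, irreducible and aperiodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def markov_entropy_rate(P):
    """Entropy rate in bits per step: H = -sum_i pi_i sum_j P_ij log2 P_ij."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log2(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0  # 0 log 0 is taken as 0
    )


def sample_lamp(P, lag_weights, length, x0=0, rng=None):
    """Generate a LAMP-style sequence. At each step a lag k is drawn with
    probability proportional to lag_weights[k - 1]; the state k steps back in
    the history is then the source of a first-order transition through P.
    A finite lag distribution is a simplification (the model allows
    unbounded lags)."""
    rng = rng or random.Random()
    states = range(len(P))
    lags = range(1, len(lag_weights) + 1)
    seq = [x0]
    while len(seq) < length:
        k = rng.choices(lags, weights=lag_weights)[0]
        source = seq[max(len(seq) - k, 0)]  # clamp lags longer than the history
        seq.append(rng.choices(states, weights=P[source])[0])
    return seq


P = [[0.9, 0.1], [0.5, 0.5]]
print(round(markov_entropy_rate(P), 4))  # 0.5575
print(sample_lamp(P, lag_weights=[0.6, 0.3, 0.1], length=10, rng=random.Random(1)))
```

For this two-state chain the stationary distribution is (5/6, 1/6). Per the abstract's theorem, about 0.5575 bits per step is also the entropy rate of any LAMP built on P, whatever lag weights are used.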

Preprint, 2022, arXiv

#IStandWithPutin versus #IStandWithUkraine: The interaction of bots and humans in discussion of the Russia/Ukraine war

The 2022 Russian invasion of Ukraine emphasises the role social media plays in modern-day warfare, with conflict occurring in both the physical and information environments. There is a large body of work on identifying malicious cyber-activity, but less on the effect this activity has on the overall conversation, especially with regard to the Russia/Ukraine conflict. Here, we employ a variety of techniques, including information-theoretic measures, sentiment and linguistic analysis, and time series techniques, to understand how bot activity influences wider online discourse. By aggregating account groups, we find significant information flows from bot-like accounts to non-bot accounts, with behaviour differing between sides. Pro-Russian non-bot accounts are the most influential overall, with information flows to a variety of other account groups. No significant outward flows exist from pro-Ukrainian non-bot accounts, while there are significant flows from pro-Ukrainian bot accounts into pro-Ukrainian non-bot accounts. We find that bot activity drives an increase in conversations surrounding angst (p = 2.450 × 10^-4) as well as those surrounding work/governance (p = 3.803 × 10^-18). Bot activity also shows a significant relationship with non-bot sentiment (p = 3.76 × 10^-4), and we find that the relationship holds in both directions. This work extends and combines existing techniques to quantify how bots are influencing people in the online conversation around the Russia/Ukraine invasion. It opens up avenues for researchers to understand quantitatively how these malicious campaigns operate and what makes them impactful.

Preprint, 2022, arXiv

Are we always in strife? A longitudinal study of the echo chamber effect in the Australian Twittersphere

Contrary to expectations that the increased connectivity offered by the internet, and particularly by Online Social Networks (OSNs), would result in broad consensus on contentious issues, we instead frequently observe the formation of polarised echo chambers, in which only one side of an argument is entertained. These can progress to filter bubbles, which actively filter out contrasting opinions. This results in vulnerability to misinformation and increased polarisation on social and political issues. When they spread offline, these effects have real-world consequences such as vaccine hesitancy and violence. This work seeks to develop a better understanding of how echo chambers manifest in different discussions dealing with different issues over an extended period of time. We explore the activities of two groups of polarised accounts across three Twitter discussions in the Australian context. We found that Australian Twitter accounts arguing against marriage equality in 2017 were more likely to support the notion that arsonists were the primary cause of the 2019/2020 Australian bushfires, and those supporting marriage equality argued against that arson narrative. We also found strong evidence that the stance people took on marriage equality in 2017 did not predict their political stance in discussions around the Australian federal election two years later. Although mostly isolated from each other, the polarised groups may in certain situations interact with the broader community. This offers hope that the echo chambers may be reduced with concerted outreach to members.

Preprint, 2022, arXiv

Promoting and countering misinformation during Australia's 2019-2020 bushfires: A case study of polarisation

During Australia's unprecedented bushfires in 2019-2020, misinformation blaming arson resurfaced on Twitter using #ArsonEmergency. The extent to which bots were responsible for disseminating and amplifying this misinformation has received scrutiny in the media and academic research. Here we study Twitter communities spreading this misinformation during the population-level event, and investigate the role of online communities and bots. Our in-depth investigation of the dynamics of the discussion uses a phased approach, comparing the periods before and after the mainstream media broadcast reports of bots promoting the hashtag. Though we did not find many bots, the most bot-like accounts were social bots, which present as genuine humans. Further, we distilled meaningful quantitative differences between two polarised communities in the Twitter discussion, resulting in the following insights. First, Supporters of the arson narrative promoted misinformation by engaging others directly with replies and mentions, using hashtags and links to external sources. In response, Opposers retweeted fact-based articles and official information. Second, Supporters were embedded throughout their interaction networks, but Opposers obtained high centrality more efficiently despite their peripheral positions. By the last phase, Opposers and unaffiliated accounts appeared to coordinate, potentially reaching a broader audience. Finally, in the last phase, unaffiliated accounts shared URLs in common with Opposers rather than with Supporters at a ratio of 9:1, having shared mostly Supporter URLs in the first phase. This foiled Supporters' efforts, highlighting the value of exposing misinformation campaigns. We speculate that the communication strategies observed here could be discoverable in other misinformation-related discussions and could inform counter-strategies.

Preprint, 2021, arXiv

Exploring the effect of streamed social media data variations on social network analysis

To study the effects of Online Social Network (OSN) activity on real-world offline events, researchers need access to OSN data, the reliability of which has particular implications for social network analysis. This relates not only to the completeness of any collected dataset, but also to constructing meaningful social and information networks from them. In this multidisciplinary study, we consider the question of constructing traditional social networks from OSN data and then present several measurement case studies showing how variations in collected OSN data affect social network analyses. To this end we developed a systematic comparison methodology, which we applied to five pairs of parallel datasets collected from Twitter in four case studies. We found considerable differences in several of the datasets collected with different tools, and these variations significantly alter the results of subsequent analyses. Our results lead to a set of guidelines for researchers planning to collect online data streams to infer social networks.

Preprint, 2020, arXiv

A method to evaluate the reliability of social media data for social network analysis

To study the effects of Online Social Network (OSN) activity on real-world offline events, researchers need access to OSN data, the reliability of which has particular implications for social network analysis. This relates not only to the completeness of any collected dataset, but also to constructing meaningful social and information networks from them. In this multidisciplinary study, we consider the question of constructing traditional social networks from OSN data and then present a measurement case study showing how the reliability of OSN data affects social network analyses. To this end we developed a systematic comparison methodology, which we applied to two parallel datasets we collected from Twitter. We found considerable differences in datasets collected with different tools and that these variations significantly alter the results of subsequent analyses. Our results lead to a set of guidelines for researchers planning to collect online data streams to infer social networks.

Preprint, 2020, arXiv

Complex contagion features without social reinforcement in a model of social information flow

Contagion models are a primary lens through which we understand the spread of information over social networks. However, simple contagion models cannot reproduce the complex features observed in real-world data, leading to research on more complicated complex contagion models. A noted feature of complex contagion is social reinforcement: individuals require multiple exposures to information before they begin to spread it themselves. Here we show that the quoter model, a model of the social flow of written information over a network, displays features of complex contagion. These include the weakness of long ties and the finding that increased density inhibits rather than promotes information flow. Interestingly, the quoter model exhibits these features despite having no explicit social reinforcement mechanism, unlike complex contagion models. Our results highlight the need to complement contagion models with an information-theoretic view of information spreading. This would give a better understanding of how network properties affect information flow and of which ingredients are most necessary when modeling social behavior.

Preprint, 2020, arXiv

Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts

A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences. Through several case studies, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.
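The abstract frames a comparison as a weighted average Φ = Σ_i p_i φ_i. The minimal Python sketch below covers only the special case where both texts share one score dictionary. There, each word's contribution to Φ_2 - Φ_1 is (p2_i - p1_i)(φ_i - Φ_ref), and the contributions sum to the total difference for any reference value. As we understand the general framework, when scores also differ between texts an extra term, mean(p_i)(φ2_i - φ1_i), is added; it is omitted here. The function names, the choice of Φ_1 as the default reference, and the word scores are all invented for illustration.

```python
def normalise(freqs, scores):
    """Relative frequencies over the words that have a score; others are dropped."""
    scored = {w: c for w, c in freqs.items() if w in scores}
    total = sum(scored.values())
    return {w: c / total for w, c in scored.items()}


def weighted_average(freqs, scores):
    """Phi = sum_i p_i * phi_i, e.g. the average sentiment of a text."""
    p = normalise(freqs, scores)
    return sum(p_w * scores[w] for w, p_w in p.items())


def word_shift(freqs1, freqs2, scores, reference=None):
    """Per-word contributions to Phi_2 - Phi_1 with one shared score dictionary:
    delta_i = (p2_i - p1_i) * (phi_i - reference).

    The reference does not change the total, because the frequency changes
    sum to zero; it only decides which words count as above or below
    'average'. Defaulting it to Phi_1 is an assumption of this sketch."""
    p1, p2 = normalise(freqs1, scores), normalise(freqs2, scores)
    if reference is None:
        reference = weighted_average(freqs1, scores)
    shift = {
        w: (p2.get(w, 0.0) - p1.get(w, 0.0)) * (scores[w] - reference)
        for w in set(p1) | set(p2)
    }
    # Rank by absolute contribution, as the bars of a shift graph would be ordered.
    return sorted(shift.items(), key=lambda item: abs(item[1]), reverse=True)


scores = {"happy": 8.0, "sad": 2.0, "the": 5.0}  # made-up scores
before = {"happy": 2, "sad": 2, "the": 6}
after = {"happy": 3, "sad": 1, "the": 6}
for word, contribution in word_shift(before, after, scores):
    print(f"{word:>6} {contribution:+.2f}")
```

In this toy case the average rises from 5.0 to 5.6. A positive word becoming more frequent ("happy") and a negative word becoming less frequent ("sad") each contribute +0.3. This separation of reasons is the kind of detail a single summary number hides.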

Preprint, 2020, arXiv

Symptom extraction from the narratives of personal experiences with COVID-19 on Reddit

Social media discussion of COVID-19 provides a rich source of insight into how the virus affects people's lives that is qualitatively different from traditional public health datasets. In particular, when individuals self-report their experiences over the course of the virus on social media, it can allow for identification of the emotions each stage of symptoms engenders in the patient. Posts to the Reddit forum r/COVID19Positive contain first-hand accounts from COVID-19 positive patients, giving insight into personal struggles with the virus. These posts often feature a temporal structure indicating the number of days after developing symptoms the text refers to. Using topic modelling and sentiment analysis, we quantify the change in discussion of COVID-19 throughout individuals' experiences for the first 14 days after symptom onset. Discourse on early symptoms such as fever, cough, and sore throat was concentrated towards the beginning of the posts, while language indicating breathing issues peaked around ten days. Some conversation around critical cases was also identified and appeared at a roughly constant rate. We identified two clear clusters of positive and negative emotions associated with the evolution of these symptoms and mapped their relationships. Our results provide a perspective on the patient experience of COVID-19 that complements other medical data streams and can potentially reveal when mental health issues might appear.

Preprint, 2016, arXiv

A data-driven model for influenza transmission incorporating media effects

Numerous studies have attempted to model the effect of mass media on the transmission of diseases such as influenza; however, quantitative data on media engagement has until recently been difficult to obtain. With the recent explosion of "big data" coming from online social media and the like, large volumes of data on a population's engagement with mass media during an epidemic are becoming available to researchers. In this study we combine an online data set comprising millions of shared messages relating to influenza with traditional surveillance data on flu activity to suggest a functional form for the relationship between the two. Using these data, we present a simple deterministic model for influenza dynamics incorporating media effects, and show that such a model helps explain the dynamics of historical influenza outbreaks. Furthermore, through model selection we show that the proposed media function fits historical data better than other media functions proposed in earlier studies.

Preprint, 2016, arXiv

Tracking the Teletherms: The spatiotemporal dynamics of the hottest and coldest days of the year

Instabilities and long-term shifts in seasons, whether induced by natural drivers or human activities, pose great disruptive threats to ecological, agricultural, and social systems. Here, we propose, measure, and explore two fundamental markers of location-sensitive seasonal variations: the Summer and Winter Teletherms, the on-average annual dates of the hottest and coldest days of the year. We analyse daily temperature extremes recorded at 1218 stations across the contiguous United States from 1853 to 2012. We observe large regional variation, with the Summer Teletherm falling up to 90 days after the Summer Solstice and the Winter Teletherm up to 50 days after the Winter Solstice. We show that Teletherm temporal dynamics are substantive, with clear and in some cases dramatic shifts reflective of system bifurcations. We also compare recorded daily temperature extremes with output from two regional climate models, finding considerable though relatively unbiased error. Our work demonstrates that Teletherms are an intuitive, powerful, and statistically sound measure of local climate change, and that they pose detailed, stringent challenges for future theoretical and computational models.
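The definition of a Teletherm as an "on-average annual date" of a temperature extreme could be turned into a procedure roughly like the Python sketch below. It averages each day of the year across years, smooths the result with a moving average that wraps around the year end, and returns the day of the extreme. The following choices are assumptions of this sketch and are not taken from the paper: which temperature series feeds each Teletherm (for example daily maxima for summer and daily minima for winter), the smoothing window, and dropping leap days.

```python
import math


def teletherm_day(temps_by_year, window=15, hottest=True):
    """Estimate a Teletherm from daily temperature series.

    temps_by_year: one list per year, each with one value per day of year
    (all the same length, e.g. 365 with leap days dropped). Days are first
    averaged across years, then smoothed with a centred moving average of
    2 * (window // 2) + 1 days that wraps around the year end. The 1-based
    day of the maximum (or minimum) is returned."""
    n_days = len(temps_by_year[0])
    climatology = [
        sum(year[d] for year in temps_by_year) / len(temps_by_year)
        for d in range(n_days)
    ]
    half = window // 2
    smoothed = [
        sum(climatology[(d + k) % n_days] for k in range(-half, half + 1)) / (2 * half + 1)
        for d in range(n_days)
    ]
    extreme = max if hottest else min
    return extreme(range(n_days), key=smoothed.__getitem__) + 1


# Synthetic check: three identical years whose daily maxima peak on day 200.
years = [
    [20 + 10 * math.cos(2 * math.pi * (d - 199) / 365) for d in range(365)]
    for _ in range(3)
]
print(teletherm_day(years))  # 200
```

Smoothing before taking the extreme matters with real station data. Without it, a single freak hot or cold day could set the "on-average" date.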

Preprint, 2015, arXiv

Climate change sentiment on Twitter: An unsolicited public opinion poll

The consequences of anthropogenic climate change are extensively debated through scientific papers, newspaper articles, and blogs. Newspaper articles may lack accuracy, while the severity of findings in scientific papers may be too opaque for the public to understand. Social media, however, is a forum where individuals of diverse backgrounds can share their thoughts and opinions. As consumption shifts from old media to new, Twitter has become a valuable resource for analyzing current events and headline news. In this research, we analyze tweets containing the word "climate" collected between September 2008 and July 2014. Through use of a previously developed sentiment measurement tool called the Hedonometer, we determine how collective sentiment varies in response to climate change news, events, and natural disasters. We find that natural disasters, climate bills, and oil-drilling can contribute to a decrease in happiness while climate rallies, a book release, and a green ideas contest can contribute to an increase in happiness. Words uncovered by our analysis suggest that responses to climate change news are predominately from climate change activists rather than climate change deniers, indicating that Twitter is a valuable resource for the spread of climate change awareness.

Preprint, 2015, arXiv

Constructing a taxonomy of fine-grained human movement and activity motifs through social media

Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequently visited locations along with their likely activities as a function of time of day and day of week, capitalizing on both the content and geolocation of messages. We subsequently characterize people's transition pattern motifs and demonstrate that spatial information is encoded in word choice.

Preprint, 2014, arXiv

Human language reveals a universal positivity bias

Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
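The "physical-like instruments" mentioned here, and the Hedonometer used in several other abstracts on this page, rest on a frequency-weighted average of per-word scores: h_avg = Σ_i f_i h_i / Σ_i f_i. Below is a minimal Python sketch of that average. The 1-to-9 scale and the option to skip near-neutral words (a "lens" around the midpoint) follow our understanding of the published instrument, so treat the default values as assumptions. The lexicon scores are invented and are not values from the authors' word list.

```python
from collections import Counter


def average_happiness(text, lexicon, neutral=5.0, lens=1.0):
    """Frequency-weighted average score of a text,
    h_avg = sum_i f_i h_i / sum_i f_i,
    over words found in `lexicon`, ignoring words whose score lies within
    `lens` of the neutral midpoint. Returns None if no word qualifies."""
    counts = Counter(text.lower().split())
    used = {
        w: f for w, f in counts.items()
        if w in lexicon and abs(lexicon[w] - neutral) >= lens
    }
    total = sum(used.values())
    if total == 0:
        return None
    return sum(f * lexicon[w] for w, f in used.items()) / total


# Made-up scores on a 1-to-9 scale (not values from the published word list).
lexicon = {"love": 8.4, "happy": 8.3, "the": 4.98, "war": 1.8}
print(average_happiness("I love the war", lexicon))  # (8.4 + 1.8) / 2, about 5.1
```

Because the measure is a plain weighted average, it can be applied to a sliding window of tweets for real-time use or to a whole corpus offline. The same arithmetic serves both cases.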

Preprint, 2014, arXiv

Standing Swells Surveyed Showing Surprisingly Stable Solutions for the Lorenz '96 Model

The Lorenz '96 model is an adjustable dimension system of ODEs exhibiting chaotic behavior representative of dynamics observed in the Earth's atmosphere. In the present study, we characterize statistical properties of the chaotic dynamics while varying the degrees of freedom and the forcing. Tuning the dimensionality of the system, we find regions of parameter space with surprising stability in the form of standing waves traveling amongst the slow oscillators. The boundaries of these stable regions fluctuate regularly with the number of slow oscillators. These results demonstrate hidden order in the Lorenz '96 system, strengthening the evidence for its role as a hallmark representative of nonlinear dynamical behavior.
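For readers unfamiliar with the model, here is a minimal Python sketch of the standard single-level Lorenz '96 system, dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F, on a ring of n variables, integrated with classical fourth-order Runge-Kutta. The paper varies both the number of variables and the forcing F. Its mention of "slow oscillators" suggests it may use the two-scale variant, which this sketch omits. The sizes, F = 8, step size and starting perturbation are common choices assumed here, not the paper's settings.

```python
def lorenz96_rhs(x, forcing):
    """dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F, indices taken cyclically."""
    n = len(x)
    # Python's negative indices already wrap x[k - 1] and x[k - 2]; only k + 1 needs %.
    return [(x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing for k in range(n)]


def rk4_step(x, forcing, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96_rhs(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96_rhs(shifted(x, k3, dt), forcing)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(n=40, forcing=8.0, dt=0.01, steps=1000, nudge=0.01):
    """Integrate from the uniform equilibrium X_k = F with one variable
    nudged, returning the trajectory as a list of states."""
    x = [forcing] * n
    x[0] += nudge
    trajectory = [x]
    for _ in range(steps):
        x = rk4_step(x, forcing, dt)
        trajectory.append(x)
    return trajectory


traj = simulate(n=36, forcing=8.0, steps=2000)
print(len(traj), min(traj[-1]), max(traj[-1]))
```

Because the dimension n is simply the length of the state list, a sweep over n and F like the one described in the abstract amounts to calling `simulate` over a grid of parameters. One would then compute statistics of each trajectory.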

Preprint, 2013, arXiv

Happiness and the Patterns of Life: A Study of Geolocated Tweets

The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies previously had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to characterize the movement patterns of 180,000 individuals, taking advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the 'hedonometer', we characterize changes in word usage as a function of movement, and find that expressed happiness increases logarithmically with distance from an individual's average location.

Preprint, 2013, arXiv

Non-global parameter estimation using local ensemble Kalman filtering

We study parameter estimation for non-global parameters in a low-dimensional chaotic model using the local ensemble transform Kalman filter (LETKF). By modifying existing techniques for using observational data to estimate global parameters, we present a methodology whereby spatially-varying parameters can be estimated using observations only within a localized region of space. Taking a low-dimensional nonlinear chaotic conceptual model for atmospheric dynamics as our numerical testbed, we show that this parameter estimation methodology accurately estimates parameters which vary in both space and time, as well as parameters representing physics absent from the model.
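The paper works with the LETKF and spatial localisation. A much smaller illustration of the underlying idea follows: parameter estimation by appending the parameter to the state vector. We assume this state augmentation is the kind of "existing technique" the abstract refers to. The Python sketch below is a serial ensemble square-root filter (EnSRF) update for one scalar observation. This is a deliberately simpler, swapped-in filter, not the LETKF, and it has no localisation. It only shows how an unobserved parameter gets corrected through its ensemble covariance with an observed variable.

```python
import math


def ensrf_update(ensemble, y, obs_var):
    """Serial ensemble square-root filter analysis for one scalar observation.

    Each member is a list [x, theta, ...] whose component 0 is observed
    directly with error variance obs_var. Unobserved components, such as a
    model parameter appended to the state, are corrected through their
    ensemble covariance with x."""
    m, dim = len(ensemble), len(ensemble[0])
    mean = [sum(member[j] for member in ensemble) / m for j in range(dim)]
    anomalies = [[member[j] - mean[j] for j in range(dim)] for member in ensemble]
    var_x = sum(a[0] ** 2 for a in anomalies) / (m - 1)  # H P H^T
    denom = var_x + obs_var
    gain = [sum(a[j] * a[0] for a in anomalies) / (m - 1) / denom for j in range(dim)]
    # Square-root factor: shrinks anomalies so the analysis covariance equals
    # (I - K H) P without perturbing the observation.
    alpha = 1.0 / (1.0 + math.sqrt(obs_var / denom))
    innovation = y - mean[0]
    new_mean = [mean[j] + gain[j] * innovation for j in range(dim)]
    return [
        [new_mean[j] + a[j] - alpha * gain[j] * a[0] for j in range(dim)]
        for a in anomalies
    ]


# Three members of an augmented state [x, theta] in which theta tracks x / 2.
prior = [[1.0, 0.5], [2.0, 1.0], [3.0, 1.5]]
posterior = ensrf_update(prior, y=4.0, obs_var=1.0)
print([round(v, 3) for v in posterior[1]])  # central member: [3.0, 1.5]
```

In this toy prior theta is perfectly correlated with x. Observing x alone therefore moves the parameter mean from 1.0 to 1.5, while the spread of x shrinks from variance 1.0 to 0.5. With localisation, a spatially varying parameter would respond only to observations in its own region, which is roughly the setting the abstract describes.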

Preprint, 2013, arXiv

Shadow networks: Discovering hidden nodes with models of information flow

Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding, yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing or deliberately hidden nodes may be detected in networks by the effect of their absence on predictions of the speed with which information flows through the network. We use Symbolic Regression (SR) to learn models relating information flow to network topology. These models show localized, systematic, and non-random discrepancies when applied to test networks with intentionally masked nodes, demonstrating the ability to detect the presence of missing nodes and where in the network those nodes are likely to reside.

Preprint, 2013, arXiv

The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place

We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated over the course of several recent years on the social network service Twitter and (2) annually-surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly-resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-level measures such as obesity rates.

Preprint, 2012, arXiv

On finite-size Lyapunov exponents in multiscale systems

We study the effect of regime switches on finite-size Lyapunov exponents (FSLEs) in determining the error growth rates and predictability of multiscale systems. We consider a dynamical system involving slow and fast regimes and switches between them. The surprising result is that, due to the presence of regimes, the error growth rate can be a non-monotonic function of initial error amplitude. In particular, troughs in the large scales of FSLE spectra are shown to be a signature of slow regimes. Fast regimes, in contrast, are shown to cause large peaks in the spectra, where error growth rates far exceed those estimated from the maximal Lyapunov exponent. We present analytical results explaining these signatures and corroborate them with numerical simulations. We show further that these peaks disappear in stochastic parametrizations of the fast chaotic processes. The associated FSLE spectra reveal that the large-scale predictability properties of the full deterministic model are well approximated, whereas small-scale features are not properly resolved.

Preprint, 2011, arXiv

Controlling overestimation of error covariance in ensemble Kalman filters with sparse observations: A variance limiting Kalman filter

We consider the problem of an ensemble Kalman filter when only partial observations are available. In particular, we consider the situation where the observational space consists of variables that are directly observable with known observational error, and of variables for which only the climatic variance and mean are given. To limit the variance of the latter poorly resolved variables, we derive a variance limiting Kalman filter (VLKF) in a variational setting. We analyze the variance limiting Kalman filter for a simple linear toy model and determine its range of optimal performance. We explore the variance limiting Kalman filter in an ensemble transform setting for the Lorenz-96 system, and show that incorporating the information of the variance of some unobservable variables can improve the skill and also increase the stability of the data assimilation procedure.

Preprint, 2011, arXiv

Data assimilation in slow-fast systems using homogenized climate models

A deterministic multiscale toy model is studied in which a chaotic fast subsystem triggers rare transitions between slow regimes, akin to weather or climate regimes. Using homogenization techniques, a reduced stochastic parametrization model is derived for the slow dynamics. The reliability of this reduced climate model in reproducing the statistics of the slow dynamics of the full deterministic model, for finite values of the time scale separation, is numerically established. The statistics, however, are sensitive to uncertainties in the parameters of the stochastic model. It is investigated whether the stochastic climate model can be beneficial as a forecast model in an ensemble data assimilation setting, in particular in the realistic setting when observations are only available for the slow variables. The main result is that reduced stochastic models can indeed improve the analysis skill when used as forecast models instead of the perfect full deterministic model. The stochastic climate model is far superior at detecting transitions between regimes. The observation intervals for which skill improvement can be obtained are related to the characteristic time scales involved. Stochastic climate models produce superior skill in an ensemble setting because of the finite ensemble size: ensembles obtained from the perfect deterministic forecast model lack sufficient spread, even for moderate ensemble sizes. Stochastic climate models provide a natural way to generate sufficient ensemble spread to detect transitions between regimes. This is corroborated with numerical simulations. The conclusion is that stochastic parametrizations are attractive for data assimilation despite their sensitivity to uncertainties in the parameters.
