Source author record

Gareth Tyson

Gareth Tyson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Social and Information Networks; Networking and Internet Architecture; Cryptography and Security; Computers and Society (cs.CY); Computation and Language; Human-Computer Interaction; Software Engineering

Catalog footprint

19 works · 7 topics · 4 close collaborators

Preprint · 2026 · arXiv

Understanding the Consequences of VTuber Reincarnation

The rapid proliferation of VTubers, digital avatars controlled and voiced by human actors (Nakanohito), has created a lucrative and popular entertainment ecosystem. However, under the prevailing industry model, corporations retain ownership of the VTuber persona while the Nakanohito bears the immense pressure of dual-identity management, exposing the Nakanohito to significant vulnerabilities, including burnout, harassment, and precarious labor conditions. When these pressures become untenable, the Nakanohito may terminate their contract and later debut with a new persona, a process known as "reincarnation". This phenomenon, a rising concern in the industry, inflicts substantial losses on the Nakanohito, agencies, and audiences alike. Understanding the quantitative fallout of reincarnation is crucial for mitigating this damage and fostering a more sustainable industry. To this end, we conduct the first large-scale empirical study of VTuber reincarnation, analyzing 12 significant cases using a comprehensive dataset of 728K livestream sessions and 4.5B viewer interaction records. Our results suggest that reincarnation significantly damages a Nakanohito's career, leading to a decline in audience and financial support, an increase in harassment, and negative repercussions for the wider VTuber industry. Overall, these insights carry immediate implications for mitigating the significant professional and personal costs of reincarnation and for fostering a healthier, more equitable VTuber ecosystem.

Preprint · 2023 · arXiv

A Twitter Dataset for Pakistani Political Discourse

We share the largest dataset of the Pakistani Twittersphere, consisting of over 49 million tweets collected during one of the most politically active periods in the country. We collected the data following the removal of the government by a no-confidence vote in April 2022. This large-scale dataset can be used for several downstream tasks, such as studying political bias, bot detection, trolling behavior, mis/disinformation, and censorship related to Pakistani Twitter users. In addition, the dataset provides a large collection of tweets in Urdu and Roman Urdu that can be used to improve language processing tasks.

Preprint · 2022 · arXiv

A Reddit Dataset for the Russo-Ukrainian Conflict in 2022

Reddit consists of sub-communities (subreddits), each focused on a particular topic. This paper provides a list of subreddits relevant to the ongoing Russo-Ukrainian crisis. We perform an exhaustive subreddit exploration using keyword search and shortlist 12 subreddits as candidates that contain nominal discourse related to the crisis. Collectively, these subreddits contain over 300,000 posts and 8 million comments. We further categorize the content into two categories, "R-U Conflict" and "Military Related", based on their primary focus, and characterize the content of those subreddits. The results show a surge of posts and comments soon after Russia launched the invasion. "Military Related" posts tend to receive more replies than "R-U Conflict" posts. Our textual analysis shows an apparent preference for the pro-Ukraine stance in "R-U Conflict" subreddits, while "Military Related" subreddits retain a neutral stance.

Preprint · 2022 · arXiv

A Study of Third-party Resources Loading on Web

This paper performs a large-scale study of dependency chains on the web, finding that around 50% of first-party websites render content that they did not directly load. Although the majority (84.91%) of websites have short dependency chains (below 3 levels), we find websites with dependency chains exceeding 30 levels. Using VirusTotal, we show that 1.2% of these third parties are classified as suspicious. Although seemingly small, this limited set of suspicious third parties has remarkable reach into the wider ecosystem: 73% of the websites under study load resources from suspicious third parties, and 24.8% of first-party webpages contain at least three third parties classified as suspicious in their dependency chain. By running sandboxed experiments, we observe a range of activities, with the majority of suspicious JavaScript code downloading malware.

Preprint · 2022 · arXiv

Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web

Recent years have witnessed growing consolidation of web operations. For example, the majority of web traffic now originates from a few organizations, and even micro-websites often choose to host on large pre-existing cloud infrastructures. In response, the "Decentralized Web" attempts to distribute ownership and operation of web services more evenly. This paper describes the design and implementation of the largest and most widely used Decentralized Web platform, the InterPlanetary File System (IPFS): an open-source, content-addressable peer-to-peer network that provides distributed data storage and delivery. IPFS serves millions of daily content retrievals and already underpins dozens of third-party applications. This paper evaluates the performance of IPFS by introducing a set of measurement methodologies that allow us to uncover the characteristics of peers in the IPFS network. We find peers in more than 2,700 Autonomous Systems and 152 countries, the majority of which operate outside large central cloud providers such as Amazon or Azure. We further evaluate IPFS performance, showing that both publication and retrieval delays are acceptable for a wide range of use cases. Finally, we share our datasets, experiences and lessons learned.

Preprint · 2022 · arXiv

Jettisoning Junk Messaging in the Era of End-to-End Encryption: A Case Study of WhatsApp

WhatsApp is a popular messaging app used by over a billion users around the globe. Given this popularity, understanding misbehavior on WhatsApp is important. The sending of unwanted junk messages by unknown contacts via WhatsApp remains understudied, in part because of the end-to-end encryption offered by the platform. We address this gap by studying junk messaging in a multilingual dataset of 2.6M messages sent to 5K public WhatsApp groups in India. We characterise both junk content and senders. We find that nearly 1 in 10 messages is unwanted content sent by junk senders, and that junk senders employ a number of unique strategies reflecting the challenges they face on WhatsApp, e.g., the need to change phone numbers regularly. Finally, we experiment with on-device classification to automate the detection of junk whilst respecting end-to-end encryption.

Preprint · 2022 · arXiv

Toxicity in the Decentralized Web and the Potential for Model Sharing

The "Decentralised Web" (DW) is an evolving concept that encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (a.k.a. instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g., micro-blogs, image sharing, video streaming). However, moderating toxic content in this decentralised context is challenging, because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to insufficient training data and the effort required for labelling. We therefore propose and evaluate ModPair, a model-sharing system that effectively detects toxic content, achieving an average per-instance macro-F1 score of 0.89.

Preprint · 2022 · arXiv

Twitter Dataset for 2022 Russo-Ukrainian Crisis

Online Social Networks (OSNs) play a significant role in information sharing during a crisis. Data collected during such a crisis can reflect large-scale public opinion and sentiment. OSN data can also be used to study campaigns employed by various entities to engineer public opinion; such information-sharing campaigns can range from spreading factual information to propaganda and misinformation. We provide a Twitter dataset of the 2022 Russo-Ukrainian conflict. In the first release, we share over 1.6 million tweets posted during the first week of the crisis.

Preprint · 2021 · arXiv

An Empirical Assessment of Global COVID-19 Contact Tracing Applications

The rapid spread of COVID-19 has made manual contact tracing difficult. Thus, various public health authorities have experimented with automatic contact tracing using mobile applications (or "apps"). These apps, however, have raised security and privacy concerns. In this paper, we propose an automated security and privacy assessment tool, COVIDGUARDIAN, which combines identification and analysis of Personally Identifiable Information (PII), static program analysis and data flow analysis to determine security and privacy weaknesses. Furthermore, in light of our findings, we undertake a user study to investigate concerns regarding contact tracing apps. We hope that COVIDGUARDIAN, and the issues raised through responsible disclosure to vendors, can contribute to the safe deployment of mobile contact tracing. As part of this, we offer concrete guidelines and highlight gaps between user requirements and app performance.

Preprint · 2020 · arXiv

A First Instagram Dataset on COVID-19

The novel coronavirus (COVID-19) pandemic is drastically shaping and reshaping many aspects of our lives, with a huge impact on our social lives. In this era of lockdown policies in most major cities around the world, we see a huge increase in personal and professional engagement with social media. Social media plays an important role in news propagation as well as in keeping people in contact. At the same time, this source is both a blessing and a curse, as the coronavirus infodemic has become a major concern and is already a topic that needs special attention and further research. In this paper, we provide a multilingual coronavirus (COVID-19) Instagram dataset that we have been continuously collecting since March 30, 2020. We are making our dataset available to the research community on GitHub. We believe this contribution will help the community better understand the dynamics behind this phenomenon on Instagram, one of the major social media platforms. This dataset could also help study the propagation of misinformation related to this outbreak.

Preprint · 2020 · arXiv

Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity

As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, more popular social media platforms. In this research, we study temporal activity trends on Urban Dictionary and provide the first analysis of how this activity relates to content discussed on a major social network: Twitter. By collecting the whole of Urban Dictionary, as well as a large sample of tweets over seven years, we explore the connections between the words and phrases that are defined and searched for on Urban Dictionary and the content that is talked about on Twitter. Through a series of cross-correlation calculations, we identify cases in which Urban Dictionary activity closely reflects the larger conversation happening on Twitter. We then analyze the types of terms that have a stronger connection to discussions on Twitter, finding that Urban Dictionary activity positively correlated with Twitter is centered on terms related to memes, popular public figures, and offline events. Finally, we explore the relationship between periods when terms are trending on Twitter and the corresponding activity on Urban Dictionary, revealing that new definitions are more likely to be added to Urban Dictionary for terms that are currently trending on Twitter.

Preprint · 2020 · arXiv

Characterising User Content on a Multi-lingual Social Network

Social media has been at the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, only a limited number of studies have examined a large portion of the world, including the largest multilingual and multicultural democracy: India. In this paper, we present our characterisation of ShareChat, a multilingual social network in India. We collect an exhaustive dataset across 72 weeks before and during the 2019 Indian general elections, covering 14 languages. We investigate cross-lingual dynamics by clustering visually similar images together and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada tend to be dominant in soliciting political images (often referred to as memes), and that Hindi posts have the largest cross-lingual diffusion across ShareChat (as do images containing English text). For images containing text that cross language barriers, we see that translation is used to widen accessibility. That said, we find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in multilingual and non-textual settings.

Preprint · 2020 · arXiv

Characterizing EOSIO Blockchain

EOSIO has become one of the most popular blockchain platforms since its mainnet launch in June 2018. In contrast to traditional PoW-based systems (e.g., Bitcoin and Ethereum), which are limited by low throughput, EOSIO is the first high-throughput Delegated Proof of Stake system to be widely adopted by many applications. Although EOSIO has millions of accounts and billions of transactions, little is known about its ecosystem, especially with respect to security and fraud. In this paper, we perform a large-scale measurement study of the EOSIO blockchain and its associated DApps. We gather a large-scale dataset of EOSIO and characterize activities including money transfers, account creation and contract invocation. Using our insights, we then develop techniques to automatically detect bots and fraudulent activity. We discover thousands of bot accounts (over 30% of the accounts on the platform) and a number of real-world attacks (301 attack accounts). At the time of our study, 80 of the attack accounts we identified had been confirmed by DApp teams, having caused losses of 828,824 EOS tokens (roughly US$2.6 million) in total.

Preprint · 2016 · arXiv

A First Look at User Activity on Tinder

Mobile dating apps have become a popular means to meet potential partners. Although several exist, one recent addition stands out amongst all others. Tinder presents its users with pictures of people geographically nearby, whom they can either like or dislike based on first impressions. If two users like each other, they can initiate a conversation via the chat feature. In this paper, we use a set of curated profiles to explore the behaviour of men and women on Tinder. We reveal differences between the ways men and women interact with the app, highlighting the strategies employed. Women attain large numbers of matches rapidly, whilst men only slowly accumulate matches. To expand on our findings, we collect survey data to understand user intentions on Tinder. Most notably, our results indicate that a little effort in grooming profiles, especially for male users, goes a long way in attracting attention.

Preprint · 2016 · arXiv

Charting an Intent Driven Network

The current strong divide between applications and the network control plane is desirable for many reasons, but a downside is that the network is kept in the dark regarding the ultimate purposes and intentions of applications and, as a result, is unable to optimize for them. An alternative approach, explored in this paper, is for applications to declare their abstract intents and assumptions to the network, e.g., "this is a Tweet" or "this application will run within a local domain". Such enriched semantics could enable the network to better fulfill application intent, while also helping optimize network resource usage across applications. We refer to this approach as 'intent driven networking' (IDN), and we sketch an incrementally deployable design to serve as a stepping stone towards a practical realization of the IDN concept within today's Internet.

Preprint · 2016 · arXiv

Staggercast: Demand-Side Management for ISPs

The continuing expansion of Internet media consumption has increased traffic volumes, and hence congestion, on access links. In response, both mobile and wireline ISPs must either increase capacity or perform traffic engineering over existing resources. Unfortunately, provisioning timescales are long, the process is costly, and single-homing means operators cannot balance load across the last mile. Inspired by energy and transport networks, we propose demand-side management of users to reduce the impact of consumption patterns outpacing edge network provisioning. By directly influencing user behaviour through a range of incentives, our techniques enable resource management over shorter timescales than is possible in conventional networks. Using survey data from 100 participants, we explore the feasibility of introducing the principles of demand-side management in today's networks.

Preprint · 2015 · arXiv

Does the Internet deserve everybody?

There has been a long-standing tradition amongst developed nations of influencing, both directly and indirectly, the activities of developing economies. Behind this lies one of a range of aims: building or improving living standards, bettering the social status of recipient communities, etc. In some cases, this has resulted in prosperous relations, yet often it has been seen as the exploitation of a position of power or a veneer for other activities (e.g., tapping into new emerging markets). In this paper, we explore whether initiatives to improve Internet connectivity in developing regions are always ethical. We draw up a list of issues that would aid in formulating Internet initiatives that are ethical, effective, and sustainable.

Preprint · 2015 · arXiv

RiPKI: The Tragic Story of RPKI Deployment in the Web Ecosystem

Web content delivery is one of the most important services on the Internet. Access to websites is typically secured via TLS. However, this security model does not account for prefix hijacking at the network layer, which may lead to traffic blackholing or transparent interception. Thus, to achieve comprehensive security and service availability, additional protective mechanisms are necessary, such as the RPKI, a recently deployed Resource Public Key Infrastructure designed to prevent hijacking of traffic by networks. This paper argues two positions: first, that modern web hosting practices make route protection challenging due to the propensity to spread servers across many different networks, often with unpredictable client redirection strategies; and second, that we need a better understanding of why protection mechanisms are not deployed. To initiate this, we empirically explore the relationship between web hosting infrastructure and RPKI deployment. Perversely, we find that less popular websites are more likely to be secured than prominent sites. Worryingly, we find that many large-scale CDNs do not support RPKI, leaving their customers vulnerable. This leads us to explore the business reasons why operators are hesitant to deploy RPKI, which may help to guide future research on improving Internet security.

Preprint · 2014 · arXiv

The Effect of Network and Infrastructural Variables on SPDY's Performance

HTTP is a successful Internet technology on top of which much of the web resides. However, limitations of its current specification, HTTP/1.1, have encouraged some to look for the next generation of HTTP. With SPDY, Google has come up with such a proposal, which has growing community acceptance, especially after being adopted by the IETF HTTPbis-WG as the basis for HTTP/2.0. SPDY has the potential to greatly improve the web experience with little deployment overhead. However, we still lack an understanding of its true potential in different environments. This paper seeks to resolve this, offering a comprehensive evaluation of SPDY's performance through extensive experiments. We identify the impact of network characteristics and website infrastructure on SPDY's potential page-loading benefits, finding that these factors are decisive for SPDY and its optimal deployment strategy. Through this, we feed into the wider debate regarding HTTP/2.0, exploring the key aspects that impact the performance of this future protocol.
