Researcher profile

Aaron Harwood

Aaron Harwood contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 13 - Unverified. Verification L1. Unclaimed author.
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

Twitter conversations predict the daily confirmed COVID-19 cases

As of writing this paper, COVID-19 (Coronavirus disease 2019) has spread to more than 220 countries and territories. Following the outbreak, the pandemic's seriousness has made people more active on social media, especially on microblogging platforms such as Twitter and Weibo. The pandemic-specific discourse has remained on-trend on these platforms for months now. Previous studies have confirmed the contributions of such socially generated conversations towards situational awareness of crisis events. Early forecasts of cases are essential for authorities to estimate the resources needed to cope with the outgrowths of the virus. Therefore, this study attempts to incorporate the public discourse in the design of forecasting models particularly targeted for the steep-hill region of an ongoing wave. We propose a sentiment-involved topic-based latent variables search methodology for designing forecasting models from publicly available Twitter conversations. As a use case, we implement the proposed methodology on Australian COVID-19 daily cases and Twitter conversations generated within the country. Experimental results (i) show the presence of latent social media variables that Granger-cause the daily COVID-19 confirmed cases, and (ii) confirm that those variables offer additional prediction capability to forecasting models. Further, the results show that the inclusion of social media variables yields RMSE improvements of 48.83-51.38% over the baseline models. We also release the large-scale COVID-19 specific geotagged global tweets dataset, MegaGeoCOV, to the public, anticipating that geotagged data of this scale would aid in understanding the conversational dynamics of the pandemic through other spatial and temporal contexts.

Preprint, 2019, arXiv

A Utilization Model for Optimization of Checkpoint Intervals in Distributed Stream Processing Systems

State-of-the-art distributed stream processing systems such as Apache Flink and Storm have recently included checkpointing to provide fault-tolerance for stateful applications. This is a necessary eventuality as these systems head into the Exascale regime, and is evidently more efficient than replication as state size grows. However, current systems use a nominal value for the checkpoint interval, indicative of assuming roughly 1 failure every 19 days, that does not take into account the salient aspects of the checkpoint process, nor the system scale, which can readily lead to inefficient system operation. To address this shortcoming, we provide a rigorous derivation of utilization (the fraction of total time available for the system to do useful work) that incorporates checkpoint interval, failure rate, checkpoint cost, failure detection and restart cost, depth of the system topology and message delay. Our model yields an elegant expression for utilization and provides an optimal checkpoint interval given these parameters, interestingly showing it to be dependent only on checkpoint cost and failure rate. We confirm the accuracy and efficacy of our model through experiments with Apache Flink, where we obtain improvements in system utilization for every case, especially as the system size increases. Our model provides a solid theoretical basis for the analysis and optimization of more elaborate checkpointing approaches.