Source author record

Aaron Harwood

Aaron Harwood appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Artificial Intelligence Computation and Language Social and Information Networks

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Twitter conversations predict the daily confirmed COVID-19 cases

As of writing this paper, COVID-19 (Coronavirus disease 2019) has spread to more than 220 countries and territories. Following the outbreak, the pandemic's seriousness has made people more active on social media, especially on the microblogging platforms such as Twitter and Weibo. The pandemic-specific discourse has remained on-trend on these platforms for months now. Previous studies have confirmed the contributions of such socially generated conversations towards situational awareness of crisis events. The early forecasts of cases are essential to authorities to estimate the requirements of resources needed to cope with the outgrowths of the virus. Therefore, this study attempts to incorporate the public discourse in the design of forecasting models particularly targeted for the steep-hill region of an ongoing wave. We propose a sentiment-involved topic-based latent variables search methodology for designing forecasting models from publicly available Twitter conversations. As a use case, we implement the proposed methodology on Australian COVID-19 daily cases and Twitter conversations generated within the country. Experimental results: (i) show the presence of latent social media variables that Granger-cause the daily COVID-19 confirmed cases, and (ii) confirm that those variables offer additional prediction capability to forecasting models. Further, the results show that the inclusion of social media variables introduces 48.83--51.38% improvements on RMSE over the baseline models. We also release the large-scale COVID-19 specific geotagged global tweets dataset, MegaGeoCOV, to the public anticipating that the geotagged data of this scale would aid in understanding the conversational dynamics of the pandemic through other spatial and temporal contexts.

preprint2019arXiv

A Utilization Model for Optimization of Checkpoint Intervals in Distributed Stream Processing Systems

State-of-the-art distributed stream processing systems such as Apache Flink and Storm have recently included checkpointing to provide fault-tolerance for stateful applications. This is a necessary eventuality as these systems head into the Exascale regime, and is evidently more efficient than replication as state size grows. However current systems use a nominal value for the checkpoint interval, indicative of assuming roughly 1 failure every 19 days, that does not take into account the salient aspects of the checkpoint process, nor the system scale, which can readily lead to inefficient system operation. To address this shortcoming, we provide a rigorous derivation of utilization -- the fraction of total time available for the system to do useful work -- that incorporates checkpoint interval, failure rate, checkpoint cost, failure detection and restart cost, depth of the system topology and message delay. Our model yields an elegant expression for utilization and provides an optimal checkpoint interval given these parameters, interestingly showing it to be dependent only on checkpoint cost and failure rate. We confirm the accuracy and efficacy of our model through experiments with Apache Flink, where we obtain improvements in system utilization for every case, especially as the system size increases. Our model provides a solid theoretical basis for the analysis and optimization of more elaborate checkpointing approaches.

preprint2016arXiv

Geometry of Interest (GOI): Spatio-Temporal Destination Extraction and Partitioning in GPS Trajectory Data

Nowadays large amounts of GPS trajectory data is being continuously collected by GPS-enabled devices such as vehicles navigation systems and mobile phones. GPS trajectory data is useful for applications such as traffic management, location forecasting, and itinerary planning. Such applications often need to extract the time-stamped Sequence of Visited Locations (SVLs) of the mobile objects. The nearest neighbor query (NNQ) is the most applied method for labeling the visited locations based on the IDs of the POIs in the process of SVL generation. NNQ in some scenarios is not accurate enough. To improve the quality of the extracted SVLs, instead of using NNQ, we label the visited locations as the IDs of the POIs which geometrically intersect with the GPS observations. Intersection operator requires the accurate geometry of the points of interest which we refer to them as the Geometries of Interest (GOIs). In some application domains (e.g. movement trajectories of animals), adequate information about the POIs and their GOIs may not be available a priori, or they may not be publicly accessible and, therefore, they need to be derived from GPS trajectory data. In this paper we propose a novel method for estimating the POIs and their GOIs, which consists of three phases: (i) extracting the geometries of the stay regions; (ii) constructing the geometry of destination regions based on the extracted stay regions; and (iii) constructing the GOIs based on the geometries of the destination regions. Using the geometric similarity to known GOIs as the major evaluation criterion, the experiments we performed using long-term GPS trajectory data show that our method outperforms the existing approaches.

preprint2006arXiv

A Case for Cooperative and Incentive-Based Coupling of Distributed Clusters

Research interest in Grid computing has grown significantly over the past five years. Management of distributed resources is one of the key issues in Grid computing. Central to management of resources is the effectiveness of resource allocation as it determines the overall utility of the system. The current approaches to superscheduling in a grid environment are non-coordinated since application level schedulers or brokers make scheduling decisions independently of the others in the system. Clearly, this can exacerbate the load sharing and utilization problems of distributed resources due to suboptimal schedules that are likely to occur. To overcome these limitations, we propose a mechanism for coordinated sharing of distributed clusters based on computational economy. The resulting environment, called \emph{Grid-Federation}, allows the transparent use of resources from the federation when local resources are insufficient to meet its users' requirements. The use of computational economy methodology in coordinating resource allocation not only facilitates the QoS based scheduling, but also enhances utility delivered by resources.