Source author record

Ahmed Awad

Ahmed Awad appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Databases; Distributed, Parallel, and Cluster Computing; Other Computer Science

Catalog footprint

3 works, 3 topics, 4 close collaborators

Preprint, 2022, arXiv

A Distributed Real-Time Recommender System for Big Data Streams

In today's data-driven world, recommender systems (RS) play a crucial role in supporting the decision-making process. As users become continuously connected to the internet, they become less patient and less tolerant of obsolete recommendations made by an RS, e.g., movie recommendations on Netflix or books to read on Amazon. This, in turn, requires continuous training of the RS to cope with both the online fashion of data and the changing nature of user tastes and interests, known as concept drift. A streaming (online) RS has to address three requirements: continuous training and recommendation, handling concept drift, and the ability to scale. Streaming recommender systems proposed in the literature mostly address the first two requirements and do not consider scalability, because they run the training process on a single machine. Such a machine, no matter how powerful it is, will eventually fail to cope with the volume of the data, a lesson learned from big data processing. To tackle the third requirement, we propose a Splitting and Replication mechanism for building distributed streaming recommender systems. Our mechanism is inspired by the successful shared-nothing architecture that underpins contemporary big data processing systems. We have applied our mechanism to two well-known approaches for online recommender systems, namely, matrix factorization and item-based collaborative filtering. We have implemented our mechanism on top of Apache Flink. We conducted experiments comparing the performance of the baseline (single machine) approach with our distributed approach. Across different data sets, improvements in processing latency, throughput, and accuracy were observed. Our experiments show an online recall improvement of 40% with more than 50% less memory consumption.

Preprint, 2022, arXiv

Efficient Checking of Timed Order Compliance Rules over Graph-encoded Event Logs

Validation of compliance rules against process data is a fundamental functionality for business process management. Over the years, the problem has been addressed for different types of process data, i.e., process models, process event data at runtime, and event logs representing historical execution. Several approaches have been proposed to tackle compliance checking over process logs. These approaches have been based on different data models and storage technologies, including relational databases, graph databases, and proprietary formats. Graph-based encoding of event logs is a promising direction that turns several process analytics tasks into queries on the underlying graph. Compliance checking is one class of such analysis tasks. In this paper, we argue that encoding log data as graphs alone is not enough to guarantee efficient processing of queries on this data. Efficiency is important due to the interactive nature of compliance checking. Thus, compliance checking would benefit from sub-linear scanning of the data. Moreover, as more data are added, e.g., when new batches of logs arrive, the data size should grow sub-linearly to optimize both storage space and query time. We propose two encoding methods using graph representation, realized in Neo4j, and show the benefits of these encodings on a special class of queries, namely timed order compliance rules. Compared to a baseline encoding, our experiments show up to a 5x speedup in query time as well as a 3x reduction in graph size.
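To make the notion of a timed order compliance rule concrete, here is a minimal sketch in plain Python. It is not the paper's graph encoding and does not use Neo4j; it is a straightforward linear scan over an in-memory log, shown only to illustrate what such a rule checks. The rule form ("every occurrence of one task must be followed by another task within a time limit, in the same case"), the function name `find_violations`, and the `(case_id, task, timestamp)` tuple layout are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_violations(events, first, then, max_delay):
    """Return the case IDs that violate the (hypothetical) rule:
    every `first` event must be followed, strictly later and within
    `max_delay`, by a `then` event of the same case."""
    # Group events into per-case traces.
    traces = defaultdict(list)
    for case_id, task, ts in events:
        traces[case_id].append((ts, task))

    violating = []
    for case_id, trace in traces.items():
        for ts, task in trace:
            if task != first:
                continue
            # Look for a matching follow-up inside the time window.
            satisfied = any(
                other_task == then and ts < other_ts <= ts + max_delay
                for other_ts, other_task in trace
            )
            if not satisfied:
                violating.append(case_id)
                break  # one violation is enough to flag the case
    return violating


# Illustrative data only; task names "A" and "B" are placeholders.
t0 = datetime(2024, 1, 1, 9, 0)
events = [
    ("c1", "A", t0),
    ("c1", "B", t0 + timedelta(minutes=10)),
    ("c2", "A", t0),
    ("c2", "B", t0 + timedelta(hours=2)),
    ("c3", "B", t0),
]
late_cases = find_violations(events, "A", "B", timedelta(hours=1))
```

A scan like this touches every event on every query, which is the kind of linear cost the abstract argues a suitable encoding should avoid.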

Preprint, 2020, arXiv

Correlating Unlabeled Events at Runtime

Process mining is of great importance for both data-centric and process-centric systems. Process mining takes as input so-called process logs, which are collections of partially ordered events. An event has to possess at least three attributes for mining approaches to work: a case ID, a task ID, and a timestamp. When the case ID is unknown, the event is called unlabeled. Traditionally, process mining is an offline task, where events collected from different sources are usually correlated manually. That is, events belonging to the same instance are assigned the same case ID. With today's high-volume/high-speed nature of, e.g., IoT applications, process mining shifts to being an online task. For this, event correlation has to be automated and has to occur as the data is generated. In this paper, we introduce an approach that correlates unlabeled events at runtime. Given a process model, a stream of unlabeled events, and other information about task duration, our approach can infer a case identifier for a set of unlabeled events with a trust percentage. It can also check the conformance of the identified cases with the process model. A prototype of the proposed approach was implemented and evaluated against real-life and synthetic logs.
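The event structure described in this abstract can be sketched as a small Python data type. This only models the three attributes and the notion of an unlabeled event; it does not implement the correlation approach itself. The class name `Event`, the helper `split_by_label`, and the use of `None` for an unknown case ID are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    task_id: str
    timestamp: datetime
    case_id: Optional[str] = None  # None models an unknown case ID

    @property
    def is_unlabeled(self) -> bool:
        # An event without a case ID is called unlabeled.
        return self.case_id is None


def split_by_label(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Separate events that already carry a case ID from those that
    would still need to be correlated."""
    labeled = [e for e in events if not e.is_unlabeled]
    unlabeled = [e for e in events if e.is_unlabeled]
    return labeled, unlabeled


# Illustrative data only; task names and the case ID are placeholders.
stream = [
    Event("register", datetime(2024, 1, 1, 9, 0), case_id="case-1"),
    Event("approve", datetime(2024, 1, 1, 9, 5)),
    Event("register", datetime(2024, 1, 1, 9, 7)),
]
labeled, unlabeled = split_by_label(stream)
```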