Source author record

Simon Smith

Simon Smith appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Distributed, Parallel, and Cluster Computing; Programming Languages; Data Structures and Algorithms; Machine Learning; math.CO; math.GR; Software Engineering

Catalog footprint

What is connected

4 works

7 topics

4 close collaborators

Preprint, 2021, arXiv

OODIDA: On-board/Off-board Distributed Real-Time Data Analytics for Connected Vehicles

A fleet of connected vehicles easily produces many gigabytes of data per hour, making centralized (off-board) data processing impractical. In addition, there is the issue of distributing tasks to on-board units in vehicles and processing them efficiently. Our solution to this problem is OODIDA (On-board/Off-board Distributed Data Analytics), which is a platform that tackles both task distribution to connected vehicles as well as concurrent execution of tasks on arbitrary subsets of edge clients. Its message-passing infrastructure has been implemented in Erlang/OTP, while the end points use a language-independent JSON interface. Computations can be carried out in arbitrary programming languages. The message-passing infrastructure of OODIDA is highly scalable, facilitating the execution of large numbers of concurrent tasks.
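The idea of sending a JSON-encoded task to an arbitrary subset of clients and running it concurrently can be illustrated with a small sketch. This is not OODIDA's implementation: the platform's message passing is written in Erlang/OTP, whereas the sketch below uses Python threads as a stand-in, and the message fields (`task_id`, `clients`, `function`, `params`) as well as the helper names `make_task`, `run_on_client` and `dispatch` are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def make_task(task_id, clients, function, params):
    """Encode a task as JSON. The field names are hypothetical, not OODIDA's schema."""
    return json.dumps({
        "task_id": task_id,
        "clients": sorted(clients),
        "function": function,
        "params": params,
    })


def run_on_client(client_id, task_json, local_data):
    """Stand-in for an on-board unit: decode the task and compute on local data."""
    task = json.loads(task_json)
    if task["function"] == "mean":
        result = sum(local_data) / len(local_data)
    elif task["function"] == "max":
        result = max(local_data)
    else:
        raise ValueError("unknown function: " + task["function"])
    return json.dumps({"task_id": task["task_id"], "client": client_id, "result": result})


def dispatch(task_json, fleet_data):
    """Run the task concurrently on the subset of clients named in the task."""
    targets = json.loads(task_json)["clients"]
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [pool.submit(run_on_client, c, task_json, fleet_data[c]) for c in targets]
        return [json.loads(f.result()) for f in futures]


# Illustrative data: three vehicles, of which only two are targeted.
fleet_data = {"car_a": [1.0, 2.0, 3.0], "car_b": [10.0, 20.0], "car_c": [5.0]}
task = make_task(1, {"car_a", "car_b"}, "mean", {})
results = dispatch(task, fleet_data)
```

Because both ends only exchange JSON strings, the client-side computation could be written in any language that can parse JSON, which is the property the abstract highlights.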

Preprint, 2020, arXiv

Facilitating Rapid Prototyping in the OODIDA Data Analytics Platform via Active-Code Replacement

OODIDA (On-board/Off-board Distributed Data Analytics) is a platform for distributed real-time analytics, targeting fleets of reference vehicles in the automotive industry. Its users are data analysts. The bulk of the data analytics tasks are performed by clients (on-board), while a central cloud server performs supplementary tasks (off-board). OODIDA can be automatically packaged and deployed, which necessitates restarting parts of the system, or all of it. As this is potentially disruptive, we added the ability to execute user-defined Python modules on clients as well as the server. These modules can be replaced without restarting any part of the system; they can even be replaced between iterations of an ongoing assignment. This feature is referred to as active-code replacement. It facilitates use cases such as iterative A/B testing of machine learning algorithms or modifying experimental algorithms on-the-fly. Consistency of results is achieved by majority vote, which prevents tainted state. Active-code replacement can be done in less than a second in an idealized setting whereas a standard deployment takes many orders of magnitude more time. The main contribution of this paper is the description of a relatively straightforward approach to active-code replacement that is very user-friendly. It enables a data analyst to quickly execute custom code on the cloud server as well as on client devices. Sensible safeguards and design decisions ensure that this feature can be used by non-specialists who are not familiar with the implementation of OODIDA in general or this feature in particular. As a consequence of adding the active-code replacement feature, OODIDA is now very well-suited for rapid prototyping.
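A minimal sketch of the general idea follows: user-defined Python code is loaded into a running process without a restart and can be swapped between iterations. It is an illustration under assumptions, not the mechanism described in the paper. In particular, the abstract does not say what the majority vote is taken over; the sketch assumes each result is tagged with a hash of the code that produced it and that only results from the most common code version are kept. The names `load_module`, `code_version`, `Client`, `majority_results` and the entry point `custom` are hypothetical.

```python
import hashlib
import types
from collections import Counter


def load_module(source, name="user_module"):
    """Compile user-supplied source into a fresh module object, without a restart."""
    module = types.ModuleType(name)
    exec(compile(source, name, "exec"), module.__dict__)
    return module


def code_version(source):
    """Short fingerprint of the source, used to tag results (an assumption)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:8]


class Client:
    """Stand-in for a client that keeps running while its custom code changes."""

    def __init__(self, data):
        self.data = data
        self.source = None
        self.module = None

    def replace_code(self, source):
        self.module = load_module(source)
        self.source = source

    def run_iteration(self):
        # Assumes the user module exposes a function called `custom`.
        return code_version(self.source), self.module.custom(self.data)


def majority_results(tagged_results):
    """Keep only results produced by the most common code version."""
    versions = Counter(version for version, _ in tagged_results)
    winner, _ = versions.most_common(1)[0]
    return winner, [result for version, result in tagged_results if version == winner]


V1 = "def custom(xs):\n    return sum(xs)\n"
V2 = "def custom(xs):\n    return max(xs)\n"

clients = [Client([1, 2, 3]), Client([4, 5]), Client([6])]
for c in clients:
    c.replace_code(V1)
first = majority_results([c.run_iteration() for c in clients])

# Between iterations, the new code reaches only two of the three clients.
for c in clients[:2]:
    c.replace_code(V2)
second = majority_results([c.run_iteration() for c in clients])
```

In the second iteration the result from the client still running the old code is discarded, so outputs from different code versions are not mixed.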

Preprint, 2020, arXiv

S-RASTER: Contraction Clustering for Evolving Data Streams

Contraction Clustering (RASTER) is a single-pass algorithm for density-based clustering of 2D data. It can process arbitrary amounts of data in linear time and in constant memory, quickly identifying approximate clusters. It also exhibits good scalability in the presence of multiple CPU cores. RASTER exhibits very competitive performance compared to standard clustering algorithms, but at the cost of decreased precision. Yet, RASTER is limited to batch processing and unable to identify clusters that only exist temporarily. In contrast, S-RASTER is an adaptation of RASTER to the stream processing paradigm that is able to identify clusters in evolving data streams. This algorithm retains the main benefits of its parent algorithm, i.e. single-pass linear time cost and constant memory requirements for each discrete time step within a sliding window. The sliding window is efficiently pruned, and clustering is still performed in linear time. Like RASTER, S-RASTER trades off an often negligible amount of precision for speed. Our evaluation shows that competing algorithms are at least 50% slower. Furthermore, S-RASTER shows good qualitative results, based on standard metrics. It is very well suited to real-world scenarios where clustering does not happen continually but only periodically.
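The abstract does not spell out the algorithm, so the following is only a sketch of one way a single-pass, grid-based density clustering over a sliding window can work; it should not be read as S-RASTER itself. The assumptions are: 2D points are projected onto square tiles, a tile is significant once it has at least `threshold` hits inside the window, adjacent significant tiles form a cluster, and expired time steps are pruned by subtracting their counts. The class name `SlidingGridClusterer` and all parameters are hypothetical.

```python
import math
from collections import Counter, deque


class SlidingGridClusterer:
    def __init__(self, window, threshold, tile_size=1.0):
        self.window = window        # number of time steps kept
        self.threshold = threshold  # minimum hits for a tile to be significant
        self.tile_size = tile_size
        self.steps = deque()        # one Counter of tile hits per time step
        self.totals = Counter()     # tile hits summed over the window

    def _tile(self, x, y):
        return (math.floor(x / self.tile_size), math.floor(y / self.tile_size))

    def add_step(self, points):
        """Process one time step in a single pass over its points."""
        hits = Counter(self._tile(x, y) for x, y in points)
        self.steps.append(hits)
        self.totals.update(hits)
        if len(self.steps) > self.window:
            # Prune the oldest step; cost depends on its tiles, not its points.
            expired = self.steps.popleft()
            self.totals.subtract(expired)
            for tile in expired:
                if self.totals[tile] <= 0:
                    del self.totals[tile]

    def clusters(self):
        """Group significant tiles that touch (8-neighbourhood) into clusters."""
        significant = {t for t, n in self.totals.items() if n >= self.threshold}
        result = []
        while significant:
            stack = [significant.pop()]
            cluster = set(stack)
            while stack:
                cx, cy = stack.pop()
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        neighbour = (cx + dx, cy + dy)
                        if neighbour in significant:
                            significant.remove(neighbour)
                            cluster.add(neighbour)
                            stack.append(neighbour)
            result.append(cluster)
        return result


def normalized(clusters):
    """Order-independent form of a cluster list, for comparison."""
    return sorted(sorted(c) for c in clusters)


clusterer = SlidingGridClusterer(window=2, threshold=2)
clusterer.add_step([(0.2, 0.3), (0.7, 0.1), (1.5, 0.5), (1.1, 0.9), (9.0, 9.0)])
after_step_1 = normalized(clusterer.clusters())
clusterer.add_step([(5.5, 5.5), (5.1, 5.2)])
after_step_2 = normalized(clusterer.clusters())
clusterer.add_step([(5.9, 5.9)])  # the first step now falls out of the window
after_step_3 = normalized(clusterer.clusters())
```

In this sketch, memory grows with the number of distinct tiles in the window rather than with the number of points, and a cluster that existed only in the first step disappears once that step is pruned.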

Preprint, 2006, arXiv

Infinite primitive directed graphs

A group $G$ of permutations of a set $Ω$ is {\em primitive} if it acts transitively on $Ω$, and the only $G$-invariant equivalence relations on $Ω$ are the trivial and universal relations. A graph $Γ$ is {\em primitive} if its automorphism group acts primitively on its vertex set. A graph $Γ$ has {\em connectivity one} if it is connected and there exists a vertex $α$ of $Γ$, such that the induced graph $Γ\setminus \{α\}$ is not connected. If $Γ$ has connectivity one, a {\em block} of $Γ$ is a connected subgraph that is maximal subject to the condition that it does not have connectivity one. The primitive undirected graphs with connectivity one have been fully classified by Jung and Watkins: the blocks of such graphs are primitive, pairwise-isomorphic and have at least three vertices. When one considers the general case of a directed primitive graph with connectivity one, however, this result no longer holds. In this paper we investigate the structure of these directed graphs, and obtain a complete characterisation.