Source author record

Luigi De Simone

Luigi De Simone appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Software Engineering; Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Hardware Architecture; Networking and Internet Architecture; Robotics

Catalog footprint

11 works

6 topics

4 close collaborators


Preprint, 2023, arXiv

Introducing k4.0s: a Model for Mixed-Criticality Container Orchestration in Industry 4.0 (extended)

Time-predictable edge cloud is seen as the answer to many emerging needs in Industry 4.0 environments, since it can provide flexible, modular, and reconfigurable services with low latency and reduced costs. Orchestration systems are becoming the core component of clouds, since they make decisions on the placement and lifecycle of software components. Current solutions are starting to introduce real-time container support for time predictability; however, these approaches lack determinism as well as support for workloads requiring multiple levels of assurance/criticality. In this paper, we present k4.0s, an orchestration model for real-time and mixed-criticality environments, which includes timeliness, criticality, and network requirements. The model leverages new abstractions for both nodes and jobs, e.g., node assurance, and requires novel monitoring strategies. We sketch an implementation of the proposal based on Kubernetes and present experiments motivating the need for node assurance levels and adequate monitoring.

Preprint, 2023, arXiv

Run-time Failure Detection via Non-intrusive Event Analysis in a Large-Scale Cloud Computing Platform

Cloud computing systems fail in complex and unforeseen ways due to unexpected combinations of events and interactions among hardware and software components. These failures are especially problematic when they are silent, i.e., not accompanied by any explicit failure notification, hindering timely detection and recovery. In this work, we propose an approach to run-time failure detection tailored for monitoring multi-tenant and concurrent cloud computing systems. The approach uses a non-intrusive form of event tracing, without manual changes to the system's internals to propagate session identifiers (IDs), and builds a set of lightweight monitoring rules from fault-free executions. We evaluated the effectiveness of the approach in detecting failures in the context of the OpenStack cloud computing platform, a complex and "off-the-shelf" distributed system, by executing a campaign of fault injection experiments in a multi-tenant scenario. Our experiments show that the approach detects failures with an F1 score (0.85) and accuracy (0.77) higher than those provided by the OpenStack failure logging mechanisms (0.53 and 0.50) and by two non-session-aware run-time verification approaches (both lower than 0.15). Moreover, the approach significantly decreases the average time to detect failures at run-time (~114 seconds) compared to the OpenStack logging mechanisms.
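The comparison above is expressed through F1 score and accuracy. As a reminder of what those numbers measure, the minimal sketch below computes both from confusion-matrix counts using their standard definitions. It assumes "failure detected" is the positive class, which is an assumption about how the evaluation is framed; the `Confusion` type and the counts in the usage note are hypothetical and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary failure detector (positive = failure)."""
    tp: int  # failing runs correctly flagged
    fp: int  # fault-free runs wrongly flagged
    tn: int  # fault-free runs correctly left unflagged
    fn: int  # failing runs the detector missed


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are undefined or zero."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(c: Confusion) -> float:
    """Fraction of all runs that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0
```

With made-up counts `Confusion(tp=8, fp=2, tn=6, fn=4)`, precision is 0.8 and recall is 2/3, giving an F1 of 16/22 (about 0.727) and an accuracy of 0.70. F1 ignores true negatives, which is why it can diverge from accuracy when fault-free runs dominate a campaign.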

Preprint, 2022, arXiv

A Latency-driven Availability Assessment for Multi-Tenant Service Chains

Nowadays, most telecommunication services adhere to the Service Function Chain (SFC) paradigm, where network functions are implemented in software. In particular, container virtualization is becoming a popular approach to deploy network functions and to enable resource slicing among several tenants. The resulting infrastructure is a complex system composed of a large number of containers implementing different SFC functionalities, with different tenants sharing the same chain. The complexity of such a scenario leads us to evaluate two critical metrics: the steady-state availability (the probability that the system is functioning in the long run) and the latency (the time between a service request and the corresponding response). Consequently, we propose a latency-driven availability assessment for multi-tenant service chains implemented via Containerized Network Functions (CNFs). We adopt a multi-state system to model single CNFs and the queueing formalism to characterize the service latency. To efficiently compute the availability, we develop a modified version of the Multidimensional Universal Generating Function (MUGF) technique. Finally, we solve an optimization problem to minimize the SFC cost under an availability constraint. As a relevant example of SFC, we consider a containerized version of the IP Multimedia Subsystem, whose parameters have been estimated through fault injection techniques and load tests.
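For readers unfamiliar with generating-function methods, the sketch below implements the classic one-dimensional Universal Generating Function (UGF) for multi-state systems. It is not the paper's modified multidimensional variant (MUGF): each component is a distribution over performance levels, series composition keeps the minimum performance, parallel (load-sharing) composition sums it, and availability is the probability that the delivered performance meets a demand. The function names and the example numbers are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict

# A UGF u(z) = sum_k p_k * z**g_k, stored as {performance g_k: probability p_k}.
UGF = Dict[float, float]


def compose(a: UGF, b: UGF, op: Callable[[float, float], float]) -> UGF:
    """Combine two independent components with a structure function `op`."""
    result: Dict[float, float] = defaultdict(float)
    for (ga, pa), (gb, pb) in product(a.items(), b.items()):
        # Independence: state probabilities multiply; performances combine via op.
        result[op(ga, gb)] += pa * pb
    return dict(result)


def series(a: UGF, b: UGF) -> UGF:
    # A chain delivers no more than its slowest stage.
    return compose(a, b, min)


def parallel(a: UGF, b: UGF) -> UGF:
    # Load-sharing replicas: capacities add up.
    return compose(a, b, lambda x, y: x + y)


def availability(u: UGF, demand: float) -> float:
    """Probability that the delivered performance meets or exceeds the demand."""
    return sum(p for g, p in u.items() if g >= demand)
```

With two binary components `{1.0: 0.9, 0.0: 0.1}` and `{1.0: 0.8, 0.0: 0.2}` and a demand of 1.0, the series arrangement has availability 0.72 and the parallel one 0.98. The paper links availability to latency through queueing models; here "performance" is a plain capacity number, a deliberate simplification.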

Preprint, 2022, arXiv

Certify the Uncertified: Towards Assessment of Virtualization for Mixed-criticality in the Automotive Domain

Nowadays, a feature-rich vehicle offers several technologies to assist the driver during the trip and to provide an entertaining infotainment system to the other passengers. Consolidating software worlds at different criticality levels is a welcome challenge for car manufacturers, who have recently tried to leverage virtualization technologies to reduce maintenance, deployment, and shipping costs. For this reason, more and more mixed-criticality systems are emerging, aiming to ensure compliance with the ISO 26262 functional safety standard for road vehicles. In this short paper, we provide a preliminary investigation of the certification capabilities of Jailhouse, a popular open-source partitioning hypervisor. To this end, we propose a testing methodology and present the results, pointing out when the software reaches a faulty state, deviating from its expected behavior. The ultimate goal is to identify the right direction for the hypervisor towards a potential certification process.

Preprint, 2022, arXiv

On Temporal Isolation Assessment in Virtualized Railway Signaling as a Service Systems

Railway signaling systems provide numerous critical functions at different safety levels to correctly implement the entire transport ecosystem. Today, we are witnessing the increasing use of cloud and virtualization technologies in such mixed-criticality systems, with the main goals of reducing costs and improving reliability while providing orchestration capabilities. Unfortunately, virtualization raises several issues for assessing temporal isolation, which is critical for safety-related standards like EN 50128. In this short paper, we envision leveraging the real-time flavor of a general-purpose hypervisor, like Xen, to build the Railway Signaling as a Service (RSaaS) systems of the future. We provide preliminary background, highlighting the need for a systematic evaluation of temporal isolation to demonstrate the feasibility of using general-purpose hypervisors in the safety-critical context for certification purposes.

Preprint, 2022, arXiv

ThorFI: A Novel Approach for Network Fault Injection as a Service

In this work, we present a novel fault injection solution (ThorFI) for virtual networks in cloud computing infrastructures. ThorFI is designed to provide non-intrusive fault injection capabilities for a cloud tenant and to prevent injections from interfering with other tenants on the infrastructure. We present the solution in the context of the OpenStack cloud management platform and release this implementation as open-source software. Finally, we present two relevant case studies of ThorFI: an NFV IMS and a high-availability cloud application. The case studies show that ThorFI can enhance functional tests with fault injection, as in 4%-34% of the test cases the IMS is unable to handle faults; and that, despite redundancy in virtual networks, faults in one virtual network segment can propagate to other segments and can affect the throughput and response time of the cloud application as a whole by a factor of about 3 in the worst case.

Preprint, 2021, arXiv

Enhancing the Analysis of Software Failures in Cloud Computing Systems with Deep Learning

Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts from manually fine-tuning the data for feature engineering. The approach leverages Deep Embedded Clustering (DEC), a family of unsupervised clustering algorithms based on deep learning, which uses an autoencoder to optimize data dimensionality and inter-cluster variance. We applied the approach in the context of the OpenStack cloud computing platform, both on the raw failure data and in combination with an anomaly detection pre-processing algorithm. The results show that the performance of the proposed approach, in terms of purity of clusters, is comparable to, or in some cases even better than, manually fine-tuned clustering, thus avoiding the need for deep domain knowledge and reducing the effort to perform the analysis. In all cases, the proposed approach provides better performance than unsupervised clustering when no feature engineering is applied to the data. Moreover, the distribution of failure modes from the proposed approach is closer to the actual frequency of the failure modes.
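Cluster purity, the metric used above, assigns each cluster its most frequent ground-truth label and reports the fraction of items that match that label. The minimal sketch below computes it from two parallel label lists; the function name and the failure-mode labels in the usage note are hypothetical illustrations, not data from the paper.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def purity(clusters: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Purity = (1/N) * sum over clusters of the count of its majority true label."""
    if len(clusters) != len(truth):
        raise ValueError("clusters and truth must have the same length")
    if not clusters:
        return 0.0
    # Count ground-truth labels inside each predicted cluster.
    members = defaultdict(Counter)
    for cluster_id, label in zip(clusters, truth):
        members[cluster_id][label] += 1
    # Each cluster contributes the size of its most common true label.
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    return majority_total / len(clusters)
```

For clusters `[0, 0, 0, 1, 1, 2]` with true failure modes `["timeout", "timeout", "crash", "crash", "crash", "timeout"]`, purity is 5/6. Purity rewards fragmentation (one cluster per item scores 1.0), so it is usually read together with the number of clusters produced.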

Preprint, 2021, arXiv

Virtualization over Multiprocessor System-on-Chip: an Enabling Paradigm for Industrial IoT

The next-generation Industrial Internet of Things (IIoT) inherently requires smart devices featuring rich connectivity, local intelligence, and autonomous behavior. Emerging Multiprocessor System-on-Chip (MPSoC) platforms, along with comprehensive support for virtualization, will represent two key building blocks for smart devices in future IIoT edge infrastructures. We review representative existing solutions, highlighting the aspects that are most relevant for integration in IIoT solutions. From the analysis, we derive a reference architecture for a general virtualization-ready edge IIoT node. We then analyze the implications and benefits for a concrete use case scenario and identify the crucial research challenges to be faced to bridge the gap towards full support for virtualization-ready IIoT nodes.

Preprint, 2020, arXiv

ProFIPy: Programmable Software Fault Injection as-a-Service

In this paper, we present a new fault injection tool (ProFIPy) for Python software. The tool is designed to be programmable, in order to enable users to specify their software fault model, using a domain-specific language (DSL) for fault injection. Moreover, to achieve better usability, ProFIPy is provided as software-as-a-service and supports the user through the configuration of the faultload and workload, failure data analysis, and full automation of the experiments using container-based virtualization and parallelization.
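ProFIPy's own DSL is not described here. As a generic illustration of the underlying idea, software fault injection by code mutation, the sketch below uses Python's standard `ast` module to rewrite every call to a chosen function so that it raises an exception instead, then runs the mutated code to see whether a caller tolerates the fault. `RaiseOnCall`, `run_with_fault`, `_inject_fault`, and the toy `fetch`/`handler` code are hypothetical names and are not part of ProFIPy.

```python
import ast

# Hypothetical target code: `handler` is expected to tolerate failures of `fetch`.
SOURCE = """
def fetch(x):
    return x * 2

def handler(x):
    try:
        return fetch(x)
    except RuntimeError:
        return -1
"""


def _inject_fault():
    """Stand-in for the injected fault: always raises."""
    raise RuntimeError("injected fault")


class RaiseOnCall(ast.NodeTransformer):
    """Rewrite every call `target(...)` into `_inject_fault()` (arguments are dropped)."""

    def __init__(self, target: str):
        self.target = target

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # mutate nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.target:
            faulty = ast.Call(
                func=ast.Name(id="_inject_fault", ctx=ast.Load()),
                args=[],
                keywords=[],
            )
            return ast.copy_location(faulty, node)
        return node


def run_with_fault(source: str, target: str) -> dict:
    """Parse, mutate, compile, and execute `source`; return its global namespace."""
    tree = RaiseOnCall(target).visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # fill line info for the newly created nodes
    namespace = {"_inject_fault": _inject_fault}
    exec(compile(tree, "<mutated>", "exec"), namespace)
    return namespace
```

For example, `run_with_fault(SOURCE, "fetch")["handler"](3)` returns -1 because the handler catches the injected `RuntimeError`, whereas executing the unmodified `SOURCE` makes `handler(3)` return 6. Mutating the parsed tree leaves the original source text untouched, so each experiment starts from a clean copy.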

Preprint, 2020, arXiv

Towards Runtime Verification via Event Stream Processing in Cloud Computing Infrastructures

Software bugs in cloud management systems often cause erratic behavior, hindering the detection and recovery of failures. As a consequence, failures are not detected and notified in a timely manner, and can silently propagate through the system. To address these issues, we propose a lightweight approach to runtime verification for monitoring and failure detection in cloud computing systems. We performed a preliminary evaluation of the proposed approach in the OpenStack cloud management platform, an "off-the-shelf" distributed system, showing that the approach can be applied with high failure detection coverage.
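As a generic example of the kind of property a runtime-verification monitor can check over an event stream, the sketch below flags any request that is not followed by a matching completion event within a timeout (a bounded response property). The `Event` type, the event kinds `request` and `complete`, the `request_id` correlation key, and the timeout values are hypothetical choices for illustration; the paper's actual rules and event format are not described here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    time: float      # seconds since the start of the trace
    kind: str        # hypothetical kinds: "request" or "complete"
    request_id: str  # hypothetical correlation key


def check_response_property(events: Iterable[Event], timeout: float) -> List[str]:
    """Return IDs of requests not completed within `timeout` seconds.

    Events must arrive in time order. Deadlines are checked lazily, on each
    new event, so requests still open when the stream ends are not reported.
    """
    pending: Dict[str, float] = {}
    violations: List[str] = []
    for ev in events:
        # Expire requests whose deadline passed before this event.
        for rid, start in list(pending.items()):
            if ev.time - start > timeout:
                violations.append(rid)
                del pending[rid]
        if ev.kind == "request":
            pending[ev.request_id] = ev.time
        elif ev.kind == "complete":
            pending.pop(ev.request_id, None)  # late or unknown completions are ignored
    return violations
```

For a trace where request `a` starts at 0 s and completes at 5 s, while request `b` starts at 1 s and completes only at 40 s, a 30-second timeout reports `["b"]`. Because checks run only when events arrive, a real monitor would also need a timer to raise violations while the stream is quiet.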

Preprint, 2019, arXiv

Enhancing Failure Propagation Analysis in Cloud Computing Systems

In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach in the context of the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis in terms of false positives and negatives, with a low computational cost.
