Source author record

Alexandru Iosup

Alexandru Iosup appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Software Engineering cs.CY Networking and Internet Architecture Performance Systems and Control

Catalog footprint

What is connected

13works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Failure Analysis of Big Cloud Service Providers Prior to and During Covid-19 Period

Cloud services are important for societal function such as healthcare, commerce, entertainment and education. Cloud can provide a variety of features such as increased collaboration and inexpensive computing. Failures are unavoidable in cloud services due to the large size and complexity, resulting in decreased reliability and efficiency. For example, due to bugs, many high-severity failures have been occurring in cloud infrastructure of popular providers, causing outages of several hours and the unrecoverable loss of user data. There are prior studies about cloud failure analyses are limited and use sources such as news articles. However, a detailed cloud failure focused study is required that provides analyses for cloud failure data gathered directly from the vendors. Furthermore, the Covid-19 cloud failures should be studied as cloud services played a major role throughout the Covid-19 period, as individuals relied on cloud services for activities such as working from home. A program can be made for this task. As a result, we will be able to better understand and mitigate cloud failures to reduce the effect of cloud failures.

preprint2022arXiv

Future Computer Systems and Networking Research in the Netherlands: A Manifesto

Our modern society and competitive economy depend on a strong digital foundation and, in turn, on sustained research and innovation in computer systems and networks (CompSys). With this manifesto, we draw attention to CompSys as a vital part of ICT. Among ICT technologies, CompSys covers all the hardware and all the operational software layers that enable applications; only application-specific details, and often only application-specific algorithms, are not part of CompSys. Each of the Top Sectors of the Dutch Economy, each route in the National Research Agenda, and each of the UN Sustainable Development Goals pose challenges that cannot be addressed without groundbreaking CompSys advances. Looking at the 2030-2035 horizon, important new applications will emerge only when enabled by CompSys developments. Triggered by the COVID-19 pandemic, millions moved abruptly online, raising infrastructure scalability and data sovereignty issues; but governments processing social data and responsible social networks still require a paradigm shift in data sovereignty and sharing. AI already requires massive computer systems which can cost millions per training task, but the current technology leaves an unsustainable energy footprint including large carbon emissions. Computational sciences such as bioinformatics, and "Humanities for all" and "citizen data science", cannot become affordable and efficient until computer systems take a generational leap. Similarly, the emerging quantum internet depends on (traditional) CompSys to bootstrap operation for the foreseeable future. Large commercial sectors, including finance and manufacturing, require specialized computing and networking or risk becoming uncompetitive. And, at the core of Dutch innovation, promising technology hubs, deltas, ports, and smart cities, could see their promise stagger due to critical dependency on non-European technology.

preprint2022arXiv

Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications

Making serverless computing widely applicable requires detailed performance understanding. Although contemporary benchmarking approaches exist, they report only coarse results, do not apply distributed tracing, do not consider asynchronous applications, and provide limited capabilities for (root cause) analysis. Addressing this gap, we design and implement ServiBench, a serverless benchmarking suite. ServiBench (i) leverages synchronous and asynchronous serverless applications representative of production usage, (ii) extrapolates cloud-provider data to generate realistic workloads, (iii) conducts comprehensive, end-to-end experiments to capture application-level performance, (iv) analyzes results using a novel approach based on (distributed) serverless tracing, and (v) supports comprehensively serverless performance analysis. With ServiBench, we conduct comprehensive experiments on AWS, covering five common performance factors: median latency, cold starts, tail latency, scalability, and dynamic workloads. We find that the median end-to-end latency of serverless applications is often dominated not by function computation but by external service calls, orchestration, or trigger-based coordination. We release collected experimental data under FAIR principles and ServiBench as a tested, extensible open-source tool.

preprint2021arXiv

A Review of Serverless Use Cases and their Characteristics

The serverless computing paradigm promises many desirable properties for cloud applications - low-cost, fine-grained deployment, and management-free operation. Consequently, the paradigm has underwent rapid growth: there currently exist tens of serverless platforms and all global cloud providers host serverless operations. To help tune existing platforms, guide the design of new serverless approaches, and overall contribute to understanding this paradigm, in this work we present a long-term, comprehensive effort to identify, collect, and characterize 89 serverless use cases. We survey use cases, sourced from white and grey literature, and from consultations with experts in areas such as scientific computing. We study each use case using 24 characteristics, including general aspects, but also workload, application, and requirements. When the use cases employ workflows, we further analyze their characteristics. Overall, we hope our study will be useful for both academia and industry, and encourage the community to further share and communicate their use cases. This article appears also as a SPEC Technical Report: https://research.spec.org/fileadmin/user_upload/documents/rg_cloud/endorsed_publications/SPEC_RG_2020_Serverless_Usecases.pdf The article may be submitted for peer-reviewed publication.

preprint2021arXiv

An Analysis of Distributed Systems Syllabi With a Focus on Performance-Related Topics

We analyze a dataset of 51 current (2019-2020) Distributed Systems syllabi from top Computer Science programs, focusing on finding the prevalence and context in which topics related to performance are being taught in these courses. We also study the scale of the infrastructure mentioned in DS courses, from small client-server systems to cloud-scale, peer-to-peer, global-scale systems. We make eight main findings, covering goals such as performance, and scalability and its variant elasticity; activities such as performance benchmarking and monitoring; eight selected performance-enhancing techniques (replication, caching, sharding, load balancing, scheduling, streaming, migrating, and offloading); and control issues such as trade-offs that include performance and performance variability.

preprint2021arXiv

Capelin: Data-Driven Capacity Procurement for Cloud Datacenters using Portfolios of Scenarios -- Extended Technical Report

Cloud datacenters provide a backbone to our digital society. Inaccurate capacity procurement for cloud datacenters can lead to significant performance degradation, denser targets for failure, and unsustainable energy consumption. Although this activity is core to improving cloud infrastructure, relatively few comprehensive approaches and support tools exist for mid-tier operators, leaving many planners with merely rule-of-thumb judgement. We derive requirements from a unique survey of experts in charge of diverse datacenters in several countries. We propose Capelin, a data-driven, scenario-based capacity planning system for mid-tier cloud datacenters. Capelin introduces the notion of portfolios of scenarios, which it leverages in its probing for alternative capacity-plans. At the core of the system, a trace-based, discrete-event simulator enables the exploration of different possible topologies, with support for scaling the volume, variety, and velocity of resources, and for horizontal (scale-out) and vertical (scale-up) scaling. Capelin compares alternative topologies and for each gives detailed quantitative operational information, which could facilitate human decisions of capacity planning. We implement and open-source Capelin, and show through comprehensive trace-based experiments it can aid practitioners. The results give evidence that reasonable choices can be worse by a factor of 1.5-2.0 than the best, in terms of performance degradation or energy consumption.

preprint2021arXiv

Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language

Computational Workflows are widely used in data analysis, enabling innovation and decision-making. In many domains (bioinformatics, image analysis, & radio astronomy) the analysis components are numerous and written in multiple different computer languages by third parties. However, many competing workflow systems exist, severely limiting portability of such workflows, thereby hindering the transfer of workflows between different systems, between different projects and different settings, leading to vendor lock-ins and limiting their generic re-usability. Here we present the Common Workflow Language (CWL) project which produces free and open standards for describing command-line tool based workflows. The CWL standards provide a common but reduced set of abstractions that are both used in practice and implemented in many popular workflow systems. The CWL language is declarative, which allows expressing computational workflows constructed from diverse software tools, executed each through their command-line interface. Being explicit about the runtime environment and any use of software containers enables portability and reuse. Workflows written according to the CWL standards are a reusable description of that analysis that are runnable on a diverse set of computing environments. These descriptions contain enough information for advanced optimization without additional input from workflow authors. The CWL standards support polylingual workflows, enabling portability and reuse of such workflows, easing for example scholarly publication, fulfilling regulatory requirements, collaboration in/between academic research and industry, while reducing implementation costs. CWL has been taken up by a wide variety of domains, and industries and support has been implemented in many major workflow systems.

preprint2020arXiv

A Survey and Annotated Bibliography of Workflow Scheduling in Computing Infrastructures: Community, Keyword, and Article Reviews -- Extended Technical Report

Workflows are prevalent in today's computing infrastructures. The workflow model support various different domains, from machine learning to finance and from astronomy to chemistry. Different Quality-of-Service (QoS) requirements and other desires of both users and providers makes workflow scheduling a tough problem, especially since resource providers need to be as efficient as possible with their resources to be competitive. To a newcomer or even an experienced researcher, sifting through the vast amount of articles can be a daunting task. Questions regarding the difference techniques, policies, emerging areas, and opportunities arise. Surveys are an excellent way to cover these questions, yet surveys rarely publish their tools and data on which it is based. Moreover, the communities that are behind these articles are rarely studied. We attempt to address these shortcomings in this work. We focus on four areas within workflow scheduling: 1) the workflow formalism, 2) workflow allocation, 3) resource provisioning, and 4) applications and services. Each part features one or more taxonomies, a view of the community, important and emerging keywords, and directions for future work. We introduce and make open-source an instrument we used to combine and store article meta-data. Using this meta-data, we 1) obtain important keywords overall and per year, per community, 2) identify keywords growing in importance, 3) get insight into the structure and relations within each community, and 4) perform a systematic literature survey per part to validate and complement our taxonomies.

preprint2020arXiv

Flexibility Is Key in Organizing a Global Professional Conference Online: The ICPE 2020 Experience in the COVID-19 Era

Organizing professional conferences online has never been more timely. Responding to the new challenges raised by COVID-19, the organizers of the ACM/SPEC International Conference on Performance Engineering 2020 had to address the question: How should we organize these conferences online? This article summarizes their successful answer.

preprint2020arXiv

Serverless Applications: Why, When, and How?

Serverless computing shows good promise for efficiency and ease-of-use. Yet, there are only a few, scattered and sometimes conflicting reports on questions such as 'Why do so many companies adopt serverless?', 'When are serverless applications well suited?', and 'How are serverless applications currently implemented?' To address these questions, we analyze 89 serverless applications from open-source projects, industrial sources, academic literature, and scientific computing - the most extensive study to date.

preprint2016arXiv

Ready for Rain? A View from SPEC Research on the Future of Cloud Metrics

In the past decade, cloud computing has emerged from a pursuit for a service-driven information and communication technology (ICT), into a signifcant fraction of the ICT market. Responding to the growth of the market, many alternative cloud services and their underlying systems are currently vying for the attention of cloud users and providers. Thus, benchmarking them is needed, to enable cloud users to make an informed choice, and to enable system DevOps to tune, design, and evaluate their systems. This requires focusing on old and new system properties, possibly leading to the re-design of classic benchmarking metrics, such as expressing performance as throughput and latency (response time), and the design of new, cloud-specififc metrics. Addressing this requirement, in this work we focus on four system properties: (i) elasticity of the cloud service, to accommodate large variations in the amount of service requested, (ii) performance isolation between the tenants of shared cloud systems, (iii) availability of cloud services and systems, and the (iv) operational risk of running a production system in a cloud environment.Focusing on key metrics, for each of these properties we review the state-of-the-art, then select or propose new metrics together with measurement approaches. We see the presented metrics as a foundation towards upcoming, industry-standard, cloud benchmarks. Keywords: Cloud Computing; Metrics; Measurement; Benchmarking; Elasticity; Isolation; Performance; Service Level Objective; Availability; Operational Risk.

preprint2016arXiv

Self-Awareness of Cloud Applications

Cloud applications today deliver an increasingly larger portion of the Information and Communication Technology (ICT) services. To address the scale, growth, and reliability of cloud applications, self-aware management and scheduling are becoming commonplace. How are they used in practice? In this chapter, we propose a conceptual framework for analyzing state-of-the-art self-awareness approaches used in the context of cloud applications. We map important applications corresponding to popular and emerging application domains to this conceptual framework, and compare the practical characteristics, benefits, and drawbacks of self-awareness approaches. Last, we propose a roadmap for addressing open challenges in self-aware cloud and datacenter applications.

preprint2014arXiv

Cloud Usage Patterns: A Formalism for Description of Cloud Usage Scenarios

Cloud computing is becoming an increasingly lucrative branch of the existing information and communication technologies (ICT). Enabling a debate about cloud usage scenarios can help with attracting new customers, sharing best-practices, and designing new cloud services. In contrast to previous approaches, which have attempted mainly to formalize the common service delivery models (i.e., Infrastructure-as-a-Service, Platform-as-a-Service, and Software-as-a-Service), in this work, we propose a formalism for describing common cloud usage scenarios referred to as cloud usage patterns. Our formalism takes a structuralist approach allowing decomposition of a cloud usage scenario into elements corresponding to the common cloud service delivery models. Furthermore, our formalism considers several cloud usage patterns that have recently emerged, such as hybrid services and value chains in which mediators are involved, also referred to as value chains with mediators. We propose a simple yet expressive textual and visual language for our formalism, and we show how it can be used in practice for describing a variety of real-world cloud usage scenarios. The scenarios for which we demonstrate our formalism include resource provisioning of global providers of infrastructure and/or platform resources, online social networking services, user-data processing services, online customer and ticketing services, online asset management and banking applications, CRM (Customer Relationship Management) applications, and online social gaming applications.

Alexandru Iosup

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Failure Analysis of Big Cloud Service Providers Prior to and During Covid-19 Period

Future Computer Systems and Networking Research in the Netherlands: A Manifesto

Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications

A Review of Serverless Use Cases and their Characteristics

An Analysis of Distributed Systems Syllabi With a Focus on Performance-Related Topics

Capelin: Data-Driven Capacity Procurement for Cloud Datacenters using Portfolios of Scenarios -- Extended Technical Report

Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language

A Survey and Annotated Bibliography of Workflow Scheduling in Computing Infrastructures: Community, Keyword, and Article Reviews -- Extended Technical Report

Flexibility Is Key in Organizing a Global Professional Conference Online: The ICPE 2020 Experience in the COVID-19 Era

Serverless Applications: Why, When, and How?

Ready for Rain? A View from SPEC Research on the Future of Cloud Metrics

Self-Awareness of Cloud Applications

Cloud Usage Patterns: A Formalism for Description of Cloud Usage Scenarios