Researcher profile

Alexandru Iosup

Alexandru Iosup contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2023arXiv

Failure Analysis of Big Cloud Service Providers Prior to and During Covid-19 Period

Cloud services are important for societal function such as healthcare, commerce, entertainment and education. Cloud can provide a variety of features such as increased collaboration and inexpensive computing. Failures are unavoidable in cloud services due to the large size and complexity, resulting in decreased reliability and efficiency. For example, due to bugs, many high-severity failures have been occurring in cloud infrastructure of popular providers, causing outages of several hours and the unrecoverable loss of user data. There are prior studies about cloud failure analyses are limited and use sources such as news articles. However, a detailed cloud failure focused study is required that provides analyses for cloud failure data gathered directly from the vendors. Furthermore, the Covid-19 cloud failures should be studied as cloud services played a major role throughout the Covid-19 period, as individuals relied on cloud services for activities such as working from home. A program can be made for this task. As a result, we will be able to better understand and mitigate cloud failures to reduce the effect of cloud failures.

preprint2022arXiv

Future Computer Systems and Networking Research in the Netherlands: A Manifesto

Our modern society and competitive economy depend on a strong digital foundation and, in turn, on sustained research and innovation in computer systems and networks (CompSys). With this manifesto, we draw attention to CompSys as a vital part of ICT. Among ICT technologies, CompSys covers all the hardware and all the operational software layers that enable applications; only application-specific details, and often only application-specific algorithms, are not part of CompSys. Each of the Top Sectors of the Dutch Economy, each route in the National Research Agenda, and each of the UN Sustainable Development Goals pose challenges that cannot be addressed without groundbreaking CompSys advances. Looking at the 2030-2035 horizon, important new applications will emerge only when enabled by CompSys developments. Triggered by the COVID-19 pandemic, millions moved abruptly online, raising infrastructure scalability and data sovereignty issues; but governments processing social data and responsible social networks still require a paradigm shift in data sovereignty and sharing. AI already requires massive computer systems which can cost millions per training task, but the current technology leaves an unsustainable energy footprint including large carbon emissions. Computational sciences such as bioinformatics, and "Humanities for all" and "citizen data science", cannot become affordable and efficient until computer systems take a generational leap. Similarly, the emerging quantum internet depends on (traditional) CompSys to bootstrap operation for the foreseeable future. Large commercial sectors, including finance and manufacturing, require specialized computing and networking or risk becoming uncompetitive. And, at the core of Dutch innovation, promising technology hubs, deltas, ports, and smart cities, could see their promise stagger due to critical dependency on non-European technology.

preprint2022arXiv

Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications

Making serverless computing widely applicable requires detailed performance understanding. Although contemporary benchmarking approaches exist, they report only coarse results, do not apply distributed tracing, do not consider asynchronous applications, and provide limited capabilities for (root cause) analysis. Addressing this gap, we design and implement ServiBench, a serverless benchmarking suite. ServiBench (i) leverages synchronous and asynchronous serverless applications representative of production usage, (ii) extrapolates cloud-provider data to generate realistic workloads, (iii) conducts comprehensive, end-to-end experiments to capture application-level performance, (iv) analyzes results using a novel approach based on (distributed) serverless tracing, and (v) supports comprehensively serverless performance analysis. With ServiBench, we conduct comprehensive experiments on AWS, covering five common performance factors: median latency, cold starts, tail latency, scalability, and dynamic workloads. We find that the median end-to-end latency of serverless applications is often dominated not by function computation but by external service calls, orchestration, or trigger-based coordination. We release collected experimental data under FAIR principles and ServiBench as a tested, extensible open-source tool.

preprint2021arXiv

A Review of Serverless Use Cases and their Characteristics

The serverless computing paradigm promises many desirable properties for cloud applications - low-cost, fine-grained deployment, and management-free operation. Consequently, the paradigm has underwent rapid growth: there currently exist tens of serverless platforms and all global cloud providers host serverless operations. To help tune existing platforms, guide the design of new serverless approaches, and overall contribute to understanding this paradigm, in this work we present a long-term, comprehensive effort to identify, collect, and characterize 89 serverless use cases. We survey use cases, sourced from white and grey literature, and from consultations with experts in areas such as scientific computing. We study each use case using 24 characteristics, including general aspects, but also workload, application, and requirements. When the use cases employ workflows, we further analyze their characteristics. Overall, we hope our study will be useful for both academia and industry, and encourage the community to further share and communicate their use cases. This article appears also as a SPEC Technical Report: https://research.spec.org/fileadmin/user_upload/documents/rg_cloud/endorsed_publications/SPEC_RG_2020_Serverless_Usecases.pdf The article may be submitted for peer-reviewed publication.

preprint2021arXiv

An Analysis of Distributed Systems Syllabi With a Focus on Performance-Related Topics

We analyze a dataset of 51 current (2019-2020) Distributed Systems syllabi from top Computer Science programs, focusing on finding the prevalence and context in which topics related to performance are being taught in these courses. We also study the scale of the infrastructure mentioned in DS courses, from small client-server systems to cloud-scale, peer-to-peer, global-scale systems. We make eight main findings, covering goals such as performance, and scalability and its variant elasticity; activities such as performance benchmarking and monitoring; eight selected performance-enhancing techniques (replication, caching, sharding, load balancing, scheduling, streaming, migrating, and offloading); and control issues such as trade-offs that include performance and performance variability.

preprint2021arXiv

Capelin: Data-Driven Capacity Procurement for Cloud Datacenters using Portfolios of Scenarios -- Extended Technical Report

Cloud datacenters provide a backbone to our digital society. Inaccurate capacity procurement for cloud datacenters can lead to significant performance degradation, denser targets for failure, and unsustainable energy consumption. Although this activity is core to improving cloud infrastructure, relatively few comprehensive approaches and support tools exist for mid-tier operators, leaving many planners with merely rule-of-thumb judgement. We derive requirements from a unique survey of experts in charge of diverse datacenters in several countries. We propose Capelin, a data-driven, scenario-based capacity planning system for mid-tier cloud datacenters. Capelin introduces the notion of portfolios of scenarios, which it leverages in its probing for alternative capacity-plans. At the core of the system, a trace-based, discrete-event simulator enables the exploration of different possible topologies, with support for scaling the volume, variety, and velocity of resources, and for horizontal (scale-out) and vertical (scale-up) scaling. Capelin compares alternative topologies and for each gives detailed quantitative operational information, which could facilitate human decisions of capacity planning. We implement and open-source Capelin, and show through comprehensive trace-based experiments it can aid practitioners. The results give evidence that reasonable choices can be worse by a factor of 1.5-2.0 than the best, in terms of performance degradation or energy consumption.

preprint2021arXiv

Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language

Computational Workflows are widely used in data analysis, enabling innovation and decision-making. In many domains (bioinformatics, image analysis, & radio astronomy) the analysis components are numerous and written in multiple different computer languages by third parties. However, many competing workflow systems exist, severely limiting portability of such workflows, thereby hindering the transfer of workflows between different systems, between different projects and different settings, leading to vendor lock-ins and limiting their generic re-usability. Here we present the Common Workflow Language (CWL) project which produces free and open standards for describing command-line tool based workflows. The CWL standards provide a common but reduced set of abstractions that are both used in practice and implemented in many popular workflow systems. The CWL language is declarative, which allows expressing computational workflows constructed from diverse software tools, executed each through their command-line interface. Being explicit about the runtime environment and any use of software containers enables portability and reuse. Workflows written according to the CWL standards are a reusable description of that analysis that are runnable on a diverse set of computing environments. These descriptions contain enough information for advanced optimization without additional input from workflow authors. The CWL standards support polylingual workflows, enabling portability and reuse of such workflows, easing for example scholarly publication, fulfilling regulatory requirements, collaboration in/between academic research and industry, while reducing implementation costs. CWL has been taken up by a wide variety of domains, and industries and support has been implemented in many major workflow systems.

preprint2020arXiv

A Survey and Annotated Bibliography of Workflow Scheduling in Computing Infrastructures: Community, Keyword, and Article Reviews -- Extended Technical Report

Workflows are prevalent in today's computing infrastructures. The workflow model support various different domains, from machine learning to finance and from astronomy to chemistry. Different Quality-of-Service (QoS) requirements and other desires of both users and providers makes workflow scheduling a tough problem, especially since resource providers need to be as efficient as possible with their resources to be competitive. To a newcomer or even an experienced researcher, sifting through the vast amount of articles can be a daunting task. Questions regarding the difference techniques, policies, emerging areas, and opportunities arise. Surveys are an excellent way to cover these questions, yet surveys rarely publish their tools and data on which it is based. Moreover, the communities that are behind these articles are rarely studied. We attempt to address these shortcomings in this work. We focus on four areas within workflow scheduling: 1) the workflow formalism, 2) workflow allocation, 3) resource provisioning, and 4) applications and services. Each part features one or more taxonomies, a view of the community, important and emerging keywords, and directions for future work. We introduce and make open-source an instrument we used to combine and store article meta-data. Using this meta-data, we 1) obtain important keywords overall and per year, per community, 2) identify keywords growing in importance, 3) get insight into the structure and relations within each community, and 4) perform a systematic literature survey per part to validate and complement our taxonomies.

preprint2020arXiv

Flexibility Is Key in Organizing a Global Professional Conference Online: The ICPE 2020 Experience in the COVID-19 Era

Organizing professional conferences online has never been more timely. Responding to the new challenges raised by COVID-19, the organizers of the ACM/SPEC International Conference on Performance Engineering 2020 had to address the question: How should we organize these conferences online? This article summarizes their successful answer.

preprint2020arXiv

Serverless Applications: Why, When, and How?

Serverless computing shows good promise for efficiency and ease-of-use. Yet, there are only a few, scattered and sometimes conflicting reports on questions such as 'Why do so many companies adopt serverless?', 'When are serverless applications well suited?', and 'How are serverless applications currently implemented?' To address these questions, we analyze 89 serverless applications from open-source projects, industrial sources, academic literature, and scientific computing - the most extensive study to date.