Source author record

Thorsten Wittkopp

Thorsten Wittkopp appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Distributed, Parallel, and Cluster Computing eess.SY Networking and Internet Architecture Software Engineering Systems and Control

Catalog footprint

What is connected

5works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads

The growing electricity demand of cloud and edge computing increases operational costs and will soon have a considerable impact on the environment. A possible countermeasure is equipping IT infrastructure directly with on-site renewable energy sources. Yet, particularly smaller data centers may not be able to use all generated power directly at all times, while feeding it into the public grid or energy storage is often not an option. To maximize the usage of renewable excess energy, we propose Cucumber, an admission control policy that accepts delay-tolerant workloads only if they can be computed within their deadlines without the use of grid energy. Using probabilistic forecasting of computational load, energy consumption, and energy production, Cucumber can be configured towards more optimistic or conservative admission. We evaluate our approach on two scenarios using real solar production forecasts for Berlin, Mexico City, and Cape Town in a simulation environment. For scenarios where excess energy was actually available, our results show that Cucumber's default configuration achieves acceptance rates close to the optimal case and causes 97.0% of accepted workloads to be powered using excess energy, while more conservative admission results in 18.5% reduced acceptance at almost zero grid power usage.

preprint2022arXiv

On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

With the growing amount of data, data processing workloads and the management of their resource usage becomes increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users progressively execute their respective workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, performance data to train such methods is often lacking and must be costly collected. In this paper, we propose a collaborative approach for sharing anonymized workload execution traces among users, mining them for general patterns, and exploiting clusters of historical workloads for future optimizations. We evaluate our prototype implementation for mining workload execution graphs on a publicly available trace dataset and demonstrate the predictive value of workload clusters determined through traces only.

preprint2021arXiv

Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper

Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising in the intersection between the research areas of machine learning, big data, streaming analytics, and the management of IT operations. AIOps, as a field, is a candidate to produce the future standard for IT operation management. To that end, AIOps has several challenges. First, it needs to combine separate research branches from other research fields like software reliability engineering. Second, novel modelling techniques are needed to understand the dynamics of different systems. Furthermore, it requires to lay out the basis for assessing: time horizons and uncertainty for imminent SLA violations, the early detection of emerging problems, autonomous remediation, decision making, support of various optimization objectives. Moreover, a good understanding and interpretability of these aiding models are important for building trust between the employed tools and the domain experts. Finally, all this will result in faster adoption of AIOps, further increase the interest in this research field and contribute to bridging the gap towards fully-autonomous operating IT systems. The main aim of the AIOPS workshop is to bring together researchers from both academia and industry to present their experiences, results, and work in progress in this field. The workshop aims to strengthen the community and unite it towards the goal of joining the efforts for solving the main challenges the field is currently facing. A consensus and adoption of the principles of openness and reproducibility will boost the research in this emerging area significantly.

preprint2021arXiv

Decentralized Federated Learning Preserves Model and Data Privacy

The increasing complexity of IT systems requires solutions, that support operations in case of failure. Therefore, Artificial Intelligence for System Operations (AIOps) is a field of research that is becoming increasingly focused, both in academia and industry. One of the major issues of this area is the lack of access to adequately labeled data, which is majorly due to legal protection regulations or industrial confidentiality. Methods to mitigate this stir from the area of federated learning, whereby no direct access to training data is required. Original approaches utilize a central instance to perform the model synchronization by periodical aggregation of all model parameters. However, there are many scenarios where trained models cannot be published since its either confidential knowledge or training data could be reconstructed from them. Furthermore the central instance needs to be trusted and is a single point of failure. As a solution, we propose a fully decentralized approach, which allows to share knowledge between trained models. Neither original training data nor model parameters need to be transmitted. The concept relies on teacher and student roles that are assigned to the models, whereby students are trained on the output of their teachers via synthetically generated input data. We conduct a case study on log anomaly detection. The results show that an untrained student model, trained on the teachers output reaches comparable F1-scores as the teacher. In addition, we demonstrate that our method allows the synchronization of several models trained on different distinct training data subsets.

preprint2020arXiv

Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction

The rapid growth and distribution of IT systems increases their complexity and aggravates operation and maintenance. To sustain control over large sets of hosts and the connecting networks, monitoring solutions are employed and constantly enhanced. They collect diverse key performance indicators (KPIs) (e.g. CPU utilization, allocated memory, etc.) and provide detailed information about the system state. Storing such metrics over a period of time naturally raises the motivation of predicting future KPI progress based on past observations. Although, a variety of time series forecasting methods exist, forecasting the progress of IT system KPIs is very hard. First, KPI types like CPU utilization or allocated memory are very different and hard to be expressed by the same model. Second, system components are interconnected and constantly changing due to soft- or firmware updates and hardware modernization. Thus a frequent model retraining or fine-tuning must be expected. Therefore, we propose a lightweight solution for KPI series prediction based on historic observations. It consists of a weighted heterogeneous ensemble method composed of two models - a neural network and a mean predictor. As ensemble method a weighted summation is used, whereby a heuristic is employed to set the weights. The modelling approach is evaluated on the available FedCSIS 2020 challenge dataset and achieves an overall $R^2$ score of 0.10 on the preliminary 10% test data and 0.15 on the complete test data. We publish our code on the following github repository: https://github.com/citlab/fed_challenge