Researcher profile

Kyriakos Georgiou

Kyriakos Georgiou contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2021arXiv

Robust and accurate fine-grain power models for embedded systems with no on-chip PMU

This paper presents a novel approach to event-based power modelling for embedded platforms that do not have a Performance Monitoring Unit (PMU). The method involves complementing the target hardware platform, where the physical power data is measured, with another platform on which the CPU performance data, that is needed for model generation, can be collected. The methodology is used to generate accurate fine-grain power models for the the Gaisler GR712RC dual-core LEON3 fault-tolerant SPARC processor with on-board power sensors and no PMU. A Kintex UltraScale FPGA is used as the support platform to obtain the required CPU performance data, by running a soft-core representation of the dual-core LEON3 as on the GR712RC but with a PMU implementation. Both platforms execute the same benchmark set and data collection is synchronised using per-sample timestamps so that the power sensor data from the GR712RC board can be matched to the PMU data from the FPGA. The synchronised samples are then processed by the Robust Energy and Power Predictor Selection (REPPS) software in order to generate power models. The models achieve less than 2% power estimation error when validated on an industrial use-case and can successfully follow program phases, which makes them suitable for runtime power profiling.

preprint2020arXiv

Lost in translation: Exposing hidden compiler optimization opportunities

Existing iterative compilation and machine-learning-based optimization techniques have been proven very successful in achieving better optimizations than the standard optimization levels of a compiler. However, they were not engineered to support the tuning of a compiler's optimizer as part of the compiler's daily development cycle. In this paper, we first establish the required properties which a technique must exhibit to enable such tuning. We then introduce an enhancement to the classic nightly routine testing of compilers which exhibits all the required properties, and thus, is capable of driving the improvement and tuning of the compiler's common optimizer. This is achieved by leveraging resource usage and compilation information collected while systematically exploiting prefixes of the transformations applied at standard optimization levels. Experimental evaluation using the LLVM v6.0.1 compiler demonstrated that the new approach was able to reveal hidden cross-architecture and architecture-dependent potential optimizations on two popular processors: the Intel i5-6300U and the Arm Cortex-A53-based Broadcom BCM2837 used in the Raspberry Pi 3B+. As a case study, we demonstrate how the insights from our approach enabled us to identify and remove a significant shortcoming of the CFG simplification pass of the LLVM v6.0.1 compiler.

preprint2020arXiv

Performance and Energy Trade-Offs for Parallel Applications on Heterogeneous Multi-Processing Systems

This work proposes a methodology to find performance and energy trade-offs for parallel applications running on Heterogeneous Multi-Processing systems with a single instruction-set architecture. These offer flexibility in the form of different core types and voltage and frequency pairings, defining a vast design space to explore. Therefore, for a given application, choosing a configuration that optimizes the performance and energy consumption is not straightforward. Our method proposes novel analytical models for performance and power consumption whose parameters can be fitted using only a few strategically sampled offline measurements. These models are then used to estimate an application's performance and energy consumption for the whole configuration space. In turn, these offline predictions define the choice of estimated Pareto-optimal configurations of the model, which are used to inform the selection of the configuration that the application should be executed on. The methodology was validated on an ODROID-XU3 board for eight programs from the PARSEC Benchmark, Phoronix Test Suite and Rodinia applications. The generated Pareto-optimal configuration space represented a 99% reduction of the universe of all available configurations. Energy savings of up to 59.77%, 61.38% and 17.7% were observed when compared to the performance, ondemand and powersave Linux governors, respectively, with higher or similar performance.

preprint2020arXiv

When parallel speedups hit the memory wall

After Amdahl's trailblazing work, many other authors proposed analytical speedup models but none have considered the limiting effect of the memory wall. These models exploited aspects such as problem-size variation, memory size, communication overhead, and synchronization overhead, but data-access delays are assumed to be constant. Nevertheless, such delays can vary, for example, according to the number of cores used and the ratio between processor and memory frequencies. Given the large number of possible configurations of operating frequency and number of cores that current architectures can offer, suitable speedup models to describe such variations among these configurations are quite desirable for off-line or on-line scheduling decisions. This work proposes new parallel speedup models that account for variations of the average data-access delay to describe the limiting effect of the memory wall on parallel speedups. Analytical results indicate that the proposed modeling can capture the desired behavior while experimental hardware results validate the former. Additionally, we show that when accounting for parameters that reflect the intrinsic characteristics of the applications, such as degree of parallelism and susceptibility to the memory wall, our proposal has significant advantages over machine-learning-based modeling. Moreover, besides being black-box modeling, our experiments show that conventional machine-learning modeling needs about one order of magnitude more measurements to reach the same level of accuracy achieved in our modeling.

preprint2016arXiv

ENTRA: Whole-Systems Energy Transparency

Promoting energy efficiency to a first class system design goal is an important research challenge. Although more energy-efficient hardware can be designed, it is software that controls the hardware; for a given system the potential for energy savings is likely to be much greater at the higher levels of abstraction in the system stack. Thus the greatest savings are expected from energy-aware software development, which is the vision of the EU ENTRA project. This article presents the concept of energy transparency as a foundation for energy-aware software development. We show how energy modelling of hardware is combined with static analysis to allow the programmer to understand the energy consumption of a program without executing it, thus enabling exploration of the design space taking energy into consideration. The paper concludes by summarising the current and future challenges identified in the ENTRA project.