Researcher profile

Björn Forsberg

Björn Forsberg contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2024arXiv

Data-Driven Power Modeling and Monitoring via Hardware Performance Counters Tracking

In the current high-performance and embedded computing era, full-stack energy-centric design is paramount. Use cases require increasingly high performance at an affordable power budget, often under real-time constraints. Extreme heterogeneity and parallelism address these issues but greatly complicate online power consumption assessment, which is essential for dynamic hardware and software stack adaptations. We introduce a novel architecture-agnostic power modeling methodology with state-of-the-art accuracy, low overhead, and high responsiveness. Our methodology identifies the best Performance Monitoring Counters (PMCs) to model the power consumption of each hardware sub-system at each Dynamic Voltage and Frequency Scaling (DVFS) state. The individual linear models are combined into a complete model that effectively describes the power consumption of the whole system, achieving high accuracy and low overhead. Our evaluation reports an average estimation error of 7.5 % for power consumption and 1.3 % for energy. Furthermore, we propose Runmeter, an open-source, PMC-based monitoring framework integrated into the Linux kernel. Runmeter manages PMC samples collection and manipulation, efficiently evaluating our power models at runtime. With a time overhead of only 0.7 % in the worst case, Runmeter provides responsive and accurate power measurements directly in the kernel, which can be employed for actuation policies such as Dynamic Power Management (DPM) and power-aware task scheduling.

preprint2022arXiv

HEROv2: Full-Stack Open-Source Research Platform for Heterogeneous Computing

Heterogeneous computers integrate general-purpose host processors with domain-specific accelerators to combine versatility with efficiency and high performance. To realize the full potential of heterogeneous computers, however, many hardware and software design challenges have to be overcome. While architectural and system simulators can be used to analyze heterogeneous computers, they are faced with unavoidable compromises between simulation speed and performance modeling accuracy. In this work we present HEROv2, an FPGA-based research platform that enables accurate and fast exploration of heterogeneous computers consisting of accelerators based on clusters of 32-bit RISC-V cores and an application-class 64-bit ARMv8 or RV64 host processor. HEROv2 allows to seamlessly share data between 64-bit hosts and 32-bit accelerators and comes with a fully open-source on-chip network, a unified heterogeneous programming interface, and a mixed-data-model, mixed-ISA heterogeneous compiler based on LLVM. We evaluate HEROv2 in four case studies from the application level over toolchain and system architecture down to accelerator microarchitecture. We demonstrate how HEROv2 enables effective research and development on the full stack of heterogeneous computing. For instance, the compiler can tile loops and infer data transfers to and from the accelerators, which leads to a speedup of up to 4.4x compared to the original program and in most cases is only 15 % slower than a handwritten implementation, which requires 2.6x more code.