Source author record

Umit Y. Ogras

Umit Y. Ogras appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Distributed, Parallel, and Cluster Computing; eess.SY; Hardware Architecture; Machine Learning; Systems and Control; Performance; Artificial Intelligence; eess.SP

Catalog footprint

What is connected

11 works

8 topics

4 close collaborators


Preprint · 2022 · arXiv

COIN: Communication-Aware In-Memory Acceleration for Graph Convolutional Networks

Graph convolutional networks (GCNs) have shown remarkable learning capabilities when processing graph-structured data, which arises naturally in many application areas. GCNs distribute the outputs of neural networks embedded in each vertex over multiple iterations to take advantage of the relations captured by the underlying graphs. Consequently, they incur a significant amount of computation and irregular communication overhead, which calls for GCN-specific hardware accelerators. To this end, this paper presents a communication-aware in-memory computing architecture (COIN) for GCN hardware acceleration. Besides accelerating the computation using custom compute elements (CEs) and in-memory computing, COIN aims to minimize the intra- and inter-CE communication in GCN operations to optimize performance and energy efficiency. Experimental evaluations with widely used datasets show up to 105x improvement in energy consumption compared to a state-of-the-art GCN accelerator.
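A minimal sketch of the computation this abstract refers to, assuming the standard GCN layer of Kipf and Welling, H' = ReLU(Â H W) with Â = D^(-1/2) (A + I) D^(-1/2). COIN's actual dataflow is not described here, and the helper names (`normalized_adjacency`, `matmul`, `gcn_layer`) are hypothetical. The aggregation step reads every neighbour's features, which is the graph-dependent communication the abstract describes.

```python
import math

def normalized_adjacency(edges, n):
    """A_hat = D^-1/2 (A + I) D^-1/2 for an undirected graph, as nested lists."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    for i in range(n):
        adj[i][i] = 1.0  # self-loops keep each vertex's own features
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(a_hat, h, w):
    """One GCN layer: ReLU(A_hat @ H @ W)."""
    aggregated = matmul(a_hat, h)     # aggregation: gather neighbour features (graph-dependent)
    combined = matmul(aggregated, w)  # combination: dense multiply with shared weights
    return [[max(0.0, x) for x in row] for row in combined]

a_hat = normalized_adjacency([(0, 1), (1, 2)], n=3)  # path graph 0-1-2
out = gcn_layer(a_hat, h=[[1.0], [2.0], [3.0]], w=[[1.0]])
```

In a real accelerator Â is sparse, so aggregation turns into irregular memory and interconnect traffic; the dense lists here are only for readability.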

Preprint · 2022 · arXiv

tinyMAN: Lightweight Energy Manager using Reinforcement Learning for Energy Harvesting Wearable IoT Devices

Advances in low-power electronics and machine learning techniques have led to many novel wearable IoT devices. These devices have limited battery capacity and computational power. Thus, energy harvesting from ambient sources is a promising solution to power these low-energy wearable devices. They need to manage the harvested energy optimally to achieve energy-neutral operation, which eliminates recharging requirements. Optimal energy management is challenging due to the dynamic nature of the harvested energy and the battery energy constraints of the target device. To address this challenge, we present tinyMAN, a reinforcement learning-based energy management framework for resource-constrained wearable IoT devices. The framework maximizes the utilization of the target device under dynamic energy harvesting patterns and battery constraints. Moreover, tinyMAN does not rely on forecasts of the harvested energy, which makes it a prediction-free approach. Its small memory footprint of less than 100 KB allowed us to deploy tinyMAN on a wearable device prototype using TensorFlow Lite for Micro. Our evaluations show that tinyMAN achieves less than 2.36 ms execution time and 27.75 µJ energy consumption while maintaining up to 45% higher utility compared to prior approaches.
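To illustrate what reinforcement learning-based energy management means in general, here is a toy tabular Q-learning agent that decides how much energy to spend each step. Everything in it (battery capacity `B_MAX`, the `HARVEST` trace, a reward equal to the energy spent, the penalty for infeasible requests) is a hypothetical assumption; tinyMAN's own learning algorithm, state space and utility function are not specified in this abstract.

```python
import random

B_MAX = 5                     # hypothetical battery capacity (energy units)
ACTIONS = [0, 1, 2]           # energy units the device may spend per step
HARVEST = [0, 1, 2, 2, 1, 0]  # hypothetical periodic harvest trace

def env_step(battery, t, action):
    """Spend `action` units, then add this step's harvest (capped at B_MAX)."""
    harvest = HARVEST[t % len(HARVEST)]
    if action > battery:
        # Infeasible request: penalise it and spend nothing.
        return min(B_MAX, battery + harvest), -1.0
    # Reward = energy spent, a stand-in for device utility.
    return min(B_MAX, battery - action + harvest), float(action)

def q_learning(episodes=2000, steps=24, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    # State = (battery level, phase of the harvest trace).
    q = {(b, p): [0.0] * len(ACTIONS)
         for b in range(B_MAX + 1) for p in range(len(HARVEST))}
    for _ in range(episodes):
        battery = rng.randint(0, B_MAX)
        for t in range(steps):
            s = (battery, t % len(HARVEST))
            if rng.random() < eps:                    # explore
                a = rng.randrange(len(ACTIONS))
            else:                                     # exploit
                a = max(range(len(ACTIONS)), key=q[s].__getitem__)
            battery, r = env_step(battery, t, ACTIONS[a])
            s_next = (battery, (t + 1) % len(HARVEST))
            # Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
    return q

def greedy_action(q, battery, phase):
    """Energy to spend under the learned greedy policy."""
    return ACTIONS[max(range(len(ACTIONS)), key=q[(battery, phase)].__getitem__)]

q = q_learning()
```

A tabular agent like this only fits tiny state spaces; larger ones call for a function approximator such as a small neural network.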

Preprint · 2020 · arXiv

An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms

Mobile platforms must satisfy the conflicting requirements of fast response time and minimum energy consumption as applications change dynamically. To address this need, the systems-on-chip (SoCs) at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime is challenging for two reasons. First, the large configuration space prohibits exhaustive solutions. Second, control policies designed offline are at best sub-optimal, since many potential new applications are unknown at design time. We address these challenges by proposing an online imitation learning approach. Our key idea is to construct an offline policy and adapt it online to new applications to optimize a given metric (e.g., energy). The proposed methodology leverages the supervision enabled by power-performance models learned at runtime. We demonstrate its effectiveness on a commercial mobile platform with 16 diverse benchmarks. Our approach successfully adapts the control policy to an unknown application after executing less than 25% of its instructions.

Preprint · 2020 · arXiv

Analysis and Control of Power-Temperature Dynamics in Heterogeneous Multiprocessors

Virtually all electronic systems try to optimize a fundamental trade-off between higher performance and lower power consumption. Low power consumption becomes critical in mobile computing systems, such as smartphones, which rely on passive cooling; otherwise, the heat concentrated in a small area drives both the junction and skin temperatures up. High junction temperatures degrade reliability, while high skin temperatures deteriorate the user experience. Therefore, there is a strong need for a formal analysis of power-temperature dynamics and for predictive thermal management algorithms. This paper presents a theoretical power-temperature analysis of multiprocessor systems, which are modeled as multi-input multi-output dynamic systems. We analyze the conditions under which the system converges to a stable steady-state temperature. Then, we use these models to design a control algorithm that manages the temperature of the system without affecting the performance of the application. Experiments on the Odroid-XU3 board show that the control algorithm regulates the temperature with minimal loss in performance compared to the default thermal governors.
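The stability question in this abstract can be illustrated with a simplified model, assuming a discrete-time linear system T[k+1] = A T[k] + B P driven by constant power P. It ignores effects such as temperature-dependent leakage and is not the paper's model. When the spectral radius ρ(A) < 1, the temperature converges to T* = (I − A)^(-1) B P. The matrices below describe a hypothetical two-core system with made-up coefficients.

```python
def step(a, b, t, p):
    """One step of the linear model T[k+1] = A T[k] + B P."""
    return [sum(a[i][j] * t[j] for j in range(len(t))) +
            sum(b[i][j] * p[j] for j in range(len(p)))
            for i in range(len(t))]

def spectral_radius_2x2(a):
    """Largest eigenvalue magnitude of a 2x2 matrix (characteristic polynomial)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4 * det
    if disc >= 0:                     # two real eigenvalues
        r = disc ** 0.5
        return max(abs(tr + r), abs(tr - r)) / 2
    return det ** 0.5                 # complex pair: |lambda|^2 = det

def steady_state_2x2(a, b, p):
    """Solve (I - A) T* = B P with Cramer's rule."""
    m = [[1 - a[0][0], -a[0][1]], [-a[1][0], 1 - a[1][1]]]
    rhs = [sum(b[i][j] * p[j] for j in range(len(p))) for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]

A = [[0.9, 0.05], [0.05, 0.9]]  # hypothetical thermal coupling between two cores
B = [[0.5, 0.0], [0.0, 0.5]]    # hypothetical power-to-temperature gain
P = [2.0, 1.0]                  # hypothetical constant power per core (W)

assert spectral_radius_2x2(A) < 1  # stability condition
t = [0.0, 0.0]                     # temperature rise above ambient
for _ in range(500):
    t = step(A, B, t, P)
t_star = steady_state_2x2(A, B, P)
```

Here ρ(A) = 0.95, and both the simulated trajectory and the closed form settle at about 16.7 and 13.3 units above ambient.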

Preprint · 2020 · arXiv

Analytical Performance Modeling of NoCs under Priority Arbitration and Bursty Traffic

Networks-on-Chip (NoCs) used in commercial many-core processors typically incorporate priority arbitration. Moreover, they experience bursty traffic due to application workloads. However, most state-of-the-art analytical NoC performance models assume fair arbitration and simple traffic models. To address these limitations, we propose an analytical modeling technique for priority-aware NoCs under bursty traffic. Experimental evaluations with synthetic and bursty traffic show that the proposed approach has less than 10% modeling error with respect to a cycle-accurate NoC simulator.
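As background for why priority arbitration matters, the textbook non-preemptive priority M/G/1 queue (Cobham's formula) gives each class its own mean waiting time. It assumes Poisson arrivals and a single queue, which are exactly the simplifications this paper moves beyond, so treat it as an illustration rather than the proposed model. The function name is hypothetical.

```python
def priority_waiting_times(rates, mean_service, second_moment_service):
    """Mean queueing delay per class for a non-preemptive priority M/G/1 queue.

    Classes are ordered from highest (index 0) to lowest priority.
    Cobham's formula: W_k = W0 / ((1 - sigma_{k-1}) * (1 - sigma_k)),
    where sigma_k is the cumulative utilization of classes 0..k.
    """
    rhos = [lam * s for lam, s in zip(rates, mean_service)]
    if sum(rhos) >= 1:
        raise ValueError("total utilization must be < 1 for a stable queue")
    # Mean residual service time seen by an arriving packet.
    w0 = sum(lam * s2 for lam, s2 in zip(rates, second_moment_service)) / 2
    waits, sigma_prev = [], 0.0
    for rho in rhos:
        sigma = sigma_prev + rho
        waits.append(w0 / ((1 - sigma_prev) * (1 - sigma)))
        sigma_prev = sigma
    return waits

# Two classes, deterministic one-cycle service (E[S] = E[S^2] = 1).
waits = priority_waiting_times([0.3, 0.3], [1.0, 1.0], [1.0, 1.0])
```

Here the high-priority class waits about 0.43 cycles and the low-priority class about 1.07 cycles. Their average equals the 0.75-cycle FIFO M/D/1 delay, as Kleinrock's conservation law predicts for equally loaded classes.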

Preprint · 2020 · arXiv

Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

Networks-on-chip (NoCs) have become the standard interconnect solution in industrial designs ranging from client CPUs to many-core chip multiprocessors. Since NoCs play a vital role in system performance and power consumption, pre-silicon evaluation environments include cycle-accurate NoC simulators. Long simulations increase the execution time of evaluation frameworks, which are already notoriously slow, and prohibit design-space exploration. Existing analytical NoC models, which assume fair arbitration, cannot replace these simulations, since industrial NoCs typically employ priority schedulers and multiple priority classes. To address this limitation, we propose a systematic approach to construct priority-aware analytical performance models using micro-architecture specifications and input traffic. Our approach introduces two novel transformations of the queuing system and an algorithm that iteratively applies them to decompose the given NoC into individual queues with modified service times. This decomposition enables accurate and scalable end-to-end latency computation. Experimental evaluations using real architectures and applications show 97% accuracy and up to 2.5x speedup in full-system simulation.
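The general idea of decomposing a NoC into individual queues can be sketched with the classic decomposition approximation: analyze each queue in isolation using the total rate of the flows that traverse it, then sum the per-queue delays along each flow's route. This sketch assumes FIFO M/D/1 queues with unit service time and hypothetical router names (`r0`, `r1`, `r2`); it does not implement the paper's priority-aware transformations.

```python
from collections import defaultdict

def md1_wait(lam, service=1.0):
    """Mean waiting time of an M/D/1 queue (Pollaczek-Khinchine formula)."""
    rho = lam * service
    if rho >= 1:
        raise ValueError("queue is unstable (utilization >= 1)")
    return rho * service / (2 * (1 - rho))

def end_to_end_latency(flows, service=1.0):
    """flows: {name: (rate, [queue ids along the route])}.

    Each queue is analysed in isolation with the total rate of all flows
    that traverse it; a flow's latency is the sum of waiting plus service
    time over the queues on its route.
    """
    load = defaultdict(float)
    for rate, route in flows.values():
        for q in route:
            load[q] += rate
    return {name: sum(md1_wait(load[q], service) + service for q in route)
            for name, (_, route) in flows.items()}

flows = {"A": (0.2, ["r0", "r1"]), "B": (0.3, ["r1", "r2"])}
lat = end_to_end_latency(flows)
```

Queue `r1` is shared, so both flows see its higher combined load; priority arbitration would split that shared delay unevenly between them, which is what a priority-aware model has to capture.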

Preprint · 2020 · arXiv

DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework

Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design (including scheduling, power-thermal management algorithms and design space exploration studies) plays a crucial role. This paper presents a system-level domain-specific SoC simulation (DS3) framework to address this need. DS3 enables both design space exploration and dynamic resource management for power-performance optimization of domain applications. We showcase DS3 using six real-world applications from the wireless communications and radar processing domains. DS3, as well as the reference applications, is shared as open-source software to stimulate research in this area.

Preprint · 2020 · arXiv

Online Adaptive Learning for Runtime Resource Management of Heterogeneous SoCs

Dynamic resource management has become one of the major areas of research in modern computer and communication system design due to demands for lower power consumption and higher performance. The number of integrated cores, the level of heterogeneity and the number of control knobs increase steadily. As a result, system complexity is increasing faster than our ability to optimize and dynamically manage the resources. Moreover, offline approaches are sub-optimal due to workload variations and the large volume of new applications unknown at design time. This paper first reviews recent online learning techniques for predicting system performance, power, and temperature. Then, we describe the use of predictive models for online control using two modern approaches: imitation learning (IL) and explicit nonlinear model predictive control (NMPC). Evaluations on a commercial mobile platform with 16 benchmarks show that the IL approach successfully adapts the control policy to unknown applications. The explicit NMPC provides 25% energy savings compared to a state-of-the-art algorithm for multi-variable power management of modern GPU sub-systems.

Preprint · 2020 · arXiv

Runtime Task Scheduling using Imitation Learning for Heterogeneous Many-Core Systems

Domain-specific systems-on-chip, a class of heterogeneous many-core systems, are recognized as a key approach to narrowing the performance and energy-efficiency gap between custom hardware accelerators and programmable processors. Reaching the full potential of these architectures depends critically on optimally scheduling applications onto the available resources at runtime. Existing optimization-based techniques cannot achieve this objective at runtime due to the combinatorial nature of the task scheduling problem. As its main theoretical contribution, this paper poses scheduling as a classification problem and proposes a hierarchical imitation learning (IL)-based scheduler that learns from an Oracle to maximize the performance of multiple domain-specific applications. Extensive evaluations with six streaming applications from the wireless communications and radar domains show that the proposed IL-based scheduler approximates an offline Oracle policy with more than 99% accuracy for both performance- and energy-based optimization objectives. Furthermore, it achieves almost identical performance to the Oracle with low runtime overhead and successfully adapts to new applications, many-core system configurations, and runtime variations in application characteristics.
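Posing scheduling as classification can be sketched as follows: an oracle labels scheduling states with the processing element (PE) it would choose, and a classifier learns to reproduce those labels at runtime. Here the oracle is a hypothetical earliest-finish-time rule and the classifier is a 1-nearest-neighbour model; both are stand-ins, and the paper's hierarchical IL scheduler and its features are not reproduced.

```python
import math
import random

def oracle(exec_times, ready_times):
    """Hypothetical oracle: choose the PE on which the task finishes first."""
    finish = [r + e for r, e in zip(ready_times, exec_times)]
    return min(range(len(finish)), key=finish.__getitem__)

class NearestNeighbourPolicy:
    """Imitates the oracle with a 1-nearest-neighbour classifier."""

    def __init__(self):
        self.samples = []  # (feature vector, oracle label) pairs

    def add(self, features, label):
        self.samples.append((features, label))

    def predict(self, features):
        # Return the label of the closest stored state (Euclidean distance).
        _, label = min(self.samples, key=lambda s: math.dist(s[0], features))
        return label

# Offline phase: record oracle decisions on randomly generated states.
rng = random.Random(0)
policy = NearestNeighbourPolicy()
for _ in range(200):
    exec_times = [rng.uniform(1, 10) for _ in range(3)]  # per-PE execution time
    ready = [rng.uniform(0, 5) for _ in range(3)]        # when each PE becomes free
    policy.add(exec_times + ready, oracle(exec_times, ready))

# Runtime phase: the classifier answers in place of the oracle.
choice = policy.predict([2.0, 9.0, 9.0, 0.0, 0.0, 0.0])
```

The oracle here is trivially cheap for brevity; the approach pays off when the oracle is an expensive offline optimization that cannot run at runtime.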

Preprint · 2020 · arXiv

User-Space Emulation Framework for Domain-Specific SoC Design

In this work, we propose a portable, Linux-based emulation framework to provide an ecosystem for hardware-software co-design of Domain-specific SoCs (DSSoCs) and enable their rapid evaluation during the pre-silicon design phase. This framework holistically targets three key challenges of DSSoC design: accelerator integration, resource management, and application development. We address these challenges via a flexible and lightweight user-space runtime environment that enables easy integration of new accelerators, scheduling heuristics, and user applications, and we illustrate the utility of each through various case studies. With signal processing (WiFi and RADAR) as the target domain, we use our framework to evaluate the performance of various dynamic workloads on hypothetical DSSoC hardware configurations composed of mixtures of CPU cores and FFT accelerators using a Zynq UltraScale+ MPSoC. We show the portability of this framework by conducting a similar study on an Odroid platform composed of big.LITTLE ARM clusters. Finally, we introduce a prototype compilation toolchain that enables automatic mapping of unlabeled C code to DSSoC platforms. Taken together, this environment offers a unique ecosystem to rapidly perform functional verification and obtain performance and utilization estimates that help accelerate convergence towards a final DSSoC design.

Preprint · 2007 · arXiv

Energy- and Performance-Driven NoC Communication Architecture Synthesis Using a Decomposition Approach

In this paper, we present a methodology for customized communication architecture synthesis that matches the communication requirements of the target application. This is an important problem, particularly for network-based implementations of complex applications. Our approach uses frequently encountered generic communication primitives as an alphabet capable of characterizing any given communication pattern. The proposed algorithm searches the entire design space for a solution that minimizes the system's total energy consumption while satisfying the other design constraints. Compared to the standard mesh architecture, the customized architecture generated by this approach achieves about 36% higher throughput and requires 51% less energy to encrypt 128 bits of data with a standard encryption algorithm.

Umit Y. Ogras

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

COIN: Communication-Aware In-Memory Acceleration for Graph Convolutional Networks

tinyMAN: Lightweight Energy Manager using Reinforcement Learning for Energy Harvesting Wearable IoT Devices

An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms

Analysis and Control of Power-Temperature Dynamics in Heterogeneous Multiprocessors

Analytical Performance Modeling of NoCs under Priority Arbitration and Bursty Traffic

Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework

Online Adaptive Learning for Runtime Resource Management of Heterogeneous SoCs

Runtime Task Scheduling using Imitation Learning for Heterogeneous Many-Core Systems

User-Space Emulation Framework for Domain-Specific SoC Design

Energy- and Performance-Driven NoC Communication Architecture Synthesis Using a Decomposition Approach