Source author record

Heechul Yun

Heechul Yun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Cryptography and Security; Operating Systems; Systems and Control; Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Computer Vision; eess.SY; Hardware Architecture; Machine Learning; Performance

Catalog footprint

What is connected

9 works

10 topics

4 close collaborators

preprint, 2022, arXiv

Anytime-Lidar: Deadline-aware 3D Object Detection

In this work, we present a novel scheduling framework enabling anytime perception for deep neural network (DNN) based 3D object detection pipelines. We focus on the computationally expensive region proposal network (RPN) and per-category multi-head detector components, which are common in 3D object detection pipelines, and make them deadline-aware. We propose a scheduling algorithm that intelligently selects a subset of the components to make an effective time and accuracy trade-off on the fly. We minimize the accuracy loss from skipping some of the neural network sub-components by projecting previously detected objects onto the current scene through estimations. We apply our approach to a state-of-the-art 3D object detection network, PointPillars, and evaluate its performance on Jetson Xavier AGX using the nuScenes dataset. Compared to the baselines, our approach significantly improves the network's accuracy under various deadline constraints.
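
The abstract describes choosing a subset of detector components so that inference finishes before a deadline. The following is only a minimal sketch of that idea, not the authors' algorithm: it swaps in a simple greedy heuristic, and the `Head` type, the `select_heads` function, and all cost and priority numbers are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    """One per-category detection head (hypothetical model of a component)."""
    name: str
    cost_ms: float    # assumed execution time of this head
    priority: float   # assumed benefit of running this head on this frame


def select_heads(heads, backbone_ms, deadline_ms):
    """Greedily pick heads that fit in the time left after the backbone.

    This is an illustrative heuristic: heads are considered in order of
    priority per millisecond, and a head is skipped if it does not fit.
    """
    budget = deadline_ms - backbone_ms
    chosen = []
    for head in sorted(heads, key=lambda h: h.priority / h.cost_ms, reverse=True):
        if head.cost_ms <= budget:
            chosen.append(head.name)
            budget -= head.cost_ms
    return chosen


# Hypothetical per-head costs and priorities.
HEADS = [
    Head("car", 8.0, 0.9),
    Head("pedestrian", 6.0, 0.6),
    Head("truck", 10.0, 0.5),
    Head("barrier", 5.0, 0.2),
]

if __name__ == "__main__":
    # With a 50 ms deadline and a 30 ms backbone, only 20 ms remain for heads.
    print(select_heads(HEADS, backbone_ms=30.0, deadline_ms=50.0))
```

In this sketch a skipped head simply produces no detections for the frame. The abstract describes filling that gap by projecting previously detected objects onto the current scene, which the sketch does not model.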

preprint, 2022, arXiv

DeepPicarMicro: Applying TinyML to Autonomous Cyber Physical Systems

Running deep neural networks (DNNs) on tiny Micro-controller Units (MCUs) is challenging due to their limitations in computing, memory, and storage capacity. Fortunately, recent advances in both MCU hardware and machine learning software frameworks make it possible to run fairly complex neural networks on modern MCUs, resulting in a new field of study widely known as TinyML. However, there have been few studies showing the potential of TinyML applications in cyber-physical systems (CPS). In this paper, we present DeepPicarMicro, a small self-driving RC car testbed, which runs a convolutional neural network (CNN) on a Raspberry Pi Pico MCU. We apply a state-of-the-art DNN optimization to successfully fit the well-known PilotNet CNN architecture, which was used to drive NVIDIA's real self-driving car, on the MCU. We apply a state-of-the-art network architecture search (NAS) approach to find further optimized networks that can effectively control the car in real time in an end-to-end manner. From an extensive, systematic experimental evaluation, we observe an interesting relationship between the accuracy, latency, and control performance of a system. From this, we propose a joint optimization strategy that takes both the accuracy and the latency of a model into account in the network architecture search process for AI-enabled CPS.

preprint, 2021, arXiv

Memory-Aware Denial-of-Service Attacks on Shared Cache in Multicore Real-Time Systems

In this paper, we identify that memory performance plays a crucial role in the feasibility and effectiveness of denial-of-service attacks on shared cache. Based on this insight, we introduce new cache DoS attacks, which can be mounted from user space and can cause extreme worst-case execution time (WCET) impacts on cross-core victims, even if the shared cache is partitioned, by taking advantage of the platform's memory address mapping information and HugePage support. We deploy these enhanced attacks on two popular embedded out-of-order multicore platforms using both synthetic and real-world benchmarks. The proposed DoS attacks achieve up to 111X WCET increases on the tested platforms.

preprint, 2020, arXiv

SpectreRewind: Leaking Secrets to Past Instructions

Transient execution attacks utilize micro-architectural covert channels to leak secrets that should not have been accessible during logical program execution. Commonly used micro-architectural covert channels are those that leave lasting footprints in the micro-architectural state, for example, a cache state change, from which the secret is recovered after the transient execution is completed. In this paper, we present SpectreRewind, a new approach to create contention-based covert channels for transient execution attacks. In our approach, a covert channel is established by issuing the necessary instructions logically before the transiently executed victim code. Unlike prior contention-based covert channels, which require simultaneous multi-threading (SMT), SpectreRewind supports covert channels based on a single hardware thread, making it viable on systems where the attacker cannot utilize SMT. We show that contention on the floating point division unit on commodity processors can be used to create a high-performance (~100 KB/s), low-noise covert channel for transient execution attacks instead of the commonly used flush+reload based cache covert channels. We implement a Meltdown attack utilizing the proposed covert channel, showing competitive performance compared to the state-of-the-art cache-based covert channel implementation. We also show that the covert channel works in the JavaScript engine of a Chrome browser.

preprint, 2020, arXiv

Virtual Gang based Scheduling of Real-Time Tasks on Multicore Platforms

We propose a virtual-gang based parallel real-time task scheduling approach for multicore platforms. Our approach is based on the notion of a virtual-gang, which is a group of parallel real-time tasks that are statically linked and scheduled together by a gang scheduler. We present a light-weight intra-gang synchronization framework, called RTG-Sync, and virtual gang formation algorithms that provide strong temporal isolation and high real-time schedulability in scheduling real-time tasks on multicore platforms. We evaluate our approach both analytically, with generated tasksets against state-of-the-art approaches, and empirically, with a case study involving real-world workloads on a real embedded multicore platform. The results show that our approach provides a simple but powerful compositional analysis framework, achieves better analytic schedulability, especially when the effect of interference is considered, and is a practical solution for COTS multicore platforms.

preprint, 2015, arXiv

Protecting Memory-Performance Critical Sections in Soft Real-Time Applications

Soft real-time applications such as multimedia applications often show bursty memory access patterns, regularly requiring high memory bandwidth for a short duration of time. Such a period is often critical for timely data processing. Hence, we call it a memory-performance critical section. Unfortunately, in multicore architectures, non-real-time applications on different cores may also demand high memory bandwidth at the same time, which can substantially increase the time spent in the memory-performance critical sections. In this paper, we present BWLOCK, user-level APIs and a memory bandwidth control mechanism that can protect such memory-performance critical sections of soft real-time applications. BWLOCK provides simple lock-like APIs to declare memory-performance critical sections. If an application enters a memory-performance critical section, the memory bandwidth control system dynamically limits other cores' memory access rates to protect the memory performance of the application until the critical section finishes. From case studies with real-world soft real-time applications, we found that (1) such memory-performance critical sections do exist and are often easy to identify; and (2) applying BWLOCK to memory-performance critical sections significantly improves the performance of the soft real-time applications at little or no cost in throughput of non-real-time applications.
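
The lock-like usage pattern described in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions and is not the BWLOCK API or its kernel-level mechanism: the `BandwidthController` class, its `bwlock` and `limit_for` methods, and the 100 MB/s cap are all hypothetical names and values, and the code only tracks which cores would be throttled rather than controlling real memory traffic.

```python
from contextlib import contextmanager


class BandwidthController:
    """Toy stand-in for a memory bandwidth control mechanism (hypothetical)."""

    def __init__(self, throttled_mbps):
        self.throttled_mbps = throttled_mbps
        self.holders = set()  # cores currently inside a critical section

    def limit_for(self, core):
        """Return the bandwidth cap for a core, or None if it is unthrottled."""
        if self.holders and core not in self.holders:
            return self.throttled_mbps
        return None

    @contextmanager
    def bwlock(self, core):
        """Declare a memory-performance critical section for the given core."""
        self.holders.add(core)
        try:
            yield
        finally:
            # Always release, even if the critical section raises.
            self.holders.discard(core)


if __name__ == "__main__":
    ctrl = BandwidthController(throttled_mbps=100)
    print(ctrl.limit_for(1))       # no critical section is active
    with ctrl.bwlock(core=0):
        print(ctrl.limit_for(1))   # other cores are capped while core 0 holds the lock
        print(ctrl.limit_for(0))   # the holder itself is not capped
    print(ctrl.limit_for(1))       # the cap is lifted after the section ends
```

A context manager is used here so that the release step cannot be forgotten, which mirrors the paired enter and exit calls of a lock-style interface.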

preprint, 2014, arXiv

Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems

In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay experienced by tasks running on the platform. In this paper, we model a modern COTS multicore system which has a nonblocking last-level cache (LLC) and a DRAM controller that prioritizes reads over writes. To minimize interference, we focus on LLC and DRAM bank partitioned systems. Based on the model, we propose an analysis that computes a safe upper bound for the worst-case memory interference delay. We validated our analysis on a real COTS multicore platform with a set of carefully designed synthetic benchmarks as well as SPEC2006 benchmarks. Evaluation results show that our analysis more accurately captures the worst-case memory interference delay and provides safer upper bounds compared to a recently proposed analysis, which significantly underestimates the delay.

preprint, 2012, arXiv

S3A: Secure System Simplex Architecture for Enhanced Security of Cyber-Physical Systems

Until recently, cyber-physical systems, especially those with safety-critical properties that manage critical infrastructure (e.g., power generation plants and water treatment facilities), were considered to be invulnerable to software security breaches. The recently discovered 'W32.Stuxnet' worm has drastically changed this perception by demonstrating that such systems are susceptible to external attacks. Here we present an architecture that enhances the security of safety-critical cyber-physical systems despite the presence of such malware. Our architecture uses the property that control systems have deterministic execution behavior to detect an intrusion within 0.6 μs while still guaranteeing the safety of the plant. We also show that even if an attack is successful, the overall state of the physical system will still remain safe. Even if the operating system's administrative privileges have been compromised, our architecture will still be able to protect the physical system from coming to harm.

preprint, 2011, arXiv

Deterministic Real-time Thread Scheduling

A race condition is a timing-sensitive problem. A significant source of timing variation comes from nondeterministic hardware interactions such as cache misses. While data race detectors and model checkers can check for races, the enormous state space of complex software makes it difficult to identify all of the races, and the residual implementation errors remain a big challenge. In this paper, we propose deterministic real-time scheduling methods to address scheduling nondeterminism in uniprocessor systems. The main idea is to use timing-insensitive deterministic events, e.g., an instruction counter, in conjunction with a real-time clock to schedule threads. By introducing the concept of Worst Case Executable Instructions (WCEI), we guarantee both determinism and real-time performance.