Source author record

Felix Xiaozhu Lin

Felix Xiaozhu Lin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Cryptography and Security; Operating Systems; cs.CY; Networking and Internet Architecture; Computer Vision; Databases; Distributed, Parallel, and Cluster Computing; Human-Computer Interaction; Machine Learning; Other Computer Science; Programming Languages

Catalog footprint

What is connected

11 works

11 topics

4 close collaborators

preprint, 2022, arXiv

GPUReplay: A 50-KB GPU Stack for Client ML

GPUReplay (GR) is a novel way of deploying GPU-accelerated computation on mobile and embedded devices. It addresses the high complexity of a modern GPU stack for deployment ease and security. The idea is to record GPU executions on the full GPU stack ahead of time and replay the executions on new input at run time. We address key challenges towards making GR feasible, sound, and practical to use. The resultant replayer is a drop-in replacement for the original GPU stack. It is tiny (50 KB of executable), robust (replaying long executions without divergence), portable (running in a commodity OS, in TEE, and baremetal), and quick to launch (speeding up startup by up to two orders of magnitude). We show that GPUReplay works with a variety of integrated GPU hardware, GPU APIs, ML frameworks, and 33 neural network (NN) implementations for inference or training. The code is available at https://github.com/bakhi/GPUReplay.
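The record-then-replay idea in this abstract can be illustrated with a toy sketch. This is not GPUReplay's code or API: ToyDevice, full_stack_run, replay and the register names are all hypothetical, and the "GPU" is just a dictionary with one doubling kernel. The sketch only shows the general shape of the technique, assuming the interactions are a fixed sequence of writes in which the input data is the only part that changes between runs.

```python
# Toy record/replay sketch. ToyDevice, full_stack_run and replay are
# hypothetical names for illustration; none of this is GPUReplay's code.

INPUT = object()  # placeholder logged wherever the run's input data was written


class ToyDevice:
    """Stand-in for a GPU: a register file plus one 'kernel' that doubles IN."""

    def __init__(self):
        self.regs = {}

    def write(self, reg, value):
        self.regs[reg] = value
        if reg == "CMD" and value == "RUN":
            self.regs["OUT"] = [2 * x for x in self.regs["IN"]]

    def read(self, reg):
        return self.regs[reg]


def full_stack_run(dev, data, log):
    """Stand-in for the full stack: drives the device and logs every write."""
    for reg, value in (("MODE", "fp32"), ("IN", data), ("CMD", "RUN")):
        dev.write(reg, value)
        # Log a placeholder instead of the concrete input data.
        log.append((reg, INPUT if value is data else value))
    return dev.read("OUT")


def replay(dev, log, new_input):
    """Replayer: re-issues the logged writes, substituting the new input."""
    for reg, value in log:
        dev.write(reg, new_input if value is INPUT else value)
    return dev.read("OUT")


log = []
recorded = full_stack_run(ToyDevice(), [1, 2, 3], log)  # ahead of time
replayed = replay(ToyDevice(), log, [10, 20])           # at run time, new input
```

In this sketch the replayer needs only the log and a way to write registers, which hints at why a replayer can be much smaller than the stack that produced the recording.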

preprint, 2022, arXiv

Minimum Viable Device Drivers for ARM TrustZone

While TrustZone can isolate IO hardware, it lacks drivers for modern IO devices. Rather than porting drivers, we propose a novel approach to deriving minimum viable drivers: developers exercise a full driver and record the driver/device interactions; the processed recordings, dubbed driverlets, are replayed in the TEE at run time to access IO devices. Driverlets address two key challenges: correctness and expressiveness, for which they build on a key construct called the interaction template. The interaction template ensures faithful reproduction of recorded IO jobs (albeit on new IO data); it accepts dynamic input values; it tolerates nondeterministic device behaviors. We demonstrate driverlets on a series of sophisticated devices, making them accessible to TrustZone for the first time to our knowledge. Our experiments show that driverlets are secure, easy to build, and incur acceptable overhead (1.4x-2.7x compared to native drivers). Driverlets fill a critical gap in the TrustZone TEE, realizing its long-promised vision of secure IO.

preprint, 2022, arXiv

Protecting File Activities via Deception for ARM TrustZone

A TrustZone TEE often invokes an external filesystem. While file data can be encrypted, the revealed file activities can leak secrets. To hide the file activities from the filesystem and its OS, we propose Enigma, a deception-based defense injecting sybil file activities as the cover of the actual file activities. Enigma contributes three new designs. (1) To make the deception credible, the TEE generates sybil calls by replaying file calls from the TEE code under protection. (2) To make sybil activities cheap, the TEE requests the OS to run K filesystem images simultaneously. Concealing the disk, the TEE backs only one image with the actual disk while backing other images by only storing their metadata. (3) To protect filesystem image identities, the TEE shuffles the images frequently, preventing the OS from observing any image for long. Enigma works with unmodified filesystems shipped with Linux. On a low-cost Arm SoC with EXT4 and F2FS, our system can concurrently run as many as 50 filesystem images with 1% of disk overhead per additional image. Compared to common obfuscation for hiding addresses in a flat space, Enigma hides file activities with richer semantics. Its cost is lower by one order of magnitude while achieving the same level of probabilistic security guarantees.

preprint, 2021, arXiv

A First Look at Deep Learning Apps on Smartphones

We are at the dawn of a deep learning explosion for smartphones. To bridge the gap between research and practice, we present the first empirical study on the 16,500 most popular Android apps, demystifying how smartphone apps exploit deep learning in the wild. To this end, we build a new static tool that dissects apps and analyzes their deep learning functions. Our study answers three questions: what are the early adopter apps of deep learning, what do they use deep learning for, and what do their deep learning models look like. Our study has strong implications for app developers, smartphone vendors, and deep learning R&D. On one hand, our findings paint a promising picture of deep learning for smartphones, showing the prosperity of mobile deep learning frameworks as well as the prosperity of apps building their cores atop deep learning. On the other hand, our findings urge optimizations on deep learning models deployed on smartphones, the protection of these models, and validation of research ideas on these models.

preprint, 2020, arXiv

Approximate Query Service on Autonomous IoT Cameras

Elf is a runtime for an energy-constrained camera to continuously summarize video scenes as approximate object counts. Elf's novelty centers on planning the camera's count actions under energy constraint. (1) Elf explores the rich action space spanned by the number of sample image frames and the choice of per-frame object counters; it unifies errors from both sources into one single bounded error. (2) To decide count actions at run time, Elf employs a learning-based planner, jointly optimizing for past and future videos without delaying result materialization. Tested with more than 1,000 hours of videos and under realistic energy constraints, Elf continuously generates object counts within only 11% of the true counts on average. Alongside the counts, Elf presents narrow errors shown to be bounded and up to 3.4x smaller than competitive baselines. At a higher level, Elf makes a case for advancing the geographic frontier of video analytics.

preprint, 2020, arXiv

DeepCache: Principled Cache for Mobile Deep Vision

We present DeepCache, a principled cache design for deep learning inference in continuous mobile vision. DeepCache benefits model execution efficiency by exploiting temporal locality in input video streams. It addresses a key challenge raised by mobile vision: the cache must operate under video scene variation, while trading off among cacheability, overhead, and loss in model accuracy. At the input of a model, DeepCache discovers video temporal locality by exploiting the video's internal structure, for which it borrows proven heuristics from video compression; into the model, DeepCache propagates regions of reusable results by exploiting the model's internal structure. Notably, DeepCache eschews applying video heuristics to model internals, which are not pixels but high-dimensional, difficult-to-interpret data. Our implementation of DeepCache works with unmodified deep learning models, requires no manual developer effort, and is therefore immediately deployable on off-the-shelf mobile devices. Our experiments show that DeepCache saves inference execution time by 18% on average and up to 47%. DeepCache reduces system energy consumption by 20% on average.

preprint, 2020, arXiv

Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime Measures

A set of about 80 researchers, practitioners, and federal agency program managers participated in the NSF-sponsored Grand Challenges in Resilience Workshop held on the Purdue campus on March 19-21, 2019. The workshop was divided into three themes: resilience in cyber, cyber-physical, and socio-technical systems. About 30 attendees in all participated in the discussions of cyber resilience. This article brings out the substantive parts of the challenges and solution approaches that were identified in the cyber resilience theme. In this article, we put forward the substantial challenges in cyber resilience in a few representative application domains and outline foundational solutions to address these challenges. These solutions fall into two broad themes: resilience-by-design and resilience-by-reaction. We use examples of autonomous systems as the application drivers motivating cyber resilience. We focus on some autonomous systems in the near horizon (autonomous ground and aerial vehicles) and also a little more distant (autonomous rescue and relief). For resilience-by-design, we focus on design methods in software that are needed for our cyber systems to be resilient. In contrast, for resilience-by-reaction, we discuss how to make systems resilient by responding, reconfiguring, or recovering at runtime when failures happen. We also discuss the notion of adaptive execution to improve resilience, that is, executing transparently and adaptively among available execution platforms (mobile/embedded, edge, and cloud). For each of the two themes, we survey the current state, the desired state, and ways to get there. We conclude the paper by looking at the research challenges we will have to solve in the short and the mid-term to make the vision of resilient autonomous systems a reality.

preprint, 2014, arXiv

Draining our Glass: An Energy and Heat Characterization of Google Glass

The Google Glass is a mobile device designed to be worn as eyeglasses. This form factor enables new usage possibilities, such as hands-free video chats and instant web search. However, its shape also hampers its potential: (1) battery size, and therefore lifetime, is limited by a need for the device to be lightweight, and (2) high-power processing leads to significant heat, which should be limited, due to the Glass' compact form factor and close proximity to the user's skin. We use the Glass in a case study of the power and thermal characteristics of optical head-mounted display devices. We share insights and implications to limit power consumption to increase the safety and utility of head-mounted devices.

preprint, 2012, arXiv

Guadalupe: a browser design for heterogeneous hardware

Mobile systems are embracing heterogeneous architectures by getting more types of cores and more specialized cores, which allows applications to be faster and more efficient. We aim at exploiting the hardware heterogeneity from the browser without requiring any changes to either the OS or the web applications. Our design, Guadalupe, can use hardware processing units with different degrees of capability for matched browser services. It starts with a weak hardware unit, determines if and when a strong unit is needed, and seamlessly migrates to the strong one when necessary. Guadalupe not only makes more computing resources available to mobile web browsing but also improves its energy proportionality. Based on Chrome for Android and TI OMAP4, we provide a prototype browser implementation for resource loading and rendering. Compared to Chrome for Android, we show that the Guadalupe browser for rendering can increase other 3D applications' frame rate by up to 767% and save 4.7% of the entire system's energy consumption. More importantly, by using the two cases, we demonstrate that Guadalupe creates a great opportunity for many browser services to get better resource utilization and energy proportionality by exploiting hardware heterogeneity.

preprint, 2011, arXiv

How Far Can Client-Only Solutions Go for Mobile Browser Speed?

Mobile browsers are known to be slow because of the bottleneck in resource loading. Client-only solutions to improve resource loading are attractive because they are immediately deployable, scalable, and secure. We present the first publicly known treatment of client-only solutions to understand how much they can improve mobile browser speed without infrastructure support. Leveraging an unprecedented set of web usage data collected from 24 iPhone users continuously over one year, we examine the three fundamental, orthogonal approaches a client-only solution can take: caching, prefetching, and speculative loading, which is first proposed and studied in this work. Speculative loading predicts and speculatively loads the subresources needed to open a web page once its URL is given. We show that while caching and prefetching are highly limited for mobile browsing, speculative loading can be significantly more effective. Empirically, we show that client-only solutions can improve the browser speed by about 1.4 seconds on average for web sites visited by the 24 iPhone users. We also report the design, realization, and evaluation of speculative loading in a WebKit-based browser called Tempo. On average, Tempo can reduce browser delay by 1 second (~20%).
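As a rough illustration of speculative loading as described in this abstract (predicting the subresources of a page once its URL is given), the sketch below keeps a per-URL history and predicts the subresources that appeared in a large enough share of past visits. The abstract does not state how Tempo makes its predictions, so the frequency threshold, the SpeculativeLoader class and the min_share parameter are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


class SpeculativeLoader:
    """Hypothetical predictor: not Tempo's algorithm, just a frequency heuristic."""

    def __init__(self, min_share=0.5):
        self.min_share = min_share        # assumed threshold, not from the paper
        self.visits = Counter()           # url -> number of recorded visits
        self.seen = defaultdict(Counter)  # url -> subresource -> visits containing it

    def record_visit(self, url, subresources):
        """Remember which subresources one visit to `url` actually loaded."""
        self.visits[url] += 1
        self.seen[url].update(set(subresources))  # count each at most once per visit

    def predict(self, url):
        """Return subresources worth fetching as soon as `url` is known."""
        n = self.visits[url]
        if n == 0:
            return []  # no history for this URL: nothing to speculate on
        return sorted(r for r, c in self.seen[url].items() if c / n >= self.min_share)


loader = SpeculativeLoader(min_share=0.6)
loader.record_visit("http://example.com/", ["style.css", "app.js", "ad1.png"])
loader.record_visit("http://example.com/", ["style.css", "app.js", "ad2.png"])
loader.record_visit("http://example.com/", ["style.css", "ad3.png"])
to_fetch = loader.predict("http://example.com/")
```

Here the stable subresources (the stylesheet and script) pass the threshold while the one-off images do not, so only the former would be fetched speculatively.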

preprint, 2011, arXiv

Transparent Programming of Heterogeneous Smartphones for Sensing

Sensing on smartphones is known to be power-hungry. It has been shown that this problem can be solved by adding an ultra-low-power processor to execute simple, frequent sensor data processing. While very effective in saving energy, the resulting heterogeneous, distributed architecture poses a significant challenge to application development. We present Reflex, a suite of runtime and compilation techniques to conceal the heterogeneous, distributed nature from developers. Reflex automatically transforms the developer's code for distributed execution with the help of the Reflex runtime. To create a unified system illusion, Reflex features a novel software distributed shared memory (DSM) design that leverages the extreme architectural asymmetry between the low-power processor and the powerful central processor to achieve both energy efficiency and performance. We report a complete realization of Reflex for heterogeneous smartphones with Maemo/Linux as the central kernel. Using a tri-processor hardware prototype and sensing applications reported in recent literature, we evaluate the Reflex realization for programming transparency, energy efficiency, and performance. We show that Reflex supports a programming style that is very close to contemporary smartphone programming. It allows existing sensing applications to be ported with minor source code changes. Reflex reduces the system power in sensing by up to 83%, and its runtime system consumes only 10% of local memory on a typical ultra-low-power processor.
