Researcher profile

Felix Xiaozhu Lin

Felix Xiaozhu Lin contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

GPUReplay: A 50-KB GPU Stack for Client ML

GPUReplay (GR) is a novel way for deploying GPU-accelerated computation on mobile and embedded devices. It addresses high complexity of a modern GPU stack for deployment ease and security. The idea is to record GPU executions on the full GPU stack ahead of time and replay the executions on new input at run time. We address key challenges towards making GR feasible, sound, and practical to use. The resultant replayer is a drop-in replacement of the original GPU stack. It is tiny (50 KB of executable), robust (replaying long executions without divergence), portable (running in a commodity OS, in TEE, and baremetal), and quick to launch (speeding up startup by up to two orders of magnitude). We show that GPUReplay works with a variety of integrated GPU hardware, GPU APIs, ML frameworks, and 33 neural network (NN) implementations for inference or training. The code is available at https://github.com/bakhi/GPUReplay.

preprint2022arXiv

Minimum Viable Device Drivers for ARM TrustZone

While TrustZone can isolate IO hardware, it lacks drivers for modern IO devices. Rather than porting drivers, we propose a novel approach to deriving minimum viable drivers: developers exercise a full driver and record the driver/device interactions; the processed recordings, dubbed driverlets, are replayed in the TEE at run time to access IO devices. Driverlets address two key challenges: correctness and expressiveness, for which they build on a key construct called interaction template. The interaction template ensures faithful reproduction of recorded IO jobs (albeit on new IO data); it accepts dynamic input values; it tolerates nondeterministic device behaviors. We demonstrate driverlets on a series of sophisticated devices, making them accessible to TrustZone for the first time to our knowledge. Our experiments show that driverlets are secure, easy to build, and incur acceptable overhead (1.4x -2.7x compared to native drivers). Driverlets fill a critical gap in the TrustZone TEE, realizing its long-promised vision of secure IO.

preprint2022arXiv

Protecting File Activities via Deception for ARM TrustZone

A TrustZone TEE often invokes an external filesystem. While filedata can be encrypted, the revealed file activities can leak secrets. To hide the file activities from the filesystem and its OS, we propose Enigma, a deception-based defense injecting sybil file activities as the cover of the actual file activities. Enigma contributes three new designs. (1) To make the deception credible, the TEE generates sybil calls by replaying file calls from the TEE code under protection. (2) To make sybil activities cheap, the TEE requests the OS to run K filesystem images simultaneously. Concealing the disk, the TEE backs only one image with the actual disk while backing other images by only storing their metadata. (3) To protect filesystem image identities, the TEE shuffles the images frequently, preventing the OS from observing any image for long. Enigma works with unmodified filesystems shipped withLinux. On a low-cost Arm SoC with EXT4 and F2FS, our system can concurrently run as many as 50 filesystem images with 1% of disk overhead per additional image. Compared to common obfuscation for hiding addresses in a flat space, Enigma hides file activities with richer semantics. Its cost is lower by one order of magnitude while achieving the same level of probabilistic security guarantees.

preprint2021arXiv

A First Look at Deep Learning Apps on Smartphones

We are in the dawn of deep learning explosion for smartphones. To bridge the gap between research and practice, we present the first empirical study on 16,500 the most popular Android apps, demystifying how smartphone apps exploit deep learning in the wild. To this end, we build a new static tool that dissects apps and analyzes their deep learning functions. Our study answers threefold questions: what are the early adopter apps of deep learning, what do they use deep learning for, and how do their deep learning models look like. Our study has strong implications for app developers, smartphone vendors, and deep learning R\&D. On one hand, our findings paint a promising picture of deep learning for smartphones, showing the prosperity of mobile deep learning frameworks as well as the prosperity of apps building their cores atop deep learning. On the other hand, our findings urge optimizations on deep learning models deployed on smartphones, the protection of these models, and validation of research ideas on these models.

preprint2020arXiv

Approximate Query Service on Autonomous IoT Cameras

Elf is a runtime for an energy-constrained camera to continuously summarize video scenes as approximate object counts. Elf's novelty centers on planning the camera's count actions under energy constraint. (1) Elf explores the rich action space spanned by the number of sample image frames and the choice of per-frame object counters; it unifies errors from both sources into one single bounded error. (2) To decide count actions at run time, Elf employs a learning-based planner, jointly optimizing for past and future videos without delaying result materialization. Tested with more than 1,000 hours of videos and under realistic energy constraints, Elf continuously generates object counts within only 11% of the true counts on average. Alongside the counts, Elf presents narrow errors shown to be bounded and up to 3.4x smaller than competitive baselines. At a higher level, Elf makes a case for advancing the geographic frontier of video analytics.

preprint2020arXiv

DeepCache: Principled Cache for Mobile Deep Vision

We present DeepCache, a principled cache design for deep learning inference in continuous mobile vision. DeepCache benefits model execution efficiency by exploiting temporal locality in input video streams. It addresses a key challenge raised by mobile vision: the cache must operate under video scene variation, while trading off among cacheability, overhead, and loss in model accuracy. At the input of a model, DeepCache discovers video temporal locality by exploiting the video's internal structure, for which it borrows proven heuristics from video compression; into the model, DeepCache propagates regions of reusable results by exploiting the model's internal structure. Notably, DeepCache eschews applying video heuristics to model internals which are not pixels but high-dimensional, difficult-to-interpret data. Our implementation of DeepCache works with unmodified deep learning models, requires zero developer's manual effort, and is therefore immediately deployable on off-the-shelf mobile devices. Our experiments show that DeepCache saves inference execution time by 18% on average and up to 47%. DeepCache reduces system energy consumption by 20% on average.

preprint2020arXiv

Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime Measures

A set of about 80 researchers, practitioners, and federal agency program managers participated in the NSF-sponsored Grand Challenges in Resilience Workshop held on Purdue campus on March 19-21, 2019. The workshop was divided into three themes: resilience in cyber, cyber-physical, and socio-technical systems. About 30 attendees in all participated in the discussions of cyber resilience. This article brings out the substantive parts of the challenges and solution approaches that were identified in the cyber resilience theme. In this article, we put forward the substantial challenges in cyber resilience in a few representative application domains and outline foundational solutions to address these challenges. These solutions fall into two broad themes: resilience-by-design and resilience-by-reaction. We use examples of autonomous systems as the application drivers motivating cyber resilience. We focus on some autonomous systems in the near horizon (autonomous ground and aerial vehicles) and also a little more distant (autonomous rescue and relief). For resilience-by-design, we focus on design methods in software that are needed for our cyber systems to be resilient. In contrast, for resilience-by-reaction, we discuss how to make systems resilient by responding, reconfiguring, or recovering at runtime when failures happen. We also discuss the notion of adaptive execution to improve resilience, execution transparently and adaptively among available execution platforms (mobile/embedded, edge, and cloud). For each of the two themes, we survey the current state, and the desired state and ways to get there. We conclude the paper by looking at the research challenges we will have to solve in the short and the mid-term to make the vision of resilient autonomous systems a reality.