Researcher profile

Shaoshan Liu

Shaoshan Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2024arXiv

Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators

As Deep Neural Networks (DNNs) are increasingly deployed in safety critical and privacy sensitive applications such as autonomous driving and biometric authentication, it is critical to understand the fault-tolerance nature of DNNs. Prior work primarily focuses on metrics such as Failures In Time (FIT) rate and the Silent Data Corruption (SDC) rate, which quantify how often a device fails. Instead, this paper focuses on quantifying the DNN accuracy given that a transient error has occurred, which tells us how well a network behaves when a transient error occurs. We call this metric Resiliency Accuracy (RA). We show that existing RA formulation is fundamentally inaccurate, because it incorrectly assumes that software variables (model weights/activations) have equal faulty probability under hardware transient faults. We present an algorithm that captures the faulty probabilities of DNN variables under transient faults and, thus, provides correct RA estimations validated by hardware. To accelerate RA estimation, we reformulate RA calculation as a Monte Carlo integration problem, and solve it using importance sampling driven by DNN specific heuristics. Using our lightweight RA estimation method, we show that transient faults lead to far greater accuracy degradation than what todays DNN resiliency tools estimate. We show how our RA estimation tool can help design more resilient DNNs by integrating it with a Network Architecture Search framework.

preprint2022arXiv

An Energy-Efficient and Runtime-Reconfigurable FPGA-Based Accelerator for Robotic Localization Systems

Simultaneous Localization and Mapping (SLAM) estimates agents' trajectories and constructs maps, and localization is a fundamental kernel in autonomous machines at all computing scales, from drones, AR, VR to self-driving cars. In this work, we present an energy-efficient and runtime-reconfigurable FPGA-based accelerator for robotic localization. We exploit SLAM-specific data locality, sparsity, reuse, and parallelism, and achieve >5x performance improvement over the state-of-the-art. Especially, our design is reconfigurable at runtime according to the environment to save power while sustaining accuracy and performance.

preprint2022arXiv

Autonomous Mobile Clinics: Empowering Affordable Anywhere Anytime Healthcare Access

We are facing a global healthcare crisis today as the healthcare cost is ever climbing, but with the aging population, government fiscal revenue is ever dropping. To create a more efficient and effective healthcare system, three technical challenges immediately present themselves: healthcare access, healthcare equity, and healthcare efficiency. An autonomous mobile clinic solves the healthcare access problem by bringing healthcare services to the patient by the order of the patient's fingertips. Nevertheless, to enable a universal autonomous mobile clinic network, a three-stage technical roadmap needs to be achieved: In stage one, we focus on solving the inequity challenge in the existing healthcare system by combining autonomous mobility and telemedicine. In stage two, we develop an AI doctor for primary care, which we foster from infancy to adulthood with clean healthcare data. With the AI doctor, we can solve the inefficiency problem. In stage three, after we have proven that the autonomous mobile clinic network can truly solve the target clinical use cases, we shall open up the platform for all medical verticals, thus enabling universal healthcare through this whole new system.

preprint2022arXiv

Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System

This paper is the first to provide a thorough system design overview along with the fusion methods selection criteria of a real-world cooperative autonomous driving system, named Infrastructure-Augmented Autonomous Driving or IAAD. We present an in-depth introduction of the IAAD hardware and software on both road-side and vehicle-side computing and communication platforms. We extensively characterize the IAAD system in the context of real-world deployment scenarios and observe that the network condition that fluctuates along the road is currently the main technical roadblock for cooperative autonomous driving. To address this challenge, we propose new fusion methods, dubbed "inter-frame fusion" and "planning fusion" to complement the current state-of-the-art "intra-frame fusion". We demonstrate that each fusion method has its own benefit and constraint.

preprint2022arXiv

Dataflow Accelerator Architecture for Autonomous Machine Computing

Commercial autonomous machines is a thriving sector, one that is likely the next ubiquitous computing platform, after Personal Computers (PC), cloud computing, and mobile computing. Nevertheless, a suitable computing substrate for autonomous machines is missing, and many companies are forced to develop ad hoc computing solutions that are neither principled nor extensible. By analyzing the demands of autonomous machine computing, this article proposes Dataflow Accelerator Architecture (DAA), a modern instantiation of the classic dataflow principle, that matches the characteristics of autonomous machine software.

preprint2022arXiv

Robotic Computing on FPGAs: Current Progress, Research Challenges, and Opportunities

Robotic computing has reached a tipping point, with a myriad of robots (e.g., drones, self-driving cars, logistic robots) being widely applied in diverse scenarios. The continuous proliferation of robotics, however, critically depends on efficient computing substrates, driven by real-time requirements, robotic size-weight-and-power constraints, cybersecurity considerations, and dynamically changing scenarios. Within all platforms, FPGA is able to deliver both software and hardware solutions with low power, high performance, reconfigurability, reliability, and adaptivity characteristics, serving as the promising computing substrate for robotic applications. This paper highlights the current progress, design techniques, challenges, and open research challenges in the domain of robotic computing on FPGAs.

preprint2021arXiv

A Survey of FPGA-Based Robotic Computing

Recent researches on robotics have shown significant improvement, spanning from algorithms, mechanics to hardware architectures. Robotics, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in diverse scenarios. However, the high computation and data complexity of robotic algorithms pose great challenges to its applications. On the one hand, CPU platform is flexible to handle multiple robotic tasks. GPU platform has higher computational capacities and easy-touse development frameworks, so they have been widely adopted in several applications. On the other hand, FPGA-based robotic accelerators are becoming increasingly competitive alternatives, especially in latency-critical and power-limited scenarios. With specialized designed hardware logic and algorithm kernels, FPGA-based accelerators can surpass CPU and GPU in performance and energy efficiency. In this paper, we give an overview of previous work on FPGA-based robotic accelerators covering different stages of the robotic system pipeline. An analysis of software and hardware optimization techniques and main technical issues is presented, along with some commercial and space applications, to serve as a guide for future work.

preprint2021arXiv

Engineering Education in the Age of Autonomous Machines

In the past few years, we have observed a huge supply-demand gap for autonomous driving engineers. The core problem is that autonomous driving is not one single technology but rather a complex system integrating many technologies, and no one single academic department can provide comprehensive education in this field. We advocate to create a cross-disciplinary program to expose students with technical background in computer science, computer engineering, electrical engineering, as well as mechanical engineering. On top of the cross-disciplinary technical foundation, a capstone project that provides students with hands-on experiences of working with a real autonomous vehicle is required to consolidate the technical foundation.

preprint2021arXiv

Towards Fully Intelligent Transportation through Infrastructure-Vehicle Cooperative Autonomous Driving: Challenges and Opportunities

The infrastructure-vehicle cooperative autonomous driving approach depends on the cooperation between intelligent roads and intelligent vehicles. This approach is not only safer but also more economical compared to the traditional on-vehicle-only autonomous driving approach. In this paper, we introduce our real-world deployment experiences of cooperative autonomous driving, and delve into the details of new challenges and opportunities. Specifically, based on our progress towards commercial deployment, we follow a three-stage development roadmap of the cooperative autonomous driving approach:infrastructure-augmented autonomous driving (IAAD), infrastructure-guided autonomous driving (IGAD), and infrastructure-planned autonomous driving (IPAD).

preprint2020arXiv

Autonomous Last-mile Delivery Vehicles in Complex Traffic Environments

E-commerce has evolved with the digital technology revolution over the years. Last-mile logistics service contributes a significant part of the e-commerce experience. In contrast to the traditional last-mile logistics services, smart logistics service with autonomous driving technologies provides a promising solution to reduce the delivery cost and to improve efficiency. However, the traffic conditions in complex traffic environments, such as those in China, are more challenging compared to those in well-developed countries. Many types of moving objects (such as pedestrians, bicycles, electric bicycles, and motorcycles, etc.) share the road with autonomous vehicles, and their behaviors are not easy to track and predict. This paper introduces a technical solution from JD.com, a leading E-commerce company in China, to the autonomous last-mile delivery in complex traffic environments. Concretely, the methodologies in each module of our autonomous vehicles are presented, together with safety guarantee strategies. Up to this point, JD.com has deployed more than 300 self-driving vehicles for trial operations in tens of provinces of China, with an accumulated 715,819 miles and up to millions of on-road testing hours.

preprint2020arXiv

CoCoPIE: Making Mobile AI Sweet As PIE --Compression-Compilation Co-Design Goes a Long Way

Assuming hardware is the major constraint for enabling real-time mobile intelligence, the industry has mainly dedicated their efforts to developing specialized hardware accelerators for machine learning and inference. This article challenges the assumption. By drawing on a recent real-time AI optimization framework CoCoPIE, it maintains that with effective compression-compiler co-design, it is possible to enable real-time artificial intelligence on mainstream end devices without special hardware. CoCoPIE is a software framework that holds numerous records on mobile AI: the first framework that supports all main kinds of DNNs, from CNNs to RNNs, transformer, language models, and so on; the fastest DNN pruning and acceleration framework, up to 180X faster compared with current DNN pruning on other frameworks such as TensorFlow-Lite; making many representative AI applications able to run in real-time on off-the-shelf mobile devices that have been previously regarded possible only with special hardware support; making off-the-shelf mobile devices outperform a number of representative ASIC and FPGA solutions in terms of energy efficiency and/or performance.

preprint2020arXiv

Critical Business Decision Making for Technology Startups -- A PerceptIn Case Study

Most business decisions are made with analysis, but some are judgment calls not susceptible to analysis due to time or information constraints. In this article, we present a real-life case study of critical business decision making of PerceptIn, an autonomous driving technology startup. In early years of PerceptIn, PerceptIn had to make a decision on the design of computing systems for its autonomous vehicle products. By providing details on PerceptIn's decision process and the results of the decision, we hope to provide some insights that can be beneficial to entrepreneurs and engineering managers in technology startups.

preprint2020arXiv

Real-Time Spatio-Temporal LiDAR Point Cloud Compression

Compressing massive LiDAR point clouds in real-time is critical to autonomous machines such as drones and self-driving cars. While most of the recent prior work has focused on compressing individual point cloud frames, this paper proposes a novel system that effectively compresses a sequence of point clouds. The idea to exploit both the spatial and temporal redundancies in a sequence of point cloud frames. We first identify a key frame in a point cloud sequence and spatially encode the key frame by iterative plane fitting. We then exploit the fact that consecutive point clouds have large overlaps in the physical space, and thus spatially encoded data can be (re-)used to encode the temporal stream. Temporal encoding by reusing spatial encoding data not only improves the compression rate, but also avoids redundant computations, which significantly improves the compression speed. Experiments show that our compression system achieves 40x to 90x compression rate, significantly higher than the MPEG's LiDAR point cloud compression standard, while retaining high end-to-end application accuracies. Meanwhile, our compression system has a compression speed that matches the point cloud generation rate by today LiDARs and out-performs existing compression systems, enabling real-time point cloud transmission.