Source author record

Xinming Huang

Xinming Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, eess.IV, eess.SP, Hardware Architecture, Machine Learning, Robotics

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators


preprint · 2023 · arXiv

Self-supervised Geometric Features Discovery via Interpretable Attention for Vehicle Re-Identification and Beyond

To learn distinguishable patterns, most recent works in vehicle re-identification (ReID) redevelop official benchmarks to provide various forms of supervision, which requires prohibitive human labor. In this paper, we pursue a similar goal without additional human effort. To this end, we introduce a novel framework that encodes both geometric local features and global representations to distinguish vehicle instances, optimized only by the supervision from official ID labels. Specifically, given our insight that objects in ReID share similar geometric characteristics, we propose to borrow self-supervised representation learning to facilitate geometric feature discovery. To condense these features, we introduce an interpretable attention module built around local maxima aggregation instead of fully automatic learning, whose mechanism is completely understandable and whose response map is physically reasonable. To the best of our knowledge, we are the first to perform self-supervised learning to discover geometric features. We conduct comprehensive experiments on the three most popular datasets for vehicle ReID, i.e., VeRi-776, CityFlow-ReID, and VehicleID. We report state-of-the-art (SOTA) performance and promising visualization results. We also show the excellent scalability of our approach to other ReID-related tasks, i.e., person ReID and multi-target multi-camera (MTMC) vehicle tracking.

preprint · 2022 · arXiv

A Near Sensor Edge Computing System for Point Cloud Semantic Segmentation

Point cloud semantic segmentation has attracted attention due to its robustness to lighting conditions. This makes it an ideal semantic solution for autonomous driving. However, considering the large computation burden and bandwidth demands of neural networks, putting all the computing into the vehicle Electronic Control Unit (ECU) is neither efficient nor practical. In this paper, we propose a lightweight point cloud semantic segmentation network based on range view. Due to its simple pre-processing and standard convolution, it is efficient when running on a deep learning accelerator such as a DPU. Furthermore, a near-sensor computing system is built for autonomous vehicles. In this system, an FPGA-based deep learning accelerator core (DPU) is placed next to the LiDAR sensor to perform point cloud pre-processing and run the segmentation neural network. By leaving only the post-processing step to the ECU, this solution substantially alleviates the computation burden of the ECU and consequently shortens decision-making and vehicle reaction latency. Our semantic segmentation network achieves 10 frames per second (fps) on a Xilinx DPU with a computation efficiency of 42.5 GOP/W.
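The abstract above does not spell out its range-view pre-processing. As a rough illustration only, the sketch below uses the commonly used spherical projection that maps each LiDAR point to a pixel of a 2D range image, which is what lets standard convolutions run on point cloud data. The function name, image size and field-of-view values are hypothetical and are not taken from the paper.

```python
import math

def range_view_projection(points, height=4, width=8,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points onto a height x width range image.

    Assumed parameters: the image size and vertical field of view are
    illustrative placeholders, not values from the paper.
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    image = [[0.0] * width for _ in range(height)]  # 0.0 marks an empty pixel
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate returns at the sensor origin
        yaw = math.atan2(y, x)      # azimuth in [-pi, pi]
        pitch = math.asin(z / r)    # elevation angle
        u = 0.5 * (1.0 - yaw / math.pi) * width
        v = (1.0 - (pitch - fov_down) / fov) * height
        col = min(width - 1, max(0, int(u)))
        row = min(height - 1, max(0, int(v)))
        # When several points land on one pixel, keep the nearest return.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image

image = range_view_projection([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0)])
print(image[0][4])  # both points share one pixel; the nearer range is kept
```

Keeping the nearest return per pixel is one simple tie-breaking choice; other pipelines may resolve collisions differently.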

preprint · 2022 · arXiv

Enabling 3D Object Detection with a Low-Resolution LiDAR

Light Detection And Ranging (LiDAR) has been widely used in autonomous vehicles for perception and localization. However, a high-resolution LiDAR is still prohibitively expensive, while its low-resolution counterpart is much more affordable. Therefore, using a low-resolution LiDAR for autonomous driving is an economically viable solution, but the point cloud sparsity makes it extremely challenging. In this paper, we propose a two-stage neural network framework that enables 3D object detection using a low-resolution LiDAR. Taking input from a low-resolution LiDAR point cloud and a monocular camera image, a depth completion network is employed to produce a dense point cloud that is subsequently processed by a voxel-based network for 3D object detection. Evaluated on the KITTI dataset for 3D object detection in Bird's-Eye View (BEV), the experimental results show that the proposed approach performs significantly better than directly applying the 16-line LiDAR point cloud for object detection. For both easy and moderate cases, our 3D vehicle detection results are close to those using 64-line high-resolution LiDARs.
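The hand-off between the two stages described above, from a completed depth map to a dense point cloud, can be sketched with the standard pinhole camera back-projection. This is an assumption about how such a conversion is usually done, not the paper's stated procedure; the function name and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a dense depth map into 3D camera-frame points.

    depth is a list of rows; depth[v][u] is the depth in metres at pixel
    (u, v). fx, fy, cx, cy are hypothetical pinhole intrinsics.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:
                continue  # no valid depth at this pixel
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points

# Tiny 2x2 depth map with two valid pixels.
cloud = depth_to_points([[0.0, 2.0], [4.0, 0.0]], fx=2.0, fy=2.0, cx=0.0, cy=0.0)
print(cloud)  # [(1.0, 0.0, 2.0), (0.0, 2.0, 4.0)]
```

The resulting points would then be voxelized for the second-stage detector; that step is omitted here.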

preprint · 2020 · arXiv

A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN

In this paper, a scalable neural network hardware architecture for image segmentation is proposed. By sharing the same computing resources, both convolution and deconvolution operations are handled by the same processing element array. In addition, access to on-chip and off-chip memories is optimized to alleviate the burden introduced by partial sums. As an example, SegNet-Basic has been implemented using the proposed unified architecture targeting a Xilinx ZC706 FPGA, achieving a performance of 151.5 GOPS and 94.3 GOPS for convolution and deconvolution, respectively. This unified convolution/deconvolution design is applicable to other CNNs with deconvolution.
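One reason a single compute array can serve both operations is a standard identity: a deconvolution (transposed convolution) equals an ordinary convolution over a zero-inserted, zero-padded input with the kernel flipped. The 1D sketch below checks that identity numerically. It illustrates the general equivalence only and is not a description of the paper's actual dataflow; all function names are hypothetical.

```python
def conv1d_valid(x, w):
    """Sliding-window multiply-accumulate with no padding (cross-correlation)."""
    k = len(w)
    return [sum(x[t + j] * w[j] for j in range(k))
            for t in range(len(x) - k + 1)]

def deconv1d_direct(x, w, stride):
    """Transposed convolution: scatter each input scaled by the kernel."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out

def deconv1d_via_conv(x, w, stride):
    """Same result using only the convolution primitive."""
    k = len(w)
    # Insert (stride - 1) zeros between neighbouring input samples.
    upsampled = [0] * ((len(x) - 1) * stride + 1)
    for i, xi in enumerate(x):
        upsampled[i * stride] = xi
    # Pad k - 1 zeros on both sides, then convolve with the flipped kernel.
    padded = [0] * (k - 1) + upsampled + [0] * (k - 1)
    return conv1d_valid(padded, w[::-1])

x, w = [1, 2, 3], [1, 10, 100]
print(deconv1d_direct(x, w, stride=2))    # [1, 10, 102, 20, 203, 30, 300]
print(deconv1d_via_conv(x, w, stride=2))  # identical output
```

The zero-insertion form wastes multiplications on zeros, which is one reason hardware designs often handle deconvolution more carefully than this naive mapping.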

preprint · 2020 · arXiv

Automatic Building and Labeling of HD Maps with Deep Learning

In a world where autonomous driving cars are becoming increasingly common, creating an adequate infrastructure for this new technology is essential. This includes building and labeling high-definition (HD) maps accurately and efficiently. Today, the process of creating HD maps requires a lot of human input, which takes time and is prone to errors. In this paper, we propose a novel method capable of generating labeled HD maps from raw sensor data. We implemented and tested our methods on several urban scenarios using data collected from our test vehicle. The results show that the proposed deep-learning-based method can produce highly accurate HD maps. This approach speeds up the process of building and labeling HD maps, which can make a meaningful contribution to the deployment of autonomous vehicles.

preprint · 2020 · arXiv

DepthNet: Real-Time LiDAR Point Cloud Depth Completion for Autonomous Vehicles

Autonomous vehicles rely heavily on sensors such as cameras and LiDAR, which provide real-time information about their surroundings for the tasks of perception, planning and control. Typically, a LiDAR can only provide a sparse point cloud owing to a limited number of scanning lines. By employing depth completion, a dense depth map can be generated by assigning each camera pixel a corresponding depth value. However, the existing depth completion convolutional neural networks are so complex that they require high-end GPUs for processing, and thus they are not applicable to real-time autonomous driving. In this paper, a lightweight network is proposed for the task of LiDAR point cloud depth completion. With an astonishing 96.2% reduction in the number of parameters, it still achieves comparable performance (9.3% better in MAE but 3.9% worse in RMSE) to the state-of-the-art network. For real-time embedded platforms, the depthwise separable technique is applied to both convolution and deconvolution operations, and the number of parameters decreases further by a factor of 7.3, with only a small percentage increase in RMSE and MAE. Moreover, a system-on-chip architecture for depth completion is developed on a PYNQ-based FPGA platform that achieves real-time processing for the HDL-64E LiDAR at 11.1 frames per second.
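To see why the depthwise separable technique shrinks a network, it helps to count weights for a single layer. The sketch below compares a standard convolution with its depthwise separable replacement (a per-channel spatial filter followed by a 1x1 pointwise convolution). The layer sizes are hypothetical examples, not DepthNet's actual layers, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise (per-channel) plus 1x1 pointwise pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
print(standard, separable)  # 73728 8768
print(standard / separable)  # reduction factor for this one layer
```

For this illustrative layer the reduction is roughly 8.4x. The network-wide factor of 7.3 quoted in the abstract depends on the mix of layers, some of which benefit less.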

preprint · 2020 · arXiv

Pedestrian Tracking with Gated Recurrent Units and Attention Mechanisms

Pedestrian tracking has long been considered an important problem, especially in security applications. Previously, many approaches have been proposed with various types of sensors. One popular method is Pedestrian Dead Reckoning (PDR) [1], which is based on the inertial measurement unit (IMU) sensor. However, PDR is an integration- and threshold-based method, which suffers from accumulated errors and low accuracy. In this paper, we propose a novel method in which the sensor data is fed into a deep learning model to predict the displacements and orientations of the pedestrian. We also devise a new apparatus to collect and construct databases containing synchronized IMU sensor data and precise locations measured by a LiDAR. The preliminary results are promising, and we plan to push this forward by collecting more data and adapting the deep learning model for all general pedestrian motions.
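Whether per-step displacements and orientations come from classical PDR or from a learned model, turning them into a trajectory is a running sum. The minimal sketch below shows that integration step, under the assumption that each step is given as a travelled distance plus a change in heading. The function name and the input format are hypothetical and are not taken from the paper.

```python
import math

def integrate_track(steps, start=(0.0, 0.0), heading=0.0):
    """Accumulate (distance, heading_change) steps into 2D positions.

    heading is in radians, measured counter-clockwise from the x axis.
    """
    x, y = start
    track = [(x, y)]
    for distance, turn in steps:
        heading += turn                      # update orientation first
        x += distance * math.cos(heading)    # then advance along it
        y += distance * math.sin(heading)
        track.append((x, y))
    return track

# One metre straight ahead, then a 90-degree left turn and one more metre.
for position in integrate_track([(1.0, 0.0), (1.0, math.pi / 2)]):
    print(position)
```

Because each position is the sum of all earlier steps, an error in any single step shifts every later position, which is the accumulation problem the abstract attributes to PDR.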

preprint · 2020 · arXiv

PointNet on FPGA for Real-Time LiDAR Point Cloud Processing

LiDAR sensors have been widely used in many autonomous vehicle modalities, such as perception, mapping, and localization. This paper presents an FPGA-based deep learning platform for real-time point cloud processing targeted at autonomous vehicles. The software driver for the Velodyne LiDAR sensor is modified and moved into the on-chip processor system, while the programmable logic is designed as a customized hardware accelerator. PointNet, a state-of-the-art deep learning algorithm for point cloud processing, is successfully implemented on the proposed FPGA platform. Targeting a Xilinx Zynq UltraScale+ MPSoC ZCU104 development board, the FPGA implementations of PointNet achieve a computing performance of 182.1 GOPS and 280.0 GOPS for classification and segmentation, respectively. The proposed design can support an input of up to 4096 points per frame. The processing time is 19.8 ms for classification and 34.6 ms for segmentation, which meets the real-time requirement for most existing LiDAR sensors.
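The real-time claim above can be checked with simple arithmetic: a per-frame latency bounds the frame rate when frames are processed one at a time. In the sketch below the latencies are the ones quoted in the abstract, while the 20 Hz sensor rate is an assumed example of a LiDAR frame rate, not a figure from the paper.

```python
def frames_per_second(latency_ms):
    """Upper bound on frame rate when frames are processed one at a time."""
    return 1000.0 / latency_ms

def keeps_up(latency_ms, sensor_hz):
    """True if processing is at least as fast as the sensor's frame rate."""
    return frames_per_second(latency_ms) >= sensor_hz

# Latencies from the abstract; sensor_hz=20.0 is an assumed sensor rate.
for task, latency_ms in (("classification", 19.8), ("segmentation", 34.6)):
    rate = round(frames_per_second(latency_ms), 1)
    print(task, rate, keeps_up(latency_ms, sensor_hz=20.0))
```

Under that assumption, classification tops out near 50.5 frames per second and segmentation near 28.9, so both stay ahead of the sensor.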

preprint · 2020 · arXiv

TreeRNN: Topology-Preserving Deep Graph Embedding and Learning

General graphs are difficult to learn from due to their irregular structures. Existing works employ message passing along graph edges to extract local patterns using customized graph kernels, but few of them are effective at integrating such local patterns into global features. In contrast, in this paper we study methods to convert graphs into trees so that explicit orders are learned to direct the feature integration from local to global. To this end, we apply breadth-first search (BFS) to construct trees from the graphs, which adds direction to the graph edges from the center node to the peripheral nodes. In addition, we propose a novel projection scheme that converts the trees to image representations, which are suitable for conventional convolutional neural networks (CNNs) and recurrent neural networks (RNNs). To best learn the patterns from the graph-tree-images, we propose TreeRNN, a 2D RNN architecture that recurrently integrates the image pixels by rows and columns to help classify the graph categories. We evaluate the proposed method on several graph classification datasets and demonstrate accuracy comparable with the state of the art on the MUTAG, PTC-MR and NCI1 datasets.
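The graph-to-tree step described above can be sketched with a textbook breadth-first search that records, for every node, the parent it was first reached from and its depth. How the paper selects the center node is not stated here, so the `root` argument below is an assumption, and the function name is hypothetical.

```python
from collections import deque

def bfs_tree(adjacency, root):
    """Build a BFS tree from an undirected graph.

    adjacency maps each node to a list of neighbours. Returns two dicts:
    parent (edges directed away from the root) and depth (tree level).
    """
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:       # first visit fixes the tree edge
                parent[neighbor] = node
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return parent, depth

# A 4-cycle: node 3 is reachable through 1 or 2, but the tree keeps one edge.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
parent, depth = bfs_tree(graph, root=0)
print(parent)  # {0: None, 1: 0, 2: 0, 3: 1}
print(depth)   # {0: 0, 1: 1, 2: 1, 3: 2}
```

The depth values give an explicit local-to-global ordering of nodes, which is the property the abstract relies on for feature integration. Edges not on the tree, such as 2 to 3 here, are dropped by this simple version.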
