Source author record

Yunzhi Lin

Yunzhi Lin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Robotics Computer Vision eess.SP Information Theory math.IT Networking and Internet Architecture

Catalog footprint

What is connected

5works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation

We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category. Our method takes as input the previous and current frame from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network predicts distributions over object keypoints (vertices of the bounding cuboid) in image coordinates, after which a novel probabilistic filtering process integrates across estimates before computing the final pose using PnP. Our framework allows the system to take previous uncertainties into consideration when predicting the current frame, resulting in predictions that are more accurate and stable than single frame methods. Extensive experiments show that our method outperforms existing approaches on the challenging Objectron benchmark of annotated object videos. We also demonstrate the usability of our work in an augmented reality setting.

preprint2022arXiv

Primitive Shape Recognition for Object Grasping

Shape informs how an object should be grasped, both in terms of where and how. As such, this paper describes a segmentation-based architecture for decomposing objects sensed with a depth camera into multiple primitive shapes, along with a post-processing pipeline for robotic grasping. Segmentation employs a deep network, called PS-CNN, trained on synthetic data with 6 classes of primitive shapes and generated using a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape region. The grasps are rank ordered, with the first feasible one chosen for execution. For task-free grasping of individual objects, the method achieves a 94.2% success rate placing it amongst the top performing grasp methods when compared to top-down and SE(3)-based approaches. Additional tests involving variable viewpoints and clutter demonstrate robustness to setup. For task-oriented grasping, PS-CNN achieves a 93.0% success rate. Overall, the outcomes support the hypothesis that explicitly encoding shape primitives within a grasping pipeline should boost grasping performance, including task-free and task-relevant grasp prediction.

preprint2022arXiv

SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following

This paper investigates robot manipulation based on human instruction with ambiguous requests. The intent is to compensate for imperfect natural language via visual observations. Early symbolic methods, based on manually defined symbols, built modular framework consist of semantic parsing and task planning for producing sequences of actions from natural language requests. Modern connectionist methods employ deep neural networks to automatically learn visual and linguistic features and map to a sequence of low-level actions, in an endto-end fashion. These two approaches are blended to create a hybrid, modular framework: it formulates instruction following as symbolic goal learning via deep neural networks followed by task planning via symbolic planners. Connectionist and symbolic modules are bridged with Planning Domain Definition Language. The vision-and-language learning network predicts its goal representation, which is sent to a planner for producing a task-completing action sequence. For improving the flexibility of natural language, we further incorporate implicit human intents with explicit human instructions. To learn generic features for vision and language, we propose to separately pretrain vision and language encoders on scene graph parsing and semantic textual similarity tasks. Benchmarking evaluates the impacts of different components of, or options for, the vision-and-language learning model and shows the effectiveness of pretraining strategies. Manipulation experiments conducted in the simulator AI2THOR show the robustness of the framework to novel scenarios.

preprint2022arXiv

Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the MobilePose single-stage approach and 7.1% higher than the related two-stage approach).

preprint2020arXiv

Joint Optimization of Spectrum and Energy Efficiency Considering the C-V2X Security: A Deep Reinforcement Learning Approach

Cellular vehicle-to-everything (C-V2X) communication, as a part of 5G wireless communication, has been considered one of the most significant techniques for Smart City. Vehicles platooning is an application of Smart City that improves traffic capacity and safety by C-V2X. However, different from vehicles platooning travelling on highways, C-V2X could be more easily eavesdropped and the spectrum resource could be limited when they converge at an intersection. Satisfying the secrecy rate of C-V2X, how to increase the spectrum efficiency (SE) and energy efficiency (EE) in the platooning network is a big challenge. In this paper, to solve this problem, we propose a Security-Aware Approach to Enhancing SE and EE Based on Deep Reinforcement Learning, named SEED. The SEED formulates an objective optimization function considering both SE and EE, and the secrecy rate of C-V2X is treated as a critical constraint of this function. The optimization problem is transformed into the spectrum and transmission power selections of V2V and V2I links using deep Q network (DQN). The heuristic result of SE and EE is obtained by the DQN policy based on rewards. Finally, we simulate the traffic and communication environments using Python. The evaluation results demonstrate that the SEED outperforms the DQN-wopa algorithm and the baseline algorithm by 31.83 % and 68.40 % in efficiency. Source code for the SEED is available at https://github.com/BandaidZ/OptimizationofSEandEEBasedonDRL.

Yunzhi Lin

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation

Primitive Shape Recognition for Object Grasping

SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following

Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

Joint Optimization of Spectrum and Energy Efficiency Considering the C-V2X Security: A Deep Reinforcement Learning Approach