Source author record

Chao Zeng

Chao Zeng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: Unclaimed source record

Computer Vision eess.IV physics.ao-ph Machine Learning math.NA physics.data-an Robotics

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

preprint 2025 arXiv

PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp

The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on the point clouds (2.5D points) generated from single-view depth images. These point clouds cover only one surface side of the object and provide incomplete geometry information, which misleads the grasping algorithm when it judges the shape of the target object, resulting in low grasping accuracy. Humans can accurately grasp objects from a single view by leveraging their geometry experience to estimate object shapes. Inspired by humans, we propose a novel 6-DoF grasping framework that converts the point completion results into object shape features to train the 6-DoF grasp network. Here, point completion can generate approximately complete points from the 2.5D points, similar to the human geometry experience, and converting them into shape features is the way to utilize them to improve grasp efficiency. Furthermore, due to the gap between the network generation and actual execution, we integrate a score filter into our framework to select more executable grasp proposals for the real robot. This enables our method to maintain a high grasp quality in any camera viewpoint. Extensive experiments demonstrate that utilizing complete point features enables the generation of significantly more accurate grasp proposals and the inclusion of a score filter greatly enhances the credibility of real-world robot grasping. Our method achieves a success rate 17.8% higher than the state-of-the-art method in real-world experiments.
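The abstract does not say how the score filter works, so the following is only a minimal sketch of the general idea: keep the grasp proposals whose predicted quality clears a threshold and rank the survivors. The `GraspProposal` class, its fields, the `filter_by_score` function, and the threshold value are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GraspProposal:
    # Hypothetical container; the field names are illustrative only.
    position: tuple  # (x, y, z) gripper position
    score: float     # predicted grasp quality, assumed to lie in [0, 1]


def filter_by_score(proposals, min_score=0.5, top_k=None):
    """Keep proposals scoring at least min_score, best first."""
    kept = [p for p in proposals if p.score >= min_score]
    kept.sort(key=lambda p: p.score, reverse=True)
    return kept if top_k is None else kept[:top_k]


# Made-up proposals for illustration.
proposals = [
    GraspProposal((0.1, 0.0, 0.3), 0.42),
    GraspProposal((0.2, 0.1, 0.3), 0.91),
    GraspProposal((0.0, 0.2, 0.4), 0.67),
]
best = filter_by_score(proposals, min_score=0.5, top_k=2)
```

A full 6-DoF proposal would also carry a gripper orientation; it is left out here to keep the sketch short.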

preprint 2022 arXiv

Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection

Salient Object Detection is the task of predicting the human attended region in a given scene. Fusing depth information has been proven effective in this task. The main challenge of this problem is how to aggregate the complementary information from RGB modality and depth modality. However, conventional deep models heavily rely on CNN feature extractors, and the long-range contextual dependencies are usually ignored. In this work, we propose Dual Swin-Transformer based Mutual Interactive Network. We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs. Before fusing the two branches of features into one, attention-based modules are applied to enhance features from each modality. We design a self-attention-based cross-modality interaction module and a gated modality attention module to leverage the complementary information between the two modalities. For the saliency decoding, we create different stages enhanced with dense connections and keep a decoding memory while the multi-level encoding features are considered simultaneously. Considering the inaccurate depth map issue, we collect the RGB features of early stages into a skip convolution module to give more guidance from RGB modality to the final saliency prediction. In addition, we add edge supervision to regularize the feature learning process. Comprehensive experiments on five standard RGB-D SOD benchmark datasets over four evaluation metrics demonstrate the superiority of the proposed DTMINet method.

preprint 2022 arXiv

Learning Transformer Features for Image Quality Assessment

Objective image quality evaluation is a challenging task, which aims to measure the quality of a given image automatically. According to the availability of the reference images, there are Full-Reference and No-Reference IQA tasks, respectively. Most deep learning approaches use regression from deep features extracted by Convolutional Neural Networks. For the FR task, another option is conducting a statistical comparison on deep features. For all these methods, non-local information is usually neglected. In addition, the relationship between FR and NR tasks is less explored. Motivated by the recent success of transformers in modeling contextual information, we propose a unified IQA framework that utilizes CNN backbone and transformer encoder to extract features. The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme. Evaluation experiments on three standard IQA datasets, i.e., LIVE, CSIQ and TID2013, and KONIQ-10K, show that our proposed model can achieve state-of-the-art FR performance. In addition, comparable NR performance is achieved in extensive experiments, and the results show that the NR performance can be leveraged by the joint training scheme.

preprint 2020 arXiv

Deep learning-based air temperature mapping by fusing remote sensing, station, simulation and socioeconomic data

Air temperature (Ta) is an essential climatological component that controls and influences various earth surface processes. In this study, we make the first attempt to employ deep learning for Ta mapping mainly based on space remote sensing and ground station observations. Considering that Ta varies greatly in space and time and is sensitive to many factors, assimilation data and socioeconomic data are also included for a multi-source data fusion based estimation. Specifically, a 5-layers structured deep belief network (DBN) is employed to better capture the complicated and non-linear relationships between Ta and different predictor variables. Layer-wise pre-training process for essential features extraction and fine-tuning process for weight parameters optimization ensure the robust prediction of Ta spatio-temporal distribution. The DBN model was implemented for 0.01° daily maximum Ta mapping across China. The ten-fold cross-validation results indicate that the DBN model achieves promising results with the RMSE of 1.996°C, MAE of 1.539°C, and R of 0.986 at the national scale. Compared with multiple linear regression (MLR), back-propagation neural network (BPNN) and random forest (RF) method, the DBN model reduces the MAE values by 1.340°C, 0.387°C and 0.222°C, respectively. Further analysis on spatial distribution and temporal tendency of prediction errors both validate the great potentials of DBN in Ta estimation.
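The three scores quoted above (RMSE, MAE and the correlation coefficient R) can be computed from paired observations and predictions as in the sketch below. It assumes R is the Pearson correlation coefficient, which the abstract does not state explicitly. The function name and the temperature values are hypothetical and are not data from the paper.

```python
import math


def regression_metrics(observed, predicted):
    """Return (rmse, mae, r) for paired observations and predictions."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Pearson correlation between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return rmse, mae, r


# Made-up daily maximum temperatures in degrees Celsius.
observed = [30.0, 25.0, 28.0, 22.0]
predicted = [31.0, 24.0, 28.0, 24.0]
rmse, mae, r = regression_metrics(observed, predicted)
```

In a ten-fold cross-validation, the same function would be applied to the held-out predictions collected across all ten folds.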

preprint 2019 arXiv

Spatially Continuous and High-resolution Land Surface Temperature: A Review of Reconstruction and Spatiotemporal Fusion Techniques

Remotely sensed, spatially continuous and high spatiotemporal resolution (hereafter referred to as high resolution) land surface temperature (LST) is a key parameter for studying the thermal environment and has important applications in many fields. However, difficult atmospheric conditions, sensor malfunctioning and scanning gaps between orbits frequently introduce spatial discontinuities into satellite-retrieved LST products. For a single sensor, there is also a trade-off between temporal and spatial resolution and, therefore, it is impossible to obtain high temporal and spatial resolution simultaneously. In recent years the reconstruction and spatiotemporal fusion of LST products have become active research topics that aim at overcoming this limitation. They are two of the most investigated approaches in thermal remote sensing and attract increasing attention, which has resulted in a number of different algorithms. However, to the best of our knowledge, currently no review exists that expatiates on and summarizes the available LST reconstruction and spatiotemporal fusion methods and algorithms. This paper introduces the principles and theories behind LST reconstruction and spatiotemporal fusion and provides an overview of the published research and algorithms. We summarized three kinds of reconstruction methods for missing pixels (spatial, temporal and spatiotemporal methods), two kinds of reconstruction methods for cloudy pixels (Satellite Passive Microwave (PMW)-based and Surface Energy Balance (SEB)-based methods) and three kinds of spatiotemporal fusion methods (weighted function-based, unmixing-based and hybrid methods). The review concludes by summarizing validation methods and by identifying some promising future research directions for generating spatially continuous and high resolution LST products.
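To make the idea of a spatial reconstruction method for missing pixels concrete, here is a minimal sketch that fills gaps with an inverse-distance-weighted mean of valid neighbouring pixels. This is a generic interpolation technique chosen for illustration; the abstract does not say which spatial methods the review covers. The function name, its parameters, and the sample grid are hypothetical.

```python
def fill_gaps_idw(grid, radius=1, power=2.0):
    """Fill None cells with an inverse-distance-weighted mean of valid neighbours."""
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]  # copy so the input grid is left unchanged
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] is not None:
                continue
            num = den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                        value = grid[ni][nj]
                        if value is not None:
                            # Weight is 1 / distance**power.
                            w = 1.0 / (di * di + dj * dj) ** (power / 2)
                            num += w * value
                            den += w
            if den > 0:  # leave the gap if no valid neighbour is in range
                filled[i][j] = num / den
    return filled


# Made-up LST values in kelvin with one missing pixel.
lst = [
    [300.0, 302.0, 301.0],
    [299.0, None, 303.0],
    [298.0, 300.0, 302.0],
]
result = fill_gaps_idw(lst)
```

Only originally valid pixels contribute to each estimate, so filled values never propagate into other gaps within the same pass.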

preprint 2015 arXiv

Dimensions of Biquadratic and Bicubic Spline Spaces over Hierarchical T-meshes

This paper discusses the dimensions of biquadratic C1 spline spaces and bicubic C2 spline spaces over hierarchical T-meshes using the smoothing cofactor-conformality method. We obtain the dimension formula of biquadratic C1 spline spaces over hierarchical T-meshes in a concise way. In addition, we provide a dimension formula for bicubic C2 spline spaces over hierarchical T-meshes with fewer restrictions than that in the previous literature. A dimension formula for bicubic C2 spline spaces over a new type of hierarchical T-mesh is also provided.