Source author record

Dongdong Zhao

Dongdong Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Cryptography and Security Robotics

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects

High-precision tiny object alignment remains a common and critical challenge for humanoid robots in real-world. To address this problem, this paper proposes a vision-based framework for precisely estimating and controlling the relative position between a handheld tool and a target object for humanoid robots, e.g., a screwdriver tip and a screw head slot. By fusing images from the head and torso cameras on a robot with its head joint angles, the proposed Transformer-based visual servoing method can correct the handheld tool's positional errors effectively, especially at a close distance. Experiments on M4-M8 screws demonstrate an average convergence error of 0.8-1.3 mm and a success rate of 93\%-100\%. Through comparative analysis, the results validate that this capability of high-precision tiny object alignment is enabled by the Distance Estimation Transformer architecture and the Multi-Perception-Head mechanism proposed in this paper.

preprint2023arXiv

Motion-aware Memory Network for Fast Video Salient Object Detection

Previous methods based on 3DCNN, convLSTM, or optical flow have achieved great success in video salient object detection (VSOD). However, they still suffer from high computational costs or poor quality of the generated saliency maps. To solve these problems, we design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD. Furthermore, previous methods only considered single-frame prediction without temporal association. As a result, the model may not focus on the temporal information sufficiently. Thus, we initially introduce object motion prediction between inter-frame into VSOD. Our model follows standard encoder--decoder architecture. In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames. This approach is more efficient than the optical flow-based methods. In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches. The semantic information of the high-level features is used to fuse the object details in the low-level features, and then the spatiotemporal features are obtained step by step to reconstruct the saliency maps. Moreover, inspired by the boundary supervision commonly used in image salient object detection (ISOD), we design a motion-aware loss for predicting object boundary motion and simultaneously perform multitask learning for VSOD and object motion prediction, which can further facilitate the model to extract spatiotemporal features accurately and maintain the object integrity. Extensive experiments on several datasets demonstrated the effectiveness of our method and can achieve state-of-the-art metrics on some datasets. The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.

preprint2022arXiv

NegDL: Privacy-Preserving Deep Learning Based on Negative Database

In the era of big data, deep learning has become an increasingly popular topic. It has outstanding achievements in the fields of image recognition, object detection, and natural language processing et al. The first priority of deep learning is exploiting valuable information from a large amount of data, which will inevitably induce privacy issues that are worthy of attention. Presently, several privacy-preserving deep learning methods have been proposed, but most of them suffer from a non-negligible degradation of either efficiency or accuracy. Negative database (\textit{NDB}) is a new type of data representation which can protect data privacy by storing and utilizing the complementary form of original data. In this paper, we propose a privacy-preserving deep learning method named NegDL based on \textit{NDB}. Specifically, private data are first converted to \textit{NDB} as the input of deep learning models by a generation algorithm called \textit{QK}-hidden algorithm, and then the sketches of \textit{NDB} are extracted for training and inference. We demonstrate that the computational complexity of NegDL is the same as the original deep learning model without privacy protection. Experimental results on Breast Cancer, MNIST, and CIFAR-10 benchmark datasets demonstrate that the accuracy of NegDL could be comparable to the original deep learning model in most cases, and it performs better than the method based on differential privacy.