Source author record

Zhihai He

Zhihai He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Information Retrieval, Machine Learning, Networking and Internet Architecture

Catalog footprint

What is connected

9 works

5 topics

4 close collaborators


Preprint · 2022 · arXiv

Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation

We observe that human poses exhibit strong group-wise structural correlation and spatial coupling between keypoints due to the biological constraints of different body parts. This group-wise structural correlation can be explored to improve the accuracy and robustness of human pose estimation. In this work, we develop a self-constrained prediction-verification network to characterize and learn the structural correlation between keypoints during training. During the inference stage, the feedback information from the verification network allows us to perform further optimization of pose prediction, which significantly improves the performance of human pose estimation. Specifically, we partition the keypoints into groups according to the biological structure of the human body. Within each group, the keypoints are further partitioned into two subsets, high-confidence base keypoints and low-confidence terminal keypoints. We develop a self-constrained prediction-verification network to perform forward and backward predictions between these keypoint subsets. One fundamental challenge in pose estimation, as well as in generic prediction tasks, is that there is no mechanism for us to verify whether the obtained pose estimation or prediction results are accurate, since the ground truth is not available. Once successfully learned, the verification network serves as an accuracy verification module for the forward pose prediction. During the inference stage, it can be used to guide the local optimization of the pose estimation results of low-confidence keypoints with the self-constrained loss on high-confidence keypoints as the objective function. Our extensive experimental results on the benchmark MS COCO and CrowdPose datasets demonstrate that the proposed method can significantly improve the pose estimation results.

Preprint · 2021 · arXiv

Structure-Preserving Progressive Low-rank Image Completion for Defending Adversarial Attacks

Deep neural networks recognize objects by analyzing local image details and summarizing their information along the inference layers to derive the final decision. Because of this, they are prone to adversarial attacks. Small sophisticated noise in the input images can accumulate along the network inference path and produce wrong decisions at the network output. On the other hand, human eyes recognize objects based on their global structure and semantic cues, instead of local image textures. Because of this, human eyes can still clearly recognize objects from images which have been heavily damaged by adversarial attacks. This leads to a very interesting approach for defending deep neural networks against adversarial attacks. In this work, we propose to develop a structure-preserving progressive low-rank image completion (SPLIC) method to remove unneeded texture details from the input images and shift the bias of deep neural networks towards global object structures and semantic cues. We formulate the problem as a low-rank matrix completion problem with progressively smoothed rank functions to avoid local minima during the optimization process. Our experimental results demonstrate that the proposed method is able to successfully remove the insignificant local image details while preserving important global object structures. On black-box, gray-box, and white-box attacks, our method outperforms existing defense methods (by up to 12.6%) and significantly improves the adversarial robustness of the network.

Preprint · 2020 · arXiv

Ensemble Generative Cleaning with Feedback Loops for Defending Adversarial Attacks

Effective defense of deep neural networks against adversarial attacks remains a challenging problem, especially under powerful white-box attacks. In this paper, we develop a new method called ensemble generative cleaning with feedback loops (EGC-FL) for effective defense of deep neural networks. The proposed EGC-FL method is based on two central ideas. First, we introduce a transformed deadzone layer into the defense network, which consists of an orthonormal transform and a deadzone-based activation function, to destroy the sophisticated noise pattern of adversarial attacks. Second, by constructing a generative cleaning network with a feedback loop, we are able to generate an ensemble of diverse estimations of the original clean image. We then learn a network to fuse this set of diverse estimations together to restore the original image. Our extensive experimental results demonstrate that our approach improves on the state of the art by large margins in both white-box and black-box attacks. It significantly improves the classification accuracy for white-box PGD attacks upon the second best method by more than 29% on the SVHN dataset and more than 39% on the challenging CIFAR-10 dataset.
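The idea of pairing an orthonormal transform with a deadzone activation can be illustrated with a small sketch. This is not the EGC-FL layer itself: the one-level Haar transform, the hard-threshold deadzone rule, the inverse step, and all function names below are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

# Illustrative assumption: a one-level Haar transform stands in for the
# "orthonormal transform"; the abstract does not say which transform is used.
def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length sequence."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx + detail

def haar_inverse(c):
    """Inverse of haar_forward (the transform is orthonormal)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(c) // 2
    out = []
    for a, d in zip(c[:half], c[half:]):
        out.extend([(a + d) * s, (a - d) * s])
    return out

# Illustrative assumption: the deadzone zeroes coefficients whose magnitude
# falls below a threshold delta and passes the rest unchanged.
def deadzone(coeffs, delta):
    return [0.0 if abs(v) < delta else v for v in coeffs]

def transformed_deadzone(x, delta):
    """Transform, suppress small coefficients, transform back."""
    return haar_inverse(deadzone(haar_forward(x), delta))

if __name__ == "__main__":
    # A small perturbation (0.02) on the second sample is smoothed away.
    print(transformed_deadzone([1.0, 1.02, 5.0, 5.0], delta=0.1))
```

In this toy setting, small-magnitude transform coefficients are treated as candidate noise and discarded, while large coefficients that carry the signal's structure pass through unchanged.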

Preprint · 2020 · arXiv

Reciprocal Learning Networks for Human Trajectory Prediction

We observe that the human trajectory is not only forward predictable, but also backward predictable. Both forward and backward trajectories follow the same social norms and obey the same physical constraints with the only difference in their time directions. Based on this unique property, we develop a new approach, called reciprocal learning, for human trajectory prediction. Two networks, forward and backward prediction networks, are tightly coupled, satisfying the reciprocal constraint, which allows them to be jointly learned. Based on this constraint, we borrow the concept of adversarial attacks of deep neural networks, which iteratively modifies the input of the network to match the given or forced network output, and develop a new method for network prediction, called reciprocal attack for matched prediction. It further improves the prediction accuracy. Our experimental results on benchmark datasets demonstrate that our new method outperforms the state-of-the-art methods for human trajectory prediction.

Preprint · 2020 · arXiv

Unsupervised Deep Metric Learning with Transformed Attention Consistency and Contrastive Clustering Loss

Existing approaches for unsupervised metric learning focus on exploring self-supervision information within the input image itself. We observe that, when analyzing images, human eyes often compare images against each other instead of examining images individually. In addition, they often pay attention to certain keypoints, image regions, or objects which are discriminative between image classes but highly consistent within classes. Even if the image is being transformed, the attention pattern will be consistent. Motivated by this observation, we develop a new approach to unsupervised deep metric learning where the network is learned based on self-supervision information across images instead of within one single image. To characterize the consistent pattern of human attention during image comparisons, we introduce the idea of transformed attention consistency. It assumes that visually similar images, even undergoing different image transforms, should share the same consistent visual attention map. This consistency leads to a pairwise self-supervision loss, allowing us to learn a Siamese deep neural network to encode and compare images against their transformed or matched pairs. To further enhance the inter-class discriminative power of the feature generated by this network, we adapt the concept of triplet loss from supervised metric learning to our unsupervised case and introduce the contrastive clustering loss. Our extensive experimental results on benchmark datasets demonstrate that our proposed method outperforms current state-of-the-art methods for unsupervised metric learning by a large margin.

Preprint · 2016 · arXiv

A Classification Leveraged Object Detector

Currently, the state-of-the-art image classification algorithms outperform the best available object detector by a big margin in terms of average precision. We, therefore, propose a simple yet principled approach that allows us to leverage object detection through image classification on supporting regions specified by a preliminary object detector. Using a simple bag-of-words model based image classification algorithm, we raised the performance of the deformable model detector from 35.9% to 39.5% in average precision over 20 categories on the standard PASCAL VOC 2007 detection dataset.

Preprint · 2016 · arXiv

Joint Audio-Video Fingerprint Media Retrieval Using Rate-Coverage Optimization

In this work, we propose a joint audio-video fingerprint Automatic Content Recognition (ACR) technology for media retrieval. The problem is focused on how to balance the query accuracy and the size of fingerprint, and how to allocate the bits of the fingerprint to video frames and audio frames to achieve the best query accuracy. By constructing a novel concept called Coverage, which is highly correlated to the query accuracy, we are able to form a rate-coverage model to translate the original problem into an optimization problem that can be resolved by dynamic programming. To the best of our knowledge, this is the first work that uses joint audio-video fingerprint ACR technology for media retrieval with a theoretical problem formulation. Experimental results indicate that compared to reference algorithms, the proposed method has up to 25% query accuracy improvement while using 60% overall bit-rates, and 25% bit-rate reduction while achieving 85% accuracy, and it significantly outperforms the solution with single audio or video source fingerprint.

Preprint · 2016 · arXiv

Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking

In this paper, we develop a new approach of spatially supervised recurrent convolutional neural networks for visual object tracking. Our recurrent convolutional network exploits the history of locations as well as the distinctive visual features learned by the deep neural networks. Inspired by recent bounding box regression methods for object detection, we study the regression capability of Long Short-Term Memory (LSTM) in the temporal domain, and propose to concatenate high-level visual features produced by convolutional networks with region information. In contrast to existing deep learning based trackers that use binary classification for region candidates, we use regression for direct prediction of the tracking locations both at the convolutional layer and at the recurrent unit. Our extensive experimental results and performance comparison with state-of-the-art tracking methods on challenging benchmark video tracking datasets show that our tracker is more accurate and robust while maintaining low computational cost. For most test video sequences, our method achieves the best tracking performance, often outperforming the second best by a large margin.

Preprint · 2010 · arXiv

Monitoring wild animal communities with arrays of motion sensitive camera traps

Studying animal movement and distribution is of critical importance to addressing environmental challenges including invasive species, infectious diseases, climate and land-use change. Motion sensitive camera traps offer a visual sensor to record the presence of a broad range of species, providing location-specific information on movement and behavior. Modern digital camera traps that record video present new analytical opportunities, but also new data management challenges. This paper describes our experience with a terrestrial animal monitoring system at Barro Colorado Island, Panama. Our camera network captured the spatio-temporal dynamics of terrestrial bird and mammal activity at the site: data relevant to immediate science questions and long-term conservation issues. We believe that the experience gained and lessons learned during our year-long deployment and testing of the camera traps, as well as the developed solutions, are applicable to broader sensor network applications and are valuable for the advancement of sensor network research. We suggest that the continued development of these hardware, software, and analytical tools, in concert, offers an exciting sensor-network solution to monitoring of animal populations which could realistically scale over larger areas and time spans.
