Researcher profile

Zhihai He

Zhihai He contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2022arXiv

Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation

We observe that human poses exhibit strong group-wise structural correlation and spatial coupling between keypoints due to the biological constraints of different body parts. This group-wise structural correlation can be explored to improve the accuracy and robustness of human pose estimation. In this work, we develop a self-constrained prediction-verification network to characterize and learn the structural correlation between keypoints during training. During the inference stage, the feedback information from the verification network allows us to perform further optimization of pose prediction, which significantly improves the performance of human pose estimation. Specifically, we partition the keypoints into groups according to the biological structure of human body. Within each group, the keypoints are further partitioned into two subsets, high-confidence base keypoints and low-confidence terminal keypoints. We develop a self-constrained prediction-verification network to perform forward and backward predictions between these keypoint subsets. One fundamental challenge in pose estimation, as well as in generic prediction tasks, is that there is no mechanism for us to verify if the obtained pose estimation or prediction results are accurate or not, since the ground truth is not available. Once successfully learned, the verification network serves as an accuracy verification module for the forward pose prediction. During the inference stage, it can be used to guide the local optimization of the pose estimation results of low-confidence keypoints with the self-constrained loss on high-confidence keypoints as the objective function. Our extensive experimental results on benchmark MS COCO and CrowdPose datasets demonstrate that the proposed method can significantly improve the pose estimation results.

preprint2021arXiv

Structure-Preserving Progressive Low-rank Image Completion for Defending Adversarial Attacks

Deep neural networks recognize objects by analyzing local image details and summarizing their information along the inference layers to derive the final decision. Because of this, they are prone to adversarial attacks. Small sophisticated noise in the input images can accumulate along the network inference path and produce wrong decisions at the network output. On the other hand, human eyes recognize objects based on their global structure and semantic cues, instead of local image textures. Because of this, human eyes can still clearly recognize objects from images which have been heavily damaged by adversarial attacks. This leads to a very interesting approach for defending deep neural networks against adversarial attacks. In this work, we propose to develop a structure-preserving progressive low-rank image completion (SPLIC) method to remove unneeded texture details from the input images and shift the bias of deep neural networks towards global object structures and semantic cues. We formulate the problem into a low-rank matrix completion problem with progressively smoothed rank functions to avoid local minimums during the optimization process. Our experimental results demonstrate that the proposed method is able to successfully remove the insignificant local image details while preserving important global object structures. On black-box, gray-box, and white-box attacks, our method outperforms existing defense methods (by up to 12.6%) and significantly improves the adversarial robustness of the network.

preprint2020arXiv

Ensemble Generative Cleaning with Feedback Loops for Defending Adversarial Attacks

Effective defense of deep neural networks against adversarial attacks remains a challenging problem, especially under powerful white-box attacks. In this paper, we develop a new method called ensemble generative cleaning with feedback loops (EGC-FL) for effective defense of deep neural networks. The proposed EGC-FL method is based on two central ideas. First, we introduce a transformed deadzone layer into the defense network, which consists of an orthonormal transform and a deadzone-based activation function, to destroy the sophisticated noise pattern of adversarial attacks. Second, by constructing a generative cleaning network with a feedback loop, we are able to generate an ensemble of diverse estimations of the original clean image. We then learn a network to fuse this set of diverse estimations together to restore the original image. Our extensive experimental results demonstrate that our approach improves the state-of-art by large margins in both white-box and black-box attacks. It significantly improves the classification accuracy for white-box PGD attacks upon the second best method by more than 29% on the SVHN dataset and more than 39% on the challenging CIFAR-10 dataset.

preprint2020arXiv

Reciprocal Learning Networks for Human Trajectory Prediction

We observe that the human trajectory is not only forward predictable, but also backward predictable. Both forward and backward trajectories follow the same social norms and obey the same physical constraints with the only difference in their time directions. Based on this unique property, we develop a new approach, called reciprocal learning, for human trajectory prediction. Two networks, forward and backward prediction networks, are tightly coupled, satisfying the reciprocal constraint, which allows them to be jointly learned. Based on this constraint, we borrow the concept of adversarial attacks of deep neural networks, which iteratively modifies the input of the network to match the given or forced network output, and develop a new method for network prediction, called reciprocal attack for matched prediction. It further improves the prediction accuracy. Our experimental results on benchmark datasets demonstrate that our new method outperforms the state-of-the-art methods for human trajectory prediction.

preprint2020arXiv

Unsupervised Deep Metric Learning with Transformed Attention Consistency and Contrastive Clustering Loss

Existing approaches for unsupervised metric learning focus on exploring self-supervision information within the input image itself. We observe that, when analyzing images, human eyes often compare images against each other instead of examining images individually. In addition, they often pay attention to certain keypoints, image regions, or objects which are discriminative between image classes but highly consistent within classes. Even if the image is being transformed, the attention pattern will be consistent. Motivated by this observation, we develop a new approach to unsupervised deep metric learning where the network is learned based on self-supervision information across images instead of within one single image. To characterize the consistent pattern of human attention during image comparisons, we introduce the idea of transformed attention consistency. It assumes that visually similar images, even undergoing different image transforms, should share the same consistent visual attention map. This consistency leads to a pairwise self-supervision loss, allowing us to learn a Siamese deep neural network to encode and compare images against their transformed or matched pairs. To further enhance the inter-class discriminative power of the feature generated by this network, we adapt the concept of triplet loss from supervised metric learning to our unsupervised case and introduce the contrastive clustering loss. Our extensive experimental results on benchmark datasets demonstrate that our proposed method outperforms current state-of-the-art methods for unsupervised metric learning by a large margin.

preprint2010arXiv

Monitoring wild animal communities with arrays of motion sensitive camera traps

Studying animal movement and distribution is of critical importance to addressing environmental challenges including invasive species, infectious diseases, climate and land-use change. Motion sensitive camera traps offer a visual sensor to record the presence of a broad range of species providing location -specific information on movement and behavior. Modern digital camera traps that record video present new analytical opportunities, but also new data management challenges. This paper describes our experience with a terrestrial animal monitoring system at Barro Colorado Island, Panama. Our camera network captured the spatio-temporal dynamics of terrestrial bird and mammal activity at the site - data relevant to immediate science questions, and long-term conservation issues. We believe that the experience gained and lessons learned during our year long deployment and testing of the camera traps as well as the developed solutions are applicable to broader sensor network applications and are valuable for the advancement of the sensor network research. We suggest that the continued development of these hardware, software, and analytical tools, in concert, offer an exciting sensor-network solution to monitoring of animal populations which could realistically scale over larger areas and time spans.