Source author record

Zhanyi Hu

Zhanyi Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision; Data Structures and Algorithms; Robotics

Catalog footprint

What is connected

9 works

3 topics

4 close collaborators

Preprint · 2022 · arXiv

An Iterative Co-Training Transductive Framework for Zero Shot Learning

In the zero-shot learning (ZSL) community, it is generally recognized that transductive learning performs better than inductive learning because the unseen-class samples are also used in its training stage. How to generate pseudo labels for unseen-class samples, and how to use such usually noisy pseudo labels, are two critical issues in transductive learning. In this work, we introduce an iterative co-training framework which contains two different base ZSL models and an exchanging module. At each iteration, the two ZSL models are co-trained to separately predict pseudo labels for the unseen-class samples, the exchanging module exchanges the predicted pseudo labels, and the exchanged pseudo-labeled samples are then added to the training sets for the next iteration. In this way, our framework can gradually boost ZSL performance by fully exploiting the potential complementarity of the two models' classification capabilities. In addition, our co-training framework is also applied to generalized ZSL (GZSL), in which a semantic-guided OOD detector is proposed to pick out the most likely unseen-class samples before class-level classification to alleviate the bias problem in GZSL. Extensive experiments on three benchmarks show that our proposed methods could significantly outperform about 31 state-of-the-art methods.
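The exchange step described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the two base ZSL models are replaced by hypothetical nearest-prototype classifiers over two feature views, the initial prototypes stand in for whatever the real models derive from class semantics, and the names `nearest`, `refit` and `co_train` are assumptions made for illustration only.

```python
from statistics import fmean


def nearest(protos, x):
    """Return the label of the prototype closest to x (squared Euclidean distance)."""
    return min(protos, key=lambda c: sum((a - b) ** 2 for a, b in zip(protos[c], x)))


def refit(protos, samples, labels):
    """Re-estimate each prototype as the mean of the samples pseudo-labeled with it."""
    updated = {}
    for c, old in protos.items():
        members = [x for x, y in zip(samples, labels) if y == c]
        # Keep the old prototype when no sample received this pseudo label.
        updated[c] = tuple(fmean(col) for col in zip(*members)) if members else old
    return updated


def co_train(protos_a, protos_b, view_a, view_b, iterations=3):
    """Toy co-training loop: each model is refit on the OTHER model's pseudo labels."""
    for _ in range(iterations):
        labels_a = [nearest(protos_a, x) for x in view_a]  # model A predicts
        labels_b = [nearest(protos_b, x) for x in view_b]  # model B predicts
        # Exchange step (assumed form): A trains on B's labels and vice versa.
        protos_a = refit(protos_a, view_a, labels_b)
        protos_b = refit(protos_b, view_b, labels_a)
    final_a = [nearest(protos_a, x) for x in view_a]
    final_b = [nearest(protos_b, x) for x in view_b]
    return final_a, final_b
```

Here `view_a[i]` and `view_b[i]` are two feature representations of the same unlabeled sample, which is what lets one model's pseudo labels supervise the other.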

Preprint · 2022 · arXiv

HardBoost: Boosting Zero-Shot Learning with Hard Classes

This work is a systematic analysis of the so-called hard class problem in zero-shot learning (ZSL), that is, some unseen classes affect ZSL performance disproportionately more than others, as well as of how to remedy the problem by detecting and exploiting hard classes. First, we report our empirical finding that the hard class problem is a ubiquitous phenomenon and persists regardless of the specific ZSL method used. Then, we find that high semantic affinity among unseen classes is a plausible underlying cause of hardness and design two metrics to detect hard classes. Finally, two frameworks are proposed to remedy the problem by detecting and exploiting hard classes, one under the inductive setting, the other under the transductive setting. The proposed frameworks could accommodate most existing ZSL methods to further significantly boost their performance with little effort. Extensive experiments on three popular benchmarks demonstrate the benefits of identifying and exploiting the hard classes in ZSL.

Preprint · 2022 · arXiv

Language-Level Semantics Conditioned 3D Point Cloud Segmentation

In this work, a language-level Semantics Conditioned framework for 3D Point cloud segmentation, called SeCondPoint, is proposed, where language-level semantics are introduced to condition the modeling of the point feature distribution as well as the pseudo-feature generation, and a feature-geometry-based mixup approach is further proposed to facilitate the distribution learning. To our knowledge, this is the first attempt in the literature to introduce language-level semantics to the 3D point cloud segmentation task. Since a large number of point features could be generated from the learned distribution thanks to the semantics conditioned modeling, any existing segmentation network could be embedded into the proposed framework to boost its performance. In addition, the proposed framework has the inherent advantage of dealing with novel classes, which seems an impossible feat for current segmentation networks. Extensive experimental results on two public datasets demonstrate that three typical segmentation networks could achieve significant improvements over their original performance after enhancement by the proposed framework in the conventional 3D segmentation task. Two benchmarks are also introduced for a new zero-shot 3D segmentation task, and the results also validate the proposed framework.

Preprint · 2022 · arXiv

Semantic-diversity transfer network for generalized zero-shot learning via inner disagreement based OOD detector

Zero-shot learning (ZSL) aims to recognize objects from unseen classes, where the core problem is to transfer knowledge from seen classes to unseen classes by establishing appropriate mappings between visual and semantic features. The knowledge transfer in many existing works is limited mainly because 1) the widely used visual features are global ones and not totally consistent with semantic attributes; 2) only one mapping is learned in existing works, which is not able to effectively model diverse visual-semantic relations; 3) the bias problem in generalized ZSL (GZSL) could not be effectively handled. In this paper, we propose two techniques to alleviate these limitations. Firstly, we propose a Semantic-diversity transfer Network (SetNet) addressing the first two limitations, where 1) a multiple-attention architecture and a diversity regularizer are proposed to learn multiple local visual features that are more consistent with semantic attributes and 2) a projector ensemble that geometrically takes diverse local features as inputs is proposed to model visual-semantic relations from diverse local perspectives. Secondly, we propose an inner disagreement based domain detection module (ID3M) for GZSL to alleviate the third limitation, which picks out unseen-class data before class-level classification. Due to the absence of unseen-class data in the training stage, ID3M employs a novel self-contained training scheme and detects unseen-class data based on a designed inner disagreement criterion. Experimental results on three public datasets demonstrate that the proposed SetNet with the explored ID3M achieves a significant improvement over 30 state-of-the-art methods.

Preprint · 2021 · arXiv

Bidirectional Trajectory Computation for Odometer-Aided Visual-Inertial SLAM

Odometer-aided visual-inertial SLAM systems typically perform well for navigation of wheeled platforms, but they usually suffer from degenerate cases before the first turning. In this paper, we first perform an observability analysis w.r.t. the extrinsic parameters before the first turning, which complements the existing results of observability analyses. Secondly, inspired by this observability analysis, we propose a bidirectional trajectory computation method, by which the poses before the first turning are refined in the backward computation thread, and the real-time trajectory is adjusted accordingly. Experimental results show that our proposed method not only solves the problem of the unobservability of accelerometer bias and extrinsic parameters before the first turning, but also results in more accurate trajectories in comparison with the state-of-the-art approaches.

Preprint · 2021 · arXiv

Superpoint-guided Semi-supervised Semantic Segmentation of 3D Point Clouds

3D point cloud semantic segmentation is a challenging topic in the computer vision field. Most of the existing methods in the literature require a large amount of fully labeled training data, but it is extremely time-consuming to obtain such training data by manually labeling massive point clouds. To address this problem, we propose a superpoint-guided semi-supervised segmentation network for 3D point clouds, which jointly utilizes a small portion of labeled scene point clouds and a large number of unlabeled point clouds for network training. The proposed network is iteratively updated with its predicted pseudo labels, where a superpoint generation module is introduced for extracting superpoints from 3D point clouds, and a pseudo-label optimization module is explored for automatically assigning pseudo labels to the unlabeled points under the constraint of the extracted superpoints. Additionally, there are some 3D points without pseudo-label supervision. We propose an edge prediction module to constrain features of edge points. A superpoint feature aggregation module and a superpoint feature consistency loss function are introduced to smooth superpoint features. Extensive experimental results on two public 3D datasets demonstrate that our method can achieve better performance than several state-of-the-art point cloud segmentation networks and several popular semi-supervised segmentation methods with few labeled scenes.

Preprint · 2020 · arXiv

Zero-Shot Learning from Adversarial Feature Residual to Compact Visual Feature

Recently, many zero-shot learning (ZSL) methods have focused on learning discriminative object features in an embedding feature space. However, the distributions of the unseen-class features learned by these methods are prone to partial overlap, resulting in inaccurate object recognition. To address this problem, we propose a novel adversarial network to synthesize compact semantic visual features for ZSL, consisting of a residual generator, a prototype predictor, and a discriminator. The residual generator generates the visual feature residual, which is integrated with a visual prototype predicted via the prototype predictor to synthesize the visual feature. The discriminator distinguishes the synthetic visual features from the real ones extracted from an existing categorization CNN. Since the generated residuals are generally numerically much smaller than the distances among all the prototypes, the distributions of the unseen-class features synthesized by the proposed network overlap less. In addition, considering that the visual features from categorization CNNs are generally inconsistent with their semantic features, a simple feature selection strategy is introduced for extracting more compact semantic visual features. Extensive experimental results on six benchmark datasets demonstrate that our method could achieve significantly better performance than existing state-of-the-art methods, by 1.2-13.2% in most cases.
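The "prototype plus small residual" idea in this abstract can be illustrated with a toy sketch. It is not the paper's adversarial network: the learned residual generator is replaced by bounded uniform noise, the prototypes are given rather than predicted, and `nearest_prototype` and `synthesize_features` are hypothetical names. The point it shows is only the geometric one: when residuals are small relative to the distance between prototypes, synthesized features stay closest to their own class prototype.

```python
import random


def nearest_prototype(prototypes, x):
    """Return the class whose prototype is closest to x (squared Euclidean distance)."""
    return min(
        prototypes,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(prototypes[c], x)),
    )


def synthesize_features(prototypes, residual_scale, per_class, rng):
    """Build (feature, class) pairs as prototype + residual.

    The residual is uniform noise in [-residual_scale, residual_scale] per
    dimension, a stand-in (assumption) for a learned residual generator.
    """
    features = []
    for label, proto in prototypes.items():
        for _ in range(per_class):
            residual = [rng.uniform(-residual_scale, residual_scale) for _ in proto]
            feature = tuple(p + r for p, r in zip(proto, residual))
            features.append((feature, label))
    return features
```

With two prototypes 4 units apart and `residual_scale=0.5`, every synthesized feature lies within about 0.71 of its own prototype, well under half the inter-prototype distance, so the class clusters cannot overlap.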

Preprint · 2016 · arXiv

Dynamic Parallel and Distributed Graph Cuts

Graph cuts are widely used in computer vision. In order to speed up the optimization process and improve the scalability for large graphs, Strandmark and Kahl introduced a splitting method to split a graph into multiple subgraphs for parallel computation in both shared and distributed memory models. However, this parallel algorithm (parallel BK algorithm) does not have a polynomial bound on the number of iterations and is found to be non-convergent in some cases due to the possible multiple optimal solutions of its sub-problems. To remedy this non-convergence problem, in this work we first introduce a merging method capable of merging any number of those adjacent subgraphs which could hardly reach an agreement on their overlapped region in the parallel BK algorithm. Based on the pseudo-boolean representations of graph cuts, our merging method is shown to be able to effectively reuse all the computed flows in these subgraphs. Through both the splitting and merging, we further propose a dynamic parallel and distributed graph-cuts algorithm with guaranteed convergence to the globally optimal solutions within a predefined number of iterations. In essence, this work provides a general framework to allow more sophisticated splitting and merging strategies to be employed to further boost performance. Our dynamic parallel algorithm is validated with extensive experimental results.

Preprint · 2016 · arXiv

Modern Physiognomy: An Investigation on Predicting Personality Traits and Intelligence from the Human Face

The human behavior of evaluating other individuals' personality traits and intelligence from their faces plays a crucial role in human relations. These trait judgments might influence important social outcomes in our lives such as elections and court sentences. Previous studies have reported that humans can make valid inferences for at least four personality traits. In addition, some studies have demonstrated that facial trait evaluation can be learned accurately using machine learning methods. In this work, we experimentally explore whether self-reported personality traits and intelligence can be predicted reliably from a facial image. More specifically, the prediction problem is cast separately in two parts: a classification task and a regression task. A facial structural feature is constructed from the relations among facial salient points, and an appearance feature is built from five texture descriptors. In addition, a minutia-based fingerprint feature from a fingerprint image is also explored. The classification results show that the personality traits "Rule-consciousness" and "Vigilance" can be predicted reliably, and that the traits of females can be predicted more accurately than those of males. However, the regression experiments show that it is difficult to predict scores for individual personality traits and intelligence. The residual plots and the correlation results indicate no evident linear correlation between the measured scores and the predicted scores. Both the classification and the regression results reveal that "Rule-consciousness" and "Tension" can be reliably predicted from the facial features, while "Social boldness" gets the worst prediction results. The experimental results show that it is difficult to predict intelligence from either the facial features or the fingerprint feature, a finding that is in agreement with previous studies.
