Source author record

Baocai Yin

Baocai Yin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Machine Learning Graphics Robotics

Catalog footprint

What is connected

18works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A New Perspective on the Effects of Spectrum in Graph Neural Networks

Many improvements on GNNs can be deemed as operations on the spectrum of the underlying graph matrix, which motivates us to directly study the characteristics of the spectrum and their effects on GNN performance. By generalizing most existing GNN architectures, we show that the correlation issue caused by the $unsmooth$ spectrum becomes the obstacle to leveraging more powerful graph filters as well as developing deep architectures, which therefore restricts GNNs' performance. Inspired by this, we propose the correlation-free architecture which naturally removes the correlation issue among different channels, making it possible to utilize more sophisticated filters within each channel. The final correlation-free architecture with more powerful filters consistently boosts the performance of learning graph representations. Code is available at https://github.com/qslim/gnn-spectrum.

preprint2022arXiv

Bi-directional Object-context Prioritization Learning for Saliency Ranking

The saliency ranking task is recently proposed to study the visual behavior that humans would typically shift their attention over different objects of a scene based on their degrees of saliency. Existing approaches focus on learning either object-object or object-scene relations. Such a strategy follows the idea of object-based attention in Psychology, but it tends to favor those objects with strong semantics (e.g., humans), resulting in unrealistic saliency ranking. We observe that spatial attention works concurrently with object-based attention in the human visual recognition system. During the recognition process, the human spatial attention mechanism would move, engage, and disengage from region to region (i.e., context to context). This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. Our model includes two novel modules: (1) a selective object saliency (SOS) module that models objectbased attention via inferring the semantic representation of the salient object, and (2) an object-context-object relation (OCOR) module that allocates saliency ranks to objects by jointly modeling the object-context and context-object interactions of the salient objects. Extensive experiments show that our approach outperforms existing state-of-theart methods. Our code and pretrained model are available at https://github.com/GrassBro/OCOR.

preprint2022arXiv

DR-GAN: Distribution Regularization for Text-to-Image Generation

This paper presents a new Text-to-Image generation model, named Distribution Regularization Generative Adversarial Network (DR-GAN), to generate images from text descriptions from improved distribution learning. In DR-GAN, we introduce two novel modules: a Semantic Disentangling Module (SDM) and a Distribution Normalization Module (DNM). SDM combines the spatial self-attention mechanism and a new Semantic Disentangling Loss (SDL) to help the generator distill key semantic information for the image generation. DNM uses a Variational Auto-Encoder (VAE) to normalize and denoise the image latent distribution, which can help the discriminator better distinguish synthesized images from real images. DNM also adopts a Distribution Adversarial Loss (DAL) to guide the generator to align with normalized real image distributions in the latent space. Extensive experiments on two public datasets demonstrated that our DR-GAN achieved a competitive performance in the Text-to-Image task.

preprint2022arXiv

MHSA-Net: Multi-Head Self-Attention Network for Occluded Person Re-Identification

This paper presents a novel person re-identification model, named Multi-Head Self-Attention Network (MHSA-Net), to prune unimportant information and capture key local information from person images. MHSA-Net contains two main novel components: Multi-Head Self-Attention Branch (MHSAB) and Attention Competition Mechanism (ACM). The MHSAB adaptively captures key local person information, and then produces effective diversity embeddings of an image for the person matching. The ACM further helps filter out attention noise and non-key information. Through extensive ablation studies, we verified that the Multi-Head Self-Attention Branch (MHSAB) and Attention Competition Mechanism (ACM) both contribute to the performance improvement of the MHSA-Net. Our MHSA-Net achieves competitive performance in the standard and occluded person Re-ID tasks.

preprint2022arXiv

Monocular Camera-based Complex Obstacle Avoidance via Efficient Deep Reinforcement Learning

Deep reinforcement learning has achieved great success in laser-based collision avoidance works because the laser can sense accurate depth information without too much redundant data, which can maintain the robustness of the algorithm when it is migrated from the simulation environment to the real world. However, high-cost laser devices are not only difficult to deploy for a large scale of robots but also demonstrate unsatisfactory robustness towards the complex obstacles, including irregular obstacles, e.g., tables, chairs, and shelves, as well as complex ground and special materials. In this paper, we propose a novel monocular camera-based complex obstacle avoidance framework. Particularly, we innovatively transform the captured RGB images to pseudo-laser measurements for efficient deep reinforcement learning. Compared to the traditional laser measurement captured at a certain height that only contains one-dimensional distance information away from the neighboring obstacles, our proposed pseudo-laser measurement fuses the depth and semantic information of the captured RGB image, which makes our method effective for complex obstacles. We also design a feature extraction guidance module to weight the input pseudo-laser measurement, and the agent has more reasonable attention for the current state, which is conducive to improving the accuracy and efficiency of the obstacle avoidance policy.

preprint2022arXiv

NodeTrans: A Graph Transfer Learning Approach for Traffic Prediction

Recently, deep learning methods have made great progress in traffic prediction, but their performance depends on a large amount of historical data. In reality, we may face the data scarcity issue. In this case, deep learning models fail to obtain satisfactory performance. Transfer learning is a promising approach to solve the data scarcity issue. However, existing transfer learning approaches in traffic prediction are mainly based on regular grid data, which is not suitable for the inherent graph data in the traffic network. Moreover, existing graph-based models can only capture shared traffic patterns in the road network, and how to learn node-specific patterns is also a challenge. In this paper, we propose a novel transfer learning approach to solve the traffic prediction with few data, which can transfer the knowledge learned from a data-rich source domain to a data-scarce target domain. First, a spatial-temporal graph neural network is proposed, which can capture the node-specific spatial-temporal traffic patterns of different road networks. Then, to improve the robustness of transfer, we design a pattern-based transfer strategy, where we leverage a clustering-based mechanism to distill common spatial-temporal patterns in the source domain, and use these knowledge to further improve the prediction performance of the target domain. Experiments on real-world datasets verify the effectiveness of our approach.

preprint2021arXiv

Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation

In this paper, we propose a fully automatic system for generating comic books from videos without any human intervention. Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles, and stylizes keyframes into comic-style images. Then, we propose a novel automatic multi-page layout framework, which can allocate the images across multiple pages and synthesize visually interesting layouts based on the rich semantics of the images (e.g., importance and inter-image relation). Finally, as opposed to using the same type of balloon as in previous works, we propose an emotion-aware balloon generation method to create different types of word balloons by analyzing the emotion of subtitles and audios. Our method is able to vary balloon shapes and word sizes in balloons in response to different emotions, leading to more enriched reading experience. Once the balloons are generated, they are placed adjacent to their corresponding speakers via speaker detection. Our results show that our method, without requiring any user inputs, can generate high-quality comic pages with visually rich layouts and balloons. Our user studies also demonstrate that users prefer our generated results over those by state-of-the-art comic generation systems.

preprint2021arXiv

CaEGCN: Cross-Attention Fusion based Enhanced Graph Convolutional Network for Clustering

With the powerful learning ability of deep convolutional networks, deep clustering methods can extract the most discriminative information from individual data and produce more satisfactory clustering results. However, existing deep clustering methods usually ignore the relationship between the data. Fortunately, the graph convolutional network can handle such relationship, opening up a new research direction for deep clustering. In this paper, we propose a cross-attention based deep clustering framework, named Cross-Attention Fusion based Enhanced Graph Convolutional Network (CaEGCN), which contains four main modules: the cross-attention fusion module which innovatively concatenates the Content Auto-encoder module (CAE) relating to the individual data and Graph Convolutional Auto-encoder module (GAE) relating to the relationship between the data in a layer-by-layer manner, and the self-supervised model that highlights the discriminative information for clustering tasks. While the cross-attention fusion module fuses two kinds of heterogeneous representation, the CAE module supplements the content information for the GAE module, which avoids the over-smoothing problem of GCN. In the GAE module, two novel loss functions are proposed that reconstruct the content and relationship between the data, respectively. Finally, the self-supervised module constrains the distributions of the middle layer representations of CAE and GAE to be consistent. Experimental results on different types of datasets prove the superiority and robustness of the proposed CaEGCN.

preprint2021arXiv

Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification

In this paper, we propose a novel person Re-ID model, Consecutive Batch DropBlock Network (CBDB-Net), to capture the attentive and robust person descriptor for the person Re-ID task. The CBDB-Net contains two novel designs: the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL). In the Consecutive Batch DropBlock Module (CBDBM), we firstly conduct uniform partition on the feature maps. And then, we independently and continuously drop each patch from top to bottom on the feature maps, which can output multiple incomplete feature maps. In the training stage, these multiple incomplete features can better encourage the Re-ID model to capture the robust person descriptor for the Re-ID task. In the Elastic Loss (EL), we design a novel weight control item to help the Re-ID model adaptively balance hard sample pairs and easy sample pairs in the whole training process. Through an extensive set of ablation studies, we verify that the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL) each contribute to the performance boosts of CBDB-Net. We demonstrate that our CBDB-Net can achieve the competitive performance on the three standard person Re-ID datasets (the Market-1501, the DukeMTMC-Re-ID, and the CUHK03 dataset), three occluded Person Re-ID datasets (the Occluded DukeMTMC, the Partial-REID, and the Partial iLIDS dataset), and a general image retrieval dataset (In-Shop Clothes Retrieval dataset).

preprint2016arXiv

Block-Diagonal Sparse Representation by Learning a Linear Combination Dictionary for Recognition

In a sparse representation based recognition scheme, it is critical to learn a desired dictionary, aiming both good representational power and discriminative performance. In this paper, we propose a new dictionary learning model for recognition applications, in which three strategies are adopted to achieve these two objectives simultaneously. First, a block-diagonal constraint is introduced into the model to eliminate the correlation between classes and enhance the discriminative performance. Second, a low-rank term is adopted to model the coherence within classes for refining the sparse representation of each class. Finally, instead of using the conventional over-complete dictionary, a specific dictionary constructed from the linear combination of the training samples is proposed to enhance the representational power of the dictionary and to improve the robustness of the sparse representation model. The proposed method is tested on several public datasets. The experimental results show the method outperforms most state-of-the-art methods.

preprint2016arXiv

Kernelized LRR on Grassmann Manifolds for Subspace Clustering

Low rank representation (LRR) has recently attracted great interest due to its pleasing efficacy in exploring low-dimensional sub- space structures embedded in data. One of its successful applications is subspace clustering, by which data are clustered according to the subspaces they belong to. In this paper, at a higher level, we intend to cluster subspaces into classes of subspaces. This is naturally described as a clustering problem on Grassmann manifold. The novelty of this paper is to generalize LRR on Euclidean space onto an LRR model on Grassmann manifold in a uniform kernelized LRR framework. The new method has many applications in data analysis in computer vision tasks. The proposed models have been evaluated on a number of practical data analysis applications. The experimental results show that the proposed models outperform a number of state-of-the-art subspace clustering methods.

preprint2016arXiv

Matrix Variate RBM Model with Gaussian Distributions

Restricted Boltzmann Machine (RBM) is a particular type of random neural network models modeling vector data based on the assumption of Bernoulli distribution. For multi-dimensional and non-binary data, it is necessary to vectorize and discretize the information in order to apply the conventional RBM. It is well-known that vectorization would destroy internal structure of data, and the binary units will limit the applying performance due to fickle real data. To address the issue, this paper proposes a Matrix variate Gaussian Restricted Boltzmann Machine (MVGRBM) model for matrix data whose entries follow Gaussian distributions. Compared with some other RBM algorithm, MVGRBM can model real value data better and it has good performance in image classification.

preprint2016arXiv

Partial Least Squares Regression on Riemannian Manifolds and Its Application in Classifications

Partial least squares regression (PLSR) has been a popular technique to explore the linear relationship between two datasets. However, most of algorithm implementations of PLSR may only achieve a suboptimal solution through an optimization on the Euclidean space. In this paper, we propose several novel PLSR models on Riemannian manifolds and develop optimization algorithms based on Riemannian geometry of manifolds. This algorithm can calculate all the factors of PLSR globally to avoid suboptimal solutions. In a number of experiments, we have demonstrated the benefits of applying the proposed model and algorithm to a variety of learning tasks in pattern recognition and object classification.

preprint2016arXiv

Tensor Sparse and Low-Rank based Submodule Clustering Method for Multi-way Data

A new submodule clustering method via sparse and low-rank representation for multi-way data is proposed in this paper. Instead of reshaping multi-way data into vectors, this method maintains their natural orders to preserve data intrinsic structures, e.g., image data kept as matrices. To implement clustering, the multi-way data, viewed as tensors, are represented by the proposed tensor sparse and low-rank model to obtain its submodule representation, called a free module, which is finally used for spectral clustering. The proposed method extends the conventional subspace clustering method based on sparse and low-rank representation to multi-way data submodule clustering by combining t-product operator. The new method is tested on several public datasets, including synthetical data, video sequences and toy images. The experiments show that the new method outperforms the state-of-the-art methods, such as Sparse Subspace Clustering (SSC), Low-Rank Representation (LRR), Ordered Subspace Clustering (OSC), Robust Latent Low Rank Representation (RobustLatLRR) and Sparse Submodule Clustering method (SSmC).

preprint2015arXiv

Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization

Tensors or multiarray data are generalizations of matrices. Tensor clustering has become a very important research topic due to the intrinsically rich structures in real-world multiarray datasets. Subspace clustering based on vectorizing multiarray data has been extensively researched. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model. In contrast to existing techniques, we propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the so-called multinomial manifold, for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.

preprint2015arXiv

Kernelized Low Rank Representation on Grassmann Manifolds

Low rank representation (LRR) has recently attracted great interest due to its pleasing efficacy in exploring low-dimensional subspace structures embedded in data. One of its successful applications is subspace clustering which means data are clustered according to the subspaces they belong to. In this paper, at a higher level, we intend to cluster subspaces into classes of subspaces. This is naturally described as a clustering problem on Grassmann manifold. The novelty of this paper is to generalize LRR on Euclidean space onto an LRR model on Grassmann manifold in a uniform kernelized framework. The new methods have many applications in computer vision tasks. Several clustering experiments are conducted on handwritten digit images, dynamic textures, human face clips and traffic scene sequences. The experimental results show that the proposed methods outperform a number of state-of-the-art subspace clustering methods.

preprint2015arXiv

Low Rank Representation on Grassmann Manifolds: An Extrinsic Perspective

Many computer vision algorithms employ subspace models to represent data. The Low-rank representation (LRR) has been successfully applied in subspace clustering for which data are clustered according to their subspace structures. The possibility of extending LRR on Grassmann manifold is explored in this paper. Rather than directly embedding Grassmann manifold into a symmetric matrix space, an extrinsic view is taken by building the self-representation of LRR over the tangent space of each Grassmannian point. A new algorithm for solving the proposed Grassmannian LRR model is designed and implemented. Several clustering experiments are conducted on handwritten digits dataset, dynamic texture video clips and YouTube celebrity face video data. The experimental results show our method outperforms a number of existing methods.

preprint2014arXiv

Connectivity-preserving Geometry Images

We propose connectivity-preserving geometry images (CGIMs), which map a three-dimensional mesh onto a rectangular regular array of an image, such that the reconstructed mesh produces no sampling errors, but merely round-off errors. We obtain a V-matrix with respect to the original mesh, whose elements are vertices of the mesh, which intrinsically preserves the vertex-set and the connectivity of the original mesh in the sense of allowing round-off errors. We generate a CGIM array by using the Cartesian coordinates of corresponding vertices of the V-matrix. To reconstruct a mesh, we obtain a vertex-set and an edge-set by collecting all the elements with different pixels, and all different pairwise adjacent elements from the CGIM array respectively. Compared with traditional geometry images, CGIMs achieve minimum reconstruction errors with an efficient parametrization-free algorithm via elementary permutation techniques. We apply CGIMs to lossy compression of meshes, and the experimental results show that CGIMs perform well in reconstruction precision and detail preservation.

Baocai Yin

What is connected

Connect this record

See the researcher in context

Building this map preview

18 published item(s)

A New Perspective on the Effects of Spectrum in Graph Neural Networks

Bi-directional Object-context Prioritization Learning for Saliency Ranking

DR-GAN: Distribution Regularization for Text-to-Image Generation

MHSA-Net: Multi-Head Self-Attention Network for Occluded Person Re-Identification

Monocular Camera-based Complex Obstacle Avoidance via Efficient Deep Reinforcement Learning

NodeTrans: A Graph Transfer Learning Approach for Traffic Prediction

Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation

CaEGCN: Cross-Attention Fusion based Enhanced Graph Convolutional Network for Clustering

Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification

Block-Diagonal Sparse Representation by Learning a Linear Combination Dictionary for Recognition

Kernelized LRR on Grassmann Manifolds for Subspace Clustering

Matrix Variate RBM Model with Gaussian Distributions

Partial Least Squares Regression on Riemannian Manifolds and Its Application in Classifications

Tensor Sparse and Low-Rank based Submodule Clustering Method for Multi-way Data

Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization

Kernelized Low Rank Representation on Grassmann Manifolds

Low Rank Representation on Grassmann Manifolds: An Extrinsic Perspective

Connectivity-preserving Geometry Images