Researcher profile

Zhao Kang

Zhao Kang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
4 topics
4 close collaborators

Published work

10 published items

preprint, 2026, arXiv

InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions

In multimodal representation learning, synergistic interactions between modalities not only provide complementary information but also create unique outcomes through specific interaction patterns that no single modality could achieve alone. Existing methods may struggle to effectively capture the full spectrum of synergistic information, leading to suboptimal performance in tasks where such interactions are critical. This is particularly problematic because synergistic information constitutes the fundamental value proposition of multimodal representation. To address this challenge, we introduce InfMasking, a contrastive synergistic information extraction method designed to enhance synergistic information through an Infinite Masking strategy. InfMasking stochastically occludes most features from each modality during fusion, preserving only partial information to create representations with varied synergistic patterns. Unmasked fused representations are then aligned with masked ones through mutual information maximization to encode comprehensive synergistic information. This infinite masking strategy enables capturing richer interactions by exposing the model to diverse partial modality combinations during training. As computing mutual information estimates with infinite masking is computationally prohibitive, we derive an InfMasking loss to approximate this calculation. Through controlled experiments, we demonstrate that InfMasking effectively enhances synergistic information between modalities. In evaluations on large-scale real-world datasets, InfMasking achieves state-of-the-art performance across seven benchmarks. Code is released at https://github.com/brightest66/InfMasking.

preprint, 2026, arXiv

Structure-Centric Graph Foundation Model via Geometric Bases

Graph foundation models (GFMs) seek transferable representations across graph domains but are limited by structural heterogeneity and incompatible node feature spaces. We propose Structure-Centric Graph Foundation Models (SCGFM), which treat graph topology as the primary source of transferable knowledge. Modeling graphs as metric measure spaces, SCGFM introduces learnable geometric bases that define a shared structural coordinate system. Graphs are aligned to these bases via Gromov-Wasserstein distances, yielding structure-aligned latent representations that accommodate heterogeneous graph topologies. To address feature incompatibility, SCGFM employs a structure-aware feature re-encoding mechanism that unifies node representations without assuming a fixed feature dimensionality or requiring dataset-specific preprocessing. Experiments on graph- and node-level tasks demonstrate strong in-domain and cross-domain generalization, outperforming existing GFM approaches.

preprint, 2022, arXiv

Eliminating Gradient Conflict in Reference-based Line-Art Colorization

Reference-based line-art colorization is a challenging task in computer vision. The color, texture, and shading are rendered based on an abstract sketch, which heavily relies on precise long-range dependency modeling between the sketch and the reference. Popular techniques for bridging cross-modal information and modeling long-range dependencies employ the attention mechanism. However, in the context of reference-based line-art colorization, several techniques intensify the existing training difficulty of attention, for instance, the self-supervised training protocol and GAN-based losses. To understand the instability in training, we inspect the gradient flow of attention and observe gradient conflict among attention branches. This phenomenon motivates us to alleviate the gradient issue by preserving the dominant gradient branch while removing the conflicting ones. We propose a novel attention mechanism using this training strategy, Stop-Gradient Attention (SGA), which outperforms the attention baseline by a large margin with better training stability. Compared with state-of-the-art modules in line-art colorization, our approach demonstrates significant improvements in Fréchet Inception Distance (FID, up to 27.21%) and structural similarity index measure (SSIM, up to 25.67%) on several benchmarks. The code of SGA is available at https://github.com/kunkun0w0/SGA.

preprint, 2022, arXiv

Log-based Sparse Nonnegative Matrix Factorization for Data Representation

Nonnegative matrix factorization (NMF) has been widely studied in recent years due to its effectiveness in representing nonnegative data with parts-based representations. For NMF, a sparser solution implies a better parts-based representation. However, current NMF methods do not always generate sparse solutions. In this paper, we propose a new NMF method with a log-norm imposed on the factor matrices to enhance the sparseness. Moreover, we propose a novel column-wise sparse norm, named the $\ell_{2,\log}$-(pseudo) norm, to enhance the robustness of the proposed method. The $\ell_{2,\log}$-(pseudo) norm is invariant, continuous, and differentiable. For the $\ell_{2,\log}$ regularized shrinkage problem, we derive a closed-form solution, which can be used for other general problems. Efficient multiplicative updating rules are developed for the optimization, which theoretically guarantee the convergence of the objective value sequence. Extensive experimental results confirm the effectiveness of the proposed method, as well as the enhanced sparseness and robustness.

preprint, 2022, arXiv

Scalable Multi-view Clustering with Graph Filtering

With the explosive growth of multi-source data, multi-view clustering has attracted great attention in recent years. Most existing multi-view methods operate in the raw feature space and depend heavily on the quality of the original feature representation. Moreover, they are often designed for feature data and ignore the rich topological structure information. Accordingly, in this paper, we propose a generic framework to cluster both attribute and graph data with heterogeneous features. It is capable of exploring the interplay between feature and structure. Specifically, we first adopt a graph filtering technique to eliminate high-frequency noise and obtain a clustering-friendly smooth representation. To handle the scalability challenge, we develop a novel sampling strategy to improve the quality of anchors. Extensive experiments on attribute and graph benchmarks demonstrate the superiority of our approach with respect to state-of-the-art approaches.
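
The graph filtering step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state which filter is used, so the low-pass filter below, (I - L/2)^k applied to the features with L the symmetric normalized Laplacian, is an assumption chosen for illustration. The function names, the toy path graph, and the order k are all hypothetical.

```python
import math

def normalized_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2}; assumes every node has at least one edge."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def low_pass_filter(adj, features, k=2):
    """Apply (I - L/2)^k to the features, with L = I - D^{-1/2} A D^{-1/2}.

    One step equals 0.5 * x + 0.5 * S x, where S is the normalized adjacency.
    """
    s = normalized_adjacency(adj)
    n, dims = len(adj), len(features[0])
    x = [row[:] for row in features]
    for _ in range(k):
        x = [[0.5 * x[i][d] + 0.5 * sum(s[i][j] * x[j][d] for j in range(n))
              for d in range(dims)] for i in range(n)]
    return x

def laplacian_energy(adj, features):
    """Sum over feature columns of x^T L x; smaller means smoother on the graph."""
    s = normalized_adjacency(adj)
    n = len(adj)
    total = 0.0
    for d in range(len(features[0])):
        col = [features[i][d] for i in range(n)]
        lx = [col[i] - sum(s[i][j] * col[j] for j in range(n)) for i in range(n)]
        total += sum(col[i] * lx[i] for i in range(n))
    return total

# Toy path graph 0-1-2-3 with a rapidly oscillating (high-frequency) signal.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
signal = [[1.0], [-1.0], [1.0], [-1.0]]
smoothed = low_pass_filter(path, signal, k=2)
```

Each spectral component with Laplacian eigenvalue λ is scaled by (1 - λ/2)^k, so components that vary quickly across edges are damped while slowly varying ones are largely preserved.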

preprint, 2021, arXiv

Structured Graph Learning for Scalable Subspace Clustering: From Single-view to Multi-view

Graph-based subspace clustering methods have exhibited promising performance. However, they still suffer from several drawbacks: they incur expensive time overhead, fail to explore explicit clusters, and cannot generalize to unseen data points. In this work, we propose a scalable graph learning framework that seeks to address these three challenges simultaneously. Specifically, it is based on the ideas of anchor points and a bipartite graph. Rather than building an $n\times n$ graph, where $n$ is the number of samples, we construct a bipartite graph to depict the relationship between samples and anchor points. Meanwhile, a connectivity constraint is employed to ensure that the connected components indicate clusters directly. We further establish the connection between our method and K-means clustering. Moreover, a model to process multi-view data is also proposed, which scales linearly with respect to $n$. Extensive experiments demonstrate the efficiency and effectiveness of our approach with respect to many state-of-the-art clustering methods.
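
The sample-to-anchor bipartite graph described in this abstract can be sketched as follows. The paper learns the graph under a connectivity constraint; the Gaussian-kernel weights, the hand-picked anchors, and the names `anchor_graph` and `bandwidth` below are illustrative assumptions, not the authors' formulation. The sketch only shows why the graph costs n x m entries instead of n x n and how an unseen point can be handled.

```python
import math

def anchor_graph(samples, anchors, bandwidth=1.0):
    """Return an n x m row-stochastic matrix linking each sample to the anchors.

    Weights use a Gaussian kernel on squared Euclidean distance (an assumption).
    """
    graph = []
    for x in samples:
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, anchor))
                     / (2 * bandwidth ** 2))
            for anchor in anchors
        ]
        total = sum(weights)
        graph.append([w / total for w in weights])
    return graph

# Five samples, two anchors: the graph has 5 x 2 entries rather than 5 x 5.
samples = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [0.2, 0.1]]
anchors = [[0.0, 0.0], [5.0, 5.0]]
Z = anchor_graph(samples, anchors)

# An unseen point only needs its own row against the same anchors.
new_row = anchor_graph([[4.8, 5.1]], anchors)[0]
```

Because every row depends only on the anchors, the cost grows linearly in the number of samples for a fixed number of anchors.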

preprint, 2020, arXiv

On Deep Unsupervised Active Learning

Unsupervised active learning has attracted increasing attention in recent years; its goal is to select representative samples in an unsupervised setting for human annotation. Most existing works are based on shallow linear models, assuming that each sample can be well approximated by the span (i.e., the set of all linear combinations) of certain selected samples, and then take these selected samples as the representative ones to label. However, in practice, the data do not necessarily conform to linear models, and how to model the nonlinearity of data often becomes the key to success. In this paper, we present a novel Deep neural network framework for Unsupervised Active Learning, called DUAL. DUAL can explicitly learn a nonlinear embedding to map each input into a latent space through an encoder-decoder architecture, and introduces a selection block to select representative samples in the learnt latent space. In the selection block, DUAL aims to simultaneously preserve the whole input patterns as well as the cluster structure of the data. Extensive experiments are performed on six publicly available datasets, and experimental results clearly demonstrate the efficacy of our method compared with state-of-the-art methods.

preprint, 2020, arXiv

Relation-Guided Representation Learning

Deep auto-encoders (DAEs) have achieved great success in learning data representations via the powerful representability of neural networks. But most DAEs focus only on the most dominant structures, which are able to reconstruct the data from a latent space, and neglect rich latent structural information. In this work, we propose a new representation learning method that explicitly models and leverages sample relations, which in turn are used as supervision to guide the representation learning. Unlike previous work, our framework preserves the relations between samples. Since the prediction of pairwise relations is itself a fundamental problem, our model adaptively learns them from data. This provides much flexibility to encode the real data manifold. The important role of relation and representation learning is evaluated on the clustering task. Extensive experiments on benchmark data sets demonstrate the superiority of our approach. By seeking to embed samples into a subspace, we further show that our method can address the large-scale and out-of-sample problems.

preprint, 2020, arXiv

Structured Graph Learning for Clustering and Semi-supervised Classification

Graphs have become increasingly popular in modeling structures and interactions in a wide variety of problems during the last decade. Graph-based clustering and semi-supervised classification techniques have shown impressive performance. This paper proposes a graph learning framework to preserve both the local and global structure of data. Specifically, our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure. Furthermore, most existing graph-based methods conduct clustering and semi-supervised classification on the graph learned from the original data matrix, which does not have an explicit cluster structure, so they might not achieve optimal performance. By considering a rank constraint, the achieved graph will have exactly $c$ connected components if there are $c$ clusters or classes. As a byproduct of this, graph learning and label inference are jointly and iteratively implemented in a principled way. Theoretically, we show that our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions. Extensive experiments on clustering and semi-supervised classification demonstrate that the proposed method outperforms other state-of-the-art methods.
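
The link between a rank constraint and the number of connected components rests on a standard fact: for a graph with nonnegative weights on $n$ nodes, the Laplacian $L = D - A$ has rank $n - c$ exactly when the graph has $c$ connected components. The sketch below only checks this fact on a toy graph; it is not the paper's optimization procedure, and the helper names and the example graph are hypothetical.

```python
def laplacian(adj):
    """Unnormalized Laplacian L = D - A; assumes a symmetric adj with zero diagonal."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matrix_rank(mat, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in mat]
    rows, cols = len(m), len(m[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, rows):
            factor = m[r][c] / m[rank][c]
            for k in range(c, cols):
                m[r][k] -= factor * m[rank][k]
        rank += 1
    return rank

def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    n, seen, count = len(adj), set(), 0
    for start in range(n):
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nxt in range(n):
                if adj[node][nxt] and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count

# Toy graph: a triangle {0, 1, 2} plus a separate edge {3, 4}, so c = 2.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
n_nodes = len(adj)
n_components = count_components(adj)
rank_L = matrix_rank(laplacian(adj))
```

Enforcing rank(L) = n - c on a learned graph therefore forces exactly c components, which is why cluster labels can be read directly from the graph.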

preprint, 2020, arXiv

Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering

In this paper, we propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF. It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step. In particular, projection matrices are sought under the guidance of building new data representations, such that the spatial information is retained and the projections are enhanced by the goal of clustering, which helps construct optimal projection directions. Moreover, to exploit nonlinear structures of the data, a manifold is constructed in the projected subspace; it is adaptively updated according to the projections, is less affected by noise and outliers in the data, and is thus more representative in the projected space. Hence, seeking projections, building new data representations, and learning the manifold are seamlessly integrated in a single model, in which they mutually enhance each other and lead to a powerful data representation. Comprehensive experimental results verify the effectiveness of TS-NMF in comparison with several state-of-the-art algorithms, which suggests high potential of the proposed method for real-world applications.