Researcher profile

Chengfei Liu

Chengfei Liu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 17 (Unverified) · Verification L1 · Unclaimed author
4 works
0 followers
6 topics
4 close collaborators

Published work

4 published items

preprint · 2023 · arXiv

Multi-stage feature decorrelation constraints for improving CNN classification performance

For convolutional neural networks (CNNs) used for pattern classification, the training loss function is usually applied only to the final output of the network, apart from some regularization constraints on the network parameters. However, as the number of network layers increases, the influence of the loss function on the front layers gradually decreases, and the network parameters tend to fall into local optima. At the same time, trained networks are found to have significant information redundancy in the features at all stages, which reduces the effectiveness of feature mapping at each stage and hinders the subsequent parameters from moving toward the optimum. Designing a loss function that constrains the front-stage features and eliminates their information redundancy may therefore yield a better solution and further improve classification accuracy. For CNNs, this article proposes a multi-stage feature decorrelation loss (MFD Loss), which refines effective features and eliminates information redundancy by constraining the correlation of features at all stages. Because a CNN has many layers, experimental comparison and analysis were used to decide where to apply it: MFD Loss acts on multiple front layers of the CNN, constrains the output features of each layer and each channel, and is trained jointly with the classification loss function. Compared with supervised learning using Softmax Loss alone, experiments on several commonly used datasets and several typical CNNs show that the classification performance of Softmax Loss + MFD Loss is significantly better. Comparison experiments before and after combining MFD Loss with other typical loss functions also verify its generality.
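
The abstract does not give the MFD Loss formula, so the following is only a minimal sketch of the general idea of a feature decorrelation penalty added to a classification loss. It assumes, purely for illustration, that the penalty is the mean squared Pearson correlation between distinct channels of a stage's output; the names `decorrelation_penalty`, `total_loss`, and the `weight` parameter are hypothetical and not taken from the paper.

```python
import math

def decorrelation_penalty(channels, eps=1e-12):
    """Mean squared Pearson correlation over all distinct channel pairs.

    `channels` is a list of equal-length lists, one per channel, holding that
    channel's activations flattened over batch and spatial positions.
    This is an assumed stand-in, not the paper's MFD Loss definition.
    """
    n = len(channels)
    if n < 2:
        return 0.0
    # Center each channel so dot products become covariances.
    centered = []
    for ch in channels:
        mean = sum(ch) / len(ch)
        centered.append([v - mean for v in ch])
    norms = [math.sqrt(sum(v * v for v in ch)) for ch in centered]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            corr = dot / (norms[i] * norms[j] + eps)
            total += corr * corr
            pairs += 1
    return total / pairs

def total_loss(classification_loss, stage_features, weight=0.1):
    """Classification loss plus a weighted penalty summed over several stages."""
    penalty = sum(decorrelation_penalty(f) for f in stage_features)
    return classification_loss + weight * penalty

# Two perfectly correlated channels are heavily penalised ...
redundant = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
# ... while two uncorrelated channels are not.
decorrelated = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
print(decorrelation_penalty(redundant), decorrelation_penalty(decorrelated))
```

In a real training setup the same quantity would be computed on tensors inside an automatic differentiation framework so that gradients reach the front layers directly.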

preprint · 2022 · arXiv

CHIEF: Clustering with Higher-order Motifs in Big Networks

Clustering a group of vertices in networks facilitates applications across different domains, such as social computing and the Internet of Things. However, challenges arise when clustering networks of increasing scale. This paper proposes a solution consisting of two motif clustering techniques: standard acceleration (CHIEF-ST) and approximate acceleration (CHIEF-AP). Both algorithms first find the maximal k-edge-connected subgraphs within the target networks to reduce the network scale, then employ higher-order motifs in clustering. In the first procedure, we propose to reduce the network scale by optimizing the network structure with maximal k-edge-connected subgraphs. For CHIEF-ST, we show that all target motifs are kept after this procedure when the minimum node degree of the target motif is greater than or equal to k. For CHIEF-AP, we prove that the eigenvalues of the adjacency matrix and the Laplacian matrix are relatively stable after this step. That is, CHIEF-ST has no influence on motif clustering, whereas CHIEF-AP introduces a limited yet acceptable impact. In the second procedure, we employ higher-order motifs, i.e., heterogeneous four-node motifs, for clustering in higher-order dense networks. The contributions of CHIEF are two-fold: (1) improved efficiency of motif clustering for big networks; (2) verification of the significance of higher-order motifs. Experiments on real and synthetic networks show that the proposed solutions outperform baseline approaches, which demonstrates CHIEF's strength in large network analysis. Higher-order motifs are also shown to perform better than traditional triangle motifs in clustering.
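
To make the notion of motif-based clustering input more concrete, here is a small sketch that builds motif co-occurrence weights for node pairs. It assumes the four-node clique as a stand-in motif (the abstract speaks of heterogeneous four-node motifs without listing them here) and uses brute-force enumeration that is only suitable for tiny graphs; it does not implement the k-edge-connected reduction or either CHIEF variant. The function name `four_clique_motif_weights` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def four_clique_motif_weights(edges):
    """Count, for each node pair, how many four-node cliques contain both nodes.

    The resulting weights play the role of a motif adjacency matrix, which a
    spectral or other clustering method could consume in place of the plain
    edge adjacency. Brute force over all 4-subsets: illustration only.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    weights = defaultdict(int)
    for quad in combinations(sorted(adj), 4):
        # A 4-subset is a clique when all six pairs are adjacent.
        if all(b in adj[a] for a, b in combinations(quad, 2)):
            for a, b in combinations(quad, 2):
                weights[(a, b)] += 1
    return dict(weights)

# A 4-clique on nodes 0..3 with a pendant node 4 attached to node 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
weights = four_clique_motif_weights(edges)
print(weights)
```

The pendant edge takes part in no four-node clique, so it receives no motif weight, which is how motif-based weighting can separate dense groups from loosely attached nodes.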

preprint · 2020 · arXiv

Efficient Exact Algorithms for Maximum Balanced Biclique Search in Bipartite Graphs

Given a bipartite graph, the maximum balanced biclique (MBB) problem, which asks for two mutually connected, equal-sized, disjoint vertex sets of maximum cardinality, plays a significant role in mining bipartite graphs and has numerous applications. Despite the NP-hardness of the MBB problem, in this paper we show that an exact MBB can be discovered extremely fast in bipartite graphs arising in real applications. We propose two exact algorithms, dedicated to dense and sparse bipartite graphs respectively. For dense bipartite graphs, an $\mathcal{O}^{*}(1.3803^{n})$ algorithm is proposed. This algorithm can in fact find an MBB in near-polynomial time for dense bipartite graphs, which are common in applications such as VLSI design. This is because, using our proposed novel techniques, the search quickly converges to sufficiently dense bipartite graphs, which we prove to be polynomially solvable. For large sparse bipartite graphs, typical of applications such as biological data analysis, an $\mathcal{O}^{*}(1.3803^{\ddot{\delta}})$ algorithm is proposed, where $\ddot{\delta}$ is only a few hundred for large sparse bipartite graphs with millions of vertices. The indispensable optimizations that lead to this time complexity are as follows: we transform a large sparse bipartite graph into a limited number of dense subgraphs of size up to $\ddot{\delta}$ and then apply our proposed algorithm for dense bipartite graphs to each of the subgraphs. To further speed up this algorithm, tighter upper bounds, faster heuristics and effective reductions are proposed, allowing an MBB to be discovered within a few seconds for bipartite graphs with millions of vertices. Extensive experiments are conducted on synthetic and real large bipartite graphs to demonstrate the efficiency and effectiveness of our proposed algorithms and techniques.
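
To illustrate the problem definition only, the sketch below finds a maximum balanced biclique by exhaustive search over subsets of the left side. This is a naive exponential baseline for tiny inputs, not either of the algorithms proposed in the paper, and it uses none of their bounds, heuristics or reductions. The function name and the adjacency-dictionary input format are assumptions made for this example.

```python
from itertools import combinations

def max_balanced_biclique(left_adj):
    """Return (left_set, right_set) of a maximum balanced biclique.

    `left_adj` maps each left vertex to the set of its right neighbours.
    Sizes are tried from largest to smallest, so the first hit is maximum.
    """
    left = sorted(left_adj)
    for k in range(len(left), 0, -1):
        for subset in combinations(left, k):
            # Right vertices adjacent to every chosen left vertex.
            common = set.intersection(*(set(left_adj[u]) for u in subset))
            if len(common) >= k:
                # Trim the right side so both sides have exactly k vertices.
                return list(subset), sorted(common)[:k]
    return [], []

graph = {
    "a": {1, 2, 3},
    "b": {1, 2},
    "c": {2, 3},
    "d": {4},
}
print(max_balanced_biclique(graph))
```

On this toy graph no three left vertices share three right neighbours, so the search falls back to size two and returns the pair `a`, `b` with the right vertices 1 and 2.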

preprint · 2020 · arXiv

Index-based Solutions for Efficient Density Peak Clustering

Density Peak Clustering (DPC), a popular density-based clustering approach, has received considerable attention from the research community, primarily due to its simplicity and its small number of parameters. However, the clusters obtained using DPC are influenced by the sensitive parameter $d_c$, which depends on the data distribution and the requirements of different users. Moreover, the original DPC algorithm requires visiting a large number of objects, making it slow. To this end, this paper investigates index-based solutions for DPC. Specifically, we propose two list-based index methods: (i) a simple List Index, and (ii) an advanced Cumulative Histogram Index. Efficient query algorithms are proposed for these indices, which avoid many irrelevant comparisons at the cost of space. For memory-constrained systems, we further introduce an approximate solution for the above indices that substantially reduces the space cost, provided that slight inaccuracies are admissible. Furthermore, owing to the considerably lower memory requirements of existing tree-based index structures, we also present effective pruning techniques and efficient query algorithms to support DPC using the popular Quadtree Index and R-tree Index. Finally, we evaluate all the above indices in extensive experiments on six synthetic and real datasets and present the results. The experimental insights obtained can help guide the selection of a suitable index.
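
The two quantities DPC computes per point are its local density (the number of points closer than $d_c$) and its dependent distance (the distance to the nearest point of higher density). The sketch below shows how a per-point sorted neighbour list can answer both queries, trading memory for fewer comparisons. It is one plausible reading of a list-based index, written under that assumption, and is not the paper's List Index or Cumulative Histogram Index; all function names are hypothetical.

```python
import bisect
import math

def build_list_index(points):
    """For each point, store its distances to all others, sorted ascending,
    together with the matching neighbour ids. Uses O(n^2) memory."""
    index = []
    for i, p in enumerate(points):
        entries = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        index.append(([d for d, _ in entries], [j for _, j in entries]))
    return index

def local_density(index, d_c):
    """Number of neighbours strictly closer than d_c, via binary search."""
    return [bisect.bisect_left(dists, d_c) for dists, _ in index]

def dependent_distance(index, rho):
    """Distance to the nearest neighbour with strictly higher density.
    A point with no denser neighbour gets its largest distance instead."""
    delta = []
    for i, (dists, ids) in enumerate(index):
        d_i = dists[-1] if dists else 0.0
        for d, j in zip(dists, ids):
            if rho[j] > rho[i]:
                d_i = d  # the list is sorted, so the first hit is the nearest
                break
        delta.append(d_i)
    return delta

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
index = build_list_index(points)
rho = local_density(index, d_c=1.2)
delta = dependent_distance(index, rho)
print(rho, delta)
```

Because the index is built once, changing $d_c$ only requires rerunning the two query functions, which is the main appeal of an index when users tune that parameter.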