Source author record

Li Kuang

Li Kuang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.soc-ph · Social and Information Networks · Computation and Language · Computational Complexity · nlin.AO · Software Engineering

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

preprint · 2022 · arXiv

CSRS: Code Search with Relevance Matching and Semantic Matching

Developers often search and reuse existing code snippets in the process of software development. Code search aims to retrieve relevant code snippets from a codebase according to natural language queries entered by the developer. Up to now, researchers have already proposed information retrieval (IR) based methods and deep learning (DL) based methods. The IR-based methods focus on keyword matching, that is to rank codes by relevance between queries and code snippets, while DL-based methods focus on capturing the semantic correlations. However, the existing methods do not consider capturing two matching signals simultaneously. Therefore, in this paper, we propose CSRS, a code search model with relevance matching and semantic matching. CSRS comprises (1) an embedding module containing convolution kernels of different sizes which can extract n-gram embeddings of queries and codes, (2) a relevance matching module that measures lexical matching signals, and (3) a co-attention based semantic matching module to capture the semantic correlation. We train and evaluate CSRS on a dataset with 18.22M and 10k code snippets. The experimental results demonstrate that CSRS achieves an MRR of 0.614, which outperforms two state-of-the-art models DeepCS and CARLCS-CNN by 33.77% and 18.53% respectively. In addition, we also conducted several experiments to prove the effectiveness of each component of CSRS.
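The abstract reports results as mean reciprocal rank (MRR). As a minimal sketch of how that metric is typically computed, assuming each query has exactly one correct snippet, the Python below averages the reciprocal of the rank at which the correct item appears. The function name and the toy identifiers are illustrative and are not taken from the paper.

```python
def mean_reciprocal_rank(ranked_results, targets):
    """Average of 1/rank of the correct item over all queries.

    ranked_results: one ranked list of candidate ids per query (best first).
    targets: the single correct id for each query (an assumption of this sketch).
    A query whose target is absent from its list contributes 0.
    """
    if len(ranked_results) != len(targets):
        raise ValueError("need exactly one target per query")
    total = 0.0
    for results, target in zip(ranked_results, targets):
        for rank, item in enumerate(results, start=1):
            if item == target:
                total += 1.0 / rank
                break
    return total / len(targets)


# Toy example: target found at rank 1, at rank 2, and not found at all.
toy_rankings = [["a", "b"], ["c", "d"], ["e", "f"]]
toy_targets = ["a", "d", "x"]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_targets)
```

Here `toy_mrr` is (1 + 1/2 + 0) / 3. A higher MRR means the correct snippet tends to appear nearer the top of the ranking.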

preprint · 2022 · arXiv

Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors

Chinese BERT models achieve remarkable progress in dealing with grammatical errors of word substitution. However, they fail to handle word insertion and deletion because BERT assumes the existence of a word at each position. To address this, we present a simple and effective Chinese pretrained model. The basic idea is to enable the model to determine whether a word exists at a particular position. We achieve this by introducing a special token [null], the prediction of which stands for the non-existence of a word. In the training stage, we design pretraining tasks such that the model learns to predict [null] and real words jointly given the surrounding context. In the inference stage, the model readily detects whether a word should be inserted or deleted with the standard masked language modeling function. We further create an evaluation dataset to foster research on word insertion and deletion. It includes human-annotated corrections for 7,726 erroneous sentences. Results show that existing Chinese BERT performs poorly on detecting insertion and deletion errors. Our approach significantly improves the F1 scores from 24.1% to 78.1% for word insertion and from 26.5% to 68.5% for word deletion, respectively.

preprint · 2014 · arXiv

A simple model clarifies the complicated relationships of complex networks

Real-world networks such as the Internet and WWW have many common traits. To date, hundreds of models have been proposed to characterize these traits and explain the networks. Because different models use very different mechanisms, it is widely believed that these traits originate from different causes. However, we find that a simple model based on optimisation can produce many traits, including scale-free, small-world, ultra small-world, Delta-distribution, compact, fractal, regular and random networks. Moreover, by revising the proposed model, networks with community structure are generated. With this model and its revised versions, the complicated relationships of complex networks are illustrated. The model brings a new universal perspective to the understanding of complex networks and provides a universal method to model complex networks from the viewpoint of optimisation.

preprint · 2013 · arXiv

A Fractal and Scale-free Model of Complex Networks with Hub Attraction Behaviors

It is widely believed that the fractality of complex networks originates from hub repulsion behaviors (anticorrelation or disassortativity), meaning that large-degree nodes tend to connect with small-degree nodes. This hypothesis was demonstrated by a dynamical growth model, which evolves as the inverse of the renormalization procedure proposed by Song et al. We find that the dynamical growth model is based on the assumption that all cross-box links have the same probability e of linking to the most connected nodes inside each box. Therefore, we modify the growth model by adopting a flexible probability e, which gives hubs a higher probability of connecting with other hubs than with non-hubs. With this model, we find that some fractal and scale-free networks have hub attraction behaviors (correlation or assortativity). These results are counterexamples to the earlier belief.
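The abstract contrasts hub repulsion (disassortativity) with hub attraction (assortativity). As a minimal sketch of how that property is commonly quantified, the Python below computes the degree assortativity coefficient, that is, the Pearson correlation between the degrees found at the two ends of each edge, for an undirected edge list. The function name and the toy graphs are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over an undirected edge list.

    Negative values indicate hub repulsion (hubs link to low-degree nodes),
    positive values indicate hub attraction (hubs link to hubs).
    Returns None when every endpoint has the same degree (undefined).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Count each undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return None
    return cov / var


# Toy graphs: a star (one hub, three leaves) and a 4-node path.
star = [(0, 1), (0, 2), (0, 3)]
path = [(0, 1), (1, 2), (2, 3)]
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
```

In the star, the hub connects only to degree-1 leaves, so the coefficient takes its most negative value; a network with hub attraction would instead give a positive value.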

preprint · 2013 · arXiv

The Ergodicity of the Collatz Process in Positive Integer Field

The $3x+1$ problem, also called the Collatz conjecture, is a very interesting unsolved mathematical problem related to computer science. This paper generalizes the problem by relaxing its constraints, i.e., generalizing the deterministic process to a non-deterministic process, and sets up three models. The paper analyzes the ergodicity of these models and states a proof that the ergodicity of the Collatz process in the positive integer field holds, i.e., that all positive integers can be transformed to 1 by iterating the Collatz function.
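For reference, the standard deterministic Collatz function maps an even n to n/2 and an odd n to 3n+1. The Python sketch below iterates that map and counts the steps needed to reach 1. It only checks individual starting values and says nothing about the general claim; the function name and the `max_steps` cutoff are illustrative assumptions, and the paper's non-deterministic models are not reproduced here.

```python
def collatz_steps(n, max_steps=10_000):
    """Number of Collatz iterations needed to take n to 1.

    Returns None if 1 is not reached within max_steps (a safety cutoff
    chosen for this sketch, not a property of the problem).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None
        # Even: halve. Odd: triple and add one.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 takes 8 steps.
steps_from_6 = collatz_steps(6)
```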
