Source author record

Zhipeng Xue

Zhipeng Xue appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Information Theory, math.IT, Software Engineering, Artificial Intelligence, Computation and Language, Machine Learning

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

preprint, 2022, arXiv

Clone-based code method usage pattern mining

When programmers retrieve a code method and want to reuse it, they need to understand the usage patterns of the retrieved method. However, it is difficult to obtain usage information for the retrieved method, since it may have only a brief comment and few available usage examples. In this paper, we propose an approach, called LUPIN (cLone-based Usage Pattern mIniNg), to mine the usage patterns of methods that do not appear widely in the code repository. The key idea of LUPIN is that the cloned code of the target method may have a similar usage pattern, so we can collect more usage information about the target method from the usage examples of its clones. From the amplified usage examples, we mine the usage pattern of the target method by frequent subsequence mining after program slicing and code normalization. Our evaluation shows that LUPIN can mine four categories of usage patterns with an average precision of 0.65.
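The frequent subsequence mining step can be illustrated with a minimal sketch. It assumes each usage example has already been sliced and normalized into a list of call names; the function name `frequent_subsequences`, the `max_len` cap, and the sample data are hypothetical and not taken from LUPIN itself.

```python
from collections import Counter
from itertools import combinations


def frequent_subsequences(sequences, min_support, max_len=3):
    """Return {subsequence: support} for order-preserving subsequences
    (gaps allowed) found in at least `min_support` input sequences."""
    support = Counter()
    for seq in sequences:
        seen = set()  # count each subsequence at most once per example
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                seen.add(tuple(seq[i] for i in idx))
        support.update(seen)
    return {s: c for s, c in support.items() if c >= min_support}


# Hypothetical normalized usage examples of one target method and its clones.
usage_examples = [
    ["open", "read", "close"],
    ["open", "log", "read", "close"],
    ["open", "write", "close"],
]
patterns = frequent_subsequences(usage_examples, min_support=3)
```

Here only `open`, `close`, and the pair `open` then `close` occur in all three examples. The brute-force enumeration is exponential in `max_len`, so a real miner would use a dedicated algorithm; this sketch only shows what "support" means.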

preprint, 2022, arXiv

SEED: Semantic Graph based Deep detection for type-4 clone

Type-4 clones refer to a pair of code snippets with similar semantics but written in different syntax, which challenges the existing code clone detection techniques. Previous studies, however, rely heavily on syntactic structures and textual tokens, which cannot precisely represent the semantic information of code and might introduce non-negligible noise into the detection models. To overcome these limitations, we design a novel semantic graph-based deep detection approach, called SEED. For a pair of code snippets, SEED constructs a semantic graph of each code snippet based on intermediate representation to represent the code semantics more precisely than representations based on lexical and syntactic analysis. To accommodate the characteristics of Type-4 clones, the semantic graph is constructed focusing on the operators and API calls instead of all tokens. Then, SEED generates the feature vectors by using the graph match network and performs clone detection based on the similarity among the vectors. Extensive experiments show that our approach significantly outperforms two baseline approaches over two public datasets and one customized dataset. In particular, SEED outperforms the baseline methods by an average of 25.2% in F1-score. Our experiments demonstrate that SEED can reach state-of-the-art performance and be useful for Type-4 clone detection in practice.
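The final step, deciding whether two snippets are clones from the similarity of their feature vectors, might look like the sketch below. Cosine similarity and the 0.8 threshold are assumptions chosen for illustration; the abstract does not say which similarity measure or threshold SEED uses, and the graph network that produces the vectors is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def is_clone(u, v, threshold=0.8):
    """Hypothetical decision rule: clone if similarity reaches the threshold."""
    return cosine_similarity(u, v) >= threshold


# Toy feature vectors standing in for learned graph embeddings.
same_direction = is_clone([1.0, 0.0, 1.0], [2.0, 0.0, 2.0])
orthogonal = is_clone([1.0, 0.0], [0.0, 1.0])
```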

preprint, 2021, arXiv

CogNet: Bridging Linguistic Knowledge, World Knowledge and Commonsense Knowledge

In this paper, we present CogNet, a knowledge base (KB) dedicated to integrating three types of knowledge: (1) linguistic knowledge from FrameNet, which schematically describes situations, objects and events; (2) world knowledge from YAGO, Freebase, DBpedia and Wikidata, which provides explicit knowledge about specific instances; and (3) commonsense knowledge from ConceptNet, which describes implicit general facts. To model these different types of knowledge consistently, we introduce a three-level unified frame-styled representation architecture. To integrate free-form commonsense knowledge with other structured knowledge, we propose a strategy that combines automated labeling and crowdsourced annotation. At present, CogNet integrates 1,000+ semantic frames from linguistic KBs, 20,000,000+ frame instances from world KBs, as well as 90,000+ commonsense assertions from commonsense KBs. All these data can be easily queried and explored on our online platform, and are free to download in RDF format for utilization under a CC-BY-SA 4.0 license. The demo and data are available at http://cognet.top/.

preprint, 2020, arXiv

Denoising-based Turbo Message Passing for Compressed Video Background Subtraction

In this paper, we consider the compressed video background subtraction problem that separates the background and foreground of a video from its compressed measurements. The background of a video usually lies in a low dimensional space and the foreground is usually sparse. More importantly, each video frame is a natural image that has textural patterns. By exploiting these properties, we develop a message passing algorithm termed offline denoising-based turbo message passing (DTMP). We show that these structural properties can be efficiently handled by the existing denoising techniques under the turbo message passing framework. We further extend the DTMP algorithm to the online scenario where the video data is collected in an online manner. The extension is based on the similarity/continuity between adjacent video frames. We adopt the optical flow method to refine the estimation of the foreground. We also adopt the sliding window based background estimation to reduce complexity. By exploiting the Gaussianity of messages, we develop the state evolution to characterize the per-iteration performance of offline and online DTMP. Compared to the existing algorithms, DTMP can work at much lower compression rates, and can subtract the background successfully with a lower mean squared error and better visual quality for both offline and online compressed video background subtraction.
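One common way to write the measurement model described above is sketched below. The symbols are assumptions introduced for illustration and may differ from the paper's notation: $L$ is the low-rank background, $S$ the sparse foreground, $\mathcal{A}$ the compressive linear operator, and $n$ the noise.

```latex
% Assumed notation: columns of L and S are vectorized video frames.
y = \mathcal{A}(L + S) + n,
\qquad \operatorname{rank}(L) \ll \min(\text{rows}, \text{columns}),
\qquad S \text{ has few nonzero entries.}
```

Background subtraction then amounts to estimating both $L$ and $S$ from $y$.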

preprint, 2018, arXiv

TARM: A Turbo-type Algorithm for Affine Rank Minimization

The affine rank minimization (ARM) problem arises in many real-world applications. The goal is to recover a low-rank matrix from a small amount of noisy affine measurements. The original problem is NP-hard, and so directly solving the problem is computationally prohibitive. Approximate low-complexity solutions for ARM have recently attracted much research interest. In this paper, we design an iterative algorithm for ARM based on message passing principles. The proposed algorithm is termed turbo-type ARM (TARM), as inspired by the recently developed turbo compressed sensing algorithm for sparse signal recovery. We show that, when the linear operator for measurement is right-orthogonally invariant (ROIL), a scalar function called state evolution can be established to accurately predict the behaviour of the TARM algorithm. We also show that TARM converges much faster than the counterpart algorithms for low-rank matrix recovery. We further extend the TARM algorithm for matrix completion, where the measurement operator corresponds to a random selection matrix. We show that, although the state evolution is not accurate for matrix completion, the TARM algorithm with carefully tuned parameters still significantly outperforms its counterparts.
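For reference, the ARM problem is usually stated along the following lines. The notation is an assumption made for illustration: $X$ is the unknown matrix, $\mathcal{A}$ the affine measurement operator, $n$ the noise, and $\varepsilon$ a noise-level tolerance.

```latex
% Noisy affine measurements of a low-rank matrix X (assumed notation).
y = \mathcal{A}(X) + n

% Rank minimization subject to measurement consistency.
\min_{X} \ \operatorname{rank}(X)
\quad \text{subject to} \quad
\lVert y - \mathcal{A}(X) \rVert_2 \le \varepsilon
```

The rank objective is what makes the problem NP-hard in general, which motivates approximate iterative solvers.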

preprint, 2016, arXiv

D-OAMP: A Denoising-based Signal Recovery Algorithm for Compressed Sensing

Approximate message passing (AMP) is an efficient iterative signal recovery algorithm for compressed sensing (CS). For sensing matrices with independent and identically distributed (i.i.d.) Gaussian entries, the behavior of AMP can be asymptotically described by a scalar recursion called state evolution. Orthogonal AMP (OAMP) is a variant of AMP that imposes a divergence-free constraint on the denoiser. In this paper, we extend OAMP to incorporate generic denoisers, hence the name D-OAMP. Our numerical results show that state evolution predicts the performance of D-OAMP well for generic denoisers when i.i.d. Gaussian or partial orthogonal sensing matrices are involved. We compare the performance of denoising-AMP (D-AMP) and D-OAMP for recovering natural images from CS measurements. Simulation results show that D-OAMP outperforms D-AMP in both convergence speed and recovery accuracy for partial orthogonal sensing matrices.
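The divergence-free constraint can be illustrated with a small sketch: estimate the average derivative (divergence) of a denoiser with a random probe, then subtract that multiple of the input. Soft thresholding stands in for a generic denoiser, the function names are hypothetical, and the scaling constant that OAMP-style algorithms also apply is omitted, so this is an illustration of the idea rather than the D-OAMP implementation.

```python
import math
import random


def soft_threshold(r, tau):
    """Entry-wise soft thresholding, a classic sparsity-promoting denoiser."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in r]


def mc_divergence(denoiser, r, eps=1e-6, seed=0):
    """Probe-based estimate of (1/n) * sum_i d denoiser_i / d r_i."""
    rng = random.Random(seed)
    probe = [rng.choice((-1.0, 1.0)) for _ in r]
    shifted = denoiser([x + eps * b for x, b in zip(r, probe)])
    base = denoiser(r)
    return sum(b * (s - d) for b, s, d in zip(probe, shifted, base)) / (eps * len(r))


def divergence_free(denoiser, r):
    """Return the corrected output D(r) - div * r and the estimated div."""
    div = mc_divergence(denoiser, r)
    return [d - div * x for d, x in zip(denoiser(r), r)], div


r = [2.0, -0.5, 1.5, 0.1]
denoise = lambda v: soft_threshold(v, tau=1.0)
corrected, div = divergence_free(denoise, r)

# With div held fixed, the corrected map has (approximately) zero divergence.
residual_div = mc_divergence(
    lambda v: [d - div * x for d, x in zip(denoise(v), v)], r
)
```

Two of the four entries exceed the threshold, so the estimated divergence is about 0.5, and the residual divergence of the corrected map is about zero.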