Source author record

Zhipeng Gao

Zhipeng Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Software Engineering; Artificial Intelligence; Computation and Language; Computational Engineering, Finance, and Science; Cryptography and Security; Networking and Internet Architecture

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

preprint | 2026 | arXiv

Industrial Data-Service-Knowledge Governance: Toward Integrated and Trusted Intelligence for Industry 5.0

The convergence of artificial intelligence, cyber-physical systems, and cross-enterprise data ecosystems has propelled industrial intelligence to unprecedented scales. Yet the absence of a unified trust foundation across data, services, and knowledge layers undermines reliability, accountability, and regulatory compliance in real-world deployments. While existing surveys address isolated aspects, such as data governance, service orchestration, and knowledge representation, none provides a holistic, cross-layer perspective on trustworthiness tailored to industrial settings. To bridge this gap, we present Trisk (TRusted Industrial Data-Service-Knowledge governance), a novel conceptual and taxonomic framework for trustworthy industrial intelligence. Grounded in a five-dimensional trust model (quality, security, privacy, fairness, and explainability), Trisk unifies 120+ representative studies along three orthogonal axes: governance scope (data, service, and knowledge), architectural paradigm (centralized, federated, or edge-embedded), and enabling technology (knowledge graphs, zero-trust policies, causal inference, etc.). We systematically analyze how trust propagates across digital layers, identify critical gaps in semantic interoperability, runtime policy enforcement, and the alignment of operational and information technologies, and evaluate the maturity of current industrial implementations. Finally, we articulate a forward-looking research agenda for Industry 5.0, advocating for an integrated governance fabric that embeds verifiable trust semantics into every layer of the industrial intelligence stack. This survey serves as both a foundational reference for researchers and a practical roadmap for engineers deploying trustworthy AI in complex, multi-stakeholder environments.

preprint | 2026 | arXiv

Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement

With the development of large language models (LLMs) in the field of programming, intelligent programming coaching systems have gained widespread attention. However, most research focuses on repairing the buggy code of programming learners without explaining the underlying causes of the bugs. To address this gap, we introduce a novel task, LPR (Learner-Tailored Program Repair). We then propose a novel and effective framework, LSGEN (Learner-Tailored Solution Generator), to enhance program repair while providing bug descriptions for the buggy code. In the first stage, we use a repair solution retrieval framework to construct a solution retrieval database and then employ an edit-driven code retrieval approach to retrieve valuable solutions, guiding LLMs in identifying and fixing the bugs in the buggy code. In the second stage, we propose a solution-guided program repair method, which fixes the code and provides explanations under the guidance of the retrieved solutions. Moreover, we propose an Iterative Retrieval Enhancement method that uses the evaluation results of the generated code to iteratively optimize the retrieval direction and explore more suitable repair strategies, improving performance in practical programming coaching scenarios. The experimental results show that our approach outperforms a set of baselines by a large margin, validating the effectiveness of our framework for the newly proposed LPR task.
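The Iterative Retrieval Enhancement loop described in this abstract can be pictured roughly as follows. This is a minimal sketch, not the LSGEN implementation: the function `iterative_repair` and its three callables (`retrieve`, `repair`, `evaluate`) are hypothetical placeholders for a solution retriever, an LLM-backed repair step, and a test-based evaluator, and folding the feedback into the next query by simple concatenation is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Tuple


def iterative_repair(
    buggy_code: str,
    retrieve: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Retry retrieval-guided repair, steering retrieval with evaluation feedback.

    All three callables are hypothetical stand-ins supplied by the caller.
    Returns the first candidate that passes evaluation, or None.
    """
    query = buggy_code
    for _ in range(max_rounds):
        solutions = retrieve(query)                # fetch candidate repair solutions
        candidate = repair(buggy_code, solutions)  # generate a fix guided by them
        passed, feedback = evaluate(candidate)     # e.g. run the learner's test cases
        if passed:
            return candidate
        # Assumption: the next retrieval query combines the failed attempt
        # with the evaluator's feedback to shift the retrieval direction.
        query = candidate + "\n" + feedback
    return None
```

Passing the components in as callables keeps the sketch independent of any particular retriever or model API.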

preprint | 2022 | arXiv

BcMON: Blockchain Middleware for Offline Networks

Blockchain is becoming a new generation of information infrastructure. However, current blockchain solutions rely on continuous network connectivity to query and modify the state of the blockchain. Emerging satellite technology seems to be a good catalyst for forwarding offline transactions to the blockchain. However, this approach suffers from high costs, difficult interoperability, and limited computation. Therefore, we propose BcMON, the first blockchain middleware for offline networks. BcMON incorporates three innovative designs: 1) it reduces the cost of offline transactions accessing the blockchain through Short Message Service (SMS), 2) it validates the authenticity of offline cross-chain transactions through two-phase consensus, and 3) it supports offline clients in performing complex queries and computations on the blockchains. A prototype of BcMON has been implemented to evaluate the performance of the proposed middleware, showing its stability, efficiency, and scalability.

preprint | 2020 | arXiv

Checking Smart Contracts with Structural Code Embedding

Smart contracts have been increasingly used together with blockchains to automate financial and business transactions. However, many bugs and vulnerabilities have been identified in these contracts, which raises serious concerns about smart contract security, not to mention that the blockchain systems on which the smart contracts are built can themselves be buggy. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this paper, we propose an automated approach to learn characteristics of smart contracts in Solidity, which is useful for clone detection, bug detection, and contract validation on smart contracts. Our new approach is based on word embeddings and vector space comparison. We parse smart contract code into word streams with code structural information, convert code elements (e.g., statements, functions) into numerical vectors that are intended to encode the code syntax and semantics, and compare the similarities among the vectors encoding code and known bugs to identify potential issues. We have implemented the approach in a prototype, named SmartEmbed, and evaluated it with more than 22,000 smart contracts collected from the Ethereum blockchain. Results show that our tool can effectively identify many repetitive instances of Solidity code, where the clone ratio is around 90%. Code clones such as type-III or even type-IV semantic clones can also be detected accurately. Our tool can identify more than 1000 clone-related bugs based on our bug databases efficiently and accurately. Our tool can also help to efficiently validate any given smart contract against a known set of bugs, which can help to improve the users' confidence in the reliability of the contract. The anonymous replication packages can be accessed at: https://drive.google.com/file/d/1kauLT3y2IiHPkUlVx4FSTda-dVAyL4za/view?usp=sharing
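The vector space comparison mentioned in this abstract can be illustrated with a small sketch. It is not SmartEmbed's implementation: as a loudly labeled simplification, a sparse token-count vector stands in for the learned structural code embedding, and the names `tokenize`, `embed`, `cosine`, and `find_matches`, as well as the 0.9 threshold, are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(code: str) -> List[str]:
    # Split source text into identifiers, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code)


def embed(code: str) -> Counter:
    # Assumption: a token-count vector replaces the learned embedding from the paper.
    return Counter(tokenize(code))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_matches(
    snippet: str, known_bugs: Dict[str, str], threshold: float = 0.9
) -> List[Tuple[str, float]]:
    # Return (name, similarity) pairs for known buggy snippets at or above the
    # threshold, most similar first.
    query = embed(snippet)
    scored = [(name, cosine(query, embed(src))) for name, src in known_bugs.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A learned embedding would capture structural and semantic similarity that raw token counts miss; the comparison step, however, has the same shape.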

preprint | 2020 | arXiv

Code2Que: A Tool for Improving Question Titles from Mined Code Snippets in Stack Overflow

Stack Overflow is one of the most popular technical Q&A sites used by software developers. Seeking help from Stack Overflow has become an essential part of software developers' daily work for solving programming-related questions. Although the Stack Overflow community has provided quality assurance guidelines to help users write better questions, we observed that a significant number of questions submitted to Stack Overflow are of low quality. In this paper, we introduce a new web-based tool, Code2Que, which can help developers write higher-quality questions for a given code snippet. Code2Que consists of two main stages: offline learning and online recommendation. In the offline learning phase, we first collect a set of good-quality <code snippet, question> pairs as training samples. We then train our model on these training samples via a deep sequence-to-sequence approach, enhanced with an attention mechanism, a copy mechanism, and a coverage mechanism. In the online recommendation phase, for a given code snippet, we use the offline-trained model to generate question titles to assist less experienced developers in writing questions more effectively. At the same time, we embed the given code snippet into a vector and retrieve related questions with similar problematic code snippets.

preprint | 2019 | arXiv

SmartEmbed: A Tool for Clone and Bug Detection in Smart Contracts through Structural Code Embedding

Ethereum has become a widely used platform to enable secure, blockchain-based financial and business transactions. However, a major concern in Ethereum is the security of its smart contracts. Many identified bugs and vulnerabilities in smart contracts not only present challenges to the maintenance of the blockchain, but also lead to serious financial losses. There is a significant need to better assist developers in checking smart contracts and ensuring their reliability. In this paper, we propose a web service tool, named SmartEmbed, which can help Solidity developers find repetitive contract code and clone-related bugs in smart contracts. Our tool is based on code embeddings and similarity checking techniques. By comparing the similarities among the code embedding vectors for existing Solidity code in the Ethereum blockchain and known bugs, we are able to efficiently identify code clones and clone-related bugs for any Solidity code given by users, which can help to improve the users' confidence in the reliability of their code. In addition to its use by individual developers, SmartEmbed can also be applied to studies of smart contracts at large scale. When applied to more than 22K Solidity contracts collected from the Ethereum blockchain, we found that the clone ratio of Solidity code is close to 90%, much higher than in traditional software, and that 194 clone-related bugs can be identified efficiently and accurately based on our small bug database with a precision of 96%. SmartEmbed can be accessed at http://www.smartembed.net. A demo video of SmartEmbed is at https://youtu.be/o9ylyOpYFq8

preprint | 2016 | arXiv

A Time-constraint Satisfying and Cost-reducing node evaluation metric for Message Routing in Mobile Crowd Sensing Networks

In mobile crowd sensing networks, data is forwarded through opportunistic contacts between participants and replicated to the participants encountered. To optimize the data delivery ratio and reduce redundant data, many data forwarding schemes have been proposed that selectively replicate data to encountered participants according to each node's data forwarding metric. However, most of them neglect one kind of redundant data: data whose Time-To-Live has expired. To reduce this kind of redundant data, we propose a new method for evaluating a node's data forwarding metric, which measures the node's probability of forwarding data to the destination within the data's time constraint. The method has two steps. The first step evaluates whether a node can possibly contact the destination within the time constraint, based on its transient cluster. We propose a method for detecting a node's transient cluster based on the node's contact rate. Only nodes that can possibly contact the destination proceed to the second step, which effectively reduces the computational complexity. The second step calculates the node's probability of forwarding data to the destination according to its individual inter-contact time (ICT) distribution. Evaluation results show that our transient cluster detection method is simpler and faster. In terms of both data delivery ratio and network overhead, our approach outperforms existing data forwarding approaches.
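The second step, estimating the probability that a node meets the destination before the data expires, can be sketched as follows. The abstract does not state which ICT distribution the authors use; this sketch assumes, purely for illustration, exponentially distributed inter-contact times, so that the probability of a contact within the remaining time T is 1 - exp(-rate * T). The function names `contact_rate`, `delivery_probability`, and `should_replicate` are hypothetical.

```python
import math
from typing import Sequence


def contact_rate(inter_contact_times: Sequence[float]) -> float:
    # Maximum-likelihood rate of an exponential distribution: 1 / mean ICT.
    # Returns 0.0 when there is no contact history with the destination.
    if not inter_contact_times:
        return 0.0
    mean = sum(inter_contact_times) / len(inter_contact_times)
    return 1.0 / mean if mean > 0 else 0.0


def delivery_probability(inter_contact_times: Sequence[float], remaining_ttl: float) -> float:
    # Assumption: exponential ICTs, so P(contact within T) = 1 - exp(-rate * T).
    if remaining_ttl <= 0:
        return 0.0  # expired data cannot be delivered in time
    return 1.0 - math.exp(-contact_rate(inter_contact_times) * remaining_ttl)


def should_replicate(
    carrier_icts: Sequence[float], candidate_icts: Sequence[float], remaining_ttl: float
) -> bool:
    # Replicate only if the encountered node is more likely than the current
    # carrier to reach the destination before the data's Time-To-Live runs out.
    return delivery_probability(candidate_icts, remaining_ttl) > delivery_probability(
        carrier_icts, remaining_ttl
    )
```

Under this assumption, a node with no recorded contacts with the destination gets probability 0, which mirrors the role of the first filtering step in spirit, though the transient cluster detection itself is not modeled here.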