Source author record

Zhiqi Shen

Zhiqi Shen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Information Theory, math.IT, Software Engineering, Computation and Language, Computer Science and Game Theory, Human-Computer Interaction, Information Retrieval, Machine Learning, math.CO, Multiagent Systems, Neural and Evolutionary Computing

Catalog footprint

What is connected

10 works

11 topics

4 close collaborators

Preprint, 2026, arXiv

Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking

Large Language Models (LLMs) are increasingly deployed in real-world fact-checking systems, yet existing evaluations focus predominantly on claim verification and overlook the broader fact-checking workflow, including claim extraction and evidence retrieval. This narrow focus prevents current benchmarks from revealing systematic reasoning failures, factual blind spots, and robustness limitations of modern LLMs. To bridge this gap, we present FactArena, a fully automated arena-style evaluation framework that conducts comprehensive, stage-wise benchmarking of LLMs across the complete fact-checking pipeline. FactArena integrates three key components: (i) an LLM-driven fact-checking process that standardizes claim decomposition, evidence retrieval via tool-augmented interactions, and justification-based verdict prediction; (ii) an arena-styled judgment mechanism guided by consolidated reference guidelines to ensure unbiased and consistent pairwise comparisons across heterogeneous judge agents; and (iii) an arena-driven claim-evolution module that adaptively generates more challenging and semantically controlled claims to probe LLMs' factual robustness beyond fixed seed data. Across 16 state-of-the-art LLMs spanning seven model families, FactArena produces stable and interpretable rankings. Our analyses further reveal significant discrepancies between static claim-verification accuracy and end-to-end fact-checking competence, highlighting the necessity of holistic evaluation. The proposed framework offers a scalable and trustworthy paradigm for diagnosing LLMs' factual reasoning, guiding future model development, and advancing the reliable deployment of LLMs in safety-critical fact-checking applications.
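The abstract does not say how FactArena aggregates pairwise judgments into a ranking. As a minimal sketch of one common arena-style approach, the example below uses Elo updates; the choice of Elo, the function names, the starting rating of 1000, and the K-factor of 32 are all assumptions for illustration, not details from the paper.

```python
def expected_score(rating_a, rating_b):
    """Elo-expected probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, outcome, k=32.0):
    """Update ratings in place after one pairwise judgment.

    outcome: 1.0 if model_a wins, 0.0 if model_b wins, 0.5 for a tie.
    k is a hypothetical step size, not a value from the paper.
    """
    ra, rb = ratings[model_a], ratings[model_b]
    ea = expected_score(ra, rb)
    ratings[model_a] = ra + k * (outcome - ea)
    ratings[model_b] = rb + k * ((1.0 - outcome) - (1.0 - ea))


def rank_models(judgments, initial=1000.0):
    """Turn a sequence of (model_a, model_b, outcome) judgments into a ranking."""
    ratings = {}
    for model_a, model_b, outcome in judgments:
        ratings.setdefault(model_a, initial)
        ratings.setdefault(model_b, initial)
        update_elo(ratings, model_a, model_b, outcome)
    # Highest rating first.
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_models([("x", "y", 1.0), ("y", "z", 1.0)])` returns the three hypothetical models ordered by rating. Sequential Elo depends on the order of judgments, so a real evaluation might prefer an order-independent fit.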

Preprint, 2022, arXiv

Revisiting Item Promotion in GNN-based Collaborative Filtering: A Masked Targeted Topological Attack Perspective

Graph neural network (GNN) based collaborative filtering (CF) has attracted increasing attention in e-commerce and social media platforms. However, little effort has been made to evaluate the robustness of such CF systems in deployment. Fundamentally different from existing attacks, this work revisits the item promotion task and reformulates it from a targeted topological attack perspective for the first time. Specifically, we first develop a targeted attack formulation to maximally increase a target item's popularity. We then leverage gradient-based optimization to find a solution. However, we observe that the gradient estimates often appear noisy due to the discrete nature of a graph, which degrades attack ability. To resolve noisy gradient effects, we then propose a masked attack objective that can remarkably enhance the topological attack ability. Furthermore, we design a computationally efficient approach to the proposed attack, thus making it feasible to evaluate large-scale CF systems. Experiments on two real-world datasets show the effectiveness of our attack in analyzing the robustness of GNN-based CF more practically.

Preprint, 2020, arXiv

FOCUS: Dealing with Label Quality Disparity in Federated Learning

Ubiquitous systems with End-Edge-Cloud architecture are increasingly being used in healthcare applications. Federated Learning (FL) is highly useful for such applications, due to the silo effect and its privacy-preserving nature. Existing FL approaches generally do not account for disparities in the quality of local data labels. However, the clients in ubiquitous systems tend to suffer from label noise due to annotators' varying skill levels, biases, or malicious tampering. In this paper, we propose Federated Opportunistic Computing for Ubiquitous Systems (FOCUS) to address this challenge. It maintains a small set of benchmark samples on the FL server and quantifies the credibility of the client local data without directly observing them by computing the mutual cross-entropy between the performance of the FL model on the local datasets and that of the client's local FL model on the benchmark dataset. Then, a credit-weighted orchestration is performed to adjust the weight assigned to clients in the FL model based on their credibility values. FOCUS has been experimentally evaluated on both synthetic data and real-world data. The results show that it effectively identifies clients with noisy labels and reduces their impact on the model performance, thereby significantly outperforming existing FL approaches.
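The credibility idea described in the abstract can be sketched roughly as follows. The abstract names mutual cross-entropy and credit-weighted aggregation but gives no formulas, so summing the two cross-entropy terms, mapping losses to weights with a softmax over negated values, the `temperature` parameter, and all function names are assumptions made for illustration.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels.

    probs: list of per-sample class-probability lists.
    labels: list of integer class indices.
    """
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)


def mutual_cross_entropy(global_on_local, local_labels,
                         local_on_benchmark, benchmark_labels):
    """Assumed form: loss of the global model on a client's local data
    plus loss of that client's local model on the server benchmark."""
    return (cross_entropy(global_on_local, local_labels)
            + cross_entropy(local_on_benchmark, benchmark_labels))


def credibility_weights(mce_values, temperature=1.0):
    """Assumed mapping: softmax over negated losses, so a lower
    mutual cross-entropy yields a higher aggregation weight."""
    scores = [-e / temperature for e in mce_values]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_average(client_params, weights):
    """Aggregate flat parameter vectors using the credibility weights."""
    dim = len(client_params[0])
    return [sum(w * p[i] for p, w in zip(client_params, weights))
            for i in range(dim)]
```

Under these assumptions, a client whose labels are noisy tends to produce a larger mutual cross-entropy and therefore contributes less to the aggregated model.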

Preprint, 2014, arXiv

An Empirical Analysis of Task Allocation in Scrum-based Agile Programming

Agile Software Development (ASD) methodology has become widely used in the industry. Understanding the challenges facing software engineering students is important for designing effective training methods that equip students with the skills required to use ASD techniques effectively. Existing empirical research has mostly focused on eXtreme Programming (XP) based ASD methodologies. There is a lack of empirical studies about Scrum-based ASD programming, which has become the most popular agile methodology among industry practitioners. In this paper, we present empirical findings regarding the aspects of task allocation decision-making, collaboration, and team morale related to the Scrum ASD process, which have not yet been well studied by existing research. We draw our findings from a 12-week coursework project in 2014 involving 125 undergraduate software engineering students from a renowned university working in 21 Scrum teams. Instead of the traditional survey- or interview-based methods, which suffer from limitations in scale and level of detail, we obtain fine-grained data through logging students' activities in our online agile project management (APM) platform, HASE. During this study, the platform logged over 10,000 ASD activities. Deviating from existing preconceptions, our results suggest negative correlations between collaboration and team performance as well as team morale.

Preprint, 2014, arXiv

An Evolutionary Approach for Optimizing Hierarchical Multi-Agent System Organization

It has been widely recognized that the performance of a multi-agent system is highly affected by its organization. A large-scale system may have billions of possible ways of organization, which makes it impractical to find an optimal choice of organization using exhaustive search methods. In this paper, we propose a genetic-algorithm-aided optimization scheme for designing hierarchical structures of multi-agent systems. We introduce a novel algorithm, called the hierarchical genetic algorithm, in which hierarchical crossover with a repair strategy and mutation of small perturbation are used. The phenotypic hierarchical structure space is translated to the genome-like array representation space, which makes the algorithm genetic-operator-literate. A case study with 10 scenarios of a hierarchical information retrieval model is provided. Our experiments have shown that competitive baseline structures which lead to the optimal organization in terms of utility can be found by the proposed algorithm during the evolutionary search. Compared with the traditional genetic operators, the newly introduced operators produced better organizations of higher utility more consistently in a variety of test cases. The proposed algorithm extends the search processes of the state-of-the-art multi-agent organization design methodologies, and is more computationally efficient in a large search space.

Preprint, 2014, arXiv

Designing Socially Intelligent Virtual Companions

Virtual companions that interact with users in a socially complex environment require a wide range of social skills. Displaying curiosity is simultaneously a factor to improve a companion's believability and to unobtrusively affect the user's activities over time. Curiosity represents a drive to know new things. It is a major driving force for engaging learners in active learning. Existing research pays little attention to curiosity. In this paper, we enrich the social skills of a virtual companion by infusing curiosity into its mental model. A curious companion residing in a Virtual Learning Environment (VLE) to stimulate the user's curiosity is proposed. The curious companion model is developed based on multidisciplinary considerations. The effectiveness of the curious companion is demonstrated by a preliminary field study.

Preprint, 2014, arXiv

Identifying Talented Software Engineering Students through Data-driven Skill Assessment

For software development companies, one of the most important objectives is to identify and acquire talented software engineers in order to maintain a skilled team that can produce competitive products. Traditional approaches for finding talented young software engineers mainly rely on programming contests of various forms, which mostly test participants' programming skills. However, successful software engineering in practice requires a wider range of skills from team members, including analysis, design, programming, testing, communication, collaboration, and self-management. In this paper, we explore potential ways to identify talented software engineering students in a data-driven manner through an Agile Project Management (APM) platform. Through our proposed HASE online APM tool, we conducted a study involving 21 Scrum teams consisting of over 100 undergraduate software engineering students in multi-week coursework projects in 2014. During this study, students performed over 10,000 ASD activities logged by HASE. We demonstrate the feasibility and potential of this new research direction, and discuss its implications for software engineering education and industry recruitment.

Preprint, 2010, arXiv

A game theory approach for self-coexistence analysis among IEEE 802.22 networks

This paper has been withdrawn by the author due to some errors.

Preprint, 2010, arXiv

Further Analysis on Resource Allocation in Wireless Communications Under Imperfect Channel State Information

This paper has been withdrawn by the author due to some errors.

Preprint, 2010, arXiv

Resource Allocation of MU-OFDM Based Cognitive Radio Systems Under Partial Channel State Information

This paper has been withdrawn by the author due to some errors.
