Researcher profile

Zhijie Liu

Zhijie Liu contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 17 (Unverified), Verification L1, unclaimed author
4 works
0 followers
5 topics
4 close collaborators

Published work

4 published items

preprint, 2026, arXiv

DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

Evaluating LLM-generated interactive software requires execution in addition to static analysis. The key difficulty is that correctness is a graph-level reachable property over latent UI state-transition graphs, whereas a GUI evaluator observes only a single execution trajectory. A failed rollout therefore rules out only one realized path, leaving failure attribution ambiguous between evaluator-side execution error and genuine software defect. We present DiagEval, a trajectory-conditioned diagnostic evaluation protocol for post-failure GUI-agent evaluation of interactive software. Rather than blindly retrying from scratch, DiagEval reuses the failed trajectory to choose targeted diagnostic probes and aggregates their outcomes into an internal attribution signal. The latent-graph view motivates the diagnostic problem; DiagEval does not reconstruct the graph or estimate calibrated posterior probabilities. We evaluate DiagEval on WebDevJudge-Unit and RealDevBench across multiple GUI-agent evaluators and LLM backbones. On false-negative cases, DiagEval recovers 45.6-62.1% of failures that were initially misattributed to software defects, outperforming retry-based baselines with 34.4-160.6% relative gains. On the full evaluation sets, this recovery improves accuracy from 69.9% to 78.3% on WebDevJudge-Unit and from 65.0% to 81.6% on RealDevBench. These results suggest that reliable GUI-agent evaluation requires not only stronger execution, but also active failure diagnosis to disambiguate evaluator-side errors from genuine software defects. Our code is available at https://github.com/scutGit/DiagEval.
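
The abstract describes reusing a failed trajectory to pick targeted probes and aggregating their outcomes into an attribution signal, but it does not say how probes are chosen or combined. The following is only a minimal sketch of that general idea, not DiagEval's actual procedure: the `Step` type, the `propose_probes` and `run_probe` callbacks, the pass-fraction signal, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    action: str      # what the GUI agent did (hypothetical representation)
    succeeded: bool  # whether the step had the expected effect


def first_failed_step(trajectory: List[Step]) -> Optional[int]:
    """Return the index of the first failed step, or None if all succeeded."""
    for index, step in enumerate(trajectory):
        if not step.succeeded:
            return index
    return None


def diagnose(
    trajectory: List[Step],
    propose_probes: Callable[[str], List[str]],
    run_probe: Callable[[List[str], str], bool],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[str, float]:
    """Attribute a failed rollout using probes conditioned on the failure point."""
    index = first_failed_step(trajectory)
    if index is None:
        return "no_failure", 0.0
    # Replay the successful prefix, then try alternatives to the failed action.
    prefix = [step.action for step in trajectory[:index]]
    probes = propose_probes(trajectory[index].action)
    if not probes:
        return "undetermined", 0.0
    passed = sum(1 for probe in probes if run_probe(prefix, probe))
    signal = passed / len(probes)  # fraction of probes that reached the goal
    label = "evaluator_error" if signal >= threshold else "software_defect"
    return label, signal


if __name__ == "__main__":
    failed = [Step("open_form", True), Step("click_submit_by_xy", False)]
    print(diagnose(
        failed,
        propose_probes=lambda action: ["click_submit_by_id", "press_enter"],
        run_probe=lambda prefix, probe: probe == "press_enter",
    ))
```

The point of the sketch is the contrast with blind retries: probes start from the known-good prefix and vary only the step that failed, so a passing probe is evidence that the feature is reachable and the original failure was on the evaluator side.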

preprint, 2021, arXiv

A GPU based single-pulse search pipeline (GSP) with database and its application to the commensal radio astronomy FAST survey (CRAFTS)

We developed a GPU-based single-pulse search pipeline (GSP) with a candidate-archiving database. Largely based upon the infrastructure of the open-source pulsar search and analysis toolkit PRESTO, GSP implements GPU acceleration of the de-dispersion and integrates a candidate-archiving database. We applied GSP to the data streams from the commensal radio astronomy FAST survey (CRAFTS), which resulted in quasi-real-time processing. The integrated candidate database facilitates synergistic usage of multiple machine-learning tools and thus makes the identification of radio pulsars such as rotating radio transients (RRATs) and Fast Radio Bursts (FRBs) more efficient. We first tested GSP on pilot CRAFTS observations with the FAST Ultra-Wide Band (UWB) receiver. GSP detected all pulsars known from the Parkes multibeam pulsar survey in the respective sky area covered by the FAST-UWB. GSP also discovered 13 new pulsars. We measured GSP to be ~120 times faster than the original PRESTO and ~60 times faster than an MPI-parallelized version of PRESTO.
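
For readers unfamiliar with the de-dispersion step that GSP accelerates, here is a minimal CPU sketch of standard incoherent de-dispersion. It is not GSP or PRESTO code: the function names, the three-channel band, and the toy pulse are assumptions for illustration. The only physics used is the cold-plasma dispersion delay, roughly 4.148808 ms x DM x (f_lo^-2 - f_hi^-2) with frequencies in GHz and DM in pc cm^-3.

```python
K_DM_MS = 4.148808  # dispersion constant in ms GHz^2 cm^3 / pc


def dispersion_delay_ms(dm, freq_ghz, ref_freq_ghz):
    """Arrival delay (ms) at freq_ghz relative to ref_freq_ghz for a given DM."""
    return K_DM_MS * dm * (freq_ghz ** -2 - ref_freq_ghz ** -2)


def dedisperse(dynamic_spectrum, freqs_ghz, dm, tsamp_ms):
    """Shift each channel back by its delay and sum into one time series.

    dynamic_spectrum[channel][sample] holds intensities; the highest
    frequency channel is used as the zero-delay reference.
    """
    ref = max(freqs_ghz)
    n_samples = len(dynamic_spectrum[0])
    series = [0.0] * n_samples
    for channel, freq in zip(dynamic_spectrum, freqs_ghz):
        shift = round(dispersion_delay_ms(dm, freq, ref) / tsamp_ms)
        for t in range(n_samples - shift):
            series[t] += channel[t + shift]
    return series


def make_dispersed_pulse(freqs_ghz, dm, tsamp_ms, n_samples, t0):
    """Toy input: one unit pulse per channel, delayed according to dm."""
    ref = max(freqs_ghz)
    spectrum = []
    for freq in freqs_ghz:
        channel = [0.0] * n_samples
        arrival = t0 + round(dispersion_delay_ms(dm, freq, ref) / tsamp_ms)
        if arrival < n_samples:
            channel[arrival] = 1.0
        spectrum.append(channel)
    return spectrum


if __name__ == "__main__":
    freqs = [1.0, 1.25, 1.5]  # hypothetical channel centres in GHz
    data = make_dispersed_pulse(freqs, dm=100.0, tsamp_ms=1.0, n_samples=512, t0=50)
    for trial_dm in (0.0, 50.0, 100.0):
        print(trial_dm, max(dedisperse(data, freqs, trial_dm, 1.0)))
```

A single-pulse search repeats this shift-and-sum for many trial DMs and looks for the trial where the summed pulse is strongest. Each trial and channel is independent, which is why the step maps well onto a GPU.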

preprint, 2019, arXiv

A PRESTO-based Parallel Pulsar Search Pipeline Used for FAST Drift Scan Data

We developed a pulsar search pipeline based on PRESTO (PulsaR Exploration and Search Toolkit). This pipeline runs dedispersion, FFT (Fast Fourier Transform), and acceleration search in process-level parallel to shorten the processing time. With two parallel strategies, the pipeline can greatly shorten the processing time in both normal searches and acceleration searches. This pipeline was first tested with PMPS (Parkes Multibeam Pulsar Survey) data and discovered two new faint pulsars. Then, it was successfully used in processing the FAST (Five-hundred-meter Aperture Spherical radio Telescope) drift scan data, with tens of new pulsar discoveries to date. The pipeline is CPU-only and can be easily and quickly deployed on computing nodes for testing purposes or data processing.
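
Process-level parallelism over trial dispersion measures can be sketched as below. This is an illustration under stated assumptions, not the pipeline's code: the abstract does not describe its two parallel strategies, so this shows only the simplest one (one worker process per DM trial), and `search_one_dm` uses a synthetic pulse train and a naive DFT as a stand-in for the real dedispersion, FFT, and search stages. All constants and function names are hypothetical.

```python
import cmath
from concurrent.futures import ProcessPoolExecutor

N_SAMPLES = 128  # hypothetical toy series length
PERIOD = 16      # hypothetical pulse period in samples
TRUE_DM = 20.0   # hypothetical DM of the injected signal


def toy_dedispersed_series(dm):
    """Fake dedispersed data: pulses are smeared more the further dm is from TRUE_DM."""
    width = 1 + int(abs(dm - TRUE_DM) // 5)
    series = [0.0] * N_SAMPLES
    for start in range(0, N_SAMPLES, PERIOD):
        for k in range(width):
            series[(start + k) % N_SAMPLES] += 1.0 / width
    return series


def power_spectrum(series):
    """Naive DFT power for bins 1 .. n/2 - 1 (fine for a toy-sized input)."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series))) ** 2
        for k in range(1, n // 2)
    ]


def search_one_dm(dm):
    """One independent unit of work: returns (dm, peak_bin, peak_power)."""
    spectrum = power_spectrum(toy_dedispersed_series(dm))
    peak = max(range(len(spectrum)), key=spectrum.__getitem__)
    return dm, peak + 1, spectrum[peak]


def run_trials(dms, max_workers=2):
    """Fan the DM trials out to worker processes and collect results in order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_one_dm, dms))


if __name__ == "__main__":
    for result in run_trials([0.0, 10.0, 20.0]):
        print(result)
```

Because each trial shares no state with the others, the work can be split across processes without locking. That independence is what makes a CPU-only pipeline of this kind straightforward to deploy on ordinary computing nodes.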

preprint, 2019, arXiv

Federated Forest

Most real-world data are scattered across different companies or government organizations, and cannot be easily integrated under data privacy and related regulations such as the European Union's General Data Protection Regulation (GDPR) and China's Cyber Security Law. This data-island situation and data privacy and security are two major challenges for applications of artificial intelligence. In this paper, we tackle these challenges and propose a privacy-preserving machine learning model, called Federated Forest, which is a lossless learning model of the traditional random forest method, i.e., achieving the same level of accuracy as the non-privacy-preserving approach. Based on it, we developed a secure cross-regional machine learning system that allows a learning process to be jointly trained over different regions' clients with the same user samples but different attribute sets, processing the data stored in each of them without exchanging their raw data. A novel prediction algorithm is also proposed, which can largely reduce the communication overhead. Experiments on both real-world and UCI data sets demonstrate that the Federated Forest is as accurate as the non-federated version. The efficiency and robustness of our proposed system have been verified. Overall, our model is practical, scalable and extensible for real-life tasks.
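
The setting described above (same samples, different attribute sets per client, no raw data exchanged) can be illustrated with a single tree-node split. This is a simplified sketch, not the Federated Forest algorithm itself. It assumes every party can see the labels, uses Gini gain as the split criterion, omits any encryption, and simulates all parties in one process. The party names, feature names, and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def partition(column, threshold, sample_ids):
    """Split sample ids by a threshold on one locally held feature column."""
    left = [i for i in sample_ids if column[i] <= threshold]
    right = [i for i in sample_ids if column[i] > threshold]
    return left, right


def best_local_split(features, labels, sample_ids):
    """Run by one party on its own features; returns (gain, feature, threshold)."""
    parent = gini([labels[i] for i in sample_ids])
    best = (0.0, None, None)
    for name, column in features.items():
        for threshold in sorted({column[i] for i in sample_ids}):
            left, right = partition(column, threshold, sample_ids)
            if not left or not right:
                continue
            children = (len(left) * gini([labels[i] for i in left])
                        + len(right) * gini([labels[i] for i in right]))
            gain = parent - children / len(sample_ids)
            if gain > best[0]:
                best = (gain, name, threshold)
    return best


def federated_split(parties, labels, sample_ids):
    """Coordinator step: compare gains only, then get id lists from the winner."""
    proposals = {pid: best_local_split(feats, labels, sample_ids)
                 for pid, feats in parties.items()}
    winner = max(proposals, key=lambda pid: proposals[pid][0])
    gain, name, threshold = proposals[winner]
    if name is None:
        return None  # no party found a useful split
    # Only the winning party touches its raw column; others see sample ids.
    left, right = partition(parties[winner][name], threshold, sample_ids)
    return winner, gain, left, right


if __name__ == "__main__":
    parties = {"A": {"age": [30, 31, 30, 31]}, "B": {"income": [1, 2, 8, 9]}}
    print(federated_split(parties, labels=[0, 0, 1, 1], sample_ids=[0, 1, 2, 3]))
```

The design point is that the coordinator chooses the same split a centralized tree would choose over the union of features, which is the sense in which such a scheme can match the accuracy of the non-federated model, while only gains and sample-id partitions cross party boundaries.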