Researcher profile

Zhendong Su

Zhendong Su contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

MLIR-Smith: A Novel Random Program Generator for Evaluating Compiler Pipelines

Compilers are essential for the performance and correct execution of software and hold universal relevance across various scientific disciplines. Despite this, there is a notable lack of tools for testing and evaluating them, especially within the adaptable Multi-Level Intermediate Representation (MLIR) context. This paper addresses the need for a tool that can accommodate MLIR's extensibility, a feature not provided by previous methods such as Csmith. Here we introduce MLIR-Smith, a novel random program generator specifically designed to test and evaluate MLIR-based compiler optimizations. We demonstrate the utility of MLIR-Smith by conducting differential testing on MLIR, LLVM, DaCe, and DCIR, which led to the discovery of multiple bugs in these compiler pipelines. The introduction of MLIR-Smith not only fills a void in the realm of compiler testing but also emphasizes the importance of comprehensive testing within these systems. By providing a tool that can generate random MLIR programs, this paper enhances our ability to evaluate and improve compilers and paves the way for future tools, potentially shaping the wider landscape of software testing and quality assurance.

preprint2021arXiv

Testing Machine Translation via Referential Transparency

Machine translation software has seen rapid progress in recent years due to the advancement of deep neural networks. People routinely use machine translation software in their daily lives, such as ordering food in a foreign restaurant, receiving medical diagnosis and treatment from foreign doctors, and reading international political news online. However, due to the complexity and intractability of the underlying neural networks, modern machine translation software is still far from robust and can produce poor or incorrect translations; this can lead to misunderstanding, financial loss, threats to personal safety and health, and political conflicts. To address this problem, we introduce referentially transparent inputs (RTIs), a simple, widely applicable methodology for validating machine translation software. A referentially transparent input is a piece of text that should have similar translations when used in different contexts. Our practical implementation, Purity, detects when this property is broken by a translation. To evaluate RTI, we use Purity to test Google Translate and Bing Microsoft Translator with 200 unlabeled sentences, which detected 123 and 142 erroneous translations with high precision (79.3% and 78.3%). The translation errors are diverse, including examples of under-translation, over-translation, word/phrase mistranslation, incorrect modification, and unclear logic.

preprint2020arXiv

Combining Symbolic Execution and Model Checking to Verify MPI Programs

Message passing is the standard paradigm of programming in high-performance computing. However, verifying Message Passing Interface (MPI) programs is challenging, due to the complex program features (such as non-determinism and non-blocking operations). In this work, we present MPI symbolic verifier (MPI-SV), the first symbolic execution based tool for automatically verifying MPI programs with non-blocking operations. MPI-SV combines symbolic execution and model checking in a synergistic way to tackle the challenges in MPI program verification. The synergy improves the scalability and enlarges the scope of verifiable properties. We have implemented MPI-SV (footnote: https://mpi-sv.github.io) and evaluated it with 111 real-world MPI verification tasks. The pure symbolic execution-based technique successfully verifies 61 out of the 111 tasks (55\%) within one hour, while in comparison, MPI-SV verifies 100 tasks (90\%). On average, compared with pure symbolic execution, MPI-SV achieves 19x speedups on verifying the satisfaction of the critical property and 5x speedups on finding violations.

preprint2020arXiv

Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction

Database Management Systems (DBMS) are used ubiquitously. To efficiently access data, they apply sophisticated optimizations. Incorrect optimizations can result in logic bugs, which cause a query to compute an incorrect result set. We propose Non-Optimizing Reference Engine Construction (NoREC), a fully-automatic approach to detect optimization bugs in DBMS. Conceptually, this approach aims to evaluate a query by an optimizing and a non-optimizing version of a DBMS, to then detect differences in their returned result set, which would indicate a bug in the DBMS. Obtaining a non-optimizing version of a DBMS is challenging, because DBMS typically provide limited control over optimizations. Our core insight is that a given, potentially randomly-generated optimized query can be rewritten to one that the DBMS cannot optimize. Evaluating this unoptimized query effectively corresponds to a non-optimizing reference engine executing the original query. We evaluated NoREC in an extensive testing campaign on four widely-used DBMS, namely PostgreSQL, MariaDB, SQLite, and CockroachDB. We found 159 previously unknown bugs in the latest versions of these systems, 141 of which have been fixed by the developers. Of these, 51 were optimization bugs, while the remaining were error and crash bugs. Our results suggest that NoREC is effective, general and requires little implementation effort, which makes the technique widely applicable in practice.

preprint2020arXiv

Structure-Invariant Testing for Machine Translation

In recent years, machine translation software has increasingly been integrated into our daily lives. People routinely use machine translation for various applications, such as describing symptoms to a foreign doctor and reading political news in a foreign language. However, the complexity and intractability of neural machine translation (NMT) models that power modern machine translation make the robustness of these systems difficult to even assess, much less guarantee. Machine translation systems can return inferior results that lead to misunderstanding, medical misdiagnoses, threats to personal safety, or political conflicts. Despite its apparent importance, validating the robustness of machine translation systems is very difficult and has, therefore, been much under-explored. To tackle this challenge, we introduce structure-invariant testing (SIT), a novel metamorphic testing approach for validating machine translation software. Our key insight is that the translation results of "similar" source sentences should typically exhibit similar sentence structures. Specifically, SIT (1) generates similar source sentences by substituting one word in a given sentence with semantically similar, syntactically equivalent words; (2) represents sentence structure by syntax parse trees (obtained via constituency or dependency parsing); (3) reports sentence pairs whose structures differ quantitatively by more than some threshold. To evaluate SIT, we use it to test Google Translate and Bing Microsoft Translator with 200 source sentences as input, which led to 64 and 70 buggy issues with 69.5\% and 70\% top-1 accuracy, respectively. The translation errors are diverse, including under-translation, over-translation, incorrect modification, word/phrase mistranslation, and unclear logic.

preprint2020arXiv

Testing Database Engines via Pivoted Query Synthesis

Relational databases are used ubiquitously. They are managed by database management systems (DBMS), which allow inserting, modifying, and querying data using a domain-specific language called Structured Query Language (SQL). Popular DBMS have been extensively tested by fuzzers, which have been successful in finding crash bugs. However, approaches to finding logic bugs, such as when a DBMS computes an incorrect result set, have remained mostly untackled. Differential testing is an effective technique to test systems that support a common language by comparing the outputs of these systems. However, this technique is ineffective for DBMS, because each DBMS typically supports its own SQL dialect. To this end, we devised a novel and general approach that we have termed Pivoted Query Synthesis. The core idea of this approach is to automatically generate queries for which we ensure that they fetch a specific, randomly selected row, called the pivot row. If the DBMS fails to fetch the pivot row, the likely cause is a bug in the DBMS. We tested our approach on three widely-used and mature DBMS, namely SQLite, MySQL, and PostgreSQL. In total, we reported 123 bugs in these DBMS, 99 of which have been fixed or verified, demonstrating that the approach is highly effective and general. We expect that the wide applicability and simplicity of our approach will enable the improvement of robustness of many DBMS.