Source author record

Taolue Chen

Taolue Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Logic in Computer Science; Software Engineering; Artificial Intelligence; Cryptography and Security; Formal Languages and Automata Theory; Machine Learning; Computational Complexity; Computational Engineering, Finance, and Science; Computer Science and Game Theory; Computer Vision; eess.SY; Programming Languages; Systems and Control

Catalog footprint

What is connected

14 works

13 topics

4 close collaborators

Preprint, 2026, arXiv

Fair Conformal Classification via Learning Representation-Based Groups

Conformal prediction methods provide statistically rigorous marginal coverage guarantees for machine learning models, but such guarantees fail to account for algorithmic biases, thereby undermining fairness and trust. This paper introduces a fair conformal inference framework for classification tasks. The proposed method constructs prediction sets that guarantee conditional coverage on adaptively identified subgroups, which can be implicitly defined through nonlinear feature combinations. By balancing effectiveness and efficiency in producing compact, informative prediction sets and ensuring adaptive equalized coverage across unfairly treated subgroups, our approach paves a practical pathway toward trustworthy machine learning. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the framework.
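
As a point of reference for the coverage guarantees mentioned above, the following is a minimal sketch of standard split conformal classification with one threshold per group. It is not the paper's method: the groups here are assumed to be given as plain labels, whereas the paper learns them from representations. The function names and the nonconformity score `1 - p(true label)` are illustrative assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label must be kept.
        return float("inf")
    return sorted(scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose score 1 - p(label) does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


def groupwise_thresholds(calibration, alpha):
    """Compute one conformal threshold per group.

    `calibration` is a list of (group, probs, true_label) triples, where
    `probs` maps each label to the model's predicted probability.
    """
    scores_by_group = {}
    for group, probs, true_label in calibration:
        scores_by_group.setdefault(group, []).append(1.0 - probs[true_label])
    return {g: conformal_quantile(s, alpha) for g, s in scores_by_group.items()}


if __name__ == "__main__":
    calibration = [
        ("a", {"x": 0.75, "y": 0.25}, "x"),
        ("a", {"x": 0.5, "y": 0.5}, "y"),
        ("a", {"x": 0.25, "y": 0.75}, "x"),
    ]
    thresholds = groupwise_thresholds(calibration, alpha=0.5)
    print(prediction_set({"x": 0.75, "y": 0.25}, thresholds["a"]))
```

Calibrating each group separately gives coverage within that group, at the cost of needing enough calibration points per group; when a group is too small the threshold becomes infinite and the set contains every label.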

Preprint, 2026, arXiv

Task Abstention for Large Language Models in Code Generation

Large language models (LLMs) have revolutionized automated code generation. One serious concern, however, is the so-called "hallucination", i.e., LLMs may generate seemingly plausible but functionally incorrect code. In this paper, we study the task abstention problem, i.e., determining whether a given LLM should abstain from performing a specific code generation task to avoid likely hallucination. Our approach features a calibrated abstention rule, grounded in the principles of multiple hypothesis testing. The rule assesses generation consistency through code execution outcomes, allowing it to handle syntactic diversity of semantically equivalent code without reliance on oracle test cases or external databases. We prove that our approach provides a rigorous, distribution-free theoretical guarantee on its abstention decisions. We evaluate our method on benchmark datasets using several open-source code LLMs. Results show that our method allows generative models to more accurately and efficiently identify and abstain from tasks that induce hallucination compared to existing techniques, providing a reliable mechanism for safer and more robust code generation.
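
The idea of judging consistency by execution outcomes rather than by source text can be illustrated with a small sketch. This is only an assumption-laden toy, not the paper's calibrated rule: candidates are grouped by the outputs they produce on a few inputs, and the task is abstained from when the largest group is too small. The names `behaviour_signature`, `consistency` and `should_abstain`, and the fixed `threshold`, are hypothetical; the paper instead derives its rule from multiple hypothesis testing.

```python
from collections import Counter


def behaviour_signature(program_src, func_name, inputs):
    """Run one candidate on every input and record its outputs or error types."""
    namespace = {}
    try:
        # exec is acceptable only for trusted toy code; real LLM output
        # should be executed in a sandbox.
        exec(program_src, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return ("load-error", type(exc).__name__)
    outputs = []
    for args in inputs:
        try:
            outputs.append(repr(func(*args)))
        except Exception as exc:
            outputs.append("error:" + type(exc).__name__)
    return tuple(outputs)


def consistency(candidates, func_name, inputs):
    """Fraction of candidates that fall into the largest behaviour group."""
    groups = Counter(behaviour_signature(src, func_name, inputs) for src in candidates)
    return groups.most_common(1)[0][1] / len(candidates)


def should_abstain(candidates, func_name, inputs, threshold):
    """Abstain when the sampled programs disagree too much (hypothetical rule)."""
    return consistency(candidates, func_name, inputs) < threshold


if __name__ == "__main__":
    samples = [
        "def add(a, b):\n    return a + b",
        "def add(a, b):\n    return b + a",  # different text, same behaviour
        "def add(a, b):\n    return a - b",  # behaves differently
    ]
    print(should_abstain(samples, "add", [(1, 2), (3, 4)], threshold=0.9))
```

The first two samples differ syntactically but land in the same group because they behave identically on the inputs, which is the property that lets an execution-based check ignore syntactic diversity.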

Preprint, 2026, arXiv

Uncertainty Quantification for LLM-based Code Generation

Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks, in particular, large language model (LLM) based code generation, remains a challenging problem. An existing attempt proposes PAC prediction sets but is limited by its strong monotonicity assumption on risk and single-label classification framework, which severely limits the space of candidate programs and cannot accommodate the multiple valid outputs inherent to code generation. To address these limitations, we propose an approach RisCoSet that leverages multiple hypothesis testing to construct risk-controlling predictions for LLM-based code generation. Given a trained code generation model, we produce a prediction set represented by a partial program, which is guaranteed to contain a correct solution with high confidence. Extensive experiments on three LLMs demonstrate the effectiveness of the proposed method. For instance, compared with the state-of-the-art, our method can significantly reduce the code removal by up to 24.5%, at the same level of risk.

Preprint, 2023, arXiv

Test Reuse Based on Adaptive Semantic Matching across Android Mobile Applications

Automatic test generation can help verify and develop the behavior of mobile applications. Test reuse based on semantic similarities between applications of the same category has been utilized to reduce the manual effort of Graphical User Interface (GUI) testing. However, most of the existing studies fail to solve the semantic problem of event matching, which leads to the failure of test reuse. To overcome this challenge, we propose TRASM (Test Reuse based on Adaptive Semantic Matching), a test reuse approach based on adaptive strategies to find a better event matching across Android mobile applications. TRASM first performs GUI event deduplication on the initial test set obtained from test generation, and then employs an adaptive strategy to find better event matching, which enables reusing the existing test. Preliminary experiments with comparison to baseline methods on 15 applications demonstrate that TRASM can improve the precision of GUI event matching while reducing the failure of test reuse and the running time required for test reuse.

Preprint, 2022, arXiv

Preventing Timing Side-Channels via Security-Aware Just-In-Time Compilation

Recent work has shown that Just-In-Time (JIT) compilation can introduce timing side-channels to constant-time programs, which would otherwise be a principled and effective means to counter timing attacks. In this paper, we propose a novel approach to eliminate JIT-induced leaks from these programs. Specifically, we present an operational semantics and a formal definition of constant-time programs under JIT compilation, laying the foundation for reasoning about programs with JIT compilation. We then propose to eliminate JIT-induced leaks via a fine-grained JIT compilation for which we provide an automated approach to generate policies and a novel type system to show its soundness. We develop a tool DeJITLeak for Java based on our approach and implement the fine-grained JIT compilation in HotSpot. Experimental results show that DeJITLeak can effectively and efficiently eliminate JIT-induced leaks on three datasets used in side-channel detection.

Preprint, 2020, arXiv

A Decision Procedure for Path Feasibility of String Manipulating Programs with Integer Data Type

Strings are widely used in programs, especially in web applications. Integer data type occurs naturally in string-manipulating programs, and is frequently used to refer to lengths of, or positions in, strings. Analysis and testing of string-manipulating programs can be formulated as the path feasibility problem: given a symbolic execution path, does there exist an assignment to the inputs that yields a concrete execution that realizes this path? Such a problem can naturally be reformulated as a string constraint solving problem. Although state-of-the-art string constraint solvers usually provide support for both string and integer data types, they mainly resort to heuristics without completeness guarantees. In this paper, we propose a decision procedure for a class of string-manipulating programs which includes not only a wide range of string operations such as concatenation, replaceAll, reverse, and finite transducers, but also those involving the integer data-type such as length, indexof, and substring. To the best of our knowledge, this represents one of the most expressive string constraint languages that is currently known to be decidable. Our decision procedure is based on a variant of cost register automata. We implement the decision procedure, giving rise to a new solver OSTRICH+. We evaluate the performance of OSTRICH+ on a wide range of existing and new benchmarks. The experimental results show that OSTRICH+ is the first string decision procedure capable of tackling finite transducers and integer constraints, whilst its overall performance is comparable with the state-of-the-art string constraint solvers.

Preprint, 2020, arXiv

A Hybrid Approach to Formal Verification of Higher-Order Masked Arithmetic Programs

Side-channel attacks, which are capable of breaking secrecy via side-channel information, pose a growing threat to the implementation of cryptographic algorithms. Masking is an effective countermeasure against side-channel attacks by removing the statistical dependence between secrecy and power consumption via randomization. However, designing efficient and effective masked implementations turns out to be an error-prone task. Current techniques for verifying whether masked programs are secure are limited in their applicability and accuracy, especially when they are applied. To bridge this gap, in this article, we first propose a sound type system, equipped with an efficient type inference algorithm, for verifying masked arithmetic programs against higher-order attacks. We then give novel model-counting based and pattern-matching based methods which are able to precisely determine whether the potential leaky observable sets detected by the type system are genuine or simply spurious. We evaluate our approach on various implementations of arithmetic cryptographic programs. The experiments confirm that our approach outperforms the state-of-the-art baselines in terms of applicability, accuracy and efficiency.

Preprint, 2020, arXiv

Finger Texture Biometric Characteristic: a Survey

In recent years, the Finger Texture (FT) has attracted considerable attention as a biometric characteristic. It can provide efficient human recognition performance, because it has different human-specific features of apparent lines, wrinkles and ridges distributed along the inner surface of all fingers. Also, such pattern structures are reliable, unique and remain stable throughout a human's life. Efficient biometric systems can be established based only on FTs. In this paper, a comprehensive survey of the relevant FT studies is presented. We also summarise the main drawbacks and obstacles of employing the FT as a biometric characteristic, and provide useful suggestions to further improve the work on FT.

Preprint, 2020, arXiv

Learning Safe Neural Network Controllers with Barrier Certificates

We provide a novel approach to synthesize controllers for nonlinear continuous dynamical systems with control against safety properties. The controllers are based on neural networks (NNs). To certify the safety property we utilize barrier functions, which are represented by NNs as well. We train the controller-NN and barrier-NN simultaneously, achieving a verification-in-the-loop synthesis. We provide a prototype tool nncontroller with a number of case studies. The experimental results confirm the feasibility and efficacy of our approach.

Preprint, 2014, arXiv

On the Total Variation Distance of Labelled Markov Chains

Labelled Markov chains (LMCs) are widely used in probabilistic verification, speech recognition, computational biology, and many other fields. Checking two LMCs for equivalence is a classical problem subject to extensive studies, while the total variation distance provides a natural measure for the "inequivalence" of two LMCs: it is the maximum difference between probabilities that the LMCs assign to the same event. In this paper we develop a theory of the total variation distance between two LMCs, with emphasis on the algorithmic aspects: (1) we provide a polynomial-time algorithm for determining whether two LMCs have distance 1, i.e., whether they can almost always be distinguished; (2) we provide an algorithm for approximating the distance with arbitrary precision; and (3) we show that the threshold problem, i.e., whether the distance exceeds a given threshold, is NP-hard and hard for the square-root-sum problem. We also make a connection between the total variation distance and Bernoulli convolutions.
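
To make the "maximum difference between probabilities that the LMCs assign to the same event" concrete, here is a minimal brute-force sketch that restricts attention to events determined by the first `n` emitted labels. It is not one of the paper's algorithms: it enumerates all words of length `n`, so it is exponential in `n`, and its result is only a lower bound on the true distance because it ignores events that depend on longer behaviour. The chain encoding (an initial distribution plus, per state, a list of `(probability, label, next_state)` triples) is an assumption made for this example.

```python
from itertools import product


def word_probability(chain, word):
    """Forward algorithm: probability that the chain emits `word` as a prefix."""
    initial, transitions = chain
    weights = dict(initial)
    for label in word:
        next_weights = {}
        for state, weight in weights.items():
            for prob, emitted, target in transitions[state]:
                if emitted == label:
                    next_weights[target] = next_weights.get(target, 0.0) + weight * prob
        weights = next_weights
    return sum(weights.values())


def tv_distance_on_prefixes(chain1, chain2, alphabet, n):
    """Total variation distance between the two distributions on words of length n."""
    total = 0.0
    for word in product(alphabet, repeat=n):
        total += abs(word_probability(chain1, word) - word_probability(chain2, word))
    return total / 2


if __name__ == "__main__":
    fair = ({"s": 1.0}, {"s": [(0.5, "a", "s"), (0.5, "b", "s")]})
    biased = ({"s": 1.0}, {"s": [(0.75, "a", "s"), (0.25, "b", "s")]})
    for n in (1, 2, 3):
        print(n, tv_distance_on_prefixes(fair, biased, "ab", n))
```

On this pair of one-state chains the bound grows with `n`, since longer prefixes give an observer more evidence for telling the two chains apart.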

Preprint, 2013, arXiv

Orbit Problem Revisited

In this letter, we revisit the orbit problem, which was studied in [HAR69, SHA79, KL86]. In [KL86], Kannan and Lipton proved that this problem is decidable in polynomial time. In this paper, we study the approximate orbit problem, and show that this problem is decidable except for one case.
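
For readers unfamiliar with the exact orbit problem, it asks, for a square rational matrix A and vectors x and y, whether some n >= 0 satisfies A^n x = y. The sketch below only illustrates that statement with a bounded search in exact rational arithmetic. It is neither the Kannan and Lipton polynomial-time procedure nor a decision procedure at all, since a `None` result says nothing beyond the chosen bound. The function names and the `max_steps` parameter are assumptions, and the approximate variant studied in the letter is not modelled here.

```python
from fractions import Fraction


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def reaches_within(matrix, start, target, max_steps):
    """Return the least n <= max_steps with A^n * start == target, else None."""
    current = [Fraction(v) for v in start]
    goal = [Fraction(v) for v in target]
    for n in range(max_steps + 1):
        if current == goal:
            return n
        current = mat_vec(matrix, current)
    return None


if __name__ == "__main__":
    shear = [[1, 1], [0, 1]]
    print(reaches_within(shear, [0, 1], [3, 1], max_steps=10))
    print(reaches_within(shear, [0, 1], [3, 2], max_steps=10))
```

Using `Fraction` keeps the comparison exact, which matters because the question is about equality rather than closeness.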

Preprint, 2013, arXiv

Solvency Markov Decision Processes with Interest

Solvency games, introduced by Berger et al., provide an abstract framework for modelling decisions of a risk-averse investor, whose goal is to avoid ever going broke. We study a new variant of this model, where, in addition to stochastic environment and fixed increments and decrements to the investor's wealth, we introduce interest, which is earned or paid on the current level of savings or debt, respectively. We study problems related to the minimum initial wealth sufficient to avoid bankruptcy (i.e. steady decrease of the wealth) with probability at least p. We present an exponential time algorithm which approximates this minimum initial wealth, and show that a polynomial time approximation is not possible unless P = NP. For the qualitative case, i.e. p=1, we show that the problem whether a given number is larger than or equal to the minimum initial wealth belongs to both NP and coNP, and show that a polynomial time algorithm would yield a polynomial time algorithm for mean-payoff games, existence of which is a longstanding open problem. We also identify some classes of solvency MDPs for which this problem is in P. In all above cases the algorithms also give corresponding bankruptcy avoiding strategies.

Preprint, 2012, arXiv

Model Checking Stochastic Branching Processes

Stochastic branching processes are a classical model for describing random trees, which have applications in numerous fields including biology, physics, and natural language processing. In particular, they have recently been proposed to describe parallel programs with stochastic process creation. In this paper, we consider the problem of model checking stochastic branching processes. Given a branching process and a deterministic parity tree automaton, we are interested in computing the probability that the generated random tree is accepted by the automaton. We show that this probability can be compared with any rational number in PSPACE, and with 0 and 1 in polynomial time. In a second part, we suggest a tree extension of the logic PCTL, and develop a PSPACE algorithm for model checking a branching process against a formula of this logic. We also show that the qualitative fragment of this logic can be model checked in polynomial time.

Preprint, 2011, arXiv

Model Checking of Continuous-Time Markov Chains Against Timed Automata Specifications

We study the verification of a finite continuous-time Markov chain (CTMC) C against a linear real-time specification given as a deterministic timed automaton (DTA) A with finite or Muller acceptance conditions. The central question that we address is: what is the probability of the set of paths of C that are accepted by A, i.e., the likelihood that C satisfies A? It is shown that under finite acceptance criteria this equals the reachability probability in a finite piecewise deterministic Markov process (PDP), whereas for Muller acceptance criteria it coincides with the reachability probability of terminal strongly connected components in such a PDP. Qualitative verification is shown to amount to a graph analysis of the PDP. Reachability probabilities in our PDPs are then characterized as the least solution of a system of Volterra integral equations of the second type and are shown to be approximated by the solution of a system of partial differential equations. For single-clock DTA, this integral equation system can be transformed into a system of linear equations where the coefficients are solutions of ordinary differential equations. As the coefficients are in fact transient probabilities in CTMCs, this result implies that standard algorithms for CTMC analysis suffice to verify single-clock DTA specifications.
