Source author record

Peter J. Stuckey

Peter J. Stuckey appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Programming Languages Logic in Computer Science Data Structures and Algorithms Machine Learning cs.CY Applications Biomolecules Formal Languages and Automata Theory Performance Quantitative Methods Symbolic Computation

Catalog footprint

What is connected

25works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Assessing the accuracy of the Australian Senate count: Key steps for a rigorous and transparent audit

This paper explains the main principles and some of the technical details for auditing the scanning and digitisation of the Australian Senate ballot papers. We give a short summary of the motivation for auditing paper ballots, explain the necessary supporting steps for a rigorous and transparent audit, and suggest some statistical methods that would be appropriate for the Australian Senate. 22 June 2022 Update: The update includes analysis of Senate preference data from the 2022 Australian election.

preprint2022arXiv

MurTree: Optimal Classification Trees via Dynamic Programming and Search

Decision tree learning is a widely used approach in machine learning, favoured in applications that require concise and interpretable models. Heuristic methods are traditionally used to quickly produce models with reasonably high accuracy. A commonly criticised point, however, is that the resulting trees may not necessarily be the best representation of the data in terms of accuracy and size. In recent years, this motivated the development of optimal classification tree algorithms that globally optimise the decision tree in contrast to heuristic methods that perform a sequence of locally optimal decisions. We follow this line of work and provide a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our algorithm supports constraints on the depth of the tree and number of nodes. The success of our approach is attributed to a series of specialised techniques that exploit properties unique to classification trees. Whereas algorithms for optimal classification trees have traditionally been plagued by high runtimes and limited scalability, we show in a detailed experimental study that our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances, providing several orders of magnitude improvements and notably contributing towards the practical realisation of optimal decision trees.

preprint2021arXiv

A Scalable Two Stage Approach to Computing Optimal Decision Sets

Machine learning (ML) is ubiquitous in modern life. Since it is being deployed in technologies that affect our privacy and safety, it is often crucial to understand the reasoning behind its decisions, warranting the need for explainable AI. Rule-based models, such as decision trees, decision lists, and decision sets, are conventionally deemed to be the most interpretable. Recent work uses propositional satisfiability (SAT) solving (and its optimization variants) to generate minimum-size decision sets. Motivated by limited practical scalability of these earlier methods, this paper proposes a novel approach to learn minimum-size decision sets by enumerating individual rules of the target decision set independently of each other, and then solving a set cover problem to select a subset of rules. The approach makes use of modern maximum satisfiability and integer linear programming technologies. Experiments on a wide range of publicly available datasets demonstrate the advantage of the new approach over the state of the art in SAT-based decision set learning.

preprint2021arXiv

Optimal Decision Trees for Nonlinear Metrics

Nonlinear metrics, such as the F1-score, Matthews correlation coefficient, and Fowlkes-Mallows index, are often used to evaluate the performance of machine learning models, in particular, when facing imbalanced datasets that contain more samples of one class than the other. Recent optimal decision tree algorithms have shown remarkable progress in producing trees that are optimal with respect to linear criteria, such as accuracy, but unfortunately nonlinear metrics remain a challenge. To address this gap, we propose a novel algorithm based on bi-objective optimisation, which treats misclassifications of each binary class as a separate objective. We show that, for a large class of metrics, the optimal tree lies on the Pareto frontier. Consequently, we obtain the optimal tree by using our method to generate the set of all nondominated trees. To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics. Our approach leads to a trade-off when compared to optimising linear metrics: the resulting trees may be more desirable according to the given nonlinear metric at the expense of higher runtimes. Nevertheless, the experiments illustrate that runtimes are reasonable for majority of the tested datasets.

preprint2020arXiv

Computing Optimal Decision Sets with SAT

As machine learning is increasingly used to help make decisions, there is a demand for these decisions to be explainable. Arguably, the most explainable machine learning models use decision rules. This paper focuses on decision sets, a type of model with unordered rules, which explains each prediction with a single rule. In order to be easy for humans to understand, these rules must be concise. Earlier work on generating optimal decision sets first minimizes the number of rules, and then minimizes the number of literals, but the resulting rules can often be very large. Here we consider a better measure, namely the total size of the decision set in terms of literals. So we are not driven to a small set of rules which require a large number of literals. We provide the first approach to determine minimum-size decision sets that achieve minimum empirical risk and then investigate sparse alternatives where we trade accuracy for size. By finding optimal solutions we show we can build decision set classifiers that are almost as accurate as the best heuristic methods, but far more concise, and hence more explainable.

preprint2020arXiv

You can do RLAs for IRV

The City and County of San Francisco, CA, has used Instant Runoff Voting (IRV) for some elections since 2004. This report describes the first ever process pilot of Risk Limiting Audits for IRV, for the San Francisco District Attorney's race in November, 2019. We found that the vote-by-mail outcome could be efficiently audited to well under the 0.05 risk limit given a sample of only 200 ballots. All the software we developed for the pilot is open source.

preprint2016arXiv

MiniZinc with Strings

Strings are extensively used in modern programming languages and constraints over strings of unknown length occur in a wide range of real-world applications such as software analysis and verification, testing, model checking, and web security. Nevertheless, practically no CP solver natively supports string constraints. We introduce string variables and a suitable set of string constraints as builtin features of the MiniZinc modelling language. Furthermore, we define an interpreter for converting a MiniZinc model with strings into a FlatZinc instance relying on only integer variables. This provides a user-friendly interface for modelling combinatorial problems with strings, and enables both string and non-string solvers to actually solve such problems.

preprint2015arXiv

Efficient Computation of Exact IRV Margins

The margin of victory is easy to compute for many election schemes but difficult for Instant Runoff Voting (IRV). This is important because arguments about the correctness of an election outcome usually rely on the size of the electoral margin. For example, risk-limiting audits require a knowledge of the margin of victory in order to determine how much auditing is necessary. This paper presents a practical branch-and-bound algorithm for exact IRV margin computation that substantially improves on the current best-known approach. Although exponential in the worst case, our algorithm runs efficiently in practice on all the real examples we could find. We can efficiently discover exact margins on election instances that cannot be solved by the current state-of-the-art.

preprint2015arXiv

Horn Clauses as an Intermediate Representation for Program Analysis and Transformation

Many recent analyses for conventional imperative programs begin by transforming programs into logic programs, capitalising on existing LP analyses and simple LP semantics. We propose using logic programs as an intermediate program representation throughout the compilation process. With restrictions ensuring determinism and single-modedness, a logic program can easily be transformed to machine language or other low-level language, while maintaining the simple semantics that makes it suitable as a language for program analysis and transformation. We present a simple LP language that enforces determinism and single-modedness, and show that it makes a convenient program representation for analysis and transformation.

preprint2015arXiv

Unsatisfiable Cores and Lower Bounding for Constraint Programming

Constraint Programming (CP) solvers typically tackle optimization problems by repeatedly finding solutions to a problem while placing tighter and tighter bounds on the solution cost. This approach is somewhat naive, especially for soft-constraint optimization problems in which the soft constraints are mostly satisfied. Unsatisfiable-core approaches to solving soft constraint problems in SAT (e.g. MAXSAT) force all soft constraints to be hard initially. When solving fails they return an unsatisfiable core, as a set of soft constraints that cannot hold simultaneously. These are reverted to soft and solving continues. Since lazy clause generation solvers can also return unsatisfiable cores we can adapt this approach to constraint programming. We adapt the original MAXSAT unsatisfiable core solving approach to be usable for constraint programming and define a number of extensions. Experimental results show that our methods are beneficial on a broad class of CP-optimization benchmarks involving soft constraints, cardinality or preferences.

preprint2014arXiv

A Complete Refinement Procedure for Regular Separability of Context-Free Languages

Often, when analyzing the behaviour of systems modelled as context-free languages, we wish to know if two languages overlap. To this end, we present an effective semi-decision procedure for regular separability of context-free languages, based on counter-example guided abstraction refinement. We propose two refinement methods, one inexpensive but incomplete, and the other complete but more expensive. We provide an experimental evaluation of this procedure, and demonstrate its practicality on a range of verification and language-theoretic instances.

preprint2014arXiv

A Partial-Order Approach to Array Content Analysis

We present a parametric abstract domain for array content analysis. The method maintains invariants for contiguous regions of the array, similar to the methods of Gopan, Reps and Sagiv, and of Halbwachs and Peron. However, it introduces a novel concept of an array content graph, avoiding the need for an up-front factorial partitioning step. The resulting analysis can be used with arbitrary numeric relational abstract domains; we evaluate the domain on a range of array manipulating program fragments.

preprint2013arXiv

Statistical Inference of a canonical dictionary of protein substructural fragments

Proteins are biomolecules of life. They fold into a great variety of three-dimensional (3D) shapes. Underlying these folding patterns are many recurrent structural fragments or building blocks (analogous to `LEGO bricks'). This paper reports an innovative statistical inference approach to discover a comprehensive dictionary of protein structural building blocks from a large corpus of experimentally determined protein structures. Our approach is built on the Bayesian and information-theoretic criterion of minimum message length. To the best of our knowledge, this work is the first systematic and rigorous treatment of a very important data mining problem that arises in the cross-disciplinary area of structural bioinformatics. The quality of the dictionary we find is demonstrated by its explanatory power -- any protein within the corpus of known 3D structures can be dissected into successive regions assigned to fragments from this dictionary. This induces a novel one-dimensional representation of three-dimensional protein folding patterns, suitable for application of the rich repertoire of character-string processing algorithms, for rapid identification of folding patterns of newly-determined structures. This paper presents the details of the methodology used to infer the dictionary of building blocks, and is supported by illustrative examples to demonstrate its effectiveness and utility.

preprint2013arXiv

Structure Based Extended Resolution for Constraint Programming

Nogood learning is a powerful approach to reducing search in Constraint Programming (CP) solvers. The current state of the art, called Lazy Clause Generation (LCG), uses resolution to derive nogoods expressing the reasons for each search failure. Such nogoods can prune other parts of the search tree, producing exponential speedups on a wide variety of problems. Nogood learning solvers can be seen as resolution proof systems. The stronger the proof system, the faster it can solve a CP problem. It has recently been shown that the proof system used in LCG is at least as strong as general resolution. However, stronger proof systems such as \emph{extended resolution} exist. Extended resolution allows for literals expressing arbitrary logical concepts over existing variables to be introduced and can allow exponentially smaller proofs than general resolution. The primary problem in using extended resolution is to figure out exactly which literals are useful to introduce. In this paper, we show that we can use the structural information contained in a CP model in order to introduce useful literals, and that this can translate into significant speedups on a range of problems.

preprint2013arXiv

Unsatisfiable Cores for Constraint Programming

Constraint Programming (CP) solvers typically tackle optimization problems by repeatedly finding solutions to a problem while placing tighter and tighter bounds on the solution cost. This approach is somewhat naive, especially for soft-constraint optimization problems in which the soft constraints are mostly satisfied. Unsatisfiable-core approaches to solving soft constraint problems in Boolean Satisfiability (e.g. MAXSAT) force all soft constraints to hold initially. When solving fails they return an unsatisfiable core, as a set of soft constraints that cannot hold simultaneously. Using this information the problem is relaxed to allow certain soft constraint(s) to be violated and solving continues. Since Lazy Clause Generation (LCG) solvers can also return unsatisfiable cores we can adapt the MAXSAT unsatisfiable core approach to CP. We implement the original MAXSAT unsatisfiable core solving algorithms WPM1, MSU3 in a state-of-the-art LCG solver and show that there exist problems which benefit from this hybrid approach.

preprint2012arXiv

Explaining Time-Table-Edge-Finding Propagation for the Cumulative Resource Constraint

Cumulative resource constraints can model scarce resources in scheduling problems or a dimension in packing and cutting problems. In order to efficiently solve such problems with a constraint programming solver, it is important to have strong and fast propagators for cumulative resource constraints. One such propagator is the recently developed time-table-edge-finding propagator, which considers the current resource profile during the edge-finding propagation. Recently, lazy clause generation solvers, i.e. constraint programming solvers incorporating nogood learning, have proved to be excellent at solving scheduling and cutting problems. For such solvers, concise and accurate explanations of the reasons for propagation are essential for strong nogood learning. In this paper, we develop the first explaining version of time-table-edge-finding propagation and show preliminary results on resource-constrained project scheduling problems from various standard benchmark suites. On the standard benchmark suite PSPLib, we were able to close one open instance and to improve the lower bound of about 60% of the remaining open instances. Moreover, 6 of those instances were closed.

preprint2012arXiv

Search Combinators

The ability to model search in a constraint solver can be an essential asset for solving combinatorial problems. However, existing infrastructure for defining search heuristics is often inadequate. Either modeling capabilities are extremely limited or users are faced with a general-purpose programming language whose features are not tailored towards writing search heuristics. As a result, major improvements in performance may remain unexplored. This article introduces search combinators, a lightweight and solver-independent method that bridges the gap between a conceptually simple modeling language for search (high-level, functional and naturally compositional) and an efficient implementation (low-level, imperative and highly non-modular). By allowing the user to define application-tailored search strategies from a small set of primitives, search combinators effectively provide a rich domain-specific language (DSL) for modeling search to the user. Remarkably, this DSL comes at a low implementation cost to the developer of a constraint solver. The article discusses two modular implementation approaches and shows, by empirical evaluation, that search combinators can be implemented without overhead compared to a native, direct implementation in a constraint solver.

preprint2011arXiv

Boolean Equi-propagation for Optimized SAT Encoding

We present an approach to propagation based solving, Boolean equi-propagation, where constraints are modelled as propagators of information about equalities between Boolean literals. Propagation based solving applies this information as a form of partial evaluation resulting in optimized SAT encodings. We demonstrate for a variety of benchmarks that our approach results in smaller CNF encodings and leads to speed-ups in solving times.

preprint2010arXiv

Solving the Resource Constrained Project Scheduling Problem with Generalized Precedences by Lazy Clause Generation

The technical report presents a generic exact solution approach for minimizing the project duration of the resource-constrained project scheduling problem with generalized precedences (Rcpsp/max). The approach uses lazy clause generation, i.e., a hybrid of finite domain and Boolean satisfiability solving, in order to apply nogood learning and conflict-driven search on the solution generation. Our experiments show the benefit of lazy clause generation for finding an optimal solutions and proving its optimality in comparison to other state-of-the-art exact and non-exact methods. The method is highly robust: it matched or bettered the best known results on all of the 2340 instances we examined except 3, according to the currently available data on the PSPLib. Of the 631 open instances in this set it closed 573 and improved the bounds of 51 of the remaining 58 instances.

preprint2007arXiv

Logic Programming with Satisfiability

This paper presents a Prolog interface to the MiniSat satisfiability solver. Logic program- ming with satisfiability combines the strengths of the two paradigms: logic programming for encoding search problems into satisfiability on the one hand and efficient SAT solving on the other. This synergy between these two exposes a programming paradigm which we propose here as a logic programming pearl. To illustrate logic programming with SAT solving we give an example Prolog program which solves instances of Partial MAXSAT.

preprint2006arXiv

Efficient constraint propagation engines

This paper presents a model and implementation techniques for speeding up constraint propagation. Three fundamental approaches to improving constraint propagation based on propagators as implementations of constraints are explored: keeping track of which propagators are at fixpoint, choosing which propagator to apply next, and how to combine several propagators for the same constraint. We show how idempotence reasoning and events help track fixpoints more accurately. We improve these methods by using them dynamically (taking into account current domains to improve accuracy). We define priority-based approaches to choosing a next propagator and show that dynamic priorities can improve propagation. We illustrate that the use of multiple propagators for the same constraint can be advantageous with priorities, and introduce staged propagators that combine the effects of multiple propagators with priorities for greater efficiency.

preprint2005arXiv

Improving PARMA Trailing

Taylor introduced a variable binding scheme for logic variables in his PARMA system, that uses cycles of bindings rather than the linear chains of bindings used in the standard WAM representation. Both the HAL and dProlog languages make use of the PARMA representation in their Herbrand constraint solvers. Unfortunately, PARMA's trailing scheme is considerably more expensive in both time and space consumption. The aim of this paper is to present several techniques that lower the cost. First, we introduce a trailing analysis for HAL using the classic PARMA trailing scheme that detects and eliminates unnecessary trailings. The analysis, whose accuracy comes from HAL's determinism and mode declarations, has been integrated in the HAL compiler and is shown to produce space improvements as well as speed improvements. Second, we explain how to modify the classic PARMA trailing scheme to halve its trailing cost. This technique is illustrated and evaluated both in the context of dProlog and HAL. Finally, we explain the modifications needed by the trailing analysis in order to be combined with our modified PARMA trailing scheme. Empirical evidence shows that the combination is more effective than any of the techniques when used in isolation. To appear in Theory and Practice of Logic Programming.

preprint2005arXiv

Solving Partial Order Constraints for LPO Termination

This paper introduces a new kind of propositional encoding for reasoning about partial orders. The symbols in an unspecified partial order are viewed as variables which take integer values and are interpreted as indices in the order. For a partial order statement on n symbols each index is represented in log2 n propositional variables and partial order constraints between symbols are modeled on the bit representations. We illustrate the application of our approach to determine LPO termination for term rewrite systems. Experimental results are unequivocal, indicating orders of magnitude speedups in comparison with current implementations for LPO termination. The proposed encoding is general and relevant to other applications which involve propositional reasoning about partial orders.

preprint2004arXiv

Checking modes of HAL programs

Recent constraint logic programming (CLP) languages, such as HAL and Mercury, require type, mode and determinism declarations for predicates. This information allows the generation of efficient target code and the detection of many errors at compile-time. Unfortunately, mode checking in such languages is difficult. One of the main reasons is that, for each predicate mode declaration, the compiler is required to appropriately re-order literals in the predicate's definition. The task is further complicated by the need to handle complex instantiations (which interact with type declarations and higher-order predicates) and automatic initialization of solver variables. Here we define mode checking for strongly typed CLP languages which require reordering of clause body literals. In addition, we show how to handle a simple case of polymorphic modes by using the corresponding polymorphic types.

preprint2004arXiv

Optimizing compilation of constraint handling rules in HAL

In this paper we discuss the optimizing compilation of Constraint Handling Rules (CHRs). CHRs are a multi-headed committed choice constraint language, commonly applied for writing incremental constraint solvers. CHRs are usually implemented as a language extension that compiles to the underlying language. In this paper we show how we can use different kinds of information in the compilation of CHRs in order to obtain access efficiency, and a better translation of the CHR rules into the underlying language, which in this case is HAL. The kinds of information used include the types, modes, determinism, functional dependencies and symmetries of the CHR constraints. We also show how to analyze CHR programs to determine this information about functional dependencies, symmetries and other kinds of information supporting optimizations.

Peter J. Stuckey

What is connected

Connect this record

See the researcher in context

Building this map preview

25 published item(s)

Assessing the accuracy of the Australian Senate count: Key steps for a rigorous and transparent audit

MurTree: Optimal Classification Trees via Dynamic Programming and Search

A Scalable Two Stage Approach to Computing Optimal Decision Sets

Optimal Decision Trees for Nonlinear Metrics

Computing Optimal Decision Sets with SAT

You can do RLAs for IRV

MiniZinc with Strings

Efficient Computation of Exact IRV Margins

Horn Clauses as an Intermediate Representation for Program Analysis and Transformation

Unsatisfiable Cores and Lower Bounding for Constraint Programming

A Complete Refinement Procedure for Regular Separability of Context-Free Languages

A Partial-Order Approach to Array Content Analysis

Statistical Inference of a canonical dictionary of protein substructural fragments

Structure Based Extended Resolution for Constraint Programming

Unsatisfiable Cores for Constraint Programming

Explaining Time-Table-Edge-Finding Propagation for the Cumulative Resource Constraint

Search Combinators

Boolean Equi-propagation for Optimized SAT Encoding

Solving the Resource Constrained Project Scheduling Problem with Generalized Precedences by Lazy Clause Generation

Logic Programming with Satisfiability

Efficient constraint propagation engines

Improving PARMA Trailing

Solving Partial Order Constraints for LPO Termination

Checking modes of HAL programs

Optimizing compilation of constraint handling rules in HAL