Source author record

James Cussens

James Cussens appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning math.CO math.ST Methodology Statistics Theory

Catalog footprint

What is connected

14works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Learning All Credible Bayesian Network Structures for Model Averaging

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known score-and-search approach. However, selecting a single model (i.e., the best scoring BN) can be misleading or may not achieve the best possible accuracy. An alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging, where the space of possible BNs is sampled or enumerated in some fashion. Unfortunately, existing approaches for model averaging either severely restrict the structure of the Bayesian network or have only been shown to scale to networks with fewer than 30 random variables. In this paper, we propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, our approach only considers credible models in that they are optimal or near-optimal in score. Second, our approach is more efficient and scales to significantly larger Bayesian networks than existing approaches.

preprint2020arXiv

On Pruning for Score-Based Bayesian Network Structure Learning

Many algorithms for score-based Bayesian network structure learning (BNSL), in particular exact ones, take as input a collection of potentially optimal parent sets for each variable in the data. Constructing such collections naively is computationally intensive since the number of parent sets grows exponentially with the number of variables. Thus, pruning techniques are not only desirable but essential. While good pruning rules exist for the Bayesian Information Criterion (BIC), current results for the Bayesian Dirichlet equivalent uniform (BDeu) score reduce the search space very modestly, hampering the use of the (often preferred) BDeu. We derive new non-trivial theoretical upper bounds for the BDeu score that considerably improve on the state-of-the-art. Since the new bounds are mathematically proven to be tighter than previous ones and at little extra computational cost, they are a promising addition to BNSL methods.

preprint2016arXiv

Bayesian Network Structure Learning with Integer Programming: Polytopes, Facets, and Complexity

The challenging task of learning structures of probabilistic graphical models is an important problem within modern AI research. Recent years have witnessed several major algorithmic advances in structure learning for Bayesian networks---arguably the most central class of graphical models---especially in what is known as the score-based setting. A successful generic approach to optimal Bayesian network structure learning (BNSL), based on integer programming (IP), is implemented in the GOBNILP system. Despite the recent algorithmic advances, current understanding of foundational aspects underlying the IP based approach to BNSL is still somewhat lacking. Understanding fundamental aspects of cutting planes and the related separation problem( is important not only from a purely theoretical perspective, but also since it holds out the promise of further improving the efficiency of state-of-the-art approaches to solving BNSL exactly. In this paper, we make several theoretical contributions towards these goals: (i) we study the computational complexity of the separation problem, proving that the problem is NP-hard; (ii) we formalise and analyse the relationship between three key polytopes underlying the IP-based approach to BNSL; (iii) we study the facets of the three polytopes both from the theoretical and practical perspective, providing, via exhaustive computation, a complete enumeration of facets for low-dimensional family-variable polytopes; and, furthermore, (iv) we establish a tight connection of the BNSL problem to the acyclic subgraph problem.

preprint2015arXiv

Advances in Bayesian Network Learning using Integer Programming

We consider the problem of learning Bayesian networks (BNs) from complete discrete data. This problem of discrete optimisation is formulated as an integer program (IP). We describe the various steps we have taken to allow efficient solving of this IP. These are (i) efficient search for cutting planes, (ii) a fast greedy algorithm to find high-scoring (perhaps not optimal) BNs and (iii) tightening the linear relaxation of the IP. After relating this BN learning problem to set covering and the multidimensional 0-1 knapsack problem, we present our empirical results. These show improvements, sometimes dramatic, over earlier results.

preprint2015arXiv

First-order integer programming for MAP problems

Finding the most probable (MAP) model in SRL frameworks such as Markov logic and Problog can, in principle, be solved by encoding the problem as a `grounded-out' mixed integer program (MIP). However, useful first-order structure disappears in this process motivating the development of first-order MIP approaches. Here we present mfoilp, one such approach. Since the syntax and semantics of mfoilp is essentially the same as existing approaches we focus here mainly on implementation and algorithmic issues. We start with the (conceptually) simple problem of using a logic program to generate a MIP instance before considering more ambitious exploitation of first-order representations.

preprint2015arXiv

Polyhedral aspects of score equivalence in Bayesian network structure learning

This paper deals with faces and facets of the family-variable polytope and the characteristic-imset polytope, which are special polytopes used in integer linear programming approaches to statistically learn Bayesian network structure. A common form of linear objectives to be maximized in this area leads to the concept of score equivalence (SE), both for linear objectives and for faces of the family-variable polytope. We characterize the linear space of SE objectives and establish a one-to-one correspondence between SE faces of the family-variable polytope, the faces of the characteristic-imset polytope, and standardized supermodular functions. The characterization of SE facets in terms of extremality of the corresponding supermodular function gives an elegant method to verify whether an inequality is SE-facet-defining for the family-variable polytope. We also show that when maximizing an SE objective one can eliminate linear constraints of the family-variable polytope that correspond to non-SE facets. However, we show that solely considering SE facets is not enough as a counter-example shows; one has to consider the linear inequality constraints that correspond to facets of the characteristic-imset polytope despite the fact that they may not define facets in the family-variable mode.

preprint2015arXiv

Searching Multiregression Dynamic Models of Resting-State fMRI Networks Using Integer Programming

A Multiregression Dynamic Model (MDM) is a class of multivariate time series that represents various dynamic causal processes in a graphical way. One of the advantages of this class is that, in contrast to many other Dynamic Bayesian Networks, the hypothesised relationships accommodate conditional conjugate inference. We demonstrate for the first time how straightforward it is to search over all possible connectivity networks with dynamically changing intensity of transmission to find the Maximum a Posteriori Probability (MAP) model within this class. This search method is made feasible by using a novel application of an Integer Programming algorithm. The efficacy of applying this particular class of dynamic models to this domain is shown and more specifically the computational efficiency of a corresponding search of 11-node Directed Acyclic Graph (DAG) model space. We proceed to show how diagnostic methods, analogous to those defined for static Bayesian Networks, can be used to suggest embellishment of the model class to extend the process of model selection. All methods are illustrated using simulated and real resting-state functional Magnetic Resonance Imaging (fMRI) data.

preprint2014arXiv

Exact Estimation of Multiple Directed Acyclic Graphs

This paper considers the problem of estimating the structure of multiple related directed acyclic graph (DAG) models. Building on recent developments in exact estimation of DAGs using integer linear programming (ILP), we present an ILP approach for joint estimation over multiple DAGs, that does not require that the vertices in each DAG share a common ordering. Furthermore, we allow also for (potentially unknown) dependency structure between the DAGs. Results are presented on both simulated data and fMRI data obtained from multiple subjects.

preprint2013arXiv

Loglinear models for first-order probabilistic reasoning

Recent work on loglinear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning.

preprint2013arXiv

Markov Chain Monte Carlo using Tree-Based Priors on Model Structure

We present a general framework for defining priors on model structure and sampling from the posterior using the Metropolis-Hastings algorithm. The key idea is that structure priors are defined via a probability tree and that the proposal mechanism for the Metropolis-Hastings algorithm operates by traversing this tree, thereby defining a cheaply computable acceptance probability. We have applied this approach to Bayesian net structure learning using a number of priors and tree traversal strategies. Our results show that these must be chosen appropriately for this approach to be successful.

preprint2013arXiv

Stochastic Logic Programs: Sampling, Inference and Applications

Algorithms for exact and approximate inference in stochastic logic programs (SLPs) are presented, based respectively, on variable elimination and importance sampling. We then show how SLPs can be used to represent prior distributions for machine learning, using (i) logic programs and (ii) Bayes net structures as examples. Drawing on existing work in statistics, we apply the Metropolis-Hasting algorithm to construct a Markov chain which samples from the posterior distribution. A Prolog implementation for this is described. We also discuss the possibility of constructing explicit representations of the posterior.

preprint2012arXiv

Bayesian network learning by compiling to weighted MAX-SAT

The problem of learning discrete Bayesian networks from data is encoded as a weighted MAX-SAT problem and the MaxWalkSat local search algorithm is used to address it. For each dataset, the per-variable summands of the (BDeu) marginal likelihood for different choices of parents ('family scores') are computed prior to applying MaxWalkSat. Each permissible choice of parents for each variable is encoded as a distinct propositional atom and the associated family score encoded as a 'soft' weighted single-literal clause. Two approaches to enforcing acyclicity are considered: either by encoding the ancestor relation or by attaching a total order to each graph and encoding that. The latter approach gives better results. Learning experiments have been conducted on 21 synthetic datasets sampled from 7 BNs. The largest dataset has 10,000 datapoints and 60 variables producing (for the 'ancestor' encoding) a weighted CNF input file with 19,932 atoms and 269,367 clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu score than the 'true' BN. The effect of adding prior information is assessed. It is further shown that Bayesian model averaging can be effected by collecting BNs generated during the search.

preprint2012arXiv

Bayesian network learning with cutting planes

The problem of learning the structure of Bayesian networks from complete discrete data with a limit on parent set size is considered. Learning is cast explicitly as an optimisation problem where the goal is to find a BN structure which maximises log marginal likelihood (BDe score). Integer programming, specifically the SCIP framework, is used to solve this optimisation problem. Acyclicity constraints are added to the integer program (IP) during solving in the form of cutting planes. Finding good cutting planes is the key to the success of the approach -the search for such cutting planes is effected using a sub-IP. Results show that this is a particularly fast method for exact BN learning.

preprint2012arXiv

CLP(BN): Constraint Logic Programming for Probabilistic Knowledge

We present CLP(BN), a novel approach that aims at expressing Bayesian networks through the constraint logic programming framework. Arguably, an important limitation of traditional Bayesian networks is that they are propositional, and thus cannot represent relations between multiple similar objects in multiple contexts. Several researchers have thus proposed first-order languages to describe such networks. Namely, one very successful example of this approach are the Probabilistic Relational Models (PRMs), that combine Bayesian networks with relational database technology. The key difficulty that we had to address when designing CLP(cal{BN}) is that logic based representations use ground terms to denote objects. With probabilitic data, we need to be able to uniquely represent an object whose value we are not sure about. We use {sl Skolem functions} as unique new symbols that uniquely represent objects with unknown value. The semantics of CLP(cal{BN}) programs then naturally follow from the general framework of constraint logic programming, as applied to a specific domain where we have probabilistic data. This paper introduces and defines CLP(cal{BN}), and it describes an implementation and initial experiments. The paper also shows how CLP(cal{BN}) relates to Probabilistic Relational Models (PRMs), Ngo and Haddawys Probabilistic Logic Programs, AND Kersting AND De Raedts Bayesian Logic Programs.

James Cussens

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Learning All Credible Bayesian Network Structures for Model Averaging

On Pruning for Score-Based Bayesian Network Structure Learning

Bayesian Network Structure Learning with Integer Programming: Polytopes, Facets, and Complexity

Advances in Bayesian Network Learning using Integer Programming

First-order integer programming for MAP problems

Polyhedral aspects of score equivalence in Bayesian network structure learning

Searching Multiregression Dynamic Models of Resting-State fMRI Networks Using Integer Programming

Exact Estimation of Multiple Directed Acyclic Graphs

Loglinear models for first-order probabilistic reasoning

Markov Chain Monte Carlo using Tree-Based Priors on Model Structure

Stochastic Logic Programs: Sampling, Inference and Applications

Bayesian network learning by compiling to weighted MAX-SAT

Bayesian network learning with cutting planes

CLP(BN): Constraint Logic Programming for Probabilistic Knowledge