Source author record

Thibaut Vidal

Thibaut Vidal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.OC, Data Structures and Algorithms, Artificial Intelligence, Cryptography and Security, Discrete Mathematics, Social and Information Networks

Catalog footprint

What is connected

19 works

7 topics

4 close collaborators

Preprint, 2026, arXiv

Optimal Counterfactual Search in Tree Ensembles: A Study Across Modeling and Solution Paradigms

Trust in counterfactual explanations depends critically on whether their recommended changes are truly minimal: suboptimal explanations may vastly overshoot the actual changes needed to alter a decision, and heuristic errors can affect individuals unevenly, giving some users relevant recourse while assigning others unnecessarily costly recommendations. Consequently, we study the problem of computing optimal counterfactual explanations for tree ensembles under plausibility and actionability constraints. This is a combinatorial problem: for a fixed model, counterfactual search boils down to selecting consistent branching decisions and threshold-defined regions under a distance objective. We exploit this structure through CPCF, a constraint programming (CP) formulation in which numerical features are encoded as interval domains induced by split thresholds, while discrete features retain native finite-domain representations. This yields a compact finite-domain formulation that supports multiple distance objectives without continuous split-boundary search. We then place CPCF in a broader comparison across mathematical programming paradigms: we extend a maximum Boolean satisfiability (MaxSAT) formulation, originally designed for hard-voting random forests, to soft-voting ensembles, and compare against the current state-of-the-art mixed-integer linear programming (MILP) optimal approach. Across ten datasets and three types of tree ensembles, we analyze scalability, anytime performance, and sensitivity to distance metrics. We observe that CP achieves the best overall performance. More importantly, our results identify regimes in which the specific strengths of each paradigm make it best suited: CP is most versatile overall, MaxSAT handles hard-voting ensembles particularly well, and MILP remains competitive in amortized inference settings with a moderate number of split levels.
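As a rough illustration of the interval-domain idea described above, the sketch below turns the split thresholds of one numerical feature into a finite list of intervals and maps a feature value to its interval index. It is not the CPCF formulation itself; the function names and the convention that a split tests `x <= t` (so intervals are closed on the right) are assumptions made for this example.

```python
from bisect import bisect_left

def interval_domains(thresholds):
    """Intervals (lo, hi] induced by the distinct split thresholds of a feature.

    Assumption: every split has the form x <= t, so a value equal to a
    threshold belongs to the interval ending at that threshold.
    """
    cuts = sorted(set(thresholds))
    bounds = [float("-inf")] + cuts + [float("inf")]
    return list(zip(bounds[:-1], bounds[1:]))

def interval_index(thresholds, value):
    """Index of the interval (in interval_domains order) that contains value."""
    cuts = sorted(set(thresholds))
    # Number of thresholds strictly below value = index of its (lo, hi] interval.
    return bisect_left(cuts, value)

# Thresholds collected from all trees that split on this feature.
domains = interval_domains([5.0, 2.0, 5.0])
index = interval_index([5.0, 2.0, 5.0], 3.7)
```

All values inside one interval follow the same branches in every tree, which is why a finite-domain variable per feature can stand in for the continuous value during the search.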

Preprint, 2026, arXiv

PACE: Prune-And-Compress Ensemble Models

Ensemble models achieve state-of-the-art performance on prediction tasks, but usually require aggregating a large number of weak learners. This can hinder deployment, interpretability, and downstream tasks such as robustness verification. Remedies to this issue fall into two main camps: pruning, which discards redundant learners, and compression, which generates new ones from scratch. We introduce PACE, a framework that interleaves these paradigms in a two-phase strategy. First, new learners are actively generated via a theoretically grounded procedure to enhance the diversity of the initial ensemble. When no more relevant learners can be found, a second phase of pruning is performed on this enriched ensemble. During both operations, PACE allows fine control over the faithfulness to the original ensemble. Experiments show that our method outperforms prior pruning and compression methods while offering principled control of faithfulness guarantees.

Preprint, 2026, arXiv

The XL Instances for the Capacitated Vehicle Routing Problem

This paper introduces a new set of large-scale benchmark instances for the Capacitated Vehicle Routing Problem (CVRP). The proposed XL set extends existing benchmarks by covering instances with 1,000 to 10,000 customers and a wide range of structural characteristics, following established generation principles from prior CVRP studies. A computational study involving several state-of-the-art algorithms is conducted to provide initial best known solutions (BKSs) for the XL instances, which serve as a baseline for a community-driven BKS challenge launched on the CVRPLib website. The instances are made publicly available to support experimental evaluation and comparison of solution methods. Furthermore, additional computational analyses are reported to compare algorithmic performance on other existing CVRP benchmark instances.

Preprint, 2026, arXiv

Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Recent research has shown that structured machine learning models such as tree ensembles are vulnerable to privacy attacks targeting their training data. To mitigate these risks, differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protection. In this paper, we introduce a reconstruction attack targeting state-of-the-art $ε$-DP random forests. By leveraging a constraint programming model that incorporates knowledge of the forest's structure and DP mechanism characteristics, our approach formally reconstructs the most likely dataset that could have produced a given forest. Through extensive computational experiments, we examine the interplay between model utility, privacy guarantees and reconstruction accuracy across various configurations. Our results reveal that random forests trained with meaningful DP guarantees can still leak portions of their training data. Specifically, while DP reduces the success of reconstruction attacks, the only forests fully robust to our attack exhibit predictive performance no better than a constant classifier. Building on these insights, we also provide practical recommendations for the construction of DP random forests that are more resilient to reconstruction attacks while maintaining a non-trivial predictive performance.

Preprint, 2022, arXiv

Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem

We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. Up to now, state-of-the-art methods have relied on regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper level. We present a mixed-integer linear program reformulation for the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers ground-truth features even for instances with only a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to obtain similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.
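To make the upper-level objective concrete, here is a minimal sketch of how the average newsvendor cost of a linear ordering rule could be evaluated on a held-out validation set for a given feature subset. The names `newsvendor_cost` and `validation_cost`, the linear decision rule, and the per-unit `underage` and `overage` costs are assumptions for illustration; the sketch only evaluates a candidate and does not solve the bilevel program.

```python
def newsvendor_cost(order, demand, underage, overage):
    """Cost of one ordering decision: pay for shortage or for leftover units."""
    return underage * max(demand - order, 0.0) + overage * max(order - demand, 0.0)

def validation_cost(coeffs, selected, features, demands, underage, overage):
    """Average newsvendor cost on a validation set.

    The order quantity is a linear function of the selected features only;
    coefficients of unselected features are ignored.
    """
    total = 0.0
    for x, d in zip(features, demands):
        order = sum(coeffs[j] * x[j] for j in selected)
        total += newsvendor_cost(order, d, underage, overage)
    return total / len(demands)

# Hypothetical data: feature 0 is an intercept, feature 2 is irrelevant noise.
features = [[1.0, 4.0, 100.0], [1.0, 3.0, 100.0]]
demands = [10.0, 7.0]
cost = validation_cost([1.0, 2.0, 0.5], [0, 1], features, demands,
                       underage=3.0, overage=1.0)
```

In the bilevel setting described above, the coefficients would come from the lower-level training problem, and the upper level would compare this validation cost across candidate feature subsets.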

Preprint, 2022, arXiv

Exponential-Size Neighborhoods for the Pickup-and-Delivery Traveling Salesman Problem

Neighborhood search is a cornerstone of state-of-the-art traveling salesman and vehicle routing metaheuristics. While neighborhood exploration procedures are well developed for problems with individual services, their counterparts for one-to-one pickup-and-delivery problems have received far less attention. A direct extension of classic neighborhoods is often inefficient or complex due to the necessity of jointly considering service pairs. To circumvent these issues, we introduce major improvements to existing neighborhood searches for the pickup-and-delivery traveling salesman problem and new large neighborhoods. We show that the classical Relocate-Pair neighborhood can be fully explored in $O(n^2)$ instead of $O(n^3)$ time. We adapt the 4-Opt and Balas-Simonetti neighborhoods to consider precedence constraints. Moreover, we introduce an exponential-size neighborhood called 2k-Opt, which includes all solutions generated by multiple nested 2-Opts and can be searched in $O(n^2)$ time using dynamic programming. We conduct extensive computational experiments, highlighting the significant contribution of these new neighborhoods and speed-up strategies within two classical metaheuristics. Notably, our approach makes it possible to repeatedly solve small pickup-and-delivery problem instances to optimality or near-optimality within milliseconds, and therefore it represents a valuable tool for time-critical applications such as meal delivery or mobility on demand.

Preprint, 2022, arXiv

Optimal Decision Diagrams for Classification

Decision diagrams for classification have some notable advantages over decision trees, as their internal connections can be determined at training time and their width is not bound to grow exponentially with their depth. Accordingly, decision diagrams are usually less prone to data fragmentation in internal nodes. However, the inherent complexity of training these classifiers acted as a long-standing barrier to their widespread adoption. In this context, we study the training of optimal decision diagrams (ODDs) from a mathematical programming perspective. We introduce a novel mixed-integer linear programming model for training and demonstrate its applicability for many datasets of practical importance. Further, we show how this model can be easily extended for fairness, parsimony, and stability notions. We present numerical analyses showing that our model allows training ODDs in short computational times, and that ODDs achieve better accuracy than optimal decision trees, while allowing for improved stability without significant accuracy losses.

Preprint, 2022, arXiv

Support Vector Machines with the Hard-Margin Loss: Optimal Training via Combinatorial Benders' Cuts

The classical hinge-loss support vector machines (SVMs) model is sensitive to outlier observations due to the unboundedness of its loss function. To circumvent this issue, recent studies have focused on non-convex loss functions, such as the hard-margin loss, which associates a constant penalty to any misclassified or within-margin sample. Applying this loss function yields much-needed robustness for critical applications but it also leads to an NP-hard model that makes training difficult, since current exact optimization algorithms show limited scalability, whereas heuristics are not able to find high-quality solutions consistently. Against this background, we propose new integer programming strategies that significantly improve our ability to train the hard-margin SVM model to global optimality. We introduce an iterative sampling and decomposition approach, in which smaller subproblems are used to separate combinatorial Benders' cuts. Those cuts, used within a branch-and-cut algorithm, enable much faster convergence towards a global optimum. Through extensive numerical analyses on classical benchmark datasets, our solution algorithm solves, for the first time, 117 new datasets to optimality and achieves a reduction of 50% in the average optimality gap for the hardest datasets of the benchmark.
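The difference between the two losses can be seen on a single sample. The sketch below assumes labels in {-1, +1}, a linear classifier with weights `w` and bias `b`, and a unit penalty for the hard-margin loss; these names and the choice of penalty are assumptions for illustration only.

```python
def margin(w, b, x, y):
    """Signed margin y * (w . x + b) of sample x with label y in {-1, +1}."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def hinge_loss(w, b, x, y):
    """Unbounded: grows linearly as the sample moves to the wrong side."""
    return max(0.0, 1.0 - margin(w, b, x, y))

def hard_margin_loss(w, b, x, y):
    """Constant penalty for any misclassified or within-margin sample."""
    return 1.0 if margin(w, b, x, y) < 1.0 else 0.0

w, b = [1.0, 0.0], 0.0
outlier = ([-50.0, 0.0], 1)      # far on the wrong side
near_miss = ([0.5, 0.0], 1)      # correct side, but inside the margin

hinge_values = [hinge_loss(w, b, x, y) for x, y in (outlier, near_miss)]
hard_values = [hard_margin_loss(w, b, x, y) for x, y in (outlier, near_miss)]
```

With the hinge loss, the outlier contributes a penalty one hundred times larger than the near miss, whereas both count equally under the hard-margin loss. That bounded penalty is the source of the robustness mentioned above, and the indicator it requires is what makes the training problem combinatorial.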

Preprint, 2022, arXiv

Vehicle Routing with Stochastic Demands and Partial Reoptimization

We consider the vehicle routing problem with stochastic demands (VRPSD), a problem in which customer demands are known in distribution at the route planning stage and revealed during route execution upon arrival at each customer. A long-standing open question on the VRPSD concerns the benefits of allowing, during route execution, partial reordering of the planned customer visits. Given the practical importance of this question and the growing interest in the VRPSD under optimal restocking, we study the VRPSD under a recourse policy known as the switch policy. The switch policy is a canonical reoptimization policy that permits only pairs of successive customers to be reordered. We consider this policy jointly with optimal preventive restocking and introduce a branch-cut-and-price algorithm to compute optimal a priori routing plans. This algorithm features pricing routines where value functions represent the expected cost-to-go along planned routes for all possible states and reordering decisions. To ensure pricing tractability, we adopt a strategy that combines elementary pricing with completion bounds of varying complexity, and solve the pricing problem without relying on dominance rules. Our numerical experiments demonstrate the effectiveness of the algorithm for solving instances with up to 50 customers. Notably, they also give us new insights into the value of reoptimization. The switch policy enables significant cost savings over optimal restocking when the planned routes come from an algorithm built on a deterministic approximation of the data, an important scenario given the difficulty of finding optimal VRPSD solutions. The benefits are smaller when comparing optimal a priori VRPSD solutions obtained for both recourse policies. As it appears, further cost savings may require joint reordering and reassignment of customer visits among vehicles when the context permits.

Preprint, 2022, arXiv

Workload Equity in Multi-Period Vehicle Routing Problems

An equitable distribution of workload is essential when deploying vehicle routing solutions in practice. For this reason, previous studies have formulated vehicle routing problems with workload-balance objectives or constraints, leading to trade-off solutions between routing costs and workload equity. These methods consider a single planning period; however, equity is often sought over several days in practice. In this work, we show that workload equity over multiple periods can be achieved without impact on transportation costs when the planning horizon is sufficiently large. To achieve this, we design a two-phase method to solve multi-period vehicle routing problems with workload balance. Firstly, our approach produces solutions with minimal distance for each period. Next, the resulting routes are allocated to drivers to obtain equitable workloads over the planning horizon. We conduct extensive numerical experiments to measure the performance of the proposed approach and the level of workload equity achieved for different planning-horizon lengths. For horizons of five days or more, we observe that near-optimal workload equity and optimal routing costs are jointly achievable.
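The second phase described above can be pictured with a simple greedy rule: on each day, give the longest route to the driver with the smallest cumulative workload so far. This is only an illustrative heuristic under assumed inputs (one list of route workloads per day, at most one route per driver per day); it is not the allocation method of the paper, and `allocate_routes` is a hypothetical name.

```python
def allocate_routes(daily_route_loads, num_drivers):
    """Greedy multi-period allocation of fixed routes to drivers.

    daily_route_loads[t][r] is the workload of route r on day t. Routes are
    taken as given (for example, minimum-distance routes for each day), so
    routing cost is unaffected by the allocation.
    Returns (assignment, totals), where assignment[t][r] is the driver of
    route r on day t and totals[k] is the cumulative workload of driver k.
    """
    totals = [0.0] * num_drivers
    assignment = []
    for loads in daily_route_loads:
        if len(loads) > num_drivers:
            raise ValueError("more routes than drivers on one day")
        # Least-loaded drivers first, longest routes first.
        drivers = sorted(range(num_drivers), key=lambda k: totals[k])
        routes = sorted(range(len(loads)), key=lambda r: -loads[r])
        day = {}
        for r, k in zip(routes, drivers):
            day[r] = k
            totals[k] += loads[r]
        assignment.append(day)
    return assignment, totals

# Two days with the same unbalanced pair of routes.
assignment, totals = allocate_routes([[8.0, 4.0], [8.0, 4.0]], num_drivers=2)
```

In this toy case each single day is unbalanced, but swapping drivers on the second day equalizes the cumulative workloads without touching the routes, which mirrors the intuition that equity becomes easier to reach over a longer horizon.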

Preprint, 2020, arXiv

A concise guide to existing and emerging vehicle routing problem variants

Vehicle routing problems have been the focus of extensive research over the past sixty years, driven by their economic importance and their theoretical interest. The diversity of applications has motivated the study of a myriad of problem variants with different attributes. In this article, we provide a concise overview of existing and emerging problem variants. Models are typically refined along three lines: considering more relevant objectives and performance metrics, integrating vehicle routing evaluations with other tactical decisions, and capturing fine-grained yet essential aspects of modern supply chains. We organize the main problem attributes within this structured framework. We discuss recent research directions and pinpoint current shortcomings, recent successes, and emerging challenges.

Preprint, 2020, arXiv

Arc Routing with Time-Dependent Travel Times and Paths

Vehicle routing algorithms usually reformulate the road network into a complete graph in which each arc represents the shortest path between two locations. Studies on time-dependent routing followed this model and therefore defined the speed functions on the complete graph. We argue that this model is often inadequate, in particular for arc routing problems involving services on edges of a road network. To fill this gap, we formally define the time-dependent capacitated arc routing problem (TDCARP), with travel and service speed functions given directly at the network level. Under these assumptions, the quickest path between locations can change over time, leading to a complex problem that challenges the capabilities of current solution methods. We introduce effective algorithms for preprocessing quickest paths in a closed form, efficient data structures for travel time queries during routing optimization, as well as heuristic and exact solution approaches for the TDCARP. Our heuristic uses the hybrid genetic search principle with tailored solution-decoding algorithms and lower bounds for filtering moves. Our branch-and-price algorithm exploits dedicated pricing routines, heuristic dominance rules and completion bounds to find optimal solutions for problems with up to 75 services. Based on these algorithms, we measure the benefits of time-dependent routing optimization for different levels of travel-speed data accuracy.

Preprint, 2020, arXiv

Assortative-Constrained Stochastic Block Models

Stochastic block models (SBMs) are often used to find assortative community structures in networks, such that the probability of connections within communities is higher than between communities. However, classic SBMs are not limited to assortative structures. In this study, we discuss the implications of this model-inherent indifference towards assortativity or disassortativity, and show that this characteristic can lead to undesirable outcomes for networks which are presupposedly assortative but which contain a reduced amount of information. To circumvent this issue, we introduce a constrained SBM that imposes strong assortativity constraints, along with efficient algorithmic approaches to solve it. These constraints significantly boost community recovery capabilities in regimes that are close to the information-theoretic threshold. They also make it possible to identify structurally different communities in networks representing cerebral-cortex activity regions.
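One way to read the assortativity condition is as a check on the matrix of block connection probabilities. The sketch below assumes a common pair of definitions: strong assortativity requires every within-community probability to exceed every between-community probability, while the weaker variant only compares each community with its own outgoing probabilities. These definitions and the function names are assumptions for illustration and may differ in detail from the constraints used in the paper.

```python
def is_strongly_assortative(block_probs):
    """True if min within-block probability > max between-block probability."""
    k = len(block_probs)
    min_within = min(block_probs[r][r] for r in range(k))
    max_between = max(
        (block_probs[r][s] for r in range(k) for s in range(k) if r != s),
        default=0.0,
    )
    return min_within > max_between

def is_weakly_assortative(block_probs):
    """True if each block connects more within itself than to any other block."""
    k = len(block_probs)
    return all(
        block_probs[r][r] > block_probs[r][s]
        for r in range(k) for s in range(k) if r != s
    )

# Hypothetical 3-community connection matrix.
probs = [
    [0.9, 0.3, 0.1],
    [0.3, 0.4, 0.1],
    [0.1, 0.1, 0.2],
]
strong = is_strongly_assortative(probs)
weak = is_weakly_assortative(probs)
```

Here the sparse third community (0.2) is less dense than the link between the first two (0.3), so the matrix passes the weaker check but not the strong one. An unconstrained SBM fit is free to return either kind of structure, or a disassortative one.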

Preprint, 2020, arXiv

Born-Again Tree Ensembles

The use of machine learning algorithms in finance, medicine, and criminal justice can deeply impact human lives. As a consequence, research into interpretable machine learning has rapidly grown in an attempt to better control and fix possible sources of mistakes and biases. Tree ensembles offer a good prediction quality in various domains, but the concurrent use of multiple trees reduces the interpretability of the ensemble. Against this background, we study born-again tree ensembles, i.e., the process of constructing a single decision tree of minimum size that reproduces the exact same behavior as a given tree ensemble in its entire feature space. To find such a tree, we develop a dynamic-programming based algorithm that exploits sophisticated pruning and bounding rules to reduce the number of recursive calls. This algorithm generates optimal born-again trees for many datasets of practical interest, leading to classifiers which are typically simpler and more interpretable without any other form of compromise.

Preprint, 2016, arXiv

A large neighbourhood based heuristic for two-echelon routing problems

In this paper, we address two optimisation problems arising in the context of city logistics and two-level transportation systems. The two-echelon vehicle routing problem and the two-echelon location routing problem seek to produce vehicle itineraries to deliver goods to customers, with transits through intermediate facilities. To efficiently solve these problems, we propose a hybrid metaheuristic which combines enumerative local searches with destroy-and-repair principles, as well as some tailored operators to optimise the selections of intermediate facilities. We conduct extensive computational experiments to investigate the contribution of these operators to the search performance, and measure the performance of the method on both problem classes. The proposed algorithm finds the current best known solutions, or better ones, for 95% of the two-echelon vehicle routing problem benchmark instances. Overall, for both problems, it achieves high-quality solutions within short computing times. Finally, for future reference, we resolve inconsistencies between different versions of benchmark instances, document their differences, and provide them all online in a unified format.

Preprint, 2014, arXiv

A Decomposition Algorithm for Nested Resource Allocation Problems

We propose an exact polynomial algorithm for a resource allocation problem with convex costs and constraints on partial sums of resource consumptions, in the presence of either continuous or integer variables. No assumption of strict convexity or differentiability is needed. The method solves a hierarchy of resource allocation subproblems, whose solutions are used to convert constraints on sums of resources into bounds for separate variables at higher levels. The resulting time complexity for the integer problem is $O(n \log m \log (B/n))$, and the complexity of obtaining an $ε$-approximate solution for the continuous case is $O(n \log m \log (B/ε))$, $n$ being the number of variables, $m$ the number of ascending constraints (such that $m < n$), $ε$ a desired precision, and $B$ the total resource. This algorithm attains the best-known complexity when $m = n$, and improves it when $\log m = o(\log n)$. Extensive experimental analyses are conducted with four recent algorithms on various continuous problems issued from theory and practice. The proposed method achieves a higher performance than previous algorithms, addressing all problems with up to one million variables in less than one minute on a modern computer.

Preprint, 2014, arXiv

A matheuristic approach for the Pollution-Routing Problem

This paper deals with the Pollution-Routing Problem (PRP), a Vehicle Routing Problem (VRP) with environmental considerations, recently introduced in the literature by [Bektas and Laporte (2011), Transport. Res. B-Meth. 45 (8), 1232-1250]. The objective is to minimize operational and environmental costs while respecting capacity constraints and service time windows. Costs are based on driver wages and fuel consumption, which depends on many factors, such as travel distance and vehicle load. The vehicle speeds are considered as decision variables. They complement routing decisions, impacting the total cost, the travel time between locations, and thus the set of feasible routes. We propose a method which combines a local search-based metaheuristic with an integer programming approach over a set covering formulation and a recursive speed-optimization algorithm. This hybridization enables a tighter integration of route and speed decisions. Moreover, two other "green" VRP variants, the Fuel Consumption VRP (FCVRP) and the Energy Minimizing VRP (EMVRP), are addressed. The proposed method compares very favorably with previous algorithms from the literature and many new improved solutions are reported.

Preprint, 2014, arXiv

Hybrid Metaheuristics for the Clustered Vehicle Routing Problem

The Clustered Vehicle Routing Problem (CluVRP) is a variant of the Capacitated Vehicle Routing Problem in which customers are grouped into clusters. Each cluster has to be visited once, and a vehicle entering a cluster cannot leave it until all customers have been visited. This article presents two alternative hybrid metaheuristic algorithms for the CluVRP. The first algorithm is based on an Iterated Local Search algorithm, in which only feasible solutions are explored and problem-specific local search moves are utilized. The second algorithm is a Hybrid Genetic Search, for which the shortest Hamiltonian path between each pair of vertices within each cluster should be precomputed. Using this information, a sequence of clusters can be used as a solution representation and large neighborhoods can be efficiently explored by means of bi-directional dynamic programming, sequence concatenations, and appropriate data structures. Extensive computational experiments are performed on benchmark instances from the literature, as well as new large-scale ones. Recommendations on promising algorithm choices are provided relative to average cluster size.

Preprint, 2014, arXiv

Large neighborhoods with implicit customer selection for vehicle routing problems with profits

We consider several Vehicle Routing Problems (VRP) with profits, which seek to select a subset of customers, each one being associated with a profit, and to design service itineraries. When the sum of profits is maximized under distance constraints, the problem is usually called the team orienteering problem. The capacitated profitable tour problem seeks to maximize profits minus travel costs under capacity constraints. Finally, in the VRP with private fleet and common carrier, some customers can be delegated to an external carrier subject to a cost. Three families of combined decisions must be taken: customer selection, assignment to vehicles, and sequencing of deliveries for each route. We propose a new neighborhood search for these problems which explores an exponential number of solutions in pseudo-polynomial time. The search is conducted with standard VRP neighborhoods on an "exhaustive" solution representation, visiting all customers. Since visiting all customers is usually infeasible or sub-optimal, an efficient "Select" algorithm, based on resource constrained shortest paths, is repeatedly used on any new route to find the optimal subsequence of visits to customers. The good performance of these neighborhood structures is demonstrated by extensive computational experiments with a local search, an iterated local search and a hybrid genetic algorithm. Intriguingly, even a local-improvement method to the first local optimum of this neighborhood achieves an average gap of 0.09% on classic team orienteering benchmark instances, rivaling the current state-of-the-art metaheuristics. Promising research avenues on hybridizations with more standard routing neighborhoods are also open.
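The idea of extracting the best subsequence of visits from an "exhaustive" route can be sketched with a small dynamic program. The version below is deliberately simplified: it maximizes collected profit minus travel cost and ignores capacity and distance limits, so it is a plain longest-path computation over the route order rather than the resource-constrained shortest path used by the "Select" algorithm. The function name, the depot index and the data layout are assumptions for illustration.

```python
def select_visits(route, profit, dist, depot=0):
    """Best subsequence of `route` (order preserved) for profit minus travel cost.

    route  : customer indices in the order of the exhaustive route
    profit : profit[c] for each node index c
    dist   : dist[a][b] travel cost between nodes a and b
    Returns (value, visits); the empty route has value 0.0.
    """
    n = len(route)
    best = [0.0] * n      # best[i]: best value of a path from the depot ending at route[i]
    pred = [None] * n     # position of the previous visited customer, if any
    for i, c in enumerate(route):
        best[i] = profit[c] - dist[depot][c]          # come directly from the depot
        for j in range(i):                            # or extend a path ending at route[j]
            cand = best[j] + profit[c] - dist[route[j]][c]
            if cand > best[i]:
                best[i], pred[i] = cand, j
    value, last = 0.0, None
    for i, c in enumerate(route):                     # close the tour at the depot
        total = best[i] - dist[c][depot]
        if total > value:
            value, last = total, i
    visits = []
    while last is not None:
        visits.append(route[last])
        last = pred[last]
    visits.reverse()
    return value, visits

# Nodes on a line: depot at 0, customers 1, 2, 3 at positions 1, 10, 2.
pos = [0, 1, 10, 2]
dist = [[abs(a - b) for b in pos] for a in pos]
value, visits = select_visits([1, 2, 3], profit=[0, 5, 3, 5], dist=dist)
```

The loop considers every pair of positions once, so this simplified variant runs in $O(n^2)$ time for a route of $n$ customers; adding resource constraints such as capacity or a distance limit is what leads to the pseudo-polynomial behaviour mentioned in the abstract.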
