Researcher profile

Thibaut Vidal

Thibaut Vidal contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
14 works
0 followers
7 topics
4 close collaborators

Published work

14 published item(s)

preprint · 2026 · arXiv

Optimal Counterfactual Search in Tree Ensembles: A Study Across Modeling and Solution Paradigms

Trust in counterfactual explanations depends critically on whether their recommended changes are truly minimal: suboptimal explanations may vastly overshoot the actual changes needed to alter a decision, and heuristic errors can affect individuals unevenly, giving some users relevant recourse while assigning others unnecessarily costly recommendations. Consequently, we study the problem of computing optimal counterfactual explanations for tree ensembles under plausibility and actionability constraints. This is a combinatorial problem: for a fixed model, counterfactual search boils down to selecting consistent branching decisions and threshold-defined regions under a distance objective. We exploit this structure through CPCF, a constraint programming (CP) formulation in which numerical features are encoded as interval domains induced by split thresholds, while discrete features retain native finite-domain representations. This yields a compact finite-domain formulation that supports multiple distance objectives without continuous split-boundary search. We then place CPCF in a broader comparison across mathematical programming paradigms: we extend a maximum Boolean satisfiability (MaxSAT) formulation, originally designed for hard-voting random forests, to soft-voting ensembles, and compare against the current state-of-the-art mixed-integer linear programming (MILP) optimal approach. Across ten datasets and three types of tree ensembles, we analyze scalability, anytime performance, and sensitivity to distance metrics. We observe that CP achieves the best overall performance. More importantly, our results identify regimes in which the specific strengths of each paradigm make it best suited: CP is most versatile overall, MaxSAT handles hard-voting ensembles particularly well, and MILP remains competitive in amortized inference settings with a moderate number of split levels.
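The interval encoding mentioned in this abstract can be illustrated with a minimal sketch. It assumes splits of the form `x <= t`, so each numerical feature is partitioned into half-open intervals `(lower, upper]`. The function names `threshold_intervals` and `interval_index` are hypothetical and are not taken from CPCF.

```python
from bisect import bisect_left
from math import inf


def threshold_intervals(thresholds):
    """Return the intervals induced on one numerical feature by its split thresholds."""
    cuts = sorted(set(thresholds))
    bounds = [-inf, *cuts, inf]
    # Interval i is (bounds[i], bounds[i + 1]], which matches splits of the form x <= t.
    return list(zip(bounds[:-1], bounds[1:]))


def interval_index(value, thresholds):
    """Return the index of the interval that contains `value`."""
    cuts = sorted(set(thresholds))
    # bisect_left sends a value equal to a threshold to the interval on its left.
    return bisect_left(cuts, value)
```

Under this assumption, a feature with k distinct thresholds takes only k + 1 distinguishable values from the ensemble's point of view, which is what allows a finite-domain variable to stand in for a continuous one.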

preprint · 2026 · arXiv

PACE: Prune-And-Compress Ensemble Models

Ensemble models achieve state-of-the-art performance on prediction tasks, but usually require aggregating a large number of weak learners. This can hinder deployment, interpretability, and downstream tasks such as robustness verification. Remedies to this issue fall into two main camps: pruning, which discards redundant learners, and compression, which generates new ones from scratch. We introduce PACE, a framework that interleaves these paradigms in a two-phase strategy. First, new learners are actively generated via a theoretically grounded procedure to enhance the diversity of the initial ensemble. When no more relevant learners can be found, a second phase of pruning is performed on this enriched ensemble. During both operations, PACE allows fine control on the faithfulness to the original ensemble. Experiments show that our method outperforms prior pruning and compression methods while offering principled control of faithfulness guarantees.

preprint · 2026 · arXiv

The XL Instances for the Capacitated Vehicle Routing Problem

This paper introduces a new set of large-scale benchmark instances for the Capacitated Vehicle Routing Problem (CVRP). The proposed XL set extends existing benchmarks by covering instances with 1,000 to 10,000 customers and a wide range of structural characteristics, following established generation principles from prior CVRP studies. A computational study involving several state-of-the-art algorithms is conducted to provide initial best known solutions (BKSs) for the XL instances, which serve as a baseline for a community-driven BKS challenge launched on the CVRPLib website. The instances are made publicly available to support experimental evaluation and comparison of solution methods. Furthermore, additional computational analyses are reported to compare algorithmic performance on other existing CVRP benchmark instances.

preprint · 2026 · arXiv

Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

Recent research has shown that structured machine learning models such as tree ensembles are vulnerable to privacy attacks targeting their training data. To mitigate these risks, differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protection. In this paper, we introduce a reconstruction attack targeting state-of-the-art $ε$-DP random forests. By leveraging a constraint programming model that incorporates knowledge of the forest's structure and DP mechanism characteristics, our approach formally reconstructs the most likely dataset that could have produced a given forest. Through extensive computational experiments, we examine the interplay between model utility, privacy guarantees and reconstruction accuracy across various configurations. Our results reveal that random forests trained with meaningful DP guarantees can still leak portions of their training data. Specifically, while DP reduces the success of reconstruction attacks, the only forests fully robust to our attack exhibit predictive performance no better than a constant classifier. Building on these insights, we also provide practical recommendations for the construction of DP random forests that are more resilient to reconstruction attacks while maintaining a non-trivial predictive performance.

preprint · 2022 · arXiv

Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem

We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. Up to now, state-of-the-art methods utilize regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper-level. We present a mixed integer linear program reformulation for the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers ground-truth features already for instances with a sample size of a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to obtain similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.
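As a rough illustration of the quantity the upper-level problem evaluates, the sketch below computes the average newsvendor cost of a linear ordering rule on a validation set, using only a selected subset of features. It is not the paper's MILP reformulation. The cost parameters `underage` and `overage`, their default values, and the function names are assumptions introduced for this example.

```python
def newsvendor_cost(order, demand, underage=3.0, overage=1.0):
    """Cost of ordering `order` units when the realized demand is `demand`."""
    shortage = max(demand - order, 0.0)
    surplus = max(order - demand, 0.0)
    return underage * shortage + overage * surplus


def validation_cost(selected, coefficients, features, demands, underage=3.0, overage=1.0):
    """Average cost on held-out data of a linear rule restricted to `selected` features."""
    total = 0.0
    for row, demand in zip(features, demands):
        # The order quantity is a linear function of the selected features only.
        order = sum(coefficients[j] * row[j] for j in selected)
        total += newsvendor_cost(order, demand, underage, overage)
    return total / len(demands)
```

In a bilevel setting, the coefficients would come from solving the lower-level training problem for each candidate subset; here they are simply passed in.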

preprint · 2022 · arXiv

Exponential-Size Neighborhoods for the Pickup-and-Delivery Traveling Salesman Problem

Neighborhood search is a cornerstone of state-of-the-art traveling salesman and vehicle routing metaheuristics. While neighborhood exploration procedures are well developed for problems with individual services, their counterparts for one-to-one pickup-and-delivery problems have received less attention. A direct extension of classic neighborhoods is often inefficient or complex due to the necessity of jointly considering service pairs. To circumvent these issues, we introduce major improvements to existing neighborhood searches for the pickup-and-delivery traveling salesman problem and new large neighborhoods. We show that the classical Relocate-Pair neighborhood can be fully explored in $O(n^2)$ instead of $O(n^3)$ time. We adapt the 4-Opt and Balas-Simonetti neighborhoods to consider precedence constraints. Moreover, we introduce an exponential-size neighborhood called 2k-Opt, which includes all solutions generated by multiple nested 2-Opts and can be searched in $O(n^2)$ time using dynamic programming. We conduct extensive computational experiments, highlighting the significant contribution of these new neighborhoods and speed-up strategies within two classical metaheuristics. Notably, our approach makes it possible to repeatedly solve small pickup-and-delivery problem instances to optimality or near-optimality within milliseconds, and therefore it represents a valuable tool for time-critical applications such as meal delivery or mobility on demand.

preprint · 2022 · arXiv

Optimal Decision Diagrams for Classification

Decision diagrams for classification have some notable advantages over decision trees, as their internal connections can be determined at training time and their width is not bound to grow exponentially with their depth. Accordingly, decision diagrams are usually less prone to data fragmentation in internal nodes. However, the inherent complexity of training these classifiers acted as a long-standing barrier to their widespread adoption. In this context, we study the training of optimal decision diagrams (ODDs) from a mathematical programming perspective. We introduce a novel mixed-integer linear programming model for training and demonstrate its applicability for many datasets of practical importance. Further, we show how this model can be easily extended for fairness, parsimony, and stability notions. We present numerical analyses showing that our model allows training ODDs in short computational times, and that ODDs achieve better accuracy than optimal decision trees, while allowing for improved stability without significant accuracy losses.

preprint · 2022 · arXiv

Support Vector Machines with the Hard-Margin Loss: Optimal Training via Combinatorial Benders' Cuts

The classical hinge-loss support vector machine (SVM) model is sensitive to outlier observations due to the unboundedness of its loss function. To circumvent this issue, recent studies have focused on non-convex loss functions, such as the hard-margin loss, which associates a constant penalty to any misclassified or within-margin sample. Applying this loss function yields much-needed robustness for critical applications, but it also leads to an NP-hard model that makes training difficult, since current exact optimization algorithms show limited scalability, whereas heuristics are not able to find high-quality solutions consistently. Against this background, we propose new integer programming strategies that significantly improve our ability to train the hard-margin SVM model to global optimality. We introduce an iterative sampling and decomposition approach, in which smaller subproblems are used to separate combinatorial Benders' cuts. Those cuts, used within a branch-and-cut algorithm, allow the search to converge much more quickly towards a global optimum. Through extensive numerical analyses on classical benchmark datasets, our solution algorithm solves, for the first time, 117 new datasets to optimality and achieves a reduction of 50% in the average optimality gap for the hardest datasets of the benchmark.
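The contrast between the two losses described in this abstract can be made concrete with a small sketch. It assumes labels in {-1, +1}, a margin of 1, and a unit penalty for the hard-margin loss; the function names are hypothetical.

```python
def hinge_loss(label, score):
    """Hinge loss: grows without bound as a sample moves to the wrong side."""
    return max(0.0, 1.0 - label * score)


def hard_margin_loss(label, score):
    """Hard-margin loss: constant penalty for misclassified or within-margin samples."""
    return 1.0 if label * score < 1.0 else 0.0
```

For a positive sample scored at -5.0, the hinge loss is 6.0 while the hard-margin loss stays at 1.0, which is why a single distant outlier weighs far less on the hard-margin model.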

preprint · 2022 · arXiv

Vehicle Routing with Stochastic Demands and Partial Reoptimization

We consider the vehicle routing problem with stochastic demands (VRPSD), a problem in which customer demands are known in distribution at the route planning stage and revealed during route execution upon arrival at each customer. A long-standing open question on the VRPSD concerns the benefits of allowing, during route execution, partial reordering of the planned customer visits. Given the practical importance of this question and the growing interest in the VRPSD under optimal restocking, we study the VRPSD under a recourse policy known as the switch policy. The switch policy is a canonical reoptimization policy that permits only pairs of successive customers to be reordered. We consider this policy jointly with optimal preventive restocking and introduce a branch-cut-and-price algorithm to compute optimal a priori routing plans. This algorithm features pricing routines where value functions represent the expected cost-to-go along planned routes for all possible states and reordering decisions. To ensure pricing tractability, we adopt a strategy that combines elementary pricing with completion bounds of varying complexity, and solve the pricing problem without relying on dominance rules. Our numerical experiments demonstrate the effectiveness of the algorithm for solving instances with up to 50 customers. Notably, they also give us new insights into the value of reoptimization. The switch policy enables significant cost savings over optimal restocking when the planned routes come from an algorithm built on a deterministic approximation of the data, an important scenario given the difficulty of finding optimal VRPSD solutions. The benefits are smaller when comparing optimal a priori VRPSD solutions obtained for both recourse policies. As it appears, further cost savings may require joint reordering and reassignment of customer visits among vehicles when the context permits.

preprint · 2022 · arXiv

Workload Equity in Multi-Period Vehicle Routing Problems

An equitable distribution of workload is essential when deploying vehicle routing solutions in practice. For this reason, previous studies have formulated vehicle routing problems with workload-balance objectives or constraints, leading to trade-off solutions between routing costs and workload equity. These methods consider a single planning period; however, equity is often sought over several days in practice. In this work, we show that workload equity over multiple periods can be achieved without impact on transportation costs when the planning horizon is sufficiently large. To achieve this, we design a two-phase method to solve multi-period vehicle routing problems with workload balance. Firstly, our approach produces solutions with minimal distance for each period. Next, the resulting routes are allocated to drivers to obtain equitable workloads over the planning horizon. We conduct extensive numerical experiments to measure the performance of the proposed approach and the level of workload equity achieved for different planning-horizon lengths. For horizons of five days or more, we observe that near-optimal workload equity and optimal routing costs are jointly achievable.
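The second phase described in this abstract, allocating already-optimized routes to drivers, can be sketched with a simple greedy rule. This is not the authors' allocation method, only an illustrative heuristic: in each period, the longest route goes to the driver with the smallest cumulative workload so far. The function name and data layout are assumptions.

```python
def allocate_routes(daily_route_workloads, num_drivers):
    """Greedily assign each period's routes to drivers to balance cumulative workload.

    daily_route_workloads: one list per period, giving the workload of each route.
    Returns (schedule, totals), where schedule[p] maps driver -> route index in period p.
    """
    totals = [0.0] * num_drivers
    schedule = []
    for workloads in daily_route_workloads:
        if len(workloads) > num_drivers:
            raise ValueError("more routes than drivers in one period")
        # Least-loaded drivers first, longest routes first.
        drivers = sorted(range(num_drivers), key=lambda d: totals[d])
        routes = sorted(range(len(workloads)), key=lambda r: workloads[r], reverse=True)
        assignment = {}
        for driver, route in zip(drivers, routes):
            assignment[driver] = route
            totals[driver] += workloads[route]
        schedule.append(assignment)
    return schedule, totals
```

Because the routes themselves are left untouched, routing cost is unaffected; only the assignment of routes to drivers changes from one period to the next.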

preprint · 2020 · arXiv

A concise guide to existing and emerging vehicle routing problem variants

Vehicle routing problems have been the focus of extensive research over the past sixty years, driven by their economic importance and their theoretical interest. The diversity of applications has motivated the study of a myriad of problem variants with different attributes. In this article, we provide a concise overview of existing and emerging problem variants. Models are typically refined along three lines: considering more relevant objectives and performance metrics, integrating vehicle routing evaluations with other tactical decisions, and capturing fine-grained yet essential aspects of modern supply chains. We organize the main problem attributes within this structured framework. We discuss recent research directions and pinpoint current shortcomings, recent successes, and emerging challenges.

preprint · 2020 · arXiv

Arc Routing with Time-Dependent Travel Times and Paths

Vehicle routing algorithms usually reformulate the road network into a complete graph in which each arc represents the shortest path between two locations. Studies on time-dependent routing followed this model and therefore defined the speed functions on the complete graph. We argue that this model is often inadequate, in particular for arc routing problems involving services on edges of a road network. To fill this gap, we formally define the time-dependent capacitated arc routing problem (TDCARP), with travel and service speed functions given directly at the network level. Under these assumptions, the quickest path between locations can change over time, leading to a complex problem that challenges the capabilities of current solution methods. We introduce effective algorithms for preprocessing quickest paths in a closed form, efficient data structures for travel time queries during routing optimization, as well as heuristic and exact solution approaches for the TDCARP. Our heuristic uses the hybrid genetic search principle with tailored solution-decoding algorithms and lower bounds for filtering moves. Our branch-and-price algorithm exploits dedicated pricing routines, heuristic dominance rules and completion bounds to find optimal solutions for problems with up to 75 services. Based on these algorithms, we measure the benefits of time-dependent routing optimization for different levels of travel-speed data accuracy.

preprint · 2020 · arXiv

Assortative-Constrained Stochastic Block Models

Stochastic block models (SBMs) are often used to find assortative community structures in networks, such that the probability of connections within communities is higher than between communities. However, classic SBMs are not limited to assortative structures. In this study, we discuss the implications of this model-inherent indifference towards assortativity or disassortativity, and show that this characteristic can lead to undesirable outcomes for networks which are presupposedly assortative but which contain a reduced amount of information. To circumvent this issue, we introduce a constrained SBM that imposes strong assortativity constraints, along with efficient algorithmic approaches to solve it. These constraints significantly boost community recovery capabilities in regimes that are close to the information-theoretic threshold. They also make it possible to identify structurally different communities in networks representing cerebral-cortex activity regions.
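One way to read the assortativity condition is as a check on the block connection-probability matrix. The sketch below assumes that "strong" assortativity means every within-community probability exceeds every between-community probability; the abstract does not spell out the definition, so this reading and the function name are assumptions.

```python
def is_strongly_assortative(block_probabilities):
    """Check that every diagonal entry exceeds every off-diagonal entry."""
    k = len(block_probabilities)
    within = [block_probabilities[r][r] for r in range(k)]
    between = [
        block_probabilities[r][s] for r in range(k) for s in range(k) if r != s
    ]
    # A single community has no between-community entries and passes trivially.
    return not between or min(within) > max(between)
```

An unconstrained SBM can return a matrix that fails this check, for instance one whose off-diagonal entries dominate, even when the network is expected to be assortative.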

preprint · 2020 · arXiv

Born-Again Tree Ensembles

The use of machine learning algorithms in finance, medicine, and criminal justice can deeply impact human lives. As a consequence, research into interpretable machine learning has rapidly grown in an attempt to better control and fix possible sources of mistakes and biases. Tree ensembles offer a good prediction quality in various domains, but the concurrent use of multiple trees reduces the interpretability of the ensemble. Against this background, we study born-again tree ensembles, i.e., the process of constructing a single decision tree of minimum size that reproduces the exact same behavior as a given tree ensemble in its entire feature space. To find such a tree, we develop a dynamic-programming based algorithm that exploits sophisticated pruning and bounding rules to reduce the number of recursive calls. This algorithm generates optimal born-again trees for many datasets of practical interest, leading to classifiers which are typically simpler and more interpretable without any other form of compromise.