Source author record

Patrick Jaillet

Patrick Jaillet appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Artificial Intelligence; Data Structures and Algorithms; Computer Science and Game Theory; math.OC; Distributed, Parallel, and Cluster Computing; Discrete Mathematics; econ.EM; Multiagent Systems; Robotics; math.PR; math.ST; Methodology; physics.data-an; physics.soc-ph; Social and Information Networks; Statistics Theory

Catalog footprint

What is connected

38 works

17 topics

4 close collaborators

preprint · 2026 · arXiv

Incentivizing Truthfulness and Collaborative Fairness in Bayesian Learning

Collaborative machine learning involves training high-quality models using datasets from a number of sources. To incentivize sources to share data, existing data valuation methods fairly reward each source based on its data submitted as is. However, as these methods neither verify nor incentivize data truthfulness, the sources can manipulate their data (e.g., by submitting duplicated or noisy data) to artificially increase their valuations and rewards or prevent others from benefiting. This paper presents the first mechanism that provably ensures (F) collaborative fairness and incentivizes (T) truthfulness at equilibrium for Bayesian models. Our mechanism combines semivalues (e.g., Shapley value), which ensure fairness, and a truthful data valuation function (DVF) based on a validation set that is unknown to the sources. As semivalues are influenced by others' data, we introduce an additional condition to prove that a source can maximize its expected data values in coalitions and semivalues by submitting a dataset that captures its true knowledge. Additionally, we discuss the implications and suitable relaxations of (F) and (T) when the mediator has a limited budget for rewards or lacks a validation set. Our theoretical findings are validated on synthetic and real-world datasets.
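For readers unfamiliar with the Shapley value mentioned in this abstract, the following is a minimal sketch of the exact computation: each source is credited with its marginal contribution averaged over all orderings. The valuation function `toy_value` and the names `data_sizes` and `shapley_values` are hypothetical illustrations; the paper's truthful DVF based on a hidden validation set is not reproduced here.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number. It is a stand-in for a
    data valuation function and is an assumption of this sketch.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical toy valuation: square of the total amount of data in a coalition.
data_sizes = {"A": 1, "B": 2, "C": 3}


def toy_value(coalition):
    return sum(data_sizes[p] for p in coalition) ** 2


phi = shapley_values(list(data_sizes), toy_value)
print(phi)
```

The values add up to the value of the full coalition, which is the efficiency property that makes the Shapley value a natural basis for dividing a reward. Enumerating all orderings is only practical for a handful of sources.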

preprint · 2026 · arXiv

Online Scheduling for LLM Inference with KV Cache Constraints

Large Language Model (LLM) inference, where a trained model generates text one word at a time in response to user prompts, is a computationally intensive process requiring efficient scheduling to optimize latency and resource utilization. A key challenge in LLM inference is the management of the Key-Value (KV) cache, which reduces redundant computations but introduces memory constraints. In this work, we model LLM inference with KV cache constraints theoretically and propose a novel batching and scheduling algorithm that minimizes inference latency while effectively managing the KV cache's memory. More specifically, we make the following contributions. First, to evaluate the performance of online algorithms for scheduling in LLM inference, we introduce a hindsight optimal benchmark, formulated as an integer program that computes the minimum total inference latency under full future information. Second, we prove that no deterministic online algorithm can achieve a constant competitive ratio when the arrival process is arbitrary. Third, motivated by the computational intractability of solving the integer program at scale, we propose a polynomial-time online scheduling algorithm and show that under certain conditions it can achieve a constant competitive ratio. We also demonstrate our algorithm's strong empirical performance by comparing it to the hindsight optimal in a synthetic dataset. Finally, we conduct empirical evaluations on a real-world public LLM inference dataset, simulating the Llama2-70B model on A100 GPUs, and show that our algorithm significantly outperforms the benchmark algorithms. Overall, our results offer a path toward more sustainable and cost-effective LLM deployment.
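To make the memory constraint concrete, here is a minimal sketch of a feasibility check for one batch. It assumes a simplified model that the abstract does not spell out: all requests in the batch start together, a request holding a prompt of `prompt_len` tokens occupies `prompt_len + step` cache entries after generating `step` tokens, and its memory is released as soon as it finishes. The function names and the `budget` parameter are hypothetical.

```python
def peak_kv_usage(requests):
    """Peak KV-cache occupancy (in tokens) of a batch started at the same step.

    `requests` is a list of (prompt_len, output_len) pairs. The memory model
    (one extra cache entry per generated token, freed on completion) is an
    assumption of this sketch.
    """
    if not requests:
        return 0
    horizon = max(output_len for _, output_len in requests)
    peak = 0
    for step in range(1, horizon + 1):
        usage = sum(
            prompt_len + step
            for prompt_len, output_len in requests
            if step <= output_len  # request is still generating at this step
        )
        peak = max(peak, usage)
    return peak


def fits_in_cache(requests, budget):
    """True if the batch never exceeds the cache budget under the model above."""
    return peak_kv_usage(requests) <= budget


batch = [(3, 2), (1, 4)]
print(peak_kv_usage(batch), fits_in_cache(batch, budget=8))
```

A scheduler would run a check of this kind before admitting new requests, since usage grows during generation and the peak can occur well after the batch starts.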

preprint · 2025 · arXiv

Distribution-Dependent Rates for Multi-Distribution Learning

To address the needs of modeling uncertainty in sensitive machine learning applications, the setup of distributionally robust optimization (DRO) seeks good performance uniformly across a variety of tasks. The recent multi-distribution learning (MDL) framework tackles this objective in a dynamic interaction with the environment, where the learner has sampling access to each target distribution. Drawing inspiration from the field of pure-exploration multi-armed bandits, we provide distribution-dependent guarantees in the MDL regime, that scale with suboptimality gaps and result in superior dependence on the sample size when compared to the existing distribution-independent analyses. We investigate two non-adaptive strategies, uniform and non-uniform exploration, and present non-asymptotic regret bounds using novel tools from empirical process theory. Furthermore, we devise an adaptive optimistic algorithm, LCB-DR, that showcases enhanced dependence on the gaps, mirroring the contrast between uniform and optimistic allocation in the multi-armed bandit literature. We also conduct a small synthetic experiment illustrating the comparative strengths of each strategy.

preprint · 2022 · arXiv

Contextual Bandits and Optimistically Universal Learning

We consider the contextual bandit problem on general action and context spaces, where the learner's rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients' records or customers' history, which allows for personalized treatment. We focus on consistency -- vanishing regret compared to the optimal policy -- and show that for large classes of non-i.i.d. contexts, consistency can be achieved regardless of the time-invariant reward mechanism, a property known as universal consistency. Precisely, we first give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Second, we show that there always exists an algorithm that guarantees universal consistency whenever this is achievable, called an optimistically universal learning rule. Interestingly, for finite action spaces, learnable processes for universal learning are exactly the same as in the full-feedback setting of supervised learning, previously studied in the literature. In other words, learning can be performed with partial feedback without any generalization cost. The algorithms balance a trade-off between generalization (similar to structural risk minimization) and personalization (tailoring actions to specific contexts). Lastly, we consider the case of added continuity assumptions on rewards and show that these lead to universal consistency for significantly larger classes of data-generating processes.

preprint · 2022 · arXiv

No-regret Learning in Price Competitions under Consumer Reference Effects

We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices vary over time according to a memory-based dynamic, which is a weighted average of all historical prices. We focus on the setting where firms are not aware of demand functions and how reference prices are formed but have access to an oracle that provides a measure of consumers' responsiveness to the current posted prices. We show that if the firms run no-regret algorithms, in particular, online mirror descent (OMD), with decreasing step sizes, the market stabilizes in the sense that firms' prices and reference prices converge to a stable Nash Equilibrium (SNE). Interestingly, we also show that there exist constant step sizes under which the market stabilizes. We further characterize the rate of convergence to the SNE for both decreasing and constant OMD step sizes.

preprint · 2022 · arXiv

On Provably Robust Meta-Bayesian Optimization

Bayesian optimization (BO) has become popular for sequential optimization of black-box functions. When BO is used to optimize a target function, we often have access to previous evaluations of potentially related functions. This begs the question as to whether we can leverage these previous experiences to accelerate the current BO task through meta-learning (meta-BO), while ensuring robustness against potentially harmful dissimilar tasks that could sabotage the convergence of BO. This paper introduces two scalable and provably robust meta-BO algorithms: robust meta-Gaussian process-upper confidence bound (RM-GP-UCB) and RM-GP-Thompson sampling (RM-GP-TS). We prove that both algorithms are asymptotically no-regret even when some or all previous tasks are dissimilar to the current task, and show that RM-GP-UCB enjoys a better theoretical robustness than RM-GP-TS. We also exploit the theoretical guarantees to optimize the weights assigned to individual previous tasks through regret minimization via online learning, which diminishes the impact of dissimilar tasks and hence further enhances the robustness. Empirical evaluations show that (a) RM-GP-UCB performs effectively and consistently across various applications, and (b) RM-GP-TS, despite being less robust than RM-GP-UCB both in theory and in practice, performs competitively in some scenarios with less dissimilar tasks and is more computationally efficient.

preprint · 2022 · arXiv

Optimal Information Provision for Strategic Hybrid Workers

We study the problem of information provision by a strategic central planner who can publicly signal about an uncertain infectious risk parameter. Signalling leads to an updated public belief over the parameter, and agents then make equilibrium choices on whether to work remotely or in-person. The planner maintains a set of desirable outcomes for each realization of the uncertain parameter and seeks to maximize the probability that agents choose an acceptable outcome for the true parameter. We distinguish between stateless and stateful objectives. In the former, the set of desirable outcomes does not change as a function of the risk parameter, whereas in the latter it does. For stateless objectives, we reduce the problem to maximizing the probability of inducing mean beliefs that lie in intervals computable from the set of desirable outcomes. We derive the optimal signalling mechanism and show that it partitions the parameter domain into at most two intervals with the signals generated according to an interval-specific distribution. For the stateful case, we consider a practically relevant situation in which the planner can enforce in-person work capacity limits that progressively get more stringent as the risk parameter increases. We show that the optimal signalling mechanism for this case can be obtained by solving a linear program. We numerically verify the improvement in achieving desirable outcomes using our information design relative to no information and full information benchmarks.

preprint · 2022 · arXiv

Rectified Max-Value Entropy Search for Bayesian Optimization

Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they encourage an accurate measure of the mutual information such as the rectified MES (RMES) acquisition function we develop in this work. Unlike the evaluation of MES, we derive a closed-form probability density for the observation conditioned on the max-value and employ stochastic gradient ascent with reparameterization to efficiently optimize RMES. As a result of a more principled acquisition function, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.

preprint · 2022 · arXiv

Weighted Maximum Entropy Inverse Reinforcement Learning

We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from an expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a weight function to the maximum entropy framework, with the motivation of having the ability to learn and recover the stochasticity (or the bounded rationality) of the expert policy. Our framework and algorithms allow learning both a reward (or policy) function and the structure of the entropy terms added to the Markov Decision Processes, thus enhancing the learning procedure. Our numerical experiments using human and simulated demonstrations and with discrete and continuous IRL/IM tasks show that our approach outperforms prior algorithms.

preprint · 2021 · arXiv

Efficient Carpooling and Toll Pricing for Autonomous Transportation

In this paper, we address the existence and computation of competitive equilibrium in the transportation market for autonomous carpooling first proposed by [Ostrovsky and Schwarz, 2019]. At equilibrium, the market organizes carpooled trips over a transportation network in a socially optimal manner and sets the corresponding payments for individual riders and toll prices on edges. The market outcome ensures individual rationality, stability of carpooled trips, budget balance, and market clearing properties under heterogeneous rider preferences. We show that the question of the market's existence can be resolved by proving the existence of an integer optimal solution of a linear programming problem. We characterize conditions on the network topology and riders' disutility for carpooling under which a market equilibrium can be computed in polynomial time. This characterization relies on ideas from the theory of combinatorial auctions and the minimum-cost network flow problem. Finally, we characterize a market equilibrium that achieves strategyproofness and maximizes welfare of individual riders.

preprint · 2021 · arXiv

Robust Entropy-regularized Markov Decision Processes

Stochastic and soft optimal policies resulting from entropy-regularized Markov decision processes (ER-MDP) are desirable for exploration and imitation learning applications. Motivated by the fact that such policies are sensitive with respect to the state transition probabilities, and the estimation of these probabilities may be inaccurate, we study a robust version of the ER-MDP model, where the stochastic optimal policies are required to be robust with respect to the ambiguity in the underlying transition probabilities. Our work is at the crossroads of two important schemes in reinforcement learning (RL), namely, robust MDP and entropy regularized MDP. We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings, making the robust ER-MDP problem tractable. We show how our framework and results can be integrated into different algorithmic schemes including value or (modified) policy iteration, which would lead to new robust RL and inverse RL algorithms to handle uncertainties. Analyses on computational complexity and error propagation under conventional uncertainty settings are also provided.
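As background for the abstract above, the following is a minimal sketch of value iteration for the standard (non-robust) entropy-regularized MDP, where the hard maximum over actions is replaced by a temperature-scaled log-sum-exp and the resulting policy is a softmax over action values. The robust variant studied in the paper is not shown. The data layout (`P`, `R`) and the parameter names `gamma` and `tau` are assumptions of this sketch.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value_iteration(P, R, gamma, tau, iters=200):
    """Soft (entropy-regularized) value iteration on a small tabular MDP.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    reward; both layouts are illustrative choices, not taken from the paper.
    """
    def q_values(V):
        return [
            [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
             for a in range(len(P[s]))]
            for s in range(len(P))
        ]

    V = [0.0] * len(P)
    for _ in range(iters):
        # Soft Bellman backup: V(s) = tau * log sum_a exp(Q(s, a) / tau)
        V = [tau * _logsumexp([q / tau for q in qs]) for qs in q_values(V)]

    # Softmax policy induced by the final action values.
    policy = []
    for qs in q_values(V):
        scaled = [q / tau for q in qs]
        lse = _logsumexp(scaled)
        policy.append([math.exp(x - lse) for x in scaled])
    return V, policy


# One state, two identical self-loop actions with zero reward.
P = [[[(0, 1.0)], [(0, 1.0)]]]
R = [[0.0, 0.0]]
V, policy = soft_value_iteration(P, R, gamma=0.5, tau=1.0)
print(V, policy)
```

In this toy case the only value comes from the entropy bonus, so the fixed point satisfies V = 0.5 V + log 2 and the policy is uniform. As `tau` shrinks toward zero, the backup approaches the usual hard maximum.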

preprint · 2020 · arXiv

A Relation Analysis of Markov Decision Process Frameworks

We study the relation between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literatures, including the standard MDP, the entropy and general regularized MDP, and stochastic MDP, where the latter is based on the assumption that the reward function is stochastic and follows a given distribution. We show that the entropy-regularized MDP is equivalent to a stochastic MDP model, and is strictly subsumed by the general regularized MDP. Moreover, we propose a distributional stochastic MDP framework by assuming that the distribution of the reward function is ambiguous. We further show that the distributional stochastic MDP is equivalent to the regularized MDP, in the sense that they always yield the same optimal policies. We also provide a connection between stochastic/regularized MDP and constrained MDP. Our work gives a unified view on several important MDP frameworks, which would lead to new ways to interpret the (entropy/general) regularized MDP frameworks through the lens of stochastic rewards and vice versa. Given the recent popularity of regularized MDP in (deep) reinforcement learning, our work brings new understandings of how such algorithmic schemes work and suggests ideas to develop new ones.

preprint · 2020 · arXiv

Competitive Ratios for Online Multi-capacity Ridesharing

In multi-capacity ridesharing, multiple requests (e.g., customers, food items, parcels) with different origin and destination pairs travel in one resource. In recent years, online multi-capacity ridesharing services (i.e., where assignments are made online) like Uber-pool, foodpanda, and on-demand shuttles have become hugely popular in transportation, food delivery, logistics and other domains. This is because multi-capacity ridesharing services benefit all parties involved: the customers (due to lower costs), the drivers (due to higher revenues) and the matching platforms (due to higher revenues per vehicle/resource). Most importantly, these services can also help reduce carbon emissions (due to fewer vehicles on roads). Online multi-capacity ridesharing is extremely challenging as the underlying matching graph is no longer bipartite (as in the unit-capacity case) but a tripartite graph with resources (e.g., taxis, cars), requests and request groups (combinations of requests that can travel together). The desired matching between resources and request groups is constrained by the edges between requests and request groups in this tripartite graph (i.e., a request can be part of at most one request group in the final assignment). While there have been myopic heuristic approaches employed for solving the online multi-capacity ridesharing problem, they do not provide any guarantees on the solution quality. To that end, this paper presents the first approach with bounds on the competitive ratio for online multi-capacity ridesharing (when resources rejoin the system at their initial location/depot after serving a group of requests).

preprint · 2020 · arXiv

Generalized Maximum Causal Entropy for Inverse Reinforcement Learning

We consider the problem of learning from demonstrated trajectories with inverse reinforcement learning (IRL). Motivated by a limitation of the classical maximum entropy model in capturing the structure of the network of states, we propose an IRL model based on a generalized version of the causal entropy maximization problem, which allows us to generate a class of maximum entropy IRL models. Our generalized model has an advantage of being able to recover, in addition to a reward function, another expert's function that would (partially) capture the impact of the connecting structure of the states on experts' decisions. Empirical evaluation on a real-world dataset and a grid-world dataset shows that our generalized model outperforms the classical ones, in terms of recovering reward functions and demonstrated trajectories.

preprint · 2020 · arXiv

Learning Structure in Nested Logit Models

This paper introduces a new data-driven methodology for nested logit structure discovery. Nested logit models allow the modeling of positive correlations between the error terms of the utility specifications of the different alternatives in a discrete choice scenario through the specification of a nesting structure. Current nested logit model estimation practices require an a priori specification of a nesting structure by the modeler. In this work, we optimize over all possible specifications of the nested logit model that are consistent with rational utility maximization. We formulate the problem of learning an optimal nesting structure from the data as a mixed integer nonlinear programming (MINLP) optimization problem and solve it using a variant of the linear outer approximation algorithm. We exploit the tree structure of the problem and utilize the latest advances in integer optimization to bring practical tractability to the optimization problem we introduce. We demonstrate the ability of our algorithm to correctly recover the true nesting structure from synthetic data in a Monte Carlo experiment. In an empirical illustration using a stated preference survey on modes of transportation in the U.S. state of Massachusetts, we use our algorithm to obtain an optimal nesting tree representing the correlations between the unobserved effects of the different travel mode choices. We provide our implementation as a customizable and open-source code base written in the Julia programming language.
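To illustrate what a nesting structure does, the sketch below computes choice probabilities for a two-level nested logit under one common parameterization: the probability of an alternative is the probability of its nest times the probability of the alternative within the nest, with a per-nest scale parameter in (0, 1]. The utilities, nest names, and scale values are hypothetical, and the paper's structure-learning procedure (the MINLP formulation) is not shown. Python is used here for consistency with the other sketches on this page, although the authors' implementation is in Julia.

```python
import math


def nested_logit_probs(utilities, nests, scales):
    """Choice probabilities for a two-level nested logit model.

    utilities: alternative -> systematic utility
    nests:     nest name -> list of alternatives (each alternative in one nest)
    scales:    nest name -> scale parameter in (0, 1]
    All three inputs are illustrative assumptions of this sketch.
    """
    # Inclusive value of each nest: log-sum of scaled utilities.
    inclusive = {
        m: math.log(sum(math.exp(utilities[j] / scales[m]) for j in alts))
        for m, alts in nests.items()
    }
    denom = sum(math.exp(scales[m] * inclusive[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(scales[m] * inclusive[m]) / denom
        for j in alts:
            p_within = math.exp(utilities[j] / scales[m] - inclusive[m])
            probs[j] = p_nest * p_within
    return probs


utilities = {"car": 1.0, "bus": 0.5, "train": 0.5}
nests = {"private": ["car"], "transit": ["bus", "train"]}
probs = nested_logit_probs(utilities, nests, {"private": 1.0, "transit": 0.5})
print(probs)
```

When every scale parameter equals 1, the expression collapses to the ordinary multinomial logit, so the nesting only matters through scales below 1, which encode correlation among alternatives in the same nest.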

preprint · 2020 · arXiv

Probability Distributions on Partially Ordered Sets and Network Interdiction Games

This article poses the following problem: Does there exist a probability distribution over subsets of a finite partially ordered set (poset), such that a set of constraints involving marginal probabilities of the poset's elements and maximal chains is satisfied? We present a combinatorial algorithm to positively resolve this question. The algorithm can be implemented in polynomial time in the special case where maximal chain probabilities are affine functions of their elements. This existence problem is relevant for the equilibrium characterization of a generic strategic interdiction game on a capacitated flow network. The game involves a routing entity that sends its flow through the network while facing path transportation costs, and an interdictor who simultaneously interdicts one or more edges while facing edge interdiction costs. Using our existence result on posets and strict complementary slackness in linear programming, we show that the Nash equilibria of this game can be fully described using primal and dual solutions of a minimum-cost circulation problem. Our analysis provides a new characterization of the critical components in the interdiction game. It also leads to a polynomial-time approach for equilibrium computation.

preprint · 2020 · arXiv

R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.

preprint · 2020 · arXiv

Zone pAth Construction (ZAC) based Approaches for Effective Real-Time Ridesharing

Real-time ridesharing systems such as UberPool, Lyft Line, and GrabShare have become hugely popular as they reduce the costs for customers, improve per-trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the "right" requests to travel together in the "right" available vehicles in real-time, so that the objective (e.g., requests served, revenue or delay) is optimized. This challenge has been addressed in existing work by: (i) generating as many relevant feasible (with respect to the available delay for customers) combinations of requests as possible in real-time; and then (ii) optimizing assignment of the feasible request combinations to vehicles. Since the number of request combinations increases exponentially with the increase in vehicle capacity and number of requests, unfortunately, such approaches have to employ ad hoc heuristics to identify a subset of request combinations for assignment. Our key contribution is in developing approaches that employ zone (abstraction of individual locations) paths instead of request combinations. Zone paths allow for generation of significantly more "relevant" combinations (in comparison to ad hoc heuristics) in real-time than competing approaches for two reasons: (i) each zone path can typically represent multiple request combinations; (ii) zone paths are generated using a combination of offline and online methods. Specifically, we contribute both myopic (ridesharing assignment focussed on current requests only) and non-myopic (ridesharing assignment considers impact on expected future requests) approaches that employ zone paths. In our experimental results, we demonstrate that our myopic approach outperforms (with respect to both objective and runtime) the current best myopic approach for ridesharing on both real-world and synthetic datasets.

preprint · 2016 · arXiv

No-Regret Learnability for Piecewise Linear Losses

In the convex optimization approach to online regret minimization, many methods have been developed to guarantee an $O(\sqrt{T})$ bound on regret for subdifferentiable convex loss functions with bounded subgradients, by using a reduction to linear loss functions. This suggests that linear loss functions tend to be the hardest ones to learn against, regardless of the underlying decision spaces. We investigate this question in a systematic fashion looking at the interplay between the set of possible moves for both the decision maker and the adversarial environment. This allows us to highlight sharp distinctive behaviors about the learnability of piecewise linear loss functions. On the one hand, when the decision set of the decision maker is a polyhedron, we establish $\Omega(\sqrt{T})$ lower bounds on regret for a large class of piecewise linear loss functions with important applications in online linear optimization, repeated zero-sum Stackelberg games, online prediction with side information, and online two-stage optimization. On the other hand, we exhibit $o(\sqrt{T})$ learning rates, achieved by the Follow-The-Leader algorithm, in online linear optimization when the boundary of the decision maker's decision set is curved and when $0$ does not lie in the convex hull of the environment's decision set. Hence, the curvature of the decision maker's decision set is a determining factor for the optimal learning rate. These results hold in a completely adversarial setting.
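The Follow-The-Leader rule mentioned above has a closed form when the decision set is the Euclidean unit ball, which is the simplest curved set: the leader is the unit vector opposite the cumulative loss vector. The sketch below is an illustration under that assumption only; the function name, the zero vector used in the first round, and the example loss sequence are choices made here, not taken from the paper.

```python
import math


def ftl_unit_ball(losses):
    """Follow-The-Leader for online linear optimization on the unit ball.

    Each round the learner plays the minimizer of the cumulative past loss,
    i.e. -S / ||S|| where S is the sum of past loss vectors (the zero vector
    before any loss is seen, an arbitrary choice of this sketch).
    Returns the list of decisions and the regret against the best fixed point.
    """
    dim = len(losses[0])
    cum = [0.0] * dim
    total_loss = 0.0
    decisions = []
    for z in losses:
        norm = math.sqrt(sum(c * c for c in cum))
        x = [-c / norm for c in cum] if norm > 0 else [0.0] * dim
        decisions.append(x)
        total_loss += sum(zi * xi for zi, xi in zip(z, x))
        cum = [c + zi for c, zi in zip(cum, z)]
    # The best fixed decision in hindsight is -S_T / ||S_T||, with loss -||S_T||.
    best_fixed_loss = -math.sqrt(sum(c * c for c in cum))
    return decisions, total_loss - best_fixed_loss


# Loss vectors bounded away from the origin (here, a constant vector).
decisions, regret = ftl_unit_ball([(1.0, 0.0)] * 10)
print(regret)
```

With a constant loss vector the learner is only wrong in the first round, so the regret does not grow with the number of rounds. This is an extreme case of the regime the abstract describes, where the environment's moves stay away from the origin.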

preprint · 2016 · arXiv

Robust Adaptive Routing Under Uncertainty

We consider the problem of finding an optimal history-dependent routing strategy on a directed graph weighted by stochastic arc costs when the objective is to minimize the risk of spending more than a prescribed budget. To help mitigate the impact of the lack of information on the arc cost probability distributions, we introduce a robust counterpart where the distributions are only known through confidence intervals on some statistics such as the mean, the mean absolute deviation, and any quantile. Leveraging recent results in distributionally robust optimization, we develop a general-purpose algorithm to compute an approximate optimal strategy. To illustrate the benefits of the robust approach, we run numerical experiments with field data from the Singapore road network.

preprint · 2016 · arXiv

Solving Combinatorial Games using Products, Projections and Lexicographically Optimal Bases

In order to find Nash equilibria for two-player zero-sum games where each player plays combinatorial objects like spanning trees, matchings, etc., we consider two online learning algorithms: the online mirror descent (OMD) algorithm and the multiplicative weights update (MWU) algorithm. The OMD algorithm requires the computation of a certain Bregman projection that has closed-form solutions for simple convex sets like the Euclidean ball or the simplex. However, for general polyhedra one often needs to exploit the general machinery of convex optimization. We give a novel primal-style algorithm for computing Bregman projections on the base polytopes of polymatroids. Next, in the case of the MWU algorithm, although it scales logarithmically in the number of pure strategies or experts $N$ in terms of regret, the algorithm takes time polynomial in $N$; this especially becomes a problem when learning combinatorial objects. We give a general recipe to simulate the multiplicative weights update algorithm in time polynomial in their natural dimension. This is useful whenever there exists a polynomial time generalized counting oracle (even if approximate) over these objects. Finally, using the combinatorial structure of symmetric Nash equilibria (SNE) when both players play bases of matroids, we show that these can be found with a single projection or convex minimization (without using online learning).
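As a point of reference for the MWU algorithm discussed above, here is a minimal sketch of multiplicative weights in self-play on an explicit payoff matrix. It enumerates pure strategies directly, which is exactly the cost in $N$ that the paper's simulation recipe is designed to avoid for combinatorial strategy sets; that recipe is not reproduced here. The payoff matrix, the step-size choice, and the function names are assumptions of this sketch.

```python
import math


def _normalize(ws):
    total = sum(ws)
    return [w / total for w in ws]


def mwu_zero_sum(A, rounds=2000):
    """Multiplicative weights in self-play on a zero-sum matrix game.

    The row player maximizes x^T A y and the column player minimizes it.
    Entries of A are assumed to lie in [-1, 1], and both players need at
    least two pure strategies. Returns the time-averaged mixed strategies.
    """
    n, m = len(A), len(A[0])
    eta_row = math.sqrt(2 * math.log(n) / rounds)
    eta_col = math.sqrt(2 * math.log(m) / rounds)
    x, y = [1.0 / n] * n, [1.0 / m] * m
    x_avg, y_avg = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        x_avg = [a + xi / rounds for a, xi in zip(x_avg, x)]
        y_avg = [a + yj / rounds for a, yj in zip(y_avg, y)]
        row_gain = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_loss = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        # Exponential reweighting: reward good rows, penalize costly columns.
        x = _normalize([xi * math.exp(eta_row * g) for xi, g in zip(x, row_gain)])
        y = _normalize([yj * math.exp(-eta_col * l) for yj, l in zip(y, col_loss)])
    return x_avg, y_avg


def duality_gap(A, x, y):
    """Best-response gain of the row player minus that of the column player."""
    n, m = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


A = [[1.0, -1.0], [-0.5, 0.5]]  # hypothetical 2x2 game
x_avg, y_avg = mwu_zero_sum(A)
print(x_avg, y_avg, duality_gap(A, x_avg, y_avg))
```

The duality gap of the averaged strategies measures how far they are from a Nash equilibrium; it shrinks as the number of rounds grows because each player's average regret vanishes.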

preprint · 2015 · arXiv

Container Relocation Problem: Approximation, Asymptotic, and Incomplete Information

The Container Relocation Problem (CRP) is concerned with finding a sequence of moves of containers that minimizes the number of relocations needed to retrieve all containers respecting a given order of retrieval. While the problem is known to be NP-hard, certain algorithms such as the A* search and heuristics perform reasonably well on many instances of the problem. In this paper, we first focus on the A* search algorithm, and analyze lower and upper bounds that are easy to compute and can be used to prune nodes. Our analysis sheds light on which bounds result in fast computation within a given approximation gap. We present extensive simulation results that improve upon our theoretical analysis, and further show that our method finds the optimum solution on most instances of medium-size bays. On "hard" instances, our method finds an approximate solution with a small gap and within a time frame that is fast for practical applications. We also study the average-case asymptotic behavior of the CRP where the number of columns grows. We calculate the expected number of relocations in the limit, and show that the optimum number of relocations converges to a simple and intuitive lower bound. We further study the CRP with incomplete information by relaxing the assumption that the order of retrieval of all containers is initially known. This assumption is particularly unrealistic in ports without an appointment system. We assume that the retrieval order of a subset of containers is known initially and the retrieval order of the remaining containers is observed later at a given specific time. Before this time, we assume a probability distribution on the retrieval order of unknown containers. We combine the A* algorithm with a sampling technique to solve this two-stage stochastic optimization problem. We show that our algorithm is fast and the error due to sampling and pruning is reasonably small.
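One easy-to-compute lower bound of the kind the abstract alludes to is the count of blocking containers: a container sitting above one that must be retrieved earlier has to be relocated at least once. The sketch below computes that count. The abstract does not say which bounds the paper uses, so treating this as the relevant bound is an assumption, as are the bay representation and the function name.

```python
def blocking_lower_bound(bay):
    """Number of blocking containers in a bay, a lower bound on relocations.

    `bay` is a list of stacks, each listed bottom to top, with distinct
    integer retrieval orders (1 is retrieved first). This representation is
    an assumption of the sketch.
    """
    count = 0
    for stack in bay:
        earliest_below = None
        for container in stack:
            # Blocking: some container underneath must leave before this one.
            if earliest_below is not None and container > earliest_below:
                count += 1
            if earliest_below is None or container < earliest_below:
                earliest_below = container
    return count


bay = [[3, 1, 2], [4]]
print(blocking_lower_bound(bay))
```

In the example only container 2 is blocking, because it sits on top of container 1. A search procedure such as A* can use a count like this as an admissible estimate of the relocations still required.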

preprint · 2015 · arXiv

Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond

This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive epsilon-optimal GPP (epsilon-GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of epsilon-GPP with performance guarantee. We empirically demonstrate the effectiveness of our epsilon-GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.

preprint2015arXiv

Managing Relocation and Delay in Container Terminals with Flexible Service Policies

We introduce a new model and mathematical formulation for planning crane moves in the storage yard of container terminals. Our objective is to develop a tool that captures customer centric elements, especially service time, and helps operators to manage costly relocation moves. Our model incorporates several practical details and provides port operators with expanded capabilities including planning repositioning moves in off-peak hours, controlling wait times of each customer as well as total service time, optimizing the number of relocations and wait time jointly, and optimizing simultaneously the container stacking and retrieval process. We also study a class of flexible service policies which allow for out-of-order retrieval. We show that under such flexible policies, we can decrease the number of relocations and retrieval delays without creating inequities.

preprint2014arXiv

A Decomposition Algorithm for Nested Resource Allocation Problems

We propose an exact polynomial algorithm for a resource allocation problem with convex costs and constraints on partial sums of resource consumptions, in the presence of either continuous or integer variables. No assumption of strict convexity or differentiability is needed. The method solves a hierarchy of resource allocation subproblems, whose solutions are used to convert constraints on sums of resources into bounds for separate variables at higher levels. The resulting time complexity for the integer problem is $O(n \log m \log (B/n))$, and the complexity of obtaining an $ε$-approximate solution for the continuous case is $O(n \log m \log (B/ε))$, $n$ being the number of variables, $m$ the number of ascending constraints (such that $m < n$), $ε$ a desired precision, and $B$ the total resource. This algorithm attains the best-known complexity when $m = n$, and improves it when $\log m = o(\log n)$. Extensive experimental analyses are conducted with four recent algorithms on various continuous problems issued from theory and practice. The proposed method achieves a higher performance than previous algorithms, addressing all problems with up to one million variables in less than one minute on a modern computer.

preprint2014arXiv

Advances on Matroid Secretary Problems: Free Order Model and Laminar Case

The most well-known conjecture in the context of matroid secretary problems claims the existence of a constant-factor approximation applicable to any matroid. Whereas this conjecture remains open, modified forms of it were shown to be true, when assuming that the assignment of weights to the secretaries is not adversarial but uniformly random (Soto [SODA 2011], Oveis Gharan and Vondrák [ESA 2011]). However, so far, there was no variant of the matroid secretary problem with adversarial weight assignment for which a constant-factor approximation was found. We address this point by presenting a 9-approximation for the \emph{free order model}, a model suggested shortly after the introduction of the matroid secretary problem, and for which no constant-factor approximation was known so far. The free order model is a relaxed version of the original matroid secretary problem, with the only difference that one can choose the order in which secretaries are interviewed. Furthermore, we consider the classical matroid secretary problem for the special case of laminar matroids. Only recently, a constant-factor approximation has been found for this case, using a clever but rather involved method and analysis (Im and Wang, [SODA 2011]) that leads to a 16000/3-approximation. This is arguably the most involved special case of the matroid secretary problem for which a constant-factor approximation is known. We present a considerably simpler and stronger $3\sqrt{3}e\approx 14.12$-approximation, based on reducing the problem to a matroid secretary problem on a partition matroid.

preprint2014arXiv

Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena

The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2FAS algorithm is significantly more time-efficient and scalable than state-of-the-art centralized algorithms while achieving comparable predictive performance.

preprint2014arXiv

Distributed Multi-Depot Routing without Communications

We consider and formulate a class of distributed multi-depot routing problems, where servers are to visit a set of requests, with the aim of minimizing the total distance travelled by all servers. These problems fall into two categories: distributed offline routing problems where all the requests that need to be visited are known from the start; distributed online routing problems where the requests come to be known incrementally. A critical and novel feature of our formulations is that communications are not allowed among the servers, hence posing an interesting and challenging question: what performance can be achieved in comparison to the best possible solution obtained from an omniscient planner with perfect communication capabilities? The worst-case (over all possible request-set instances) performance metrics are given by the approximation ratio (offline case) and the competitive ratio (online case). Our first result indicates that the online and offline problems are effectively equivalent: for the same request-set instance, the approximation ratio and the competitive ratio differ by at most an additive factor of 2, irrespective of the release dates in the online case. Therefore, we can restrict our attention to the offline problem. For the offline problem, we show that the approximation ratio given by the Voronoi partition is m (the number of servers). For two classes of depot configurations, when the depots form a line and when the ratios between the distances of pairs of depots are upper bounded by a sublinear function f(m) (i.e., f(m) = o(m)), we give partition schemes with sublinear approximation ratios O(log m) and Θ(f(m)) respectively. We also discuss several interesting open problems in our formulations: in particular, how our initial results (on the two deliberately chosen classes of depots) shape our conjecture on the open problems.
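The Voronoi partition mentioned in the abstract above can be pictured with a small sketch: every request goes to its nearest depot, a rule each server can apply on its own without communicating. The planar Euclidean setting, the tie-breaking rule (lowest depot index wins) and all names below are assumptions for illustration, not details taken from the paper.

```python
import math


def voronoi_partition(depots, requests):
    """Assign each request point to the index of its nearest depot.

    depots, requests: lists of (x, y) tuples (assumed planar Euclidean setting).
    Ties are broken in favor of the lowest depot index.
    """
    assignment = {i: [] for i in range(len(depots))}
    for point in requests:
        nearest = min(range(len(depots)),
                      key=lambda i: math.dist(depots[i], point))
        assignment[nearest].append(point)
    return assignment


depots = [(0, 0), (10, 0)]
requests = [(1, 1), (9, 0), (4, 0)]
print(voronoi_partition(depots, requests))
```

Each server would then plan a tour over its own cell only; the tour planning itself is outside the scope of this sketch.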

preprint2014arXiv

Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation

The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size. To improve its scalability, this paper presents a low-rank-cum-Markov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a low-rank approximate representation of the full-rank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the Kullback-Leibler distance criterion subject to some constraint and is considerably more refined than that of existing sparse GP models utilizing low-rank representations due to its more relaxed conditional independence assumption (especially with larger data). As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale. Interestingly, varying the Markov order produces a spectrum of LMAs with PIC approximation and full-rank GP at the two extremes. An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability. Empirical evaluation on three real-world datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more time-efficient and scalable than state-of-the-art sparse and full-rank GP regression methods while achieving comparable predictive performances.

preprint2014arXiv

Parallel Gaussian Process Regression with Low-Rank Covariance Matrix Approximations

Gaussian processes (GPs) are Bayesian non-parametric models that are widely used for probabilistic regression. Unfortunately, they cannot scale well with large data or perform real-time predictions due to their cubic time cost in the data size. This paper presents two parallel GP regression methods that exploit low-rank covariance matrix approximations for distributing the computational load among parallel machines to achieve time efficiency and scalability. We theoretically guarantee the predictive performances of our proposed parallel GPs to be equivalent to that of some centralized approximate GP regression methods: The computation of their centralized counterparts can be distributed among parallel machines, hence achieving greater time efficiency and scalability. We analytically compare the properties of our parallel GPs such as time, space, and communication complexity. Empirical evaluation on two real-world datasets in a cluster of 20 computing nodes shows that our parallel GPs are significantly more time-efficient and scalable than their centralized counterparts and exact/full GP while achieving predictive performances comparable to full GP.

preprint2014arXiv

Randomized Minmax Regret for Combinatorial Optimization Under Uncertainty

The minmax regret problem for combinatorial optimization under uncertainty can be viewed as a zero-sum game played between an optimizing player and an adversary, where the optimizing player selects a solution and the adversary selects costs with the intention of maximizing the regret of the player. The existing minmax regret model considers only deterministic solutions/strategies, and minmax regret versions of most polynomial solvable problems are NP-hard. In this paper, we consider a randomized model where the optimizing player selects a probability distribution (corresponding to a mixed strategy) over solutions and the adversary selects costs with knowledge of the player's distribution, but not its realization. We show that under this randomized model, the minmax regret version of any polynomial solvable combinatorial problem becomes polynomial solvable. This holds true for both the interval and discrete scenario representations of uncertainty. Using the randomized model, we show new proofs of existing approximation algorithms for the deterministic model based on primal-dual approaches. Finally, we prove that minmax regret problems are NP-hard under general convex uncertainty.

preprint2013arXiv

Average-Case Performance of Rollout Algorithms for Knapsack Problems

Rollout algorithms have demonstrated excellent performance on a variety of dynamic and discrete optimization problems. Interpreted as an approximate dynamic programming algorithm, a rollout algorithm estimates the value-to-go at each decision stage by simulating future events while following a greedy policy, referred to as the base policy. While in many cases rollout algorithms are guaranteed to perform as well as their base policies, there have been few theoretical results showing additional improvement in performance. In this paper we perform a probabilistic analysis of the subset sum problem and knapsack problem, giving theoretical evidence that rollout algorithms perform strictly better than their base policies. Using a stochastic model from the existing literature, we analyze two rollout methods that we refer to as the consecutive rollout and exhaustive rollout, both of which employ a simple greedy base policy. For the subset sum problem, we prove that after only a single iteration of the rollout algorithm, both methods yield at least a 30% reduction in the expected gap between the solution value and capacity, relative to the base policy. Analogous results are shown for the knapsack problem.
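To make the rollout idea in the abstract above concrete, here is a minimal sketch for the subset sum problem. It assumes a base policy that adds items in the given order and stops at the first item that does not fit, and a single rollout iteration that tries each item as the first choice and lets the base policy finish. Both choices are illustrative assumptions; the abstract only says that a simple greedy base policy is used. The function names are hypothetical.

```python
def greedy_base(items, capacity):
    """Base policy (assumed): add items in order, stop at the first that does not fit."""
    total = 0
    for weight in items:
        if total + weight > capacity:
            break
        total += weight
    return total


def one_step_rollout(items, capacity):
    """Try each item as the first pick, complete with the base policy, keep the best."""
    best = greedy_base(items, capacity)  # never worse than the base policy itself
    for i, weight in enumerate(items):
        if weight > capacity:
            continue
        rest = items[:i] + items[i + 1:]
        best = max(best, weight + greedy_base(rest, capacity - weight))
    return best


items, capacity = [5, 6, 3, 2], 10
print(greedy_base(items, capacity), one_step_rollout(items, capacity))
```

On this instance the base policy stops after the first item, while the rollout step finds a fuller packing by starting with a different item, which shows the kind of gap reduction the abstract quantifies in expectation.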

preprint2013arXiv

Digital breadcrumbs: Detecting urban mobility patterns and transport mode choices from cellphone networks

Many modern and growing cities are facing declines in public transport usage, with few efficient methods to explain why. In this article, we show that urban mobility patterns and transport mode choices can be derived from cellphone call detail records coupled with public transport data recorded from smart cards. Specifically, we present new data mining approaches to determine the spatial and temporal variability of public and private transportation usage and transport mode preferences across Singapore. Our results, which were validated by Singapore's quadrennial Household Interview Travel Survey (HITS), revealed that there are 3.5 million (HITS: 3.5 million) and 4.3 million (HITS: 4.4 million) inter-district passengers by public and private transport, respectively. Along with classifying which transportation connections are weak or underserved, the analysis shows that the mode share of public transport use increases from 38 percent in the morning to 44 percent around mid-day and 52 percent in the evening.

preprint2013arXiv

Greedy Online Bipartite Matching on Random Graphs

We study the average performance of online greedy matching algorithms on $G(n,n,p)$, the random bipartite graph with $n$ vertices on each side and edges occurring independently with probability $p=p(n)$. In the online model, vertices on one side of the graph are given up front while vertices on the other side arrive sequentially; when a vertex arrives its edges are revealed and it must be immediately matched or dropped. We begin by analyzing the \textsc{oblivious} algorithm, which tries to match each arriving vertex to a random neighbor, even if the neighbor has already been matched. The algorithm is shown to have a performance ratio of at least $1-1/e$ for all monotonic functions $p(n)$, where the performance ratio is defined asymptotically as the ratio of the expected matching size given by the algorithm to the expected maximum matching size. Next we show that the conventional \textsc{greedy} algorithm, which assigns each vertex to a random unmatched neighbor, has a performance ratio of at least 0.837 for all monotonic functions $p(n)$. Under the $G(n,n,p)$ model, the performance of \textsc{greedy} is equivalent to the performance of the well known \textsc{ranking} algorithm, so our results show that \textsc{ranking} has a performance ratio of at least 0.837. We finally consider vertex-weighted bipartite matching. Our proofs are based on simple differential equations that describe the evolution of the matching process.
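The greedy process described in the abstract above is easy to simulate. The sketch below draws the edges of each arriving vertex independently with probability $p$ and matches the vertex to a uniformly random unmatched neighbor if one exists. It returns only the size of the greedy matching; computing the performance ratio would also require the maximum matching size, which is left out. The function name and the seed parameter are assumptions for this example.

```python
import random


def greedy_online_matching(n, p, seed=0):
    """Simulate greedy online matching on G(n, n, p); return the matching size."""
    rng = random.Random(seed)
    matched = [False] * n  # status of the n vertices given up front
    size = 0
    for _ in range(n):  # vertices on the other side arrive one at a time
        # Reveal this vertex's edges and keep only the unmatched neighbors.
        free_neighbors = [u for u in range(n)
                          if rng.random() < p and not matched[u]]
        if free_neighbors:
            matched[rng.choice(free_neighbors)] = True
            size += 1
    return size


n = 1000
print(greedy_online_matching(n, 2.0 / n) / n)  # fraction of arrivals matched
```

Averaging the returned size over many seeds gives an empirical estimate of the expected greedy matching size for a chosen $p(n)$.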

preprint2013arXiv

Log-Quadratic Bounds for the Gaussian Q-function

We present bounds of quadratic form for the logarithm of the Gaussian Q-function. We also show an analytical method for deriving log-quadratic approximations of the Q-function and give an approximation with absolute error less than $10^{-3}$.
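For readers unfamiliar with the notion, the sketch below evaluates the Gaussian Q-function, $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, and compares its logarithm with a classical log-quadratic upper bound, the Chernoff-type bound $\ln Q(x) \le -x^2/2 - \ln 2$ for $x \ge 0$. This classical bound is used here only as a familiar example of a bound of quadratic form; it is not one of the bounds derived in the paper, whose coefficients the abstract does not give.

```python
import math


def q_function(x):
    """Gaussian Q-function: tail probability of a standard normal beyond x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def log_q_chernoff_bound(x):
    """Classical quadratic upper bound on ln Q(x), valid for x >= 0."""
    return -0.5 * x * x - math.log(2.0)


for x in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(x, math.log(q_function(x)), log_q_chernoff_bound(x))
```

The bound is tight at $x = 0$ and loosens as $x$ grows, which is the sort of gap that sharper log-quadratic bounds aim to close.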

preprint2013arXiv

Parallel Gaussian Process Regression with Low-Rank Covariance Matrix Approximations

preprint2012arXiv

Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena

The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2FAS algorithm is significantly more time-efficient and scalable than state-of-the-art centralized algorithms while achieving comparable predictive performance.

preprint2012arXiv

Near-Optimal Online Algorithms for Dynamic Resource Allocation Problems

In this paper, we study a general online linear programming problem whose formulation encompasses many practical dynamic resource allocation problems, including internet advertising display applications, revenue management, and various routing, packing, and auction problems. We propose a model that, under mild assumptions, allows us to design near-optimal learning-based online algorithms that do not require a priori knowledge of the total number of online requests to come, a first of its kind. We then consider two variants of the problem that relax the initial assumptions imposed on the proposed model.

38 published item(s)

Incentivizing Truthfulness and Collaborative Fairness in Bayesian Learning

Online Scheduling for LLM Inference with KV Cache Constraints

Distribution-Dependent Rates for Multi-Distribution Learning

Contextual Bandits and Optimistically Universal Learning

No-regret Learning in Price Competitions under Consumer Reference Effects

On Provably Robust Meta-Bayesian Optimization

Optimal Information Provision for Strategic Hybrid Workers

Rectified Max-Value Entropy Search for Bayesian Optimization

Weighted Maximum Entropy Inverse Reinforcement Learning

Efficient Carpooling and Toll Pricing for Autonomous Transportation

Robust Entropy-regularized Markov Decision Processes

A Relation Analysis of Markov Decision Process Frameworks

Competitive Ratios for Online Multi-capacity Ridesharing

Generalized Maximum Causal Entropy for Inverse Reinforcement Learning

Learning Structure in Nested Logit Models

Probability Distributions on Partially Ordered Sets and Network Interdiction Games

R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

Zone pAth Construction (ZAC) based Approaches for Effective Real-Time Ridesharing

No-Regret Learnability for Piecewise Linear Losses

Robust Adaptive Routing Under Uncertainty

Solving Combinatorial Games using Products, Projections and Lexicographically Optimal Bases

Container Relocation Problem: Approximation, Asymptotic, and Incomplete Information

Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond

Managing Relocation and Delay in Container Terminals with Flexible Service Policies

A Decomposition Algorithm for Nested Resource Allocation Problems

Advances on Matroid Secretary Problems: Free Order Model and Laminar Case

Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena

Distributed Multi-Depot Routing without Communications

Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation

Parallel Gaussian Process Regression with Low-Rank Covariance Matrix Approximations

Randomized Minmax Regret for Combinatorial Optimization Under Uncertainty

Average-Case Performance of Rollout Algorithms for Knapsack Problems

Digital breadcrumbs: Detecting urban mobility patterns and transport mode choices from cellphone networks

Greedy Online Bipartite Matching on Random Graphs

Log-Quadratic Bounds for the Gaussian Q-function

Parallel Gaussian Process Regression with Low-Rank Covariance Matrix Approximations

Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena

Near-Optimal Online Algorithms for Dynamic Resource Allocation Problems