Source author record

Alberto Marchesi

Alberto Marchesi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Science and Game Theory Machine Learning Artificial Intelligence Multiagent Systems

Catalog footprint

What is connected

14works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Multi-Armed Bandits With Best-Action Queries

We study \emph{multi-armed bandits} (MABs) augmented with \emph{best-action queries}, in which the learner may additionally query an oracle that reveals the best arm in the current round. This setting was recently characterized by Russo et al. [2024] in the \emph{full-feedback} model, where the learner observes the rewards of all arms after each round. They show that, in both \emph{stochastic} and \emph{adversarial} environments, $k$ best-action queries reduce the optimal $\widetilde{\mathcal{O}}(\sqrt{T})$ regret to $\widetilde{\mathcal{O}}(\min\{T/k,\sqrt{T}\})$. Whether this improvement extends to the more realistic \emph{bandit-feedback} model -- where the learner observes only the reward of the played arm -- was left as an open problem. We fully resolve this question. When rewards are stochastic but correlated among arms, we show that the full-feedback result does not carry over: any algorithm must incur regret at least $Ω(\sqrt{T-k})$. This lower bound directly extends to adversarial environments. On the positive side, we show that $\widetilde{\mathcal{O}}(\min\{T/k,\sqrt{T-k}\})$ regret is still achievable when rewards are stochastic and i.i.d., and establish a matching lower bound, up to logarithmic factors. Together, these results provide a complete characterization of the benefits of \emph{best-action queries} in the \emph{bandit-feedback} model.

preprint2026arXiv

Regret Minimization in Bilateral Trade With Perturbed Markets

We address the problem of maximizing Gain from Trade (GFT) in repeated buyer-seller exchanges subject to global budget balance constraints. While this problem is well-understood in purely adversarial and stochastic settings, these environments exhibit a sharp dichotomy: adversarial environments allow for no-regret learning against the best fixed-price mechanism, whereas stochastic environments allow for no-regret learning against the best distribution over prices that is budget balanced in expectation. This gap is significant, as policies balanced in expectation can increase the GFT by a multiplicative factor of two. In this work, we bridge these extremes by studying perturbed markets, where an underlying stochastic distribution is subject to an adversarial corruption $C$. We design an algorithm that adaptively scales with the level of corruption, achieving an $\tilde{\mathcal{O}}(T^{3/4}) + \mathcal{O}(C\log(T))$ regret bound against the best budget-balanced distribution over prices. Simultaneously, our algorithm maintains the worst-case $\tilde{\mathcal{O}}(T^{3/4})$ regret bound relative to a per-round budget-balanced baseline, ensuring optimality even in fully adversarial environments.

preprint2026arXiv

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time

We design the first regret guarantees for robust dynamic pricing that decouple the dependence on the corruption $C$ and the time horizon $T$. In dynamic pricing, a seller with unlimited supply of a good interacts with a stream of buyers over $ T $ rounds, with the goal of maximizing revenue. At each round $t$, the seller posts a price $p_t$, and the buyer purchases the good only if their unknown valuation $v^\star$ exceeds this price. The seller observes only the binary feedback $\mathbb{I} \left\{ p_t \leq v^\star \right\}$, indicating whether a sale occurred. In the \emph{robust} pricing setting, a malicious adversary is allowed to corrupt this feedback in at most $C$ rounds. Even if the learner knows the corruption $C$, the best known regret bound is $\mathcal{O}(C\log\log T)$ by Gupta et al. [2025]. This leaves as an open problem to ``decouple'' the dependence on $C$ and $T$. In this work, we resolve this open problem. In particular, we develop a robust variant of binary search that achieves regret $\mathcal{O}(C+\log T)$ when the corruption $C$ is known and $\mathcal{O}(C+\log^2 T)$ when the corruption is unknown.

preprint2022arXiv

Bayesian Persuasion Meets Mechanism Design: Going Beyond Intractability with Type Reporting

Bayesian persuasion studies how an informed sender should partially disclose information so as to influence the behavior of self-interested receivers. In the last years, a growing attention has been devoted to relaxing the assumption that the sender perfectly knows receiver's payoffs. The first crucial step towards such an achievement is to study settings where each receiver's payoffs depend on their unknown type, which is randomly determined by a known finite-supported probability distribution. This begets considerable computational challenges, as computing a sender-optimal signaling scheme is inapproximable up to within any constant factor. In this work, we circumvent this issue by leveraging ideas from mechanism design. In particular, we introduce a type reporting step in which the receiver is asked to report their type to the sender, after the latter has committed to a menu defining a signaling scheme for each possible receiver's type. We prove that, with a single receiver, the addition of this type reporting stage makes the sender's computational problem tractable. Then, we extend our framework to settings with multiple receivers, focusing on the case of no inter-agent externalities and binary actions. We show that it is possible to find a sender-optimal solution in polynomial-time by means of the ellipsoid method, given access to a suitable polynomial-time separation oracle. This can be implemented for supermodular and anonymous sender's utility functions. As for the case of submodular sender's utility functions, we first approximately cast the sender's problem into a linearly-constrained mathematical program whose objective function is the multi-linear extension of the sender's utility. Then, we show how to find in polynomial-time an approximate solution to the program by means of a continuous greedy algorithm. This provides a (1 -1/e)-approximation to the problem.

preprint2022arXiv

Designing Menus of Contracts Efficiently: The Power of Randomization

We study hidden-action principal-agent problems in which a principal commits to an outcome-dependent payment scheme (called contract) so as to incentivize the agent to take a costly, unobservable action leading to favorable outcomes. In particular, we focus on Bayesian settings where the agent has private information. This is collectively encoded by the agent's type, which is unknown to the principal, but randomly drawn according to a finitely-supported, commonly-known probability distribution. In Bayesian principal-agent problems, the principal may be better off by committing to a menu of contracts specifying a contract for each agent's type, rather than committing to a single contract. This induces a two-stage process that resembles interactions studied in classical mechanism design: after the principal has committed to a menu, the agent first reports a type to the principal, and, then, the latter puts in place the contract in the menu that corresponds to the reported type. Thus, the principal's computational problem boils down to designing a menu of contracts that incentivizes the agent to report their true type and maximizes expected utility. Previous works showed that computing an optimal menu of contracts is APX-hard. Crucially, previous works focus on menus of deterministic contracts. Surprisingly, we show that, if one considers menus of randomized contracts defined as probability distributions over payment vectors, then an "almost-optimal" menu can be computed in polynomial time. Indeed, the problem of computing a principal-optimal menu of randomized contracts may not admit a maximum, but only a supremum. Nevertheless, we show how to design a polynomial-time algorithm that guarantees the principal with an expected utility arbitrarily close to the supremum. Besides this main result, we also close several gaps in the analysis of menus of deterministic contracts.

preprint2022arXiv

Efficiency of Ad Auctions with Price Displaying

Most of the economic reports forecast that almost half of the worldwide market value unlocked by AI over the next decade (up to 6 trillion USD per year) will be in marketing&sales. In particular, AI will enable the optimization of more and more intricate economic settings, in which multiple different activities need to be jointly automated. This is the case of, e.g., Google Hotel Ads and Tripadvisor, where auctions are used to display ads of similar products or services together with their prices. As in classical ad auctions, the ads are ranked depending on the advertisers' bids, whereas, differently from classical settings, ads are displayed together with their prices, so as to provide a direct comparison among them. This dramatically affects users' behavior, as well as the properties of ad auctions. We show that, in such settings, social welfare maximization can be achieved by means of a direct-revelation mechanism that jointly optimizes, in polynomial time, the ads allocation and the advertisers' prices to be displayed with them. However, in practice it is unlikely that advertisers allow the mechanism to choose prices on their behalf. Indeed, in commonly-adopted mechanisms, ads allocation and price optimization are decoupled, so that the advertisers optimize prices and bids, while the mechanism does so for the allocation, once prices and bids are given. We investigate how this decoupling affects the efficiency of mechanisms. In particular, we study the Price of Anarchy (PoA) and the Price of Stability (PoS) of indirect-revelation mechanisms with both VCG and GSP payments, showing that the PoS for the revenue may be unbounded even with two slots, and the PoA for the social welfare may be as large as the number of slots. Nevertheless, we show that, under some assumptions, simple modifications to the indirect-revelation mechanism with VCG payments achieve a PoS of 1 for the revenue.

preprint2022arXiv

Last-iterate Convergence to Trembling-hand Perfect Equilibria

Designing efficient algorithms to find Nash equilibrium (NE) refinements in sequential games is of paramount importance in practice. Indeed, it is well known that the NE has several weaknesses, since it may prescribe to play sub-optimal actions in those parts of the game that are never reached at the equilibrium. NE refinements, such as the extensive-form perfect equilibrium (EFPE), amend such weaknesses by accounting for the possibility of players' mistakes. This is crucial in real-world applications, where bounded rationality players are usually involved, and it turns out being useful also in boosting the performances of superhuman agents for recreational games like Poker. Nevertheless, only few works addressed the problem of computing NE refinements. Most of them propose algorithms finding exact NE refinements by means of linear programming, and, thus, these do not have the potential of scaling up to real-world-size games. On the other hand, existing iterative algorithms that exploit the tree structure of sequential games only provide convergence guarantees to approximate refinements. In this paper, we provide the first efficient last-iterate algorithm that provably converges to an EFPE in two-player zero-sum sequential games with imperfect information. Our algorithm works by tracking a sequence of equilibria of suitably-defined, regularized-perturbed games. In order to do that, it uses a procedure that is tailored to converge last-iterate to the equilibria of such games. Crucially, the updates performed by such a procedure can be performed efficiently by visiting the game tree, thus making our algorithm potentially more scalable than its linear-programming-based competitors. Finally, we evaluate our algorithm on a standard testbed of games, showing that it produces strategies which are much more robust to players' mistakes than those of state-of-the-art NE-computation algorithms.

preprint2022arXiv

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was currently unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

preprint2022arXiv

Public Signaling in Bayesian Ad Auctions

We study signaling in Bayesian ad auctions, in which bidders' valuations depend on a random, unknown state of nature. The auction mechanism has complete knowledge of the actual state of nature, and it can send signals to bidders so as to disclose information about the state and increase revenue. For instance, a state may collectively encode some features of the user that are known to the mechanism only, since the latter has access to data sources unaccessible to the bidders. We study the problem of computing how the mechanism should send signals to bidders in order to maximize revenue. While this problem has already been addressed in the easier setting of second-price auctions, to the best of our knowledge, our work is the first to explore ad auctions with more than one slot. In this paper, we focus on public signaling and VCG mechanisms, under which bidders truthfully report their valuations. We start with a negative result, showing that, in general, the problem does not admit a PTAS unless P = NP, even when bidders' valuations are known to the mechanism. The rest of the paper is devoted to settings in which such negative result can be circumvented. First, we prove that, with known valuations, the problem can indeed be solved in polynomial time when either the number of states d or the number of slots m is fixed. Moreover, in the same setting, we provide an FPTAS for the case in which bidders are single minded, but d and m can be arbitrary. Then, we switch to the random valuations setting, in which these are randomly drawn according to some probability distribution. In this case, we show that the problem admits an FPTAS, a PTAS, and a QPTAS, when, respectively, d is fixed, m is fixed, and bidders' valuations are bounded away from zero.

preprint2022arXiv

Sequential Information Design: Learning to Persuade in the Dark

We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver. We consider settings where the receiver faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of sender's persuasive information structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result: no learning algorithm can be persuasive. Thus, we relax persuasiveness requirements by focusing on algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting -- where the sender observes all random events realizations -- , we provide an algorithm with $\tilde{O}(\sqrt{T})$ regret for both the sender and the receiver. Instead, in the bandit-feedback setting -- where the sender only observes the realizations of random events actually occurring in the SDM problem -- , we design an algorithm that, given an $α\in [1/2, 1]$ as input, ensures $\tilde{O}({T^α})$ and $\tilde{O}( T^{\max \{ α, 1-\fracα{2} \} })$ regrets, for the sender and the receiver respectively. This result is complemented by a lower bound showing that such a regrets trade-off is essentially tight.

preprint2022arXiv

Signaling in Posted Price Auctions

We study single-item single-unit Bayesian posted price auctions, where buyers arrive sequentially and their valuations for the item being sold depend on a random, unknown state of nature. The seller has complete knowledge of the actual state and can send signals to the buyers so as to disclose information about it. For instance, the state of nature may reflect the condition and/or some particular features of the item, which are known to the seller only. The problem faced by the seller is about how to partially disclose information about the state so as to maximize revenue. Unlike classical signaling problems, in this setting, the seller must also correlate the signals being sent to the buyers with some price proposals for them. This introduces additional challenges compared to standard settings. We consider two cases: the one where the seller can only send signals publicly visible to all buyers, and the case in which the seller can privately send a different signal to each buyer. As a first step, we prove that, in both settings, the problem of maximizing the seller's revenue does not admit an FPTAS unless P=NP, even for basic instances with a single buyer. As a result, in the rest of the paper, we focus on designing PTASs. In order to do so, we first introduce a unifying framework encompassing both public and private signaling, whose core result is a decomposition lemma that allows focusing on a finite set of possible buyers' posteriors. This forms the basis on which our PTASs are developed. In particular, in the public signaling setting, our PTAS employs some ad hoc techniques based on linear programming, while our PTAS for the private setting relies on the ellipsoid method to solve an exponentially-sized LP in polynomial time. In the latter case, we need a custom approximate separation oracle, which we implement with a dynamic programming approach.

preprint2022arXiv

The Power of Media Agencies in Ad Auctions: Improving Utility through Coordinated Bidding

The increasing competition in digital advertising induced a proliferation of media agencies playing the role of intermediaries between advertisers and platforms selling ad slots. When a group of competing advertisers is managed by a common agency, many forms of collusion, such as bid rigging, can be implemented by coordinating bidding strategies, dramatically increasing advertisers' value. We study the problem of finding bids and monetary transfers maximizing the utility of a group of colluders, under GSP and VCG mechanisms. First, we introduce an abstract bid optimization problem -- called weighted utility problem (WUP) -- , which is useful in proving our results. We show that the utilities of bidding strategies are related to the length of paths in a directed acyclic weighted graph, whose structure and weights depend on the mechanism under study. This allows us to solve WUP in polynomial time by finding a shortest path of the graph. Next, we switch to our original problem, focusing on two settings that differ for the incentives they allow for. Incentive constraints ensure that colluders do not leave the agency, and they can be enforced by implementing monetary transfers between the agency and the advertisers. In particular, we study the arbitrary transfers setting, where any kind of monetary transfer to and from the advertisers is allowed, and the more realistic limited liability setting, in which no advertiser can be paid by the agency. In the former, we cast the problem as a WUP instance and solve it by our graph-based algorithm, while, in the latter, we formulate it as a linear program with exponentially-many variables efficiently solvable by applying the ellipsoid algorithm to its dual. This requires to solve a suitable separation problem in polynomial time, which can be done by reducing it to a WUP instance.

preprint2020arXiv

Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces

We tackle the problem of learning equilibria in simulation-based games. In such games, the players' utility functions cannot be described analytically, as they are given through a black-box simulator that can be queried to obtain noisy estimates of the utilities. This is the case in many real-world games in which a complete description of the elements involved is not available upfront, such as complex military settings and online auctions. In these situations, one usually needs to run costly simulation processes to get an accurate estimate of the game outcome. As a result, solving these games begets the challenge of designing learning algorithms that can find (approximate) equilibria with high confidence, using as few simulator queries as possible. Moreover, since running the simulator during the game is unfeasible, the algorithms must first perform a pure exploration learning phase and, then, use the (approximate) equilibrium learned this way to play the game. In this work, we focus on two-player zero-sum games with infinite strategy spaces. Drawing from the best arm identification literature, we design two algorithms with theoretical guarantees to learn maximin strategies in these games. The first one works in the fixed-confidence setting, guaranteeing the desired confidence level while minimizing the number of queries. Instead, the second algorithm fits the fixed-budget setting, maximizing the confidence without exceeding the given maximum number of queries. First, we formally prove δ-PAC theoretical guarantees for our algorithms under some regularity assumptions, which are encoded by letting the utility functions be drawn from a Gaussian process. Then, we experimentally evaluate our techniques on a testbed made of randomly generated games and instances representing simple real-world security settings.

preprint2020arXiv

Signaling in Bayesian Network Congestion Games: the Subtle Power of Symmetry

Network congestion games are a well-understood model of multi-agent strategic interactions. Despite their ubiquitous applications, it is not clear whether it is possible to design information structures to ameliorate the overall experience of the network users. We focus on Bayesian games with atomic players, where network vagaries are modeled via a (random) state of nature which determines the costs incurred by the players. A third-party entity---the sender---can observe the realized state of the network and exploit this additional information to send a signal to each player. A natural question is the following: is it possible for an informed sender to reduce the overall social cost via the strategic provision of information to players who update their beliefs rationally? The paper focuses on the problem of computing optimal ex ante persuasive signaling schemes, showing that symmetry is a crucial property for its solution. Indeed, we show that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions. Moreover, the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint