Researcher profile

Shaddin Dughmi

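Shaddin Dughmi's listed work covers the theory of language generation, inference-time search with costly verification, contention resolution and the matroid secretary problem, communication-constrained persuasion, and coverage oracles.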

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
6 topics
4 close collaborators

Published work

6 published items

Preprint | 2026 | arXiv

A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse

We study language generation in the limit under a global preference ordering on strings, as introduced by Kleinberg and Wei. As in [arXiv:2504.14370, arXiv:2511.05295], we aim for breadth, but impose an additional requirement of timeliness: higher-ranked strings should be generated earlier. A string is then only credited if it is generated before a deadline, where its deadline is defined by a function that maps a string's rank in the target language to the time by which it must be produced. This is in keeping with a central consideration in machine learning, where inductive bias favors "simpler" or "more plausible" outputs, all else being equal. We show that timely generation is impossible in a strong sense for eventually consistent generators, the protagonists of most prior related work. Under what is perhaps the mildest natural relaxation of consistency, a hallucination rate that vanishes over time, we show that we can circumvent our impossibility result. In particular, we can achieve optimal density with respect to any superlinear deadline function. We also show this is tight by ruling out timely generation with linear deadlines and vanishing hallucination rate.

Preprint | 2026 | arXiv

Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification

Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting using a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier labels until it finds a positive example. For a fixed prompt, the generator and reward model induce two unknown objects: a distribution over reward scores and a score-conditioned success function. When these quantities are known, we characterize the distribution-aware optimal policy using a dynamic programming approach. In the realistic setting where both the score distribution and success function are unknown, we propose ADAP, a shellwise adaptive generate-rank-verify algorithm that progressively increases the number of sampled responses and top-ranked verifications. Under the monotonicity assumption that higher reward scores are no less likely to pass verification, we show that ADAP achieves expected cost within a constant factor of the distribution-aware optimum. We complement this result with learning-theoretic lower bounds, based on a centered star number, showing that structural assumptions on the score-label relationship are necessary. Experiments on mathematical reasoning and competitive programming validate the predicted advantage over both fixed non-adaptive policies and difficulty-adaptive baselines.
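The generate-rank-verify loop described in this abstract can be illustrated with a minimal sketch. It is not the paper's ADAP algorithm: the doubling schedule, the cost values, the fresh resampling in each round, and the names `generate_rank_verify`, `sample`, `score`, and `verify` are all assumptions chosen for illustration.

```python
import itertools


def generate_rank_verify(sample, score, verify, max_rounds=10,
                         sample_cost=1.0, verify_cost=10.0):
    """Illustrative first-positive search with cheap scores and a costly verifier.

    Each round draws a fresh batch, ranks it by the cheap score, and pays to
    verify only the top-ranked candidates. Batch size and verification budget
    double every round (a hypothetical schedule, not the paper's).
    Returns (first verified candidate or None, total cost paid).
    """
    total_cost = 0.0
    n_samples, n_verify = 2, 1
    for _ in range(max_rounds):
        candidates = [sample() for _ in range(n_samples)]
        total_cost += sample_cost * n_samples
        # Rank by the cheap reward signal, best first.
        ranked = sorted(candidates, key=score, reverse=True)
        for cand in ranked[:n_verify]:
            total_cost += verify_cost
            if verify(cand):
                return cand, total_cost
        n_samples *= 2
        n_verify *= 2
    return None, total_cost


# Toy usage: candidates are 0, 1, 2, ...; the score is the value itself,
# and the (expensive) verifier accepts any value of at least 5.
counter = itertools.count()
found, cost = generate_rank_verify(
    sample=lambda: next(counter),
    score=lambda x: x,
    verify=lambda x: x >= 5,
)
print(found, cost)
```

In the toy run the score is perfectly monotone in the chance of passing verification, which loosely mirrors the monotonicity assumption stated in the abstract; with a noisier score, more of the budget would go to failed verifications.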

Preprint | 2022 | arXiv

Matroid Secretary Is Equivalent to Contention Resolution

We show that the matroid secretary problem is equivalent to correlated contention resolution in the online random-order model. Specifically, the matroid secretary conjecture is true if and only if every matroid admits an online random-order contention resolution scheme which, given an arbitrary (possibly correlated) prior distribution over subsets of the ground set, matches the balance ratio of the best offline scheme for that distribution up to a constant. We refer to such a scheme as universal. Our result indicates that the core challenge of the matroid secretary problem lies in resolving contention for positively correlated inputs, in particular when the positive correlation is benign in as much as offline contention resolution is concerned. Our result builds on our previous work which establishes one direction of this equivalence, namely that the secretary conjecture implies universal random-order contention resolution, as well as a weak converse, which derives a matroid secretary algorithm from a random-order contention resolution scheme with only partial knowledge of the distribution. It is this weak converse that we strengthen in this paper: We show that universal random-order contention resolution for matroids, in the usual setting of a fully known prior distribution, suffices to resolve the matroid secretary conjecture in the affirmative. Our proof is the composition of three reductions. First, we use duality arguments to reduce the matroid secretary problem to the matroid prophet secretary problem with arbitrarily correlated distributions. Second, we introduce a generalization of contention resolution we term labeled contention resolution, to which we reduce the correlated matroid prophet secretary problem. Finally, we combine duplication of elements with limiting arguments to reduce labeled contention resolution to classical contention resolution.

Preprint | 2020 | arXiv

Persuasion with Limited Communication

We examine information structure design, also called "persuasion" or "signaling", in the presence of a constraint on the amount of communication. We focus on the fundamental setting of bilateral trade, which in its simplest form involves a seller with a single item to price, a buyer whose value for the item is drawn from a common prior distribution over $n$ different possible values, and a take-it-or-leave-it-offer protocol. A mediator with access to the buyer's type may partially reveal such information to the seller in order to further some objective such as the social welfare or the seller's revenue. In the setting of maximizing welfare under bilateral trade, we show that $O(\log(n) \log \frac{1}{ε})$ signals suffice for a $1-ε$ approximation to the optimal welfare, and this bound is tight. As our main result, we exhibit an efficient algorithm for computing a $\frac{M-1}{M} \cdot (1-1/e)$-approximation to the welfare-maximizing scheme with at most $M$ signals. For the revenue objective, we show that $Ω(n)$ signals are needed for a constant factor approximation to the revenue of a fully informed seller. From a computational perspective, however, the problem gets easier: we show that a simple dynamic program computes the signaling scheme with $M$ signals maximizing the seller's revenue. Observing that the signaling problem in bilateral trade is a special case of the fundamental Bayesian Persuasion model of Kamenica and Gentzkow, we also examine the question of communication-constrained signaling more generally. In this model there is a sender (the mediator), a receiver (the seller) looking to take an action (setting the price), and a state of nature (the buyer's type) drawn from a common prior. We show that it is NP-hard to approximate the optimal sender's utility to within any constant factor in the presence of communication constraints.
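To make the bilateral trade setting concrete, the sketch below evaluates the welfare and revenue of a given signaling scheme. It does not implement any algorithm from the paper. The names `seller_price` and `evaluate_scheme`, the representation of a scheme as (probability, posterior) pairs, and the tie-breaking rule toward the lower price are assumptions made for illustration.

```python
from fractions import Fraction as F


def seller_price(values, posterior):
    """Revenue-maximizing take-it-or-leave-it price under a posterior.

    Ties are broken toward the lower price (an assumption of this sketch).
    Returns (price, expected revenue).
    """
    best_price, best_rev = None, F(-1)
    for p in sorted(values):
        # The buyer accepts exactly when their value is at least the price.
        rev = p * sum(q for v, q in zip(values, posterior) if v >= p)
        if rev > best_rev:
            best_price, best_rev = p, rev
    return best_price, best_rev


def evaluate_scheme(values, signals):
    """Expected (welfare, revenue) of a signaling scheme.

    `signals` is a list of (probability, posterior) pairs, where each
    posterior is a distribution over `values` seen by the seller.
    """
    welfare = revenue = F(0)
    for prob, post in signals:
        price, rev = seller_price(values, post)
        revenue += prob * rev
        # Welfare counts the buyer's value whenever trade happens.
        welfare += prob * sum(v * q for v, q in zip(values, post) if v >= price)
    return welfare, revenue


values = [1, 3]
# One signal: the seller only knows the prior (3/5 on value 1, 2/5 on value 3).
no_info = [(F(1), [F(3, 5), F(2, 5)])]
# Two signals: the mediator reveals the buyer's value exactly.
full_info = [(F(3, 5), [F(1), F(0)]), (F(2, 5), [F(0), F(1)])]

print(evaluate_scheme(values, no_info))
print(evaluate_scheme(values, full_info))
```

In this toy instance the uninformed seller prices at 3 and excludes the low-value buyer, so welfare is 6/5, while full revelation raises welfare to 9/5. A constraint on the number of signals limits how finely the mediator can split the prior in this way.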

Preprint | 2020 | arXiv

The Outer Limits of Contention Resolution on Matroids and Connections to the Secretary Problem

Contention resolution schemes have proven to be a useful and unifying abstraction for a variety of constrained optimization problems, in both offline and online arrival models. Much of prior work restricts attention to product distributions for the input set of elements, and studies contention resolution for increasingly general packing constraints, both offline and online. In this paper, we instead focus on generalizing the input distribution, restricting attention to matroid constraints in both the offline and online random arrival models. In particular, we study contention resolution when the input set is arbitrarily distributed, and may exhibit positive and/or negative correlations between elements. We characterize the distributions for which offline contention resolution is possible, and establish some of their basic closure properties. Our characterization can be interpreted as a distributional generalization of the matroid covering theorem. For the online random arrival model, we show that contention resolution is intimately tied to the secretary problem via two results. First, we show that a competitive algorithm for the matroid secretary problem implies that online contention resolution is essentially as powerful as offline contention resolution for matroids, so long as the algorithm is given the input distribution. Second, we reduce the matroid secretary problem to the design of an online contention resolution scheme of a particular form.

Preprint | 2010 | arXiv

Succinct Coverage Oracles

In this paper, we identify a fundamental algorithmic problem that we term succinct dynamic covering (SDC), arising in many modern-day web applications, including ad-serving and online recommendation systems in eBay and Netflix. Roughly speaking, SDC applies two restrictions to the well-studied Max-Coverage problem: Given an integer k, X={1,2,...,n} and I={S_1, ..., S_m}, S_i a subset of X, find a subset J of I, such that |J| <= k and the union of S in J is as large as possible. The two restrictions applied by SDC are: (1) Dynamic: At query-time, we are given a query Q, a subset of X, and our goal is to find J such that the intersection of Q with the union of S in J is as large as possible; (2) Space-constrained: We don't have enough space to store (and process) the entire input; specifically, we have o(mn), and maybe as little as O((m+n)polylog(mn)) space. The goal of SDC is to maintain a small data structure so as to answer most dynamic queries with high accuracy. We call such a scheme a Coverage Oracle. We present algorithms and complexity results for coverage oracles. We present deterministic and probabilistic near-tight upper and lower bounds on the approximation ratio of SDC as a function of the amount of space available to the oracle. Our lower bound results show that to obtain constant-factor approximations we need Omega(mn) space. Fortunately, our upper bounds present an explicit tradeoff between space and approximation ratio, allowing us to determine the amount of space needed to guarantee certain accuracy.
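For reference, the Max-Coverage problem with a query restriction can be sketched with the standard greedy heuristic, which is known to give a (1 - 1/e)-approximation. This is only a baseline that stores the entire input, so it ignores the space constraint that defines SDC and is not the paper's coverage oracle. The function name `greedy_max_coverage` and its parameters are hypothetical.

```python
def greedy_max_coverage(sets, k, query=None):
    """Greedily pick at most k sets to cover as much of `query` as possible.

    If `query` is None, the target is the union of all sets (plain
    Max-Coverage). Returns (chosen set indices, covered elements).
    """
    sets = [set(s) for s in sets]
    target = set(query) if query is not None else set().union(*sets)
    uncovered = set(target)
    chosen = []
    for _ in range(k):
        best_idx, best_gain = None, 0
        for i, s in enumerate(sets):
            if i in chosen:
                continue
            # Marginal gain: still-uncovered target elements that s would add.
            gain = len(uncovered & s)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:
            break  # no remaining set improves coverage
        chosen.append(best_idx)
        uncovered -= sets[best_idx]
    return chosen, target - uncovered


family = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
print(greedy_max_coverage(family, k=2))
print(greedy_max_coverage(family, k=2, query={4, 5}))
```

The second call shows the dynamic aspect: the same family yields a different selection once coverage is measured only against the query Q.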