Researcher profile

Will Ma

Will Ma contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
4 topics
4 close collaborators

Published work

9 published items

preprint, 2026, arXiv

Potential-Based Greedy Matching for Dynamic Delivery Pooling

We study the dynamic pooling of multiple orders into a single trip, a strategy widely adopted by online delivery platforms. When an order has to be dispatched, the platform must determine which (if any) of the available orders to pool it with, weighing the immediate efficiency gains against the uncertain, differential benefits of holding each order for future pooling opportunities. In this paper, we demonstrate the effectiveness of using the delivery distance as a proxy for opportunity cost via a potential-based greedy algorithm (PB). The algorithm is simple, pooling each departing job with the available job that maximizes the immediate savings in travel distance minus "half its delivery distance", which we call the potential of the available job. Theoretically, we show that PB achieves vanishing worst-case regret per job as market density increases, whereas a naive greedy policy suffers constant regret. We further show that the potential approximates the true opportunity cost of dispatching a job, in a stochastic setting with sufficient density. Finally, we conduct extensive numerical experiments on both synthetic data and real-world data from the Meituan platform. Despite being forecast-agnostic, PB consistently outperforms greedy heuristics that rely on historical data. Moreover, PB achieves performance comparable to computationally-intensive batching heuristics, which themselves also benefit from incorporating the potential to further improve their performance or drastically reduce computational costs.
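The pooling rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a toy one-dimensional model (hypothetical) in which every job starts at a common depot at 0 and is delivered to a point x >= 0, so a solo trip costs x and a pooled trip costs the larger of the two distances. The function names are made up for the example, and the decision of whether to dispatch a job alone is left out.

```python
from typing import Optional, Sequence


def pooled_savings(x_departing: float, x_available: float) -> float:
    """Distance saved by pooling two jobs in the toy 1-D model (assumption):
    two solo trips cost x_departing + x_available, one pooled trip costs the max."""
    return x_departing + x_available - max(x_departing, x_available)


def pb_match(x_departing: float, available: Sequence[float]) -> Optional[int]:
    """Potential-based greedy: maximize savings minus half the available job's
    delivery distance (its 'potential'). Returns an index, or None if empty."""
    if not available:
        return None
    return max(
        range(len(available)),
        key=lambda j: pooled_savings(x_departing, available[j]) - 0.5 * available[j],
    )


def greedy_match(x_departing: float, available: Sequence[float]) -> Optional[int]:
    """Naive greedy baseline: maximize the immediate savings only."""
    if not available:
        return None
    return max(
        range(len(available)),
        key=lambda j: pooled_savings(x_departing, available[j]),
    )
```

With a departing job at distance 4 and available jobs at distances 3 and 10, the naive rule pools with the distance-10 job (savings 4 versus 3), while the potential-based rule prefers the distance-3 job and keeps the long job available for a later, better match.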

preprint, 2026, arXiv

Survey of Data-driven Newsvendor: Unified Analysis and Spectrum of Achievable Regrets

In the Newsvendor problem, the goal is to guess the number that will be drawn from some distribution, with asymmetric consequences for guessing too high vs. too low. In the data-driven version, the distribution is unknown, and one must work with samples from the distribution. Data-driven Newsvendor has been studied under many variants: additive vs. multiplicative regret, high probability vs. expectation bounds, and different distribution classes. This paper studies all combinations of these variants, filling in many gaps in the literature and simplifying many proofs. In particular, we provide a unified analysis based on the notion of clustered distributions, which in conjunction with our new lower bounds, shows that the entire spectrum of regrets between $1/\sqrt{n}$ and $1/n$ can be possible. Simulations on commonly-used distributions demonstrate that our notion is the "correct" predictor of empirical regret across varying data sizes.
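As a concrete reference point for the data-driven setting, the sketch below shows the standard sample-quantile rule (often called sample average approximation), which guesses the empirical quantile at level b/(b+h), where b is the per-unit cost of guessing too low and h the per-unit cost of guessing too high. This is a textbook baseline chosen for illustration; the abstract does not say which estimators the survey analyzes, and the function names are assumptions.

```python
import math
from typing import Sequence


def newsvendor_cost(order: float, demand: float,
                    underage_cost: float, overage_cost: float) -> float:
    """Asymmetric cost of guessing `order` when the realized number is `demand`."""
    return (underage_cost * max(demand - order, 0.0)
            + overage_cost * max(order - demand, 0.0))


def saa_order_quantity(samples: Sequence[float],
                       underage_cost: float, overage_cost: float) -> float:
    """Return the empirical quantile at the critical ratio b / (b + h)."""
    if not samples:
        raise ValueError("need at least one sample")
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    ordered = sorted(samples)
    # Smallest sample whose empirical CDF reaches the critical ratio.
    index = max(math.ceil(critical_ratio * len(ordered)) - 1, 0)
    return ordered[index]
```

For example, with samples 1 through 10 and costs b = 3, h = 1, the critical ratio is 0.75 and the rule returns the 8th smallest sample.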

preprint, 2025, arXiv

Personalized Promotions in Practice: Dynamic Allocation and Reference Effects

Partnering with a large online retailer, we consider the problem of sending daily personalized promotions to a userbase of over 20 million customers. We propose an efficient policy for determining, every day, the promotion that each customer should receive (10%, 12%, 15%, 17%, or 20% off), while respecting global allocation constraints. This policy was successfully deployed to see a 4.5% revenue increase during an A/B test, by better targeting promotion-sensitive customers and also learning intertemporal patterns across customers. We also consider theoretically modeling the intertemporal state of the customer. The data suggests a simple new combinatorial model of pricing with reference effects, where the customer remembers the best promotion they saw over the past $\ell$ days as the "reference value", and is more likely to purchase if this value is poor. We tightly characterize the structure of optimal policies for maximizing long-run average revenue under this model -- they cycle between offering poor promotion values $\ell$ times and offering good values once.
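The cyclic structure described at the end of this abstract can be illustrated with a small sketch. It assumes, for illustration only, that the reference value on a given day is the largest discount shown over the previous ℓ days (excluding the current day), and it does not model purchase probabilities, which the abstract does not specify. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def cyclic_schedule(poor: int, good: int, ell: int, days: int) -> List[int]:
    """Offer the poor promotion ell times, then the good one once, and repeat."""
    return [good if t % (ell + 1) == ell else poor for t in range(days)]


def reference_values(schedule: Sequence[int], ell: int) -> List[Optional[int]]:
    """Best promotion seen over the previous ell days (assumed to exclude today).
    The first day has no history, so its reference value is None."""
    refs: List[Optional[int]] = []
    for t in range(len(schedule)):
        window = schedule[max(0, t - ell):t]
        refs.append(max(window) if window else None)
    return refs
```

With ℓ = 3, a poor value of 10 and a good value of 20, the schedule is 10, 10, 10, 20 repeated, and the reference value has fallen back to 10 on exactly the days when 20 is offered, since the previous good offer has just left the memory window.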

preprint, 2022, arXiv

Tight Guarantees for Static Threshold Policies in the Prophet Secretary Problem

In the prophet secretary problem, $n$ values are drawn independently from known distributions, and presented in a uniformly random order. A decision-maker must accept or reject each value when it is presented, and may accept at most $k$ values in total. The objective is to maximize the expected sum of accepted values. We analyze the performance of static threshold policies, which accept the first $k$ values exceeding a fixed threshold (or all such values, if fewer than $k$ exist). We show that an appropriate threshold guarantees $\gamma_k = 1 - e^{-k}k^k/k!$ times the value of the offline optimal solution. Note that $\gamma_1 = 1-1/e$, and by Stirling's approximation $\gamma_k \approx 1-1/\sqrt{2\pi k}$. This represents the best-known guarantee for the prophet secretary problem for all $k>1$, and is tight for all $k$ for the class of static threshold policies. We provide two simple methods for setting the threshold. Our first method sets a threshold such that $k \cdot \gamma_k$ values are accepted in expectation, and offers an optimal guarantee for all $k$. Our second sets a threshold such that the expected number of values exceeding the threshold is equal to $k$. This approach gives an optimal guarantee if $k > 4$, but gives sub-optimal guarantees for $k \le 4$. Our proofs use a new result for optimizing sums of independent Bernoulli random variables, which extends a classical result of Hoeffding (1956) and is likely to be of independent interest. Finally, we note that our methods for setting thresholds can be implemented under limited information about agents' values.
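The guarantee and the policy class in this abstract are easy to evaluate numerically. The sketch below computes the stated ratio 1 - e^{-k} k^k / k!, its Stirling approximation, and the action of a static threshold policy on a given arrival sequence. It does not implement either of the paper's methods for choosing the threshold, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def gamma_k(k: int) -> float:
    """Guarantee stated in the abstract: 1 - e^{-k} k^k / k!."""
    return 1.0 - math.exp(-k) * k**k / math.factorial(k)


def gamma_k_stirling(k: int) -> float:
    """Stirling approximation stated in the abstract: 1 - 1/sqrt(2 pi k)."""
    return 1.0 - 1.0 / math.sqrt(2.0 * math.pi * k)


def static_threshold_accept(values: Sequence[float], threshold: float,
                            k: int) -> List[float]:
    """Accept the first k values exceeding the fixed threshold
    (or all such values, if fewer than k exist)."""
    accepted: List[float] = []
    for v in values:
        if len(accepted) == k:
            break
        if v > threshold:
            accepted.append(v)
    return accepted
```

For instance, `static_threshold_accept([5, 1, 7, 9, 8], threshold=4, k=2)` keeps 5 and 7 and never looks at the later, larger values, which is the price a static threshold pays for its simplicity.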

preprint, 2022, arXiv

When is Assortment Optimization Optimal?

Assortment optimization concerns the problem of selling items with fixed prices to a buyer who will purchase at most one. Typically, retailers select a subset of items, corresponding to an "assortment" of brands to carry, and make each selected item available for purchase at its brand-recommended price. Despite the tremendous importance in practice, the best method for selling these fixed-price items is not well understood, as retailers have begun experimenting with making certain items available only through a lottery. In this paper we analyze the maximum possible revenue that can be earned in this setting, given that the buyer's preference is private but drawn from a known distribution. In particular, we introduce a Bayesian mechanism design problem where the buyer has a random ranking over fixed-price items and an outside option, and the seller optimizes a (randomized) allocation of up to one item. We show that allocations corresponding to assortments are suboptimal in general, but under many commonly-studied Bayesian priors for buyer rankings such as the MNL and Markov Chain choice models, assortments are in fact optimal. Therefore, this large literature on assortment optimization has much greater significance than appreciated before -- it is not only computing optimal assortments; it is computing the economic limit of the seller's revenue for these fixed-price substitute items. We derive several further results -- a more general sufficient condition for assortments being optimal that captures choice models beyond Markov Chain, a proof that Nested Logit choice models cannot be captured by Markov Chain but can be partially captured by our condition, and suboptimality gaps for assortments when our condition does not hold. Finally, we show that our mechanism design problem provides the tightest-known LP relaxation for assortment optimization under the ranking distribution model.
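For readers unfamiliar with the assortment problem itself, the sketch below computes expected revenue under the MNL choice model and finds the best deterministic assortment by brute force. It assumes the usual MNL form in which item i has preference weight w_i, the outside option has weight 1, and the buyer picks item i from assortment S with probability w_i / (1 + sum of weights in S). It covers only the assortment baseline, not the randomized (lottery) mechanisms the paper studies, and the function names are illustrative.

```python
from itertools import combinations
from typing import Sequence, Tuple


def mnl_revenue(assortment: Sequence[int], prices: Sequence[float],
                weights: Sequence[float]) -> float:
    """Expected revenue of an assortment under MNL, outside option weight 1."""
    denominator = 1.0 + sum(weights[i] for i in assortment)
    return sum(prices[i] * weights[i] for i in assortment) / denominator


def best_assortment(prices: Sequence[float],
                    weights: Sequence[float]) -> Tuple[Tuple[int, ...], float]:
    """Enumerate all non-empty subsets; exponential, for small examples only."""
    n = len(prices)
    best_set: Tuple[int, ...] = ()
    best_rev = 0.0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            rev = mnl_revenue(subset, prices, weights)
            if rev > best_rev:
                best_set, best_rev = subset, rev
    return best_set, best_rev
```

With prices (10, 2) and weights (1, 5), offering only the expensive item earns 5.0 in expectation, while adding the popular cheap item cannibalizes demand and lowers revenue to about 2.86.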

preprint, 2021, arXiv

Reaping the Benefits of Bundling under High Production Costs

It is well-known that selling different goods in a single bundle can significantly increase revenue. However, bundling is no longer profitable if the goods have high production costs. To overcome this challenge, we introduce a new mechanism, Pure Bundling with Disposal for Cost (PBDC), where after buying the bundle, the customer is allowed to return any subset of goods for their costs. We provide two types of guarantees on the profit of PBDC mechanisms relative to the optimum in the presence of production costs, under the assumption that customers have valuations which are additive over the items and drawn independently. We first provide a distribution-dependent guarantee which shows that PBDC earns at least $1-6c^{2/3}$ of the optimal profit, where $c$ denotes the coefficient of variation of the welfare random variable. $c$ approaches 0 if there are a large number of items whose individual valuations have bounded coefficients of variation, and our constants improve upon those from the classical result of Bakos and Brynjolfsson (1999) without costs. We then provide a distribution-free guarantee which shows that either PBDC or individual sales earns at least 1/5.2 times the optimal profit, generalizing and improving the constant of 1/6 from the celebrated result of Babaioff et al. (2014). Conversely, we also provide the best-known upper bound on the performance of any partitioning mechanism (which captures both individual sales and pure bundling), of 1/1.19 times the optimal profit, improving on the previously-known upper bound of 1/1.08. Finally, we conduct simulations under the same playing field as the extensive numerical study of Chu et al. (2011), which confirm that PBDC outperforms other simple pricing schemes overall.
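The PBDC mechanism can be traced for a single customer with a short sketch. It rests on assumptions not spelled out in the abstract: the customer has additive values, returns exactly the items valued below their cost, buys only if the resulting utility is non-negative, and the seller incurs production cost only for items that are kept. The function name and the returned fields are hypothetical.

```python
from typing import Dict, List, Sequence, Union


def pbdc_outcome(values: Sequence[float], costs: Sequence[float],
                 bundle_price: float) -> Dict[str, Union[bool, float, List[int]]]:
    """Outcome for one customer under Pure Bundling with Disposal for Cost."""
    # Item i is worth max(v_i, c_i): keep it, or return it for a refund of c_i.
    effective_value = sum(max(v, c) for v, c in zip(values, costs))
    if effective_value < bundle_price:
        return {"buys": False, "kept": [], "customer_utility": 0.0,
                "seller_profit": 0.0}
    kept = [i for i, (v, c) in enumerate(zip(values, costs)) if v >= c]
    refunds = sum(c for i, c in enumerate(costs) if i not in kept)
    production = sum(costs[i] for i in kept)  # assumption: only kept items are produced
    return {
        "buys": True,
        "kept": kept,
        "customer_utility": effective_value - bundle_price,
        "seller_profit": bundle_price - refunds - production,
    }
```

Under these assumptions the seller's profit from any buyer is the bundle price minus the total cost of all items, whichever subset is returned, so the seller is never forced to supply an item to someone who values it below its cost.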

preprint, 2020, arXiv

Multi-stage and Multi-customer Assortment Optimization with Inventory Constraints

We consider an assortment optimization problem where a customer chooses a single item from a sequence of sets shown to her, while limited inventories constrain the items offered to customers over time. In the special case where all of the assortments have size one, our problem captures the online stochastic matching with timeouts problem. For this problem, we derive a polynomial-time approximation algorithm which earns at least $1-\ln(2-1/e)$, or 0.51, of the optimum. This improves upon the previous-best approximation ratio of 0.46, and furthermore, we show that it is tight. For the general assortment problem, we establish the first constant-factor approximation ratio of 0.09 for the case that different types of customers value items differently, and an approximation ratio of 0.15 for the case that different customers value each item the same. Our algorithms are based on rounding an LP relaxation for multi-stage assortment optimization, and improve upon previous randomized rounding schemes to derive the tight ratio of $1-\ln(2-1/e)$.
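The closed-form ratio quoted in this abstract can be checked numerically. The snippet only evaluates the expression and compares it with the previous ratio of 0.46 mentioned above; the function name is an assumption.

```python
import math


def timeouts_ratio() -> float:
    """Evaluate 1 - ln(2 - 1/e), the approximation ratio stated in the abstract."""
    return 1.0 - math.log(2.0 - 1.0 / math.e)


ratio = timeouts_ratio()
# Rounds to 0.51, matching the figure in the abstract, and exceeds the earlier 0.46.
print(round(ratio, 2), ratio > 0.46)
```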

preprint, 2020, arXiv

On Policies for Single-leg Revenue Management with Limited Demand Information

In this paper we study the single-item revenue management problem, with no information given about the demand trajectory over time. When the item is sold through accepting/rejecting different fare classes, Ball and Queyranne (2009) have established the tight competitive ratio for this problem using booking limit policies, which raise the acceptance threshold as the remaining inventory dwindles. However, when the item is sold through dynamic pricing instead, there is the additional challenge that offering a low price may entice high-paying customers to substitute down. We show that despite this challenge, the same competitive ratio can still be achieved using a randomized dynamic pricing policy. Our policy incorporates the price-skimming technique from Eren and Maglaras (2010), but importantly we show how the randomized price distribution should be stochastically-increased as the remaining inventory dwindles. A key technical ingredient in our policy is a new "valuation tracking" subroutine, which tracks the possible values for the optimum, and follows the most "inventory-conservative" control which maintains the desired competitive ratio. Finally, we demonstrate the empirical effectiveness of our policy in simulations, where its average-case performance surpasses all naive modifications of the existing policies.
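The idea of a booking limit policy, raising the acceptance threshold as inventory dwindles, can be illustrated with a deliberately simplified sketch. The equal-sized inventory bands used here are a placeholder assumption, not the booking limits of Ball and Queyranne (2009), and the sketch covers only the accept/reject setting, not the randomized dynamic pricing policy the paper proposes. The function names are hypothetical.

```python
from typing import Optional, Sequence


def lowest_acceptable_fare(fares: Sequence[float], capacity: int,
                           sold: int) -> Optional[float]:
    """Split capacity into one equal band per fare class (placeholder rule):
    the more units sold, the higher the lowest fare still accepted."""
    if sold >= capacity:
        return None  # sold out
    ordered = sorted(fares)
    band = sold * len(ordered) // capacity
    return ordered[band]


def accept(fare: float, fares: Sequence[float], capacity: int, sold: int) -> bool:
    """Accept a request if its fare meets the current threshold."""
    threshold = lowest_acceptable_fare(fares, capacity, sold)
    return threshold is not None and fare >= threshold
```

With fares 100, 200 and 400 and a capacity of 6, the threshold is 100 for the first two units, 200 for the next two and 400 for the last two, so a 200 request is accepted early on but rejected once four units are sold.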

preprint, 2020, arXiv

Strong mixed-integer programming formulations for trained neural networks

We present strong mixed-integer programming (MIP) formulations for high-dimensional piecewise linear functions that correspond to trained neural networks. These formulations can be used for a number of important tasks, such as verifying that an image classification network is robust to adversarial inputs, or solving decision problems where the objective function is a machine learning model. We present a generic framework, which may be of independent interest, that provides a way to construct sharp or ideal formulations for the maximum of d affine functions over arbitrary polyhedral input domains. We apply this result to derive MIP formulations for a number of the most popular nonlinear operations (e.g. ReLU and max pooling) that are strictly stronger than other approaches from the literature. We corroborate this computationally, showing that our formulations are able to offer substantial improvements in solve time on verification tasks for image classification networks.
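To make the object of study concrete, the sketch below checks the standard big-M mixed-integer encoding of a single ReLU, y = max(0, a), where a is the pre-activation with known bounds lower <= a <= upper and z is a binary indicator. This is the common baseline formulation, shown here for context; it is not the stronger formulation derived in the paper, and the function name is an assumption. No solver is involved: the code only tests whether a candidate point satisfies the constraints.

```python
def bigm_relu_feasible(a: float, y: float, z: int,
                       lower: float, upper: float, tol: float = 1e-9) -> bool:
    """Check the big-M constraints for y = max(0, a), assuming lower <= 0 <= upper:
        y >= a,  y >= 0,  y <= a - lower * (1 - z),  y <= upper * z,  z in {0, 1}.
    """
    if z not in (0, 1):
        return False
    return (
        y >= a - tol
        and y >= -tol
        and y <= a - lower * (1 - z) + tol
        and y <= upper * z + tol
    )
```

For an active unit (a = 2 with bounds -5 and 5) the constraints force z = 1 and y = 2; for an inactive unit (a = -3) they force z = 0 and y = 0. Once z is relaxed to the interval [0, 1], however, points with y different from max(0, a) become feasible, which is why tighter formulations matter for solve time.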