Researcher profile

Ningyuan Chen

Ningyuan Chen contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
3 topics
4 close collaborators

Published work

5 published items

Preprint, 2022, arXiv

Learning Consumer Preferences from Bundle Sales Data

Product bundling is a common selling mechanism in online retailing. To set profitable bundle prices, the seller needs to learn consumer preferences from transaction data. When customers purchase bundles or multiple products, classical methods such as discrete choice models cannot be used to estimate customers' valuations. In this paper, we propose an approach to learn the distribution of consumers' valuations of the products from bundle sales data. The approach reduces the learning task to an estimation problem in which the samples are censored by polyhedral regions. Using the EM algorithm and Monte Carlo simulation, our approach can recover the distribution of consumers' valuations. The framework allows for unobserved no-purchases and clustered market segments. We provide theoretical results on the identifiability of the probability model and the convergence of the EM algorithm. The performance of the approach is also demonstrated numerically.

Preprint, 2022, arXiv

Model-Free Assortment Pricing with Transaction Data

We study the problem in which a firm sets prices for products based on transaction data, i.e., which product each past customer chose from an assortment and what historical prices that customer observed. Our approach does not impose a model on the distribution of the customers' valuations and instead assumes only that purchase choices satisfy incentive-compatibility constraints. The individual valuation of each past customer can then be encoded as a polyhedral set, and our approach maximizes the worst-case revenue assuming that new customers' valuations are drawn from the empirical distribution implied by the collection of such polyhedra. We show that the optimal prices in this setting can be approximated to arbitrary precision by solving a compact mixed-integer linear program. Moreover, we study the single-product case and relate it to the traditional model-based approach. We also design three approximation strategies that are interpretable and have low computational complexity. Comprehensive numerical studies based on synthetic and real data suggest that our pricing approach is uniquely beneficial when the historical data has a limited size or is susceptible to model misspecification.
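
As a rough illustration of how one transaction might be encoded as a polyhedral set, the sketch below assumes each customer buys the offered product with the largest non-negative surplus (valuation minus price) and walks away otherwise. The function names `valuation_polyhedron` and `is_consistent` are hypothetical, and this is one standard reading of incentive compatibility, not necessarily the paper's own formulation.

```python
def valuation_polyhedron(prices, chosen):
    """Return (A, b) such that consistent valuations v satisfy A v <= b.

    prices: prices the customer observed, one per offered product.
    chosen: index of the purchased product, or None for a no-purchase.
    Assumption: customers maximize surplus v_j - p_j, with outside option 0.
    """
    n = len(prices)
    A, b = [], []
    if chosen is None:
        # No purchase: every product has non-positive surplus, v_j <= p_j.
        for j in range(n):
            row = [0.0] * n
            row[j] = 1.0
            A.append(row)
            b.append(float(prices[j]))
        return A, b
    # The chosen product beats the outside option: -v_i <= -p_i.
    row = [0.0] * n
    row[chosen] = -1.0
    A.append(row)
    b.append(-float(prices[chosen]))
    # The chosen product beats every alternative: v_j - v_i <= p_j - p_i.
    for j in range(n):
        if j == chosen:
            continue
        row = [0.0] * n
        row[j] = 1.0
        row[chosen] = -1.0
        A.append(row)
        b.append(float(prices[j]) - float(prices[chosen]))
    return A, b


def is_consistent(A, b, v, tol=1e-9):
    """Check whether valuation vector v lies in the polyhedron A v <= b."""
    return all(
        sum(a_k * v_k for a_k, v_k in zip(row, v)) <= rhs + tol
        for row, rhs in zip(A, b)
    )
```

For example, with prices `[4.0, 6.0]` and a purchase of product 0, the valuation `[5.0, 6.0]` lies in the set (surplus 1 versus 0), while `[5.0, 8.0]` does not, because product 1 would then have offered the larger surplus.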

Preprint, 2022, arXiv

Sublinear Regret for Learning POMDPs

We study model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment, measured by the average reward over an infinite horizon. We propose a learning algorithm for this problem, building on spectral method-of-moments estimation for hidden Markov models, belief error control in POMDPs, and upper-confidence-bound methods for online learning. We establish a regret bound of $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm, where $T$ is the learning horizon. This is, to the best of our knowledge, the first algorithm achieving sublinear regret with respect to our oracle for learning general POMDPs.

Preprint, 2021, arXiv

Regime Switching Bandits

We study a multi-armed bandit problem where the rewards exhibit regime switching. Specifically, the distributions of the random rewards generated from all arms are modulated by a common underlying state modeled as a finite-state Markov chain. The agent does not observe the underlying state and has to learn the transition matrix and the reward distributions. We propose a learning algorithm for this problem, building on spectral method-of-moments estimation for hidden Markov models, belief error control in partially observable Markov decision processes, and upper-confidence-bound methods for online learning. We also establish a regret upper bound of $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm, where $T$ is the learning horizon. Finally, we conduct proof-of-concept experiments to illustrate the performance of the learning algorithm.
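
Because the agent never observes the regime, it has to track a belief (a probability vector over hidden states) as rewards arrive. The sketch below shows a textbook Bayes filter for a hidden Markov chain, not the paper's algorithm: it assumes the transition matrix and the per-state likelihood of the observed reward are already known or estimated, and the name `update_belief` and the correct-then-predict ordering are illustrative choices.

```python
def update_belief(belief, transition, likelihood):
    """One Bayes-filter step for a hidden finite-state Markov chain.

    belief:     prior probabilities over the current hidden state.
    transition: transition[s][t] = P(next state = t | current state = s).
    likelihood: likelihood[s] = probability (or density) of the observed
                reward from the pulled arm when the hidden state is s.
    Returns the belief over the next hidden state.
    """
    n = len(belief)
    # Correct: reweight the prior by how well each state explains the reward.
    weighted = [belief[s] * likelihood[s] for s in range(n)]
    total = sum(weighted)
    if total > 0:
        posterior = [w / total for w in weighted]
    else:
        # The observation is impossible under every state; keep the prior.
        posterior = list(belief)
    # Predict: push the posterior through the Markov transition matrix.
    return [
        sum(posterior[s] * transition[s][t] for s in range(n))
        for t in range(n)
    ]
```

With a uniform prior over two regimes, `transition = [[0.9, 0.1], [0.2, 0.8]]`, and a reward that is four times as likely under regime 0 (`likelihood = [0.8, 0.2]`), the posterior is `[0.8, 0.2]` and the belief over the next regime is approximately `[0.76, 0.24]`.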

Preprint, 2020, arXiv

Nonparametric Pricing Analytics with Customer Covariates

Personalized pricing analytics is becoming an essential tool in retailing. Upon observing the personalized information of each arriving customer, the firm needs to set a price based on covariates such as income, educational background, and past purchasing history in order to extract more revenue. For new entrants to the business, the lack of historical data may severely limit the power and profitability of personalized pricing. We propose a nonparametric pricing policy to simultaneously learn the preferences of customers based on the covariates and maximize the expected revenue over a finite horizon. The policy does not depend on any prior assumptions on how the personalized information affects consumers' preferences (such as linear models). It adaptively splits the covariate space into smaller bins (hyper-rectangles) and clusters customers based on their covariates and preferences, offering similar prices to customers who belong to the same cluster and trading off granularity against accuracy. We show that the algorithm achieves a regret of order $O(\log(T)^2 T^{(2+d)/(4+d)})$, where $T$ is the length of the horizon and $d$ is the dimension of the covariate. This improves on the existing regret bound in the literature (Slivkins, 2014) under mild technical conditions in the pricing context (smoothness and local concavity). We also prove that no policy can achieve a regret less than $O(T^{(2+d)/(4+d)})$ on a particular instance, and thus demonstrate the near optimality of the proposed policy.
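
To see how the covariate dimension $d$ drives the rate stated in the abstract, the exponent can be rewritten and evaluated for a few values of $d$. This is simple arithmetic on the stated bound, ignoring the logarithmic factor:

```latex
\[
\frac{2+d}{4+d} \;=\; 1 - \frac{2}{4+d},
\qquad
d=1:\ \tfrac{3}{5},
\qquad
d=2:\ \tfrac{2}{3},
\qquad
d=6:\ \tfrac{4}{5},
\qquad
\lim_{d\to\infty}\frac{2+d}{4+d} = 1 .
\]
```

The exponent stays strictly below 1 for every finite $d$, so the regret remains sublinear in $T$, but it approaches linear growth as the number of covariates increases.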