Source author record

Sihan Liu

Sihan Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Data Structures and Algorithms, Machine Learning, Artificial Intelligence, Computational Complexity, Computer Science and Game Theory, Discrete Mathematics, math.OC, math.ST, Multiagent Systems, Statistics Theory

Catalog footprint

What is connected

3 works

10 topics

4 close collaborators

Preprint, 2022, arXiv

Computational-Statistical Gaps in Reinforcement Learning

Reinforcement learning with function approximation has recently achieved tremendous results in applications with large state spaces. This empirical success has motivated a growing body of theoretical work proposing necessary and sufficient conditions under which efficient reinforcement learning is possible. From this line of work, a remarkably simple minimal sufficient condition has emerged for sample-efficient reinforcement learning: MDPs with optimal value functions $V^*$ and $Q^*$ linear in some known low-dimensional features. In this setting, recent works have designed sample-efficient algorithms which require a number of samples polynomial in the feature dimension and independent of the size of the state space. However, they leave finding computationally efficient algorithms as future work, and this is considered a major open problem in the community. In this work, we make progress on this open problem by presenting the first computational lower bound for RL with linear function approximation: unless NP=RP, no randomized polynomial-time algorithm exists for deterministic-transition MDPs with a constant number of actions and linear optimal value functions. To prove this, we show a reduction from Unique-Sat, where we convert a CNF formula into an MDP with deterministic transitions, a constant number of actions and low-dimensional linear optimal value functions. This result also exhibits the first computational-statistical gap in reinforcement learning with linear function approximation, as the underlying statistical problem is information-theoretically solvable with a polynomial number of queries, but no computationally efficient algorithm exists unless NP=RP. Finally, we also prove a quasi-polynomial time lower bound under the Randomized Exponential Time Hypothesis.
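The linearity condition in this abstract can be written out as follows. This is only a sketch of one common formalization: the feature maps $\phi$ and $\psi$, the weight vectors $\theta$ and $\omega$, and the dimension $d$ are notation assumed here for illustration and do not appear in the abstract itself.

```latex
% Assumed notation: \phi(s,a), \psi(s) \in \mathbb{R}^d are the known features;
% \theta, \omega \in \mathbb{R}^d are unknown weight vectors.
Q^*(s, a) = \langle \theta, \phi(s, a) \rangle
\quad \text{and} \quad
V^*(s) = \langle \omega, \psi(s) \rangle
\quad \text{for all states } s \text{ and actions } a,
\qquad d \ll |\mathcal{S}|.
```

Under this reading, "sample efficient" means the number of samples is polynomial in $d$, while the lower bound concerns the running time needed to find a good policy.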

Preprint, 2022, arXiv

Near-Optimal Bounds for Testing Histogram Distributions

We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $\mathbf{p}$ on $[n]$, we want to distinguish between the cases that $\mathbf{p}$ is a $k$-histogram versus $\varepsilon$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde{\Theta}(\sqrt{nk} / \varepsilon + k / \varepsilon^2 + \sqrt{n} / \varepsilon^2)$.
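To make the definition concrete, here is a minimal Python sketch that checks whether a fully known probability vector is a $k$-histogram and evaluates the expression inside the stated bound. It is not the paper's tester, which only sees samples from $\mathbf{p}$; the function names, the tolerance `tol`, and the choice to drop constants and logarithmic factors are all assumptions made for illustration.

```python
import math


def count_pieces(p, tol=1e-12):
    """Count maximal runs of (numerically) equal consecutive probabilities."""
    if not p:
        return 0
    pieces = 1
    for prev, cur in zip(p, p[1:]):
        if abs(cur - prev) > tol:
            pieces += 1
    return pieces


def is_k_histogram(p, k, tol=1e-12):
    """True if p is piecewise constant on at most k intervals of the domain."""
    return count_pieces(p, tol) <= k


def sample_complexity_scale(n, k, eps):
    """Evaluate sqrt(nk)/eps + k/eps^2 + sqrt(n)/eps^2 (no constants or log factors)."""
    return math.sqrt(n * k) / eps + k / eps**2 + math.sqrt(n) / eps**2


p = [0.1, 0.1, 0.3, 0.3, 0.2]          # three constant pieces over [5]
pieces = count_pieces(p)
scale = sample_complexity_scale(n=100, k=4, eps=0.5)
```

Here `p` is a 3-histogram but not a 2-histogram. The value of `sample_complexity_scale` only indicates how the bound grows with $n$, $k$ and $\varepsilon$; it is not an actual sample count.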

Preprint, 2022, arXiv

Sampling Equilibria: Fast No-Regret Learning in Structured Games

Learning and equilibrium computation in games are fundamental problems across computer science and economics, with applications ranging from politics to machine learning. Much of the work in this area revolves around a simple algorithm termed randomized weighted majority (RWM), also known as "Hedge" or "Multiplicative Weights Update," which is well known to achieve statistically optimal rates in adversarial settings (Littlestone and Warmuth '94, Freund and Schapire '99). Unfortunately, RWM comes with an inherent computational barrier: it requires maintaining and sampling from a distribution over all possible actions. In typical settings of interest the action space is exponentially large, seemingly rendering RWM useless in practice. In this work, we refute this notion for a broad variety of structured games, showing it is possible to efficiently (approximately) sample the action space in RWM in polylogarithmic time. This gives the first efficient no-regret algorithms for problems such as the (discrete) Colonel Blotto game, matroid congestion, matroid security, and basic dueling games. As an immediate corollary, we give a polylogarithmic time meta-algorithm to compute approximate Nash Equilibria for these games that is exponentially faster than prior methods in several important settings. Further, our algorithm is the first to efficiently compute equilibria for more involved variants of these games with general sums, more than two players, and, for Colonel Blotto, multiple resource types.
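For reference, the textbook form of RWM (Hedge) over an explicitly listed action set can be sketched in a few lines of Python. This is the naive baseline the abstract describes, not the paper's structured sampler: every round touches every action, which is exactly the cost that becomes prohibitive when the action space is exponentially large. The function names, the learning rate `eta`, and the fixed `seed` are assumptions made for illustration.

```python
import math
import random


def hedge_distribution(weights):
    """Normalize nonnegative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]


def hedge_update(weights, losses, eta):
    """Multiplicative update: scale each weight by exp(-eta * loss)."""
    return [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]


def run_hedge(loss_rounds, eta, seed=0):
    """Play RWM over an explicit action list.

    Returns the final sampling distribution and the loss actually incurred.
    """
    rng = random.Random(seed)
    n_actions = len(loss_rounds[0])
    weights = [1.0] * n_actions
    realized_loss = 0.0
    for losses in loss_rounds:
        probs = hedge_distribution(weights)
        # Sample one action from the current distribution (linear in n_actions).
        action = rng.choices(range(n_actions), weights=probs, k=1)[0]
        realized_loss += losses[action]
        weights = hedge_update(weights, losses, eta)
    return hedge_distribution(weights), realized_loss


# Two actions, three rounds: action 0 never loses, action 1 always loses 1.
final_probs, realized = run_hedge([[0.0, 1.0]] * 3, eta=1.0)
```

After the three rounds the distribution concentrates on the action with zero loss. Both the weight vector and the sampling step scale with the number of actions, so this version is only usable when the actions can be enumerated.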
