Source author record

Tong Mu

Tong Mu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Information Theory math.IT Multiagent Systems

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Solving Dynamic Principal-Agent Problems with a Rationally Inattentive Principal

Principal-Agent (PA) problems describe a broad class of economic relationships characterized by misaligned incentives and asymmetric information. The Principal's problem is to find optimal incentives given the available information, e.g., a manager setting optimal wages for its employees. Whereas the Principal is often assumed rational, comparatively little is known about solutions when the Principal is boundedly rational, especially in the sequential setting, with multiple Agents, and with multiple information channels. Here, we develop RIRL, a deep reinforcement learning framework that solves such complex PA problems with a rationally inattentive Principal. Such a Principal incurs a cost for paying attention to information, which can model forms of bounded rationality. We use RIRL to analyze rich economic phenomena in manager-employee relationships. In the single-step setting, 1) RIRL yields wages that are consistent with theoretical predictions; and 2) non-zero attention costs lead to simpler but less profitable wage structures, and increased Agent welfare. In a sequential setting with multiple Agents, RIRL shows opposing consequences of the Principal's inattention to different information channels: 1) inattention to Agents' outputs closes wage gaps based on ability differences; and 2) inattention to Agents' efforts induces a social dilemma dynamic in which Agents work harder, but essentially for free. Moreover, RIRL reveals non-trivial relationships between the Principal's inattention and Agent types, e.g., if Agents are prone to sub-optimal effort choices, payment schedules are more sensitive to the Principal's attention cost. As such, RIRL can reveal novel economic relationships and enables progress towards understanding the effects of bounded rationality in dynamic settings.

preprint2021arXiv

Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance. To address this, we introduce a practical algorithm for incorporating human insight to speed learning. Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy. It takes in multiple potential policy constraints to maintain robustness to misspecification of individual constraints while leveraging helpful ones to learn quickly. Given a base RL learning algorithm (ex. UCRL, DQN, Rainbow) we propose an upper confidence with elimination scheme that leverages the relationship between the constraints, and their observed performance, to adaptively switch among them. We instantiate our algorithm with DQN-type algorithms and UCRL as base algorithms, and evaluate our algorithm in four environments, including three simulators based on real data: recommendations, educational activity sequencing, and HIV treatment sequencing. In all cases, CSRL learns a good policy faster than baselines.

preprint2016arXiv

Optimality and Rate-Compatibility for Erasure-Coded Packet Transmissions when Fading Channel Diversity Increases with Packet Length

A message composed of packets is transmitted using erasure and channel coding over a fading channel with no feedback. For this scenario, the paper explores the trade-off between the redundancies allocated to the packet-level erasure code and the channel code, along with an objective of a low probability of failure to recover the message. To this end, we consider a fading model that we term proportional-diversity block fading (PD block fading). For a fixed overall code rate and transmit power, we formulate an optimization problem to numerically find the optimal channel-coding rate (and thus the optimal erasure-coding rate) that minimizes the probability of failure for various approximations of the problem. Furthermore, an interpretation of the results from an incremental redundancy point of view shows how rate-compatibility affects the possible trajectories of the failure probability as a function of the overall code rate. Our numerical results suggest that an optimal, rateless, hybrid coding scheme for a single-user wireless system over the PD block-fading channel should have the rate of the erasure code approach one.