Source author record

Alan Chan

Alan Chan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning cs.CY Multiagent Systems Networking and Internet Architecture

Catalog footprint

What is connected

6works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Permission Manifests for Web Agents

The rise of Large Language Model (LLM)-based web agents represents a significant shift in automated interactions with the web. Unlike traditional crawlers that follow simple conventions, such as robots$.$txt, modern agents engage with websites in sophisticated ways: navigating complex interfaces, extracting structured information, and completing end-to-end tasks. Existing governance mechanisms were not designed for these capabilities. Without a way to specify what interactions are and are not allowed, website owners increasingly rely on blanket blocking and CAPTCHAs, which undermine beneficial applications such as efficient automation, convenient use of e-commerce services, and accessibility tools. We introduce agent-permissions$.$json, a robots$.$txt-style lightweight manifest where websites specify allowed interactions, complemented by API references where available. This framework provides a low-friction coordination mechanism: website owners only need to write a simple JSON file, while agents can easily parse and automatically implement the manifest's provisions. Website owners can then focus on blocking non-compliant agents, rather than agents as a whole. By extending the spirit of robots$.$txt to the era of LLM-mediated interaction, and complementing data use initiatives such as AIPref, the manifest establishes a compliance framework that enables beneficial agent interactions while respecting site owners' preferences.

preprint2022arXiv

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification. Many different approaches have been explored for approximate policy evaluation, but less is understood about approximate greedification and what choices guarantee policy improvement. In this work, we investigate approximate greedification when reducing the KL divergence between the parameterized policy and the Boltzmann distribution over action values. In particular, we investigate the difference between the forward and reverse KL divergences, with varying degrees of entropy regularization. We show that the reverse KL has stronger policy improvement guarantees, but that reducing the forward KL can result in a worse policy. We also demonstrate, however, that a large enough reduction of the forward KL can induce improvement under additional assumptions. Empirically, we show on simple continuous-action environments that the forward KL can induce more exploration, but at the cost of a more suboptimal policy. No significant differences were observed in the discrete-action setting or on a suite of benchmark problems. Throughout, we highlight that many policy gradient methods can be seen as an instance of API, with either the forward or reverse KL for the policy update, and discuss next steps for understanding and improving our policy optimization algorithms.

preprint2022arXiv

Scoring Rules for Performative Binary Prediction

We construct a model of expert prediction where predictions can influence the state of the world. Under this model, we show through theoretical and numerical results that proper scoring rules can incentivize experts to manipulate the world with their predictions. We also construct a simple class of scoring rules that avoids this problem.

preprint2021arXiv

The Limits of Global Inclusion in AI Development

Those best-positioned to profit from the proliferation of artificial intelligence (AI) systems are those with the most economic power. Extant global inequality has motivated Western institutions to involve more diverse groups in the development and application of AI systems, including hiring foreign labour and establishing extra-national data centers and laboratories. However, given both the propensity of wealth to abet its own accumulation and the lack of contextual knowledge in top-down AI solutions, we argue that more focus should be placed on the redistribution of power, rather than just on including underrepresented groups. Unless more is done to ensure that opportunities to lead AI development are distributed justly, the future may hold only AI systems which are unsuited to their conditions of application, and exacerbate inequality.

preprint2020arXiv

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps. To learn the value function for horizon $h$, these algorithms bootstrap from the value function for horizon $h-1$, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as "the deadly triad"). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and $n$-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions. We also prove convergence of fixed-horizon temporal difference methods with linear and general function approximation. Taken together, our results establish fixed-horizon TD methods as a viable new way of avoiding the stability problems of the deadly triad.

preprint2020arXiv

Inverse Policy Evaluation for Value-based Sequential Decision-making

Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning), and acting greedily with respect to the estimates with an arbitrary degree of entropy to ensure that the state-space is sufficiently explored. Behavior based on explicit greedification assumes that the values reflect those of \textit{some} policy, over which the greedy policy will be an improvement. However, value-iteration can produce value functions that do not correspond to \textit{any} policy. This is especially relevant in the function-approximation regime, when the true value function can't be perfectly represented. In this work, we explore the use of \textit{inverse policy evaluation}, the process of solving for a likely policy given a value function, for deriving behavior from a value function. We provide theoretical and empirical results to show that inverse policy evaluation, combined with an approximate value iteration algorithm, is a feasible method for value-based control.

Alan Chan

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Permission Manifests for Web Agents

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Scoring Rules for Performative Binary Prediction

The Limits of Global Inclusion in AI Development

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

Inverse Policy Evaluation for Value-based Sequential Decision-making