Source author record

Yoav Shoham

Yoav Shoham appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence; Computer Science and Game Theory; Computation and Language; Machine Learning; Neural and Evolutionary Computing

Catalog footprint

What is connected

8 works

5 topics

4 close collaborators

Preprint, 2022, arXiv

Standing on the Shoulders of Giant Frozen Language Models

Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-tuning approaches which modify these weights in a task-dependent way. Those, in turn, suffer forgetfulness and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and more powerful methods for leveraging frozen LMs can do just as well as fine-tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches. Indeed, some of our methods even outperform fine-tuning approaches in domains currently dominated by the latter. The computational cost of each method is higher than that of existing frozen-model methods, but still negligible relative to a single pass through a huge frozen LM. Each of these methods constitutes a meaningful contribution in its own right, but by presenting these contributions together we aim to convince the reader of a broader message that goes beyond the details of any given method: that frozen models have untapped potential and that fine-tuning is often unnecessary.

Preprint, 2022, arXiv

The AI Index 2022 Annual Report

Welcome to the fifth edition of the AI Index Report! The latest edition includes data from a broad set of academic, private, and nonprofit organizations as well as more self-collected data and original analysis than any previous edition, including an expanded technical performance chapter, a new survey of robotics researchers around the world, data on global AI legislation records in 25 countries, and a new chapter with an in-depth analysis of technical AI ethics metrics. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Its mission is to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world's most credible and authoritative source for data and insights about AI.

Preprint, 2020, arXiv

SenseBERT: Driving Some Sense into BERT

The ability to learn from large unlabeled corpora has allowed neural language models to advance the frontier in natural language understanding. However, existing self-supervision techniques operate at the word form level, which serves as a surrogate for the underlying semantic content. This paper proposes a method to employ weak-supervision directly at the word sense level. Our model, named SenseBERT, is pre-trained to predict not only the masked words but also their WordNet supersenses. Accordingly, we attain a lexical-semantic level language model, without the use of human annotation. SenseBERT achieves significantly improved lexical understanding, as we demonstrate by experimenting on SemEval Word Sense Disambiguation, and by attaining a state of the art result on the Word in Context task.

Preprint, 2020, arXiv

The Cost of Training NLP Models: A Concise Overview

We review the cost of training large-scale language models, and the drivers of these costs. The intended audience includes engineers and scientists budgeting their model-training experiments, as well as non-practitioners trying to make sense of the economics of modern-day Natural Language Processing (NLP).

Preprint, 2014, arXiv

Stable Invitations

We consider the situation in which an organizer is trying to convene an event, and needs to choose a subset of agents to be invited. Agents have preferences over how many attendees should be at the event and possibly also who the attendees should be. This induces a stability requirement: All invited agents should prefer attending to not attending, and all the other agents should not regret not being invited. The organizer's objective is to find the invitation of maximum size subject to the stability requirement. We investigate the computational complexity of finding the maximum stable invitation when all agents are truthful, as well as the mechanism design problem when agents may strategically misreport their preferences.
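
The stability requirement in this abstract can be illustrated with a small brute-force sketch. It is not the paper's algorithm, and the modeling choices are assumptions made only for illustration: preferences depend solely on the number of attendees, each agent is described by a hypothetical set of attendee counts they would accept, and an uninvited agent "regrets" being left out if they would accept the count that results from adding them. The names `is_stable` and `max_stable_invitation` are likewise hypothetical.

```python
from itertools import combinations


def is_stable(invited, approved_sizes):
    """Check the stability requirement for one candidate invitation.

    Assumption: approved_sizes maps each agent to the set of attendee
    counts that agent finds acceptable (size-only preferences).
    """
    k = len(invited)
    for agent, sizes in approved_sizes.items():
        if agent in invited:
            # An invited agent must prefer attending at the actual size.
            if k not in sizes:
                return False
        else:
            # Assumption: an uninvited agent regrets being left out if
            # they would accept the event once they are added to it.
            if k + 1 in sizes:
                return False
    return True


def max_stable_invitation(approved_sizes):
    """Return a largest stable invitation, or None if none exists.

    Exhaustive search from the largest size down; exponential in the
    number of agents, so suitable only for tiny examples.
    """
    agents = sorted(approved_sizes)
    for k in range(len(agents), -1, -1):
        for subset in combinations(agents, k):
            if is_stable(set(subset), approved_sizes):
                return set(subset)
    return None


# Hypothetical example: a and b want exactly two attendees, c wants one.
preferences = {"a": {2}, "b": {2}, "c": {1}}
print(max_stable_invitation(preferences))
```

In this example, inviting a and b is stable: both accept an event of size two, and c would not accept an event of size three. The exhaustive search is only meant to make the definition concrete; the abstract's interest in computational complexity suggests that efficient search is the hard part.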

Preprint, 2013, arXiv

Conditional Utility, Utility Independence, and Utility Networks

We introduce a new interpretation of two related notions - conditional utility and utility independence. Unlike the traditional interpretation, the new interpretation renders the notions the direct analogues of their probabilistic counterparts. To capture these notions formally, we appeal to the notion of utility distribution, introduced in a previous paper. We show that utility distributions, which have a structure that is identical to that of probability distributions, can be viewed as a special case of additive multiattribute utility functions, and show how this special case permits us to capture the novel senses of conditional utility and utility independence. Finally, we present the notion of utility networks, which do for utilities what Bayesian networks do for probabilities. Specifically, utility networks exploit the new interpretation of conditional utility and utility independence to compactly represent a utility distribution.

Preprint, 2012, arXiv

Collusion in Unrepeated, First-Price Auctions with an Uncertain Number of Participants

We consider the question of whether collusion among bidders (a "bidding ring") can be supported in equilibrium of unrepeated first-price auctions. Unlike previous work on the topic such as that by McAfee and McMillan [1992] and Marshall and Marx [2007], we do not assume that non-colluding agents have perfect knowledge about the number of colluding agents whose bids are suppressed by the bidding ring, and indeed even allow for the existence of multiple cartels. Furthermore, while we treat the association of bidders with bidding rings as exogenous, we allow bidders to make strategic decisions about whether to join bidding rings when invited. We identify a bidding ring protocol that results in an efficient allocation in Bayes-Nash equilibrium, under which non-colluding agents bid straightforwardly, and colluding agents join bidding rings when invited and truthfully declare their valuations to the ring center. We show that bidding rings benefit ring centers and all agents, both members and non-members of bidding rings, at the auctioneer's expense. The techniques we introduce in this paper may also be useful for reasoning about other problems in which agents have asymmetric information about a setting.

Preprint, 2012, arXiv

Mechanism Design with Execution Uncertainty

We introduce the notion of fault tolerant mechanism design, which extends the standard game theoretic framework of mechanism design to allow for uncertainty about execution. Specifically, we define the problem of task allocation in which the private information of the agents is not only their costs to attempt the tasks, but also their probabilities of failure. For several different instances of this setting we present technical results, including positive ones in the form of mechanisms that are incentive compatible, individually rational and efficient, and negative ones in the form of impossibility theorems.
