Source author record

Xilun Chen

Xilun Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · Machine Learning · Computer Science and Game Theory

Catalog footprint

What is connected

5 works

3 topics

4 close collaborators


Preprint · 2022 · arXiv

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

With the rise of large-scale pre-trained language models, open-domain question answering (ODQA) has become an important research topic in NLP. Building on the popular pre-training/fine-tuning approach, we posit that an additional in-domain pre-training stage using a large-scale, natural, and diverse question-answering (QA) dataset can be beneficial for ODQA. Consequently, in this paper we propose a novel QA dataset based on the Common Crawl project. Using the readily available schema.org annotations, we extract around 130 million multilingual question-answer pairs, including about 60 million English data points. With this unprecedented number of natural QA pairs, we pre-train popular language models to show the potential of large-scale in-domain pre-training for question answering. In our experiments, we find that pre-training question-answering models on our Common Crawl Question Answering dataset (CCQA) achieves promising results in zero-shot, low-resource and fine-tuned settings across multiple tasks, models and benchmarks.
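The abstract only states that the pairs come from schema.org annotations. As a rough illustration, the sketch below pulls question-answer pairs out of one JSON-LD block of type QAPage or FAQPage, using the schema.org properties mainEntity, acceptedAnswer, suggestedAnswer and text. It is an assumption-laden sketch, not the CCQA pipeline: it assumes JSON-LD rather than microdata or RDFa, skips HTML cleanup, deduplication and language identification, and the function name extract_qa_pairs is hypothetical.

```python
import json


def extract_qa_pairs(jsonld_text):
    """Return (question, answer) pairs from one schema.org JSON-LD block.

    Assumes a QAPage or FAQPage whose mainEntity holds Question objects
    that carry Answer objects under acceptedAnswer (or, as a fallback,
    suggestedAnswer).
    """
    data = json.loads(jsonld_text)
    pages = data if isinstance(data, list) else [data]
    pairs = []
    for page in pages:
        if not isinstance(page, dict) or page.get("@type") not in ("QAPage", "FAQPage"):
            continue
        questions = page.get("mainEntity", [])
        if isinstance(questions, dict):  # a single Question is also allowed
            questions = [questions]
        for q in questions:
            if not isinstance(q, dict) or q.get("@type") != "Question":
                continue
            question = q.get("name") or q.get("text")
            # Prefer accepted answers; fall back to suggested ones.
            answers = q.get("acceptedAnswer") or q.get("suggestedAnswer") or []
            if isinstance(answers, dict):
                answers = [answers]
            for a in answers:
                if question and isinstance(a, dict) and a.get("text"):
                    pairs.append((question.strip(), a["text"].strip()))
    return pairs


doc = """{"@context": "https://schema.org", "@type": "FAQPage",
  "mainEntity": [{"@type": "Question", "name": "What is CCQA?",
    "acceptedAnswer": {"@type": "Answer", "text": "A web-scale QA dataset."}}]}"""
print(extract_qa_pairs(doc))  # [('What is CCQA?', 'A web-scale QA dataset.')]
```

At web scale, a function like this would run over every annotated page in a crawl; the filtering and normalization that make the extracted pairs usable for pre-training are the parts this sketch leaves out.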

Preprint · 2022 · arXiv

Simple Local Attentions Remain Competitive for Long-Context Tasks

Many NLP tasks require processing long contexts beyond the length limit of pretrained models. To scale these models to longer text sequences, many efficient long-range attention variants have been proposed. Despite the abundance of research in this direction, it is still difficult to gauge the relative effectiveness of these models in practical use cases, e.g., when they are applied following the pretrain-and-finetune paradigm. In this work, we conduct a thorough analysis of these emerging models with large-scale, controlled experiments. For each attention variant, we pretrain large-size models on the same long-document corpus and then finetune them on real-world long-context tasks. Our findings reveal pitfalls of an existing, widely used long-range benchmark and show that none of the tested efficient attentions can beat a simple local window attention under standard pretraining paradigms. Further analysis of local attention variants suggests that even the commonly used overlap between attention windows is not necessary to achieve good downstream results: using disjoint local attentions, we are able to build a simpler and more efficient long-document QA model that matches the performance of Longformer with half of its pretraining compute. The code to replicate our experiments can be found at https://github.com/pytorch/fairseq/tree/main/examples/xformers
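To make "disjoint local attention" concrete, here is a minimal pure-Python sketch. It assumes scaled dot-product attention in which each position attends only to the positions inside its own fixed-size, non-overlapping block, in contrast to a sliding window where neighboring windows overlap. The function name and the block_size parameter are illustrative; this is not the fairseq/xFormers code referenced in the abstract.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def disjoint_local_attention(q, k, v, block_size):
    """Scaled dot-product attention restricted to non-overlapping blocks.

    q, k, v are lists of n vectors (lists of floats). Position i attends
    only to positions in the block [start, end) that contains i.
    """
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        start = (i // block_size) * block_size
        end = min(start + block_size, n)
        idx = range(start, end)
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for w, j in zip(weights, idx))
                    for c in range(len(v[0]))])
    return out


q = [[0.0], [0.0], [0.0], [0.0]]
k = [[1.0], [2.0], [3.0], [4.0]]
v = [[1.0], [3.0], [5.0], [7.0]]
print(disjoint_local_attention(q, k, v, block_size=2))  # [[2.0], [2.0], [6.0], [6.0]]
```

With all-zero queries every score is equal, so each output is the mean of the values in its own block, and nothing leaks across the block boundary. Each block costs O(w²) for block size w, so a length-n sequence costs O(n·w) rather than the O(n²) of full attention.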

Preprint · 2022 · arXiv

UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering

We study open-domain question answering with structured, unstructured and semi-structured knowledge sources, including text, tables, lists and knowledge bases. Departing from prior work, we propose a unifying approach that homogenizes all sources by reducing them to text and applies the retriever-reader model, which has so far been limited to text sources only. Our approach improves results on knowledge-base QA tasks by 11 points over the latest graph-based methods. More importantly, we demonstrate that our unified knowledge (UniK-QA) model is a simple yet effective way to combine heterogeneous sources of knowledge, advancing the state-of-the-art results on two popular question answering benchmarks, NaturalQuestions and WebQuestions, by 3.5 and 2.6 points, respectively. The code of UniK-QA is available at: https://github.com/facebookresearch/UniK-QA.
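The abstract describes reducing tables, lists and knowledge-base facts to text so that a standard retriever-reader model can index them alongside ordinary passages. A minimal sketch of such linearization follows. The templates (subject-relation-object sentences, "column is value" clauses) and all function names are hypothetical choices for illustration, not necessarily the ones UniK-QA uses.

```python
def linearize_triple(subject, relation, obj):
    # Hypothetical template: "place_of_birth" -> "place of birth".
    return f"{subject} {relation.replace('_', ' ')} {obj}."


def linearize_table_row(title, header, row):
    # Hypothetical template: one "column is value" clause per cell.
    cells = ", ".join(f"{h} is {c}" for h, c in zip(header, row))
    return f"{title}: {cells}."


def linearize_list(title, items):
    # Hypothetical template: the list title followed by its items.
    return f"{title}: {', '.join(items)}."


passages = [
    linearize_triple("Warsaw", "country", "Poland"),
    linearize_table_row("Largest cities in Poland", ["City", "Rank"], ["Warsaw", "1"]),
    linearize_list("Official languages of Poland", ["Polish"]),
]
for p in passages:
    print(p)
```

Once every source is a plain string, the same retriever and reader can be used unchanged, which is the point of homogenizing the knowledge sources.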

Preprint · 2021 · arXiv

Muppet: Massive Multi-task Representations with Pre-Finetuning

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples) and is designed to encourage learning representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when only a few tasks are used, up to a critical point (usually above 15 tasks), after which performance improves linearly with the number of tasks.

Preprint · 2014 · arXiv

Price of Anarchy of Innovation Diffusion in Social Networks

There have been great efforts to study cascading behavior in social networks, such as innovation diffusion. Game-theoretically, in a social network where individuals choose between two strategies, A (the innovation) and B (the status quo), and receive payoffs from coordinating with their neighbors, it has long been known that the Price of Anarchy (PoA) of this game is not 1, since the Nash equilibrium (NE) in which all players choose B (B Nash) is inferior to the one in which all players choose A (A Nash). However, no quantitative analysis has given an accurate upper bound on the PoA of this game. In this paper, we adopt a widely used networked coordination game setting [3] to study how bad a Nash equilibrium can be and give a tight upper bound on the PoA of such games. We show that there is an NE that is slightly worse than the B Nash. On the other hand, the PoA is bounded, and the worst NE cannot be much worse than the B Nash. In addition, we discuss how the PoA upper bound changes when compatibility between A and B is introduced, and show an intuitive result: the upper bound strictly decreases as compatibility increases.
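The abstract does not restate the payoff model of [3], so the sketch below assumes a common form of networked coordination game: every edge pays each endpoint a if both play A, b if both play B, and 0 on a mismatch, with a ≥ b > 0 so that A is the better option. It computes the PoA exactly by brute force, as optimal social welfare divided by the welfare of the worst Nash equilibrium, on tiny graphs. It illustrates the definition only, not the paper's bound or proof technique, and it ignores compatibility; all function names are hypothetical.

```python
from itertools import product


def payoff(graph, profile, node, a, b):
    """Node's payoff: a per neighbor also on A, b per neighbor also on B, 0 otherwise."""
    total = 0.0
    for nbr in graph[node]:
        if profile[node] == profile[nbr]:
            total += a if profile[node] == "A" else b
    return total


def is_nash(graph, profile, a, b):
    """True if no single node can strictly gain by switching strategy."""
    for node in graph:
        switched = dict(profile)
        switched[node] = "B" if profile[node] == "A" else "A"
        if payoff(graph, switched, node, a, b) > payoff(graph, profile, node, a, b):
            return False
    return True


def price_of_anarchy(graph, a, b):
    """Optimal social welfare divided by the welfare of the worst Nash equilibrium.

    Brute force over all 2^n profiles, so only usable on tiny graphs.
    Assumes a >= b > 0 and at least one edge, so every equilibrium has
    positive welfare (all-B is always an equilibrium).
    """
    nodes = list(graph)

    def welfare(profile):
        return sum(payoff(graph, profile, n, a, b) for n in nodes)

    profiles = [dict(zip(nodes, choice)) for choice in product("AB", repeat=len(nodes))]
    best = max(welfare(p) for p in profiles)
    worst_equilibrium = min(welfare(p) for p in profiles if is_nash(graph, p, a, b))
    return best / worst_equilibrium


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # undirected graph as adjacency lists
print(price_of_anarchy(triangle, a=2.0, b=1.0))  # 2.0 (all-A welfare 12 vs. all-B welfare 6)
```

On the triangle, the only equilibria are all-A and all-B, so the PoA reduces to a/b. The paper's point is that on other graphs, equilibria slightly worse than the B Nash can exist, which exhaustive search like this can expose on small instances.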
