Researcher profile

Wenhan Xiong

Wenhan Xiong contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Simple Local Attentions Remain Competitive for Long-Context Tasks

Many NLP tasks require processing long contexts beyond the length limit of pretrained models. In order to scale these models to longer text sequences, many efficient long-range attention variants have been proposed. Despite the abundance of research along this direction, it is still difficult to gauge the relative effectiveness of these models in practical use cases, e.g., if we apply these models following the pretrain-and-finetune paradigm. In this work, we aim to conduct a thorough analysis of these emerging models with large-scale and controlled experiments. For each attention variant, we pretrain large-size models using the same long-doc corpus and then finetune these models for real-world long-context tasks. Our findings reveal pitfalls of an existing widely-used long-range benchmark and show none of the tested efficient attentions can beat a simple local window attention under standard pretraining paradigms. Further analysis on local attention variants suggests that even the commonly used attention-window overlap is not necessary to achieve good downstream results -- using disjoint local attentions, we are able to build a simpler and more efficient long-doc QA model that matches the performance of Longformer~\citep{longformer} with half of its pretraining compute. The code to replicate our experiments can be found at https://github.com/pytorch/fairseq/tree/main/examples/xformers

preprint2021arXiv

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.

preprint2021arXiv

Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering

To extract answers from a large corpus, open-domain question answering (QA) systems usually rely on information retrieval (IR) techniques to narrow the search space. Standard inverted index methods such as TF-IDF are commonly used as thanks to their efficiency. However, their retrieval performance is limited as they simply use shallow and sparse lexical features. To break the IR bottleneck, recent studies show that stronger retrieval performance can be achieved by pretraining a effective paragraph encoder that index paragraphs into dense vectors. Once trained, the corpus can be pre-encoded into low-dimensional vectors and stored within an index structure where the retrieval can be efficiently implemented as maximum inner product search. Despite the promising results, pretraining such a dense index is expensive and often requires a very large batch size. In this work, we propose a simple and resource-efficient method to pretrain the paragraph encoder. First, instead of using heuristically created pseudo question-paragraph pairs for pretraining, we utilize an existing pretrained sequence-to-sequence model to build a strong question generator that creates high-quality pretraining data. Second, we propose a progressive pretraining algorithm to ensure the existence of effective negative samples in each batch. Across three datasets, our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.

preprint2020arXiv

Learning to Recover Reasoning Chains for Multi-Hop Question Answering via Cooperative Games

We propose the new problem of learning to recover reasoning chains from weakly supervised signals, i.e., the question-answer pairs. We propose a cooperative game approach to deal with this problem, in which how the evidence passages are selected and how the selected passages are connected are handled by two models that cooperate to select the most confident chains from a large set of candidates (from distant supervision). For evaluation, we created benchmarks based on two multi-hop QA datasets, HotpotQA and MedHop; and hand-labeled reasoning chains for the latter. The experimental results demonstrate the effectiveness of our proposed approach.