Researcher profile

Jamie Callan

Jamie Callan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2022arXiv

Long Document Re-ranking with Modular Re-ranker

Long document re-ranking has been a challenging problem for neural re-rankers based on deep language models like BERT. Early work breaks the documents into short passage-like chunks. These chunks are independently mapped to scalar scores or latent vectors, which are then pooled into a final relevance score. These encode-and-pool methods however inevitably introduce an information bottleneck: the low dimension representations. In this paper, we propose instead to model full query-to-document interaction, leveraging the attention operation and modular Transformer re-ranker framework. First, document chunks are encoded independently with an encoder module. An interaction module then encodes the query and performs joint attention from the query to all document chunk representations. We demonstrate that the model can use this new degree of freedom to aggregate important information from the entire document. Our experiments show that this design produces effective re-ranking on two classical IR collections Robust04 and ClueWeb09, and a large-scale supervised collection MS-MARCO document ranking.

preprint2022arXiv

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

Recent rapid advancements in deep pre-trained language models and the introductions of large datasets have powered research in embedding-based dense retrieval. While several good research papers have emerged, many of them come with their own software stacks. These stacks are typically optimized for some particular research goals instead of efficiency or code structure. In this paper, we present Tevatron, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity. Tevatron provides a standardized pipeline for dense retrieval including text processing, model training, corpus/query encoding, and search. This paper presents an overview of Tevatron and demonstrates its effectiveness and efficiency across several IR and QA data sets. We also show how Tevatron's flexible design enables easy generalization across datasets, model architectures, and accelerator platforms(GPU/TPU). We believe Tevatron can serve as an effective software foundation for dense retrieval system research including design, modeling, and optimization.

preprint2021arXiv

Assessing the Benefits of Model Ensembles in Neural Re-Ranking for Passage Retrieval

Our work aimed at experimentally assessing the benefits of model ensembling within the context of neural methods for passage reranking. Starting from relatively standard neural models, we use a previous technique named Fast Geometric Ensembling to generate multiple model instances from particular training schedules, then focusing or attention on different types of approaches for combining the results from the multiple model instances (e.g., averaging the ranking scores, using fusion methods from the IR literature, or using supervised learning-to-rank). Tests with the MS-MARCO dataset show that model ensembling can indeed benefit the ranking quality, particularly with supervised learning-to-rank although also with unsupervised rank aggregation.

preprint2021arXiv

PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer

Most research on pseudo relevance feedback (PRF) has been done in vector space and probabilistic retrieval models. This paper shows that Transformer-based rerankers can also benefit from the extra context that PRF provides. It presents PGT, a graph-based Transformer that sparsifies attention between graph nodes to enable PRF while avoiding the high computational complexity of most Transformer architectures. Experiments show that PGT improves upon non-PRF Transformer reranker, and it is at least as accurate as Transformer PRF models that use full attention, but with lower computational costs.

preprint2021arXiv

Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline

Pre-trained deep language models~(LM) have advanced the state-of-the-art of text retrieval. Rerankers fine-tuned from deep LM estimates candidate relevance based on rich contextualized matching signals. Meanwhile, deep LMs can also be leveraged to improve search index, building retrievers with better recall. One would expect a straightforward combination of both in a pipeline to have additive performance gain. In this paper, we discover otherwise and that popular reranker cannot fully exploit the improved retrieval result. We, therefore, propose a Localized Contrastive Estimation (LCE) for training rerankers and demonstrate it significantly improves deep two-stage models.

preprint2020arXiv

Generating Categories for Sets of Entities

Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities. They are a unique and valuable resource that is utilized in a broad range of information access tasks. To aid knowledge editors in the manual process of expanding a category system, this paper presents a method of generating categories for sets of entities. First, we employ neural abstractive summarization models to generate candidate categories. Next, the location within the hierarchy is identified for each candidate. Finally, structure-, content-, and hierarchy-based features are used to rank candidates to identify by the most promising ones (measured in terms of specificity, hierarchy, and importance). We develop a test collection based on Wikipedia categories and demonstrate the effectiveness of the proposed approach.

preprint2020arXiv

Ranking Clarification Questions via Natural Language Inference

Given a natural language query, teaching machines to ask clarifying questions is of immense utility in practical natural language processing systems. Such interactions could help in filling information gaps for better machine comprehension of the query. For the task of ranking clarification questions, we hypothesize that determining whether a clarification question pertains to a missing entry in a given post (on QA forums such as StackExchange) could be considered as a special case of Natural Language Inference (NLI), where both the post and the most relevant clarification question point to a shared latent piece of information or context. We validate this hypothesis by incorporating representations from a Siamese BERT model fine-tuned on NLI and Multi-NLI datasets into our models and demonstrate that our best performing model obtains a relative performance improvement of 40 percent and 60 percent respectively (on the key metric of Precision@1), over the state-of-the-art baseline(s) on the two evaluation sets of the StackExchange dataset, thereby, significantly surpassing the state-of-the-art.

preprint2020arXiv

Summarizing and Exploring Tabular Data in Conversational Search

Tabular data provide answers to a significant portion of search queries. However, reciting an entire result table is impractical in conversational search systems. We propose to generate natural language summaries as answers to describe the complex information contained in a table. Through crowdsourcing experiments, we build a new conversation-oriented, open-domain table summarization dataset. It includes annotated table summaries, which not only answer questions but also help people explore other information in the table. We utilize this dataset to develop automatic table summarization systems as SOTA baselines. Based on the experimental results, we identify challenges and point out future research directions that this resource will support.

preprint2020arXiv

TREC CAsT 2019: The Conversational Assistance Track Overview

The Conversational Assistance Track (CAsT) is a new track for TREC 2019 to facilitate Conversational Information Seeking (CIS) research and to create a large-scale reusable test collection for conversational search systems. The document corpus is 38,426,252 passages from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets. Eighty information seeking dialogues (30 train, 50 test) are an average of 9 to 10 questions long. Relevance assessments are provided for 30 training topics and 20 test topics. This year 21 groups submitted a total of 65 runs using varying methods for conversational query understanding and ranking. Methods include traditional retrieval based methods, feature based learning-to-rank, neural models, and knowledge enhanced methods. A common theme through the runs is the use of BERT-based neural reranking methods. Leading methods also employed document expansion, conversational query expansion, and generative language models for conversational query rewriting (GPT-2). The results show a gap between automatic systems and those using the manually resolved utterances, with a 35% relative improvement of manual rewrites over the best automatic system.

preprint2020arXiv

Understanding BERT Rankers Under Distillation

Deep language models such as BERT pre-trained on large corpus have given a huge performance boost to the state-of-the-art information retrieval ranking systems. Knowledge embedded in such models allows them to pick up complex matching signals between passages and queries. However, the high computation cost during inference limits their deployment in real-world search scenarios. In this paper, we study if and how the knowledge for search within BERT can be transferred to a smaller ranker through distillation. Our experiments demonstrate that it is crucial to use a proper distillation procedure, which produces up to nine times speedup while preserving the state-of-the-art performance.