Source author record

Andrew Arnold

Andrew Arnold appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Symbolic Computation; Computation and Language; Data Structures and Algorithms; Artificial Intelligence; Computational Complexity; cs.CY; Information Retrieval; Machine Learning; Mathematical Software

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators

preprint, 2022, arXiv

Debiasing Neural Retrieval via In-batch Balancing Regularization

People frequently interact with information retrieval (IR) systems; however, IR models exhibit biases and discrimination towards various demographics. In-processing fair ranking methods provide a trade-off between accuracy and fairness by adding a fairness-related regularization term to the loss function. However, there have not been intuitive objective functions that depend on the click probability and user engagement to directly optimize towards this. In this work, we propose the In-Batch Balancing Regularization (IBBR) to mitigate the ranking disparity among subgroups. In particular, we develop a differentiable normed Pairwise Ranking Fairness (nPRF) and leverage the T-statistics on top of nPRF over subgroups as a regularization to improve fairness. Empirical results with the BERT-based neural rankers on the MS MARCO Passage Retrieval dataset with the human-annotated non-gendered queries benchmark \citep{rekabsaz2020neural} show that our IBBR method with nPRF achieves significantly less bias with minimal degradation in ranking performance compared with the baseline.

preprint, 2022, arXiv

DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization

Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model. Empirical analyses show that, despite the challenging nature of generative tasks, we were able to achieve a 16.5x model footprint compression ratio with little performance drop relative to the full-precision counterparts on multiple summarization and QA datasets. We further pushed the limit of compression ratio to 27.7x and presented the performance-efficiency trade-off for generative tasks using pre-trained models. To the best of our knowledge, this is the first work aiming to effectively distill and quantize sequence-to-sequence pre-trained models for language generation tasks.
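The two ingredients named in this abstract, quantization and distillation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the uniform symmetric quantizer, the KL-divergence loss, and the function names `quantize`, `softmax` and `distillation_loss` are generic textbook choices, not the scheme used in DQ-BART.

```python
import math

def quantize(weights, bits):
    """Uniform symmetric quantization (illustrative assumption, not DQ-BART's scheme).

    Rounds each weight to the nearest multiple of a shared scale so that
    the values fit on a signed grid with 2**(bits - 1) - 1 positive levels.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:
        return list(weights)
    return [round(w / scale) * scale for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student) between the two output distributions."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

weights = [0.82, -0.31, 0.05, -0.77]
low_precision = quantize(weights, bits=8)

teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
loss = distillation_loss(teacher, student)
```

In a joint setup of the kind the abstract describes, a loss like this would be minimized while the student's weights are held to the low-precision grid; how the two are combined in practice is not specified here.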

preprint, 2022, arXiv

Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner

Large language models have achieved high performance on various question answering (QA) benchmarks, but the explainability of their output remains elusive. Structured explanations, called entailment trees, were recently suggested as a way to explain and inspect a QA system's answer. In order to better generate such entailment trees, we propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR). Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises. The IRGR model iteratively searches for suitable premises, constructing a single entailment step at a time. Contrary to previous approaches, our method combines generation steps and retrieval of premises, allowing the model to leverage intermediate conclusions, and mitigating the input size limit of baseline encoder-decoder models. We conduct experiments using the EntailmentBank dataset, where we outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.

preprint, 2022, arXiv

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fine-tuned for various downstream tasks. However, when deployed in the real world, a PTLM-based model must deal with data distributions that deviate from what the PTLM was initially trained on. In this paper, we study a lifelong language model pretraining challenge where a PTLM is continually updated so as to adapt to emerging data. Over a domain-incremental research paper stream and a chronologically-ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms, and keep track of the downstream task performance (after fine-tuning). We evaluate PTLM's ability to adapt to new corpora while retaining learned knowledge in earlier corpora. Our experiments show distillation-based approaches to be most effective in retaining downstream performance in earlier domains. The algorithms also improve knowledge transfer, allowing models to achieve better downstream performance over the latest data, and improve temporal generalization when distribution gaps exist between training and evaluation because of time. We believe our problem formulation, methods, and analysis will inspire future studies towards continual pretraining of language models.

preprint, 2022, arXiv

QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition

Recently, prompt-based learning for pre-trained language models has succeeded in few-shot Named Entity Recognition (NER) by exploiting prompts as task guidance to increase label efficiency. However, previous prompt-based methods for few-shot NER have limitations such as a higher computational complexity, poor zero-shot ability, requiring manual prompt engineering, or lack of prompt robustness. In this work, we address these shortcomings by proposing a new prompt-based learning NER method with Question Answering (QA), called QaNER. Our approach includes 1) a refined strategy for converting NER problems into the QA formulation; 2) NER prompt generation for QA models; 3) prompt-based tuning with QA models on a few annotated NER examples; 4) zero-shot NER by prompting the QA model. Comparing the proposed approach with previous methods, QaNER is faster at inference, insensitive to the prompt quality, and robust to hyper-parameters, as well as demonstrating significantly better low-resource performance and zero-shot capability.

preprint, 2015, arXiv

Output-sensitive algorithms for sumset and sparse polynomial multiplication

We present randomized algorithms to compute the sumset (Minkowski sum) of two integer sets, and to multiply two univariate integer polynomials given by sparse representations. Our algorithm for sumset has cost softly linear in the combined size of the inputs and output. This is used as part of our sparse multiplication algorithm, whose cost is softly linear in the combined size of the inputs, output, and the sumset of the supports of the inputs. As a subroutine, we present a new method for computing the coefficients of a sparse polynomial, given a set containing its support. Our multiplication algorithm extends to multivariate Laurent polynomials over finite fields and rational numbers. Our techniques are based on sparse interpolation algorithms and results from analytic number theory.
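To make the objects in this abstract concrete, here is a brute-force sketch of a sumset and of sparse polynomial multiplication. It is a quadratic-time baseline for illustration only, not the randomized output-sensitive algorithm the abstract describes; the dictionary representation and the names `sumset` and `sparse_mul` are assumptions.

```python
from collections import defaultdict

def sumset(a, b):
    """Return the sumset {x + y : x in a, y in b} by brute force."""
    return {x + y for x in a for y in b}

def sparse_mul(f, g):
    """Multiply sparse polynomials stored as {exponent: coefficient} dicts."""
    product = defaultdict(int)
    for ef, cf in f.items():
        for eg, cg in g.items():
            product[ef + eg] += cf * cg
    # Drop terms whose coefficients cancelled to zero.
    return {e: c for e, c in product.items() if c != 0}

f = {0: 1, 5: 1}    # 1 + x^5
g = {0: 1, 5: -1}   # 1 - x^5
h = sparse_mul(f, g)                       # 1 - x^10
support_sumset = sumset(f.keys(), g.keys())

print(h)                # {0: 1, 10: -1}
print(support_sumset)   # the set {0, 5, 10}
```

The support of the product is always contained in the sumset of the input supports, but cancellation can make it strictly smaller, as the missing x^5 term shows. This is consistent with the abstract's cost bound depending on both the output size and the size of the sumset of the supports.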

preprint, 2014, arXiv

Faster sparse interpolation of straight-line programs

We give a new probabilistic algorithm for interpolating a "sparse" polynomial f given by a straight-line program. Our algorithm constructs an approximation f* of f, such that their difference probably has at most half the number of terms of f, then recurses on their difference. Our approach builds on previous work by Garg and Schost (2009), and Giesbrecht and Roche (2011), and is asymptotically more efficient in terms of the total cost of the probes required than previous methods, in many cases.
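As a rough illustration of what a probe of a straight-line program can look like, the sketch below evaluates a small program in the ring Z[x]/(x^p - 1), so exponents are reduced modulo p. This is an assumption about the probing model, in the spirit of the earlier work the abstract cites; the instruction format and the names `probe`, `mul_mod` and `add` are hypothetical, and the recursive halving step of the algorithm itself is not shown.

```python
def mul_mod(f, g, p):
    """Multiply {exponent: coeff} polynomials modulo x^p - 1."""
    out = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            e = (ef + eg) % p
            out[e] = out.get(e, 0) + cf * cg
    return {e: c for e, c in out.items() if c != 0}

def add(f, g):
    """Add two {exponent: coeff} polynomials."""
    out = dict(f)
    for e, c in g.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def probe(program, p):
    """Run a straight-line program on the input x in Z[x]/(x^p - 1).

    Register 0 holds x; each instruction (op, i, j) appends a new register
    computed from registers i and j. The last register is the output.
    """
    registers = [{1 % p: 1}]
    for op, i, j in program:
        a, b = registers[i], registers[j]
        registers.append(mul_mod(a, b, p) if op == "mul" else add(a, b))
    return registers[-1]

# x^16 + x computed with four squarings and one addition.
program = [("mul", 0, 0), ("mul", 1, 1), ("mul", 2, 2), ("mul", 3, 3), ("add", 4, 0)]

print(probe(program, 5))   # {1: 2}: the two terms collide since 16 = 1 mod 5
print(probe(program, 7))   # {2: 1, 1: 1}: both terms survive since 16 = 2 mod 7
```

The program never expands the polynomial to its full degree, which is why probes stay cheap even when the degree is very large. The cost of a probe grows with p, and a poorly chosen p merges distinct terms, so the choice of moduli matters for any interpolation method built on such probes.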

preprint, 2014, arXiv

Faster Sparse Multivariate Polynomial Interpolation of Straight-Line Programs

Given a straight-line program whose output is a polynomial function of the inputs, we present a new algorithm to compute a concise representation of that unknown function. Our algorithm can handle any case where the unknown function is a multivariate polynomial, with coefficients in an arbitrary finite field, and with a reasonable number of nonzero terms but possibly very large degree. It is competitive with previously known sparse interpolation algorithms that work over an arbitrary finite field, and provides an improvement when there are a large number of variables.

preprint, 2014, arXiv

Multivariate sparse interpolation using randomized Kronecker substitutions

We present new techniques for reducing a multivariate sparse polynomial to a univariate polynomial. The reduction works similarly to the classical and widely-used Kronecker substitution, except that we choose the degrees randomly based on the number of nonzero terms in the multivariate polynomial, that is, its sparsity. The resulting univariate polynomial often has a significantly lower degree than the Kronecker substitution polynomial, at the expense of a small number of term collisions. As an application, we give a new algorithm for multivariate interpolation which uses these new techniques along with any existing univariate interpolation algorithm.
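The contrast between the classical substitution and a randomized one can be sketched as follows. The classical Kronecker substitution sends x_i to z^(D^i) for a degree bound D, which is injective on exponent tuples but yields very large degrees. The randomized variant below draws each substitution exponent uniformly from a small range; that range, the fixed seed, and the names `substitute` and `DEGREE_BOUND` are assumptions for illustration, not the distribution analysed in the paper.

```python
import random

def substitute(poly, weights):
    """Map a multivariate polynomial to a univariate one via x_i -> z^weights[i].

    poly is {exponent tuple: coefficient}; the result is {exponent: coefficient}.
    Terms that land on the same exponent (collisions) have their coefficients summed.
    """
    image = {}
    for exponents, coeff in poly.items():
        e = sum(w * k for w, k in zip(weights, exponents))
        image[e] = image.get(e, 0) + coeff
    return image

DEGREE_BOUND = 100  # every partial degree is below this bound
poly = {(99, 0, 0): 3, (0, 99, 0): 5, (0, 0, 99): 7, (1, 1, 1): 2}

# Classical Kronecker substitution: weights 1, D, D^2.
classical_weights = [DEGREE_BOUND ** i for i in range(3)]
classical = substitute(poly, classical_weights)

# Randomized substitution: small random weights (hypothetical range 1..49).
rng = random.Random(0)
random_weights = [rng.randrange(1, 50) for _ in range(3)]
randomized = substitute(poly, random_weights)

print("classical degree:", max(classical), "terms:", len(classical))
print("randomized degree:", max(randomized), "terms:", len(randomized))
```

The classical image keeps all four terms at degree 990000, while the randomized image has degree at most 99 * 49 but may merge terms. Recovering the multivariate exponents from several such randomized images is the part of the method this sketch does not cover.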

preprint, 2014, arXiv

Sparse interpolation over finite fields via low-order roots of unity

We present a new Monte Carlo algorithm for the interpolation of a straight-line program as a sparse polynomial $f$ over an arbitrary finite field of size $q$. We assume a priori bounds $D$ and $T$ are given on the degree and number of terms of $f$. The approach presented in this paper is a hybrid of the diversified and recursive interpolation algorithms, the two previous fastest known probabilistic methods for this problem. By making effective use of the information contained in the coefficients themselves, this new algorithm improves on the bit complexity of previous methods by a "soft-Oh" factor of $T$, $\log D$, or $\log q$.

preprint, 2013, arXiv

A new Truncated Fourier Transform algorithm

Truncated Fourier Transforms (TFTs), first introduced by Van der Hoeven, refer to a family of algorithms that attempt to smooth "jumps" in complexity exhibited by FFT algorithms. We present an in-place TFT whose time complexity, measured in terms of ring operations, is comparable to existing not-in-place TFT methods. We also describe a transformation that maps between two families of TFT algorithms that use different sets of evaluation points.
