Source author record

James Henderson

James Henderson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · Machine Learning · Artificial Intelligence · Data Structures and Algorithms · physics.acc-ph

Catalog footprint

What is connected

11 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

Differential Privacy for Transformer Embeddings of Text with Nonparametric Variational Information Bottleneck

We propose a privacy-preserving method for sharing text data by sharing noisy versions of their transformer embeddings. It has been shown that hidden representations learned by deep models can encode sensitive information from the input, making it possible for adversaries to recover the input data with considerable accuracy. This problem is exacerbated in transformer embeddings because they consist of multiple vectors, one per token. To mitigate this risk, we propose Nonparametric Variational Differential Privacy (NVDP), which ensures both useful data sharing and strong privacy protection. We take a differential privacy (DP) approach, integrating a nonparametric variational information bottleneck (NVIB) layer into the transformer architecture to inject noise into its multivector embeddings and thereby hide information, and measuring privacy protection with Rényi Divergence (RD) and its corresponding Bayesian Differential Privacy (BDP) guarantee. Training the NVIB layer calibrates the noise level according to the utility of the downstream task. We test NVDP on the General Language Understanding Evaluation (GLUE) benchmark and show that varying the noise level gives us a useful trade-off between privacy and accuracy. With lower noise levels, our model maintains high accuracy while offering strong privacy guarantees, effectively balancing privacy and utility.
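
The abstract above measures privacy protection with Rényi divergence. As a rough illustration only, the standard definition for two discrete distributions can be sketched as follows. The function name `renyi_divergence` and the restriction to discrete distributions with order `alpha > 1` are assumptions made for this example; the sketch does not reproduce how the paper evaluates the divergence over NVIB embedding distributions.

```python
import math


def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P || Q) for two discrete distributions.

    Hypothetical helper for illustration; assumes alpha > 1 and that
    p and q are probability vectors of the same length.
    """
    if alpha <= 1:
        raise ValueError("this sketch only covers alpha > 1")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with zero mass under P contribute nothing
        if qi == 0.0:
            return math.inf  # P puts mass where Q has none
        total += (pi ** alpha) * (qi ** (1.0 - alpha))
    return math.log(total) / (alpha - 1.0)
```

Identical distributions give a divergence of zero, and the value grows as P concentrates mass where Q is spread out, which is the sense in which a smaller divergence between the outputs for different inputs corresponds to stronger protection.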

preprint · 2022 · arXiv

A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck

We propose a VAE for Transformers by developing a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors supported by attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. This allows NVIB to regularise the number of vectors accessible with attention, as well as the amount of information in individual vectors. By regularising the cross-attention of a Transformer encoder-decoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments on training a NVAE on natural language text show that the induced embedding space has the desired properties of a VAE for Transformers.

preprint · 2022 · arXiv

Graph Refinement for Coreference Resolution

The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves coreference resolution.

preprint · 2022 · arXiv

Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers

Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.

preprint · 2022 · arXiv

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models

Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. PERFECT makes two key design choices: First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by roughly factors of 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex auto-regressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. Our code is publicly available at https://github.com/facebookresearch/perfect.git.

preprint · 2020 · arXiv

Analysis of Ultra-Short Bunches in Free-Electron Lasers

Free-electron lasers (FELs) operate at wavelengths from millimeter waves through hard x-rays. At x-ray wavelengths, FELs typically rely on self-amplified spontaneous emission (SASE). Typical SASE emission contains multiple temporal "spikes" which limit the longitudinal coherence of the optical output; hence, alternate schemes that improve on the longitudinal coherence of the SASE emission are of interest. In this paper, we consider electron bunches that are shorter than the SASE spike separation. In such cases, the spontaneously generated radiation consists of a single optical pulse with better longitudinal coherence than is found in typical SASE FELs. To investigate this regime, we use two FEL simulation codes. One (MINERVA) uses the slowly-varying envelope approximation (SVEA), which breaks down for extremely short pulses. The second (PUFFIN) is a particle-in-cell (PiC) simulation code that is considered to be a more complete model of the underlying physics and which is able to simulate very short pulses. We first anchor these codes by showing that there is substantial agreement between the codes in simulation of the SPARC SASE FEL experiment at ENEA Frascati. We then compare the two codes for simulations using electron bunch lengths that are shorter than the SASE slice separation. The comparisons between the two codes for short bunch simulations elucidate the limitations of the SVEA in this regime but indicate that the SVEA can treat short bunches that are comparable to the cooperation length.

preprint · 2020 · arXiv

End-to-End Bias Mitigation by Modelling Biases in Corpora

Several recent studies have shown that strong natural language understanding (NLU) models are prone to relying on unwanted dataset biases without learning the underlying task, resulting in models that fail to generalize to out-of-domain datasets and are likely to perform poorly in real-world scenarios. We propose two learning strategies to train neural models, which are more robust to such biases and transfer better to out-of-domain datasets. The biases are specified in terms of one or more bias-only models, which learn to leverage the dataset biases. During training, the bias-only models' predictions are used to adjust the loss of the base model to reduce its reliance on biases by down-weighting the biased examples and focusing the training on the hard examples. We experiment on large-scale natural language inference and fact verification benchmarks, evaluating on out-of-domain datasets that are specifically designed to assess the robustness of models against known biases in the training data. Results show that our debiasing methods greatly improve robustness in all settings and better transfer to other textual entailment datasets. Our code and data are publicly available at https://github.com/rabeehk/robust-nli.
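
The abstract above describes adjusting the base model's loss with a bias-only model's predictions so that biased examples are down-weighted. One simple way to express that idea is a focal-style weight on the cross-entropy, sketched below. The function name `downweighted_loss`, the exponent `gamma`, and the exact weighting form are assumptions for illustration and are not claimed to be the authors' two strategies.

```python
import math


def downweighted_loss(p_model_gold, p_bias_gold, gamma=2.0):
    """Cross-entropy on the gold label, scaled down when the bias-only
    model is already confident in that label.

    p_model_gold: base model's probability for the gold label (0 < p <= 1)
    p_bias_gold:  bias-only model's probability for the gold label
    gamma:        hypothetical knob controlling how sharply to down-weight
    """
    weight = (1.0 - p_bias_gold) ** gamma
    return -weight * math.log(p_model_gold)
```

With this weighting, an example the bias-only model solves confidently contributes little to training, while an example it gets wrong keeps close to its full cross-entropy loss, so the base model's updates concentrate on the hard examples.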

preprint · 2020 · arXiv

The Unstoppable Rise of Computational Linguistics in Deep Learning

In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of variable binding and its instantiation in attention-based models, and argue that Transformer is not a sequence model but an induced-structure model. This perspective leads to predictions of the challenges facing research in deep learning architectures for natural language understanding.

preprint · 2016 · arXiv

A Bayesian Model of Multilingual Unsupervised Semantic Role Induction

We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can do evaluations in a variety of scenarios just by varying the inference procedure, without changing the model, thereby comparing the scenarios directly. We compare using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in monolingual data, with the alignments to another language resulting in small improvements, even with labeled data for the other language.

preprint · 2016 · arXiv

A Vector Space for Distributional Semantics for Entailment

Distributional semantics creates vector-space representations that capture many forms of semantic similarity, but their relation to semantic entailment has been less clear. We propose a vector-space model which provides a formal foundation for a distributional semantics of entailment. Using a mean-field approximation, we develop approximate inference procedures and entailment operators over vectors of probabilities of features being known (versus unknown). We use this framework to reinterpret an existing distributional-semantic model (Word2Vec) as approximating an entailment-based model of the distributions of words in contexts, thereby predicting lexical entailment relations. In both unsupervised and semi-supervised experiments on hyponymy detection, we get substantial improvements over previous results.

preprint · 2013 · arXiv

Efficient Computation of Mean Truncated Hitting Times on Very Large Graphs

Previous work has shown the effectiveness of random walk hitting times as a measure of dissimilarity in a variety of graph-based learning problems such as collaborative filtering, query suggestion or finding paraphrases. However, application of hitting times has been limited to small datasets because of computational restrictions. This paper develops a new approximation algorithm with which hitting times can be computed on very large, disk-resident graphs, making their application possible to problems which were previously out of reach. This will potentially benefit a range of large-scale problems.
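
For readers unfamiliar with the quantity named in the title above, the sketch below computes mean truncated hitting times exactly on a small in-memory graph by dynamic programming over the horizon. It is not the approximation algorithm the paper develops for disk-resident graphs; the function name `truncated_hitting_times` and the dense transition-matrix input are assumptions for illustration.

```python
def truncated_hitting_times(P, target, horizon):
    """Expected number of steps for a random walk to first reach `target`
    from each node, with walks cut off after `horizon` steps.

    P is a row-stochastic transition matrix given as a list of lists.
    Uses the recurrence h_t(i) = 1 + sum_k P[i][k] * h_{t-1}(k),
    with h_t(target) = 0 and h_0(i) = 0 for every node i.
    """
    n = len(P)
    h = [0.0] * n  # horizon 0: no steps allowed
    for _ in range(horizon):
        nxt = [0.0] * n
        for i in range(n):
            if i == target:
                continue  # already at the target, zero steps needed
            nxt[i] = 1.0 + sum(P[i][k] * h[k] for k in range(n))
        h = nxt
    return h
```

For example, with `P = [[0.5, 0.5], [0.0, 1.0]]`, `target = 1` and `horizon = 3`, the value for node 0 is 1.75. Each pass touches every edge, so the cost grows with both graph size and horizon, which is the kind of computational restriction the abstract refers to.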