Researcher profile

Chris Jones

Chris Jones contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Clarifying How Degree Entropies and Degree-Degree Correlations Relate to Network Robustness

It is often claimed that the entropy of a network's degree distribution is a proxy for its robustness. Here, we clarify the link between degree distribution entropy and giant component robustness to node removal by showing that the former merely sets a lower bound to the latter for randomly configured networks when no other network characteristics are specified. Furthermore, we show that, for networks of fixed expected degree that follow degree distributions of the same form, the degree distribution entropy is not indicative of robustness. By contrast, we show that the remaining degree entropy and robustness have a positive monotonic relationship and give an analytic expression for the remaining degree entropy of the log-normal distribution. We also show that degree-degree correlations are not by themselves indicative of a network's robustness for real networks. We propose an adjustment to how mutual information is measured which better encapsulates structural properties related to robustness.

preprint2022arXiv

Improving language models by retrieving from trillions of tokens

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.

preprint2022arXiv

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.

preprint2022arXiv

Unified Scaling Laws for Routed Language Models

The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. Afterwards we provide two applications of these laws: first deriving an Effective Parameter Count along which all models scale at the same rate, and then using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.

preprint2020arXiv

HL-LHC Computing Review: Common Tools and Community Software

Common and community software packages, such as ROOT, Geant4 and event generators have been a key part of the LHC's success so far and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. In this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful.

preprint2020arXiv

Sum-of-Squares Lower Bounds for Sherrington-Kirkpatrick via Planted Affine Planes

The Sum-of-Squares (SoS) hierarchy is a semi-definite programming meta-algorithm that captures state-of-the-art polynomial time guarantees for many optimization problems such as Max-$k$-CSPs and Tensor PCA. On the flip side, a SoS lower bound provides evidence of hardness, which is particularly relevant to average-case problems for which NP-hardness may not be available. In this paper, we consider the following average case problem, which we call the \emph{Planted Affine Planes} (PAP) problem: Given $m$ random vectors $d_1,\ldots,d_m$ in $\mathbb{R}^n$, can we prove that there is no vector $v \in \mathbb{R}^n$ such that for all $u \in [m]$, $\langle v, d_u\rangle^2 = 1$? In other words, can we prove that $m$ random vectors are not all contained in two parallel hyperplanes at equal distance from the origin? We prove that for $m \leq n^{3/2-ε}$, with high probability, degree-$n^{Ω(ε)}$ SoS fails to refute the existence of such a vector $v$. When the vectors $d_1,\ldots,d_m$ are chosen from the multivariate normal distribution, the PAP problem is equivalent to the problem of proving that a random $n$-dimensional subspace of $\mathbb{R}^m$ does not contain a boolean vector. As shown by Mohanty--Raghavendra--Xu [STOC 2020], a lower bound for this problem implies a lower bound for the problem of certifying energy upper bounds on the Sherrington-Kirkpatrick Hamiltonian, and so our lower bound implies a degree-$n^{Ω(ε)}$ SoS lower bound for the certification version of the Sherrington-Kirkpatrick problem.

preprint2018arXiv

The long non-coding RNA HOTAIR is transcriptionally activated by HOXA9 and is an independent prognostic marker in patients with malignant glioma

The lncRNA HOTAIR has been implicated in several human cancers. Here, we evaluated the molecular alterations and upstream regulatory mechanisms of HOTAIR in glioma, the most common primary brain tumors, and its clinical relevance. HOTAIR gene expression, methylation, copy-number and prognostic value were investigated in human gliomas integrating data from online datasets and our cohorts. High levels of HOTAIR were associated with higher grades of glioma, particularly IDH wild-type cases. Mechanistically, HOTAIR was overexpressed in a gene dosage-independent manner, while DNA methylation levels of particular CpGs in HOTAIR locus were associated with HOTAIR expression levels in GBM clinical specimens and cell lines. Concordantly, the demethylating agent 5-Aza-2'-deoxycytidine affected HOTAIR transcriptional levels in a cell line-dependent manner. Importantly, HOTAIR was frequently co-expressed with HOXA9 in high-grade gliomas from TCGA, Oncomine, and our Portuguese and French datasets. Integrated in silico analyses, chromatin immunoprecipitation, and qPCR data showed that HOXA9 binds directly to the promoter of HOTAIR. Clinically, GBM patients with high HOTAIR expression had a significantly reduced overall survival, independently of other prognostic variables. In summary, this work reveals HOXA9 as a novel direct regulator of HOTAIR, and establishes HOTAIR as an independent prognostic marker, providing new therapeutic opportunities to treat this highly aggressive cancer.