Source author record

Chris Jones

Chris Jones appears in the imported research catalog. Authorship, coauthor, and topic links are available, although ownership of this profile has not yet been claimed.

Researcher (unclaimed source record)

Computation and Language, Computational Complexity, hep-ex, Machine Learning, math.CO, physics.comp-ph, Artificial Intelligence, astro-ph, cond-mat.stat-mech, Genomics, Molecular Networks, physics.soc-ph, Programming Languages, Social and Information Networks

Catalog footprint

What is connected

11 works

14 topics

4 close collaborators


Preprint, 2022, arXiv

Clarifying How Degree Entropies and Degree-Degree Correlations Relate to Network Robustness

It is often claimed that the entropy of a network's degree distribution is a proxy for its robustness. Here, we clarify the link between degree distribution entropy and giant component robustness to node removal by showing that the former merely sets a lower bound to the latter for randomly configured networks when no other network characteristics are specified. Furthermore, we show that, for networks of fixed expected degree that follow degree distributions of the same form, the degree distribution entropy is not indicative of robustness. By contrast, we show that the remaining degree entropy and robustness have a positive monotonic relationship and give an analytic expression for the remaining degree entropy of the log-normal distribution. We also show that degree-degree correlations are not by themselves indicative of a network's robustness for real networks. We propose an adjustment to how mutual information is measured which better encapsulates structural properties related to robustness.
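The abstract contrasts the entropy of the degree distribution with the entropy of the remaining degree distribution. The sketch below computes both for a finite degree distribution. It assumes the standard configuration-model definition of the remaining degree distribution, q_k = (k + 1) p_{k+1} / <k>, which is the degree of a node reached along a random edge, excluding that edge. The function names are hypothetical, and the sketch does not reproduce the paper's robustness analysis or its log-normal expression.

```python
import math

def degree_entropy(p):
    """Shannon entropy (in nats) of a degree distribution with p[k] = P(degree = k)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def remaining_degree_distribution(p):
    """q[k] = (k + 1) * p[k + 1] / <k>, assuming the mean degree <k> is positive."""
    mean_k = sum(k * pk for k, pk in enumerate(p))
    return [(k + 1) * p[k + 1] / mean_k for k in range(len(p) - 1)]

def remaining_degree_entropy(p):
    """Shannon entropy (in nats) of the remaining degree distribution."""
    return degree_entropy(remaining_degree_distribution(p))

# Half the nodes have degree 1 and half have degree 2.
p = [0.0, 0.5, 0.5]
print(degree_entropy(p))                 # ln 2
print(remaining_degree_distribution(p))  # approximately [1/3, 2/3]
print(remaining_degree_entropy(p))
```

For a regular network, such as p = [0, 0, 1], both entropies are zero. Weighting by degree, which is what q_k does, is why the two quantities can behave differently when degree distributions of the same form are compared.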

Preprint, 2022, arXiv

Improving language models by retrieving from trillions of tokens

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2-trillion-token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
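The retrieval step described above works on chunks of tokens. The toy sketch below splits a token sequence into fixed-size chunks and returns the k database chunks most similar to a query chunk. As an assumption, the frozen BERT embedding is replaced by a bag-of-words count vector compared with cosine similarity. The encoder and chunked cross-attention are omitted, so this shows only chunk-level nearest-neighbour lookup and is not RETRO's implementation. All names are hypothetical.

```python
import math
from collections import Counter

def chunks(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def embed(chunk):
    # Hypothetical stand-in for a frozen BERT embedding: token counts.
    return Counter(chunk)

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_chunk, database, k):
    """Return the k database chunks most similar to the query chunk."""
    q = embed(query_chunk)
    ranked = sorted(database, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

db = [["the", "cat", "sat"], ["stock", "prices", "fell"], ["a", "cat", "slept"]]
print(retrieve(["the", "cat", "ran"], db, 1))  # [['the', 'cat', 'sat']]
```

A system at the scale the abstract describes would use an approximate nearest-neighbour index rather than sorting the whole database. The linear scan here only keeps the example self-contained.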

Preprint, 2022, arXiv

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales, from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.

Preprint, 2022, arXiv

Unified Scaling Laws for Routed Language Models

The performance of a language model has been shown to be effectively modeled as a power law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. We then provide two applications of these laws: first, deriving an Effective Parameter Count along which all models scale at the same rate, and second, using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.
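The opening sentence refers to modelling performance as a power law in parameter count. A power law L(N) = a · N^(-alpha) becomes a straight line in log-log space, so its coefficients can be estimated by ordinary least squares on (log N, log L). The sketch below does this for a single variable only. It assumes noise-free synthetic data and does not implement the paper's two-variable law over parameter count and computation. The function name is hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-alpha) by least squares on log y = log a - alpha * log x.

    Returns (a, alpha). Requires at least two distinct positive x values and positive y values.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope

# Synthetic losses generated from a known law, so the fit should recover it.
params = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.07 for n in params]
print(fit_power_law(params, losses))  # approximately (5.0, 0.07)
```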

Preprint, 2020, arXiv

HL-LHC Computing Review: Common Tools and Community Software

Common and community software packages, such as ROOT, Geant4 and event generators have been a key part of the LHC's success so far and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. In this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful.

Preprint, 2020, arXiv

Sum-of-Squares Lower Bounds for Sherrington-Kirkpatrick via Planted Affine Planes

The Sum-of-Squares (SoS) hierarchy is a semidefinite programming meta-algorithm that captures state-of-the-art polynomial-time guarantees for many optimization problems, such as Max-$k$-CSPs and Tensor PCA. On the flip side, an SoS lower bound provides evidence of hardness, which is particularly relevant to average-case problems for which NP-hardness may not be available. In this paper, we consider the following average-case problem, which we call the Planted Affine Planes (PAP) problem: given $m$ random vectors $d_1,\ldots,d_m$ in $\mathbb{R}^n$, can we prove that there is no vector $v \in \mathbb{R}^n$ such that $\langle v, d_u\rangle^2 = 1$ for all $u \in [m]$? In other words, can we prove that $m$ random vectors are not all contained in two parallel hyperplanes at equal distance from the origin? We prove that for $m \leq n^{3/2-\varepsilon}$, with high probability, degree-$n^{\Omega(\varepsilon)}$ SoS fails to refute the existence of such a vector $v$. When the vectors $d_1,\ldots,d_m$ are chosen from the multivariate normal distribution, the PAP problem is equivalent to the problem of proving that a random $n$-dimensional subspace of $\mathbb{R}^m$ does not contain a Boolean vector. As shown by Mohanty, Raghavendra, and Xu [STOC 2020], a lower bound for this problem implies a lower bound for the problem of certifying energy upper bounds on the Sherrington-Kirkpatrick Hamiltonian, so our lower bound implies a degree-$n^{\Omega(\varepsilon)}$ SoS lower bound for the certification version of the Sherrington-Kirkpatrick problem.
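The hard direction of the PAP problem is refutation, which means certifying that no such $v$ exists. Checking a proposed $v$, by contrast, is easy. The sketch below verifies the condition $\langle v, d_u\rangle^2 = 1$ for every $u$. It also builds a planted instance by shifting Gaussian vectors along a hidden $v$ so that each one lands on one of the hyperplanes $\langle v, x\rangle = \pm 1$. The generator and the tolerance are illustrative assumptions, not the paper's distribution. In particular, the paper's null case uses unshifted random vectors.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def satisfies_pap(v, ds, tol=1e-9):
    """True if <v, d>^2 == 1 (within tol) for every d, i.e. each d lies on <v, x> = +1 or -1."""
    return all(abs(dot(v, d) ** 2 - 1) <= tol for d in ds)

def planted_instance(n, m, seed=0):
    """Hypothetical planted generator: random Gaussian d, shifted along v so <v, d> = +1 or -1."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    vv = dot(v, v)
    ds = []
    for _ in range(m):
        d = [rng.gauss(0, 1) for _ in range(n)]
        s = rng.choice([-1, 1])
        c = (s - dot(v, d)) / vv  # after the shift, <v, d + c*v> = s
        ds.append([di + c * vi for di, vi in zip(d, v)])
    return v, ds

v, ds = planted_instance(n=5, m=20)
print(satisfies_pap(v, ds))  # True: the hidden v is a witness
```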

Preprint, 2018, arXiv

The long non-coding RNA HOTAIR is transcriptionally activated by HOXA9 and is an independent prognostic marker in patients with malignant glioma

The lncRNA HOTAIR has been implicated in several human cancers. Here, we evaluated the molecular alterations and upstream regulatory mechanisms of HOTAIR in gliomas, the most common primary brain tumors, and its clinical relevance. HOTAIR gene expression, methylation, copy number and prognostic value were investigated in human gliomas by integrating data from online datasets and our cohorts. High levels of HOTAIR were associated with higher grades of glioma, particularly IDH wild-type cases. Mechanistically, HOTAIR was overexpressed in a gene dosage-independent manner, while DNA methylation levels of particular CpGs in the HOTAIR locus were associated with HOTAIR expression levels in GBM clinical specimens and cell lines. Concordantly, the demethylating agent 5-Aza-2'-deoxycytidine affected HOTAIR transcriptional levels in a cell line-dependent manner. Importantly, HOTAIR was frequently co-expressed with HOXA9 in high-grade gliomas from TCGA, Oncomine, and our Portuguese and French datasets. Integrated in silico analyses, chromatin immunoprecipitation, and qPCR data showed that HOXA9 binds directly to the promoter of HOTAIR. Clinically, GBM patients with high HOTAIR expression had significantly reduced overall survival, independently of other prognostic variables. In summary, this work reveals HOXA9 as a novel direct regulator of HOTAIR and establishes HOTAIR as an independent prognostic marker, providing new therapeutic opportunities to treat this highly aggressive cancer.

Preprint, 2016, arXiv

A Noisy-Influence Regularity Lemma for Boolean Functions

We present a regularity lemma for Boolean functions $f:\{-1,1\}^n \to \{-1,1\}$ based on noisy influence, a measure of how locally correlated $f$ is with each input bit. We provide an application of the regularity lemma to weaken the conditions on the Majority is Stablest Theorem. We also prove a "homogenized" version stating that there is a set of input bits so that most restrictions of $f$ on those bits have small noisy influences. These results were sketched out by [OSTW10], but never published. With their permission, we present the full details here.
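To make "noisy influence" concrete, the sketch below computes Fourier coefficients of a small Boolean function by brute force. It then evaluates one common formulation: the influence of coordinate $i$ on the noise-smoothed function $T_\rho f$, which equals $\sum_{S \ni i} \rho^{2|S|} \hat f(S)^2$. Whether this matches the exact definition used in the paper or [OSTW10] is an assumption. The enumeration over all $2^n$ inputs is only practical for small $n$, and the function names are hypothetical.

```python
from itertools import product

def fourier_coefficients(f, n):
    """hat f(S) = E_x[f(x) * prod_{i in S} x_i] over uniform x in {-1,1}^n; S is a bitmask."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for mask in range(1 << n):
        total = 0
        for x in points:
            chi = 1
            for i in range(n):
                if (mask >> i) & 1:
                    chi *= x[i]
            total += f(x) * chi
        coeffs[mask] = total / len(points)
    return coeffs

def noisy_influence(f, n, i, rho):
    """Influence of coordinate i on T_rho f: sum over S containing i of rho^(2|S|) * hat f(S)^2."""
    coeffs = fourier_coefficients(f, n)
    return sum(rho ** (2 * bin(mask).count("1")) * c * c
               for mask, c in coeffs.items() if (mask >> i) & 1)

maj3 = lambda x: 1 if sum(x) > 0 else -1
print(noisy_influence(maj3, 3, 0, 1.0))  # 0.5, the ordinary influence
print(noisy_influence(maj3, 3, 0, 0.5))  # 0.06640625
```

With $\rho = 1$ this reduces to the ordinary influence. Smaller $\rho$ discounts high-degree Fourier mass, which is what makes the quantity "local".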

Preprint, 2016, arXiv

Lightweight User-Space Record And Replay

The ability to record and replay program executions with low overhead enables many applications, such as reverse-execution debugging, debugging of hard-to-reproduce test failures, and "black box" forensic analysis of failures in deployed systems. Existing record-and-replay approaches rely on recording an entire virtual machine (which is heavyweight), modifying the OS kernel (which adds deployment and maintenance costs), or pervasive code instrumentation (which imposes significant performance and complexity overhead). We investigated whether it is possible to build a practical record-and-replay system that avoids all these issues. The answer turns out to be yes, provided the CPU and operating system meet certain non-obvious constraints. Fortunately, modern Intel CPUs, Linux kernels and user-space frameworks meet these constraints, although this has only become true recently. With some novel optimizations, our system RR records and replays real-world workloads with low overhead, using an entirely user-space implementation running on stock hardware and operating systems. RR forms the basis of an open-source reverse-execution debugger that sees significant use in practice. We present the design and implementation of RR, describe its performance on a variety of workloads, and identify constraints on hardware and operating system design required to support our approach.
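The core idea behind any record-and-replay system is that an execution becomes reproducible once the results of its nondeterministic inputs are logged during recording and fed back in the same order during replay. The toy sketch below applies that principle at the level of Python function calls. The `Recorder` class and `program` function are hypothetical. The sketch says nothing about how RR itself intercepts system calls or handles hardware-level nondeterminism.

```python
import random
import time

class Recorder:
    """Toy record/replay of nondeterministic inputs.

    In "record" mode, each call runs the real source and logs its result.
    In "replay" mode, calls return the logged results in their original order.
    """

    def __init__(self, mode, log=None):
        self.mode = mode
        self.log = [] if log is None else log
        self.pos = 0

    def nondet(self, source):
        if self.mode == "record":
            value = source()
            self.log.append(value)
            return value
        value = self.log[self.pos]
        self.pos += 1
        return value

def program(rec):
    # Both inputs are nondeterministic unless they are routed through the recorder.
    a = rec.nondet(random.random)
    b = rec.nondet(time.time)
    return a + b

recording = Recorder("record")
first = program(recording)
replay = Recorder("replay", list(recording.log))
print(program(replay) == first)  # True: replay reproduces the recorded run
```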

Preprint, 2013, arXiv

Snowmass Computing Frontier: Software Development, Staffing and Training

Report of the Snowmass CpF-I4 subgroup on Software Development, Staffing and Training

Preprint, 2003, arXiv

On the Surface Heating of Synchronously-Spinning Short-Period Jovian Planets

We consider the atmospheric flow on short-period extra-solar planets through two-dimensional numerical simulations of hydrodynamics with radiation transfer. One side is always exposed to the irradiation from the host star; the other is always in shadow. The temperature of the day side is determined by the equilibrium which the planetary atmosphere establishes with stellar radiation. Part of the thermal energy deposited on the day side is advected to the night side by a current. The radiation transfer, the night-side temperature distribution and, hence, the spectroscopic signature of the planet are sensitive functions of the atmospheric opacity. If the atmosphere contains grains with an abundance and size distribution comparable to that of the interstellar medium, shallow heating occurs on the day side and the night side cools well below the day side. The temperature difference decreases as the abundance of grains is reduced. A simple analytic model of the dissipation of the circulation flow and associated kinetic heating is considered. This heating effect occurs mostly near the photosphere, not deep enough to significantly affect the size of planets. We show that the surface irradiation suppresses convection near the photospheric region on the day side. In some cases convection zones appear near the surface on the night side. This structural modification may influence the response and dissipation of tidal disturbances and alter the circularization and synchronization time scales.
