Researcher profile

Thomas Wolf

Thomas Wolf contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
13 works
0 followers
7 topics
4 close collaborators

Published work

13 published items

preprint · 2022 · arXiv

Charge-density-wave transitions, phase diagram, soft phonon and possible electronic nematicity: a thermodynamic investigation of BaNi2(As,P)2

A detailed investigation of BaNi2(As,P)2 single crystals using high-resolution thermal-expansion, heat-capacity, Young-modulus and resistivity measurements is presented. The experimental data are complemented by density-functional calculations. The phase diagram of BaNi2(As,P)2 is shown to be much richer than suggested by the original data of Kudo et al. [Phys. Rev. Lett. 109, 097002 (2012)]. The transition to the commensurate charge-density-wave (C-CDW) is always preceded by a four-fold symmetry-breaking transition associated with the long-range ordering of a strongly fluctuating unidirectional incommensurate charge-density wave (I-CDW). Significant precursors above the I-CDW and C-CDW transitions are seen in the thermal expansion and resistivity and are particularly evident in the temperature dependence of the $c/a$ ratio of the lattice parameters. Heat-capacity measurements of the crystals with a higher P content and a higher critical temperature of 3.2 K uncover a Debye-like behavior of a soft-phonon mode with a very low Debye temperature of roughly 15 K. Associated with this soft phonon are unusually large thermal-expansion anomalies, resulting in logarithmically diverging uniaxial phonon Grüneisen parameters. Young-modulus data of these higher-Tc crystals exhibit a significant softening in both B1g and B2g channels, which is argued to be incompatible with nematic criticality and is rather associated with a broad phase transition to a hitherto unknown structure. Possible origins of the increase in the superconducting critical temperature with P-substitution are discussed.

preprint · 2022 · arXiv

Multitask Prompted Training Enables Zero-Shot Task Generalization

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All trained models are available at https://github.com/bigscience-workshop/t-zero and all prompts are available at https://github.com/bigscience-workshop/promptsource.

preprint · 2022 · arXiv

Novel Increase of Superconducting Critical Temperature of an Iron-Superconductor due to Ion Implantation

Energetic ion irradiation usually decreases the superconducting critical temperature (Tc), with the few exceptions involving increases of up to a few K only. However, our recent 2.5 × 10^15 Ar/cm^2 irradiations by 1.5 MeV Ar6+ enhanced the Tc of the single-crystal Fe-superconductor Ba(Fe0.943Co0.057)2As2 by 8.2 K from its initial onset Tc of ~16.9 K as measured from the real part of the magnetic susceptibility, matching measurements from the imaginary part, electrical resistivity and magnetization. Ozaki et al. (2016) explained their Tc increase of 0.5 K in FeSe0.5Te0.5 films, with the thickness (t) smaller than the irradiating proton range (R), as due to a nanoscale compressive strain developed from radiation damage of the lattice. Here, Ar irradiation with t > R results in an Ar-implanted layer in our crystal. Implanted inert gas atoms often agglomerate into high-pressure bubbles that exert a large compressive strain on the lattice. We suggest that this additional compressive strain could be the reason for such a large (~49%) Tc increase.

preprint · 2022 · arXiv

Quantum critical fluctuations in an Fe-based superconductor

Quantum critical fluctuations may prove to play an instrumental role in the formation of unconventional superconductivity. Here, we show that the characteristic scaling of a marginal Fermi liquid is present in inelastic light scattering data of an Fe-based superconductor tuned through a quantum critical point (QCP) by chemical substitution or doping. From the doping dependence of the imaginary time dynamics we are able to distinguish regions dominated by quantum critical behavior from those having classical critical responses. This dichotomy reveals a connection between the marginal Fermi liquid behavior and quantum criticality. In particular, the overlap between regions of high superconducting transition temperatures and quantum critical scaling suggests a contribution from quantum fluctuations to the formation of superconductivity.

preprint · 2022 · arXiv

Training Transformers Together

The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show that the resulting model generates images of reasonable quality on a number of prompts.

preprint · 2021 · arXiv

Nematic response revealed by coherent phonon oscillations in BaFe$_2$As$_2$

We investigate coherent phonon oscillations of BaFe$_2$As$_2$ using optical pump-probe spectroscopy. Time-resolved optical reflectivity shows periodic modulations due to the $A_{1g}$ coherent phonon of $c$-axis arsenic vibrations. Optical probe beams polarized along the orthorhombic $a$- and $b$-axes reveal that the initial phase of coherent oscillations shows a systematic deviation as a function of temperature, although these oscillations arise from the same $c$-axis arsenic vibrations. The oscillation phase remains anisotropic even in the tetragonal structure, reflecting a nematic response of BaFe$_2$As$_2$. Our study suggests that investigation of the phase of coherent phonon oscillations in optical reflectivity can offer unique evidence of a nematic order strongly coupled to a lattice instability.

preprint · 2020 · arXiv

Band Engineering of Dirac cones in Iron Chalcogenides

By band engineering the iron chalcogenide Fe(Se,Te) via ab-initio calculations, we search for topological surface states and realizations of Majorana bound states. Proposed topological states are expected to occur for non-stoichiometric compositions on a surface Dirac cone where issues like disorder scattering and charge transfer between relevant electronic states have to be addressed. However, this surface Dirac cone is well above the Fermi-level. Our goal is to theoretically design a substituted crystal in which the surface Dirac cone is shifted towards the Fermi-level by modifying the bulk material without disturbing the surface. Going beyond conventional density functional theory (DFT), we apply the coherent potential approximation (BEB-CPA) in a mixed basis pseudo-potential framework to scan the substitutional phase-space of co-substitutions on the Se-sites. We have identified iodine as a promising candidate for intrinsic doping. Our specific proposal is that FeSe$_{0.325}$I$_{0.175}$Te$_{0.5}$ is a very likely candidate to exhibit a Dirac cone right at the Fermi energy without inducing strong disorder scattering.

preprint · 2020 · arXiv

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performance on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.

preprint · 2020 · arXiv

Evolution of Helimagnetic Correlations when approaching the Quantum Critical Point of Mn$_{1-x}$Fe$_x$Si

We present a comprehensive investigation of the evolution of helimagnetic correlations in Mn$_{1-x}$Fe$_x$Si with increasing doping. By combining polarised neutron scattering and high-resolution Neutron Spin Echo spectroscopy we investigate three samples with $x$=0.09, 0.11 and 0.14, i.e. with compositions on both sides of the concentration $x^* \sim 0.11$ where the helimagnetic Bragg peaks disappear and between $x^*$ and the quantum critical concentration $x_C \sim 0.17$, where $T_C$ vanishes. We find that the abrupt disappearance of the long-range helical periodicity at $x^*$ does not affect the precursor fluctuating correlations. These build up with decreasing temperature in a similar way as for the parent compound MnSi. The dynamics also bear strong similarities to MnSi. The analysis of our results indicates that frustration, possibly due to achiral RKKY interactions, increases with increasing Fe doping. We argue that this effect explains both the expansion of the precursor phase with increasing $x$ and the abrupt disappearance of long-range helimagnetic periodicity at $x^*$.

preprint · 2020 · arXiv

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/huggingface/transformers.

preprint · 2020 · arXiv

TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation

Natural Language Generation (NLG) models are prone to generating repetitive utterances. In this work, we study the repetition problem for encoder-decoder models, using both recurrent neural network (RNN) and transformer architectures. To this end, we consider the chit-chat task, where the problem is more prominent than in other tasks that need encoder-decoder architectures. We first study the influence of model architectures. By using pre-attention and highway connections for RNNs, we manage to achieve lower repetition rates. However, this method does not generalize to other models such as transformers. We hypothesize that the deeper reason is that in the training corpora, there are hard tokens that are more difficult for a generative model to learn than others and, once learning has finished, hard tokens are still under-learned, so that repetitive generations are more likely to happen. Based on this hypothesis, we propose token loss dynamic reweighting (TLDR) that applies differentiable weights to individual token losses. By using higher weights for hard tokens and lower weights for easy tokens, NLG models are able to learn individual tokens at different paces. Experiments on chit-chat benchmark datasets show that TLDR is more effective in repetition reduction for both RNN and transformer architectures than baselines using different weighting functions.

preprint · 2020 · arXiv

Ultrafast magnetic dynamics in insulating YBa$_2$Cu$_3$O$_{6+x}$ revealed by time resolved two-magnon Raman Scattering

Measurement and control of magnetic order and correlations in real time is a rapidly developing scientific area relevant for magnetic memory and spintronics [1,15]. In these experiments an ultra-short laser pulse (pump) is first absorbed by excitations carrying an electric dipole moment. These then give their energy to the magnetic subsystem monitored by a time-resolved probe. Much progress has been made in investigations of ferromagnets, but antiferromagnets are more challenging. Here we introduce time-resolved two-magnon Raman scattering as a novel real-time probe of magnetic correlations especially well-suited for antiferromagnets. Its application to the antiferromagnetic charge-transfer insulator YBa$_2$Cu$_3$O$_{6+x}$ revealed rapid demagnetization within 90 fs of photoexcitation. The relaxation back to thermal equilibrium is characterized by much slower timescales. We interpret these results in terms of slow relaxation of the charge sector and rapid equilibration of the magnetic sector to a prethermal state characterized by parameters that change slowly as the charge sector relaxes.

preprint · 2020 · arXiv

Unconventional Hund Metal in a Weak Itinerant Ferromagnet

The physics of weak itinerant ferromagnets is challenging due to their small magnetic moments and the ambiguous role of local interactions governing their electronic properties, many of which violate Fermi liquid theory. While magnetic fluctuations play an important role in the materials' unusual electronic states, the nature of these fluctuations and the paradigms through which they arise remain debated. Here we use inelastic neutron scattering to study magnetic fluctuations in the canonical weak itinerant ferromagnet MnSi. Data reveal that short-wavelength magnons continue to propagate until a mode crossing predicted for strongly interacting quasiparticles is reached, and the local susceptibility peaks at a coherence energy predicted for a correlated Hund metal by first-principles many-body theory. Scattering between electrons and orbital and spin fluctuations in MnSi can be understood at the local level to generate non-Fermi liquid character. These results provide crucial insight into the role of interorbital Hund's exchange within the broader class of enigmatic multiband itinerant, weak ferromagnets.