Researcher profile

Noah Smith

Noah Smith contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2020, arXiv

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to substantially different results. To better understand this phenomenon, we experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds. We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials. Further, we examine two factors influenced by the choice of random seed: weight initialization and training data order. We find that both contribute comparably to the variance of out-of-sample performance, and that some weight initializations perform well across all tasks explored. On small datasets, we observe that many fine-tuning trials diverge part of the way through training, and we offer best practices for practitioners to stop training less promising runs early. We publicly release all of our experimental data, including training and validation scores for 2,100 trials, to encourage further analysis of training dynamics during fine-tuning.
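
The abstract's idea of tracking how the best-found model varies with the number of fine-tuning trials can be illustrated with a small sketch. It assumes the trials are treated as independent draws from the empirical distribution of observed validation scores, and it computes the expected maximum of n such draws. This is one common estimator and not necessarily the exact procedure used in the paper; the function name `expected_best_score` and the example scores are hypothetical.

```python
from typing import Sequence


def expected_best_score(scores: Sequence[float], n_trials: int) -> float:
    """Expected maximum of n_trials draws, with replacement, from `scores`.

    Assumption (not taken from the paper): each fine-tuning trial is an
    independent draw from the empirical distribution of observed scores.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")

    ordered = sorted(scores)
    total = len(ordered)
    expected = 0.0
    for rank, value in enumerate(ordered, start=1):
        # Probability that the maximum of n_trials draws is exactly the
        # rank-th smallest score, using the empirical CDF.
        prob = (rank / total) ** n_trials - ((rank - 1) / total) ** n_trials
        expected += value * prob
    return expected


# Hypothetical validation accuracies from three random seeds.
example_scores = [0.5, 0.7, 0.9]
curve = [expected_best_score(example_scores, n) for n in range(1, 6)]
```

With a single trial the estimate equals the mean score, and it rises toward the best observed score as the trial budget grows, which is the shape of curve the abstract describes.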

Preprint, 2020, arXiv

Observation of Dynamic Stark Resonances in Strong-Field Excitation

We investigate AC Stark-shifted resonances in argon with ultrashort near-infrared pulses. Using 30 fs pulses we observe periodic enhancements of the excitation yield in the intensity regions corresponding to the absorption of 13 and 14 photons. By reducing the pulse duration to 6 fs with only a few optical cycles, we also demonstrate that the enhancements are significantly reduced beyond what is measurable in the experiment. Comparing these to numerical predictions, which are in quantitative agreement with experimental results, we find that even though the quantum-state distribution can be broad, the enhancements are largely due to efficient population of a select few AC Stark-shifted resonant states rather than the closing of an ionization channel. Because these resonances are dependent on the frequency and intensity of the laser field, the broad bandwidth of the 6 fs pulses means that the resonance condition is fulfilled across a large range of intensities. This is further exaggerated by volume-averaging effects, resulting in excitation of the $5g$ state at almost all intensities and reducing the apparent magnitude of the enhancements. For 30 fs pulses, volume averaging also broadens the quantum state distribution but the enhancements are still large enough to survive. In this case, selectivity of excitation to a single state is reduced below 25% of the relative population. However, an analysis of TDSE simulations indicates that excitation of up to 60% into a single state is possible if volume averaging can be eliminated and the intensity can be precisely controlled.

Preprint, 2020, arXiv

Promoting Graph Awareness in Linearized Graph-to-Text Generation

Generating text from structured inputs, such as meaning representations or RDF triples, has often involved the use of specialized graph-encoding neural networks. However, recent applications of pretrained transformers to linearizations of graph inputs have yielded state-of-the-art generation results on graph-to-text tasks. Here, we explore the ability of these linearized models to encode local graph structures, in particular their invariance to the graph linearization strategy and their ability to reconstruct corrupted inputs. Our findings motivate solutions to enrich the quality of models' implicit graph encodings via scaffolding. Namely, we use graph-denoising objectives implemented in a multi-task text-to-text framework. We find that these denoising scaffolds lead to substantial improvements in downstream generation in low-resource settings.
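
To make the terms in this abstract concrete, here is a minimal sketch of linearizing a set of triples into a flat string and of building one graph-denoising training pair for a text-to-text model. The marker tokens (`<S>`, `<P>`, `<O>`, `<MASK>`), the `denoise:` task prefix, the masking scheme, and the function names are all assumptions made for illustration; the paper's actual linearization format and denoising objectives may differ.

```python
import random
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def linearize(triples: List[Triple], rng: Optional[random.Random] = None) -> str:
    """Flatten triples into one string; shuffle the order when rng is given."""
    order = list(triples)
    if rng is not None:
        # A different linearization strategy: same graph, different order.
        rng.shuffle(order)
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in order)


def make_denoising_example(triples: List[Triple], rng: random.Random) -> Tuple[str, str]:
    """Return (corrupted_input, target) for a hypothetical denoising task.

    One randomly chosen triple has its object replaced by <MASK>; the
    target is the uncorrupted linearization.
    """
    order = list(triples)
    idx = rng.randrange(len(order))
    subject, predicate, _ = order[idx]
    corrupted = order[:idx] + [(subject, predicate, "<MASK>")] + order[idx + 1:]
    return "denoise: " + linearize(corrupted), linearize(order)


# Hypothetical input graph.
graph = [
    ("Alan_Bean", "occupation", "Test_pilot"),
    ("Alan_Bean", "birthPlace", "Wheeler_Texas"),
]
source, target = make_denoising_example(graph, random.Random(0))
```

In a multi-task setup, pairs like `(source, target)` would be mixed with the ordinary graph-to-text pairs so that the same model learns both to generate text and to repair a corrupted graph.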