Source author record

Noah Smith

Noah Smith appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Machine Learning, physics.atom-ph, physics.optics

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint, 2020, arXiv

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to substantially different results. To better understand this phenomenon, we experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds. We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials. Further, we examine two factors influenced by the choice of random seed: weight initialization and training data order. We find that both contribute comparably to the variance of out-of-sample performance, and that some weight initializations perform well across all tasks explored. On small datasets, we observe that many fine-tuning trials diverge part of the way through training, and we offer best practices for practitioners to stop training less promising runs early. We publicly release all of our experimental data, including training and validation scores for 2,100 trials, to encourage further analysis of training dynamics during fine-tuning.
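The abstract describes quantifying how the best-found model's performance varies with the number of fine-tuning trials. A minimal sketch of one way to compute such a curve from a pool of finished runs, assuming each run yields a single validation score and that a budget of n trials is sampled with replacement from that pool (this estimator and the function name `expected_max_score` are assumptions here, not a quotation of the paper's procedure): with sorted scores v_1 <= ... <= v_N, the probability that the maximum of n draws is at most v_i is (i/N)^n, so the expected maximum is a weighted sum of the sorted scores.

```python
from typing import Sequence


def expected_max_score(scores: Sequence[float], n: int) -> float:
    """Expected best score among n trials drawn with replacement from `scores`.

    Hypothetical helper: uses the empirical CDF, P(max <= v_i) = (i/N)^n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    total = len(ordered)
    expectation = 0.0
    for i, value in enumerate(ordered, start=1):
        # Probability mass that the maximum lands on this sorted position;
        # the telescoping differences also handle tied scores correctly.
        weight = (i / total) ** n - ((i - 1) / total) ** n
        expectation += value * weight
    return expectation
```

For example, `[expected_max_score(scores, n) for n in range(1, 21)]` traces the curve for budgets of 1 to 20 trials; with n = 1 the estimate reduces to the mean score, and it rises toward the best observed score as n grows.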

Preprint, 2020, arXiv

Observation of Dynamic Stark Resonances in Strong-Field Excitation

We investigate AC Stark-shifted resonances in argon with ultrashort near-infrared pulses. Using 30 fs pulses we observe periodic enhancements of the excitation yield in the intensity regions corresponding to the absorption of 13 and 14 photons. By reducing the pulse duration to 6 fs with only a few optical cycles, we also demonstrate that the enhancements are significantly reduced beyond what is measurable in the experiment. Comparing these to numerical predictions, which are in quantitative agreement with experimental results, we find that even though the quantum-state distribution can be broad, the enhancements are largely due to efficient population of a select few AC Stark-shifted resonant states rather than the closing of an ionization channel. Because these resonances are dependent on the frequency and intensity of the laser field, the broad bandwidth of the 6 fs pulses means that the resonance condition is fulfilled across a large range of intensities. This is further exaggerated by volume-averaging effects, resulting in excitation of the $5g$ state at almost all intensities and reducing the apparent magnitude of the enhancements. For 30 fs pulses, volume averaging also broadens the quantum state distribution but the enhancements are still large enough to survive. In this case, selectivity of excitation to a single state is reduced below 25% of the relative population. However, an analysis of TDSE simulations indicates that excitation of up to 60% into a single state is possible if volume averaging can be eliminated and the intensity can be precisely controlled.
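The abstract ties the resonances to the frequency and intensity of the laser field. A back-of-envelope sketch of that dependence, under a common approximation that is assumed here rather than taken from the abstract: high-lying excited states shift up by roughly the ponderomotive energy Up while the ground-state shift is neglected, so an n-photon resonance with a state of field-free excitation energy E occurs when n·ħω = E + Up(I), with Up[eV] ≈ 9.33e-14 · I[W/cm²] · λ[µm]² for linear polarization. The function names and the 15.0 eV state energy used below are hypothetical placeholders; the paper's TDSE simulations go well beyond this model.

```python
HC_EV_UM = 1.23984   # Planck constant times speed of light, in eV*um
UP_COEFF = 9.33e-14  # Up[eV] ~= UP_COEFF * I[W/cm^2] * wavelength[um]^2


def photon_energy_ev(wavelength_um: float) -> float:
    """Photon energy in eV for a wavelength given in micrometres."""
    return HC_EV_UM / wavelength_um


def ponderomotive_ev(intensity_w_cm2: float, wavelength_um: float) -> float:
    """Ponderomotive energy Up in eV for a linearly polarized field."""
    return UP_COEFF * intensity_w_cm2 * wavelength_um ** 2


def resonance_intensity(n_photons: int, state_energy_ev: float,
                        wavelength_um: float) -> float:
    """Peak intensity (W/cm^2) at which n photons match a Stark-shifted state.

    Assumes the excited state shifts up by Up and the ground state does not,
    so the condition is n * hbar*omega = E_state + Up(I).
    """
    detuning = n_photons * photon_energy_ev(wavelength_um) - state_energy_ev
    if detuning <= 0:
        raise ValueError("n photons do not reach the state even at zero intensity")
    return detuning / (UP_COEFF * wavelength_um ** 2)
```

Because each additional photon raises the required Up by one ħω, consecutive resonances for the same state are spaced by ħω / (9.33e-14 · λ²) in intensity, about 2.6e13 W/cm² at 0.8 µm, which is one way to see why the enhancements recur periodically as intensity increases.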

Preprint, 2020, arXiv

Promoting Graph Awareness in Linearized Graph-to-Text Generation

Generating text from structured inputs, such as meaning representations or RDF triples, has often involved the use of specialized graph-encoding neural networks. However, recent applications of pretrained transformers to linearizations of graph inputs have yielded state-of-the-art generation results on graph-to-text tasks. Here, we explore the ability of these linearized models to encode local graph structures, in particular their invariance to the graph linearization strategy and their ability to reconstruct corrupted inputs. Our findings motivate solutions to enrich the quality of models' implicit graph encodings via scaffolding. Namely, we use graph-denoising objectives implemented in a multi-task text-to-text framework. We find that these denoising scaffolds lead to substantial improvements in downstream generation in low-resource settings.
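The abstract probes whether linearized models are invariant to the linearization strategy and whether they can reconstruct corrupted inputs. A small sketch of both ingredients for RDF-style triples, using hypothetical marker tokens `<H>`, `<R>`, `<T>` and a hypothetical `<mask>` token (these names and the word-level masking scheme are illustrative assumptions, not the paper's exact format): one function linearizes triples in a canonical or shuffled order, and another builds a (corrupted input, original target) pair of the kind a graph-denoising objective could train on.

```python
import random
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]     # (head, relation, tail)
MARKERS = {"<H>", "<R>", "<T>"}   # hypothetical structure tokens


def linearize(triples: List[Triple], shuffle: bool = False,
              seed: Optional[int] = None) -> str:
    """Flatten triples into one string; shuffle=True varies the linearization order."""
    order = list(triples)
    if shuffle:
        random.Random(seed).shuffle(order)
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in order)


def denoising_pair(text: str, rate: float, seed: Optional[int] = None,
                   mask: str = "<mask>") -> Tuple[str, str]:
    """Mask non-marker tokens with probability `rate`; return (input, target)."""
    rng = random.Random(seed)
    corrupted = [
        mask if tok not in MARKERS and rng.random() < rate else tok
        for tok in text.split()
    ]
    return " ".join(corrupted), text
```

Comparing a model's outputs for `linearize(triples)` and `linearize(triples, shuffle=True, seed=k)` is one simple probe of order invariance; keeping the markers unmasked means the corrupted input still exposes the graph's skeleton.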

Preprint, 2015, arXiv

Sparse Overcomplete Word Vector Representations

Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable, the latter largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse (and optionally binary) vectors. The resulting representations are more similar to the interpretable features typically used in NLP, though they are discovered automatically from raw corpora. Because the vectors are highly sparse, they are computationally easy to work with. Most importantly, we find that they outperform the original vectors on benchmark tasks.
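The abstract describes transforming dense word vectors into sparse, optionally binary, vectors. As a minimal sketch of the sparse-coding step only, assuming an overcomplete dictionary `D` (more columns than the embedding dimension) is already given, the code for a vector x can be found with ISTA (iterative soft-thresholding) on 0.5·||x - D a||² + λ·||a||₁, with an optional non-negativity constraint; binarization then marks the active dimensions. ISTA, the fixed dictionary, and the function names are assumptions for illustration; a full method would also learn `D` from the corpus vectors.

```python
import numpy as np


def sparse_code(x: np.ndarray, D: np.ndarray, lam: float = 0.1,
                n_iter: int = 200, nonneg: bool = False) -> np.ndarray:
    """ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1 (optionally a >= 0)."""
    # Step 1/L, where L = largest eigenvalue of D^T D (squared spectral norm).
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))  # gradient step on the squared error
        if nonneg:
            # Prox of lam*||a||_1 plus the constraint a >= 0.
            a = np.maximum(z - step * lam, 0.0)
        else:
            # Soft-thresholding: prox of lam*||a||_1.
            a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return a


def binarize(a: np.ndarray) -> np.ndarray:
    """Map a code to a binary vector marking its positive dimensions."""
    return (a > 0).astype(np.int8)
```

In use, each row of an embedding matrix would be passed as `x` with a column-normalized `D`; raising `lam` trades reconstruction error for sparser codes.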
