Source author record

Vitaly Nikolaev

Vitaly Nikolaev appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Machine Learning Artificial Intelligence physics.hist-ph quant-ph

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Aspects of Superdeterminism Made Intuitive

We attempt to make superdeterminism more intuitive, notably by simulating a deterministic model system, a billiard game. In this system an initial 'bang' correlates all events, just as in the superdeterministic universe. We introduce the notions of 'strong' and 'soft' superdeterminism, in order to clarify debates in the literature. Based on the analogy with billiards, we show that superdeterministic correlations may exist as a matter of principle, but be undetectable for all practical purposes. This allows us to counter classical objections to superdeterminism such as the claim that it would be at odds with the scientific method, and with the construction of new theories. Finally, we show that probability theory, as a physical theory, indicates that superdeterminism has a greater explanatory power than its competitors: it can coherently answer questions for which other positions remain powerless.

preprint2022arXiv

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each others work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.

preprint2022arXiv

Measuring Attribution in Natural Language Generation Models

With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output is only sharing verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models, when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline for allowing annotators to appropriately evaluate model output according to AIS guidelines. We empirically validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset) via human evaluation studies that suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release guidelines for the human evaluation studies.

preprint2022arXiv

SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets

NLP researchers need more, higher-quality text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web such as WikiBio are noisy and can include undesired biases. Moreover, data sourced from the web is often included in datasets used to pretrain models, leading to inadvertent cross-contamination of training and test sets. In this work we introduce a novel method for efficient dataset curation: we use a large language model to provide seed generations to human raters, thereby changing dataset authoring from a writing task to an editing task. We use our method to curate SynthBio - a new evaluation set for WikiBio - composed of structured attribute lists describing fictional individuals, mapped to natural language biographies. We show that our dataset of fictional biographies is less noisy than WikiBio, and also more balanced with respect to gender and nationality.

preprint2020arXiv

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TyDi QA---a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology---the set of linguistic features each language expresses---such that we expect models performing well on this set to generalize across a large number of the world's languages. We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer, but don't know the answer yet, and the data is collected directly in each language without the use of translation.

Vitaly Nikolaev

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Aspects of Superdeterminism Made Intuitive

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Measuring Attribution in Natural Language Generation Models

SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages