Source author record

Xavier Garcia

Xavier Garcia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Machine Learning, math.DS, Artificial Intelligence, math.PR

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators

preprint · 2022 · arXiv

Building Machine Translation Systems for the Next Thousand Languages

In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) Developing practical MT models for under-served languages by leveraging massively multilingual models trained with supervised parallel data for over 100 high-resource languages and monolingual datasets for an additional 1000+ languages; and (iii) Studying the limitations of evaluation metrics for these languages and conducting qualitative analysis of the outputs from our MT models, highlighting several frequent error modes of these types of models. We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.

preprint · 2022 · arXiv

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

Natural language understanding and generation models follow one of the two dominant architectural paradigms: language models (LMs) that process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec) that utilize separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, and few studies have investigated the performance of LMs. In this work, we thoroughly examine the role of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual and zero-shot translation tasks, under systematic variations of data conditions and model sizes. Our results show that: (i) Different LMs have different scaling properties, where architectural differences often have a significant impact on model performance at small scales, but the performance gap narrows as the number of parameters increases, (ii) Several design choices, including causal masking and language-modeling objectives for the source sequence, have detrimental effects on translation quality, and (iii) When paired with full-visible masking for source sequences, LMs could perform on par with EncDec on supervised bilingual and multilingual translation tasks, and improve greatly on zero-shot directions by facilitating the reduction of off-target translations.
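The masking distinction in finding (iii) can be illustrated with a minimal sketch. It assumes a single concatenated sequence of source tokens followed by target tokens; the function name `build_attention_mask` and its parameters are hypothetical and are not taken from the paper's code. Under causal masking every position attends only to itself and earlier positions. Under full-visible source masking, every position may additionally attend to the whole source, while target positions remain causal with respect to each other.

```python
def build_attention_mask(src_len, tgt_len, source_fully_visible):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions 0..src_len-1 hold the source; the rest hold the target.
    Hypothetical helper for illustration only.
    """
    total = src_len + tgt_len
    mask = []
    for i in range(total):
        row = []
        for j in range(total):
            allowed = j <= i  # causal rule: attend to self and the past
            if source_fully_visible and j < src_len:
                allowed = True  # every position sees the whole source
            row.append(allowed)
        mask.append(row)
    return mask


def render(mask):
    """Draw the mask as rows of 1 (visible) and 0 (masked)."""
    return "\n".join("".join("1" if v else "0" for v in row) for row in mask)


if __name__ == "__main__":
    # Two source tokens followed by two target tokens.
    print(render(build_attention_mask(2, 2, source_fully_visible=False)))
    print()
    print(render(build_attention_mask(2, 2, source_fully_visible=True)))
```

With two source and two target tokens, the two masks differ only in the first row: under full-visible source masking the first source token can also see the second one.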

preprint · 2022 · arXiv

Few-shot Controllable Style Transfer for Low-Resource Multilingual Settings

Style transfer is the task of rewriting a sentence into a target style while approximately preserving content. While most prior literature assumes access to a large style-labelled corpus, recent work (Riley et al. 2021) has attempted "few-shot" style transfer using only 3-10 sentences at inference for style extraction. In this work we study a relevant low-resource setting: style transfer for languages where no style-labelled corpora are available. We notice that existing few-shot methods perform this task poorly, often copying inputs verbatim. We push the state-of-the-art for few-shot style transfer with a new method modeling the stylistic difference between paraphrases. When compared to prior work, our model achieves 2-3x better performance in formality transfer and code-mixing addition across seven languages. Moreover, our method is better at controlling the style transfer magnitude using an input scalar knob. We report promising qualitative results for several attribute transfer tasks (sentiment transfer, simplification, gender neutralization, text anonymization) all without retraining the model. Finally, we find model evaluation to be difficult due to the lack of datasets and metrics for many languages. To facilitate future research we crowdsource formality annotations for 4000 sentence pairs in four Indic languages, and use this data to design our automatic evaluations.

preprint · 2022 · arXiv

Scaling Up Models and Data with t5x and seqio

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. t5x and seqio are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.

preprint · 2022 · arXiv

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Achieving universal translation between all human language pairs is the holy grail of machine translation (MT) research. While recent progress in massively multilingual MT is one step closer to reaching this goal, it is becoming evident that extending a multilingual MT system simply by training on more parallel data is unscalable, since the availability of labeled data for low-resource and non-English-centric language pairs is forbiddingly limited. To this end, we present a pragmatic approach towards building a multilingual MT model that covers hundreds of languages, using a mixture of supervised and self-supervised objectives, depending on the data availability for different language pairs. We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting, even surpassing supervised translation quality for low- and mid-resource languages. We conduct a wide array of experiments to understand the effect of the degree of multilingual supervision, domain mismatches and amounts of parallel and monolingual data on the quality of our self-supervised multilingual models. To demonstrate the scalability of the approach, we train models with over 200 languages and demonstrate high performance on zero-resource translation on several previously under-studied languages. We hope our findings will serve as a stepping stone towards enabling translation for the next thousand languages.

preprint · 2022 · arXiv

Using natural language prompts for machine translation

We explore the use of natural language prompts for controlling various aspects of the outputs generated by machine translation models. We demonstrate that natural language prompts allow us to influence properties like formality or specific dialect of the output. We show that using language names to control the output language of multilingual translation models enables positive transfer for unseen language pairs. This unlocks the ability to translate into languages not seen during fine-tuning by using their English names. We investigate how scale, number of pre-training steps, number of languages in fine-tuning, and language similarity affect this phenomenon.

preprint · 2020 · arXiv

Machine learning applied in the multi-scale 3D stress modelling

This paper proposes a methodology to estimate stress in the subsurface by a hybrid method combining finite element modeling and neural networks. This methodology exploits the idea of obtaining a multi-frequency solution in the numerical modeling of systems whose behavior involves a wide span of length scales. One low-frequency solution is obtained via inexpensive finite element modeling at a coarse scale. The second solution provides the fine-grained details introduced by the heterogeneity of the free parameters at the fine scale. This high-frequency solution is estimated via neural networks trained with partial solutions obtained in high-resolution finite-element models. When the coarse finite element solutions are combined with the neural network estimates, the results are within a 2% error of the results that would be computed with high-resolution finite element models. This paper discusses the benefits and drawbacks of the method and illustrates its applicability via a worked example.

preprint · 2014 · arXiv

Invariant Measures for Hybrid Stochastic Systems

In this paper, we seek to understand the behavior of dynamical systems that are perturbed by a parameter that changes discretely in time. If we impose certain conditions, we can study certain embedded systems within a hybrid system as time-homogeneous Markov processes. In particular, we prove the existence of invariant measures for each embedded system and relate the invariant measures for the various systems through the flow. We calculate these invariant measures explicitly in several illustrative examples.
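As a toy illustration of what an invariant measure is, the sketch below uses a two-state, time-homogeneous Markov chain, which is far simpler than the hybrid systems studied in the paper. The transition matrix `P` and the function names `step` and `stationary_distribution` are assumptions made for this example only. A probability vector pi is invariant when pi P = pi; here it is approximated by repeatedly applying P to an initial distribution.

```python
def step(dist, P):
    """Push a row distribution one step forward: returns dist @ P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(P, iterations=200):
    """Approximate pi with pi P = pi by power iteration (toy example)."""
    n = len(P)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

if __name__ == "__main__":
    pi = stationary_distribution(P)
    print(pi)
```

For this chain the balance equation 0.1 pi_0 = 0.5 pi_1 gives pi = (5/6, 1/6) exactly, and the iteration converges to that vector.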

preprint · 2014 · arXiv

Limit and Morse Sets for Deterministic Hybrid Systems

The term "hybrid system" refers to a continuous time dynamical system that undergoes Markovian perturbations at discrete time intervals. In this paper, we find that under the right formulation, a hybrid system can be treated as a dynamical system on a compact space. This allows one to study its limit sets. We examine the Morse decompositions of hybrid systems, find a sufficient condition for the existence of a non-trivial Morse decomposition, and study the Morse sets of such a decomposition. Finally, we consider the case in which the Markovian perturbations are small, showing that trajectories in a hybrid system with small perturbations behave similarly to those of the unperturbed dynamical system.

preprint · 2013 · arXiv

On Rationally Ergodic and Rationally Weakly Mixing Rank-One Transformations

We study the notions of weak rational ergodicity and rational weak mixing as defined by Jon Aaronson. We prove that various families of infinite measure-preserving rank-one transformations possess (or do not possess) these properties, and consider their relation to other notions of mixing in infinite measure.
