Source author record

Qijun Tan

Qijun Tan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computation and Language, Machine Learning, Artificial Intelligence, math.RT, nlin.CD

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint, 2022, arXiv

High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics

In Neural Machine Translation, it is typically assumed that the sentence with the highest estimated probability should also be the translation with the highest quality as measured by humans. In this work, we question this assumption and show that model estimates and translation quality only vaguely correlate. We apply Minimum Bayes Risk (MBR) decoding on unbiased samples to optimize diverse automated metrics of translation quality as an alternative inference strategy to beam search. Instead of targeting the hypotheses with the highest model probability, MBR decoding extracts the hypotheses with the highest estimated quality. Our experiments show that the combination of a neural translation model with a neural reference-based metric, BLEURT, results in significant improvement in human evaluations. This improvement is obtained with translations different from classical beam-search output: these translations have much lower model likelihood and are less favored by surface metrics like BLEU.
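The selection rule described in this abstract can be illustrated with a minimal sketch. It assumes the sampled translations serve as both candidates and pseudo-references, which the abstract does not spell out, and it uses a toy unigram-F1 utility as a stand-in for a neural metric such as BLEURT. The names `overlap_f1` and `mbr_decode` are hypothetical and do not come from the paper.

```python
from collections import Counter


def overlap_f1(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 between two strings (a stand-in, not BLEURT)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    common = sum((h & r).values())  # multiset intersection size
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(samples, utility=overlap_f1):
    """Return (best_sample, expected_utility).

    Each sample is scored by its average utility against all other
    samples, which act as pseudo-references (an assumption of this sketch).
    Ties are resolved in favor of the earliest sample.
    """
    if not samples:
        raise ValueError("need at least one sample")
    best, best_score = None, float("-inf")
    for i, cand in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        score = sum(utility(cand, o) for o in others) / len(others) if others else 0.0
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


samples = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on a mat",
    "dogs bark loudly",
]
best, score = mbr_decode(samples)
```

Note that the chosen hypothesis is the one most supported by the other samples under the utility, not the one with the highest model probability; swapping in a different `utility` changes which hypothesis wins.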

Preprint, 2022, arXiv

Toward More Effective Human Evaluation for Machine Translation

Improvements in text generation technologies such as machine translation have necessitated more costly and time-consuming human evaluation procedures to ensure an accurate signal. We investigate a simple way to reduce cost by reducing the number of text segments that must be annotated in order to accurately predict a score for a complete test set. Using a sampling approach, we demonstrate that information from document membership and automatic metrics can help improve estimates compared to a pure random sampling baseline. We achieve gains of up to 20% in average absolute error by leveraging stratified sampling and control variates. Our techniques can improve estimates made from a fixed annotation budget, are easy to implement, and can be applied to any problem with structure similar to the one we study.
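As a rough illustration of the control-variate idea mentioned in this abstract, the sketch below estimates the mean human score of a test set from a small annotated subset, using an automatic metric that is available for every segment. This is the textbook control-variate estimator under the assumption of one human score and one metric score per segment; it is not necessarily the exact estimator used in the paper, and `control_variate_estimate` is a hypothetical name.

```python
from statistics import mean


def control_variate_estimate(human_sample, metric_sample, metric_all):
    """Estimate the mean human score over the full test set.

    human_sample:  human scores for the annotated segments
    metric_sample: automatic metric scores for the same segments
    metric_all:    automatic metric scores for every segment in the test set
    """
    n = len(human_sample)
    if n < 2 or n != len(metric_sample):
        raise ValueError("need at least two paired annotated segments")
    h_bar, m_bar = mean(human_sample), mean(metric_sample)
    # Sample covariance and variance give the regression coefficient beta.
    cov = sum((h - h_bar) * (m - m_bar)
              for h, m in zip(human_sample, metric_sample)) / (n - 1)
    var = sum((m - m_bar) ** 2 for m in metric_sample) / (n - 1)
    beta = cov / var if var > 0 else 0.0
    # Correct the sample mean by how far the sampled metric mean
    # deviates from the known full-set metric mean.
    return h_bar - beta * (m_bar - mean(metric_all))


# Illustrative data only: human score is exactly 2 * metric + 1.
metric_all = [1, 2, 3, 4, 5, 6]
metric_sample = [1, 2, 6]
human_sample = [3, 5, 13]
estimate = control_variate_estimate(human_sample, metric_sample, metric_all)
```

When the metric correlates strongly with human scores, the correction term removes much of the sampling noise; when the correlation is zero, `beta` is near zero and the estimate falls back to the plain sample mean.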

Preprint, 2021, arXiv

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

Human evaluation of modern high-quality machine translation systems is a difficult problem, and there is increasing evidence that inadequate evaluation procedures can lead to erroneous conclusions. While there has been considerable research on human evaluation, the field still lacks a commonly-accepted standard procedure. As a step toward this goal, we propose an evaluation methodology grounded in explicit error analysis, based on the Multidimensional Quality Metrics (MQM) framework. We carry out the largest MQM research study to date, scoring the outputs of top systems from the WMT 2020 shared task in two language pairs using annotations provided by professional translators with access to full document context. We analyze the resulting data extensively, finding among other results a substantially different ranking of evaluated systems from the one established by the WMT crowd workers, exhibiting a clear preference for human over machine output. Surprisingly, we also find that automatic metrics based on pre-trained embeddings can outperform human crowd workers. We make our corpus publicly available for further research.

Preprint, 2016, arXiv

Mackey analogy via $\mathcal{D}$-modules in the example of $SL(2,\mathbb{R})$

A conjecture by Mackey and Higson claims that there is a close relationship between irreducible representations of a real reductive group and those of its Cartan motion group. The case of irreducible tempered unitary representations has been verified recently by Afgoustidis. We study the admissible representations of $SL(2,\mathbb{R})$ by considering families of $\mathcal{D}$-modules over its flag varieties. We make a conjecture which gives a geometric understanding of the Mackey-Higson bijection in the general case.

Preprint, 2012, arXiv

Potential Function in a Continuous Dissipative Chaotic System: Decomposition Scheme and Role of Strange Attractor

In this paper, we demonstrate, for the first time in the literature known to us, that potential functions can be constructed in continuous dissipative chaotic systems and can be used to reveal their dynamical properties. To this end, a Lorenz-like system is proposed and rigorously proved to be chaotic, serving as an example for analysis. We explicitly construct a potential function that decreases monotonically along the system's dynamics, revealing the structure of the chaotic strange attractor. The potential function can be constructed in different forms. We also decompose the dynamical system to explain the different origins of the chaotic attractor and the strange attractor. Consequently, the reasons for the existence of both chaotic nonstrange attractors and nonchaotic strange attractors are clearly discussed within the current decomposition framework.