Source author record

Vinith Misra

Vinith Misra appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Information Theory, math.IT, Computation and Language

Catalog footprint

What is connected

4 works

3 topics

4 close collaborators

preprint, 2020, arXiv

Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation

Black-box machine translation systems have proven incredibly useful for a variety of applications, yet by design they are hard to adapt, tune to a specific domain, or build on top of. In this work, we introduce a method to improve such systems via automatic pre-processing (APP) using sentence simplification. We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system. This corpus is used to train a paraphrase model that "simplifies" the original sentence to be more conducive to translation. The model is used to preprocess source sentences of multiple low-resource language pairs. We show that this preprocessing leads to better translation performance than translating the non-preprocessed source sentences. We further perform side-by-side human evaluation to verify that translations of the simplified sentences are better than those of the original ones. Finally, we provide some guidance on recommended language pairs for generating the simplification model corpora by investigating the relationship between the ease of translation of a language pair (as measured by BLEU) and the quality of the resulting simplification model trained on back-translations of this language pair (as measured by SARI), and we tie this into the downstream task of low-resource translation.
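The corpus-generation step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the `Translator` signature, the `build_paraphrase_pairs` helper, the pivot language, the rule that drops unchanged round trips, and the `toy_translate` stand-in for a real MT service are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical interface for a black-box MT system:
# (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def build_paraphrase_pairs(
    sentences: List[str],
    translate: Translator,
    src: str = "en",
    pivot: str = "de",
) -> List[Tuple[str, str]]:
    """Round-trip each sentence through a pivot language and keep the
    (original, back-translation) pairs as paraphrase training data."""
    pairs = []
    for original in sentences:
        pivot_text = translate(original, src, pivot)
        back = translate(pivot_text, pivot, src)
        # Assumption: identical round trips carry no paraphrase signal, so drop them.
        if back.strip() and back.strip() != original.strip():
            pairs.append((original, back))
    return pairs


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Toy word-substitution 'translator' used only so the sketch runs offline."""
    table = {
        ("en", "de"): {"utilize": "benutzen"},
        ("de", "en"): {"benutzen": "use"},
    }
    mapping = table.get((source_lang, target_lang), {})
    return " ".join(mapping.get(word, word) for word in text.split())


pairs = build_paraphrase_pairs(["we utilize tools", "hello world"], toy_translate)
print(pairs)
```

In a real setting, `toy_translate` would be replaced by calls to whichever black-box MT service is being improved, and the resulting pairs would feed the training of the simplification model.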

preprint, 2012, arXiv

Distributed Functional Scalar Quantization Simplified

Distributed functional scalar quantization (DFSQ) theory provides optimality conditions and predicts performance of data acquisition systems in which a computation on acquired data is desired. We address two limitations of previous works: prohibitively expensive decoder design and a restriction to sources with bounded distributions. We rigorously show that a much simpler decoder has asymptotic performance equivalent to that of the conditional expectation estimator previously explored, thus reducing decoder design complexity. The simpler decoder has the feature of decoupled communication and computation blocks. Moreover, we extend the DFSQ framework with the simpler decoder to acquire sources with infinite-support distributions such as Gaussian or exponential distributions. Finally, through simulation results we demonstrate that performance at moderate coding rates is well predicted by the asymptotic analysis, and we give new insight into the rate of convergence.
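One way to picture "decoupled communication and computation blocks" is a decoder that first reconstructs each source from its quantization index and only then applies the function. The sketch below illustrates that reading under simplifying assumptions: sources live on [0, 1), the quantizers are uniform, and the function `g` is a product. All names are hypothetical, and none of this reproduces the paper's quantizer design.

```python
from typing import Callable, List, Sequence


def uniform_encode(x: float, levels: int) -> int:
    """Communication block, encoder side: index of the uniform cell containing x in [0, 1)."""
    idx = int(x * levels)
    return min(max(idx, 0), levels - 1)  # clamp values outside [0, 1)


def uniform_decode(index: int, levels: int) -> float:
    """Communication block, decoder side: midpoint of the indexed cell."""
    return (index + 0.5) / levels


def decoupled_estimate(
    indices: Sequence[int],
    levels: int,
    g: Callable[[List[float]], float],
) -> float:
    """Computation block: apply g to the per-source reconstructions."""
    reconstructions = [uniform_decode(i, levels) for i in indices]
    return g(reconstructions)


# Two sources, each quantized separately with 4 levels.
samples = [0.37, 0.80]
indices = [uniform_encode(x, 4) for x in samples]
estimate = decoupled_estimate(indices, 4, lambda v: v[0] * v[1])
print(indices, estimate)
```

A conditional expectation estimator would instead compute the expected value of `g` given the cell indices, which requires the joint source distribution at design time; the plug-in form above needs only the reconstruction points.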

preprint, 2012, arXiv

The Porosity of Additive Noise Sequences

Consider a binary additive noise channel with noiseless feedback. When the noise is a stationary and ergodic process $\mathbf{Z}$, the capacity is $1-\mathbb{H}(\mathbf{Z})$, where $\mathbb{H}(\cdot)$ denotes the entropy rate. It is shown analogously that when the noise is a deterministic sequence $z^\infty$, the capacity under finite-state encoding and decoding is $1-\bar{\rho}(z^\infty)$, where $\bar{\rho}(\cdot)$ is Lempel and Ziv's finite-state compressibility. This quantity is termed the porosity $\underline{\sigma}(\cdot)$ of an individual noise sequence. A sequence of schemes is presented that universally achieve porosity for any noise sequence. These converse and achievability results may be interpreted both as a channel-coding counterpart to Ziv and Lempel's work in universal source coding and as an extension of the work by Lomnitz and Feder and by Shayevitz and Feder on communication across modulo-additive channels. Additionally, a slightly more practical architecture is suggested that draws a connection with finite-state predictability, as introduced by Feder, Gutman, and Merhav.
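To get a rough numerical feel for the quantity $1-\bar{\rho}(z^\infty)$, one can estimate compressibility on a finite prefix with LZ78 incremental parsing, using the familiar $c \log_2 c / n$ proxy, where $c$ is the number of parsed phrases and $n$ the prefix length. The sketch below is a finite-length heuristic only: the true definition involves limits over sequence length and over finite-state machines, and the function names and the clipping to [0, 1] are assumptions made for this example.

```python
import math


def lz78_phrase_count(bits: str) -> int:
    """Count phrases in the LZ78 incremental parsing of a binary string."""
    seen = set()
    phrase = ""
    count = 0
    for b in bits:
        phrase += b
        if phrase not in seen:  # shortest prefix not seen before ends a phrase
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:  # trailing, possibly repeated, incomplete phrase
        count += 1
    return count


def compressibility_estimate(bits: str) -> float:
    """Finite-length proxy c*log2(c)/n, clipped to [0, 1]."""
    n = len(bits)
    if n == 0:
        return 0.0
    c = lz78_phrase_count(bits)
    return min(1.0, c * math.log2(c) / n)


def porosity_estimate(bits: str) -> float:
    """Proxy for 1 - compressibility of the noise prefix."""
    return 1.0 - compressibility_estimate(bits)


# An all-zero noise prefix is highly compressible, so the proxy is close to 1.
print(porosity_estimate("0" * 5050))
```

For short prefixes the proxy is crude, which is why the example uses a prefix of several thousand symbols.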

preprint, 2011, arXiv

Distributed Scalar Quantization for Computing: High-Resolution Analysis and Extensions

Communication of quantized information is frequently followed by a computation. We consider situations of distributed functional scalar quantization: distributed scalar quantization of (possibly correlated) sources followed by centralized computation of a function. Under smoothness conditions on the sources and function, companding scalar quantizer designs are developed to minimize mean-squared error (MSE) of the computed function as the quantizer resolution is allowed to grow. Striking improvements over quantizers designed without consideration of the function are possible and are larger in the entropy-constrained setting than in the fixed-rate setting. As extensions to the basic analysis, we characterize a large class of functions for which regular quantization suffices, consider certain functions for which asymptotic optimality is achieved without arbitrarily fine quantization, and allow limited collaboration between source encoders. In the entropy-constrained setting, a single bit per sample communicated between encoders can have an arbitrarily large effect on functional distortion. In contrast, such communication has very little effect in the fixed-rate setting.
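For readers unfamiliar with companding, the general structure is a monotone compressor, a uniform quantizer, and the inverse expander. The sketch below shows only that structure on [0, 1]; the square-root compressor is an arbitrary choice for illustration and is not the function-dependent design derived in the paper, and the helper names are hypothetical.

```python
import math
from typing import Callable, List


def compand_quantize(
    x: float,
    levels: int,
    compress: Callable[[float], float],
    expand: Callable[[float], float],
) -> float:
    """Quantize x in [0, 1]: compress, quantize uniformly, then expand."""
    y = compress(x)
    idx = min(int(y * levels), levels - 1)  # uniform cell index in the compressed domain
    return expand((idx + 0.5) / levels)     # cell midpoint mapped back to the source domain


def reconstruction_points(levels: int, expand: Callable[[float], float]) -> List[float]:
    """All output values the companding quantizer can produce."""
    return [expand((i + 0.5) / levels) for i in range(levels)]


def square(y: float) -> float:
    return y * y


# With a square-root compressor, reconstruction points cluster near 0.
print(reconstruction_points(4, square))
print(compand_quantize(0.25, 4, math.sqrt, square))
```

Changing the compressor changes where the quantizer spends its resolution, which is the degree of freedom a function-aware design would tune.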