Source author record

Rafael E. Banchs

Rafael E. Banchs appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · cs.CY · Information Retrieval · Machine Learning

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

preprint · 2014 · arXiv

Evaluating Indirect Strategies for Chinese-Spanish Statistical Machine Translation

Although Chinese and Spanish are two of the most widely spoken languages in the world, little research has been done on machine translation for this language pair. This paper investigates the state of the art in Chinese-to-Spanish statistical machine translation (SMT), which is currently one of the most popular approaches to machine translation. To this end, we describe the available parallel corpora: the Basic Traveller Expressions Corpus (BTEC), the Holy Bible and the United Nations (UN) corpus. Additionally, we run experiments on the largest of these three corpora to explore alternative SMT strategies that use a pivot language. Three pivoting alternatives are considered: cascading, pseudo-corpus and triangulation. As the pivot language, we use English, Arabic or French. Results show that, for a phrase-based SMT system, English is the best pivot language between Chinese and Spanish. We propose a system output combination using the pivot strategies that is capable of outperforming the direct translation strategy. The main objective of this work is to motivate and involve the research community in working on this important language pair, given its demographic impact.
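The abstract names triangulation without describing how it is carried out. One common formulation merges a source-to-pivot phrase table with a pivot-to-target phrase table by summing over shared pivot phrases. The sketch below illustrates that idea only; the function name `triangulate`, the dictionary layout and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Merge two phrase tables through a shared pivot language.

    Each table maps a phrase to {translation: probability}.
    The result approximates p(tgt | src) as the sum over pivot
    phrases of p(pivot | src) * p(tgt | pivot).
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_src_pivot in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_pivot_tgt in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_src_pivot * p_pivot_tgt
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


# Toy, made-up tables: Chinese -> English and English -> Spanish.
zh_en = {"你好": {"hello": 0.7, "hi": 0.3}}
en_es = {"hello": {"hola": 0.9, "buenos días": 0.1}, "hi": {"hola": 1.0}}

zh_es = triangulate(zh_en, en_es)
for target, prob in sorted(zh_es["你好"].items(), key=lambda kv: -kv[1]):
    print(f"{target}\t{prob:.2f}")
```

By contrast, cascading would translate each sentence into the pivot language and then translate that output into the target language, while the pseudo-corpus approach would train a direct system on data whose pivot side has been machine-translated.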

preprint · 2014 · arXiv

Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities

We present a comprehensive study on the use of autoencoders for modelling text data in which, unlike previous studies, we focus on the following issues: i) we explore the suitability of two different models, bDA and rsDA, for constructing deep autoencoders for text data at the sentence level; ii) we propose and evaluate two novel metrics for better assessing the text-reconstruction capabilities of autoencoders; and iii) we propose an automatic method to find the critical bottleneck dimensionality for text language representations (below which structural information is lost).

preprint · 2010 · arXiv

Emotional Reactions and the Pulse of Public Opinion: Measuring the Impact of Political Events on the Sentiment of Online Discussions

This paper analyses changes in public opinion by tracking political discussions in which people voluntarily engage online. Unlike polls or surveys, our approach does not elicit opinions but approximates what the public thinks by analysing the discussions in which they decide to take part. We measure the emotional content of online discussions in three dimensions (valence, arousal and dominance), paying special attention to deviation around average values, which we use as a proxy for disagreement and polarisation. We show that this measurement of public opinion helps predict presidential approval rates, suggesting that there is a point of connection between online discussions (often deemed not representative of the overall population) and offline polls. We also show that this measurement provides a deeper understanding of the individual mechanisms that drive aggregated shifts in public opinion. Our data spans a period that includes two US presidential elections, the attacks of September 11, and the start of military action in Afghanistan and Iraq.
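The idea of using deviation around the average as a proxy for disagreement can be illustrated with a small sketch. It assumes each comment already carries a numeric valence score; the function name `summarize`, the scoring scale and the sample values are hypothetical and are not drawn from the paper's data.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Return the average score and its spread for one discussion.

    The spread (population standard deviation) stands in for
    disagreement: the wider the scores, the more polarised the thread.
    """
    return {"mean": mean(scores), "spread": pstdev(scores)}


# Made-up valence scores for two discussions with the same average.
consensual = summarize([5, 5, 6, 5])
polarised = summarize([1, 9, 2, 9])

print(consensual)
print(polarised)
```

Both discussions have the same mean valence, so the average alone cannot tell them apart; only the spread separates the consensual thread from the polarised one. The same summary could be computed for arousal and dominance.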