Researcher profile

Samuel Rönnqvist

Samuel Rönnqvist contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
7works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2021arXiv

Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers

We explore cross-lingual transfer of register classification for web documents. Registers, that is, text varieties such as blogs or news are one of the primary predictors of linguistic variation and thus affect the automatic processing of language. We introduce two new register annotated corpora, FreCORE and SweCORE, for French and Swedish. We demonstrate that deep pre-trained language models perform strongly in these languages and outperform previous state-of-the-art in English and Finnish. Specifically, we show 1) that zero-shot cross-lingual transfer from the large English CORE corpus can match or surpass previously published monolingual models, and 2) that lightweight monolingual classification requiring very little training data can reach or surpass our zero-shot performance. We further analyse classification results finding that certain registers continue to pose challenges in particular for cross-lingual transfer.

preprint2015arXiv

Bank Networks from Text: Interrelations, Centrality and Determinants

In the wake of the still ongoing global financial crisis, bank interdependencies have come into focus in trying to assess linkages among banks and systemic risk. To date, such analysis has largely been based on numerical data. By contrast, this study attempts to gain further insight into bank interconnections by tapping into financial discourse. We present a text-to-network process, which has its basis in co-occurrences of bank names and can be analyzed quantitatively and visualized. To quantify bank importance, we propose an information centrality measure to rank and assess trends of bank centrality in discussion. For qualitative assessment of bank networks, we put forward a visual, interactive interface for better illustrating network structures. We illustrate the text-based approach on European Large and Complex Banking Groups (LCBGs) during the ongoing financial crisis by quantifying bank interrelations and centrality from discussion in 3M news articles, spanning 2007Q1 to 2014Q3.

preprint2015arXiv

Detect & Describe: Deep learning of bank stress in the news

News is a pertinent source of information on financial risks and stress factors, which nevertheless is challenging to harness due to the sparse and unstructured nature of natural text. We propose an approach based on distributional semantics and deep learning with neural networks to model and link text to a scarce set of bank distress events. Through unsupervised training, we learn semantic vector representations of news articles as predictors of distress events. The predictive model that we learn can signal coinciding stress with an aggregated index at bank or European level, while crucially allowing for automatic extraction of text descriptions of the events, based on passages with high stress levels. The method offers insight that models based on other types of data cannot provide, while offering a general means for interpreting this type of semantic-predictive model. We model bank distress with data on 243 events and 6.6M news articles for 101 large European banks.

preprint2015arXiv

Exploratory topic modeling with distributional semantics

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge about the content is required and highly open-ended tasks can be supported. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, the representation of the latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail. This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. From the granular level of terms and their semantic similarity relations global topic structures emerge as clustered regions and gradients of concepts. Moreover, the paper discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration.

preprint2014arXiv

Interactive Visual Exploration of Topic Models using Graphs

Probabilistic topic modeling is a popular and powerful family of tools for uncovering thematic structure in large sets of unstructured text documents. While much attention has been directed towards the modeling algorithms and their various extensions, comparatively few studies have concerned how to present or visualize topic models in meaningful ways. In this paper, we present a novel design that uses graphs to visually communicate topic structure and meaning. By connecting topic nodes via descriptive keyterms, the graph representation reveals topic similarities, topic meaning and shared, ambiguous keyterms. At the same time, the graph can be used for information retrieval purposes, to find documents by topic or topic subsets. To exemplify the utility of the design, we illustrate its use for organizing and exploring corpora of financial patents.

preprint2013arXiv

Cluster coloring of the Self-Organizing Map: An information visualization perspective

This paper takes an information visualization perspective to visual representations in the general SOM paradigm. This involves viewing SOM-based visualizations through the eyes of Bertin's and Tufte's theories on data graphics. The regular grid shape of the Self-Organizing Map (SOM), while being a virtue for linking visualizations to it, restricts representation of cluster structures. From the viewpoint of information visualization, this paper provides a general, yet simple, solution to projection-based coloring of the SOM that reveals structures. First, the proposed color space is easy to construct and customize to the purpose of use, while aiming at being perceptually correct and informative through two separable dimensions. Second, the coloring method is not dependent on any specific method of projection, but is rather modular to fit any objective function suitable for the task at hand. The cluster coloring is illustrated on two datasets: the iris data, and welfare and poverty indicators.

preprint2013arXiv

From Text to Bank Interrelation Maps

In the wake of the ongoing global financial crisis, interdependencies among banks have come into focus in trying to assess systemic risk. To date, such analysis has largely been based on numerical data. By contrast, this study attempts to gain further insight into bank interconnections by tapping into financial discussion. Co-mentions of bank names are turned into a network, which can be visualized and analyzed quantitatively, in order to illustrate characteristics of individual banks and the network as a whole. The approach allows for the study of temporal dynamics of the network, to highlight changing patterns of discussion that reflect real-world events, the current financial crisis in particular. For instance, it depicts how connections from distressed banks to other banks and supervisory authorities have emerged and faded over time, as well as how global shifts in network structure coincide with severe crisis episodes. The usage of textual data holds an additional advantage in the possibility of gaining a more qualitative understanding of an observed interrelation, through its context. We illustrate our approach using a case study on Finnish banks and financial institutions. The data set comprises 3.9M posts from online, financial and business-related discussion, during the years 2004 to 2012. Future research includes analyzing European news articles with a broader perspective, and a focus on improving semantic description of relations.