Researcher profile

Erik Skau

Erik Skau contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified; Verification L1; Unclaimed author
4 works
0 followers
8 topics
4 close collaborators

Published work

4 published items

Preprint, 2022, arXiv

Nonnegative Canonical Tensor Decomposition with Linear Constraints: nnCANDELINC

There is an emerging interest in tensor factorization applications in big-data analytics and machine learning. To speed up the factorization of extra-large datasets organized in multidimensional arrays (aka tensors), easy-to-compute compression-based tensor representations, such as the Tucker and Tensor Train formats, are used to approximate the initial large tensor. Further, tensor factorization is used to extract latent features that can facilitate discoveries of new mechanisms and signatures hidden in the data, where the explainability of the latent features is of principal importance. Nonnegative tensor factorization extracts latent features that are naturally sparse and correspond to parts of the data, which makes them easily interpretable. However, to take into account available domain knowledge and subject matter expertise, additional constraints often need to be imposed, which leads us to Canonical decomposition with linear constraints (CANDELINC), a Canonical Polyadic Decomposition with rank deficient factors. In CANDELINC, Tucker compression is used as a pre-processing step, which leads to a larger residual error but to more explainable latent features. Here, we propose a nonnegative CANDELINC (nnCANDELINC) accomplished via a specific nonnegative Tucker decomposition, which we refer to as minimal or canonical nonnegative Tucker. We derive several results required to understand the specificity of nnCANDELINC, focusing on the difficulties of preserving the nonnegative rank to its Tucker core and comparing the real-valued to the nonnegative case. Finally, we demonstrate nnCANDELINC performance on synthetic and real-world examples.

Preprint, 2021, arXiv

Topic Analysis of Superconductivity Literature by Semantic Non-negative Matrix Factorization

We utilize a recently developed topic modeling method called SeNMFk, which extends the standard Non-negative Matrix Factorization (NMF) methods by incorporating the semantic structure of the text and adding a robust system for determining the number of topics. With SeNMFk, we were able to extract coherent topics validated by human experts. Of these topics, a few are relatively general and cover broad concepts, while the majority can be precisely mapped to specific scientific effects or measurement techniques. The topics also differ in ubiquity, with only three topics prevalent in almost 40 percent of the abstracts, while each specific topic tends to dominate a small subset of the abstracts. These results demonstrate the ability of SeNMFk to produce a layered and nuanced analysis of large scientific corpora.

Preprint, 2020, arXiv

Determination of Latent Dimensionality in International Trade Flow

Currently, high-dimensional data is ubiquitous in data science, which necessitates the development of techniques to decompose and interpret such multidimensional (aka tensor) datasets. Finding a low dimensional representation of the data, that is, its inherent structure, is one of the approaches that can serve to understand the dynamics of low dimensional latent features hidden in the data. Nonnegative RESCAL is one such technique, particularly well suited to analyzing self-relational data, such as the dynamic networks found in international trade flows. Nonnegative RESCAL computes a low dimensional tensor representation by finding the latent space containing multiple modalities. Estimating the dimensionality of this latent space is crucial for extracting meaningful latent features. Here, to determine the dimensionality of the latent space with nonnegative RESCAL, we propose a latent dimension determination method based on clustering the solutions of multiple realizations of nonnegative RESCAL decompositions. We demonstrate the performance of our model selection method on synthetic data, then apply it to decompose a network of international trade flow data from the International Monetary Fund and validate the resulting features against empirical facts from the economic literature.

Preprint, 2020, arXiv

Distributed Non-Negative Tensor Train Decomposition

The era of exascale computing opens new avenues for innovations and discoveries in many scientific, engineering, and commercial fields. However, with the exaflops also come the extra-large high-dimensional data generated by high-performance computing. High-dimensional data is represented as multidimensional arrays, aka tensors. The presence of latent (not directly observable) structures in the tensor allows a unique representation and compression of the data by classical tensor factorization techniques. However, the classical tensor methods are not always stable, or they can be exponential in their memory requirements, which makes them unsuitable for high-dimensional tensors. Tensor train (TT) is a state-of-the-art tensor network introduced for factorization of high-dimensional tensors. TT transforms the initial high-dimensional tensor into a network of three-dimensional tensors that requires only linear storage. Many real-world data, such as density, temperature, population, and probability, are non-negative, and for easy interpretation, algorithms that preserve non-negativity are preferred. Here, we introduce a distributed non-negative tensor train and demonstrate its scalability and compression on synthetic and real-world big datasets.
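
The storage claim in this abstract can be illustrated with a small entry count. The sketch below is not taken from the paper: the function names `full_storage` and `tt_storage`, the example shape, and the TT ranks are all hypothetical, and it assumes the TT ranks stay bounded as the number of dimensions grows, which is the condition under which TT storage is linear in the number of dimensions.

```python
from math import prod


def full_storage(shape):
    """Number of entries in the dense tensor: the product of all mode sizes."""
    return prod(shape)


def tt_storage(shape, ranks):
    """Number of entries across all TT cores.

    Core k is a three-dimensional array of size r[k] x n_k x r[k+1].
    `ranks` holds the internal TT ranks (one fewer than the number of
    modes); the two boundary ranks are fixed at 1.
    """
    if len(ranks) != len(shape) - 1:
        raise ValueError("expected len(shape) - 1 internal ranks")
    r = [1, *ranks, 1]
    return sum(r[k] * n * r[k + 1] for k, n in enumerate(shape))


# Hypothetical example: 6 modes of size 10, all internal TT ranks equal to 5.
shape = (10,) * 6
ranks = (5,) * 5

print(full_storage(shape))       # 1000000 entries in the dense tensor
print(tt_storage(shape, ranks))  # 1100 entries across the six TT cores
```

With fixed mode size and rank, each additional mode adds one more core of constant size to the TT count, while the dense count is multiplied by the mode size.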