Source author record

Erik Skau

Erik Skau appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Machine Learning; Digital Libraries; Distributed, Parallel, and Cluster Computing; econ.GN; Information Retrieval; math.NA; Numerical Analysis; q-fin.EC

Catalog footprint

What is connected

4 works

8 topics

4 close collaborators

Preprint, 2022, arXiv

Nonnegative Canonical Tensor Decomposition with Linear Constraints: nnCANDELINC

There is an emerging interest in tensor factorization applications in big-data analytics and machine learning. To speed up the factorization of extra-large datasets organized as multidimensional arrays (aka tensors), easy-to-compute compression-based tensor representations, such as the Tucker and Tensor Train formats, are used to approximate the initial large tensor. Further, tensor factorization is used to extract latent features that can facilitate discoveries of new mechanisms and signatures hidden in the data, where the explainability of the latent features is of principal importance. Nonnegative tensor factorization extracts latent features that are naturally sparse and represent parts of the data, which makes them easily interpretable. However, to take into account available domain knowledge and subject matter expertise, additional constraints often need to be imposed, which leads us to canonical decomposition with linear constraints (CANDELINC), a Canonical Polyadic Decomposition with rank-deficient factors. In CANDELINC, Tucker compression is used as a pre-processing step, which leads to a larger residual error but to more explainable latent features. Here, we propose a nonnegative CANDELINC (nnCANDELINC) accomplished via a specific nonnegative Tucker decomposition, which we refer to as minimal or canonical nonnegative Tucker. We derive several results required to understand the specificity of nnCANDELINC, focusing on the difficulties of preserving the nonnegative rank of a tensor in its Tucker core and comparing the real-valued to the nonnegative case. Finally, we demonstrate nnCANDELINC performance on synthetic and real-world examples.

Preprint, 2021, arXiv

Topic Analysis of Superconductivity Literature by Semantic Non-negative Matrix Factorization

We utilize a recently developed topic modeling method called SeNMFk, which extends standard Non-negative Matrix Factorization (NMF) methods by incorporating the semantic structure of the text and adding a robust system for determining the number of topics. With SeNMFk, we were able to extract coherent topics validated by human experts. Of these topics, a few are relatively general and cover broad concepts, while the majority can be precisely mapped to specific scientific effects or measurement techniques. The topics also differ in ubiquity, with only three topics prevalent in almost 40 percent of the abstracts, while each specific topic tends to dominate a small subset of the abstracts. These results demonstrate the ability of SeNMFk to produce a layered and nuanced analysis of large scientific corpora.
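For orientation, the plain NMF baseline that SeNMFk extends can be sketched in a few lines. This is not the SeNMFk method: it omits the semantic structure and the automatic choice of the number of topics, and uses the classical multiplicative-update rule on a small, made-up term-document matrix. The function names (`nmf`, `frobenius_error`) and the toy data are illustrative assumptions, and the code is pure Python only to stay dependency-free.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def nmf(X, k, iters=200, eps=1e-9, seed=0):
    """Approximate nonnegative X (m x n) by W (m x k) times H (k x n), with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update: H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # Multiplicative update: W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H

def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    R = matmul(W, H)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr)) ** 0.5

# Hypothetical toy data: rows are terms, columns are documents, two disjoint "topics".
X = [[1, 2, 0, 0],
     [2, 4, 0, 0],
     [0, 0, 1, 2],
     [0, 0, 3, 6]]
W, H = nmf(X, k=2)
print("residual:", frobenius_error(X, W, H))
```

In this reading, each column of `W` is a topic (a nonnegative weighting over terms) and each column of `H` gives the topic mixture of one document. The number of topics `k` is fixed by hand here, which is exactly the choice the abstract says SeNMFk automates.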

Preprint, 2020, arXiv

Determination of Latent Dimensionality in International Trade Flow

Currently, high-dimensional data is ubiquitous in data science, which necessitates the development of techniques to decompose and interpret such multidimensional (aka tensor) datasets. Finding a low-dimensional representation of the data, that is, its inherent structure, is one approach to understanding the dynamics of the low-dimensional latent features hidden in the data. Nonnegative RESCAL is one such technique, particularly well suited to analyzing self-relational data, such as the dynamic networks found in international trade flows. Nonnegative RESCAL computes a low-dimensional tensor representation by finding the latent space containing multiple modalities. Estimating the dimensionality of this latent space is crucial for extracting meaningful latent features. Here, to determine the dimensionality of the latent space with nonnegative RESCAL, we propose a latent dimension determination method based on clustering the solutions of multiple realizations of nonnegative RESCAL decompositions. We demonstrate the performance of our model selection method on synthetic data, then apply it to decompose a network of international trade flow data from the International Monetary Fund and validate the resulting features against empirical facts from the economic literature.

Preprint, 2020, arXiv

Distributed Non-Negative Tensor Train Decomposition

The era of exascale computing opens new avenues for innovations and discoveries in many scientific, engineering, and commercial fields. However, with the exaflops also come the extra-large high-dimensional data generated by high-performance computing. High-dimensional data is represented as multidimensional arrays, aka tensors. The presence of latent (not directly observable) structures in the tensor allows a unique representation and compression of the data by classical tensor factorization techniques. However, the classical tensor methods are not always stable, or they can be exponential in their memory requirements, which makes them unsuitable for high-dimensional tensors. Tensor train (TT) is a state-of-the-art tensor network introduced for the factorization of high-dimensional tensors. TT transforms the initial high-dimensional tensor into a network of three-dimensional tensors that requires only linear storage. Many real-world data, such as density, temperature, population, and probability, are non-negative, and for easy interpretation, algorithms that preserve non-negativity are preferred. Here, we introduce a distributed non-negative tensor train and demonstrate its scalability and compression on synthetic and real-world big datasets.
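The storage claim can be made concrete with a small count of entries. The sketch below assumes the usual TT layout, in which mode k is carried by a three-dimensional core of shape (r_{k-1}, n_k, r_k) with boundary ranks r_0 = r_d = 1, and it assumes the internal ranks stay bounded as the number of modes grows. The function names and the example sizes are illustrative, not taken from the paper.

```python
from math import prod

def full_storage(shape):
    """Number of entries in the dense tensor: the product of all mode sizes."""
    return prod(shape)

def tt_storage(shape, ranks):
    """Total entries in TT cores of shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1."""
    if len(ranks) != len(shape) - 1:
        raise ValueError("need one internal rank per adjacent pair of modes")
    r = [1, *ranks, 1]
    return sum(r[k] * n * r[k + 1] for k, n in enumerate(shape))

# Hypothetical sizes: every mode has length 10 and every internal TT rank is 5.
for d in (3, 6, 12):
    shape = (10,) * d
    ranks = (5,) * (d - 1)
    print(d, full_storage(shape), tt_storage(shape, ranks))
```

With fixed mode size and fixed ranks, each additional mode adds one more core of constant size, so the TT count grows linearly in the number of modes while the dense count grows exponentially. If the ranks themselves grow with the data, the saving is correspondingly smaller.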
