Source author record

Peter Filzmoser

Peter Filzmoser appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Methodology; Applications; Computation; Machine Learning; Distributed, Parallel, and Cluster Computing; math.ST; Networking and Internet Architecture; physics.soc-ph; stat.OT; Statistics Theory

Catalog footprint

9 works

10 topics

4 close collaborators

Preprint, 2022, arXiv

Compositional Cubes: A New Concept for Multi-factorial Compositions

Compositional data are commonly understood as multivariate observations carrying relative information. Although vector compositions and even two-factorial compositional data (compositional tables) are already well described in the literature, a comprehensive approach to the analysis of multi-factorial relative-valued data is still needed. This contribution therefore builds on current knowledge about compositional data to develop a general theory for working with k-factorial compositional data. The main finding is that, as with compositional tables, multi-factorial structures can be orthogonally decomposed into an independent part and several interactive parts; moreover, a coordinate representation can be constructed that allows these parts to be analysed separately by standard analytical methods. For simplicity, these features are explained in detail for three-factorial compositions (compositional cubes), followed by an outline of the general case. The three-dimensional structure is analysed in depth in two practical examples dealing with systems of spatially dependent and time-dependent compositional cubes. The methodology is implemented in the R package robCompositions.

Preprint, 2022, arXiv

Extending compositional data analysis from a graph signal processing perspective

Traditional methods for the analysis of compositional data consider the log-ratios between all pairs of variables with equal weight, typically in the form of aggregated contributions. This is not meaningful in contexts where a relationship is known to exist only between very specific variables (e.g., for metabolomic pathways), while for other pairs no relationship exists. The absence or presence of relationships is modeled in graph theory, where the vertices represent the variables and the connections represent relations. This paper links compositional data analysis with graph signal processing, and it extends the Aitchison geometry to a setting where only selected log-ratios are considered. The presented framework retains the desirable properties of scale invariance and compositional coherence. An additional extension to include absolute information is readily made. Examples from bioinformatics and geochemistry underline the usefulness of this approach in comparison to standard methods for compositional data analysis.
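As a rough illustration of the two ideas named in this abstract, the sketch below computes the standard centered log-ratio (clr) transform, which aggregates all pairwise log-ratios with equal weight, and a restricted set of log-ratios taken only along the edges of a graph. It is not the paper's framework: the function names and the `edges` list are hypothetical, chosen only to show that log-ratios are unchanged when a composition is rescaled.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a strictly positive composition."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def selected_logratios(x, edges):
    """Log-ratios log(x_i / x_j) for only the variable pairs listed in edges."""
    logx = np.log(np.asarray(x, dtype=float))
    return np.array([logx[i] - logx[j] for i, j in edges])

# Hypothetical three-part composition and graph (assumed for illustration).
x = np.array([0.2, 0.3, 0.5])
edges = [(0, 1), (1, 2)]  # only these pairs are treated as related

# Rescaling the composition leaves both representations unchanged.
print(np.allclose(clr(x), clr(10 * x)))
print(np.allclose(selected_logratios(x, edges), selected_logratios(10 * x, edges)))
```

Both checks should print `True`, since multiplying every part by the same constant cancels in each ratio.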

Preprint, 2022, arXiv

Identifying the root cause of cable network problems with machine learning

Good-quality network connectivity is ever more important. For hybrid fiber coaxial (HFC) networks, searching for upstream high noise has in the past been cumbersome and time-consuming. Even with machine learning, the task remains challenging because of the heterogeneity of the network and its topological structure. We present the automation of a simple business rule (largest change of a specific value), compare its performance with state-of-the-art machine-learning methods, and conclude that the precision@1 can be improved by a factor of 2.3. Because it is best when a fault does not occur in the first place, we also evaluate multiple approaches to forecasting network faults, which would allow predictive maintenance on the network.
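The evaluation metric mentioned here, precision@1, is the fraction of cases in which the top-ranked candidate is the true root cause. The sketch below pairs it with a toy version of a "largest change" rule. The function names, the dictionary layout and the node labels are assumptions made for illustration; the abstract does not specify which value is monitored or how candidates are represented.

```python
def largest_change_rule(before, after):
    """Return the candidate whose monitored value changed most in absolute terms."""
    return max(before, key=lambda k: abs(after[k] - before[k]))

def precision_at_1(predictions, truths):
    """Share of cases where the single top prediction equals the true root cause."""
    hits = sum(p == t for p, t in zip(predictions, truths))
    return hits / len(predictions)

# Hypothetical readings for two network nodes before and after an incident.
before = {"node_a": 1.0, "node_b": 2.0}
after = {"node_a": 1.5, "node_b": 5.0}
top = largest_change_rule(before, after)  # picks "node_b" (change of 3.0 vs 0.5)

# Two hypothetical incidents: the first prediction is right, the second is wrong.
score = precision_at_1([top, "node_a"], ["node_b", "node_c"])  # 0.5
```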

Preprint, 2022, arXiv

Robust and Sparse Multinomial Regression in High Dimensions

A robust and sparse estimator for multinomial regression is proposed for high-dimensional data. Robustness of the estimator is achieved by trimming the observations, and sparsity is obtained through the elastic net penalty, which is a mixture of $L_1$ and $L_2$ penalties. From this point of view, the proposed estimator is an extension of the enet-LTS estimator (Kurnaz et al., 2018) for linear and logistic regression to the multinomial regression setting. After introducing an algorithm for its computation, a simulation study is conducted to show the performance in comparison to the non-robust version of the multinomial regression estimator. Some real data examples underline the usefulness of this robust estimator.
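The two ingredients named above can be sketched in a few lines: an elastic net penalty that mixes $L_1$ and $L_2$ terms, and a trimmed loss that sums only the smallest per-observation losses so that the worst-fitting observations are ignored. The parametrization with `lam` and `alpha` below follows a common convention and is an assumption, as are the function names; the abstract does not give the exact objective used by the estimator.

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) * 0.5 * ||beta||_2^2).

    alpha = 1 gives a pure L1 (lasso) penalty, alpha = 0 a pure L2 (ridge) penalty.
    This parametrization is assumed for illustration.
    """
    beta = np.asarray(beta, dtype=float)
    l1 = np.abs(beta).sum()
    l2 = 0.5 * np.square(beta).sum()
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

def trimmed_loss(losses, h):
    """Sum of the h smallest per-observation losses (the rest are trimmed)."""
    return np.sort(np.asarray(losses, dtype=float))[:h].sum()

# Hypothetical coefficients and losses; the loss of 100.0 mimics an outlier.
penalty = elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=0.5)  # 0.5*3 + 0.5*2.5 = 2.75
objective = trimmed_loss([5.0, 1.0, 3.0, 100.0], h=3) + penalty  # 9.0 + 2.75
```

With `h=3`, the outlying loss of 100.0 does not enter the objective, which is the basic mechanism by which trimming protects the fit.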

Preprint, 2020, arXiv

A method to identify geochemical mineralization on linear transect

Mineral exploration in biogeochemistry is related to the detection of anomalies in soil, which is driven by many factors and is thus a complex problem. Mikšová, Rieser, and Filzmoser (2019) introduced a method for identifying spatial patterns with increased element concentrations in samples along a linear sampling transect. This procedure is based on fitting Generalized Additive Models (GAMs) to the concentration data and computing a curvature measure from the pairwise log-ratios of these fits. The higher the curvature, the more likely it is that one or both elements of the pair indicate local mineralization. This method is applied to two geochemical data sets that were collected specifically for the purpose of mineral exploration. The aim is to test the technique for its ability to identify pathfinder elements to detect mineralized zones, and to verify whether the method can indicate which sampling material is best suited for this purpose. Reference: Mikšová D., Rieser C., Filzmoser P. (2019). "Identification of mineralization in geochemistry along a transect based on the spatial curvature of log-ratios." arXiv, (1912.02867).

Preprint, 2020, arXiv

A Robust Adaptive Modified Maximum Likelihood Estimator for the Linear Regression Model

In linear regression, the least squares (LS) estimator has certain optimality properties if the errors are normally distributed. This assumption is often violated in practice, partly because of outliers in the data. Robust estimators can cope with this situation, and thus they are widely used in practice. One example of robust estimators for regression is the class of adaptive modified maximum likelihood (AMML) estimators (Donmez, 2010). However, they are not robust to $x$ outliers, so-called leverage points. In this study, we propose a new regression estimator by employing an appropriate weighting scheme in the AMML estimation method. The resulting estimator is called robust AMML (RAMML) since it is robust not only to $y$ outliers but also to $x$ outliers. A simulation study is carried out to compare the performance of the RAMML estimator with some existing robust estimators such as MM, least trimmed squares (LTS) and S. The results show that the RAMML estimator is preferable in most settings according to the mean squared error (MSE) criterion. Two data sets taken from the literature are also analyzed to show the implementation of the RAMML estimation methodology.

Preprint, 2020, arXiv

Cellwise Robust M Regression

The cellwise robust M regression (CRM) estimator is introduced as the first estimator of its kind that intrinsically yields both a map of cellwise outliers consistent with the linear model and a vector of regression coefficients that is robust against vertical outliers and leverage points. As a by-product, the method yields a weighted and imputed data set that contains estimates of what the values in cellwise outliers would need to be if they had fit the model. The method is shown to be as robust as its casewise counterpart, MM regression. The cellwise regression method discards less information than any casewise robust estimator. Therefore, predictive power can be expected to be at least as good as that of casewise alternatives. These results are corroborated in a simulation study. Moreover, while the simulations show that predictive performance is at least on par with casewise methods, if not better, an application to a data set consisting of compositions of Swiss nutrients shows that in individual cases CRM can achieve a significantly higher predictive accuracy than MM regression.

Preprint, 2020, arXiv

Robust multivariate methods in Chemometrics

This chapter presents an introduction to robust statistics with applications of a chemometric nature. Following a description of the basic ideas and concepts behind robust statistics, including how robust estimators can be conceived, the chapter builds up to the construction (and use) of robust alternatives for some methods for multivariate analysis frequently used in chemometrics, such as principal component analysis and partial least squares. The chapter then provides an insight into how these robust methods can be used or extended to classification. To conclude, the issue of validating the results is addressed: it is shown how uncertainty statements associated with robust estimates can be obtained.

Preprint, 2020, arXiv

The impact of COVID-19 on relative changes in aggregated mobility using mobile-phone data

Evaluating relative changes leads to additional insights that would remain hidden when only absolute changes are evaluated. We analyze a dataset describing the mobility of mobile phones in Austria before and during the COVID-19 lockdown measures and up to the recent period. By applying compositional data analysis, we show that formerly hidden information becomes available: the elderly population groups increase their relative mobility, and the younger groups, especially on weekends, also do not decrease their mobility as much as the others.
