Source author record

Andreas Groll

Andreas Groll appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Computation, Methodology, Applications

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Churn modeling of life insurance policies via statistical and machine learning methods -- Analysis of important features

Life assurance companies typically possess a wealth of data covering multiple systems and databases. These data are often used for analyzing the past and for describing the present. Building on the past, the future is mostly forecast using traditional statistical methods. So far, only a few attempts have been made to perform estimations by means of machine learning approaches. In this work, the individual contract cancellation behavior of customers within two partial stocks is modeled with the aid of various classification methods. The partial stocks considered are private pension and endowment policies. We describe the data used for the modeling, their structure and the way in which they are cleansed. The utilized models are calibrated on the basis of an extensive tuning process and then graphically evaluated regarding their goodness-of-fit. With the help of a variable relevance concept, we investigate which features notably affect the individual contract cancellation behavior.
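The abstract does not say which variable relevance concept is used. One common choice is permutation importance, sketched below under that assumption: a feature is shuffled and the resulting drop in accuracy is recorded. The toy portfolio, the feature names and the `predict` rule are hypothetical stand-ins, not the paper's data or models.

```python
import random

def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the observed one."""
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with the shuffled values of the chosen feature.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / n_repeats

# Hypothetical toy portfolio: contracts are cancelled when the premium is high.
rows = [{"premium": p, "age": a} for p, a in
        [(50, 30), (60, 45), (200, 33), (220, 51), (55, 62), (210, 28)]]
labels = [0, 0, 1, 1, 0, 1]

def predict(row):
    # Stand-in for a fitted classifier (hypothetical rule).
    return int(row["premium"] > 100)

imp_premium = permutation_importance(predict, rows, labels, "premium")
imp_age = permutation_importance(predict, rows, labels, "age")
```

Because the stand-in classifier ignores `age`, shuffling it leaves the accuracy unchanged, while shuffling `premium` degrades it.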

Preprint, 2022, arXiv

Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?

Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods. In particular, they are used for predicting univariate responses. In the case of multiple outputs, the question arises whether to fit separate univariate models or to follow a multivariate approach directly. For the latter, several possibilities exist that are, e.g., based on modified splitting or stopping rules for multi-output regression. In this work we compare these methods in extensive simulations to help answer the primary question of when to use multivariate ensemble techniques.
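To illustrate what a modified splitting rule can look like, the following minimal sketch compares a single tree split chosen per output with one shared split that minimises the squared error summed over all outputs. It is a simplified illustration only; the function names and toy data are hypothetical, and the methods compared in the paper may use different criteria.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def split_cost(x, y, threshold):
    """SSE of one output after splitting on x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sse(left) + sse(right)

def best_split_univariate(x, y):
    """Threshold minimising the SSE of a single output."""
    candidates = sorted(set(x))[:-1]
    return min(candidates, key=lambda t: split_cost(x, y, t))

def best_split_multivariate(x, outputs):
    """One shared threshold minimising the SSE summed over all outputs."""
    candidates = sorted(set(x))[:-1]
    return min(candidates, key=lambda t: sum(split_cost(x, y, t) for y in outputs))

x = [1, 2, 3, 4, 5, 6]
y1 = [0, 0, 1, 1, 1, 1]   # jumps after x = 2
y2 = [0, 0, 0, 0, 5, 5]   # jumps after x = 4, on a larger scale

separate = [best_split_univariate(x, y1), best_split_univariate(x, y2)]  # [2, 4]
shared = best_split_multivariate(x, [y1, y2])                            # 4
```

In this toy case the shared split follows the output with the larger scale, which hints at why the choice between the two approaches can depend on how the outputs relate to each other.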

Preprint, 2020, arXiv

A flexible adaptive lasso Cox frailty model based on the full likelihood

In this work a method to regularize Cox frailty models is proposed that accommodates time-varying covariates and time-varying coefficients and is based on the full instead of the partial likelihood. A particular advantage in this framework is that the baseline hazard can be explicitly modeled in a smooth, semi-parametric way, e.g. via P-splines. Regularization for variable selection is performed via a lasso penalty and via group lasso for categorical variables while a second penalty regularizes wiggliness of smooth estimates of time-varying coefficients and the baseline hazard. Additionally, adaptive weights are included to stabilize the estimation. The method is implemented in R as coxlasso and will be compared to other packages for regularized Cox regression. Existing packages, however, do not allow for the combination of different effects that are accommodated in coxlasso.
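The combination of penalties described above can be sketched generically as a log-likelihood minus a selection penalty and a smoothness penalty. The sketch below is an assumption-laden illustration, not the `coxlasso` implementation: the function names, the tuning parameter names `xi` and `zeta`, the weight form `1 / |initial estimate|` and the group-size scaling are standard textbook choices that the abstract does not confirm, and the full likelihood itself is represented only by a placeholder value.

```python
import math

def adaptive_lasso(beta, beta_init, eps=1e-8):
    """Lasso penalty with adaptive weights 1 / |initial estimate| (assumed form)."""
    return sum(abs(b) / (abs(b0) + eps) for b, b0 in zip(beta, beta_init))

def group_lasso(groups):
    """Group lasso: sqrt(group size) times the Euclidean norm of each group."""
    return sum(math.sqrt(len(g)) * math.sqrt(sum(b * b for b in g)) for g in groups)

def second_difference_penalty(coefs):
    """P-spline roughness penalty: sum of squared second differences."""
    return sum((coefs[i] - 2 * coefs[i + 1] + coefs[i + 2]) ** 2
               for i in range(len(coefs) - 2))

def penalized_loglik(loglik, beta, beta_init, groups, spline_coefs, xi, zeta):
    """Log-likelihood minus selection penalty (xi) and smoothness penalty (zeta)."""
    selection = adaptive_lasso(beta, beta_init) + group_lasso(groups)
    smoothness = second_difference_penalty(spline_coefs)
    return loglik - xi * selection - zeta * smoothness

# Placeholder numbers: -100.0 stands in for a full log-likelihood value.
value = penalized_loglik(
    loglik=-100.0,
    beta=[0.5, 0.0], beta_init=[1.0, 0.1],   # metric covariates
    groups=[[3.0, 4.0]],                     # dummies of one categorical covariate
    spline_coefs=[0.0, 1.0, 2.0, 4.0],       # e.g. baseline hazard coefficients
    xi=2.0, zeta=0.5,
)
```

Keeping the two penalties separate means that variable selection and the wiggliness of smooth effects are controlled by different tuning parameters.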

Preprint, 2020, arXiv

Deducing neighborhoods of classes from a fitted model

In today's world the demand for very complex models for huge data sets is rising steadily. The problem with these models is that, as their complexity increases, they become much harder to interpret. The growing field of interpretable machine learning tries to make up for the lack of interpretability in these complex (or even black-box) models by using specific techniques that help to understand those models better. In this article a new kind of interpretable machine learning method is presented, which can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts. To illustrate in which situations this quantile shift method (QSM) could be beneficial, it is applied to a theoretical medical example and a real data example. Essentially, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or decreasing specific features are observed. By comparing the predictions before and after the manipulations, the observed changes in the predictions can, under certain conditions, be interpreted as neighborhoods of the classes with regard to the manipulated features. Chordgraphs are used to visualize the observed changes.
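The idea of shifting a feature on the quantile scale and comparing predictions can be sketched as follows. This is a simplified reading of the abstract, not the published QSM: the rank-based shift, the function names, the glucose sample and the threshold classifier are all hypothetical.

```python
from bisect import bisect_right

def quantile_shift(value, sample, shift):
    """Move value by `shift` (e.g. +0.2) on the empirical quantile scale of sample."""
    ordered = sorted(sample)
    n = len(ordered)
    rank = bisect_right(ordered, value)     # number of sample values <= value
    new_rank = rank + round(shift * n)      # shift expressed in ranks
    new_rank = min(max(new_rank, 1), n)     # stay inside the observed range
    return ordered[new_rank - 1]

def neighbour_classes(predict, point, feature, sample, shift=0.2):
    """Prediction before and after shifting one feature up and down."""
    before = predict(point)
    result = {}
    for direction, s in (("up", shift), ("down", -shift)):
        moved = dict(point)
        moved[feature] = quantile_shift(point[feature], sample, s)
        result[direction] = (before, predict(moved))
    return result

# Hypothetical medical toy example.
glucose_sample = [80, 85, 90, 95, 100, 105, 110, 120, 130, 150]

def predict(patient):
    # Stand-in for a fitted classifier (hypothetical rule).
    return "high risk" if patient["glucose"] >= 110 else "low risk"

changes = neighbour_classes(predict, {"glucose": 100}, "glucose", glucose_sample)
```

Here the upward shift changes the predicted class while the downward shift does not, so with regard to `glucose` the "high risk" class lies next to "low risk" in the upward direction only.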

Preprint, 2020, arXiv

Random boosting and random^2 forests -- A random tree depth injection approach

The induction of additional randomness in parallel and sequential ensemble methods has proven to be worthwhile in many aspects. In this manuscript, we propose and examine a novel random tree depth injection approach suitable for sequential and parallel tree-based approaches including Boosting and Random Forests. The resulting methods are called Random Boost and Random^2 Forest. Both approaches serve as valuable extensions to the existing literature on the gradient boosting framework and random forests. A Monte Carlo simulation, in which tree-shaped data sets with different numbers of final partitions are built, suggests that there are several scenarios where Random Boost and Random^2 Forest can improve the prediction performance of conventional hierarchical boosting and random forest approaches. The new algorithms appear to be especially successful in cases where there are merely a few high-order interactions in the generated data. In addition, our simulations suggest that our random tree depth injection approach can improve computation time by up to 40%, while at the same time the performance losses in terms of prediction accuracy turn out to be minor or even negligible in most cases.
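A minimal sketch of random tree depth injection in a boosting loop is given below. It assumes that each tree's depth is drawn uniformly between 1 and an upper bound, which the abstract does not confirm; the one-feature regression tree, the function names and the toy data are hypothetical and kept deliberately small.

```python
import random

def fit_tree(x, y, depth):
    """Least-squares regression tree on one feature; returns a prediction function."""
    mean = sum(y) / len(y)
    thresholds = sorted(set(x))[:-1]        # both sides of any split stay non-empty
    if depth == 0 or not thresholds:
        return lambda v: mean
    def cost(t):
        total = 0.0
        for side in ([yi for xi, yi in zip(x, y) if xi <= t],
                     [yi for xi, yi in zip(x, y) if xi > t]):
            m = sum(side) / len(side)
            total += sum((yi - m) ** 2 for yi in side)
        return total
    t = min(thresholds, key=cost)
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= t]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > t]
    lo = fit_tree([p[0] for p in left], [p[1] for p in left], depth - 1)
    hi = fit_tree([p[0] for p in right], [p[1] for p in right], depth - 1)
    return lambda v: lo(v) if v <= t else hi(v)

def random_boost(x, y, n_trees=50, max_depth=3, rate=0.1, seed=0):
    """Squared-error boosting in which every tree gets a randomly drawn depth."""
    rng = random.Random(seed)
    trees, depths = [], []
    residual = list(y)
    for _ in range(n_trees):
        depth = rng.randint(1, max_depth)   # assumed: uniform on 1..max_depth
        tree = fit_tree(x, residual, depth)
        residual = [r - rate * tree(xi) for xi, r in zip(x, residual)]
        trees.append(tree)
        depths.append(depth)
    return (lambda v: rate * sum(t(v) for t in trees)), depths

x = list(range(20))
y = [0.0 if xi < 10 else 1.0 for xi in x]   # toy step function
model, depths = random_boost(x, y)
```

Since many of the drawn depths fall below the upper bound, fewer splits are searched on average than with a fixed depth of `max_depth`, which is the intuition behind the reported savings in computation time.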
