Source author record

Andrei V. Konstantinov

Andrei V. Konstantinov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

5works
2topics
2close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Attention and Self-Attention in Random Forests

New models of random forests jointly using the attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest whose idea stems from applying a combination of the Nadaraya-Watson kernel regression and the Huber's contamination model to random forests. The self-attention aims to capture dependencies of the tree predictions and to remove noise or anomalous predictions in the random forest. The self-attention module is trained jointly with the attention module for computing weights. It is shown that the training process of attention weights is reduced to solving a single quadratic or linear optimization problem. Three modifications of the general approach are proposed and compared. A specific multi-head self-attention for the random forest is also considered. Heads of the self-attention are obtained by changing its tuning parameters including the kernel parameters and the contamination parameter of models. Numerical experiments with various datasets illustrate the proposed models and show that the supplement of the self-attention improves the model performance for many datasets.

preprint2022arXiv

Attention-based Random Forest and Contamination Model

A new approach called ABRF (the attention-based random forest) and its modifications for applying the attention mechanism to the random forest (RF) for regression and classification are proposed. The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way. The weights depend on the distance between an instance, which falls into a corresponding leaf of a tree, and instances, which fall in the same leaf. This idea stems from representation of the Nadaraya-Watson kernel regression in the form of a RF. Three modifications of the general approach are proposed. The first one is based on applying the Huber's contamination model and on computing the attention weights by solving quadratic or linear optimization problems. The second and the third modifications use the gradient-based algorithms for computing trainable parameters. Numerical experiments with various regression and classification datasets illustrate the proposed method.

preprint2022arXiv

Heterogeneous Treatment Effect with Trained Kernels of the Nadaraya-Watson Regression

A new method for estimating the conditional average treatment effect is proposed in the paper. It is called TNW-CATE (the Trainable Nadaraya-Watson regression for CATE) and based on the assumption that the number of controls is rather large whereas the number of treatments is small. TNW-CATE uses the Nadaraya-Watson regression for predicting outcomes of patients from the control and treatment groups. The main idea behind TNW-CATE is to train kernels of the Nadaraya-Watson regression by using a weight sharing neural network of a specific form. The network is trained on controls, and it replaces standard kernels with a set of neural subnetworks with shared parameters such that every subnetwork implements the trainable kernel, but the whole network implements the Nadaraya-Watson estimator. The network memorizes how the feature vectors are located in the feature space. The proposed approach is similar to the transfer learning when domains of source and target data are similar, but tasks are different. Various numerical simulation experiments illustrate TNW-CATE and compare it with the well-known T-learner, S-learner and X-learner for several types of the control and treatment outcome functions. The code of proposed algorithms implementing TNW-CATE is available in https://github.com/Stasychbr/TNW-CATE.

preprint2021arXiv

Ensembles of Random SHAPs

Ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate SHAP by an ensemble of SHAPs with a smaller number of features. According to the first modification, called ER-SHAP, several features are randomly selected many times from the feature set, and Shapley values for the features are computed by means of "small" SHAPs. The explanation results are averaged to get the final Shapley values. According to the second modification, called ERW-SHAP, several points are generated around the explained instance for diversity purposes, and results of their explanation are combined with weights depending on distances between points and the explained instance. The third modification, called ER-SHAP-RF, uses the random forest for preliminary explanation of instances and determining a feature probability distribution which is applied to selection of features in the ensemble-based procedure of ER-SHAP. Many numerical experiments illustrating the proposed modifications demonstrate their efficiency and properties for local explanation.

preprint2020arXiv

Gradient boosting machine with partially randomized decision trees

The gradient boosting machine is a powerful ensemble-based machine learning method for solving regression problems. However, one of the difficulties of its using is a possible discontinuity of the regression function, which arises when regions of training data are not densely covered by training points. In order to overcome this difficulty and to reduce the computational complexity of the gradient boosting machine, we propose to apply the partially randomized trees which can be regarded as a special case of the extremely randomized trees applied to the gradient boosting. The gradient boosting machine with the partially randomized trees is illustrated by means of many numerical examples using synthetic and real data.