Source author record

Alexey Zaytsev

Alexey Zaytsev appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.AG, math.NT, Artificial Intelligence, Computer Vision

Catalog footprint

What is connected

15 works

5 topics

4 close collaborators

preprint 2026 arXiv

Position: agentic AI orchestration should be Bayes-consistent

LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this position paper argues that the control layer of an agentic AI system (that orchestrates LLMs and tools) is a clear case where Bayesian principles should shine. Bayesian decision theory provides a framework for agentic systems that can help to maintain beliefs over task-relevant latent quantities, to update these beliefs from observed agentic and human-AI interactions, and to choose actions. Making LLMs themselves explicitly Bayesian belief-updating engines remains computationally intensive and conceptually nontrivial as a general modeling target. In contrast, this paper argues that coherent decision-making requires Bayesian principles at the orchestration level of the agentic system, not necessarily the LLM agent parameters. This paper articulates practical properties for Bayesian control that fit modern agentic AI systems and human-AI collaboration, and provides concrete examples and design patterns to illustrate how calibrated beliefs and utility-aware policies can improve agentic AI orchestration.
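The belief-update-then-act loop described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a Beta-Bernoulli belief over each tool's success probability and a simple expected-utility rule, and the names `ToolBelief` and `choose_tool`, the tool names, the costs and the interaction log are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolBelief:
    """Beta(alpha, beta) belief over a tool's unknown success probability."""
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, success: bool) -> None:
        # Conjugate Bayesian update for a single Bernoulli outcome.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


def choose_tool(beliefs, costs, reward=1.0):
    """Pick the tool with the highest posterior expected utility."""
    def expected_utility(name):
        return beliefs[name].mean * reward - costs[name]
    return max(beliefs, key=expected_utility)


# Hypothetical tools and per-call costs (assumptions for illustration only).
beliefs = {"search": ToolBelief(), "calculator": ToolBelief()}
costs = {"search": 0.30, "calculator": 0.05}

# Hypothetical interaction log: (tool, did the call succeed?)
for tool, ok in [("search", True), ("search", True), ("calculator", False)]:
    beliefs[tool].update(ok)

best = choose_tool(beliefs, costs)
```

In this toy setup the orchestrator keeps the LLM untouched; only the control layer maintains beliefs and trades off expected success against cost.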

preprint 2022 arXiv

Deep learning model solves change point detection for multiple change types

Change point detection aims to catch an abrupt change in the data distribution. Common approaches assume that there are only two fixed distributions for the data: one before and another after a change point. Real-world data are richer than this assumption: there can be multiple different distributions before and after a change. We propose an approach that works in the multiple-distributions scenario. Our approach learns representations for semi-structured data suitable for change point detection, while a common classifier-based approach fails. Moreover, our model is more robust when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.

preprint 2022 arXiv

Effective training-time stacking for ensembling of deep neural networks

Ensembling is a popular and effective method for improving machine learning (ML) models. It proves its value not only in classical ML but also in deep learning. Ensembles enhance the quality and trustworthiness of ML solutions and allow uncertainty estimation. However, they come at a price: training ensembles of deep learning models consumes a huge amount of computational resources. Snapshot ensembling collects the models for the ensemble along a single training path. As it runs training only once, the computational time is similar to that of training one model. However, the quality of models along the training path varies: typically, later models are better if no overfitting occurs. So, the models are of varying utility. Our method improves snapshot ensembling by selecting and weighting ensemble members along the training path. It relies on training-time likelihoods and, unlike standard stacking methods, does not look at validation sample errors. Experimental evidence for the Fashion MNIST, CIFAR-10, and CIFAR-100 datasets demonstrates the superior quality of the proposed weighted ensembles compared to vanilla ensembling of deep learning models.

preprint 2022 arXiv

Embedded Ensembles: Infinite Width Limit and Operating Regimes

A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); its particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes - independent and collective - depending on the architecture and initialization strategy of ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical prediction with a wide range of experiments with finite networks, and further study empirically various effects such as the transition between the two regimes, the scaling of ensemble performance with the network width and number of models, and the dependence of performance on a number of architecture and hyperparameter choices.

preprint 2022 arXiv

ScaleFace: Uncertainty-aware Deep Metric Learning

The performance of modern deep learning-based systems depends dramatically on the quality of input objects. For example, face recognition quality would be lower for blurry or corrupted inputs. However, it is hard to predict the influence of input quality on the resulting accuracy in more complex scenarios. We propose an approach for deep metric learning that allows direct estimation of the uncertainty with almost no additional computational cost. The developed ScaleFace algorithm uses trainable scale values that modify similarities in the space of embeddings. These input-dependent scale values represent a measure of confidence in the recognition result, thus allowing uncertainty estimation. We provide comprehensive experiments on face recognition tasks that show the superior performance of ScaleFace compared to other uncertainty-aware face recognition approaches. We also extend the results to the task of text-to-image retrieval, showing that the proposed approach beats the competitors by a significant margin.

preprint 2022 arXiv

Similarity learning for wells based on logging data

One of the first steps in the investigation of geological objects is interwell correlation. It provides information on the structure of the objects under study, as it forms the framework for constructing geological models and assessing hydrocarbon reserves. Today, detailed interwell correlation relies on manual analysis of well-logging data. Thus, it is time-consuming and subjective. The essence of interwell correlation is an assessment of the similarities between geological profiles. There have been many attempts to automate the process of interwell correlation by means of rule-based approaches, classic machine learning approaches, and deep learning approaches. However, most approaches are of limited use and inherit the subjectivity of experts. We propose a novel framework to solve geological profile similarity estimation based on a deep learning model. Our similarity model takes well-logging data as input and provides the similarity of wells as output. The developed framework enables (1) extracting patterns and essential characteristics of geological profiles within the wells and (2) model training following the unsupervised paradigm without the need for manual analysis and interpretation of well-logging data. For model testing, we used two open datasets originating in New Zealand and Norway. Our data-based similarity models provide high performance: the accuracy of our model is $0.926$ compared to $0.787$ for baselines based on the popular gradient boosting approach. With them, an oil and gas practitioner can improve interwell correlation quality and reduce operation time.

preprint 2022 arXiv

Towards OOD Detection in Graph Classification from Uncertainty Estimation Perspective

The problem of out-of-distribution (OOD) detection for graph classification is far from being solved. The existing models tend to be overconfident about OOD examples or completely ignore the detection task. In this work, we consider this problem from the uncertainty estimation perspective and compare several recently proposed methods. In our experiments, we find that there is no universal approach for OOD detection, and that it is important to consider both the graph representations and the predictive categorical distribution.

preprint 2022 arXiv

Transfer learning for ensembles: reducing computation time and keeping the diversity

Transferring a deep neural network trained on one problem to another requires only a small amount of data and little additional computation time. The same behaviour holds for ensembles of deep learning models, which are typically superior to a single model. However, transferring an ensemble of deep neural networks demands relatively high computational expenses. The probability of overfitting also increases. Our approach to the transfer learning of ensembles consists of two steps: (a) shifting the encoder weights of all models in the ensemble by a single shift vector and (b) doing a tiny fine-tuning of each individual model afterwards. This strategy speeds up the training process and makes it possible to add models to an ensemble with significantly reduced training time using the shift vector. We compare different strategies by computation time, ensemble accuracy, uncertainty estimation and disagreement, and conclude that our approach gives competitive results at the same computational complexity as the traditional approach. Also, our method keeps the diversity of the ensemble's models higher.

preprint 2022 arXiv

Usage of specific attention improves change point detection

A change point is a moment of abrupt alteration in the data distribution. Current methods for change point detection are based on recurrent neural methods suitable for sequential data. However, recent works show that transformers based on attention mechanisms perform better than standard recurrent models for many tasks. The benefit is most noticeable in the case of longer sequences. In this paper, we investigate different forms of attention for the change point detection task and propose a specific form of attention tailored to the task at hand. We show that using this special form of attention outperforms state-of-the-art results.

preprint 2020 arXiv

Recurrent Convolutional Neural Networks help to predict location of Earthquakes

We examine the applicability of modern neural network architectures to the midterm prediction of earthquakes. Our data-based classification model aims to predict whether an earthquake with magnitude above a threshold takes place in a given area of size $10 \times 10$ kilometers in $10$-$60$ days from a given moment. Our deep neural network model has a recurrent part (LSTM) that accounts for time dependencies between earthquakes and a convolutional part that accounts for spatial dependencies. The results show that neural network-based models beat baseline feature-based models that also account for spatio-temporal dependencies between different earthquakes. For historical data on earthquakes in Japan, our model predicts the occurrence of an earthquake in $10$ to $60$ days from a given moment with magnitude $M_c > 5$ with quality metrics ROC AUC $0.975$ and PR AUC $0.0890$, making $1.18 \cdot 10^3$ correct predictions, while missing $2.09 \cdot 10^3$ earthquakes and making $192 \cdot 10^3$ false alarms. The baseline approach has a similar ROC AUC of $0.992$, a similar number of correct predictions ($1.19 \cdot 10^3$) and missed earthquakes ($2.07 \cdot 10^3$), but a significantly worse PR AUC of $0.00911$ and $1004 \cdot 10^3$ false alarms.
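To make the reported counts easier to compare, the following sketch derives precision and recall from them. It assumes that "correct predictions" are true positives, "missed earthquakes" are false negatives and "false alarms" are false positives, at whatever decision threshold the authors used; the helper name `precision_recall` is hypothetical.

```python
def precision_recall(tp, fn, fp):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of alarms that were real earthquakes
    recall = tp / (tp + fn)     # share of real earthquakes that were caught
    return precision, recall


# Counts as reported in the abstract (interpretation is an assumption).
model = precision_recall(tp=1.18e3, fn=2.09e3, fp=192e3)
baseline = precision_recall(tp=1.19e3, fn=2.07e3, fp=1004e3)

print(f"model:    precision={model[0]:.4f}, recall={model[1]:.3f}")
print(f"baseline: precision={baseline[0]:.4f}, recall={baseline[1]:.3f}")
```

Under this reading, both approaches catch roughly 36% of the earthquakes, while the model's precision (about 0.6%) is around five times the baseline's (about 0.12%), which is consistent with the large gap in false alarms.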

preprint 2012 arXiv

Generalization of Deuring Reduction Theorem

In this paper we generalize the Deuring theorem on the reduction of an elliptic curve with complex multiplication. More precisely, for an Abelian variety $A$ arising after reduction of an Abelian variety with complex multiplication by a CM field $K$ over a number field at a place of good reduction, we establish a connection between a decomposition of the first truncated Barsotti-Tate group scheme $A[p]$ and a decomposition of $p\mathcal{O}_{K}$ into prime ideals. In particular, we produce these explicit relationships for Abelian varieties of dimensions $1$, $2$ and $3$.

preprint 2011 arXiv

Characteristic Polynomial of Supersingular Abelian Varieties over Finite Fields

In this article, we give a complete description of the characteristic polynomials of supersingular abelian varieties over finite fields. We list them for dimensions up to 7.

preprint 2011 arXiv

On the Zeta Functions of an optimal tower of function fields over $\mathbb{F}_4$

In this paper we derive a recursion for the zeta function of each function field in the second Garcia-Stichtenoth tower when $q=2$. We obtain our recursion by applying a theorem of Kani and Rosen that gives information about the decomposition of the Jacobians. This enables us to explicitly compute the zeta functions of the first six function fields.

preprint 2011 arXiv

Optimal curves of low genus over finite fields

The Hasse-Weil-Serre bound is improved for curves of low genera over finite fields with discriminant in $\{-3,-4,-7,-8,-11,-19\}$ by studying optimal curves.
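For context, the classical Hasse-Weil-Serre bound states that a smooth projective curve of genus $g$ over $\mathbb{F}_q$ has a number of rational points $N$ satisfying $|N - (q + 1)| \le g \lfloor 2\sqrt{q} \rfloor$. The sketch below evaluates only this classical bound, not the improvement claimed in the abstract; the function name `hasse_weil_serre_bounds` is hypothetical.

```python
from math import isqrt


def hasse_weil_serre_bounds(q, g):
    """Return (lower, upper) bounds on the number of rational points
    of a genus-g curve over the finite field with q elements."""
    m = isqrt(4 * q)  # floor(2 * sqrt(q)), computed in exact integer arithmetic
    lower = max(0, q + 1 - g * m)  # a point count cannot be negative
    upper = q + 1 + g * m
    return lower, upper


# Elliptic curves (g = 1) over F_2 have between 1 and 5 points.
print(hasse_weil_serre_bounds(2, 1))
```

Using `isqrt(4 * q)` avoids floating-point rounding when $q$ is large, since $\lfloor 2\sqrt{q} \rfloor = \lfloor \sqrt{4q} \rfloor$.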

preprint 2010 arXiv

The Number of Rational Points On Genus 4 Hyperelliptic Supersingular Curves in Characteristic 2

One of the big questions in the area of curves over finite fields concerns the distribution of the numbers of points: Which numbers occur as the number of points on a curve of genus $g$? The same question can be asked of various subclasses of curves. In this article we classify the possibilities for the number of points on genus 4 hyperelliptic supersingular curves over finite fields of order $2^n$, $n$ odd.
