Source author record

Andrey Ustyuzhanin

Andrey Ustyuzhanin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning; hep-ex; physics.ins-det; Artificial Intelligence; astro-ph.IM; Distributed, Parallel, and Cluster Computing; physics.data-an; cond-mat.mtrl-sci; Neural and Evolutionary Computing; Symbolic Computation; Systems and Control

Catalog footprint

What is connected

18 works

11 topics

4 close collaborators


preprint, 2026, arXiv

Symbolic regression for defect interactions in 2D materials

Machine learning models have become firmly established across all scientific fields. Extracting features from data and making inferences based on them with neural network models often yields high accuracy; however, this approach has several drawbacks. Symbolic regression is a powerful technique for discovering analytical equations that describe data, providing interpretable and generalizable models capable of predicting unseen data. Symbolic regression methods have gained new momentum with the advancement of neural network technologies and offer several advantages, the main one being the interpretability of results. In this work, we examined the application of the deep symbolic regression algorithm SEGVAE to determine the properties of two-dimensional materials with defects. Comparing the results with state-of-the-art graph neural network-based methods shows comparable or, in some cases, even identical outcomes. We also discuss the applicability of this class of methods in natural sciences.

preprint, 2023, arXiv

Symbolic expression generation via Variational Auto-Encoder

There are many problems in physics, biology, and other natural sciences in which symbolic regression can provide valuable insights and discover new laws of nature. Widespread deep neural networks do not provide interpretable solutions. Meanwhile, symbolic expressions give us a clear relation between observations and the target variable. However, at the moment, there is no dominant solution for the symbolic regression task, and we aim to reduce this gap with our algorithm. In this work, we propose a novel deep learning framework for symbolic expression generation via variational autoencoder (VAE). In a nutshell, we suggest using a VAE to generate mathematical expressions, and our training strategy forces generated formulas to fit a given dataset. Our framework allows encoding a priori knowledge of the formulas into fast-check predicates that speed up the optimization process. We compare our method to modern symbolic regression benchmarks and show that our method outperforms the competitors under noisy conditions. The recovery rate of SEGVAE is 65% on the Nguyen dataset with a noise level of 10%, which is better than the previously reported SOTA by 20%. We demonstrate that this value depends on the dataset and can be even higher.
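The idea of fast-check predicates can be illustrated with a small sketch: cheap boolean tests reject candidate formulas before any costly fit to the dataset is attempted. Everything below is an assumption made for illustration, not the SEGVAE implementation: the prefix-notation token vocabulary `ARITY`, the three example predicates, and the helper `passes_fast_checks` are all hypothetical.

```python
# Hypothetical token vocabulary: token name -> number of arguments it takes.
ARITY = {"add": 2, "mul": 2, "sin": 1, "cos": 1, "x": 0, "const": 0}
TRIG = {"sin", "cos"}


def is_complete(tokens):
    """True if tokens form exactly one well-formed prefix expression."""
    needed = 1  # number of sub-expressions still expected
    for tok in tokens:
        if needed == 0:
            return False  # extra tokens after a finished expression
        needed += ARITY[tok] - 1
    return needed == 0


def uses_variable(tokens):
    """A formula that never reads x cannot depend on the input."""
    return "x" in tokens


def no_nested_trig(tokens):
    """Reject expressions such as sin(cos(x)) (assumed prior knowledge)."""
    stack = []  # entries: [children still expected, inside a trig call?]
    for tok in tokens:
        inside = stack[-1][1] if stack else False
        is_trig = tok in TRIG
        if is_trig and inside:
            return False
        if stack:
            stack[-1][0] -= 1  # this token fills one slot of its parent
        if ARITY[tok] > 0:
            stack.append([ARITY[tok], inside or is_trig])
        while stack and stack[-1][0] == 0:
            stack.pop()  # close every fully populated operator
    return True


FAST_CHECKS = (is_complete, uses_variable, no_nested_trig)


def passes_fast_checks(tokens, checks=FAST_CHECKS):
    """Run the cheap predicates in order; stop at the first failure."""
    return all(check(tokens) for check in checks)


candidates = [
    ["add", "x", "const"],            # x + c
    ["add", "x"],                     # incomplete
    ["sin", "cos", "x"],              # nested trigonometric call
    ["mul", "const", "const"],        # ignores the input
    ["mul", "sin", "x", "cos", "x"],  # sin(x) * cos(x)
]
survivors = [c for c in candidates if passes_fast_checks(c)]
```

Only the surviving candidates would then be passed on to the more expensive step of fitting constants and scoring against data.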

preprint, 2022, arXiv

Deep Learning for direct Dark Matter search with nuclear emulsions

We propose a new method for the discrimination of sub-micron nuclear recoil tracks from an instrumental background in fine-grain nuclear emulsions used in the directional dark matter search. The proposed method uses a 3D Convolutional Neural Network, whose parameters are optimised by Bayesian search. Unlike previous studies focused on extracting the directional information, we focus on the signal/background separation exploiting the polarisation dependence of the Localised Surface Plasmon Resonance phenomenon. Comparing the proposed method with the conventional cut-based approach shows a significant boost in the reduction factor for given signal efficiency.

preprint, 2022, arXiv

Toward the End-to-End Optimization of Particle Physics Instruments with Differentiable Programming: a White Paper

The full optimization of the design and operation of instruments whose functioning relies on the interaction of radiation with matter is a super-human task, given the large dimensionality of the space of possible choices for geometry, detection technology, materials, data-acquisition, and information-extraction techniques, and the interdependence of the related parameters. On the other hand, massive potential gains in performance over standard, "experience-driven" layouts are in principle within our reach if an objective function fully aligned with the final goals of the instrument is maximized by means of a systematic search of the configuration space. The stochastic nature of the involved quantum processes makes the modeling of these systems an intractable problem from a classical statistics point of view, yet the construction of a fully differentiable pipeline and the use of deep learning techniques may allow the simultaneous optimization of all design parameters. In this document we lay down our plans for the design of a modular and versatile modeling tool for the end-to-end optimization of complex instruments for particle physics experiments as well as industrial and medical applications that share the detection of radiation as their basic ingredient. We consider a selected set of use cases to highlight the specific needs of different applications.

preprint, 2021, arXiv

Online detection of failures generated by storage simulator

Modern large-scale data farms consist of hundreds of thousands of storage devices that span distributed infrastructure. Devices used in modern data centers (such as controllers, links, SSDs and HDDs) can fail due to hardware as well as software problems. Such failures or anomalies can be detected by monitoring the activity of components using machine learning techniques. In order to use these techniques, researchers need plenty of historical data of devices in normal and failure mode for training algorithms. In this work, we address two problems: 1) the lack of storage data for the methods above, by creating a simulator, and 2) the application of existing online algorithms that can detect a failure in one of the components more quickly. We created a Go-based (golang) package for simulating the behavior of modern storage infrastructure. The software is based on the discrete-event modeling paradigm and captures the structure and dynamics of high-level storage system building blocks. The package's flexible structure allows us to create a model of a real-world storage system with a configurable number of components. The primary area of interest is exploring the storage machine's behavior under stress testing or operation in the medium or long term for observing failures of its components. To discover failures in the time series distribution generated by the simulator, we modified a change point detection algorithm that works in online mode. The goal of the change-point detection is to discover differences in time series distribution. This work describes an approach for failure detection in time series data based on direct density ratio estimation via binary classifiers.
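The discrete-event modeling paradigm named in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Go and does not reproduce the package's API. The function `simulate`, its parameters, the exponential time-to-failure model, and the fixed repair time are all assumptions chosen for illustration.

```python
import heapq
import random


def simulate(num_disks=3, horizon=100.0, mean_time_to_failure=40.0,
             repair_time=5.0, seed=0):
    """Toy discrete-event simulation of disks that fail and get repaired.

    Returns the event log as a list of (time, disk_id, event_name) tuples.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean_time_to_failure
    queue = []  # min-heap ordered by event time

    # Schedule the first failure of every disk.
    for disk in range(num_disks):
        heapq.heappush(queue, (rng.expovariate(rate), disk, "fail"))

    log = []
    while queue:
        time, disk, event = heapq.heappop(queue)
        if time > horizon:
            break  # every remaining event is later than the horizon
        log.append((time, disk, event))
        if event == "fail":
            # A failed disk comes back after a fixed repair time.
            heapq.heappush(queue, (time + repair_time, disk, "repair"))
        else:
            # A repaired disk runs until its next random failure.
            next_failure = time + rng.expovariate(rate)
            heapq.heappush(queue, (next_failure, disk, "fail"))
    return log
```

The simulation clock jumps from one scheduled event to the next instead of advancing in fixed steps, which is what makes long-horizon runs cheap. The resulting log could be turned into a time series (for example, the number of healthy disks over time) for a downstream detector.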

preprint, 2021, arXiv

Segmentation of EM showers for neutrino experiments with deep graph neural networks

We introduce a first-ever algorithm for the reconstruction of multiple showers from the data collected with electromagnetic (EM) sampling calorimeters. Such detectors are widely used in High Energy Physics to measure the energy and kinematics of incoming particles. In this work, we consider the case when many electrons pass through an Emulsion Cloud Chamber (ECC) brick, initiating electron-induced electromagnetic showers, which can be the case with long exposure times or large input particle flux. For example, the SHiP experiment is planning to use emulsion detectors for dark matter search and neutrino physics investigation. The expected full flux of the SHiP experiment is about $10^{20}$ particles over five years. To reduce the cost of the experiment associated with the replacement of the ECC brick and off-line data taking (emulsion scanning), it is decided to increase exposure time. Thus, we expect to observe a lot of overlapping showers, which turns EM shower reconstruction into a challenging point cloud segmentation problem. Our reconstruction pipeline consists of a Graph Neural Network that predicts an adjacency matrix and a clustering algorithm. We propose a new layer type (EmulsionConv) that takes into account geometrical properties of shower development in an ECC brick. For the clustering of overlapping showers, we use a modified hierarchical density-based clustering algorithm. Our method does not use any prior information about the incoming particles and identifies up to 87% of electromagnetic showers in emulsion detectors. The main test bench for the algorithm for reconstructing electromagnetic showers is going to be SND@LHC.

preprint, 2020, arXiv

Deep learning for Directional Dark Matter search

We provide an algorithm for detection of possible dark matter particle interactions recorded within the NEWSdm detector. The NEWSdm (Nuclear Emulsions for WIMP Search directional measure) is an underground Direct detection Dark Matter search experiment. The usage of recent developments in nuclear emulsions allows probing new regions in the WIMP parameter space. The directional approach, which is the key feature of the NEWSdm experiment, gives a unique chance of overcoming the "neutrino floor". Deep Neural Networks were used for separation between potential DM signal and various classes of background. In this paper, we present the usage of deep 3D Convolutional Neural Networks to take into account the physical peculiarities of the datasets and report the achievement of the required $10^4$ background rejection power.

preprint, 2020, arXiv

Generalization of Change-Point Detection in Time Series Data Based on Direct Density Ratio Estimation

The goal of change-point detection is to discover changes in the distribution of a time series. One of the state-of-the-art approaches to change-point detection is based on direct density ratio estimation. In this work we show how existing algorithms can be generalized using various binary classification and regression models. In particular, we show that Gradient Boosting over Decision Trees and Neural Networks can be used for this purpose. The algorithms are tested on several synthetic and real-world datasets. The results show that the proposed methods outperform the classical RuLSIF algorithm. A discussion of cases where the proposed algorithms have advantages over existing methods is also provided.
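The link between a binary classifier and a density ratio can be sketched as follows. If a classifier is trained to tell a reference window (label 0) from an adjacent test window (label 1) of equal length, its logit approximates the log density ratio between the two windows, and a large average difference in logits signals a change. This is a simplified illustration, not the paper's algorithm: it swaps in a small logistic regression in place of gradient boosting or neural networks, scores in-sample without a validation split, and the names `window_score` and `change_point_scores` are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def window_score(ref, test, epochs=100, lr=0.5):
    """Dissimilarity of two equal-length windows via a binary classifier.

    Returns mean logit on the test window minus mean logit on the reference
    window, a rough symmetrized KL-type estimate (an assumption of this sketch).
    """
    pooled = list(ref) + list(test)
    mean = sum(pooled) / len(pooled)
    std = math.sqrt(sum((v - mean) ** 2 for v in pooled) / len(pooled)) or 1.0
    # Features (1, z, z^2) on standardized values; label 0 = ref, 1 = test.
    feats = [(1.0, z, z * z) for z in ((v - mean) / std for v in pooled)]
    labels = [0.0] * len(ref) + [1.0] * len(test)
    w = [0.0, 0.0, 0.0]

    def logit(f):
        return w[0] * f[0] + w[1] * f[1] + w[2] * f[2]

    # Plain batch gradient descent on the logistic loss.
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for f, y in zip(feats, labels):
            err = _sigmoid(logit(f)) - y
            for j in range(3):
                grad[j] += err * f[j] / len(feats)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]

    logits = [logit(f) for f in feats]
    mean_ref = sum(logits[:len(ref)]) / len(ref)
    mean_test = sum(logits[len(ref):]) / len(test)
    return mean_test - mean_ref


def change_point_scores(series, window=30, step=1):
    """Slide two adjacent windows over the series; return (t, score) pairs."""
    scores = []
    for t in range(window, len(series) - window + 1, step):
        ref = series[t - window:t]
        test = series[t:t + window]
        scores.append((t, window_score(ref, test)))
    return scores
```

On a series whose mean shifts at some index, the score is expected to peak where the reference window lies entirely before the shift and the test window entirely after it. Thresholding the score would then turn it into a detector.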

preprint, 2019, arXiv

$(1 + \varepsilon)$-class Classification: an Anomaly Detection Method for Highly Imbalanced or Incomplete Data Sets

Anomaly detection is not an easy problem since the distribution of anomalous samples is unknown a priori. We explore a novel method that offers a trade-off between one-class and two-class approaches, and leads to a better performance on anomaly detection problems with small or non-representative anomalous samples. The method is evaluated using several data sets and compared to a set of conventional one-class and two-class approaches.

preprint, 2019, arXiv

Adaptive Divergence for Rapid Adversarial Optimization

Adversarial Optimization (AO) provides a reliable, practical way to match two implicitly defined distributions, one of which is usually represented by a sample of real data, and the other is defined by a generator. Typically, AO involves training of a high-capacity model on each step of the optimization. In this work, we consider computationally heavy generators, for which training of high-capacity models is associated with substantial computational costs. To address this problem, we introduce a novel family of divergences, which varies the capacity of the underlying model, and allows for a significant acceleration with respect to the number of samples drawn from the generator. We demonstrate the performance of the proposed divergences on several tasks, including tuning parameters of a physics simulator, namely the Pythia event generator.

preprint, 2019, arXiv

Fast Data-Driven Simulation of Cherenkov Detectors Using Generative Adversarial Networks

The increasing luminosities of future Large Hadron Collider runs and the next generation of collider experiments will require an unprecedented amount of simulated events to be produced. Such large scale productions are extremely demanding in terms of computing resources. Thus new approaches to event generation and simulation of detector responses are needed. In LHCb, the accurate simulation of Cherenkov detectors takes a sizeable fraction of CPU time. An alternative approach is described here, in which one generates high-level reconstructed observables using a generative neural network to bypass low level details. This network is trained to reproduce the particle species likelihood function values based on the track kinematic parameters and detector occupancy. The fast simulation is trained using real data samples collected by LHCb during Run 2. We demonstrate that this approach provides high-fidelity results.

preprint, 2019, arXiv

Space Navigator: a Tool for the Optimization of Collision Avoidance Maneuvers

The number of space objects will grow several times in a few years due to the planned launches of constellations of thousands of microsatellites. This leads to a significant increase in the threat of satellite collisions. Spacecraft must undertake collision avoidance maneuvers to mitigate the risk. According to publicly available information, conjunction events are now manually handled by operators on the Earth. The manual maneuver planning requires qualified personnel and will be impractical for constellations of thousands of satellites. In this paper we propose a new modular autonomous collision avoidance system called "Space Navigator". It is based on a novel maneuver optimization approach that combines domain knowledge with Reinforcement Learning methods.

preprint, 2015, arXiv

A genetic algorithm for autonomous navigation in partially observable domain

The problem of autonomous navigation is one of the basic problems in robotics. However, it can be challenging when an autonomous vehicle is placed into a partially observable domain. In this paper we consider a simplistic environment model and introduce a navigation algorithm based on a Learning Classifier System.

preprint, 2015, arXiv

Disk storage management for LHCb based on Data Popularity estimator

This paper presents an algorithm providing recommendations for optimizing the LHCb data storage. The LHCb data storage system is a hybrid system. All datasets are kept as archives on magnetic tapes. The most popular datasets are kept on disks. The algorithm takes the dataset usage history and metadata (size, type, configuration etc.) to generate a recommendation report. This article presents how we use machine learning algorithms to predict future data popularity. Using these predictions it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the data popularity and the number of replicas optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function represents all requirements for data distribution in the data storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data.

preprint, 2015, arXiv

Event Index - an LHCb Event Search System

During LHC Run 1, the LHCb experiment recorded around $10^{11}$ collision events. This paper describes Event Index - an event search system. Its primary function is to quickly select subsets of events from a combination of conditions, such as the estimated decay channel or number of hits in a subdetector. Event Index is essentially Apache Lucene optimized for read-only indexes distributed over independent shards on independent nodes.
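The query model described here, selecting events by a combination of conditions over read-only indexes split across independent shards, can be sketched in a few lines. This is a conceptual illustration in Python and not the Apache Lucene API. The `Shard` class, the `search_all` helper, and the field names `decay` and `trigger` are hypothetical, and only exact-match conditions are handled (a range condition such as a number of hits would need more machinery).

```python
from collections import defaultdict


class Shard:
    """Read-only inverted index: (field, value) -> set of event ids."""

    def __init__(self, events):
        index = defaultdict(set)
        for event_id, fields in events.items():
            for field, value in fields.items():
                index[(field, value)].add(event_id)
        # Freeze the posting lists: the index is never modified after build.
        self._index = {key: frozenset(ids) for key, ids in index.items()}

    def search(self, **conditions):
        """Return ids of events matching ALL field=value conditions."""
        postings = [self._index.get(item, frozenset())
                    for item in conditions.items()]
        if not postings:
            return set()
        # Intersect starting from the shortest posting list.
        postings.sort(key=len)
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
        return result


def search_all(shards, **conditions):
    """Query every shard independently and merge the hits."""
    hits = set()
    for shard in shards:
        hits |= shard.search(**conditions)
    return sorted(hits)


shard_a = Shard({
    1: {"decay": "B->Kpi", "trigger": "topo"},
    2: {"decay": "B->pipi", "trigger": "topo"},
})
shard_b = Shard({
    3: {"decay": "B->Kpi", "trigger": "muon"},
    4: {"decay": "B->Kpi", "trigger": "topo"},
})
matches = search_all([shard_a, shard_b], decay="B->Kpi", trigger="topo")
```

Because each shard answers a query without consulting the others, the per-shard searches could run on separate nodes and only the small result sets need to be merged.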

preprint, 2015, arXiv

LHCb Topological Trigger Reoptimization

The main b-physics trigger algorithm used by the LHCb experiment is the so-called topological trigger. The topological trigger selects vertices which are a) detached from the primary proton-proton collision and b) compatible with coming from the decay of a b-hadron. In the LHC Run 1, this trigger, which utilized a custom boosted decision tree algorithm, selected a nearly 100% pure sample of b-hadrons with a typical efficiency of 60-70%; its output was used in about 60% of LHCb papers. This talk presents studies carried out to optimize the topological trigger for LHC Run 2. In particular, we have carried out a detailed comparison of various machine learning classifier algorithms, e.g., AdaBoost, MatrixNet and neural networks. The topological trigger algorithm is designed to select all "interesting" decays of b-hadrons, but cannot be trained on every such decay. Studies have therefore been performed to determine how to optimize the performance of the classification algorithm on decays not used in the training. Methods studied include cascading, ensembling and blending techniques. Furthermore, novel boosting techniques have been implemented that will help reduce systematic uncertainties in Run 2 measurements. We demonstrate that the reoptimized topological trigger is expected to significantly improve on the Run 1 performance for a wide range of b-hadron decays.

preprint, 2015, arXiv

Reproducible Experiment Platform

Data analysis in fundamental sciences nowadays is an essential process that pushes frontiers of our knowledge and leads to new discoveries. At the same time we can see that the complexity of those analyses increases fast due to a) enormous volumes of datasets being analyzed, b) the variety of techniques and algorithms one has to check inside a single analysis, c) the distributed nature of research teams that requires special communication media for knowledge and information exchange between individual researchers. There is a lot of resemblance between techniques and problems arising in the areas of industrial information retrieval and particle physics. To address those problems we propose Reproducible Experiment Platform (REP), a software infrastructure to support a collaborative ecosystem for computational science. It is a Python based solution for research teams that allows running computational experiments on shared datasets, obtaining repeatable results, and consistent comparisons of the obtained results. We present some key features of REP based on case studies which include trigger optimization and physics analysis studies at the LHCb experiment.

preprint, 2014, arXiv

New approaches for boosting to uniformity

The use of multivariate classifiers has become commonplace in particle physics. To enhance the performance, a series of classifiers is typically trained; this is a technique known as boosting. This paper explores several novel boosting methods that have been designed to produce a uniform selection efficiency in a chosen multivariate space. Such algorithms have a wide range of applications in particle physics, from producing uniform signal selection efficiency across a Dalitz-plot to avoiding the creation of false signal peaks in an invariant mass distribution when searching for new particles.
