Source author record

Nikita Kazeev

Nikita Kazeev appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Machine Learning; hep-ex; physics.ins-det; Distributed, Parallel, and Cluster Computing; physics.data-an; Systems and Control

Catalog footprint

What is connected

8 works

6 topics

4 close collaborators


preprint, 2026, arXiv

Composable Crystals: Controllable Materials Discovery via Concept Learning

De novo crystal generation, a central task in materials discovery, aims to generate crystals that are simultaneously valid, stable, unique, and novel. Existing methods mainly rely on black-box stochastic sampling, providing limited control over how generated structures move beyond the observed distribution. In this paper, we introduce a concept-based compositional framework for crystal generation. We train a vector-quantized variational autoencoder to automatically discover a shared set of reusable crystal concepts, which serve as building blocks for guided generation. These learned concepts are interpretable in terms of both local atomic environments and global symmetry patterns, and they generalize to crystals from different distributions. By recombining such concepts, our framework enables controllable exploration of novel crystals beyond the training distribution, rather than relying solely on unconstrained random sampling. To further improve composition efficiency, we introduce a composition generator and iteratively refine it using high-quality samples generated by the model itself. The resulting concept compositions are then used to condition downstream crystal generation. Numerical experiments on MP-20 and Alex-MP-20 show that composing concepts improves the base model by up to 53.2% and 51.7%, respectively, on the V.S.U.N metric, with particular gains in novelty.
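The quantization step of a vector-quantized autoencoder, which the abstract above relies on to obtain discrete concepts, can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional codebook, the encoder outputs, and the names `quantize` and `concept_ids` are all hypothetical, chosen only to show how a continuous vector is mapped to its nearest codebook entry.

```python
import math

def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`."""
    best = min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))
    return best, codebook[best]

# Hypothetical codebook: each entry stands in for one learned "concept".
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical continuous encoder outputs for three inputs.
encoder_outputs = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]

# Each output is replaced by the index of its nearest concept.
concept_ids = [quantize(v, codebook)[0] for v in encoder_outputs]
print(concept_ids)
```

In a trained model the codebook is learned jointly with the encoder and decoder; here it is fixed by hand so that the lookup itself is easy to follow.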

preprint, 2026, arXiv

Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement

De novo crystal generation seeks to discover materials that are not merely realistic, but also stable and novel. However, most existing generative models are trained to maximize the likelihood of observed crystals, which encourages samples to stay close to known materials but does not necessarily align with the criteria that matter in discovery. Through an empirical investigation, we show that current crystal generative models are caught in a pronounced stability-novelty trade-off: moving toward the observed distribution preserves stability but limits novelty, whereas moving away from it quickly destroys stability. This suggests that the useful region for discovering crystals that are both stable and novel is extremely narrow. To escape the trade-off, we introduce Crys-JEPA, a joint embedding predictive architecture for crystals that learns an energy-aware latent space preserving formation-energy differences. In this space, stability assessment can be reformulated as an embedding-based comparison against accessible training crystals, reducing the reliance on expensive energy evaluation and task-specific external references. Building on Crys-JEPA, we further develop a screening-and-refinement pipeline that identifies promising generated crystals and reintroduces them to refine the generative model. On the MP-20 and Alex-MP-20 datasets, we achieve improvements over baselines of up to 81.4% and 82.6%, respectively, on the V.S.U.N metric.

preprint, 2026, arXiv

LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models

Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of chemical space. Yet, the lack of standardized evaluation frameworks makes it challenging to evaluate, compare, and further develop these ML models meaningfully. In this work, we introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials, supported by a set of evaluation metrics designed to better inform model development and downstream applications. We release both an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models. Results reveal that an increase in stability leads to a decrease in novelty and diversity on average, with no model excelling across all dimensions. Altogether, LeMat-GenBench establishes a reproducible and extensible foundation for fair model comparison and aims to guide the development of more reliable, discovery-oriented generative models for crystalline materials.

preprint, 2020, arXiv

Muon identification for LHCb Run 3

Muon identification is of paramount importance for the physics programme of LHCb. In the upgrade phase, starting from Run 3 of the LHC, the trigger of the experiment will be based solely on software. The luminosity increase to $2\times10^{33}$ cm$^{-2}$s$^{-1}$ will require an improvement of the muon identification criteria, aiming at performance equal to or better than that of Run 2, but in a much more challenging environment. In this paper, two new muon identification algorithms developed in view of the LHCb upgrade are presented, and their performance in terms of signal efficiency versus background reduction is shown.
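The figure of merit named in the abstract above, signal efficiency versus background reduction, can be made concrete with a small sketch. The scores, labels, threshold, and the function name `efficiency_and_rejection` are hypothetical illustrations, not LHCb data or code: efficiency is taken as the fraction of true muons passing a cut on a classifier score, and rejection as the fraction of non-muons failing it.

```python
def efficiency_and_rejection(scores, is_muon, threshold):
    """Signal efficiency and background rejection for a cut `score >= threshold`."""
    signal = [s for s, m in zip(scores, is_muon) if m]
    background = [s for s, m in zip(scores, is_muon) if not m]
    efficiency = sum(s >= threshold for s in signal) / len(signal)
    rejection = sum(s < threshold for s in background) / len(background)
    return efficiency, rejection

# Hypothetical classifier scores and truth labels for six tracks.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
is_muon = [True, True, True, False, False, False]

eff, rej = efficiency_and_rejection(scores, is_muon, threshold=0.5)
print(f"efficiency={eff:.3f} rejection={rej:.3f}")
```

Scanning the threshold over its range and plotting the resulting pairs gives the usual efficiency-versus-rejection curve used to compare identification algorithms.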

preprint, 2019, arXiv

Fast Data-Driven Simulation of Cherenkov Detectors Using Generative Adversarial Networks

The increasing luminosities of future Large Hadron Collider runs and the next generation of collider experiments will require an unprecedented number of simulated events to be produced. Such large-scale productions are extremely demanding in terms of computing resources. Thus, new approaches to event generation and simulation of detector responses are needed. In LHCb, the accurate simulation of Cherenkov detectors takes a sizeable fraction of CPU time. An alternative approach is described here, in which one generates high-level reconstructed observables using a generative neural network to bypass low-level details. This network is trained to reproduce the particle species likelihood function values based on the track kinematic parameters and detector occupancy. The fast simulation is trained using real data samples collected by LHCb during Run 2. We demonstrate that this approach provides high-fidelity results.

preprint, 2019, arXiv

Machine Learning on sWeighted Data

Data analysis in high energy physics has to deal with data samples produced from different sources. One of the most widely used ways to unfold their contributions is the sPlot technique. It uses the results of a maximum likelihood fit to assign weights to events. Some weights produced by sPlot are negative by design. Negative weights make it difficult to apply machine learning methods: the loss function becomes unbounded, which leads to divergent neural network training. In this paper we propose a mathematically rigorous way to transform the weights obtained by sPlot into class probabilities conditioned on observables, thus enabling any machine learning algorithm to be applied out of the box.
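The unboundedness mentioned in the abstract above can be seen in a few lines. This is a minimal sketch with made-up numbers, assuming a per-event weighted log loss of the form -w log(p) for an event labelled as signal; the function name `weighted_log_loss` and the weight values are hypothetical and not taken from the paper.

```python
import math

def weighted_log_loss(p_signal, weight):
    """Per-event loss -w * log(p) for an event labelled as signal."""
    return -weight * math.log(p_signal)

probabilities = (1e-1, 1e-3, 1e-6)

# With a positive weight the loss grows as the prediction gets worse,
# so minimizing it pushes the prediction toward the label.
positive = [weighted_log_loss(p, 1.0) for p in probabilities]

# With a negative weight the loss decreases without bound as p -> 0,
# so an optimizer is rewarded for driving the prediction to the extreme.
negative = [weighted_log_loss(p, -0.5) for p in probabilities]

print(positive)
print(negative)
```

Because the second list has no lower bound as the probability approaches zero, gradient descent on a flexible model can diverge, which is the failure mode the abstract describes.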

preprint, 2019, arXiv

Space Navigator: a Tool for the Optimization of Collision Avoidance Maneuvers

The number of space objects will grow severalfold within a few years due to the planned launches of constellations of thousands of microsatellites. This leads to a significant increase in the threat of satellite collisions. Spacecraft must undertake collision avoidance maneuvers to mitigate the risk. According to publicly available information, conjunction events are currently handled manually by operators on Earth. Manual maneuver planning requires qualified personnel and will be impractical for constellations of thousands of satellites. In this paper we propose a new modular autonomous collision avoidance system called "Space Navigator". It is based on a novel maneuver optimization approach that combines domain knowledge with Reinforcement Learning methods.

preprint, 2015, arXiv

Event Index - an LHCb Event Search System

During LHC Run 1, the LHCb experiment recorded around $10^{11}$ collision events. This paper describes Event Index - an event search system. Its primary function is to quickly select subsets of events from a combination of conditions, such as the estimated decay channel or number of hits in a subdetector. Event Index is essentially Apache Lucene optimized for read-only indexes distributed over independent shards on independent nodes.
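The idea of selecting events by a combination of conditions over independent shards can be sketched with a toy inverted index. This is an illustration only and does not use the Apache Lucene API: the `Shard` class, the `search_all` helper, the field names `decay` and `velo_hits`, and the event records are all hypothetical.

```python
from collections import defaultdict

class Shard:
    """A toy read-only inverted index: (field, value) -> set of event ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, event_id, fields):
        for item in fields.items():
            self.postings[item].add(event_id)

    def search(self, conditions):
        # An event matches only if it appears in every condition's posting set.
        sets = [self.postings.get(item, set()) for item in conditions.items()]
        return set.intersection(*sets) if sets else set()

def search_all(shards, conditions):
    """Query every shard independently and merge the results."""
    hits = set()
    for shard in shards:
        hits |= shard.search(conditions)
    return sorted(hits)

# Hypothetical events spread over two shards by event id.
events = [
    (1, {"decay": "B2JpsiK", "velo_hits": 40}),
    (2, {"decay": "B2JpsiK", "velo_hits": 55}),
    (3, {"decay": "D2KK", "velo_hits": 40}),
    (4, {"decay": "B2JpsiK", "velo_hits": 40}),
]
shards = [Shard(), Shard()]
for event_id, fields in events:
    shards[event_id % 2].add(event_id, fields)

matches = search_all(shards, {"decay": "B2JpsiK", "velo_hits": 40})
print(matches)
```

Since no shard needs to know about any other, the per-shard queries could run in parallel on separate nodes, which is the property the abstract attributes to the read-only sharded layout.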
