Researcher profile

Konstantin T. Matchev

Konstantin T. Matchev contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging)
Verification L1
Unclaimed author
12 works
0 followers
11 topics
4 close collaborators

Published work

12 published items

Preprint · 2026 · arXiv

Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders

This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a low-dimensional data representation. We use the Atmospheric Big Challenge (ABC) database, a publicly available dataset with over 100,000 simulated exoplanet spectra, to construct an anomaly detection scenario by defining CO2-rich atmospheres as anomalies and CO2-poor atmospheres as the normal class. We benchmarked four different anomaly detection strategies: Autoencoder Reconstruction Loss, One-Class Support Vector Machine (one-class SVM), K-means Clustering, and Local Outlier Factor (LOF). Each method was evaluated in both the original spectral space and the autoencoder's latent space using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) metrics. To test the performance of the different methods under realistic conditions, we introduced Gaussian noise levels ranging from 10 to 50 ppm. Our results indicate that anomaly detection is consistently more effective when performed within the latent space across all noise levels. Specifically, K-means clustering in the latent space emerged as a stable and high-performing method. We demonstrate that this anomaly detection approach is robust to noise levels up to 30 ppm (consistent with realistic space-based observations) and remains viable even at 50 ppm when leveraging latent space representations. By contrast, the performance of the anomaly detection methods applied directly in the raw spectral space degrades significantly as the noise level increases. This suggests that autoencoder-driven dimensionality reduction offers a robust methodology for flagging chemically anomalous targets in large-scale surveys where exhaustive retrievals are computationally prohibitive.
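
The latent-space K-means step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the autoencoder has already produced latent vectors (replaced here by hypothetical synthetic 4-D Gaussian clusters), and it assumes a distance-to-nearest-centroid anomaly score, since the abstract does not state the scoring rule. The AUC is computed as the probability that a randomly chosen anomaly outscores a randomly chosen normal object.

```python
import numpy as np

def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids

def anomaly_score(z, centroids):
    """Assumed score: Euclidean distance to the nearest normal-class centroid."""
    dist = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
    return dist.min(axis=1)

def roc_auc(scores_normal, scores_anomalous):
    """AUC = P(anomaly score > normal score), counting ties as 1/2."""
    diff = scores_anomalous[:, None] - scores_normal[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

# Hypothetical latent codes: the normal class forms two clusters, anomalies sit between them.
rng = np.random.default_rng(1)
z_train = np.vstack([rng.normal(0.0, 0.3, (200, 4)), rng.normal(3.0, 0.3, (200, 4))])
z_normal = np.vstack([rng.normal(0.0, 0.3, (100, 4)), rng.normal(3.0, 0.3, (100, 4))])
z_anomalous = rng.normal(1.5, 0.3, (100, 4))

centroids = kmeans(z_train, k=2)
auc = roc_auc(anomaly_score(z_normal, centroids), anomaly_score(z_anomalous, centroids))
print(f"latent-space AUC: {auc:.3f}")
```

K-means is fitted only on normal-class examples here, so an object far from every learned centroid is flagged; with real data the number of clusters and the noise level would need to be tuned.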

Preprint · 2023 · arXiv

Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras from First Principles

We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset. We use fully connected neural networks to model the symmetry transformations and the corresponding generators. We construct loss functions that ensure that the applied transformations are symmetries and that the corresponding set of generators forms a closed (sub)algebra. Our procedure is validated with several examples illustrating different types of conserved quantities preserved by symmetry. In the process of deriving the full set of symmetries, we analyze the complete subgroup structure of the rotation groups $SO(2)$, $SO(3)$, and $SO(4)$, and of the Lorentz group $SO(1,3)$. Other examples include squeeze mapping, piecewise discontinuous labels, and $SO(10)$, demonstrating that our method is completely general, with many possible applications in physics and data science. Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
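
The closure condition on the generators can be illustrated with fixed matrices. This is a hypothetical sketch, not the paper's neural-network loss: the generators are given directly rather than learned, and the loss is assumed to be the squared residual of each commutator after least-squares projection onto the span of the generators. It vanishes exactly when the span is closed under the Lie bracket.

```python
import numpy as np

def commutator(a, b):
    """Lie bracket of two matrix generators."""
    return a @ b - b @ a

def closure_loss(generators):
    """Squared residual of each commutator [G_i, G_j] after least-squares
    projection onto span{G_1, ..., G_k}; zero iff the span is closed under the bracket."""
    basis = np.stack([g.ravel() for g in generators], axis=1)  # columns = flattened generators
    loss = 0.0
    for i in range(len(generators)):
        for j in range(i + 1, len(generators)):
            c = commutator(generators[i], generators[j]).ravel()
            coeffs, *_ = np.linalg.lstsq(basis, c, rcond=None)
            loss += float(np.sum((basis @ coeffs - c) ** 2))
    return loss

# Standard so(3) generators (infinitesimal rotations about x, y, z).
Lx = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
Ly = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
Lz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)

print(closure_loss([Lx, Ly, Lz]))  # full algebra: closes, loss ~ 0
print(closure_loss([Lz]))          # one-dimensional subalgebra (SO(2) rotations): loss 0
print(closure_loss([Lx, Ly]))      # [Lx, Ly] = Lz lies outside the span: loss 2
```

In a learned setting, a term like this would be added to a symmetry-preserving loss so that gradient descent favours generator sets forming a (sub)algebra.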

Preprint · 2022 · arXiv

Kinematic Variables and Feature Engineering for Particle Phenomenology

Kinematic variables have played an important role in collider phenomenology: they expedite discoveries of new particles by separating signal events from unwanted background events, and they allow measurements of particle properties such as masses, couplings, and spins. Over the past 10 years, an enormous number of kinematic variables have been designed and proposed, primarily for the experiments at the Large Hadron Collider, allowing for a drastic reduction of high-dimensional experimental data to lower-dimensional observables, from which one can readily extract underlying features of phase space and develop better-optimized data-analysis strategies. We review these recent developments in the area of phase space kinematics, summarizing the new kinematic variables with important phenomenological implications and physics applications. We also review recently proposed analysis methods and techniques specifically designed to leverage the new kinematic variables. As machine learning is nowadays percolating through many fields of particle physics, including collider phenomenology, we discuss the interconnection and mutual complementarity of kinematic variables and machine learning techniques. We finally discuss how the utilization of kinematic variables originally developed for colliders can be extended to other high-energy physics experiments, including neutrino experiments.

Preprint · 2022 · arXiv

Uncertainties associated with GAN-generated datasets in high energy physics

Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this paper we point out that data generated by a GAN cannot statistically be better than the data it was trained on, and critically examine the applicability of GANs in various situations, including a) replacing the entire Monte Carlo pipeline or parts of it, and b) producing datasets for use in highly sensitive or sub-optimal analyses. We present our arguments using information-theoretic demonstrations, a toy example, and a formal statement, and identify some potentially valid uses of GANs in collider simulations.
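
A toy illustration of why generated data cannot beat its training data, under a strong simplifying assumption: the "generator" here is a Gaussian fitted to the training sample, not a GAN. Whatever the number of generated samples, the error of the mean estimated from them stays at the level set by the training sample (about 1/n_train for a unit-variance Gaussian) instead of shrinking like 1/n_generated.

```python
import numpy as np

def mean_estimation_mse(n_train, n_generated, n_trials=2000, seed=0):
    """Mean squared error of the estimated mean of N(0, 1), using either the
    n_train original samples or n_generated samples from a model fitted to them."""
    rng = np.random.default_rng(seed)
    err_train = np.empty(n_trials)
    err_generated = np.empty(n_trials)
    for t in range(n_trials):
        train = rng.normal(0.0, 1.0, n_train)
        # Stand-in "generator": a Gaussian fitted to the training sample (not a GAN).
        mu_hat, sigma_hat = train.mean(), train.std(ddof=1)
        generated = rng.normal(mu_hat, sigma_hat, n_generated)
        err_train[t] = mu_hat ** 2              # true mean is 0
        err_generated[t] = generated.mean() ** 2
    return err_train.mean(), err_generated.mean()

mse_train, mse_generated = mean_estimation_mse(n_train=20, n_generated=10_000)
print(mse_train, mse_generated)  # both close to 1/20; more generated samples do not help
```

The fitted model inherits the statistical error of its 20 training points, so sampling 10,000 events from it only adds a small amount of extra noise on top.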

Preprint · 2022 · arXiv

Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra

Transit spectroscopy is a powerful tool to decode the chemical composition of the atmospheres of extrasolar planets. In this paper we focus on unsupervised techniques for analyzing spectral data from transiting exoplanets. We demonstrate methods for i) cleaning and validating the data, ii) initial exploratory data analysis based on summary statistics (estimates of location and variability), iii) exploring and quantifying the existing correlations in the data, iv) pre-processing and linearly transforming the data to its principal components, v) dimensionality reduction and manifold learning, vi) clustering and anomaly detection, vii) visualization and interpretation of the data. To illustrate the proposed unsupervised methodology, we use a well-known public benchmark data set of synthetic transit spectra. We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations. We explore a number of different techniques for such dimensionality reduction and identify several suitable options in terms of summary statistics, principal components, etc. We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes of the underlying atmospheres. We demonstrate that those branches can be successfully recovered with a K-means clustering algorithm in a fully unsupervised fashion. We advocate for a three-dimensional representation of the spectroscopic data in terms of the first three principal components, in order to reveal the existing structure in the data and quickly characterize the chemical class of a planet.
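
The three-principal-component representation can be sketched with a plain SVD. This is a minimal example on hypothetical synthetic data, not the benchmark spectra used in the paper: 500 "spectra" over 50 wavelength bins are generated from 3 hidden parameters plus small noise, so the bins are strongly correlated and almost all variance falls into the first three components.

```python
import numpy as np

def pca(x, n_components=3):
    """PCA via SVD of the centred data matrix.
    Returns the scores in the leading n_components and the explained-variance ratio of every component."""
    xc = x - x.mean(axis=0)                      # centre each wavelength bin
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = xc @ vt[:n_components].T            # coordinates along the leading principal axes
    return scores, explained

# Hypothetical stand-in for transit spectra: 3 hidden parameters drive 50 correlated bins.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(500, 3))
spectra = hidden @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(500, 50))

scores, explained = pca(spectra, n_components=3)
print(scores.shape, explained[:3].sum())
```

The resulting (500, 3) score matrix is the kind of low-dimensional representation in which a K-means clustering step could then look for chemically distinct branches.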

Preprint · 2021 · arXiv

Finding Wombling Boundaries in LHC Data with Voronoi and Delaunay Tessellations

We address the problem of finding a wombling boundary in point data generated by a general Poisson point process, a specific example of which is an LHC event sample distributed in the phase space of a final state signature, with the wombling boundary created by some new physics. We discuss the use of Voronoi and Delaunay tessellations of the point data for estimating the local gradients and investigate methods for sharpening the boundaries by reducing the statistical noise. The outcome from traditional wombling algorithms is a set of boundary cell candidates with relatively large gradients, whose spatial properties must then be scrutinized in order to construct the boundary and evaluate its significance. Here we propose an alternative approach where we simultaneously form and evaluate the significance of all possible boundaries in terms of the total gradient flux. We illustrate our method with several toy examples of both straight and curved boundaries with varying amounts of signal present in the data.

Preprint · 2021 · arXiv

Superfluid Effective Field Theory for Dark Matter Direct Detection

We develop an effective field theory (EFT) framework for superfluid ${}^4$He to model the interactions among quasiparticles, helium atoms and probe particles. Our effective field theory approach brings together symmetry arguments and power-counting and matches to classical fluid dynamics. We then present the decay and scattering rates for the relevant processes involving quasiparticles and helium atoms. The presented EFT framework and results can be used to understand the dynamics of thermalization in the superfluid, and can be further applied to sub-GeV dark matter direct detection with superfluid ${}^4$He.

Preprint · 2020 · arXiv

InClass Nets: Independent Classifier Networks for Nonparametric Estimation of Conditional Independence Mixture Models and Unsupervised Classification

We introduce a new machine-learning-based approach, which we call the Independent Classifier networks (InClass nets) technique, for the nonparametric estimation of conditional independence mixture models (CIMMs). We approach the estimation of a CIMM as a multi-class classification problem, since dividing the dataset into different categories naturally leads to the estimation of the mixture model. InClass nets consist of multiple independent classifier neural networks (NNs), each of which handles one of the variates of the CIMM. Fitting the CIMM to the data is performed by simultaneously training the individual NNs using suitable cost functions. The ability of NNs to approximate arbitrary functions makes our technique nonparametric. Further leveraging the power of NNs, we allow the conditionally independent variates of the model to be individually high-dimensional, which is the main advantage of our technique over existing non-machine-learning-based approaches. We derive some new results on the nonparametric identifiability of bivariate CIMMs, in the form of a necessary and a (different) sufficient condition for a bivariate CIMM to be identifiable. We provide a public implementation of InClass nets as a Python package called RainDancesVI and validate our InClass nets technique with several worked-out examples. Our method also has applications in unsupervised and semi-supervised classification problems.

Preprint · 2020 · arXiv

OASIS: Optimal Analysis-Specific Importance Sampling for event generation

We propose a technique called Optimal Analysis-Specific Importance Sampling (OASIS) to reduce the number of simulated events required for a high-energy experimental analysis to reach a target sensitivity. We provide recipes to obtain the optimal sampling distributions which preferentially focus the event generation on the regions of phase space with high utility to the experimental analyses. OASIS leads to a conservation of resources at all stages of the Monte Carlo pipeline, including full-detector simulation, and is complementary to approaches that seek to speed up the simulation pipeline.
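
The underlying idea of focusing event generation on a high-utility region and reweighting can be shown with generic importance sampling on a 1-D toy. This sketch does not reproduce the OASIS recipe for the optimal distribution: the "analysis region" is assumed to be the tail x > 3 of a standard normal, and the proposal is simply a normal shifted onto that region. Each sampled event carries the weight p(x)/q(x), which keeps the estimate unbiased.

```python
import math
import numpy as np

def normal_pdf(x, mu=0.0):
    """Unit-variance normal density."""
    return np.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def tail_probability_is(threshold, n, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), by sampling from N(threshold, 1)
    and reweighting each event by p(x) / q(x)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(threshold, 1.0, n)                      # proposal q concentrates on the region of interest
    weights = normal_pdf(x) / normal_pdf(x, mu=threshold)  # importance weights p(x) / q(x)
    return float(np.mean(weights * (x > threshold)))

def tail_probability_plain(threshold, n, seed=0):
    """Baseline: unweighted sampling from the original distribution."""
    rng = np.random.default_rng(seed)
    return float(np.mean(rng.normal(0.0, 1.0, n) > threshold))

exact = 0.5 * math.erfc(3 / math.sqrt(2))  # exact ≈ 1.35e-3
print(tail_probability_is(3.0, 10_000), tail_probability_plain(3.0, 10_000), exact)
```

With the same number of events, the reweighted estimate scatters roughly an order of magnitude less than plain sampling, because almost every generated event lands in the region that matters for the measurement.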

Preprint · 2020 · arXiv

Singularity Variables for Missing Energy Event Kinematics

We discuss singularity variables which are properly suited for analyzing the kinematics of events with missing transverse energy at the LHC. We consider six of the simplest event topologies encountered in studies of leptonic W-bosons and top quarks, as well as in SUSY-like searches for new physics with dark matter particles. In each case, we illustrate the general prescription for finding the relevant singularity variable, which in turn helps delineate the visible parameter subspace on which the singularities are located. Our results can be used in two different ways: first, as a guide for targeting the signal-rich regions of parameter space during the stage of discovery, and second, as a sensitive focus point method for measuring the particle mass spectrum after the initial discovery.

Preprint · 2019 · arXiv

Dreaming Awake: Disentangling the Underlying Physics in Case of a SUSY-like Discovery at the LHC

The purpose of this review is to investigate what kind of physics can be extracted at the LHC, assuming a discovery is made in events with missing transverse momentum, as generically expected in supersymmetry (SUSY) with R-parity conservation. To set the scene, we first discuss the collider phenomenology of the six possible electroweakino benchmark scenarios, as they provide valuable insight into what one might be facing at the LHC. We review the existing methods for mass reconstruction from measured kinematic endpoints in the distributions of suitable variables, e.g., the invariant masses of various sets of visible decay products, as well as the $M_{T2}$ and the $M_2$ types of variables. We propose to extend the application of these methods to the various topologies of fully hadronic final states, possibly with hadronically reconstructed massive bosons (W, Z or h). We test the idea with a simplified simulation of events in the main electroweakino benchmark scenarios. We find that the fully hadronic events allow the complete determination of the relevant mass spectrum. For comparison, we also review the potential of the standard kinematic endpoint methods for final states involving leptons from the decays of (on-shell or off-shell) sleptons. We find that with 300 fb$^{-1}$, the statistics for the leptonic events are very marginal and they look less promising than the fully hadronic channels. This corresponds to a complete reversal of the usual paradigm, where leptonic events comprised the gold-plated SUSY channels. Finally, we put together all available information and summarize what level of understanding of the underlying physics can be achieved. We show that, as a by-product of the mass reconstruction, it is also possible to determine the production cross sections and decay branching ratios, which in turn enable us to pinpoint the underlying model.

Preprint · 2019 · arXiv

Kinematic Focus Point Method for Particle Mass Measurements in Missing Energy Events

We investigate the solvability of the event kinematics in missing energy events at hadron colliders, as a function of the particle mass ansatz. To be specific, we reconstruct the neutrino momenta in dilepton $t\bar{t}$-like events, without assuming any prior knowledge of the mass spectrum. We identify a class of events, which we call extreme events, with the property that the kinematic boundary of their allowed region in mass parameter space passes through the true mass point. We develop techniques for recognizing extreme events in the data and demonstrate that they are abundant in a realistic data sample, due to expected singularities in phase space. We propose a new method for mass measurement whereby we obtain the true values of the mass parameters as the focus point of the kinematic boundaries for all events in the data sample. Since the masses are determined from a relatively sharp peak structure (the density of kinematic boundary curves), the method avoids some of the systematic errors associated with other techniques. We show that this new approach is complementary to previously considered methods in the literature where one studies the solvability of the kinematic constraints throughout the mass parameter space. In particular, we identify a problematic direction in mass space of nearly 100% solvability, and then show that the focus point method is effective in lifting the degeneracy.