Source author record

Katia Matcheva

Katia Matcheva appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

astro-ph.EP, Machine Learning, physics.data-an, astro-ph.IM, hep-ph

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders

This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a low-dimensional data representation. We use the Atmospheric Big Challenge (ABC) database, a publicly available dataset with over 100,000 simulated exoplanet spectra, to construct an anomaly detection scenario by defining CO2-rich atmospheres as anomalies and CO2-poor atmospheres as the normal class. We benchmarked four different anomaly detection strategies: Autoencoder Reconstruction Loss, One-Class Support Vector Machine (One-Class SVM), K-means Clustering, and Local Outlier Factor (LOF). Each method was evaluated in both the original spectral space and the autoencoder's latent space using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) metrics. To test the performance of the different methods under realistic conditions, we introduced Gaussian noise levels ranging from 10 to 50 ppm. Our results indicate that anomaly detection is consistently more effective when performed within the latent space across all noise levels. Specifically, K-means clustering in the latent space emerged as a stable and high-performing method. We demonstrate that this anomaly detection approach is robust to noise levels up to 30 ppm (consistent with realistic space-based observations) and remains viable even at 50 ppm when leveraging latent space representations. On the other hand, the performance of the anomaly detection methods applied directly in the raw spectral space degrades significantly as the noise level increases. This suggests that autoencoder-driven dimensionality reduction offers a robust methodology for flagging chemically anomalous targets in large-scale surveys where exhaustive retrievals are computationally prohibitive.
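The abstract does not say how K-means is turned into an anomaly score, so the following is only a minimal sketch of one common choice: fit centroids on normal-class vectors, then score each test vector by its distance to the nearest centroid and summarize with ROC AUC. The toy 4-dimensional Gaussian "latent vectors", the cluster count `k=3`, and the helper names `kmeans`, `anomaly_score`, and `roc_auc` are all assumptions made for illustration; no autoencoder or ABC data is involved.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid as-is
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def anomaly_score(X, centroids):
    """Distance to the nearest centroid; larger means more anomalous."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1)

def roc_auc(scores, is_anomaly):
    """AUC via the rank-sum (Mann-Whitney U) identity; assumes no tied scores."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = is_anomaly.sum()
    n_neg = len(scores) - n_pos
    return (ranks[is_anomaly].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy stand-in for latent vectors (assumption): the normal class sits near the
# origin, the anomalous class is shifted by 4 in every dimension.
rng = np.random.default_rng(0)
train_normal = rng.normal(0.0, 1.0, size=(500, 4))
test_normal = rng.normal(0.0, 1.0, size=(200, 4))
test_anomalous = rng.normal(4.0, 1.0, size=(50, 4))

centroids = kmeans(train_normal, k=3)  # fit on normal-class data only
X_test = np.vstack([test_normal, test_anomalous])
is_anomaly = np.concatenate([np.zeros(200, dtype=bool), np.ones(50, dtype=bool)])
auc = roc_auc(anomaly_score(X_test, centroids), is_anomaly)
print(f"AUC = {auc:.3f}")
```

Adding Gaussian noise to `X_test` before scoring and recomputing `auc` would mimic, in a very loose way, the noise sweep described in the abstract.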

preprint · 2023 · arXiv

Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras from First Principles

We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset. We use fully connected neural networks to model the symmetry transformations and the corresponding generators. We construct loss functions that ensure that the applied transformations are symmetries and that the corresponding set of generators forms a closed (sub)algebra. Our procedure is validated with several examples illustrating different types of conserved quantities preserved by symmetry. In the process of deriving the full set of symmetries, we analyze the complete subgroup structure of the rotation groups $SO(2)$, $SO(3)$, and $SO(4)$, and of the Lorentz group $SO(1,3)$. Other examples include squeeze mapping, piecewise discontinuous labels, and $SO(10)$, demonstrating that our method is completely general, with many possible applications in physics and data science. Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
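To make the invariance condition concrete, here is a small sketch for the simplest case, $SO(2)$. It is not the paper's method: instead of neural networks and learned loss minimization, it swaps in a linear null-space solve, which works here because the first-order invariance condition is linear in the generator. The oracle `phi(x) = |x|^2`, the sample size, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # sample points in the plane (assumed dataset)

# Oracle phi(x) = |x|^2 has gradient 2x, so first-order invariance under
# x -> x + eps * G x requires  grad_phi(x) . (G x) = 2 x^T G x = 0  for all x.
# This is linear in the 4 entries of G: row n holds the coefficients 2 x_i x_j.
A = 2.0 * np.einsum("ni,nj->nij", X, X).reshape(len(X), 4)

# The right singular vector with the smallest singular value spans the null
# space of A, i.e. the symmetry generator up to sign and overall scale.
_, sing_vals, Vt = np.linalg.svd(A)
G = Vt[-1].reshape(2, 2)

print("smallest singular value:", sing_vals[-1])
print("recovered generator:\n", G)
```

The recovered `G` should be antisymmetric, which is the familiar rotation generator of $SO(2)$ up to normalization. For larger groups the null space has more than one dimension, and a closure condition on the commutators of the generators, as the abstract describes, becomes necessary.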

preprint · 2022 · arXiv

Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra

Transit spectroscopy is a powerful tool to decode the chemical composition of the atmospheres of extrasolar planets. In this paper we focus on unsupervised techniques for analyzing spectral data from transiting exoplanets. We demonstrate methods for i) cleaning and validating the data, ii) initial exploratory data analysis based on summary statistics (estimates of location and variability), iii) exploring and quantifying the existing correlations in the data, iv) pre-processing and linearly transforming the data to its principal components, v) dimensionality reduction and manifold learning, vi) clustering and anomaly detection, and vii) visualization and interpretation of the data. To illustrate the proposed unsupervised methodology, we use a well-known public benchmark data set of synthetic transit spectra. We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations. We explore a number of different techniques for such dimensionality reduction and identify several suitable options in terms of summary statistics, principal components, etc. We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes of the underlying atmospheres. We demonstrate that those branches can be successfully recovered with a K-means clustering algorithm in a fully unsupervised fashion. We advocate for a three-dimensional representation of the spectroscopic data in terms of the first three principal components, in order to reveal the existing structure in the data and quickly characterize the chemical class of a planet.
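The projection onto the first three principal components can be sketched with a plain SVD. The data below are not the benchmark spectra: they are synthetic "spectra" built from three hidden factors plus small noise, an assumption made so that the strong correlation between wavelength bins is present by construction. The array sizes and variable names are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_spectra, n_bins, n_factors = 300, 50, 3

# Synthetic, highly correlated "spectra": 3 hidden factors mixed into 50 bins.
latent = rng.normal(size=(n_spectra, n_factors))
mixing = rng.normal(size=(n_factors, n_bins))
spectra = latent @ mixing + 0.01 * rng.normal(size=(n_spectra, n_bins))

# PCA via SVD of the mean-centered data matrix.
centered = spectra - spectra.mean(axis=0)
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)  # fraction of variance per component

# Three-dimensional representation: coordinates along the first three PCs.
scores_3d = centered @ Vt[:3].T

print("variance captured by 3 components:", explained[:3].sum())
print("shape of 3D representation:", scores_3d.shape)
```

A clustering step such as K-means would then run on `scores_3d` rather than on the full 50-bin spectra.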

preprint · 2010 · arXiv

The 1998 November 14 Occultation of GSC 0622-00345 by Saturn. II. Stratospheric Thermal Profile, Power Spectrum, and Gravity Waves

On 1998 November 14, Saturn and its rings occulted the star GSC 0622-00345. The occultation latitude was 55.5 degrees S. This paper analyzes the 2.3 μm light curve derived by Harrington & French. A fixed-baseline isothermal fit to the light curve has a temperature of 140 +/- 3 K, assuming a mean molecular mass of 2.35 AMU. The thermal profile obtained by numerical inversion is valid between 1 and 60 μbar. The vertical temperature gradient is >0.2 K/km more stable than the adiabatic lapse rate, but it still shows the alternating-rounded-spiked features seen in many temperature gradient profiles from other atmospheric occultations and usually attributed to breaking gravity (buoyancy) waves. We conduct a wavelet analysis of the thermal profile, and show that, even with our low level of noise, scintillation due to turbulence in Earth's atmosphere can produce large temperature swings in light-curve inversions. Spurious periodic features in the "reliable" region of a wavelet amplitude spectrum can exceed 0.3 K in our data. We also show that gravity-wave model fits to noisy isothermal light curves can lead to convincing wave "detections". We provide new significance tests for localized wavelet amplitudes, wave model fits, and global power spectra of inverted occultation light curves by assessing the effects of pre- and post-occultation noise on these parameters. Based on these tests, we detect several significant ridges and isolated peaks in wavelet amplitude, to which we fit a gravity wave model. We also strongly detect the global power spectrum of thermal fluctuations in Saturn's atmosphere, which resembles the "universal" (modified Desaubies) curve associated with saturated spectra of propagating gravity waves on Earth and Jupiter.
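An isothermal fit ties temperature to the pressure scale height through $H = k_B T / (\mu m_u g)$, which is why the quoted temperature depends on the assumed mean molecular mass. The sketch below evaluates that relation for the values in the abstract (140 ± 3 K, 2.35 AMU). The gravitational acceleration `g = 10.0` m/s² is a round illustrative number, not a value taken from the paper, so the resulting scale height is only indicative.

```python
# Physical constants (SI, CODATA 2018).
K_B = 1.380649e-23       # Boltzmann constant, J/K
M_U = 1.66053906660e-27  # atomic mass unit, kg

T = 140.0      # isothermal-fit temperature from the abstract, K
SIGMA_T = 3.0  # quoted uncertainty on T, K
MU = 2.35      # mean molecular mass from the abstract, AMU
G_ACCEL = 10.0 # ASSUMED local gravity, m/s^2 (illustrative, not from the paper)

# Pressure scale height of an isothermal atmosphere.
H = K_B * T / (MU * M_U * G_ACCEL)

# H is linear in T, so the relative uncertainty carries over directly.
sigma_H = H * SIGMA_T / T

print(f"H = {H / 1e3:.1f} +/- {sigma_H / 1e3:.1f} km (for the assumed g)")
```

Because `H` scales as `T / (MU * G_ACCEL)`, a different assumed gravity or molecular mass rescales the inferred temperature for the same fitted scale height.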