Researcher profile

Alexander Roman

Alexander Roman contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders

This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a low-dimensional data representation. We use the Atmospheric Big Challenge (ABC) database, a publicly available dataset with over 100,000 simulated exoplanet spectra, to construct an anomaly detection scenario by defining CO2-rich atmospheres as anomalies and CO2-poor atmospheres as the normal class. We benchmarked four different anomaly detection strategies: Autoencoder Reconstruction Loss, One-Class Support Vector Machine (1 class-SVM), K-means Clustering, and Local Outlier Factor (LOF). Each method was evaluated in both the original spectral space and the autoencoder's latent space using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) metrics. To test the performance of the different methods under realistic conditions, we introduced Gaussian noise levels ranging from 10 to 50 ppm. Our results indicate that anomaly detection is consistently more effective when performed within the latent space across all noise levels. Specifically, K-means clustering in the latent space emerged as a stable and high-performing method. We demonstrate that this anomaly detection approach is robust to noise levels up to 30 ppm (consistent with realistic space-based observations) and remains viable even at 50 ppm when leveraging latent space representations. On the other hand, the performance of the anomaly detection methods applied directly in the raw spectral space degrades significantly with increasing the level of noise. This suggests that autoencoder-driven dimensionality reduction offers a robust methodology for flagging chemically anomalous targets in large-scale surveys where exhaustive retrievals are computationally prohibitive.

preprint2026arXiv

Orchestral AI: A Framework for Agent Orchestration

The rapid proliferation of LLM agent frameworks has forced developers to choose between vendor lock-in through provider-specific SDKs and complex multi-package ecosystems that obscure control flow and hinder reproducibility. Integrating tool calling across multiple LLM providers remains a core engineering challenge due to fragmented APIs, incompatible message formats, and inconsistent streaming and tool-calling behavior, making it difficult to build portable, reliable agent systems. We introduce Orchestral, a lightweight Python framework that provides a unified, type-safe interface for building LLM agents across major providers while preserving the simplicity required for scientific computing and production deployment. Orchestral defines a single universal representation for messages, tools, and LLM usage that operates seamlessly across providers, eliminating manual format translation and reducing framework-induced complexity. Automatic tool schema generation from Python type hints removes the need for handwritten descriptors while maintaining type safety across provider boundaries. A synchronous execution model with streaming support enables deterministic behavior, straightforward debugging, and real-time interaction without introducing server dependencies. The framework's modular architecture cleanly separates provider integration, tool execution, conversation orchestration, and user-facing interfaces, enabling extensibility without architectural entanglement. Orchestral supports advanced agent capabilities found in larger frameworks, including rich tool calling, context compaction, workspace sandboxing, user approval workflows, sub-agents, memory management, and MCP integration.

preprint2023arXiv

Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras from First Principles

We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset. We use fully connected neural networks to model the symmetry transformations and the corresponding generators. We construct loss functions that ensure that the applied transformations are symmetries and that the corresponding set of generators forms a closed (sub)algebra. Our procedure is validated with several examples illustrating different types of conserved quantities preserved by symmetry. In the process of deriving the full set of symmetries, we analyze the complete subgroup structure of the rotation groups $SO(2)$, $SO(3)$, and $SO(4)$, and of the Lorentz group $SO(1,3)$. Other examples include squeeze mapping, piecewise discontinuous labels, and $SO(10)$, demonstrating that our method is completely general, with many possible applications in physics and data science. Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.

preprint2022arXiv

Uncertainties associated with GAN-generated datasets in high energy physics

Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this paper we point out that data generated by a GAN cannot statistically be better than the data it was trained on, and critically examine the applicability of GANs in various situations, including a) for replacing the entire Monte Carlo pipeline or parts of it, and b) to produce datasets for usage in highly sensitive analyses or sub-optimal ones. We present our arguments using information theoretic demonstrations, a toy example, as well as in the form of a formal statement, and identify some potential valid uses of GANs in collider simulations.

preprint2022arXiv

Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra

Transit spectroscopy is a powerful tool to decode the chemical composition of the atmospheres of extrasolar planets. In this paper we focus on unsupervised techniques for analyzing spectral data from transiting exoplanets. We demonstrate methods for i) cleaning and validating the data, ii) initial exploratory data analysis based on summary statistics (estimates of location and variability), iii) exploring and quantifying the existing correlations in the data, iv) pre-processing and linearly transforming the data to its principal components, v) dimensionality reduction and manifold learning, vi) clustering and anomaly detection, vii) visualization and interpretation of the data. To illustrate the proposed unsupervised methodology, we use a well-known public benchmark data set of synthetic transit spectra. We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations. We explore a number of different techniques for such dimensionality reduction and identify several suitable options in terms of summary statistics, principal components, etc. We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes of the underlying atmospheres. We demonstrate that those branches can be successfully recovered with a K-means clustering algorithm in fully unsupervised fashion. We advocate for a three-dimensional representation of the spectroscopic data in terms of the first three principal components, in order to reveal the existing structure in the data and quickly characterize the chemical class of a planet.

preprint2021arXiv

Finding Wombling Boundaries in LHC Data with Voronoi and Delaunay Tessellations

We address the problem of finding a wombling boundary in point data generated by a general Poisson point process, a specific example of which is an LHC event sample distributed in the phase space of a final state signature, with the wombling boundary created by some new physics. We discuss the use of Voronoi and Delaunay tessellations of the point data for estimating the local gradients and investigate methods for sharpening the boundaries by reducing the statistical noise. The outcome from traditional wombling algorithms is a set of boundary cell candidates with relatively large gradients, whose spatial properties must then be scrutinized in order to construct the boundary and evaluate its significance. Here we propose an alternative approach where we simultaneously form and evaluate the significance of all possible boundaries in terms of the total gradient flux. We illustrate our method with several toy examples of both straight and curved boundaries with varying amounts of signal present in the data.