Diagnosing the Effects of Spectroscopic Training Set Imperfection on Photometric Redshift Performance
Most LSST extragalactic science will rely on photometric redshifts (photo-$z$) to extract distance information for galaxies. However, an incomplete or non-representative training set can bias photo-$z$ estimates. It is therefore necessary to understand how various forms of training set imperfection, such as incompleteness and non-trivial spectroscopic target selection, affect photo-$z$ estimation algorithms, and to identify the metrics best suited to quantifying the impact. This work systematically studies metrics for diagnosing how various photo-$z$ methods respond to specific types of training set incompleteness and non-representativeness. We use methods available through the open-source Python library Redshift Assessment Infrastructure Layers (RAIL) to test the algorithms CMNN, GPz, FlexZBoost, and PZFlow on mock training data degraded to emulate several existing spectroscopic sky surveys, as well as under conditions of inverse redshift incompleteness, which approximately mimics observed patterns of incompleteness at high redshift. We employ the algorithm TrainZ as a control. Finally, we quantify photo-$z$ algorithm performance using a variety of statistical metrics implemented externally to RAIL. We determine that the Kullback-Leibler Divergence, Wasserstein Distance, and Probability Integral Transform are particularly informative metrics with which to assess the impact of training set imperfection on algorithmic performance. We also find that inverse redshift incompleteness effects alone lack the complexity to realistically represent anticipated training data.
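
To make the three highlighted metrics concrete, the following is a minimal NumPy sketch of how each could be evaluated on gridded redshift distributions. It is not the implementation used in this work and does not use the RAIL API. The function names (`gaussian`, `cdf_on_grid`, `kl_divergence`, `wasserstein_1d`, `pit_values`), the uniform redshift grid, and the toy Gaussian distributions are all assumptions made purely for illustration.

```python
import numpy as np

def gaussian(z, mu, sigma):
    """Toy Gaussian PDF, used here only to fabricate illustrative inputs."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def cdf_on_grid(pdf, dz):
    """Cumulative trapezoid integral along the last axis, rescaled to end at 1."""
    pdf = np.atleast_2d(pdf)
    steps = 0.5 * (pdf[:, 1:] + pdf[:, :-1]) * dz
    zeros = np.zeros((pdf.shape[0], 1))
    cdf = np.concatenate([zeros, np.cumsum(steps, axis=1)], axis=1)
    return cdf / cdf[:, -1:]

def kl_divergence(p, q, dz, eps=1e-12):
    """D_KL(p || q) for two PDFs sampled on the same uniform grid."""
    p = p / (p.sum() * dz)
    q = q / (q.sum() * dz)
    mask = p > eps  # bins where p vanishes contribute nothing
    ratio = p[mask] / np.maximum(q[mask], eps)
    return float(np.sum(p[mask] * np.log(ratio)) * dz)

def wasserstein_1d(p, q, dz):
    """First Wasserstein distance: integral over z of |CDF_p - CDF_q|."""
    cdf_p = cdf_on_grid(p, dz)[0]
    cdf_q = cdf_on_grid(q, dz)[0]
    return float(np.sum(np.abs(cdf_p - cdf_q)) * dz)

def pit_values(pdfs, grid, z_true):
    """PIT: each galaxy's estimated CDF evaluated at its true redshift."""
    dz = grid[1] - grid[0]
    cdfs = cdf_on_grid(pdfs, dz)
    return np.array([np.interp(z, grid, cdf) for z, cdf in zip(z_true, cdfs)])

# Assumed toy setup: a uniform redshift grid and Gaussian distributions.
grid = np.linspace(0.0, 3.0, 301)
dz = grid[1] - grid[0]

# Ensemble-level comparison: a "true" n(z) versus a shifted estimate.
nz_true = gaussian(grid, 1.0, 0.2)
nz_est = gaussian(grid, 1.1, 0.2)
kl = kl_divergence(nz_true, nz_est, dz)
w1 = wasserstein_1d(nz_true, nz_est, dz)

# Per-galaxy comparison: well-calibrated PDFs should give uniform PIT values.
rng = np.random.default_rng(0)
mu = rng.uniform(0.6, 2.4, size=2000)
z_true = mu + 0.1 * rng.standard_normal(2000)
pdfs = gaussian(grid[None, :], mu[:, None], 0.1)
pit = pit_values(pdfs, grid, z_true)

print(f"KL = {kl:.3f}, W1 = {w1:.3f}, mean PIT = {pit.mean():.3f}")
```

For two Gaussians of equal width $\sigma$ whose means differ by $\Delta\mu$, the analytic values are $\Delta\mu$ for the Wasserstein distance and $\Delta\mu^2 / (2\sigma^2)$ for the Kullback-Leibler divergence, so the toy numbers above should land close to 0.1 and 0.125. A PIT histogram that departs from uniformity indicates photo-$z$ PDFs that are biased, too broad, or too narrow.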