$\textit{sentropy}$: A Python Package for Revealing Hidden Differences in Complex Datasets
Machine-learning datasets are typically characterized by their size and class balance. However, a richer and potentially more useful family of measures exists, termed S-entropy (similarity-sensitive entropy), which incorporates both the frequencies of elements and the similarities between them. Although these measures have been available in R and Julia for other applications, they have not been as readily available in Python, which is widely used for machine learning, and they are not easily applied to machine-learning-sized datasets without special coding considerations. To address these issues, we developed $\textit{sentropy}$, a Python package that calculates S-entropy and is tailored to large datasets. $\textit{sentropy}$ can calculate any of the frequency-sensitive measures of Hill's D-number framework as well as their similarity-sensitive counterparts. $\textit{sentropy}$ also outputs measures that compare datasets. We first briefly review S-entropy, illustrating how it incorporates elements' frequencies and their pairwise similarities. We then describe $\textit{sentropy}$'s key features and usage. We end with several examples (immunomics, metagenomics, computational pathology, and medical imaging) that illustrate $\textit{sentropy}$'s applicability across a range of dataset types and fields.
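To make concrete how frequencies and pairwise similarities combine, the following is a minimal NumPy sketch of the similarity-sensitive diversity of order $q$ in the Leinster-Cobbold formulation, $D_q^Z(p) = \left(\sum_i p_i (Zp)_i^{q-1}\right)^{1/(1-q)}$, whose logarithm is an entropy. We assume here that S-entropy corresponds to this formulation and that the similarity matrix $Z$ has entries in $[0, 1]$ with ones on the diagonal. The function name `similarity_sensitive_diversity` is hypothetical and is not the $\textit{sentropy}$ API. With $Z$ equal to the identity matrix, the sketch reduces to the ordinary frequency-only Hill numbers.

```python
import numpy as np


def similarity_sensitive_diversity(p, Z, q):
    """Hypothetical sketch of Leinster-Cobbold diversity of order q (not the sentropy API).

    p: counts or relative frequencies of each element.
    Z: similarity matrix, assumed to have entries in [0, 1] and ones on the diagonal.
    q: order (0, 1, 2, ..., or np.inf).
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                  # normalize counts to relative frequencies
    Z = np.asarray(Z, dtype=float)
    zp = Z @ p                       # (Zp)_i: how "ordinary" element i is, given its similar neighbors
    present = p > 0                  # only elements actually present contribute
    p, zp = p[present], zp[present]
    if np.isinf(q):
        return float(1.0 / zp.max())                      # limit as q -> infinity
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(zp))))    # limit as q -> 1
    return float(np.sum(p * zp ** (q - 1)) ** (1.0 / (1.0 - q)))


# Three equally frequent elements.
p = [1, 1, 1]

# Frequency-only case (identity similarity): the effective number of elements is 3.
print(similarity_sensitive_diversity(p, np.eye(3), q=1))

# Elements 0 and 1 are treated as identical, so diversity drops toward 2.
Z = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
for q in (0, 1, 2, np.inf):
    print(q, similarity_sensitive_diversity(p, Z, q))
```

In this toy example, making two of the three elements fully similar reduces the $q = 0$ diversity from 3 to 2. Higher orders weight the more abundant (merged) group more heavily, so diversity falls further, to 1.8 at $q = 2$ and 1.5 at $q = \infty$. Frequency-only measures cannot capture this kind of difference.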