Data Twinning

In this work, we develop a method, named Twinning, for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model-independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning can also be used to generate multiple splits of a given dataset to aid divide-and-conquer procedures and $k$-fold cross-validation.
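The abstract does not spell out the procedure, so the following is only a rough sketch of what partitioning a dataset into two similar sets can look like. It assumes a simple greedy nearest-neighbour heuristic, which is not necessarily the authors' algorithm: take a point, group it with its r-1 nearest remaining neighbours, send the point to the smaller set and the neighbours to the larger one, then continue from the point closest to the last group. The function name `greedy_twin_split` and the parameter `r` are hypothetical, introduced here for illustration.

```python
import math
import random


def greedy_twin_split(points, r, seed=0):
    """Split `points` into a small and a large index set (about 1/r vs (r-1)/r).

    Illustrative greedy heuristic (an assumption, not the published algorithm).
    """
    if r < 2:
        raise ValueError("r must be at least 2")
    rng = random.Random(seed)
    remaining = set(range(len(points)))
    small, large = [], []
    if not remaining:
        return small, large
    current = rng.choice(sorted(remaining))  # arbitrary starting point
    while True:
        remaining.discard(current)
        small.append(current)
        # The r-1 nearest remaining neighbours of the current point.
        neighbours = sorted(
            remaining, key=lambda j: math.dist(points[current], points[j])
        )[: r - 1]
        large.extend(neighbours)
        remaining.difference_update(neighbours)
        if not remaining:
            break
        # Continue from the remaining point closest to the farthest neighbour,
        # so that consecutive groups sweep through the data locally.
        anchor = neighbours[-1]
        current = min(remaining, key=lambda j: math.dist(points[anchor], points[j]))
    return small, large


# Usage on synthetic 2-D data (the data here is made up for the example).
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
small, large = greedy_twin_split(data, r=5)
print(len(small), len(large))  # 40 160
```

Because every group contributes exactly one point to the smaller set, both sets cover the same regions of the data, which is the intuition behind calling them twins. Applying such a split repeatedly to the larger set would be one way to obtain multiple folds, again under the assumptions stated above.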

Authors: Akhil Vakayil, V. Roshan Joseph
Topic: Machine Learning

preprint / 2021
