Paper detail

The Generalization Error of Supervised Machine Learning Algorithms

In this paper, the method of gaps, a technique for deriving closed-form expressions in terms of information measures for the generalization error of supervised machine learning algorithms is introduced. The method relies on the notion of \emph{gaps}, which characterize the variation of the expected empirical risk (when either the model or dataset is kept fixed) with respect to changes in the probability measure on the varying parameter (either the dataset or the model, respectively). This distinction results in two classes of gaps: Algorithm-driven gaps (fixed dataset) and data-driven gaps (fixed model). In general, the method relies on two central observations: $(i)$~The generalization error is the expectation of an algorithm-driven gap or a data-driven gap. In the first case, the expectation is with respect to a measure on the datasets; and in the second case, with respect to a measure on the models. $(ii)$~Both, algorithm-driven gaps and data-driven gaps exhibit closed-form expressions in terms of relative entropies. In particular, algorithm-driven gaps involve a Gibbs probability measure on the set of models, which represents a supervised Gibbs algorithm. Alternatively, data-driven gaps involve a worst-case data-generating (WCDG) probability measure on the set of data points, which is also a Gibbs probability measure. Interestingly, such Gibbs measures, which are exogenous to the analysis of generalization, place both the supervised Gibbs algorithm and the WCDG probability measure as natural references for the analysis of supervised learning algorithms. All existing exact expressions for the generalization error of supervised machine learning algorithms can be obtained with the proposed method. Also, this method allows obtaining numerous new exact expressions, which allows establishing connections with other areas in statistics.

preprint2025arXivOpen access

Samir M. Perlaza Xinying Zou

math.IT Information Theory Machine Learning

Open graph Reviews Discussion

Signal facts

What is known right now

Open access2 authors3 topics

Imported metadata coverageMissing code, dataset, citation and institution fields are tracked without dominating the paper.Details

Citations: 0Reviews: 0Saves: 0Code: not linkedDataset: not linkedInstitutions: 0

Next steps

Decide what to do with this paper

Like0 Dislike0Score 0

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Save to reading list0

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Authors

Samir M. Perlaza Xinying Zou

Institutions

No institution affiliation has been imported for this paper yet.

Add specific reaction

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.

The Generalization Error of Supervised Machine Learning Algorithms

What is known right now

Decide what to do with this paper

Keep the important context close to the paper

Authors

Institutions

Research map

Building this map preview

0 review(s)

0 comment(s)