Source author record

Stefan Jaeger

Stefan Jaeger appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Neural and Evolutionary Computing Artificial Intelligence Computational Complexity

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Training Neural Networks with Optimal Double-Bayesian Learning

Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely acknowledged that selecting appropriate parameters is crucial for avoiding overfitting and achieving unbiased outcomes, this choice remains largely based on empirical experiments and experience. This paper presents a new probabilistic framework for the learning rate, a key parameter in stochastic gradient descent. The framework develops classic Bayesian statistics into a double-Bayesian decision mechanism involving two antagonistic Bayesian processes. A theoretically optimal learning rate can be derived from these two processes and used for stochastic gradient descent. Experiments across various classification, segmentation, and detection tasks corroborate the practical significance of the theoretically derived learning rate. The paper also discusses the ramifications of the proposed double-Bayesian framework for network training and model performance.

preprint2020arXiv

The Golden Ratio of Learning and Momentum

Gradient descent has been a central training principle for artificial neural networks from the early beginnings to today's deep learning networks. The most common implementation is the backpropagation algorithm for training feed-forward neural networks in a supervised fashion. Backpropagation involves computing the gradient of a loss function, with respect to the weights of the network, to update the weights and thus minimize loss. Although the mean square error is often used as a loss function, the general stochastic gradient descent principle does not immediately connect with a specific loss function. Another drawback of backpropagation has been the search for optimal values of two important training parameters, learning rate and momentum weight, which are determined empirically in most systems. The learning rate specifies the step size towards a minimum of the loss function when following the gradient, while the momentum weight considers previous weight changes when updating current weights. Using both parameters in conjunction with each other is generally accepted as a means to improving training, although their specific values do not follow immediately from standard backpropagation theory. This paper proposes a new information-theoretical loss function motivated by neural signal processing in a synapse. The new loss function implies a specific learning rate and momentum weight, leading to empirical parameters often used in practice. The proposed framework also provides a more formal explanation of the momentum term and its smoothing effect on the training process. All results taken together show that loss, learning rate, and momentum are closely connected. To support these theoretical findings, experiments for handwritten digit recognition show the practical usefulness of the proposed loss function and training parameters.

preprint2011arXiv

Computational Complexity on Signed Numbers

This paper presents a new representation of natural numbers and discusses its consequences for computability and computational complexity. The paper argues that the introduction of the first Peano axiom in the traditional definition of natural numbers is not essential. It claims that natural numbers remain usable in traditional ways without assuming the existence of at least one natural number. However, the uncertainty about the existence of natural numbers translates into every computation and introduces intrinsic uncertainty that cannot be avoided. The uncertainty in the output of a computation can be reduced, though, at the expense of a longer runtime and thus higher complexity. For the new representation of natural numbers, the paper claims that, with the first Peano axiom, P is equal to NP, and that without the first Peano axiom, P becomes a proper subset of NP.