Researcher profile

Fatih Furkan Yilmaz

Fatih Furkan Yilmaz contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
2topics
1close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Regularization-wise double descent: Why it occurs and how to eliminate it

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be double-descent shaped, and this behavior can be explained as a super-position of bias-variance tradeoffs. In this paper, we show that the risk of explicit L2-regularized models can exhibit double descent behavior as a function of the regularization strength, both in theory and practice. We find that for linear regression, a double descent shaped risk is caused by a superposition of bias-variance tradeoffs corresponding to different parts of the model and can be mitigated by scaling the regularization strength of each part appropriately. Motivated by this result, we study a two-layer neural network and show that double descent can be eliminated by adjusting the regularization strengths for the first and second layer. Lastly, we study a 5-layer CNN and ResNet-18 trained on CIFAR-10 with label noise, and CIFAR-100 without label noise, and demonstrate that all exhibit double descent behavior as a function of the regularization strength.

preprint2020arXiv

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

Over-parameterized models, such as large deep networks, often exhibit a double descent phenomenon, whereas a function of model size, error first decreases, increases, and decreases at last. This intriguing double descent behavior also occurs as a function of training epochs and has been conjectured to arise because training epochs control the model complexity. In this paper, we show that such epoch-wise double descent arises for a different reason: It is caused by a superposition of two or more bias-variance tradeoffs that arise because different parts of the network are learned at different epochs, and eliminating this by proper scaling of stepsizes can significantly improve the early stopping performance. We show this analytically for i) linear regression, where differently scaled features give rise to a superposition of bias-variance tradeoffs, and for ii) a two-layer neural network, where the first and second layer each govern a bias-variance tradeoff. Inspired by this theory, we study two standard convolutional networks empirically and show that eliminating epoch-wise double descent through adjusting stepsizes of different layers improves the early stopping performance significantly.

preprint2020arXiv

Image recognition from raw labels collected without annotators

Image classification problems are typically addressed by first collecting examples with candidate labels, second cleaning the candidate labels manually, and third training a deep neural network on the clean examples. The manual labeling step is often the most expensive one as it requires workers to label millions of images. In this paper we propose to work without any explicitly labeled data by i) directly training the deep neural network on the noisy candidate labels, and ii) early stopping the training to avoid overfitting. With this procedure we exploit an intriguing property of standard overparameterized convolutional neural networks trained with (stochastic) gradient descent: Clean labels are fitted faster than noisy ones. We consider two classification problems, a subset of ImageNet and CIFAR-10. For both, we construct large candidate datasets without any explicit human annotations, that only contain 10%-50% correctly labeled examples per class. We show that training on the candidate examples and regularizing through early stopping gives higher test performance for both problems than when training on the original, clean data. This is possible because the candidate datasets contain a huge number of clean examples, and, as we show in this paper, the noise generated through the label collection process is not nearly as adversarial for learning as the noise generated by randomly flipping labels.