Speeding Up Entmax

Softmax is the de facto standard for normalizing logits in modern neural networks for language processing. However, because it produces a dense probability distribution, every token in the vocabulary has a nonzero chance of being selected at each generation step, which has been linked to a variety of reported problems in text generation. The $\alpha$-entmax of Peters et al. (2019, arXiv:1905.05702) addresses this problem, but it is considerably slower than softmax. In this paper, we propose an alternative to $\alpha$-entmax that keeps its desirable properties, is as fast as optimized softmax, and achieves on-par or better performance on a machine translation task.
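
To make the dense-versus-sparse contrast concrete, here is a minimal NumPy sketch that compares softmax with $\alpha$-entmax computed by bisection on its threshold. It assumes $\alpha > 1$ and uses the standard form $p_i = [(\alpha - 1) z_i - \tau]_+^{1/(\alpha - 1)}$, with $\tau$ chosen so the entries sum to 1. The helper names `softmax_dense` and `alpha_entmax_bisect` are hypothetical; this is a generic reference sketch, not the faster method proposed in this paper and not the API of any particular library.

```python
import numpy as np


def softmax_dense(z):
    """Softmax: every entry receives strictly positive probability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def alpha_entmax_bisect(z, alpha=1.5, n_iter=50):
    """Hypothetical helper: alpha-entmax via bisection on tau (assumes alpha > 1)."""
    z = np.asarray(z, dtype=float)
    d = z.shape[0]
    x = (alpha - 1.0) * z
    power = 1.0 / (alpha - 1.0)
    # At tau_lo the largest entry alone contributes 1, so the sum is >= 1.
    tau_lo = x.max() - 1.0
    # At tau_hi every entry is <= 1/d, so the sum is <= 1.
    tau_hi = x.max() - (1.0 / d) ** (alpha - 1.0)
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        total = (np.clip(x - tau, 0.0, None) ** power).sum()
        if total >= 1.0:
            tau_lo = tau  # threshold too low: mass still >= 1
        else:
            tau_hi = tau  # threshold too high: mass below 1
    p = np.clip(x - tau_lo, 0.0, None) ** power
    return p / p.sum()  # remove the small residual bisection error


logits = [1.0, 0.8, -1.0]
print(softmax_dense(logits))             # all three entries are positive
print(alpha_entmax_bisect(logits, 1.5))  # the last entry is exactly 0.0
```

With these example logits, softmax assigns some probability to the low-scoring third token, while 1.5-entmax assigns it exactly zero. The bisection loop, which repeatedly evaluates the whole vector, illustrates why such exact entmax computations can be slower than a single softmax pass.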

Authors: Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Zhenisbek Assylbekov
Topics: Machine Learning; Computation and Language

preprint / 2022