== Performance ==
The Transformer was evaluated on two machine translation tasks: WMT 2014 English-to-German and WMT 2014 English-to-French. Training applied label smoothing as a regularizer to reduce [[overfitting]] and improve generalization, and outputs were generated with beam search using a length penalty. The Transformer achieved 28.4 BLEU on English-to-German and 41.8 BLEU on English-to-French, surpassing the previous best results by over 2 BLEU and setting new single-model state-of-the-art scores.

The Transformer was also more efficient than comparable recurrent or convolutional models, requiring significantly less time to train and to run at inference. These results demonstrated that attention alone is sufficient for sequence transduction: self-attention can capture long-range dependencies and contextual information without recurrence or convolution, and the approach scales to large models and datasets.
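The paper reports using label smoothing with ε = 0.1, and beam search with a beam size of 4 and a length penalty of α = 0.6 following Wu et al. (2016). The sketch below is an illustrative NumPy rendering of these two ingredients, not the paper's own code; the function names, the usage values, and the exact way the smoothing mass is spread over the vocabulary are assumptions.

<syntaxhighlight lang="python">
import numpy as np

def label_smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy with label smoothing (eps = 0.1 in the paper).

    logits: (vocab,) unnormalised scores for a single output position
    target: index of the reference token
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    vocab = logits.shape[0]
    # One common formulation: (1 - eps) on the reference token,
    # eps spread uniformly over the remaining vocabulary entries.
    smooth = np.full(vocab, eps / (vocab - 1))
    smooth[target] = 1.0 - eps
    return -np.sum(smooth * log_probs)

def length_penalty(length, alpha=0.6):
    """Length penalty of Wu et al. (2016); alpha = 0.6 in the paper."""
    return ((5.0 + length) / 6.0) ** alpha

def beam_score(sum_log_prob, length, alpha=0.6):
    """Score used to rank finished beam-search hypotheses."""
    return sum_log_prob / length_penalty(length, alpha)

# Hypothetical usage with placeholder values:
rng = np.random.default_rng(0)
logits = rng.normal(size=37000)   # roughly the shared BPE vocabulary size for English-German
loss = label_smoothed_cross_entropy(logits, target=17)
score = beam_score(sum_log_prob=-4.2, length=12)
</syntaxhighlight>

Dividing each hypothesis's summed log-probability by the length penalty keeps beam search from systematically preferring shorter translations.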