=== Rotary Positional Embedding (RoPE) ===

Rotary Position Embedding (RoPE) is a method for encoding positional information in transformer models, proposed in the paper [[RoFormer: Enhanced Transformer with Rotary Position Embedding]]. Earlier position encoding mechanisms, such as the sinusoidal position encoding used in the original Transformer, add position embeddings directly to the input token embeddings, which can make the model less flexible when dealing with variable sequence lengths.

Unlike sinusoidal position encoding, RoPE encodes absolute position information by applying rotation matrices rather than by adding position embeddings. The query and key vectors in self-attention are multiplied by these rotation matrices before the attention scores are computed. More specifically, each pair of embedding dimensions is rotated by an angle proportional to the token's position in the sequence, with rotation frequencies that are predetermined constants based on the dimensionality of the embeddings. Because the attention score depends only on the relative rotation between a query and a key, this construction encodes relative position information between tokens (see the sketch at the end of this section).

Some key advantages of RoPE:

* It naturally incorporates relative position information through the rotation operations. The inner product between query and key vectors decays with increasing relative distance between tokens, which matches the intuition that distant tokens should be less strongly connected.
* It keeps the norm of the token embeddings unchanged, so it can be flexibly combined with linear attention mechanisms such as Performer.
* There is no restriction on maximum sequence length, unlike with learned absolute position embeddings.
* It showed improved performance over baseline Transformer models on machine translation, language modeling, and GLUE benchmark tasks.

RoPE embeddings have opened up the potential for increasing [[Context window|context lengths]] through the application of [[RoPE Scaling]].
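The following is a minimal NumPy sketch of the rotation described above, assuming the pairing of adjacent embedding dimensions and the base frequency of 10000 used in the RoFormer paper; the function name <code>rope_rotate</code> is illustrative rather than taken from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def rope_rotate(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, d), with d even."""
    seq_len, d = x.shape
    assert d % 2 == 0, "embedding dimension must be even"
    # One rotation frequency per pair of dimensions: theta_i = base^(-2i/d).
    theta = base ** (-np.arange(0, d, 2) / d)        # shape (d/2,)
    positions = np.arange(seq_len)[:, None]          # shape (seq_len, 1)
    angles = positions * theta                       # shape (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    # Standard 2-D rotation applied to each (even, odd) coordinate pair.
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Queries and keys are rotated before the attention scores are computed;
# value vectors are left unchanged.
q = rope_rotate(np.random.randn(8, 64))
k = rope_rotate(np.random.randn(8, 64))
scores = q @ k.T
</syntaxhighlight>

Because each position is rotated by an angle proportional to its index, the dot product between a rotated query at position m and a rotated key at position n depends only on the difference n − m, which is how the relative position information arises.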