Context window

A context window in a large language model is the span of tokens the model can consider when generating a response to a prompt. For example, the original LLaMA model has a context window of 2048 tokens, meaning the total text handled in a single query (system text, prompt, and response combined) is limited to 2048 tokens. This constrains the model’s ability to work with long texts or tasks that require a lot of information.
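
To make the budget concrete, the sketch below checks that a prompt fits inside a 2048-token window before generation. The whitespace `tokenize` here is a hypothetical stand-in; LLaMA actually uses a SentencePiece subword tokenizer, so real token counts differ.

```python
# Minimal sketch of enforcing a context-window budget before a query.
# `tokenize` is a hypothetical whitespace stand-in for a real subword
# tokenizer, so the counts here are illustrative only.

CONTEXT_WINDOW = 2048  # tokens, as in the original LLaMA


def tokenize(text: str) -> list[str]:
    return text.split()


def fit_prompt(system: str, prompt: str, max_new_tokens: int) -> list[str]:
    """Truncate the prompt so system + prompt + response fit the window."""
    system_tokens = tokenize(system)
    budget = CONTEXT_WINDOW - len(system_tokens) - max_new_tokens
    if budget < 0:
        raise ValueError("system text and response alone exceed the window")
    prompt_tokens = tokenize(prompt)
    if len(prompt_tokens) > budget:
        # Keep the most recent prompt tokens; older context is dropped.
        prompt_tokens = prompt_tokens[len(prompt_tokens) - budget:]
    return system_tokens + prompt_tokens


tokens = fit_prompt("You are a helpful assistant.", "word " * 5000, max_new_tokens=256)
assert len(tokens) + 256 <= CONTEXT_WINDOW
```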

Research

Extending or working around the context window limitation of large language models has been a major focus of LLM research. A number of mechanisms have been proposed, such as sparse or local attention, splitting the input into chunks or windows that are processed separately, and interpolating the positional embeddings so that a model trained on short sequences can accept longer ones.
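
As an illustration of the interpolation approach, the sketch below rescales position indices before computing rotary position embedding (RoPE) angles; RoPE is the positional scheme LLaMA uses. The function name, shapes, and lengths are assumptions for illustration, not any particular library's API.

```python
# Sketch of position interpolation for rotary position embeddings (RoPE).
# Shapes, lengths, and names are illustrative assumptions.
import numpy as np


def rope_angles(positions: np.ndarray, dim: int, base: float = 10000.0) -> np.ndarray:
    """Rotation angle for each position and channel pair."""
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)  # shape (seq_len, dim // 2)


trained_len, target_len, dim = 2048, 8192, 128

# Extrapolation would feed the model positions it never saw in training.
# Interpolation instead compresses the longer range into the trained one,
# keeping every rotation angle inside the familiar distribution.
positions = np.arange(target_len)
scaled = positions * (trained_len / target_len)
angles = rope_angles(scaled, dim)

assert scaled.max() < trained_len  # all positions stay in the trained range
print(angles.shape)  # (8192, 64): four times the trained window
```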

Impact on Performance

The size of the context window affects the performance of a large language model in several ways, including:

  • Quality: A larger context window lets the model take more information into account and produce more coherent, relevant responses. A smaller context window limits its ability to handle long texts or information-heavy tasks such as summarising long documents, answering open-book questions, or conducting long conversations.
  • Diversity: A larger context window can also increase the diversity of the generated text, since the model can draw on more of the preceding context and avoid repetition. With a smaller window, the model is more likely to reuse the same words and phrases.
  • Speed: A larger context window typically slows text generation, because self-attention in a standard transformer compares every token with every other token, so the computation grows quadratically with the number of tokens (see the sketch after this list). A smaller context window speeds generation up correspondingly, since the model processes fewer tokens and performs fewer computations.
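
The sketch below illustrates that quadratic growth with a back-of-the-envelope FLOP count for the attention scores and weighted sum alone; the `d_model` value and the cost model are illustrative assumptions.

```python
# Back-of-the-envelope sketch of why generation slows as the window
# grows. Only the attention score matrix (Q @ K^T) and the weighted sum
# over values are counted; projections and other layer costs are ignored.

def attention_flops(seq_len: int, d_model: int) -> int:
    # Roughly 2 * n^2 * d multiply-adds per attention layer.
    return 2 * seq_len * seq_len * d_model


for n in (512, 1024, 2048, 4096):
    print(f"{n:5d} tokens -> {attention_flops(n, d_model=4096):.3e} FLOPs")
# Each doubling of the context roughly quadruples the attention cost.
```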

See Also