Top_p and top_k are two ways of controlling the randomness and creativity of the text generated by large language models. They act as alternatives: top_p selects the next token from the smallest set of tokens whose cumulative probability reaches p, while top_k selects from the k most probable tokens, irrespective of how much probability those tokens actually carry.
Both top_p and top_k trade off diversity against quality in the generated text. A higher value of top_p or top_k allows more diversity, but also more risk of generating nonsensical or irrelevant text. A lower value favors quality, but also increases the risk of generating boring or repetitive text.
Top_k
Top_k tells the model to pick the next token from the k most probable tokens in its vocabulary, sorted by probability. For example, if k is 3, the model will only consider the three most likely tokens as candidates for the next token. This reduces the chance of picking a low-probability token that might not make sense in the context. A sketch of this procedure appears below.
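A minimal sketch of top_k filtering, assuming raw logits over the vocabulary and using NumPy; the function name top_k_sample and the example values are illustrative, not taken from any particular library:

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token id using top-k filtering (illustrative sketch).

    Keeps only the k highest-probability tokens, renormalizes their
    probabilities, and samples from that reduced set.
    """
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()                      # softmax over the vocabulary
    top_ids = np.argsort(probs)[-k:]          # indices of the k most likely tokens
    top_probs = probs[top_ids] / probs[top_ids].sum()  # renormalize within the top k
    return rng.choice(top_ids, p=top_probs)

# Example: with k=3 only the three most likely tokens can ever be chosen,
# no matter how the remaining probability mass is distributed.
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
next_token = top_k_sample(logits, k=3)
```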
Top_p
Top_p tells the model to pick the next token from the most probable tokens based on the sum of their probabilities. For example, if p is 0.9, the model only considers the smallest set of most likely tokens whose cumulative probability reaches 0.9. This means the number of candidates varies with the shape of the distribution: few tokens when the model is confident, many when it is uncertain. This is more dynamic than top_k and is often used to exclude low-probability tokens from the output.
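A corresponding sketch of top_p (nucleus) filtering under the same assumptions; top_p_sample is an illustrative name, not a library function:

```python
import numpy as np

def top_p_sample(logits, p, rng=None):
    """Sample a token id using top-p (nucleus) filtering (illustrative sketch).

    Keeps the smallest set of most likely tokens whose cumulative
    probability reaches p, renormalizes, and samples from that set.
    """
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()                          # softmax over the vocabulary
    order = np.argsort(probs)[::-1]               # most likely tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # smallest prefix covering p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize within the nucleus
    return rng.choice(nucleus, p=nucleus_probs)

# Example: with p=0.9 the candidate set grows or shrinks with the shape
# of the distribution, unlike the fixed-size set used by top_k.
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
next_token = top_p_sample(logits, p=0.9)
```

Because the cutoff is taken over cumulative probability, the nucleus contains only a few tokens when the model is confident and widens when the distribution is flat, which is the dynamic behaviour described above.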
See Also
Two minutes NLP — Most used Decoding Methods for Language Models [[1]]