The LLM parameter Top-P, also known as nucleus sampling, controls the diversity of the output by setting a cumulative probability threshold for selecting the next token. Depending on the setting, it produces more focused or more diverse outputs.
When generating text, the candidate tokens (words or word pieces) are sorted by their likelihood. The model starts with the most probable token and adds the others, step by step, until the sum reaches the threshold set for Top-P.
If we set Top-P to 0.90, the AI will only consider those tokens that cumulatively add up to at least ~90% of the probability mass.
For instance, let's consider an example where the model predicts the next word in the sentence "A kid plays with the ___". What are the top words the AI could consider to come next?
The top 4 words it might be considering could be friends (probability 0.5), toys (probability 0.25), ball (probability 0.15), and stuffed animal (probability 0.07).
The LLM now sums up the probabilities, starting with the highest, until it reaches the threshold given by Top-P:
- Adding friends -> total so far is 50%.
- Then adding toys -> total becomes 75%.
- Next comes ball, and now our sum reaches 90%.
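To make the cutoff concrete, here is a minimal Python sketch of nucleus sampling over the toy distribution above. The function name and the renormalization step are illustrative assumptions, not a specific library's implementation:

```python
import random

# Toy next-token distribution from the example above; the remaining
# ~3% of probability mass belongs to many other, less likely tokens.
probs = {"friends": 0.50, "toys": 0.25, "ball": 0.15, "stuffed animal": 0.07}

def nucleus_sample(probs, top_p=0.90):
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        total += p
        # Stop once the cumulative probability reaches the Top-P threshold.
        if total >= top_p:
            break
    # Renormalize the surviving tokens and randomly pick one of them.
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights)[0]

print(nucleus_sample(probs, top_p=0.90))  # prints friends, toys, or ball
```

With top_p=0.90 the loop stops after ball, so stuffed animal never makes it into the nucleus; lowering top_p to 0.5 would leave only friends.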
So, to complete the sentence above, the AI will randomly pick one of these three options (friends, toys, and ball), since together they account for ~90% of the total likelihood. As we can see, a higher value of Top-P allows a broader range of tokens, and with that the AI can generate more creative and diverse output. Conversely, if you want more accurate, factual, focused output, set a lower value for Top-P.
That gives you another setting to play with in addition to the LLM temperature. Try it out and test how different combinations of the two affect the output.
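If you want to experiment, here is one possible way to set both parameters when sampling with the Hugging Face transformers library; the model and prompt are just placeholders:

```python
from transformers import pipeline

# Any causal language model works here; "gpt2" is just a small,
# freely available example.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "A kid plays with the",
    max_new_tokens=20,
    do_sample=True,    # sample instead of always taking the top token
    top_p=0.90,        # nucleus sampling threshold
    temperature=0.7,   # flattens or sharpens the distribution before Top-P
)
print(result[0]["generated_text"])
```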


