
# LLM Settings

When designing and testing prompts, you typically interact with the LLM via an API. You can configure a few parameters to get different results for your prompts. Tweaking these settings is important for improving the reliability and desirability of responses, and it takes experimentation to find the right settings for your use cases. Below are the common settings you will come across when using different LLM providers:

**Temperature** - In short, the lower the `temperature`, the more deterministic the results, in the sense that the most probable next token is picked more consistently. Increasing the temperature leads to more randomness, which encourages more diverse or creative outputs: you are essentially increasing the weights of the other possible tokens. In terms of application, you might want to use a lower temperature for tasks like fact-based QA to encourage more factual and concise responses. For poem generation or other creative tasks, it might be beneficial to increase the temperature.
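To see why a lower temperature is more deterministic, it helps to know that temperature is commonly applied by dividing the model's raw next-token scores (logits) by T before the softmax. The following is a minimal Python sketch of that idea, not any provider's actual implementation; the logits are made-up values for three hypothetical candidate tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined; T -> 0 approaches always picking the top token.
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At T=0.2 nearly all of the probability mass lands on the first token; at T=2.0 the mass spreads toward the less likely tokens, which is the "increasing the weights of the other possible tokens" effect described above.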

**Top_p** - `top_p` controls nucleus sampling, a sampling technique used alongside temperature: the model samples only from the smallest set of most likely tokens whose cumulative probability reaches `top_p`. Like temperature, it lets you control how deterministic the model is when generating a response. If you are looking for exact and factual answers, keep this value low. If you are looking for more diverse responses, increase it.
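The filtering step of nucleus sampling can be sketched in a few lines of Python. This is a minimal illustration, not a provider's implementation; the token probabilities are made up, and `nucleus_filter` and `sample_top_p` are hypothetical helper names.

```python
import random

def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize them to sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def sample_top_p(probs, top_p, rng):
    """Draw one token, considering only tokens inside the nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]

# Hypothetical next-token distribution after "The capital of France is".
probs = {"Paris": 0.6, "Lyon": 0.25, "Nice": 0.1, "banana": 0.05}
rng = random.Random(0)
print(nucleus_filter(probs, 0.5))  # low top_p: only the most likely token survives
print(nucleus_filter(probs, 0.9))  # higher top_p: more candidates remain
print(sample_top_p(probs, 0.9, rng))
```

With a low `top_p`, only "Paris" remains, so every sample is the same; raising `top_p` lets lower-probability tokens back into the pool, while the long tail ("banana") is still excluded.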

The general recommendation is to alter temperature or `top_p`, not both.

**Max Length** - You can manage the number of tokens the model generates by adjusting the 'max length'. Specifying a max length helps you prevent long or irrelevant responses and control costs.

**Stop Sequences** - A 'stop sequence' is a string that, once generated, stops the model from generating further tokens. Specifying stop sequences is another way to control the length and structure of the model's response. For example, you can tell the model to generate lists that have no more than 10 items by adding "11" as a stop sequence.
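Providers run the decoding loop on their side, but a toy loop makes it easier to see how a max length and stop sequences interact. The sketch below is a minimal Python illustration under loud assumptions: `generate`, `next_token`, and `fake_list_model` are hypothetical stand-ins for a real model, and each "token" is a whole list item for simplicity. Whether the stop sequence itself appears in the returned text depends on the provider; this sketch drops it.

```python
def generate(next_token, max_tokens, stop_sequences=()):
    """Toy decoding loop. next_token(text) is a hypothetical stand-in for the model
    that returns the next piece of text given everything generated so far."""
    text = ""
    for _ in range(max_tokens):  # max length: never generate more than max_tokens tokens
        text += next_token(text)
        for stop in stop_sequences:
            index = text.find(stop)
            if index != -1:
                # Stop as soon as a stop sequence appears; drop it from the output.
                return text[:index]
    return text

def fake_list_model():
    """Hypothetical 'model' that keeps writing a numbered list forever,
    emitting one whole list item per token for simplicity."""
    count = 0
    def next_token(_text):
        nonlocal count
        count += 1
        return f"{count}. item\n"
    return next_token

print(generate(fake_list_model(), max_tokens=100, stop_sequences=["11"]))  # items 1-10 only
print(generate(fake_list_model(), max_tokens=3))  # cut off by the length limit
```

In real output, a bare "11" can also fire early (for example inside "2011"), so a more specific stop sequence such as "\n11." may be safer.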

**Frequency Penalty** - The 'frequency penalty' applies a penalty on the next token proportional to how many times that token already appeared in the response and prompt. The higher the frequency penalty, the less likely a word is to appear again. This setting reduces the repetition of words in the model's response by assigning a higher penalty to tokens that appear more often.

**Presence Penalty** - The 'presence penalty' also applies a penalty to repeated tokens but, unlike the frequency penalty, the penalty is the same for all repeated tokens. A token that appears twice and a token that appears 10 times are penalized the same. This setting prevents the model from repeating phrases too often in its response. If you want the model to generate diverse or creative text, you might want to use a higher presence penalty. Or, if you need the model to stay focused, try using a lower presence penalty.
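The difference between the two penalties is easiest to see side by side. OpenAI's API documentation describes both as additive adjustments to the logits: a candidate token's score is reduced by its count times the frequency penalty, plus the presence penalty once if the count is above zero. The Python sketch below follows that form as an assumption; other providers may implement penalties differently, and the tokens and scores are made up.

```python
from collections import Counter

def apply_penalties(logits, seen_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Return penalized logits.

    logits: maps each candidate token to its raw score.
    seen_tokens: tokens that have already appeared (prompt and response so far).
    """
    counts = Counter(seen_tokens)
    penalized = {}
    for token, score in logits.items():
        count = counts[token]  # Counter returns 0 for unseen tokens
        penalized[token] = (
            score
            - count * frequency_penalty  # grows with every repetition
            - (presence_penalty if count > 0 else 0.0)  # flat, once the token has appeared
        )
    return penalized

logits = {"the": 1.0, "cat": 1.0, "sat": 1.0}
seen = ["the", "the", "the", "cat"]
print(apply_penalties(logits, seen, frequency_penalty=0.5))
print(apply_penalties(logits, seen, presence_penalty=0.5))
```

With the frequency penalty, "the" (seen three times) is pushed down further than "cat" (seen once); with the presence penalty, both are pushed down by the same amount, while the unseen "sat" is untouched in both cases.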

Similar to temperature and `top_p`, the general recommendation is to alter the frequency or presence penalty, not both.

Before starting with some basic examples, keep in mind that your results may vary depending on the version of the LLM you use.