diff --git a/gpt4all-bindings/python/docs/gpt4all_python.md b/gpt4all-bindings/python/docs/gpt4all_python.md
index ad3e1c55..5f7397bd 100644
--- a/gpt4all-bindings/python/docs/gpt4all_python.md
+++ b/gpt4all-bindings/python/docs/gpt4all_python.md
@@ -2,8 +2,8 @@
 The `GPT4All` python package provides bindings to our C/C++ model backend libraries. The source code and local build
 instructions can be found [here](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/python).
 
-## Quickstart 
+## Quickstart
 
 ```bash
 pip install gpt4all
 ```
@@ -20,8 +20,16 @@ pip install gpt4all
     1. Paris
     ```
 
+This will:
+
+- Instantiate `GPT4All`, which is the primary public API to your large language model (LLM).
+- Automatically download the given model to `~/.cache/gpt4all/` if not already present.
+- Through `model.generate(...)`, the model starts working on a response. There are various ways to
+  steer that process. Here, `max_tokens` sets an upper limit, i.e. a hard cut-off point for the output.
+
+
 ### Chatting with GPT4All
-Local LLMs can be optimized for chat conversions by reusing previous computational history.
+Local LLMs can be optimized for chat conversations by reusing previous computational history.
 
 Use the GPT4All `chat_session` context manager to hold chat conversations with the model.
 
@@ -29,9 +37,9 @@ Use the GPT4All `chat_session` context manager to hold chat conversations with t
     ``` py
     model = GPT4All(model_name='orca-mini-3b.ggmlv3.q4_0.bin')
     with model.chat_session():
-        response = model.generate(prompt='hello', top_k=1)
-        response = model.generate(prompt='write me a short poem', top_k=1)
-        response = model.generate(prompt='thank you', top_k=1)
+        response1 = model.generate(prompt='hello', temp=0)
+        response2 = model.generate(prompt='write me a short poem', temp=0)
+        response3 = model.generate(prompt='thank you', temp=0)
         print(model.current_chat_session)
     ```
 === "Output"
@@ -63,19 +71,20 @@ Use the GPT4All `chat_session` context manager to hold chat conversations with t
         }
     ]
     ```
 
-When using GPT4All models in the chat_session context:
-
-- The model is given a prompt template which makes it chatty.
-- Internal K/V caches are preserved from previous conversation history speeding up inference.
+When using GPT4All models in the `chat_session` context:
 
-### Generation Parameters
+- Consecutive chat exchanges are taken into account and not discarded until the session ends, as long as the model has capacity.
+- Internal K/V caches are preserved from previous conversation history, speeding up inference.
+- The model is given a system and prompt template which make it chatty. If `allow_download=True` (the default),
+  it will obtain the latest version of [models.json] from the repository, which contains specifically tailored templates
+  for models. Conversely, if it is not allowed to download, it falls back to default templates instead.
 
-::: gpt4all.gpt4all.GPT4All.generate
+[models.json]: https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models.json
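+
+A quick way to check which templates a model ended up with is to inspect its `config` dictionary. This is a minimal
+sketch; the model name is only an example, and the same `config` keys appear in the offline example further below:
+
+``` py
+from gpt4all import GPT4All
+model = GPT4All('orca-mini-3b.ggmlv3.q4_0.bin')  # allow_download=True is the default
+# either the templates tailored to this model from models.json, or the fallback defaults:
+print(repr(model.config['systemPrompt']))
+print(repr(model.config['promptTemplate']))
+```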
=== "GPT4All Streaming Example" ``` py @@ -91,4 +100,366 @@ To interact with GPT4All responses as the model generates, use the `streaming = [' Paris', ' is', ' a', ' city', ' that', ' has', ' been', ' a', ' major', ' cultural', ' and', ' economic', ' center', ' for', ' over', ' ', '2', ',', '0', '0'] ``` + +### The Generate Method API +::: gpt4all.gpt4all.GPT4All.generate + + +## Examples & Explanations +### Influencing Generation +The three most influential parameters in generation are _Temperature_ (`temp`), _Top-p_ (`top_p`) and _Top-K_ (`top_k`). +In a nutshell, during the process of selecting the next token, not just one or a few are considered, but every single +token in the vocabulary is given a probability. The parameters can change the field of candidate tokens. + +- **Temperature** makes the process either more or less random. A _Temperature_ above 1 increasingly "levels the playing + field", while at a _Temperature_ between 0 and 1 the likelihood of the best token candidates grows even more. A + _Temperature_ of 0 results in selecting the best token, making the output deterministic. A _Temperature_ of 1 + represents a neutral setting with regard to randomness in the process. + +- _Top-p_ and _Top-K_ both narrow the field: + - **Top-K** limits candidate tokens to a fixed number after sorting by probability. Setting it higher than the + vocabulary size deactivates this limit. + - **Top-p** selects tokens based on their total probabilities. For example, a value of 0.8 means "include the best + tokens, whose accumulated probabilities reach or just surpass 80%". Setting _Top-p_ to 1, which is 100%, + effectively disables it. + +The recommendation is to keep at least one of _Top-K_ and _Top-p_ active. Other parameters can also influence +generation; be sure to review all their descriptions. + + +### Specifying the Model Folder +The model folder can be set with the `model_path` parameter when creating a `GPT4All` instance. The example below is +is the same as if it weren't provided; that is, `~/.cache/gpt4all/` is the default folder. + +=== "GPT4All Model Folder Example" + ``` py + from pathlib import Path + from gpt4all import GPT4All + model = GPT4All(model_name='orca-mini-3b.ggmlv3.q4_0.bin', + model_path=(Path.home() / '.cache' / 'gpt4all'), + allow_download=False) + response = model.generate('my favorite 3 fruits are:', temp=0) + print(response) + ``` +=== "Output" + ``` + My favorite three fruits are apples, bananas and oranges. 
+
+
+### Specifying the Model Folder
+The model folder can be set with the `model_path` parameter when creating a `GPT4All` instance. The example below
+is the same as if it weren't provided; that is, `~/.cache/gpt4all/` is the default folder.
+
+=== "GPT4All Model Folder Example"
+    ``` py
+    from pathlib import Path
+    from gpt4all import GPT4All
+    model = GPT4All(model_name='orca-mini-3b.ggmlv3.q4_0.bin',
+                    model_path=(Path.home() / '.cache' / 'gpt4all'),
+                    allow_download=False)
+    response = model.generate('my favorite 3 fruits are:', temp=0)
+    print(response)
+    ```
+=== "Output"
+    ```
+    My favorite three fruits are apples, bananas and oranges.
+    ```
+
+If you want to point it at the chat GUI's default folder, it should be:
+=== "macOS"
+    ``` py
+    from pathlib import Path
+    from gpt4all import GPT4All
+
+    model_name = 'orca-mini-3b.ggmlv3.q4_0.bin'
+    model_path = Path.home() / 'Library' / 'Application Support' / 'nomic.ai' / 'GPT4All'
+    model = GPT4All(model_name, model_path)
+    ```
+=== "Windows"
+    ``` py
+    import os
+    from pathlib import Path
+    from gpt4all import GPT4All
+
+    model_name = 'orca-mini-3b.ggmlv3.q4_0.bin'
+    model_path = Path(os.environ['LOCALAPPDATA']) / 'nomic.ai' / 'GPT4All'
+    model = GPT4All(model_name, model_path)
+    ```
+=== "Linux"
+    ``` py
+    from pathlib import Path
+    from gpt4all import GPT4All
+
+    model_name = 'orca-mini-3b.ggmlv3.q4_0.bin'
+    model_path = Path.home() / '.local' / 'share' / 'nomic.ai' / 'GPT4All'
+    model = GPT4All(model_name, model_path)
+    ```
+
+Alternatively, you could also change the module's default model directory:
+
+``` py
+from pathlib import Path
+import gpt4all.gpt4all
+gpt4all.gpt4all.DEFAULT_MODEL_DIRECTORY = Path.home() / 'my' / 'models-directory'
+from gpt4all import GPT4All
+model = GPT4All('orca-mini-3b.ggmlv3.q4_0.bin')
+...
+```
+
+
+### Managing Templates
+Session templates can be customized when starting a `chat_session` context:
+
+=== "GPT4All Custom Session Templates Example"
+    ``` py
+    from gpt4all import GPT4All
+    model = GPT4All('ggml-Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin')
+    system_template = 'A chat between a curious user and an artificial intelligence assistant.'
+    # many models use triple hash '###' for keywords; Vicunas are simpler:
+    prompt_template = 'USER: {0}\nASSISTANT: '
+    with model.chat_session(system_template, prompt_template):
+        response1 = model.generate('why is the grass green?')
+        print(response1)
+        print()
+        response2 = model.generate('why is the sky blue?')
+        print(response2)
+    ```
+=== "Possible Output"
+    ```
+    The color of grass can be attributed to its chlorophyll content, which allows it
+    to absorb light energy from sunlight through photosynthesis. Chlorophyll absorbs
+    blue and red wavelengths of light while reflecting other colors such as yellow
+    and green. This is why the leaves appear green to our eyes.
+
+    The color of the sky appears blue due to a phenomenon called Rayleigh scattering,
+    which occurs when sunlight enters Earth's atmosphere and interacts with air
+    molecules such as nitrogen and oxygen. Blue light has shorter wavelength than
+    other colors in the visible spectrum, so it is scattered more easily by these
+    particles, making the sky appear blue to our eyes.
+    ```
+
+To do the same outside a session, the input has to be formatted manually. For example:
+
+=== "GPT4All Templates Outside a Session Example"
+    ``` py
+    from gpt4all import GPT4All
+    model = GPT4All('ggml-Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin')
+    system_template = 'A chat between a curious user and an artificial intelligence assistant.'
+    prompt_template = 'USER: {0}\nASSISTANT: '
+    prompts = ['name 3 colors', 'now name 3 fruits', 'what were the 3 colors in your earlier response?']
+    first_input = system_template + prompt_template.format(prompts[0])
+    response = model.generate(first_input, temp=0)
+    print(response)
+    for prompt in prompts[1:]:
+        response = model.generate(prompt_template.format(prompt), temp=0)
+        print(response)
+    ```
+=== "Output"
+    ```
+    1) Red
+    2) Blue
+    3) Green
+
+    1. Apple
+    2. Banana
+    3. Orange
+
+    The colors in my previous response are blue, green and red.
+    ```
+
+Ultimately, the method `GPT4All._format_chat_prompt_template()` is responsible for formatting templates. It can be
+customized in a subclass. As an example:
+
+=== "Custom Subclass"
+    ``` py
+    from itertools import cycle
+    from gpt4all import GPT4All
+
+    class RotatingTemplateGPT4All(GPT4All):
+        def __init__(self, *args, **kwargs):
+            super().__init__(*args, **kwargs)
+            self._templates = [
+                "Respond like a pirate.",
+                "Respond like a politician.",
+                "Respond like a philosopher.",
+                "Respond like a Klingon.",
+            ]
+            self._cycling_templates = cycle(self._templates)
+
+        def _format_chat_prompt_template(
+            self,
+            messages: list,
+            default_prompt_header: str = "",
+            default_prompt_footer: str = "",
+        ) -> str:
+            full_prompt = default_prompt_header + "\n\n" if default_prompt_header != "" else ""
+            for message in messages:
+                if message["role"] == "user":
+                    user_message = f"USER: {message['content']} {next(self._cycling_templates)}\n"
+                    full_prompt += user_message
+                if message["role"] == "assistant":
+                    assistant_message = f"ASSISTANT: {message['content']}\n"
+                    full_prompt += assistant_message
+            full_prompt += "\n\n" + default_prompt_footer if default_prompt_footer != "" else ""
+            print(full_prompt)
+            return full_prompt
+    ```
+=== "GPT4All Custom Subclass Example"
+    ``` py
+    model = RotatingTemplateGPT4All('ggml-Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin')
+    with model.chat_session():  # starting a session is optional in this example
+        response1 = model.generate("hi, who are you?")
+        print(response1)
+        print()
+        response2 = model.generate("what can you tell me about snakes?")
+        print(response2)
+        print()
+        response3 = model.generate("what's your opinion on Chess?")
+        print(response3)
+        print()
+        response4 = model.generate("tell me about ancient Rome.")
+        print(response4)
+    ```
+=== "Possible Output"
+    ```
+    USER: hi, who are you? Respond like a pirate.
+
+    Pirate: Ahoy there mateys! I be Cap'n Jack Sparrow of the Black Pearl.
+
+    USER: what can you tell me about snakes? Respond like a politician.
+
+    Politician: Snakes have been making headlines lately due to their ability to
+    slither into tight spaces and evade capture, much like myself during my last
+    election campaign. However, I believe that with proper education and
+    understanding of these creatures, we can work together towards creating a
+    safer environment for both humans and snakes alike.
+
+    USER: what's your opinion on Chess? Respond like a philosopher.
+
+    Philosopher: The game of chess is often used as an analogy to illustrate the
+    complexities of life and decision-making processes. However, I believe that it
+    can also be seen as a reflection of our own consciousness and subconscious mind.
+    Just as each piece on the board has its unique role to play in shaping the
+    outcome of the game, we too have different roles to fulfill in creating our own
+    personal narrative.
+
+    USER: tell me about ancient Rome. Respond like a Klingon.
+
+    Klingon: Ancient Rome was once a great empire that ruled over much of Europe and
+    the Mediterranean region. However, just as the Empire fell due to internal strife
+    and external threats, so too did my own house come crashing down when I failed to
+    protect our homeworld from invading forces.
+    ```
+
+
+### Introspection
+A less apparent feature is the capacity to log the final prompt that gets sent to the model. It relies on
+[Python's logging facilities][py-logging] implemented in the `pyllmodel` module at the `INFO` level.
+You can activate it, for example, with a `basicConfig`, which displays the log on the standard error stream. It's worth
+mentioning that Python's logging infrastructure offers [many more customization options][py-logging-cookbook].
+
+[py-logging]: https://docs.python.org/3/howto/logging.html
+[py-logging-cookbook]: https://docs.python.org/3/howto/logging-cookbook.html
+
+=== "GPT4All Prompt Logging Example"
+    ``` py
+    import logging
+    from gpt4all import GPT4All
+    logging.basicConfig(level=logging.INFO)
+    model = GPT4All('nous-hermes-13b.ggmlv3.q4_0.bin')
+    with model.chat_session('You are a geography expert.\nBe terse.',
+                            '### Instruction:\n{0}\n### Response:\n'):
+        response = model.generate('who are you?', temp=0)
+        print(response)
+        response = model.generate('what are your favorite 3 mountains?', temp=0)
+        print(response)
+    ```
+=== "Output"
+    ```
+    INFO:gpt4all.pyllmodel:LLModel.prompt_model -- prompt:
+    You are a geography expert.
+    Be terse.
+
+    ### Instruction:
+    who are you?
+    ### Response:
+
+    ===/LLModel.prompt_model -- prompt/===
+    I am an AI-powered chatbot designed to assist users with their queries related to geographical information.
+    INFO:gpt4all.pyllmodel:LLModel.prompt_model -- prompt:
+    ### Instruction:
+    what are your favorite 3 mountains?
+    ### Response:
+
+    ===/LLModel.prompt_model -- prompt/===
+    1) Mount Everest - Located in the Himalayas, it is the highest mountain on Earth and a significant challenge for mountaineers.
+    2) Kangchenjunga - This mountain is located in the Himalayas and is the third-highest peak in the world after Mount Everest and K2.
+    3) Lhotse - Located in the Himalayas, it is the fourth highest mountain on Earth and offers a challenging climb for experienced mountaineers.
+    ```
+
+
+### Without Online Connectivity
+To prevent GPT4All from accessing online resources, instantiate it with `allow_download=False`. This will disable both
+the download of missing models and of [models.json], which contains information about them. As a result, predefined
+templates are used instead of model-specific system and prompt templates:
+
+=== "GPT4All Default Templates Example"
+    ``` py
+    from gpt4all import GPT4All
+    model = GPT4All('ggml-mpt-7b-chat.bin', allow_download=False)
+    # when downloads are disabled, it will use the default templates:
+    print("default system template:", repr(model.config['systemPrompt']))
+    print("default prompt template:", repr(model.config['promptTemplate']))
+    print()
+    # even when inside a session:
+    with model.chat_session():
+        assert model.current_chat_session[0]['role'] == 'system'
+        print("session system template:", repr(model.current_chat_session[0]['content']))
+        print("session prompt template:", repr(model._current_prompt_template))
+    ```
+=== "Output"
+    ```
+    default system template: ''
+    default prompt template: '### Human: \n{0}\n### Assistant:\n'
+
+    session system template: ''
+    session prompt template: '### Human: \n{0}\n### Assistant:\n'
+    ```
+
+
+### Interrupting Generation
+The simplest way to stop generation is to set a fixed upper limit with the `max_tokens` parameter.
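+
+For example, a minimal sketch (the prompt and the limit of 20 tokens are arbitrary):
+
+``` py
+from gpt4all import GPT4All
+model = GPT4All('orca-mini-3b.ggmlv3.q4_0.bin')
+# generation stops after at most 20 tokens, even if that cuts the response short:
+response = model.generate('Describe Paris in one paragraph.', max_tokens=20)
+print(response)
+```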
+
+If you know exactly when a model should stop responding, you can add a custom callback, like so:
+
+=== "GPT4All Custom Stop Callback"
+    ``` py
+    from gpt4all import GPT4All
+    model = GPT4All('orca-mini-3b.ggmlv3.q4_0.bin')
+
+    def stop_on_token_callback(token_id, token_string):
+        # one sentence is enough:
+        if '.' in token_string:
+            return False
+        else:
+            return True
+
+    response = model.generate('Blue Whales are the biggest animal to ever inhabit the Earth.',
+                              temp=0, callback=stop_on_token_callback)
+    print(response)
+    ```
+=== "Output"
+    ```
+    They can grow up to 100 feet (30 meters) long and weigh as much as 20 tons (18 metric tons).
+    ```
+
+
+### FAQ
+#### There's a problem with the download
+If `allow_download=True` (default), a model is automatically downloaded into `.cache/gpt4all/` in the user's home
+folder, unless it already exists.
+
+In case of connection issues or errors during the download, you might want to manually verify the model file's MD5
+checksum by comparing it with the one listed in [models.json].
+
+As an alternative to the basic downloader built into the bindings, you can choose to download from the GPT4All
+website instead. Scroll down to 'Model Explorer' and pick your preferred model.
+
+
+#### I need the chat GUI and bindings to behave the same
+The chat GUI and bindings are based on the same backend. You can make them behave the same way by following these steps:
+
+- First of all, ensure that all parameters in the chat GUI settings match those passed to `generate()`.
+
+- To make comparing the output easier, set _Temperature_ in both to 0 for now. This will make the output deterministic.
+
+- Next you'll have to compare the templates, adjusting them as necessary, based on how you're using the bindings:
+    - With simple `generate()` calls, the input has to be surrounded with system and prompt templates.
+    - When using a chat session, it depends on whether the bindings are allowed to download [models.json]. If yes, and
+      in the chat GUI the default templates are used, it'll be handled automatically. If no, use `chat_session()`
+      template parameters to customize them.
+
+- Once you're done, remember to reset _Temperature_ to its previous value in both chat GUI and your Python code.
+
+
+## API Documentation
 ::: gpt4all.gpt4all.GPT4All