The GPT4All Chat UI supports models from all newer versions of `ggML` and `llama.cpp`, including the `LLaMA`, `MPT`, and `GPT-J` architectures. The `falcon` and `replit` architectures will also be supported soon.
GPT4All maintains an official list of recommended models in [models.json](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models.json). You can open a pull request to add new models to it; if accepted, they will show up in the official download dialog.
#### Sideloading any ggML model
If a model is compatible with the gpt4all-backend, you can sideload it into GPT4All Chat by:
1. Downloading your model in ggML format. It should be a 3-8 GB file similar to the ones [here](https://huggingface.co/TheBloke/Samantha-7B-GGML/tree/main).
2. Identifying your GPT4All Chat downloads folder. This is the path listed at the bottom of the download dialog.
3. Prefixing your downloaded model's filename with the string `ggml-` and placing the file into the GPT4All Chat downloads folder.
4. Restarting your chat app. Your model should appear in the download dialog.
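
The file-handling part of the steps above (adding the `ggml-` prefix and placing the file in the downloads folder) can be sketched in Python. This is a minimal sketch, not part of GPT4All: the function names `sideload_name` and `sideload_model` are hypothetical, and you must supply the downloads folder yourself, since it is the path shown at the bottom of the download dialog and differs per installation.

```python
import shutil
from pathlib import Path


def sideload_name(model_filename: str, prefix: str = "ggml-") -> str:
    """Return the filename with the prefix added, unless it is already present."""
    if model_filename.startswith(prefix):
        return model_filename
    return prefix + model_filename


def sideload_model(model_path: Path, downloads_dir: Path) -> Path:
    """Copy a downloaded model into the downloads folder under a `ggml-` name.

    `downloads_dir` is an assumption supplied by the caller: use the path
    listed at the bottom of the GPT4All Chat download dialog.
    """
    if not model_path.is_file():
        raise FileNotFoundError(f"model file not found: {model_path}")
    if not downloads_dir.is_dir():
        raise NotADirectoryError(f"downloads folder not found: {downloads_dir}")
    target = downloads_dir / sideload_name(model_path.name)
    # Copy rather than move, so the original download is left untouched.
    shutil.copy2(model_path, target)
    return target
```

After calling `sideload_model(Path("my-model.bin"), Path("<your downloads folder>"))`, restart the chat app as described in step 4.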
When using LocalDocs, your LLM will cite the sources that most likely contributed to a given output. Note that even an LLM equipped with LocalDocs can hallucinate. If the LocalDocs plugin decides to use your documents to help answer a prompt, you will see references appear below the response.
3. Configure a collection (folder) on your computer that contains the files your LLM should have access to. You can alter the contents of the folder/directory at any time. As you add more files to your collection, your LLM will dynamically be able to access them.
4. Spin up a chat session with any LLM (including external ones like ChatGPT, but be warned: data will leave your machine!)
- Query your documents based on your prompt or question. If your documents contain answers that may help with your prompt, LocalDocs will try to use snippets of your documents to provide context.
- Make sure LocalDocs is enabled for your chat session (the DB icon on the top-right should have a border).
- Try to modify your prompt to be more specific and use terminology that is in your document. This will increase the likelihood that LocalDocs matches document snippets for your question.
- If your document collection is large, wait 1-2 minutes for it to finish indexing.
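
To see why using your document's own terminology helps, consider a toy word-overlap score. This is only an illustration of the general idea and is not how LocalDocs actually matches snippets; the functions `tokenize` and `overlap_score` and the sample texts are hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(prompt: str, snippet: str) -> float:
    """Fraction of the prompt's distinct words that also occur in the snippet."""
    prompt_words = tokenize(prompt)
    if not prompt_words:
        return 0.0
    return len(prompt_words & tokenize(snippet)) / len(prompt_words)


# Hypothetical document snippet and two prompts asking about the same thing.
snippet = "The Q3 maintenance window for the billing cluster is Saturday 02:00 UTC."
vague = "When is the downtime?"
specific = "When is the Q3 maintenance window for the billing cluster?"

# The specific prompt reuses the document's wording, so it scores higher.
print(overlap_score(vague, snippet))
print(overlap_score(specific, snippet))
```

Under this toy measure, the prompt that borrows terms from the document overlaps far more with the snippet than the vague one, which mirrors the advice above.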
GPT4All Chat comes with a built-in server mode allowing you to programmatically interact
with any supported local LLM through a *very familiar* HTTP API. You can find the API documentation [here](https://platform.openai.com/docs/api-reference/completions).
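
A minimal client sketch using only the Python standard library is shown here. It assumes the server follows the completions endpoint of the linked API reference (`POST /v1/completions`, with the generated text under `choices[0].text`); the helper names `build_completion_request` and `complete` are hypothetical, and the base URL, model name, and sampling values are placeholders you must supply from your own setup.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 50,
                             temperature: float = 0.28) -> urllib.request.Request:
    """Build a POST request for a completions-style endpoint.

    Assumption: the server exposes `/v1/completions` as in the linked
    API reference. `base_url` must include the host and port in use.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed response shape: {"choices": [{"text": "..."}], ...}
    return body["choices"][0]["text"]
```

With server mode enabled, a call would look like `complete("http://localhost:<port>", "<model name>", "Who is Michael Jordan?")`, where the port and model name come from your chat client.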
Enabling server mode in the chat client will spin up an HTTP server running on `localhost` port
"text": "Who is Michael Jordan?\nMichael Jordan is a former professional basketball player who played for the Chicago Bulls in the NBA. He was born on December 30, 1963, and retired from playing basketball in 1998."