# GPT4All REST API

NOTICE: We are considering deprecating this API, as it has become challenging to maintain and test. If you are interested in maintaining it, would like to take it over, or want to discuss its future, please speak up in the Discord channel.

This directory contains the source code to run and build docker images that run a FastAPI app
for serving inference from GPT4All models. The API matches the OpenAI API spec.

## Tutorial

The following tutorial assumes that you have checked out this repo and cd'd into it.

### Starting the app

First change your working directory to `gpt4all/gpt4all-api`.

Now you can build the FastAPI Docker image. You only have to do this on the initial build or when you add new dependencies to the `requirements.txt` file:
```bash
DOCKER_BUILDKIT=1 docker build -t gpt4all_api --progress plain -f gpt4all_api/Dockerfile.buildkit .
```

Then, start the backend with:

```bash
docker compose up --build
```

This will run both the API and a locally hosted GPU inference server. If you want to run the API without the GPU inference server, you can run:

```bash
docker compose up --build gpt4all_api
```

To run the API with the GPU inference server, you will need to set environment variables (such as `MODEL_ID`). Edit the `.env` file and run:
```bash
docker compose --env-file .env up --build
```
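Before starting the stack, it can help to confirm that the `.env` file defines the variables you expect. The following is a minimal sketch and not part of this repo: the helper names `parse_env_text` and `missing_vars` are hypothetical, it only handles simple `KEY=VALUE` lines, and `MODEL_ID` is the only variable name taken from this README. Add whichever other variables your setup needs.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env: dict, required: list) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    # MODEL_ID is the one variable this README names; extend the list as needed.
    missing = missing_vars(env, ["MODEL_ID"])
    if missing:
        print("Missing in .env:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Run it from the same directory as the `.env` file, before `docker compose --env-file .env up --build`.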


#### Spinning up your app
Run `docker compose up` to spin up the backend. Monitor the logs for errors in case you forgot to set an environment variable above.


#### Development
Run

```bash
docker compose up --build
```
and edit files in the `app` directory. The API will hot-reload on changes.

You can run the unit tests with

```bash
make test
```

#### Viewing API documentation

Once the FastAPI app is started, you can access its documentation and try out the endpoints by going to:
```
localhost:80/docs
```

This documentation should match the OpenAI OpenAPI spec located at https://github.com/openai/openai-openapi/blob/master/openapi.yaml
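To compare the served routes against the OpenAI spec without opening a browser, you can read the machine-readable schema behind the docs page. This is a sketch under two assumptions: that the app keeps FastAPI's default `/openapi.json` route, and that it is reachable at the same host and port as the docs page. The function names are illustrative.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema: dict) -> list:
    """Return sorted 'METHOD /path' strings for every operation in an OpenAPI schema."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append(f"{method.upper()} {path}")
    return sorted(operations)


def fetch_schema(url: str) -> dict:
    """Download and decode the OpenAPI schema served by the app."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumed URL: FastAPI's default schema route on the docs host and port.
    try:
        for operation in list_operations(fetch_schema("http://localhost:80/openapi.json")):
            print(operation)
    except (OSError, ValueError) as exc:
        print(f"Could not fetch the schema: {exc}")
```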


#### Running inference
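```python
import openai

# This example uses the legacy (pre-1.0) interface of the `openai` package.
# Point the client at the local server instead of the hosted OpenAI API.
openai.api_base = "http://localhost:4891/v1"

# The client expects a key to be set; a local LLM does not need a real one.
openai.api_key = "not needed for a local LLM"


def test_completion():
    model = "gpt4all-j-v1.3-groovy"
    prompt = "Who is Michael Jordan?"
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=50,
        temperature=0.28,
        top_p=0.95,
        n=1,
        echo=True,  # return the prompt followed by the completion
        stream=False,
    )
    # With echo=True the returned text includes the prompt, so it must be longer.
    assert len(response['choices'][0]['text']) > len(prompt)
    print(response)


if __name__ == "__main__":
    test_completion()
```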
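Because the API follows the OpenAI spec, the same completion request can also be sent without the `openai` package. The following is a sketch using only the Python standard library. It assumes the completions route is `/completions` under the base URL used above, as in the OpenAI spec, and `build_completion_request` is an illustrative helper name.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "http://localhost:4891/v1"  # same base URL as the example above


def build_completion_request(prompt: str,
                             model: str = "gpt4all-j-v1.3-groovy",
                             max_tokens: int = 50) -> Request:
    """Build a POST request whose JSON body mirrors the openai.Completion.create call."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.28,
        "top_p": 0.95,
        "n": 1,
        "echo": True,
        "stream": False,
    }
    return Request(
        f"{API_BASE}/completions",  # assumed route, following the OpenAI spec
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_completion_request("Who is Michael Jordan?")
    try:
        with urlopen(request, timeout=120) as resp:
            body = json.load(resp)
        print(body["choices"][0]["text"])
    except (OSError, ValueError, KeyError) as exc:
        print(f"Request failed: {exc}")
```

Keeping the request construction separate from the network call makes the payload easy to inspect or adjust before anything is sent.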