
# GPT4All REST API
This directory contains the source code for building and running Docker images that serve a FastAPI app
for inference from GPT4All models. The API matches the OpenAI API spec.

## Tutorial

### Starting the app

First, build the FastAPI Docker image. You only need to do this on the initial build or when you add new dependencies to `requirements.txt`:
```bash
DOCKER_BUILDKIT=1 docker build -t gpt4all_api --progress plain -f gpt4all_api/Dockerfile.buildkit .
```

Then, start the backend with:

```bash
docker compose up --build
```

This runs both the API and the locally hosted GPU inference server. To run the API without the GPU inference server, run:

```bash
docker compose up --build gpt4all_api
```

To run the API with the GPU inference server, you need to provide environment variables (such as `MODEL_ID`). Edit the `.env` file and run:
```bash
docker compose --env-file .env up --build
```


#### Spinning up your app
Run `docker compose up` to spin up the backend. Monitor the logs for errors in case you forgot to set an environment variable above.
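
A forgotten variable is easier to catch before the containers start. The sketch below is a hypothetical helper, not part of this repository: it parses simple `KEY=VALUE` lines from the text of a `.env` file and reports which required names are absent or empty. `MODEL_ID` is the only variable named in this README, so the required list is an assumption to extend as needed.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_vars(text: str, required: list[str]) -> list[str]:
    """Return the required names that are absent or empty in the .env text."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]
```

For example, `missing_env_vars(open(".env").read(), ["MODEL_ID"])` returns an empty list when `MODEL_ID` is set to a non-empty value.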


#### Development
Run

```bash
docker compose up --build
```
and edit files in the `api` directory. The API will hot-reload on changes.

You can run the unit tests with:

```bash
make test
```

#### Viewing API documentation

Once the FastAPI app has started, you can browse its documentation and try out its endpoints by going to:
```
localhost:80/docs
```

This documentation should match the OpenAI OpenAPI spec located at https://github.com/openai/openai-openapi/blob/master/openapi.yaml
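
To compare the running app against that spec, it can help to list the operations the app actually exposes. The sketch below assumes the app keeps FastAPI's default schema location (`/openapi.json`) and is reachable on the same host and port as the docs page; both are assumptions, and `list_operations` and `fetch_schema` are illustrative names rather than part of this repository.

```python
import json
from urllib.request import urlopen


def list_operations(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    http_methods = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
    ops = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            # Path items may also hold non-operation keys such as "parameters".
            if method in http_methods:
                ops.append(f"{method.upper()} {path}")
    return sorted(ops)


def fetch_schema(url: str = "http://localhost:80/openapi.json") -> dict:
    """Download the schema; the default URL is an assumption (FastAPI default path)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

With the app running, `print("\n".join(list_operations(fetch_schema())))` prints one line per exposed operation, which can then be checked against the paths in the OpenAI spec.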


#### Running inference
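```python
import openai

# Point the legacy (pre-1.0) openai client at the local server.
openai.api_base = "http://localhost:4891/v1"
openai.api_key = "not needed for a local LLM"


def test_completion():
    model = "gpt4all-j-v1.3-groovy"
    prompt = "Who is Michael Jordan?"
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=50,
        temperature=0.28,
        top_p=0.95,
        n=1,
        echo=True,  # the returned text includes the prompt
        stream=False,
    )
    # With echo=True the text must be longer than the prompt alone.
    assert len(response["choices"][0]["text"]) > len(prompt)
    print(response)


if __name__ == "__main__":
    test_completion()
```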
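
Because the API follows the OpenAI spec, the same call can be made without the `openai` package. The following is a minimal sketch using only the standard library. It assumes the completions endpoint lives at `/completions` under the base URL used above, and that the response follows the OpenAI completions shape (`choices[0].text`); `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "http://localhost:4891/v1"  # same base URL as the openai example


def build_completion_request(prompt: str, model: str = "gpt4all-j-v1.3-groovy",
                             max_tokens: int = 50) -> Request:
    """Build (but do not send) a POST request to the completions endpoint."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return Request(
        f"{API_BASE}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the first choice's text (assumed response shape)."""
    with urlopen(build_completion_request(prompt), timeout=120) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Keeping request construction separate from sending makes the payload easy to inspect before the server is up; `complete("Who is Michael Jordan?")` sends it once the backend is running.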