gpt4all/gpt4all-backend
Jared Van Bortel 88d85be0f9
chat: fix build on Windows and Nomic Embed path on macOS (#2467)
* chat: remove unused oscompat source files

These files are no longer needed now that the hnswlib index is gone.
This fixes an issue with the Windows build as there was a compilation
error in oscompat.cpp.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* llm: fix pragma to be recognized by MSVC

Replaces this MSVC warning:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp(53,21): warning C4081: expected '('; found 'string'

With this:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp : warning : offline installer build will not check for updates!

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
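
For illustration only (not the actual llm.cpp change): MSVC only recognizes the parenthesized form of the message pragma, so a bare string after `message` triggers C4081 instead of printing the text. A hedged sketch of a spelling both compiler families accept, with the fallback branch assumed rather than taken from the source:

    #if defined(_MSC_VER)
    // MSVC requires the parenthesized form; a bare string after `message`
    // produces warning C4081 ("expected '('").
    #pragma message(__FILE__ ": warning: offline installer build will not check for updates!")
    #else
    // GCC/Clang can surface the note as a real compiler warning instead.
    #warning "offline installer build will not check for updates!"
    #endif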

* usearch: fork usearch to fix `CreateFile` build error

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* dlhandle: fix incorrect assertion on Windows

SetErrorMode returns the previous value of the error mode flags, not an
indicator of success.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
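
A minimal sketch of the corrected pattern (assumed shape, not the exact dlhandle.cpp code): the return value of SetErrorMode is the previous mode to restore later, not a success flag to assert on.

    #ifdef _WIN32
    #include <windows.h>

    // Illustrative only: suppress the critical-error message boxes before
    // probing for loadable backend DLLs.
    static void suppressErrorDialogs() {
        // SetErrorMode returns the *previous* error-mode flags, not a
        // success/failure value, so don't assert on it.
        UINT prevMode = SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX);
        (void)prevMode; // keep only if you plan to restore the old mode later
    }
    #endif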

* llamamodel: fix UB in LLamaModel::embedInternal

It is undefined behavior to increment an STL iterator past the end of
the container. Use offsets to do the math instead.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
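
As a hedged illustration of the offset-based pattern (not the actual embedInternal code), here is one way to walk a token buffer in chunks without ever forming an iterator past end():

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    using Token = std::int32_t;

    // Forming `begin() + chunkSize` directly can step past end() on the last
    // chunk, which is UB; clamping the length with offsets first keeps every
    // iterator valid. chunkSize is assumed to be > 0.
    void forEachChunk(const std::vector<Token> &tokens, std::size_t chunkSize) {
        for (std::size_t off = 0; off < tokens.size(); off += chunkSize) {
            std::size_t len = std::min(chunkSize, tokens.size() - off);
            std::vector<Token> chunk(tokens.begin() + off, tokens.begin() + off + len);
            // ... embed `chunk` here ...
            (void)chunk;
        }
    }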

* cmake: install embedding model to bundle's Resources dir on macOS

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* ci: fix macOS build by explicitly installing Rosetta

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

---------

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-06-25 17:22:51 -04:00

GPT4All Backend

This directory contains the C/C++ model backend used by GPT4All for inference on the CPU. The backend acts as a universal library/wrapper for every model the GPT4All ecosystem supports: the language bindings are built on top of it, and the native GPT4All Chat application uses it directly for all inference.
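
To make the "universal wrapper" idea concrete, here is a toy sketch of the pattern with hypothetical names - it is not the backend's real interface. Each supported architecture implements one common interface, and the bindings or the chat application only ever program against that interface.

    #include <functional>
    #include <memory>
    #include <string>

    // Hypothetical illustration of the wrapper pattern, not the actual API.
    struct ModelBackend {
        virtual ~ModelBackend() = default;
        virtual bool load(const std::string &modelPath) = 0;
        virtual void prompt(const std::string &text,
                            const std::function<void(const std::string &)> &onToken) = 0;
    };

    // A factory would pick the concrete implementation (GPTJ, LLAMA, MPT, ...)
    // based on the model file, hiding the choice from callers.
    std::unique_ptr<ModelBackend> createBackendFor(const std::string &modelPath);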

What models are supported by the GPT4All ecosystem?

Currently, three different model architectures are supported:

  1. GPTJ - Based on the GPT-J architecture, with examples found here
  2. LLAMA - Based on the LLAMA architecture, with examples found here
  3. MPT - Based on Mosaic ML's MPT architecture, with examples found here

Why so many different architectures? What differentiates them?

One of the major differences is licensing. Currently, the LLAMA-based models are subject to a non-commercial license, whereas the GPTJ and MPT base models allow commercial usage. Early in the recent explosion of activity around open-source local models, the LLAMA-based models were generally seen as performing better, but that is changing quickly. Every week - even every day! - new models are released, and some of the GPTJ and MPT models are now competitive in performance/quality with LLAMA. What's more, the MPT models include some very nice architectural innovations that could lead to further performance/quality gains.

How does GPT4All make these models available for CPU inference?

By leveraging the ggml library written by Georgi Gerganov and a growing community of developers. There are currently multiple different versions of this library: the original GitHub repo can be found here, and the developer has also created a LLAMA-based version, llama.cpp, found here. This backend currently uses the latter as a submodule.

Does that mean GPT4All is compatible with all llama.cpp models and vice versa?

Unfortunately, no, for three reasons:

  1. The upstream llama.cpp project recently introduced a compatibility-breaking re-quantization method. This change renders all previous models (including the ones that GPT4All uses) inoperative with newer versions of llama.cpp.
  2. The GPT4All backend has the llama.cpp submodule specifically pinned to a version prior to this breaking change.
  3. The GPT4All backend currently supports MPT-based models as an added feature. Neither llama.cpp nor the original ggml repo supports this architecture as of this writing; however, efforts are underway to make MPT available in the ggml repo, which you can follow here.

What is being done to make them more compatible?

A few things. First, we are maintaining compatibility with our current model zoo by way of the submodule pinning. However, we are also exploring how we can update to newer versions of llama.cpp without breaking our current models. This might involve an additional magic header check, or it could involve keeping the currently pinned submodule while adding a new submodule with the later changes and differentiating them with namespaces or in some other manner. Investigations continue.
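
As a rough illustration of what a "magic header check" could look like (an assumption about the approach, not GPT4All's actual loader), the first four bytes of a ggml-family model file identify its container format. The magic constants below are the historical ggml/ggmf/ggjt file magics from llama.cpp; the dispatch logic itself is only a sketch.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    enum class ModelFormat { Unknown, GgmlUnversioned, Ggmf, Ggjt };

    // Read the leading 32-bit magic (little-endian host assumed) and map it to a
    // known ggml container format; unrecognized magics could be routed to a newer
    // loader instead of the pinned one.
    ModelFormat detectFormat(const char *path) {
        std::FILE *f = std::fopen(path, "rb");
        if (!f)
            return ModelFormat::Unknown;
        std::uint32_t magic = 0;
        std::size_t read = std::fread(&magic, sizeof magic, 1, f);
        std::fclose(f);
        if (read != 1)
            return ModelFormat::Unknown;
        switch (magic) {
            case 0x67676d6c: return ModelFormat::GgmlUnversioned; // "ggml"
            case 0x67676d66: return ModelFormat::Ggmf;            // "ggmf"
            case 0x67676a74: return ModelFormat::Ggjt;            // "ggjt"
            default:         return ModelFormat::Unknown;
        }
    }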

What about GPU inference?

Newer versions of llama.cpp have added some support for inference on NVIDIA GPUs. We're investigating how to incorporate this into our downloadable installers.

Ok, so bottom line... how do I make my model on Hugging Face compatible with the GPT4All ecosystem right now?

  1. Check to make sure the Hugging Face model is available in one of our three supported architectures.
  2. If it is, you can use the conversion script inside our pinned llama.cpp submodule for GPTJ and LLAMA-based models.
  3. Or, if your model is an MPT model, you can use the conversion script located directly in this backend directory under the scripts subdirectory.

Check back for updates as we'll try to keep this updated as things change!