Commit Graph

240 Commits (aac00d019a3b01b9a661f432ea11589f617a89c0)

Author SHA1 Message Date
Jared Van Bortel ba53ab5da0
python: do not print GPU name with verbose=False, expose this info via properties (#2222)
* llamamodel: only print device used in verbose mode

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: expose backend and device via GPT4All properties

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* backend: const correctness fixes

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: bump version

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: typing fixups

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: fix segfault with closed GPT4All

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

---------

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel ac498f79ac
fix regressions in system prompt handling (#2219)
* python: fix system prompt being ignored
* fix unintended whitespace after system prompt

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 3f8257c563
llamamodel: fix semantic typo in nomic client dynamic mode (#2216)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 46818e466e
python: embedding cancel callback for nomic client dynamic mode (#2214)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 459289b94c
embed4all: small fixes related to nomic client local embeddings (#2213)
* actually submit larger batches with increased n_ctx
* fix crash when llama_tokenize returns no tokens

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 1b84a48c47
python: add list_gpus to the GPT4All API (#2194)
Other changes:
* fix memory leak in llmodel_available_gpu_devices
* drop model argument from llmodel_available_gpu_devices
* breaking: make GPT4All/Embed4All arguments past model_name keyword-only

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 67843edc7c
backend: update llama.cpp submodule for wpm locale fix (#2163)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 83ada4ca89
backend: update llama.cpp submodule for Unicode paths fix (#2162)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 0455b80b7f
Embed4All: optionally count tokens, misc fixes (#2145)
Key changes:
* python: optionally return token count in Embed4All.embed
* python and docs: models2.json -> models3.json
* Embed4All: require explicit prefix for unknown models
* llamamodel: fix shouldAddBOS for Bert and Nomic Bert

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel a1bb6084ed
python: documentation update and typing improvements (#2129)
Key changes:
* revert "python: tweak constructor docstrings"
* docs: update python GPT4All and Embed4All documentation
* breaking: require keyword args to GPT4All.generate

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 699410014a
fix non-AVX CPU detection (#2141)
* chat: fix non-AVX CPU detection on Windows
* bindings: throw exception instead of logging to console

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 255568fb9a
python: various fixes for GPT4All and Embed4All (#2130)
Key changes:
* honor empty system prompt argument
* current_chat_session is now read-only and defaults to None
* deprecate fallback prompt template for unknown models
* fix mistakes from #2086

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 53f109f519
llamamodel: fix macOS build (#2125)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 406e88b59a
implement local Nomic Embed via llama.cpp (#2086)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 5c248dbec9
models: new MPT model file without duplicated token_embd.weight (#2006)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel c19b763e03
llmodel_c: expose fakeReply to the bindings (#2061)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel f500bcf6e5
llmodel: default to a blank line between reply and next prompt (#1996)
Also make some related adjustments to the provided Alpaca-style prompt templates
and system prompts.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel 007d469034
bert: fix layer norm epsilon value (#1946)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Adam Treat f720261d46 Fix another vulnerable spot for crashes.
Signed-off-by: Adam Treat <treat.adam@gmail.com>
7 months ago
chrisbarrera f8b1069a1c
add min_p sampling parameter (#2014)
Signed-off-by: Christopher Barrera <cb@arda.tx.rr.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
7 months ago
Jared Van Bortel e7f2ff189f fix some compilation warnings on macOS
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel 88e330ef0e
llama.cpp: enable Kompute support for 10 more model arches (#2005)
These are Baichuan, Bert and Nomic Bert, CodeShell, GPT-2, InternLM,
MiniCPM, Orion, Qwen, and StarCoder.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel fc6c5ea0c7
llama.cpp: gemma: allow offloading the output tensor (#1997)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel 4fc4d94be4
fix chat-style prompt templates (#1970)
Also use a new version of Mistral OpenOrca.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel 7810b757c9 llamamodel: add gemma model support
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Adam Treat d948a4f2ee Complete revamp of model loading to allow for more discreet control by
the user of the models loading behavior.

Signed-off-by: Adam Treat <treat.adam@gmail.com>
7 months ago
Jared Van Bortel 6fdec808b2 backend: update llama.cpp for faster state serialization
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel a1471becf3 backend: update llama.cpp for Intel GPU blacklist
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel eb1081d37e cmake: fix LLAMA_DIR use before set
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel e60b388a2e cmake: fix backwards LLAMA_KOMPUTE default
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel fc7e5f4a09
ci: fix missing Kompute support in python bindings (#1953)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel bf493bb048
Mixtral crash fix and python bindings v2.2.0 (#1931)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 92c025a7f6
llamamodel: add 12 new architectures for CPU inference (#1914)
Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2,
Plamo, Qwen, Qwen2, Refact, StableLM

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 10e3f7bbf5
Fix VRAM leak when model loading fails (#1901)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel eadc3b8d80 backend: bump llama.cpp for VRAM leak fix when switching models
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 6db5307730 update llama.cpp for unhandled Vulkan OOM exception fix
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 0a40e71652
Maxwell/Pascal GPU support and crash fix (#1895)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel b11c3f679e bump llama.cpp-mainline for C++11 compat
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel f549d5a70a backend : quick llama.cpp update to fix fallback to CPU
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 26acdebafa
convert: replace GPTJConfig with AutoConfig (#1866)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel a9c5f53562 update llama.cpp for nomic-ai/llama.cpp#12
Fixes #1477

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel b7c92c5afd
sync llama.cpp with latest Vulkan PR and newer upstream (#1819) 8 months ago
Jared Van Bortel 7e9786fccf chat: set search path early
This fixes the issues with installed versions of v2.6.0.
8 months ago
AT 96cee4f9ac
Explicitly clear the kv cache each time we eval tokens to match n_past. (#1808) 9 months ago
ThiloteE 2d566710e5 Address review 9 months ago
ThiloteE a0f7d7ae0e Fix for "LLModel ERROR: Could not find CPU LLaMA implementation" v2 9 months ago
ThiloteE 38d81c14d0 Fixes https://github.com/nomic-ai/gpt4all/issues/1760 LLModel ERROR: Could not find CPU LLaMA implementation.
Inspired by Microsoft docs for LoadLibraryExA (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa).
When using LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR, the lpFileName parameter must specify a fully qualified path, also it needs to be backslashes (\), not forward slashes (/).
9 months ago
Jared Van Bortel d1c56b8b28
Implement configurable context length (#1749) 9 months ago