Commit Graph

47 Commits (d515ad3b183fe06a9d5c2b84a26a7384d24fc616)

Author SHA1 Message Date
Jared Van Bortel bd307abfe6
backend: fix a crash on inputs greater than n_ctx (#2498)
This fixes a regression in commit 4fc4d94b ("fix chat-style prompt
templates (#1970)"), which moved some return statements into a new
function (LLModel::decodePrompt) without making them return from the
parent as well.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
3 months ago
Jared Van Bortel 01870b4a46
chat: fix blank device in UI and improve Mixpanel reporting (#2409)
Also remove LLModel::hasGPUDevice.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
3 months ago
Jared Van Bortel 636307160e
backend: fix #includes with include-what-you-use (#2371)
Also fix a PARENT_SCOPE warning when building the backend.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
4 months ago
Jared Van Bortel d2a99d9bc6
support the llama.cpp CUDA backend (#2310)
* rebase onto llama.cpp commit ggerganov/llama.cpp@d46dbc76f
* support for CUDA backend (enabled by default)
* partial support for Occam's Vulkan backend (disabled by default)
* partial support for HIP/ROCm backend (disabled by default)
* sync llama.cpp.cmake with upstream llama.cpp CMakeLists.txt
* changes to GPT4All backend, bindings, and chat UI to handle choice of llama.cpp backend (Kompute or CUDA)
* ship CUDA runtime with installed version
* make device selection in the UI on macOS actually do something
* model whitelist: remove dbrx, mamba, persimmon, plamo; add internlm and starcoder2

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
4 months ago
Jared Van Bortel 577ebd4826
mixpanel: report cpu_supports_avx2 on startup (#2299)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel c622921894
improve mixpanel usage statistics (#2238)
Other changes:
- Always display first start dialog if privacy options are unset (e.g. if the user closed GPT4All without selecting them)
- LocalDocs scanQueue is now always deferred
- Fix a potential crash in magic_match
- LocalDocs indexing is now started after the first start dialog is dismissed so usage stats are included

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel ba53ab5da0
python: do not print GPU name with verbose=False, expose this info via properties (#2222)
* llamamodel: only print device used in verbose mode

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: expose backend and device via GPT4All properties

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* backend: const correctness fixes

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: bump version

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: typing fixups

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

* python: fix segfault with closed GPT4All

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

---------

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 46818e466e
python: embedding cancel callback for nomic client dynamic mode (#2214)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 1b84a48c47
python: add list_gpus to the GPT4All API (#2194)
Other changes:
* fix memory leak in llmodel_available_gpu_devices
* drop model argument from llmodel_available_gpu_devices
* breaking: make GPT4All/Embed4All arguments past model_name keyword-only

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 0455b80b7f
Embed4All: optionally count tokens, misc fixes (#2145)
Key changes:
* python: optionally return token count in Embed4All.embed
* python and docs: models2.json -> models3.json
* Embed4All: require explicit prefix for unknown models
* llamamodel: fix shouldAddBOS for Bert and Nomic Bert

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 699410014a
fix non-AVX CPU detection (#2141)
* chat: fix non-AVX CPU detection on Windows
* bindings: throw exception instead of logging to console

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
Jared Van Bortel 406e88b59a
implement local Nomic Embed via llama.cpp (#2086)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
6 months ago
chrisbarrera f8b1069a1c
add min_p sampling parameter (#2014)
Signed-off-by: Christopher Barrera <cb@arda.tx.rr.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
7 months ago
Jared Van Bortel 4fc4d94be4
fix chat-style prompt templates (#1970)
Also use a new version of Mistral OpenOrca.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Adam Treat d948a4f2ee Complete revamp of model loading to allow for more discreet control by
the user of the models loading behavior.

Signed-off-by: Adam Treat <treat.adam@gmail.com>
7 months ago
Jared Van Bortel 061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 7e9786fccf chat: set search path early
This fixes the issues with installed versions of v2.6.0.
8 months ago
Jared Van Bortel d1c56b8b28
Implement configurable context length (#1749) 9 months ago
Jared Van Bortel 3acbef14b7
fix AVX support by removing direct linking to AVX2 libs (#1750) 9 months ago
Jared Van Bortel 9e28dfac9c
Update to latest llama.cpp (#1706) 10 months ago
Cebtenzzre 5fe685427a chat: clearer CPU fallback messages 12 months ago
Cebtenzzre 672cb850f9 differentiate between init failure and unsupported models 12 months ago
Adam Treat d90d003a1d Latest rebase on llama.cpp with gguf support. 12 months ago
Adam Treat 045f6e6cdc Link against ggml in bin so we can get the available devices without loading a model. 1 year ago
Adam Treat 3076e0bf26 Only show GPU when we're actually using it. 1 year ago
Adam Treat 987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 1 year ago
Adam Treat 0efdbfcffe Bert 1 year ago
Adam Treat 315a1f2aa2 Move it back as internal class. 1 year ago
Adam Treat 1f749d7633 Clean up backend code a bit and hide impl. details. 1 year ago
Adam Treat 33557b1f39 Move the implementation out of llmodel class. 1 year ago
Aaron Miller 7a5f6e4726 limit prompt batch size to 128 1 year ago
Aaron Miller b19a3e5b2c add requiredMem method to llmodel impls
most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
1 year ago
Aaron Miller 88616fde7f
llmodel: change tokenToString to not use string_view (#968)
fixes a definite use-after-free and likely avoids some other
potential ones - std::string will convert to a std::string_view
automatically but as soon as the std::string in question goes out of
scope it is already freed and the string_view is pointing at freed
memory - this is *mostly* fine if its returning a reference to the
tokenizer's internal vocab table but it's, imo, too easy to return a
reference to a dynamically constructed string with this as replit is
doing (and unfortunately needs to do to convert the internal whitespace
replacement symbol back to a space)
1 year ago
niansa/tuxifan 14e9ccbc6a Do auto detection by default in C++ API
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
1 year ago
Adam Treat 8a9ad258f4 Fix symbol resolution on windows. 1 year ago
Adam Treat 301d2fdbea Fix up for newer models on reset context. This fixes the model from totally failing after a reset context. 1 year ago
AT bbe195ee02
Backend prompt dedup (#822)
* Deduplicated prompt() function code
1 year ago
Richard Guo c54c42e3fb fixed finding model libs 1 year ago
Adam Treat a41bd6ac0a Trying to shrink the copy+paste code and do more code sharing between backend model impl. 1 year ago
niansa 5175db2781 Fixed double-free in LLModel::Implementation destructor 1 year ago
niansa/tuxifan fc60f0c09c
Cleaned up implementation management (#787)
* Cleaned up implementation management

* Initialize LLModel::m_implementation to nullptr

* llmodel.h: Moved dlhandle fwd declare above LLModel class
1 year ago
Adam Treat 1eca524171 Add fixme's and clean up a bit. 1 year ago
niansa a3d08cdcd5 Dlopen better implementation management (Version 2) 1 year ago
AT 48275d0dcc
Dlopen backend 5 (#779)
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved.
1 year ago
Juuso Alasuutari 81fdc28e58 llmodel: constify LLModel::threadCount() 1 year ago
Adam Treat d918b02c29 Move the llmodel C API to new top-level directory and version it. 1 year ago