Commit Graph

261 Commits (d92252cab15e3f9724157b4d336a536c52bc4c78)

Author SHA1 Message Date
Jared Van Bortel e60b388a2e cmake: fix backwards LLAMA_KOMPUTE default
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel fc7e5f4a09
ci: fix missing Kompute support in python bindings (#1953)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
7 months ago
Jared Van Bortel bf493bb048
Mixtral crash fix and python bindings v2.2.0 (#1931)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 92c025a7f6
llamamodel: add 12 new architectures for CPU inference (#1914)
Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2,
Plamo, Qwen, Qwen2, Refact, StableLM

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 10e3f7bbf5
Fix VRAM leak when model loading fails (#1901)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel eadc3b8d80 backend: bump llama.cpp for VRAM leak fix when switching models
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 6db5307730 update llama.cpp for unhandled Vulkan OOM exception fix
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 0a40e71652
Maxwell/Pascal GPU support and crash fix (#1895)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel b11c3f679e bump llama.cpp-mainline for C++11 compat
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
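For context, a minimal sketch of using this from the Python bindings, assuming a bindings version that exposes the layer count as a constructor keyword (shown here as `ngl`; the keyword name and model filename are illustrative, not taken from this commit):

```python
from gpt4all import GPT4All

# ngl: how many transformer layers to offload to the GPU; lower it to fit
# a model into less VRAM. Check your installed bindings' signature.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", device="gpu", ngl=32)
print(model.generate("def fib(n):", max_tokens=64))
```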
Jared Van Bortel f549d5a70a backend : quick llama.cpp update to fix fallback to CPU
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel 26acdebafa
convert: replace GPTJConfig with AutoConfig (#1866)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel a9c5f53562 update llama.cpp for nomic-ai/llama.cpp#12
Fixes #1477

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
8 months ago
Jared Van Bortel b7c92c5afd
sync llama.cpp with latest Vulkan PR and newer upstream (#1819) 8 months ago
Jared Van Bortel 7e9786fccf chat: set search path early
This fixes the issues with installed versions of v2.6.0.
8 months ago
AT 96cee4f9ac
Explicitly clear the kv cache each time we eval tokens to match n_past. (#1808) 9 months ago
ThiloteE 2d566710e5 Address review 9 months ago
ThiloteE a0f7d7ae0e Fix for "LLModel ERROR: Could not find CPU LLaMA implementation" v2 9 months ago
ThiloteE 38d81c14d0 Fixes https://github.com/nomic-ai/gpt4all/issues/1760 ("LLModel ERROR: Could not find CPU LLaMA implementation").
Inspired by the Microsoft docs for LoadLibraryExA (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa):
when using LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR, the lpFileName parameter must specify a fully qualified path, and it must use backslashes (\), not forward slashes (/).
9 months ago
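A minimal ctypes sketch of that rule (Windows only; the helper name is illustrative, the flag values are the documented constants from libloaderapi.h):

```python
import ctypes
import os

LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000

def load_model_dll(path):
    # LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR requires a fully qualified path,
    # and it must use backslashes, not forward slashes.
    path = os.path.abspath(path).replace('/', '\\')
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.LoadLibraryExW.restype = ctypes.c_void_p
    handle = kernel32.LoadLibraryExW(
        path, None,
        LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR | LOAD_LIBRARY_SEARCH_DEFAULT_DIRS)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle
```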
Jared Van Bortel d1c56b8b28
Implement configurable context length (#1749) 9 months ago
Jared Van Bortel 3acbef14b7
fix AVX support by removing direct linking to AVX2 libs (#1750) 9 months ago
Jared Van Bortel 0600f551b3
chatllm: do not attempt to serialize incompatible state (#1742) 9 months ago
Jared Van Bortel 1df3da0a88 update llama.cpp for clang warning fix 9 months ago
Jared Van Bortel dfd8ef0186
backend: use ggml_new_graph for GGML backend v2 (#1719) 10 months ago
Jared Van Bortel 9e28dfac9c
Update to latest llama.cpp (#1706) 10 months ago
Adam Treat cce5fe2045 Fix macos build. 10 months ago
Adam Treat 371e2a5cbc LocalDocs version 2 with text embeddings. 10 months ago
Jared Van Bortel d4ce9f4a7c
llmodel_c: improve quality of error messages (#1625) 11 months ago
cebtenzzre 64101d3af5 update llama.cpp-mainline 11 months ago
Adam Treat ffef60912f Update to llama.cpp 11 months ago
Adam Treat f5f22fdbd0 Update llama.cpp for latest bugfixes. 11 months ago
cebtenzzre 7bcd9e8089 update llama.cpp-mainline 11 months ago
cebtenzzre fd0c501d68
backend: support GGUFv3 (#1582) 11 months ago
Adam Treat 14b410a12a Update to latest version of llama.cpp which fixes issue 1507. 11 months ago
Adam Treat ab96035bec Update to llama.cpp submodule for some vulkan fixes. 11 months ago
cebtenzzre e90263c23f
make scripts executable (#1555) 11 months ago
Aaron Miller f414c28589 llmodel: whitelist library name patterns
This fixes some issues seen on installed Windows builds of 2.5.0.

Only load DLLs that might actually be model implementation DLLs; otherwise we pull all sorts of random junk into the process before it expects to be loaded.

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
11 months ago
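To illustrate the whitelisting idea (the patterns and helper below are hypothetical; the real list lives in llmodel's dynamic-loader code):

```python
import fnmatch
import os

# Hypothetical name patterns for model implementation libraries.
IMPL_PATTERNS = ("*llamamodel-mainline*", "*gptj*", "*bert*")

def candidate_impl_libs(search_dir):
    """Yield only files whose names plausibly match a model impl DLL."""
    for name in os.listdir(search_dir):
        if any(fnmatch.fnmatch(name.lower(), pat) for pat in IMPL_PATTERNS):
            yield os.path.join(search_dir, name)
```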
cebtenzzre 4338e72a51
MPT: use upstream llama.cpp implementation (#1515) 11 months ago
cebtenzzre 0fe2e19691
llamamodel: re-enable error messages by default (#1537) 11 months ago
cebtenzzre 017c3a9649
python: prepare version 2.0.0rc1 (#1529) 11 months ago
cebtenzzre 9a19c740ee
kompute: fix library loading issues with kp_logger (#1517) 11 months ago
Aaron Miller f79557d2aa speedup: just use mat*vec shaders for mat*mat
So far my from-scratch mat*mats are still slower than just running more
invocations of the existing Metal-ported mat*vec shaders. It should be
theoretically possible to make a mat*mat that's faster (for actual
mat*mat cases) than an optimal mat*vec, but it will need to be at
*least* as fast as the mat*vec op and then take special care to be
cache-friendly and save memory bandwidth, since the number of compute
ops is the same.
11 months ago
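The strategy reads as follows in a NumPy sketch (illustrative only; the real work happens in the Kompute/Metal shaders):

```python
import numpy as np

def matmat_via_matvec(A, B):
    """Compute A @ B by invoking a mat*vec "kernel" once per column of B."""
    out = np.empty((A.shape[0], B.shape[1]), dtype=A.dtype)
    for j in range(B.shape[1]):
        out[:, j] = A @ B[:, j]   # one mat*vec invocation per column
    return out

A = np.random.rand(64, 128).astype(np.float32)
B = np.random.rand(128, 32).astype(np.float32)
assert np.allclose(matmat_via_matvec(A, B), A @ B, atol=1e-4)
```

The op count matches a dedicated mat*mat kernel; any speedup from a real mat*mat shader has to come from reusing A across columns (cache friendliness and memory bandwidth), exactly as the commit message argues.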
cebtenzzre 22de3c56bd
convert scripts: fix AutoConfig typo (#1512) 11 months ago
Aaron Miller 2490977f89 q6k, q4_1 mat*mat 11 months ago
Aaron Miller afaa291eab python bindings should be quiet by default
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
  nonempty
* make verbose flag for retrieve_model default false (but also be
  overridable via gpt4all constructor)

should be able to run a basic test:

```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```

and see no non-model output when successful
11 months ago
cebtenzzre 7b611b49f2
llmodel: print an error if the CPU does not support AVX (#1499) 11 months ago
Aaron Miller 043617168e do not process prompts on gpu yet 11 months ago
Aaron Miller 64001a480a mat*mat for q4_0, q8_0 11 months ago
cebtenzzre 7a19047329
llmodel: do not call magic_match unless build variant is correct (#1488) 11 months ago
Cebtenzzre 5fe685427a chat: clearer CPU fallback messages 12 months ago
Adam Treat eec906aa05 Speculative fix for build on mac. 12 months ago
Adam Treat a9acdd25de Push a new version number for llmodel backend now that it is based on gguf. 12 months ago
Cebtenzzre 8bb6a6c201 rebase on newer llama.cpp 12 months ago
Cebtenzzre d87573ea75 remove old llama.cpp submodules 12 months ago
Cebtenzzre cc6db61c93 backend: fix build with Visual Studio generator
Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This
is needed because Visual Studio is a multi-configuration generator, so
we do not know what the build type will be until `cmake --build` is
called.

Fixes #1470
12 months ago
Adam Treat f605a5b686 Add q8_0 kernels to kompute shaders and bump to latest llama/gguf. 12 months ago
Cebtenzzre 672cb850f9 differentiate between init failure and unsupported models 12 months ago
Adam Treat 906699e8e9 Bump to latest llama/gguf branch. 12 months ago
Cebtenzzre 088afada49 llamamodel: fix static vector in LLamaModel::endTokens 12 months ago
Adam Treat b4d82ea289 Bump to the latest fixes for vulkan in llama. 12 months ago
Adam Treat 12f943e966 Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf. 12 months ago
Adam Treat 5d346e13d7 Add q6_k kernels for vulkan. 12 months ago
Adam Treat 4eefd386d0 Refactor for subgroups on mat * vec kernel. 12 months ago
Cebtenzzre 3c2aa299d8 gptj: remove unused variables 12 months ago
Cebtenzzre f9deb87d20 convert scripts: add feed-forward length for better compatibility
This GGUF key is used by all llama.cpp models with upstream support.
12 months ago
Cebtenzzre cc7675d432 convert scripts: make gptj script executable 12 months ago
Cebtenzzre 0493e6eb07 convert scripts: use bytes_to_unicode from transformers 12 months ago
Cebtenzzre d5d72f0361 gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
12 months ago
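A NumPy sketch of the first two cache changes described above, for a single head with hypothetical shapes (the real implementation is ggml tensors in C; the Q-copy point is not shown):

```python
import numpy as np

n_ctx, d_head = 8, 4
k_cache = np.zeros((n_ctx, d_head), dtype=np.float16)  # F16 K cache
v_cache = np.zeros((d_head, n_ctx), dtype=np.float16)  # V stored transposed

def attend(q, k, v, n_past):
    k_cache[n_past] = k.astype(np.float16)
    v_cache[:, n_past] = v.astype(np.float16)  # write this token's V as a column
    n = n_past + 1
    scores = (k_cache[:n].astype(np.float32) @ q) / np.sqrt(d_head)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # With V transposed, the weighted sum over positions is a plain mat*vec.
    return v_cache[:, :n].astype(np.float32) @ w
```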
Cebtenzzre 050e7f076e backend: port GPT-J to GGUF 12 months ago
Cebtenzzre 8f3abb37ca fix references to removed model types 12 months ago
Cebtenzzre 4219c0e2e7 convert scripts: make them directly executable 12 months ago
Cebtenzzre ce7be1db48 backend: use llamamodel.cpp for Falcon 12 months ago
Cebtenzzre cca9e6ce81 convert_mpt_hf_to_gguf.py: better tokenizer decoding 12 months ago
Cebtenzzre 25297786db convert scripts: load model as late as possible 12 months ago
Cebtenzzre fd47088f2b conversion scripts: cleanup 12 months ago
Cebtenzzre 6277eac9cc backend: use llamamodel.cpp for StarCoder 12 months ago
Cebtenzzre 17fc9e3e58 backend: port Replit to GGUF 12 months ago
Cebtenzzre 7c67262a13 backend: port MPT to GGUF 12 months ago
Cebtenzzre 42bcb814b3 backend: port BERT to GGUF 12 months ago
Cebtenzzre 1d29e4696c llamamodel: metal supports all quantization types now 12 months ago
Aaron Miller 507753a37c macos build fixes 12 months ago
Adam Treat d90d003a1d Latest rebase on llama.cpp with gguf support. 12 months ago
Adam Treat 99c106e6b5 Fix a bug seen on AMD RADEON cards with vulkan backend. 12 months ago
Jacob Nguyen e86c63750d Update llama.cpp.cmake
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
1 year ago
Adam Treat 84905aa281 Fix for crashes on systems where vulkan is not installed properly. 1 year ago
Adam Treat 045f6e6cdc Link against ggml in bin so we can get the available devices without loading a model. 1 year ago
Adam Treat aa33419c6e Fallback to CPU more robustly. 1 year ago
Adam Treat 9013a089bd Bump to new llama with new bugfix. 1 year ago
Adam Treat 3076e0bf26 Only show GPU when we're actually using it. 1 year ago
Adam Treat cf4eb530ce Sync to a newer version of llama.cpp with bugfix for vulkan. 1 year ago
Adam Treat 4b9a345aee Update the submodule. 1 year ago
Aaron Miller 6f038c136b init at most one vulkan device, submodule update
fixes issues w/ multiple of the same gpu
1 year ago
Adam Treat 8f99dca70f Bring the vulkan backend to the GUI. 1 year ago
Aaron Miller f0735efa7d vulkan python bindings on windows fixes 1 year ago
Adam Treat c953b321b7 Don't link against libvulkan. 1 year ago
Aaron Miller c4d23512e4 remove extra dynamic linker deps when building with vulkan 1 year ago
Adam Treat 85e34598f9 more circleci 1 year ago
Adam Treat f578fa6cdf Fix for windows. 1 year ago
Adam Treat 17d3e4976c Add a comment indicating future work. 1 year ago