Commit Graph

153 Commits

Author SHA1 Message Date
Adam Treat
906699e8e9 Bump to latest llama/gguf branch. 2023-10-05 18:16:19 -04:00
Cebtenzzre
088afada49 llamamodel: fix static vector in LLamaModel::endTokens 2023-10-05 18:16:19 -04:00
Adam Treat
b4d82ea289 Bump to the latest fixes for vulkan in llama. 2023-10-05 18:16:19 -04:00
Adam Treat
12f943e966 Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf. 2023-10-05 18:16:19 -04:00
Adam Treat
5d346e13d7 Add q6_k kernels for vulkan. 2023-10-05 18:16:19 -04:00
Adam Treat
4eefd386d0 Refactor for subgroups on mat * vec kernel. 2023-10-05 18:16:19 -04:00
Cebtenzzre
3c2aa299d8 gptj: remove unused variables 2023-10-05 18:16:19 -04:00
Cebtenzzre
f9deb87d20 convert scripts: add feed-forward length for better compatiblilty
This GGUF key is used by all llama.cpp models with upstream support.
2023-10-05 18:16:19 -04:00
Cebtenzzre
cc7675d432 convert scripts: make gptj script executable 2023-10-05 18:16:19 -04:00
Cebtenzzre
0493e6eb07 convert scripts: use bytes_to_unicode from transformers 2023-10-05 18:16:19 -04:00
Cebtenzzre
d5d72f0361 gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
2023-10-05 18:16:19 -04:00
Cebtenzzre
050e7f076e backend: port GPT-J to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
8f3abb37ca fix references to removed model types 2023-10-05 18:16:19 -04:00
Cebtenzzre
4219c0e2e7 convert scripts: make them directly executable 2023-10-05 18:16:19 -04:00
Cebtenzzre
ce7be1db48 backend: use llamamodel.cpp for Falcon 2023-10-05 18:16:19 -04:00
Cebtenzzre
cca9e6ce81 convert_mpt_hf_to_gguf.py: better tokenizer decoding 2023-10-05 18:16:19 -04:00
Cebtenzzre
25297786db convert scripts: load model as late as possible 2023-10-05 18:16:19 -04:00
Cebtenzzre
fd47088f2b conversion scripts: cleanup 2023-10-05 18:16:19 -04:00
Cebtenzzre
6277eac9cc backend: use llamamodel.cpp for StarCoder 2023-10-05 18:16:19 -04:00
Cebtenzzre
17fc9e3e58 backend: port Replit to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
7c67262a13 backend: port MPT to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
42bcb814b3 backend: port BERT to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
1d29e4696c llamamodel: metal supports all quantization types now 2023-10-05 18:16:19 -04:00
Aaron Miller
507753a37c macos build fixes 2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d Latest rebase on llama.cpp with gguf support. 2023-10-05 18:16:19 -04:00
Adam Treat
99c106e6b5 Fix a bug seen on AMD RADEON cards with vulkan backend. 2023-09-26 11:59:47 -04:00
Jacob Nguyen
e86c63750d Update llama.cpp.cmake
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
2023-09-16 11:42:56 -07:00
Adam Treat
84905aa281 Fix for crashes on systems where vulkan is not installed properly. 2023-09-16 12:19:46 -04:00
Adam Treat
045f6e6cdc Link against ggml in bin so we can get the available devices without loading a model. 2023-09-15 14:45:25 -04:00
Adam Treat
aa33419c6e Fallback to CPU more robustly. 2023-09-14 16:53:11 -04:00
Adam Treat
9013a089bd Bump to new llama with new bugfix. 2023-09-14 10:02:11 -04:00
Adam Treat
3076e0bf26 Only show GPU when we're actually using it. 2023-09-14 09:59:19 -04:00
Adam Treat
cf4eb530ce Sync to a newer version of llama.cpp with bugfix for vulkan. 2023-09-13 21:01:44 -04:00
Adam Treat
4b9a345aee Update the submodule. 2023-09-13 17:05:46 -04:00
Aaron Miller
6f038c136b init at most one vulkan device, submodule update
fixes issues w/ multiple of the same gpu
2023-09-13 12:49:53 -07:00
Adam Treat
8f99dca70f Bring the vulkan backend to the GUI. 2023-09-13 11:26:10 -04:00
Aaron Miller
f0735efa7d vulkan python bindings on windows fixes 2023-09-12 14:16:02 -07:00
Adam Treat
c953b321b7 Don't link against libvulkan. 2023-09-12 14:26:56 -04:00
Aaron Miller
c4d23512e4 remove extra dynamic linker deps when building with vulkan 2023-09-11 08:44:39 -07:00
Adam Treat
85e34598f9 more circleci 2023-08-31 15:29:54 -04:00
Adam Treat
f578fa6cdf Fix for windows. 2023-08-31 15:29:54 -04:00
Adam Treat
17d3e4976c Add a comment indicating future work. 2023-08-31 15:29:54 -04:00
Adam Treat
987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 2023-08-31 15:29:54 -04:00
Adam Treat
d55cbbee32 Update to newer llama.cpp and disable older forks. 2023-08-31 15:29:54 -04:00
Aaron Miller
0bc2274869 bump llama.cpp version + needed fixes for that 2023-08-31 15:29:54 -04:00
aaron miller
33c22be2aa starcoder: use ggml_graph_plan 2023-08-31 15:29:54 -04:00
Cosmic Snow
108d950874 Fix Windows unable to load models on older Windows builds
- Replace high-level IsProcessorFeaturePresent
- Reintroduce low-level compiler intrinsics implementation
2023-08-09 09:27:43 +02:00
Adam Treat
6d03b3e500 Add starcoder support. 2023-07-27 09:15:16 -04:00
cosmic-snow
2d02c65177
Handle edge cases when generating embeddings (#1215)
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings fail fast with a ValueError when text is empty
- Advice other bindings authors to do likewise in llmodel_c.h
2023-07-17 13:21:03 -07:00
Aaron Miller
1c4a244291 bump mem allocation a bit 2023-07-14 09:48:57 -04:00