Commit Graph

153 Commits (75deee9adb440b1cbe783784cc4d73c4b4b1aff2)

Author SHA1 Message Date
Adam Treat 906699e8e9 Bump to latest llama/gguf branch. 11 months ago
Cebtenzzre 088afada49 llamamodel: fix static vector in LLamaModel::endTokens 11 months ago
Adam Treat b4d82ea289 Bump to the latest fixes for vulkan in llama. 11 months ago
Adam Treat 12f943e966 Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf. 11 months ago
Adam Treat 5d346e13d7 Add q6_k kernels for vulkan. 11 months ago
Adam Treat 4eefd386d0 Refactor for subgroups on mat * vec kernel. 11 months ago
Cebtenzzre 3c2aa299d8 gptj: remove unused variables 11 months ago
Cebtenzzre f9deb87d20 convert scripts: add feed-forward length for better compatibility
This GGUF key is used by all llama.cpp models with upstream support.
11 months ago
Cebtenzzre cc7675d432 convert scripts: make gptj script executable 11 months ago
Cebtenzzre 0493e6eb07 convert scripts: use bytes_to_unicode from transformers 11 months ago
Cebtenzzre d5d72f0361 gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
11 months ago
Cebtenzzre 050e7f076e backend: port GPT-J to GGUF 11 months ago
Cebtenzzre 8f3abb37ca fix references to removed model types 11 months ago
Cebtenzzre 4219c0e2e7 convert scripts: make them directly executable 11 months ago
Cebtenzzre ce7be1db48 backend: use llamamodel.cpp for Falcon 11 months ago
Cebtenzzre cca9e6ce81 convert_mpt_hf_to_gguf.py: better tokenizer decoding 11 months ago
Cebtenzzre 25297786db convert scripts: load model as late as possible 11 months ago
Cebtenzzre fd47088f2b conversion scripts: cleanup 11 months ago
Cebtenzzre 6277eac9cc backend: use llamamodel.cpp for StarCoder 11 months ago
Cebtenzzre 17fc9e3e58 backend: port Replit to GGUF 11 months ago
Cebtenzzre 7c67262a13 backend: port MPT to GGUF 11 months ago
Cebtenzzre 42bcb814b3 backend: port BERT to GGUF 11 months ago
Cebtenzzre 1d29e4696c llamamodel: metal supports all quantization types now 11 months ago
Aaron Miller 507753a37c macos build fixes 11 months ago
Adam Treat d90d003a1d Latest rebase on llama.cpp with gguf support. 11 months ago
Adam Treat 99c106e6b5 Fix a bug seen on AMD RADEON cards with vulkan backend. 12 months ago
Jacob Nguyen e86c63750d Update llama.cpp.cmake
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
12 months ago
Adam Treat 84905aa281 Fix for crashes on systems where vulkan is not installed properly. 12 months ago
Adam Treat 045f6e6cdc Link against ggml in bin so we can get the available devices without loading a model. 12 months ago
Adam Treat aa33419c6e Fallback to CPU more robustly. 12 months ago
Adam Treat 9013a089bd Bump to new llama with new bugfix. 12 months ago
Adam Treat 3076e0bf26 Only show GPU when we're actually using it. 12 months ago
Adam Treat cf4eb530ce Sync to a newer version of llama.cpp with bugfix for vulkan. 12 months ago
Adam Treat 4b9a345aee Update the submodule. 12 months ago
Aaron Miller 6f038c136b init at most one vulkan device, submodule update
fixes issues w/ multiple of the same gpu
12 months ago
Adam Treat 8f99dca70f Bring the vulkan backend to the GUI. 12 months ago
Aaron Miller f0735efa7d vulkan python bindings on windows fixes 12 months ago
Adam Treat c953b321b7 Don't link against libvulkan. 12 months ago
Aaron Miller c4d23512e4 remove extra dynamic linker deps when building with vulkan 1 year ago
Adam Treat 85e34598f9 more circleci 1 year ago
Adam Treat f578fa6cdf Fix for windows. 1 year ago
Adam Treat 17d3e4976c Add a comment indicating future work. 1 year ago
Adam Treat 987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 1 year ago
Adam Treat d55cbbee32 Update to newer llama.cpp and disable older forks. 1 year ago
Aaron Miller 0bc2274869 bump llama.cpp version + needed fixes for that 1 year ago
aaron miller 33c22be2aa starcoder: use ggml_graph_plan 1 year ago
Cosmic Snow 108d950874 Fix Windows unable to load models on older Windows builds
- Replace high-level IsProcessorFeaturePresent
- Reintroduce low-level compiler intrinsics implementation
1 year ago
Adam Treat 6d03b3e500 Add starcoder support. 1 year ago
cosmic-snow 2d02c65177 Handle edge cases when generating embeddings (#1215)
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings fail fast with a ValueError when text is empty
- Advise other bindings authors to do likewise in llmodel_c.h
1 year ago
Aaron Miller 1c4a244291 bump mem allocation a bit 1 year ago