Commit Graph

32 Commits

Author SHA1 Message Date
Jared Van Bortel
061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 14:17:44 -05:00
Jared Van Bortel
38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 15:47:26 -06:00
Jared Van Bortel
d1c56b8b28
Implement configurable context length (#1749) 2023-12-16 17:58:15 -05:00
Jared Van Bortel
dfd8ef0186
backend: use ggml_new_graph for GGML backend v2 (#1719) 2023-12-06 14:38:53 -05:00
Jared Van Bortel
9e28dfac9c
Update to latest llama.cpp (#1706) 2023-12-01 16:51:15 -05:00
cebtenzzre
fd0c501d68
backend: support GGUFv3 (#1582) 2023-10-27 17:07:23 -04:00
Cebtenzzre
3c2aa299d8 gptj: remove unused variables 2023-10-05 18:16:19 -04:00
Cebtenzzre
d5d72f0361 gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
2023-10-05 18:16:19 -04:00
Cebtenzzre
050e7f076e backend: port GPT-J to GGUF 2023-10-05 18:16:19 -04:00
Adam Treat
6d03b3e500 Add starcoder support. 2023-07-27 09:15:16 -04:00
Aaron Miller
40a3faeb05
Use ggml scratch bufs for mpt and gptj models (#1104)
* backend/gptj: use scratch buffers

reduces total memory required and makes eval buf not grow with n_past

* backend/mpt: use scratch bufs

* fix format-related compile warnings
2023-06-30 10:53:45 -07:00
Aaron Miller
8d19ef3909
backend: factor out common elements in model code (#1089)
* backend: factor out common structs in model code

prepping to hack on these by hopefully making there be fewer places to fix the same bug

rename

* use common buffer wrapper instead of manual malloc

* fix replit compile warnings
2023-06-28 17:35:07 -07:00
Aaron Miller
b19a3e5b2c add requiredMem method to llmodel impls
most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
2023-06-26 18:27:58 -03:00
Adam Treat
bd58c46da0 Initialize these to nullptr to prevent double deletion when a model fails to load. 2023-06-20 18:23:45 -04:00
niansa/tuxifan
68f9786ed9
Use operator ""_MiB (#991) 2023-06-16 15:56:22 -04:00
Aaron Miller
88616fde7f
llmodel: change tokenToString to not use string_view (#968)
fixes a definite use-after-free and likely avoids some other
potential ones - std::string will convert to a std::string_view
automatically but as soon as the std::string in question goes out of
scope it is already freed and the string_view is pointing at freed
memory - this is *mostly* fine if its returning a reference to the
tokenizer's internal vocab table but it's, imo, too easy to return a
reference to a dynamically constructed string with this as replit is
doing (and unfortunately needs to do to convert the internal whitespace
replacement symbol back to a space)
2023-06-13 07:14:02 -04:00
Adam Treat
301d2fdbea Fix up for newer models on reset context. This fixes the model from totally failing after a reset context. 2023-06-04 19:31:20 -04:00
AT
bbe195ee02
Backend prompt dedup (#822)
* Deduplicated prompt() function code
2023-06-04 08:59:24 -04:00
niansa/tuxifan
f3564ac6b9
Fixed tons of warnings and clazy findings (#811) 2023-06-02 15:46:41 -04:00
niansa/tuxifan
d6a70ddb5f
Fixed model type for GPT-J (#815)
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-02 15:46:33 -04:00
Adam Treat
a41bd6ac0a Trying to shrink the copy+paste code and do more code sharing between backend model impl. 2023-06-02 07:20:59 -04:00
niansa
a3d08cdcd5 Dlopen better implementation management (Version 2) 2023-06-01 07:44:15 -04:00
AT
48275d0dcc
Dlopen backend 5 (#779)
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved.
2023-05-31 17:04:01 -04:00
Adam Treat
7f9f91ad94 Revert "New tokenizer implementation for MPT and GPT-J"
This reverts commit bbcee1ced5.
2023-05-30 12:59:00 -04:00
Aaron Miller
bbcee1ced5 New tokenizer implementation for MPT and GPT-J
Improves output quality by making these tokenizers more closely
match the behavior of the huggingface `tokenizers` based BPE
tokenizers these models were trained with.

Featuring:
 * Fixed unicode handling (via ICU)
 * Fixed BPE token merge handling
 * Complete added vocabulary handling
2023-05-30 12:05:57 -04:00
Adam Treat
9bfff8bfcb Add new reverse prompt for new localdocs context feature. 2023-05-25 11:28:06 -04:00
Juuso Alasuutari
81fdc28e58 llmodel: constify LLModel::threadCount() 2023-05-22 08:54:46 -04:00
aaron miller
e6fd0a240d backend: fix buffer overrun in repeat penalty code
Caught with AddressSanitizer running a basic prompt test against llmodel
standalone. This fix allows ASan builds to complete a simple prompt
without illegal accesses but there are still notably several leaks.
2023-05-17 07:54:10 -04:00
kuvaus
507e913faf
gpt4all-backend: Add MSVC support to backend (#595)
* Add MSVC compatibility

* Add _MSC_VER macro

---------

Co-authored-by: kuvaus <kuvaus@users.noreply.github.com>
2023-05-16 11:35:33 -04:00
Aaron Miller
d14936bfd6 backend: dedupe tokenizing code in mpt/gptj 2023-05-16 10:30:19 -04:00
Aaron Miller
4cd8bdf9a1 backend: make initial buf_size const in model impls
more unifying mpt and gptj code - this one's never written so also
changing the name to be clearer
2023-05-16 10:30:19 -04:00
Adam Treat
d918b02c29 Move the llmodel C API to new top-level directory and version it. 2023-05-10 11:46:40 -04:00