20 Commits

Author SHA1 Message Date
Aaron Miller
b19a3e5b2c add requiredMem method to llmodel impls
Most of these can just shortcut out of the model loading logic. llama is a bit worse to deal with because we submodule it, so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway).
2023-06-26 18:27:58 -03:00
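The size-on-disk estimate described in this commit can be sketched as follows. This is a minimal illustration assuming C++17 `std::filesystem`; the name `estimateRequiredMem` is hypothetical and is not the signature of the `requiredMem` method the commit adds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical helper (not the gpt4all API): for a model file that will be
// mmap()-ed, use its size on disk as a rough estimate of the memory needed.
// Returns 0 when the size cannot be determined.
inline std::size_t estimateRequiredMem(const std::filesystem::path &modelPath) {
    assert(!modelPath.empty() && "caller must pass a model path");
    std::error_code ec;
    // The error_code overload reports failure through `ec` instead of throwing.
    const std::uintmax_t bytes = std::filesystem::file_size(modelPath, ec);
    if (ec) {
        return 0;
    }
    return static_cast<std::size_t>(bytes);
}
```

Returning 0 on failure is a design choice of this sketch; a caller would have to treat 0 as "unknown" rather than "needs no memory".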
Adam Treat
bd58c46da0 Initialize these to nullptr to prevent double deletion when a model fails to load. 2023-06-20 18:23:45 -04:00
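A minimal sketch of the pattern this commit names, under stated assumptions: `Vocab`, `Context`, and `ModelPrivate` are hypothetical stand-ins, not the real backend types. Pointers that start as `nullptr` and are reset after `delete` can be released any number of times, because `delete nullptr` is a no-op.

```cpp
#include <cassert>

// Hypothetical stand-ins for backend state; not the real gpt4all types.
struct Vocab {};
struct Context {};

struct ModelPrivate {
    // Default member initializers: if loading fails part-way, the pointers
    // that were never assigned are still nullptr rather than garbage.
    Vocab *vocab = nullptr;
    Context *ctx = nullptr;

    ModelPrivate() = default;
    ModelPrivate(const ModelPrivate &) = delete;            // owning raw pointers:
    ModelPrivate &operator=(const ModelPrivate &) = delete; // forbid copies
    ~ModelPrivate() { release(); }

    // Simulated load; `failEarly` models a failure before ctx is allocated.
    bool load(bool failEarly) {
        release();
        assert(vocab == nullptr && ctx == nullptr);
        vocab = new Vocab;
        if (failEarly) {
            return false; // ctx stays nullptr, so cleanup remains safe
        }
        ctx = new Context;
        return true;
    }

    // Safe to call repeatedly: each pointer is reset after deletion.
    void release() {
        delete ctx;
        ctx = nullptr;
        delete vocab;
        vocab = nullptr;
    }
};
```

In new code `std::unique_ptr` would usually avoid the problem altogether; raw pointers are used here only to mirror the situation the commit message describes.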
niansa/tuxifan
68f9786ed9
Use operator ""_MiB (#991) 2023-06-16 15:56:22 -04:00
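For readers unfamiliar with user-defined literals, an `operator ""_MiB` could look roughly like the following. This is an assumed definition for illustration only; the actual one introduced in #991 may differ in return type or placement.

```cpp
#include <cstddef>

// Assumed definition: a user-defined literal that converts a count of
// mebibytes into bytes, so buffer sizes can be written as e.g. 256_MiB.
constexpr std::size_t operator""_MiB(unsigned long long mib) {
    return static_cast<std::size_t>(mib) * 1024u * 1024u;
}

// Checked at compile time: one mebibyte is 2^20 bytes.
static_assert(1_MiB == 1048576, "1 MiB is 2^20 bytes");
```

Compared with spelling out `256u * 1024u * 1024u` at each use site, the literal keeps the unit visible and the arithmetic in one place.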
Aaron Miller
88616fde7f
llmodel: change tokenToString to not use string_view (#968)
fixes a definite use-after-free and likely avoids some other
potential ones - std::string will convert to a std::string_view
automatically but as soon as the std::string in question goes out of
scope it is already freed and the string_view is pointing at freed
memory - this is *mostly* fine if it's returning a reference to the
tokenizer's internal vocab table but it's, imo, too easy to return a
reference to a dynamically constructed string with this as replit is
doing (and unfortunately needs to do to convert the internal whitespace
replacement symbol back to a space)
2023-06-13 07:14:02 -04:00
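The hazard described in this commit can be illustrated with a toy tokenizer. Everything here is hypothetical: `ToyTokenizer` is not the replit implementation, and the use of U+2581 as the whitespace replacement symbol is an assumption made for the example. The point is only that a string built inside the function must be returned by value; a `std::string_view` to it would dangle once the local string is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tokenizer; not the gpt4all or replit implementation.
class ToyTokenizer {
public:
    explicit ToyTokenizer(std::vector<std::string> vocab)
        : m_vocab(std::move(vocab)) {}

    // Returns std::string by value. If the return type were
    // std::string_view, the view would point at `text`, which is destroyed
    // when the function returns: a use-after-free at the call site.
    std::string tokenToString(std::size_t id) const {
        assert(id < m_vocab.size());
        std::string text = m_vocab[id];
        // Assumed whitespace marker: U+2581 encoded as UTF-8.
        const std::string marker = "\xE2\x96\x81";
        for (std::size_t pos = text.find(marker); pos != std::string::npos;
             pos = text.find(marker, pos + 1)) {
            text.replace(pos, marker.size(), " ");
        }
        return text;
    }

private:
    std::vector<std::string> m_vocab;
};
```

Returning a view stays safe only when it refers to storage that outlives the call, such as an entry in the vocab table itself, which is the "mostly fine" case the commit message mentions.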
Adam Treat
301d2fdbea Fix up for newer models on reset context. This stops the model from totally failing after a context reset. 2023-06-04 19:31:20 -04:00
AT
bbe195ee02
Backend prompt dedup (#822)
* Deduplicated prompt() function code
2023-06-04 08:59:24 -04:00
niansa/tuxifan
f3564ac6b9
Fixed tons of warnings and clazy findings (#811) 2023-06-02 15:46:41 -04:00
niansa/tuxifan
d6a70ddb5f
Fixed model type for GPT-J (#815)
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-02 15:46:33 -04:00
Adam Treat
a41bd6ac0a Trying to shrink the copy+paste code and do more code sharing between backend model impl. 2023-06-02 07:20:59 -04:00
niansa
a3d08cdcd5 Dlopen better implementation management (Version 2) 2023-06-01 07:44:15 -04:00
AT
48275d0dcc
Dlopen backend 5 (#779)
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squash-merged from dlopen_backend_5, where the history is preserved.
2023-05-31 17:04:01 -04:00
Adam Treat
7f9f91ad94 Revert "New tokenizer implementation for MPT and GPT-J"
This reverts commit bbcee1ced5.
2023-05-30 12:59:00 -04:00
Aaron Miller
bbcee1ced5 New tokenizer implementation for MPT and GPT-J
Improves output quality by making these tokenizers more closely
match the behavior of the huggingface `tokenizers` based BPE
tokenizers these models were trained with.

Featuring:
 * Fixed unicode handling (via ICU)
 * Fixed BPE token merge handling
 * Complete added vocabulary handling
2023-05-30 12:05:57 -04:00
Adam Treat
9bfff8bfcb Add new reverse prompt for new localdocs context feature. 2023-05-25 11:28:06 -04:00
Juuso Alasuutari
81fdc28e58 llmodel: constify LLModel::threadCount() 2023-05-22 08:54:46 -04:00
aaron miller
e6fd0a240d backend: fix buffer overrun in repeat penalty code
Caught with AddressSanitizer running a basic prompt test against llmodel
standalone. This fix allows ASan builds to complete a simple prompt
without illegal accesses, but there are still several leaks.
2023-05-17 07:54:10 -04:00
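The commit message does not say what the overrun was, so the following is only a guess at one common way it arises in repeat-penalty code: taking the last N tokens of the history when fewer than N exist. The helper `recentTokens` and its parameters are hypothetical; the sketch shows clamping the window to the available history.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the most recent `repeatLastN` tokens, clamped
// to the tokens that actually exist, so the window never starts before
// the beginning of the buffer.
inline std::vector<int> recentTokens(const std::vector<int> &tokens,
                                     std::size_t repeatLastN) {
    const std::size_t n = std::min(repeatLastN, tokens.size());
    assert(n <= tokens.size()); // the invariant the clamp guarantees
    return std::vector<int>(tokens.end() - static_cast<std::ptrdiff_t>(n),
                            tokens.end());
}
```

Without the `std::min`, computing `tokens.end() - repeatLastN` on a short history would step before the start of the buffer, which is the kind of access AddressSanitizer reports.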
kuvaus
507e913faf
gpt4all-backend: Add MSVC support to backend (#595)
* Add MSVC compatibility

* Add _MSC_VER macro

---------

Co-authored-by: kuvaus <kuvaus@users.noreply.github.com>
2023-05-16 11:35:33 -04:00
Aaron Miller
d14936bfd6 backend: dedupe tokenizing code in mpt/gptj 2023-05-16 10:30:19 -04:00
Aaron Miller
4cd8bdf9a1 backend: make initial buf_size const in model impls
more unifying mpt and gptj code - this one's never written so also
changing the name to be clearer
2023-05-16 10:30:19 -04:00
Adam Treat
d918b02c29 Move the llmodel C API to new top-level directory and version it. 2023-05-10 11:46:40 -04:00