Commit Graph

56 Commits (0cc5a806563a9264b7c7751f4ca50e2f700c0847)

Author SHA1 Message Date
Adam Treat f720261d46 Fix another spot vulnerable to crashes.
Signed-off-by: Adam Treat <treat.adam@gmail.com>
4 months ago
chrisbarrera f8b1069a1c
add min_p sampling parameter (#2014)
Signed-off-by: Christopher Barrera <cb@arda.tx.rr.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
4 months ago
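For context, min_p sampling keeps only the tokens whose probability is at least min_p times that of the most likely token, then renormalizes what survives. An illustrative sketch of the filter (not the actual llama.cpp sampler):

```cpp
#include <algorithm>
#include <vector>

// Sketch of min_p filtering: keep tokens whose probability is at least
// min_p times the top token's probability, then renormalize.
std::vector<float> min_p_filter(std::vector<float> probs, float min_p) {
    float p_max = *std::max_element(probs.begin(), probs.end());
    float threshold = min_p * p_max;
    for (float &p : probs)
        if (p < threshold)
            p = 0.0f; // excluded from sampling
    float sum = 0.0f;
    for (float p : probs) sum += p;
    for (float &p : probs) p /= sum; // p_max always survives, so sum > 0
    return probs;
}
```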
Jared Van Bortel e7f2ff189f fix some compilation warnings on macOS
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
4 months ago
Jared Van Bortel 4fc4d94be4
fix chat-style prompt templates (#1970)
Also use a new version of Mistral OpenOrca.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
4 months ago
Jared Van Bortel 7810b757c9 llamamodel: add gemma model support
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
4 months ago
Adam Treat d948a4f2ee Complete revamp of model loading to allow the user more
fine-grained control over the model's loading behavior.

Signed-off-by: Adam Treat <treat.adam@gmail.com>
4 months ago
Jared Van Bortel fc7e5f4a09
ci: fix missing Kompute support in python bindings (#1953)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel bf493bb048
Mixtral crash fix and python bindings v2.2.0 (#1931)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 92c025a7f6
llamamodel: add 12 new architectures for CPU inference (#1914)
Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2,
Plamo, Qwen, Qwen2, Refact, StableLM

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 10e3f7bbf5
Fix VRAM leak when model loading fails (#1901)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel 061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
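For reference, n_gpu_layers controls how many transformer layers llama.cpp offloads to the GPU. A minimal sketch against llama.cpp's C API (the gpt4all plumbing differs; this only shows the underlying knob):

```cpp
#include "llama.h"

// Sketch: expose llama.cpp's n_gpu_layers knob at model load time.
// 0 keeps everything on the CPU; a large value offloads all layers.
llama_model * load_with_offload(const char * path, int ngl) {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = ngl;
    return llama_load_model_from_file(path, mparams);
}
```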
Jared Van Bortel 38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
5 months ago
Jared Van Bortel b7c92c5afd
sync llama.cpp with latest Vulkan PR and newer upstream (#1819) 6 months ago
AT 96cee4f9ac
Explicitly clear the kv cache each time we eval tokens to match n_past. (#1808) 6 months ago
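A sketch of that pattern, assuming llama.cpp's llama_kv_cache_clear and llama_decode (the real change lives in the backend's prompt path):

```cpp
#include "llama.h"

// Sketch: clear stale KV cache entries before evaluating tokens so the
// cache contents always correspond to n_past. Assumes llama.cpp's C API.
void eval_from_scratch(llama_context * ctx, llama_batch batch) {
    llama_kv_cache_clear(ctx); // cache now matches n_past == 0
    llama_decode(ctx, batch);  // re-evaluate the tokens we want cached
}
```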
Jared Van Bortel d1c56b8b28
Implement configurable context length (#1749) 7 months ago
Jared Van Bortel 0600f551b3
chatllm: do not attempt to serialize incompatible state (#1742) 7 months ago
Jared Van Bortel 9e28dfac9c
Update to latest llama.cpp (#1706) 7 months ago
Jared Van Bortel d4ce9f4a7c
llmodel_c: improve quality of error messages (#1625) 8 months ago
cebtenzzre fd0c501d68
backend: support GGUFv3 (#1582) 8 months ago
cebtenzzre 4338e72a51
MPT: use upstream llama.cpp implementation (#1515) 9 months ago
cebtenzzre 0fe2e19691
llamamodel: re-enable error messages by default (#1537) 9 months ago
Aaron Miller afaa291eab python bindings should be quiet by default
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
  nonempty
* make verbose flag for retrieve_model default false (but also be
  overridable via gpt4all constructor)

You should be able to run a basic test:

```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```

and see no non-model output when successful
9 months ago
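The gating described above amounts to installing a no-op log callback unless the environment variable is set. A rough sketch, assuming llama.cpp's llama_log_set hook:

```cpp
#include <cstdlib>
#include "llama.h"

// Sketch: silence llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP is
// set to a nonempty value.
static void null_logger(ggml_log_level, const char *, void *) {}

void configure_llama_logging() {
    const char * verbose = std::getenv("GPT4ALL_VERBOSE_LLAMACPP");
    if (verbose == nullptr || verbose[0] == '\0')
        llama_log_set(null_logger, nullptr);
}
```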
Cebtenzzre 5fe685427a chat: clearer CPU fallback messages 9 months ago
Cebtenzzre d87573ea75 remove old llama.cpp submodules 9 months ago
Cebtenzzre 672cb850f9 differentiate between init failure and unsupported models 9 months ago
Cebtenzzre 088afada49 llamamodel: fix static vector in LLamaModel::endTokens 9 months ago
Adam Treat 12f943e966 Fix the regenerate button to be deterministic and bump the llama version to the latest we have for GGUF. 9 months ago
Cebtenzzre ce7be1db48 backend: use llamamodel.cpp for Falcon 9 months ago
Cebtenzzre 6277eac9cc backend: use llamamodel.cpp for StarCoder 9 months ago
Cebtenzzre 1d29e4696c llamamodel: metal supports all quantization types now 9 months ago
Adam Treat d90d003a1d Latest rebase on llama.cpp with GGUF support. 9 months ago
Adam Treat aa33419c6e Fallback to CPU more robustly. 10 months ago
Adam Treat 3076e0bf26 Only show GPU when we're actually using it. 10 months ago
Adam Treat 987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 10 months ago
Aaron Miller 198b5e4832 add Falcon 7B model
Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin
1 year ago
Aaron Miller db34a2f670 llmodel: skip attempting Metal if model+kvcache > 53% of system ram 1 year ago
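That guard is a plain arithmetic check; a sketch of the comparison (the 53% threshold is from the commit, the helper name and parameters are hypothetical):

```cpp
#include <cstddef>

// Sketch: skip the Metal backend when the model plus KV cache would
// occupy more than 53% of system RAM. Names here are hypothetical.
bool should_try_metal(size_t model_bytes, size_t kvcache_bytes, size_t system_ram_bytes) {
    return model_bytes + kvcache_bytes <= system_ram_bytes * 53 / 100;
}
```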
Aaron Miller b19a3e5b2c add requiredMem method to llmodel impls
Most of these can just shortcut out of the model loading logic. Llama is a bit worse to deal with because we submodule it, so I have to at least parse the hparams; then I just use the size on disk as an estimate for the memory size (which seems reasonable since we mmap() the llama files anyway).
1 year ago
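Since the llama files are mmap()ed, on-disk size is a fair proxy for resident memory. A minimal sketch of that estimate (std::filesystem stands in for whatever the implementations actually use):

```cpp
#include <cstdint>
#include <filesystem>

// Sketch: estimate a llama model's required memory as its on-disk size,
// reasonable because the file is mmap()ed rather than copied into RAM.
std::uintmax_t required_mem_estimate(const std::filesystem::path & model_path) {
    return std::filesystem::file_size(model_path);
}
```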
Aaron Miller 88616fde7f
llmodel: change tokenToString to not use string_view (#968)
fixes a definite use-after-free and likely avoids some other
potential ones. std::string will convert to a std::string_view
automatically, but as soon as the std::string in question goes out of
scope it is freed and the string_view is pointing at freed memory.
This is *mostly* fine if it's returning a reference to the tokenizer's
internal vocab table, but it's, imo, too easy to return a reference to
a dynamically constructed string with this, as replit is doing (and
unfortunately needs to do, to convert the internal whitespace
replacement symbol back to a space)
1 year ago
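To illustrate the bug class: a std::string_view that refers to a function-local std::string dangles the moment the function returns. A distilled example (not the actual replit code):

```cpp
#include <string>
#include <string_view>

// BUG: the local std::string is destroyed when the function returns,
// so the returned view points at freed memory.
std::string_view token_to_string_bad(int token_id) {
    std::string s = "token-" + std::to_string(token_id); // dynamically constructed
    return s; // implicit conversion to string_view; dangles immediately
}

// FIX: return the string by value so the caller owns the storage.
std::string token_to_string_good(int token_id) {
    return "token-" + std::to_string(token_id);
}
```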
Adam Treat b906fb4057 When recalculating context, we can't erase the BOS. 1 year ago
Aaron Miller d3ba1295a7
Metal+LLama take two (#929)
Support latest llama with Metal
---------

Co-authored-by: Adam Treat <adam@nomic.ai>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
1 year ago
Adam Treat b162b5c64e Revert "llama on Metal (#885)"
This reverts commit c55f81b860.
1 year ago
Aaron Miller c55f81b860
llama on Metal (#885)
Support latest llama with Metal

---------

Co-authored-by: Adam Treat <adam@nomic.ai>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
1 year ago
Adam Treat 301d2fdbea Fix up newer models on context reset. This prevents the model from failing completely after a context reset. 1 year ago
AT bbe195ee02
Backend prompt dedup (#822)
* Deduplicated prompt() function code
1 year ago
Peter Gagarinov 23391d44e0 Only enable mlock by default on macOS, where swap seems to be a problem
Repeats the change that was once made in https://github.com/nomic-ai/gpt4all/pull/663 but was then overridden by 48275d0dcc

Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com>
1 year ago
niansa/tuxifan f3564ac6b9
Fixed tons of warnings and clazy findings (#811) 1 year ago
Adam Treat a41bd6ac0a Trying to shrink the copy-and-paste code and do more code sharing between backend model implementations. 1 year ago
niansa a3d08cdcd5 Dlopen better implementation management (Version 2) 1 year ago
AT 48275d0dcc
Dlopen backend 5 (#779)
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squash-merged from dlopen_backend_5, where the history is preserved.
1 year ago
Adam Treat 9bfff8bfcb Add new reverse prompt for new localdocs context feature. 1 year ago