Commit Graph

62 Commits

Author SHA1 Message Date
Aaron Miller
d3ba1295a7
Metal+LLama take two (#929)
Support latest llama with Metal
---------

Co-authored-by: Adam Treat <adam@nomic.ai>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 16:48:46 -04:00
Adam Treat
b162b5c64e Revert "llama on Metal (#885)"
This reverts commit c55f81b860.
2023-06-09 15:08:46 -04:00
Aaron Miller
c55f81b860
llama on Metal (#885)
Support latest llama with Metal

---------

Co-authored-by: Adam Treat <adam@nomic.ai>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 14:58:12 -04:00
niansa/tuxifan
14e9ccbc6a Do auto detection by default in C++ API
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 17:01:19 +02:00
niansa/tuxifan
f03da8d732 Removed double-static from variables in replit.cpp
The anonymous namespace already makes it static.

Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 08:55:15 -04:00
niansa
0cb2b86730 Synced llama.cpp.cmake with upstream 2023-06-08 18:21:32 -04:00
Aaron Miller
47fbc0e309
non-llama: explicitly greedy sampling for temp<=0 (#901)
copied directly from llama.cpp - without this temp=0.0 will just
scale all the logits to infinity and give bad output
2023-06-08 11:08:30 -07:00
Aaron Miller
b14953e136
sampling: remove incorrect offset for n_vocab (#900)
no effect, but avoids a *potential* bug later if we use
actualVocabSize - which is for when a model has a larger
embedding tensor/# of output logits than actually trained token
to allow room for adding extras in finetuning - presently all of our
models have had "placeholder" tokens in the vocab so this hasn't broken
anything, but if the sizes did differ we want the equivalent of
`logits[actualVocabSize:]` (the start point is unchanged), not
`logits[-actualVocabSize:]` (this.)
2023-06-08 11:08:10 -07:00
Adam Treat
010a04d96f Revert "Synced llama.cpp.cmake with upstream (#887)"
This reverts commit 89910c7ca8.
2023-06-08 07:23:41 -04:00
Adam Treat
7e304106cc Fix for windows. 2023-06-07 12:58:51 -04:00
niansa/tuxifan
89910c7ca8
Synced llama.cpp.cmake with upstream (#887) 2023-06-07 09:18:22 -07:00
Richard Guo
c4706d0c14
Replit Model (#713)
* porting over replit code model to gpt4all

* replaced memory with kv_self struct

* continuing debug

* welp it built but lot of sus things

* working model loading and somewhat working generate.. need to format response?

* revert back to semi working version

* finally got rid of weird formatting

* figured out problem is with python bindings - this is good to go for testing

* addressing PR feedback

* output refactor

* fixed prompt reponse collection

* cleanup

* addressing PR comments

* building replit backend with new ggmlver code

* chatllm replit and clean python files

* cleanup

* updated replit to match new llmodel api

* match llmodel api and change size_t to Token

* resolve PR comments

* replit model commit comment
2023-06-06 17:09:00 -04:00
Adam Treat
c5de9634c9 Fix llama models on linux and windows. 2023-06-05 14:31:15 -04:00
Adam Treat
8a9ad258f4 Fix symbol resolution on windows. 2023-06-05 11:19:02 -04:00
Adam Treat
812b2f4b29 Make installers work with mac/windows for big backend change. 2023-06-05 09:23:17 -04:00
Adam Treat
f73333c6a1 Update to latest llama.cpp 2023-06-04 19:57:34 -04:00
Adam Treat
301d2fdbea Fix up for newer models on reset context. This fixes the model from totally failing after a reset context. 2023-06-04 19:31:20 -04:00
AT
5f95aa9fc6
We no longer have an avx_only repository and better error handling for minimum hardware requirements. (#833) 2023-06-04 15:28:58 -04:00
AT
bbe195ee02
Backend prompt dedup (#822)
* Deduplicated prompt() function code
2023-06-04 08:59:24 -04:00
Ikko Eltociear Ashimine
945297d837 Update README.md
huggingface -> Hugging Face

Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2023-06-04 08:46:37 -04:00
Peter Gagarinov
23391d44e0 Only default mlock on macOS where swap seems to be a problem
Repeating the change that once was done in https://github.com/nomic-ai/gpt4all/pull/663 but then was overriden by 48275d0dcc

Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com>
2023-06-03 07:51:18 -04:00
niansa/tuxifan
f3564ac6b9
Fixed tons of warnings and clazy findings (#811) 2023-06-02 15:46:41 -04:00
niansa/tuxifan
d6a70ddb5f
Fixed model type for GPT-J (#815)
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-02 15:46:33 -04:00
Richard Guo
e709e58603 more cleanup 2023-06-02 12:32:26 -04:00
Richard Guo
98420ea6d5 cleanup 2023-06-02 12:32:26 -04:00
Richard Guo
c54c42e3fb fixed finding model libs 2023-06-02 12:32:26 -04:00
Adam Treat
cec8831e12 Fix mac build again. 2023-06-02 10:51:09 -04:00
Adam Treat
70e3b7e907 Try and fix build on mac. 2023-06-02 10:47:12 -04:00
Adam Treat
a41bd6ac0a Trying to shrink the copy+paste code and do more code sharing between backend model impl. 2023-06-02 07:20:59 -04:00
Tim Miller
87cb3505d3 Fix MSVC Build, Update C# Binding Scripts 2023-06-01 14:24:23 -04:00
niansa/tuxifan
27e80e1d10
Allow user to specify custom search path via $GPT4ALL_IMPLEMENTATIONS_PATH (#789) 2023-06-01 17:41:04 +02:00
niansa
5175db2781 Fixed double-free in LLModel::Implementation destructor 2023-06-01 11:19:08 -04:00
niansa/tuxifan
fc60f0c09c
Cleaned up implementation management (#787)
* Cleaned up implementation management

* Initialize LLModel::m_implementation to nullptr

* llmodel.h: Moved dlhandle fwd declare above LLModel class
2023-06-01 16:51:46 +02:00
Adam Treat
1eca524171 Add fixme's and clean up a bit. 2023-06-01 07:57:10 -04:00
niansa
a3d08cdcd5 Dlopen better implementation management (Version 2) 2023-06-01 07:44:15 -04:00
niansa/tuxifan
92407438c8
Advanced avxonly autodetection (#744)
* Advanced avxonly requirement detection
2023-05-31 21:26:18 -04:00
AT
48275d0dcc
Dlopen backend 5 (#779)
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved.
2023-05-31 17:04:01 -04:00
Adam Treat
7f9f91ad94 Revert "New tokenizer implementation for MPT and GPT-J"
This reverts commit bbcee1ced5.
2023-05-30 12:59:00 -04:00
Adam Treat
cdc7d6ccc4 Revert "buf_ref.into() can be const now"
This reverts commit d59c77ac55.
2023-05-30 12:58:53 -04:00
Adam Treat
b5edaa2656 Revert "add tokenizer readme w/ instructions for convert script"
This reverts commit 5063c2c1b2.
2023-05-30 12:58:18 -04:00
aaron miller
5063c2c1b2 add tokenizer readme w/ instructions for convert script 2023-05-30 12:05:57 -04:00
Aaron Miller
d59c77ac55 buf_ref.into() can be const now 2023-05-30 12:05:57 -04:00
Aaron Miller
bbcee1ced5 New tokenizer implementation for MPT and GPT-J
Improves output quality by making these tokenizers more closely
match the behavior of the huggingface `tokenizers` based BPE
tokenizers these models were trained with.

Featuring:
 * Fixed unicode handling (via ICU)
 * Fixed BPE token merge handling
 * Complete added vocabulary handling
2023-05-30 12:05:57 -04:00
Adam Treat
474c5387f9 Get the backend as well as the client building/working with msvc. 2023-05-25 15:22:45 -04:00
Adam Treat
9bfff8bfcb Add new reverse prompt for new localdocs context feature. 2023-05-25 11:28:06 -04:00
Juuso Alasuutari
ef052aed84 llmodel: constify some casts in LLModelWrapper 2023-05-22 08:54:46 -04:00
Juuso Alasuutari
81fdc28e58 llmodel: constify LLModel::threadCount() 2023-05-22 08:54:46 -04:00
Juuso Alasuutari
08ece43f0d llmodel: fix wrong and/or missing prompt callback type
Fix occurrences of the prompt callback being incorrectly specified, or
the response callback's prototype being incorrectly used in its place.

Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>
2023-05-21 16:02:11 -04:00
Adam Treat
8204c2eb80 Only default mlock on macOS where swap seems to be a problem. 2023-05-21 10:27:04 -04:00
Adam Treat
aba1147a22 Always default mlock to true. 2023-05-20 21:16:15 -04:00