Aaron Miller
d3ba1295a7
Metal+Llama take two ( #929 )
...
Support latest llama with Metal
---------
Co-authored-by: Adam Treat <adam@nomic.ai>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 16:48:46 -04:00
Adam Treat
b162b5c64e
Revert "llama on Metal ( #885 )"
...
This reverts commit c55f81b860.
2023-06-09 15:08:46 -04:00
Aaron Miller
c55f81b860
llama on Metal ( #885 )
...
Support latest llama with Metal
---------
Co-authored-by: Adam Treat <adam@nomic.ai>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 14:58:12 -04:00
niansa/tuxifan
14e9ccbc6a
Do auto detection by default in C++ API
...
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 17:01:19 +02:00
niansa/tuxifan
f03da8d732
Removed double-static from variables in replit.cpp
...
The anonymous namespace already makes them static.
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 08:55:15 -04:00
niansa
0cb2b86730
Synced llama.cpp.cmake with upstream
2023-06-08 18:21:32 -04:00
Aaron Miller
47fbc0e309
non-llama: explicitly greedy sampling for temp<=0 ( #901 )
...
copied directly from llama.cpp - without this, temp=0.0 would just
scale all the logits to infinity and give bad output
2023-06-08 11:08:30 -07:00
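A minimal sketch of the guard this commit describes, assuming illustrative names (`sampleToken`, `temp`) rather than the actual gpt4all backend API: with temp <= 0 the argmax token is taken directly, so the logits are never divided by a zero (or negative) temperature.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Sketch only: names are assumptions, not the real backend API.
int sampleToken(const std::vector<float> &logits, float temp, std::mt19937 &rng) {
    if (temp <= 0.0f) {
        // Greedy path: the highest logit wins deterministically,
        // avoiding the divide-by-zero that blows logits up to infinity.
        return int(std::max_element(logits.begin(), logits.end()) - logits.begin());
    }
    // Normal path: temperature-scaled softmax, then a weighted draw.
    float maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> weights(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i)
        weights[i] = std::exp(double(logits[i] - maxLogit) / temp);
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}
```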
Aaron Miller
b14953e136
sampling: remove incorrect offset for n_vocab ( #900 )
...
no effect, but avoids a *potential* bug later if we use
actualVocabSize - which is for when a model has a larger
embedding tensor/# of output logits than actually trained tokens,
to allow room for adding extras in finetuning - presently all of our
models have had "placeholder" tokens in the vocab so this hasn't broken
anything, but if the sizes did differ we want the equivalent of
`logits[:actualVocabSize]` (the start point is unchanged), not
`logits[-actualVocabSize:]` (which is what this offset produced).
2023-06-08 11:08:10 -07:00
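A small sketch of the slicing distinction this message draws, with assumed names (`trainedLogits`, `actualVocabSize`); only the choice of which end gets trimmed is the point.

```cpp
#include <cstddef>
#include <vector>

// The logit vector can be wider than actualVocabSize when the
// embedding/output tensor reserves extra rows for finetuning.
std::vector<float> trainedLogits(const std::vector<float> &logits,
                                 std::size_t actualVocabSize) {
    // Equivalent of logits[:actualVocabSize]: keep the trained entries
    // from the start and drop the placeholder rows at the end.
    return std::vector<float>(logits.begin(), logits.begin() + actualVocabSize);
    // The removed offset instead behaved like logits[-actualVocabSize:],
    // wrongly shifting the start point forward.
}
```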
Adam Treat
010a04d96f
Revert "Synced llama.cpp.cmake with upstream ( #887 )"
...
This reverts commit 89910c7ca8.
2023-06-08 07:23:41 -04:00
Adam Treat
7e304106cc
Fix for windows.
2023-06-07 12:58:51 -04:00
niansa/tuxifan
89910c7ca8
Synced llama.cpp.cmake with upstream ( #887 )
2023-06-07 09:18:22 -07:00
Richard Guo
c4706d0c14
Replit Model ( #713 )
...
* porting over replit code model to gpt4all
* replaced memory with kv_self struct
* continuing debug
* welp it built but a lot of sus things
* working model loading and somewhat working generate... need to format response?
* revert back to semi working version
* finally got rid of weird formatting
* figured out problem is with python bindings - this is good to go for testing
* addressing PR feedback
* output refactor
* fixed prompt response collection
* cleanup
* addressing PR comments
* building replit backend with new ggmlver code
* chatllm replit and clean python files
* cleanup
* updated replit to match new llmodel api
* match llmodel api and change size_t to Token
* resolve PR comments
* replit model commit comment
2023-06-06 17:09:00 -04:00
Adam Treat
c5de9634c9
Fix llama models on linux and windows.
2023-06-05 14:31:15 -04:00
Adam Treat
8a9ad258f4
Fix symbol resolution on windows.
2023-06-05 11:19:02 -04:00
Adam Treat
812b2f4b29
Make installers work with mac/windows for big backend change.
2023-06-05 09:23:17 -04:00
Adam Treat
f73333c6a1
Update to latest llama.cpp
2023-06-04 19:57:34 -04:00
Adam Treat
301d2fdbea
Fix up for newer models on reset context. This prevents the model from totally failing after a context reset.
2023-06-04 19:31:20 -04:00
AT
5f95aa9fc6
We no longer have an avx_only repository, and we now have better error handling for minimum hardware requirements. ( #833 )
2023-06-04 15:28:58 -04:00
AT
bbe195ee02
Backend prompt dedup ( #822 )
...
* Deduplicated prompt() function code
2023-06-04 08:59:24 -04:00
Ikko Eltociear Ashimine
945297d837
Update README.md
...
huggingface -> Hugging Face
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2023-06-04 08:46:37 -04:00
Peter Gagarinov
23391d44e0
Only default mlock on macOS where swap seems to be a problem
...
Repeating the change that was once made in https://github.com/nomic-ai/gpt4all/pull/663 but was then overridden by 48275d0dcc
Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com>
2023-06-03 07:51:18 -04:00
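A minimal sketch of the platform-conditional default being reinstated here, assuming a hypothetical helper name; only the `#ifdef` split is the point.

```cpp
// Hypothetical helper, not the actual gpt4all setting name.
bool defaultUseMlock() {
#ifdef __APPLE__
    return true;   // macOS: aggressive swapping makes pinning pages worthwhile
#else
    return false;  // elsewhere: leave memory unpinned unless the user opts in
#endif
}
```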
niansa/tuxifan
f3564ac6b9
Fixed tons of warnings and clazy findings ( #811 )
2023-06-02 15:46:41 -04:00
niansa/tuxifan
d6a70ddb5f
Fixed model type for GPT-J ( #815 )
...
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-02 15:46:33 -04:00
Richard Guo
e709e58603
more cleanup
2023-06-02 12:32:26 -04:00
Richard Guo
98420ea6d5
cleanup
2023-06-02 12:32:26 -04:00
Richard Guo
c54c42e3fb
fixed finding model libs
2023-06-02 12:32:26 -04:00
Adam Treat
cec8831e12
Fix mac build again.
2023-06-02 10:51:09 -04:00
Adam Treat
70e3b7e907
Try and fix build on mac.
2023-06-02 10:47:12 -04:00
Adam Treat
a41bd6ac0a
Trying to shrink the copy+paste code and do more code sharing between backend model implementations.
2023-06-02 07:20:59 -04:00
Tim Miller
87cb3505d3
Fix MSVC Build, Update C# Binding Scripts
2023-06-01 14:24:23 -04:00
niansa/tuxifan
27e80e1d10
Allow user to specify custom search path via $GPT4ALL_IMPLEMENTATIONS_PATH ( #789 )
2023-06-01 17:41:04 +02:00
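A minimal sketch of how such an override can be honored, assuming a hypothetical helper and fallback path; only the `getenv` lookup is the point.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper; the "." fallback is an assumption for illustration.
std::string implementationsSearchPath() {
    if (const char *custom = std::getenv("GPT4ALL_IMPLEMENTATIONS_PATH"))
        return custom; // user-specified search path wins
    return ".";        // otherwise fall back to a default location
}
```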
niansa
5175db2781
Fixed double-free in LLModel::Implementation destructor
2023-06-01 11:19:08 -04:00
niansa/tuxifan
fc60f0c09c
Cleaned up implementation management ( #787 )
...
* Cleaned up implementation management
* Initialize LLModel::m_implementation to nullptr
* llmodel.h: Moved dlhandle fwd declare above LLModel class
2023-06-01 16:51:46 +02:00
Adam Treat
1eca524171
Add fixme's and clean up a bit.
2023-06-01 07:57:10 -04:00
niansa
a3d08cdcd5
Dlopen better implementation management (Version 2)
2023-06-01 07:44:15 -04:00
niansa/tuxifan
92407438c8
Advanced avxonly autodetection ( #744 )
...
* Advanced avxonly requirement detection
2023-05-31 21:26:18 -04:00
AT
48275d0dcc
Dlopen backend 5 ( #779 )
...
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squash-merged from dlopen_backend_5, where the history is preserved.
2023-05-31 17:04:01 -04:00
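A hedged sketch of the pluggable-backend idea: each llama.cpp/ggml build lives in its own shared library exposing a C entry point, and the right one is dlopen()ed at runtime. The `construct` symbol and return type here are assumptions, not the real gpt4all ABI.

```cpp
#include <dlfcn.h>
#include <cstdio>

typedef void *(*construct_fn)(); // assumed plugin entry-point signature

void *loadImplementation(const char *libPath) {
    void *handle = dlopen(libPath, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }
    auto construct = reinterpret_cast<construct_fn>(dlsym(handle, "construct"));
    if (!construct) {
        dlclose(handle);
        return nullptr;
    }
    return construct(); // hand back a model instance built by the plugin
}
```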
Adam Treat
7f9f91ad94
Revert "New tokenizer implementation for MPT and GPT-J"
...
This reverts commit bbcee1ced5.
2023-05-30 12:59:00 -04:00
Adam Treat
cdc7d6ccc4
Revert "buf_ref.into() can be const now"
...
This reverts commit d59c77ac55.
2023-05-30 12:58:53 -04:00
Adam Treat
b5edaa2656
Revert "add tokenizer readme w/ instructions for convert script"
...
This reverts commit 5063c2c1b2.
2023-05-30 12:58:18 -04:00
aaron miller
5063c2c1b2
add tokenizer readme w/ instructions for convert script
2023-05-30 12:05:57 -04:00
Aaron Miller
d59c77ac55
buf_ref.into() can be const now
2023-05-30 12:05:57 -04:00
Aaron Miller
bbcee1ced5
New tokenizer implementation for MPT and GPT-J
...
Improves output quality by making these tokenizers more closely
match the behavior of the Hugging Face `tokenizers`-based BPE
tokenizers these models were trained with.
Featuring:
* Fixed unicode handling (via ICU)
* Fixed BPE token merge handling
* Complete added vocabulary handling
2023-05-30 12:05:57 -04:00
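A toy illustration of the BPE merge rule this commit refers to: among all adjacent symbol pairs, the one with the best (lowest) rank merges first, repeating until no ranked pair remains. The ranks table stands in for a real merges file; this is not the actual MPT/GPT-J tokenizer code.

```cpp
#include <climits>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Ranks = std::map<std::pair<std::string, std::string>, int>;

std::vector<std::string> bpeMerge(std::vector<std::string> symbols, const Ranks &ranks) {
    while (symbols.size() > 1) {
        int bestRank = INT_MAX;
        std::size_t bestIdx = 0;
        // Find the adjacent pair with the lowest merge rank.
        for (std::size_t i = 0; i + 1 < symbols.size(); ++i) {
            auto it = ranks.find({symbols[i], symbols[i + 1]});
            if (it != ranks.end() && it->second < bestRank) {
                bestRank = it->second;
                bestIdx = i;
            }
        }
        if (bestRank == INT_MAX)
            break; // no mergeable pair left
        symbols[bestIdx] += symbols[bestIdx + 1];
        symbols.erase(symbols.begin() + bestIdx + 1);
    }
    return symbols;
}
```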
Adam Treat
474c5387f9
Get the backend as well as the client building/working with msvc.
2023-05-25 15:22:45 -04:00
Adam Treat
9bfff8bfcb
Add new reverse prompt for new localdocs context feature.
2023-05-25 11:28:06 -04:00
Juuso Alasuutari
ef052aed84
llmodel: constify some casts in LLModelWrapper
2023-05-22 08:54:46 -04:00
Juuso Alasuutari
81fdc28e58
llmodel: constify LLModel::threadCount()
2023-05-22 08:54:46 -04:00
Juuso Alasuutari
08ece43f0d
llmodel: fix wrong and/or missing prompt callback type
...
Fix occurrences of the prompt callback being incorrectly specified, or
the response callback's prototype being incorrectly used in its place.
Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>
2023-05-21 16:02:11 -04:00
Adam Treat
8204c2eb80
Only default mlock on macOS where swap seems to be a problem.
2023-05-21 10:27:04 -04:00
Adam Treat
aba1147a22
Always default mlock to true.
2023-05-20 21:16:15 -04:00