Commit Graph

1827 Commits (1272b694ae0aec987a8f260b836261221ae6d8b1)
 

Author SHA1 Message Date
Jared Van Bortel d4ce9f4a7c
llmodel_c: improve quality of error messages (#1625) 11 months ago
aj-gameon 8fabf0be4a
Updated readme for correct install instructions (#1607)
Co-authored-by: aj-gameon <aj@gameontechnology.com>
11 months ago
Jacob Nguyen 45d76d6234
ts/tooling (#1602) 11 months ago
Jacob Nguyen da95bcfb4b
vulkan support for typescript bindings, gguf support (#1390)
* adding some native methods to cpp wrapper

* gpu seems to work

* typings and add availibleGpus method

* fix spelling

* fix syntax

* more

* normalize methods to conform to py

* remove extra dynamic linker deps when building with vulkan

* bump python version (library linking fix)

* Don't link against libvulkan.

* vulkan python bindings on windows fixes

* Bring the vulkan backend to the GUI.

* When device is Auto (the default) then we will only consider discrete GPUs, otherwise fall back to CPU.

* Show the device we're currently using.

* Fix up the name and formatting.

* init at most one vulkan device, submodule update

fixes issues w/ multiple of the same gpu

* Update the submodule.

* Add version 2.4.15 and bump the version number.

* Fix a bug where we're not properly falling back to CPU.

* Sync to a newer version of llama.cpp with bugfix for vulkan.

* Report the actual device we're using.

* Only show GPU when we're actually using it.

* Bump to new llama with new bugfix.

* Release notes for v2.4.16 and bump the version.

* Fallback to CPU more robustly.

* Release notes for v2.4.17 and bump the version.

* Bump the Python version to python-v1.0.12 to restrict the quants that vulkan recognizes.

* Link against ggml in bin so we can get the available devices without loading a model.

* Send actual and requested device info for those who have opt-in.

* Actually bump the version.

* Release notes for v2.4.18 and bump the version.

* Fix for crashes on systems where vulkan is not installed properly.

* Release notes for v2.4.19 and bump the version.

* fix typings and vulkan build works on win

* Add flatpak manifest

* Remove unnecessary stuffs from manifest

* Update to 2.4.19

* appdata: update software description

* Latest rebase on llama.cpp with gguf support.

* macos build fixes

* llamamodel: metal supports all quantization types now

* gpt4all.py: GGUF

* pyllmodel: print specific error message

* backend: port BERT to GGUF

* backend: port MPT to GGUF

* backend: port Replit to GGUF

* backend: use gguf branch of llama.cpp-mainline

* backend: use llamamodel.cpp for StarCoder

* conversion scripts: cleanup

* convert scripts: load model as late as possible

* convert_mpt_hf_to_gguf.py: better tokenizer decoding

* backend: use llamamodel.cpp for Falcon

* convert scripts: make them directly executable

* fix references to removed model types

* modellist: fix the system prompt

* backend: port GPT-J to GGUF

* gpt-j: update inference to match latest llama.cpp insights

- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78

* chatllm: grammar fix

* convert scripts: use bytes_to_unicode from transformers

* convert scripts: make gptj script executable

* convert scripts: add feed-forward length for better compatibility

This GGUF key is used by all llama.cpp models with upstream support.

* gptj: remove unused variables

* Refactor for subgroups on mat * vec kernel.

* Add q6_k kernels for vulkan.

* python binding: print debug message to stderr

* Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf.

* Bump to the latest fixes for vulkan in llama.

* llamamodel: fix static vector in LLamaModel::endTokens

* Switch to new models2.json for new gguf release and bump our version to
2.5.0.

* Bump to latest llama/gguf branch.

* chat: report reason for fallback to CPU

* chat: make sure to clear fallback reason on success

* more accurate fallback descriptions

* differentiate between init failure and unsupported models

* backend: do not use Vulkan with non-LLaMA models

* Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.

* backend: fix build with Visual Studio generator

Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This
is needed because Visual Studio is a multi-configuration generator, so
we do not know what the build type will be until `cmake --build` is
called.

Fixes #1470

* remove old llama.cpp submodules

* Reorder and refresh our models2.json.

* rebase on newer llama.cpp

* python/embed4all: use gguf model, allow passing kwargs/overriding model

* Add starcoder, rift and sbert to our models2.json.

* Push a new version number for llmodel backend now that it is based on gguf.

* fix stray comma in models2.json

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>

* Speculative fix for build on mac.

* chat: clearer CPU fallback messages

* Fix crasher with an empty string for prompt template.

* Update the language here to avoid misunderstanding.

* added EM German Mistral Model

* make codespell happy

* issue template: remove "Related Components" section

* cmake: install the GPT-J plugin (#1487)

* Do not delete saved chats if we fail to serialize properly.

* Restore state from text if necessary.

* Another codespell attempted fix.

* llmodel: do not call magic_match unless build variant is correct (#1488)

* chatllm: do not write uninitialized data to stream (#1486)

* mat*mat for q4_0, q8_0

* do not process prompts on gpu yet

* python: support Path in GPT4All.__init__ (#1462)

* llmodel: print an error if the CPU does not support AVX (#1499)

* python bindings should be quiet by default

* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
  nonempty
* make verbose flag for retrieve_model default false (but also be
  overridable via gpt4all constructor)

should be able to run a basic test:

```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```

and see no non-model output when successful

* python: always check status code of HTTP responses (#1502)

* Always save chats to disk, but save them as text by default. This also changes
the UI behavior to always open a 'New Chat' and set it as current instead
of setting a restored chat as current. This improves usability by not requiring
the user to wait if they want to immediately start chatting.

* Update README.md

Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>

* fix embed4all filename

https://discordapp.com/channels/1076964370942267462/1093558720690143283/1161778216462192692

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>

* Improves Java API signatures maintaining back compatibility

* python: replace deprecated pkg_resources with importlib (#1505)

* Updated chat wishlist (#1351)

* q6k, q4_1 mat*mat

* update mini-orca 3b to gguf2, license

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>

* convert scripts: fix AutoConfig typo (#1512)

* publish config https://docs.npmjs.com/cli/v9/configuring-npm/package-json#publishconfig (#1375)

merge into my branch

* fix appendBin

* fix gpu not initializing first

* sync up

* progress, still wip on destructor

* some detection work

* untested dispose method

* add js side of dispose

* Update gpt4all-bindings/typescript/index.cc

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>

* Update gpt4all-bindings/typescript/index.cc

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>

* Update gpt4all-bindings/typescript/index.cc

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>

* Update gpt4all-bindings/typescript/src/gpt4all.d.ts

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>

* Update gpt4all-bindings/typescript/src/gpt4all.js

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>

* Update gpt4all-bindings/typescript/src/util.js

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>

* fix tests

* fix circleci for nodejs

* bump version

---------

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: Aaron Miller <apage43@ninjawhale.com>
Co-authored-by: Adam Treat <treat.adam@gmail.com>
Co-authored-by: Akarshan Biswas <akarshan.biswas@gmail.com>
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
Co-authored-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
Co-authored-by: Alex Soto <asotobu@gmail.com>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
11 months ago
cebtenzzre 64101d3af5 update llama.cpp-mainline 11 months ago
cebtenzzre 3c561bcdf2 python: bump bindings version for AMD fixes 11 months ago
Adam Treat ffef60912f Update to llama.cpp 11 months ago
Adam Treat bc88271520 Bump version to v2.5.3 and release notes. 11 months ago
cebtenzzre 5508e43466 build_and_run: clarify which additional Qt libs are needed
Signed-off-by: cebtenzzre <cebtenzzre@gmail.com>
11 months ago
cebtenzzre 79a5522931 fix references to old backend implementations 11 months ago
Adam Treat f529d55380 Move this logic to QML. 11 months ago
Adam Treat f5f22fdbd0 Update llama.cpp for latest bugfixes. 11 months ago
Adam Treat 5c0d077f74 Remove leading whitespace in responses. 11 months ago
Adam Treat 131cfcdeae Don't regenerate the name for deserialized chats. 11 months ago
Adam Treat dc2e7d6e9b Don't start recalculating context immediately upon switching to a new chat
but rather wait until the first prompt. This allows users to switch between
chats fast and to delete chats more easily.

Fixes issue #1545
11 months ago
cebtenzzre 7bcd9e8089 update llama.cpp-mainline 11 months ago
cebtenzzre fd0c501d68
backend: support GGUFv3 (#1582) 11 months ago
Adam Treat 14b410a12a Update to latest version of llama.cpp which fixes issue 1507. 11 months ago
Adam Treat ab96035bec Update to llama.cpp submodule for some vulkan fixes. 11 months ago
Aaron Miller 9193a9517a
make codespell happy again (#1574)
* make codespell happy again

* no belong

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>

---------

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
11 months ago
cebtenzzre 8d7a3f26d3 gpt4all-training: delete old chat executables
Signed-off-by: cebtenzzre <cebtenzzre@gmail.com>
11 months ago
Andriy Mulyar 3444a47cad
Update README.md
Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com>
11 months ago
Adam Treat 89a59e7f99 Bump version and add release notes for 2.5.1 11 months ago
cebtenzzre f5dd74bcf0
models2.json: add tokenizer merges to mpt-7b-chat model (#1563) 11 months ago
cebtenzzre 78d930516d
app.py: change default model to Mistral Instruct (#1564) 11 months ago
cebtenzzre 83b8eea611 README: add clear note about new GGUF format
Signed-off-by: cebtenzzre <cebtenzzre@gmail.com>
11 months ago
Andriy Mulyar 1bebe78c56
Update README.md
Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com>
11 months ago
Andriy Mulyar b75a209374
Update README.md
Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com>
11 months ago
cebtenzzre e90263c23f
make scripts executable (#1555) 11 months ago
Aaron Miller f414c28589 llmodel: whitelist library name patterns
this fixes some issues that were being seen on installed windows builds of 2.5.0

only load dlls that actually might be model impl dlls, otherwise we pull all sorts of random junk into the process before it might expect to be

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
11 months ago
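The whitelisting idea in f414c28589 can be illustrated with a minimal Python sketch. It is not the backend's actual code: the pattern, the file names, and the helper `candidate_model_libs` are all hypothetical, chosen only to show how filtering by name keeps unrelated libraries from being loaded.

```python
import re

# Hypothetical pattern: the log does not state the backend's real naming scheme.
MODEL_LIB_PATTERN = re.compile(r"^(lib)?(llamamodel|gptj)[-\w]*\.(dll|so|dylib)$")

def candidate_model_libs(filenames):
    """Return only the file names that look like model implementation libraries."""
    return [name for name in filenames if MODEL_LIB_PATTERN.match(name)]

# Illustrative directory listing (made-up names).
files = [
    "llamamodel-mainline-default.dll",
    "Qt6Core.dll",
    "libgptj-avxonly.so",
    "vulkan-1.dll",
]
print(candidate_model_libs(files))
```

Only names matching the whitelist would be passed on to the dynamic loader, so unrelated DLLs in the same directory are never pulled into the process.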
cebtenzzre 7e5e84fbb7
python: change default extension to .gguf (#1559) 11 months ago
cebtenzzre 37b007603a
bindings: replace references to GGMLv3 models with GGUF (#1547) 11 months ago
cebtenzzre c25dc51935 chat: fix syntax error in main.qml 11 months ago
Thomas 34daf240f9
Update Dockerfile.buildkit (#1542)
corrected model download directory

Signed-off-by: Thomas <tvhdev@vonhaugwitz-softwaresolutions.de>
11 months ago
Victor Tsaran 721d854095
chat: improve accessibility fields (#1532)
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
11 months ago
Andriy Mulyar d50803ff8e
GGUF Python Release (#1539) 11 months ago
Adam Treat 9e99cf937a Add release notes for 2.5.0 and bump the version. 11 months ago
cebtenzzre 245c5ce5ea
update default model URLs (#1538) 11 months ago
cebtenzzre 4338e72a51
MPT: use upstream llama.cpp implementation (#1515) 11 months ago
cebtenzzre 0fe2e19691
llamamodel: re-enable error messages by default (#1537) 11 months ago
cebtenzzre f505619c84
README: remove star history (#1536) 11 months ago
cebtenzzre 5fbeeb1cb4
python: connection resume and MSVC support (#1535) 11 months ago
cebtenzzre 017c3a9649
python: prepare version 2.0.0rc1 (#1529) 11 months ago
cebtenzzre bcbcad98d0
CI: increase minimum macOS version of Python bindings to 10.15 (#1511) 11 months ago
cebtenzzre fd3014016b
docs: clarify Vulkan dep in build instructions for bindings (#1525) 11 months ago
cebtenzzre ac33bafb91
docs: improve build_and_run.md (#1524) 11 months ago
cebtenzzre 9a19c740ee
kompute: fix library loading issues with kp_logger (#1517) 11 months ago
Aaron Miller f79557d2aa speedup: just use mat*vec shaders for mat*mat
so far my from-scratch mat*mats are still slower than just running more
invocations of the existing Metal ported mat*vec shaders - it should be
theoretically possible to make a mat*mat that's faster (for actual
mat*mat cases) than an optimal mat*vec, but it will need to be at
*least* as fast as the mat*vec op and then take special care to be
cache-friendly and save memory bandwidth, as the # of compute ops is the
same
11 months ago
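The approach described in f79557d2aa, running more invocations of a mat*vec routine in place of a dedicated mat*mat, can be sketched in plain Python. This only illustrates the arithmetic equivalence, not the shader code; the function names are hypothetical.

```python
def mat_vec(a, x):
    """Multiply matrix a (a list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def mat_mat_via_mat_vec(a, b):
    """Compute the product a * b by calling mat_vec once per column of b."""
    columns = [mat_vec(a, list(col)) for col in zip(*b)]
    # columns[j] holds column j of the product; transpose back to rows.
    return [list(row) for row in zip(*columns)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(mat_mat_via_mat_vec(a, b))  # [[19, 22], [43, 50]]
```

The number of multiply-add operations is the same either way, which is why, as the commit message notes, a dedicated mat*mat only wins if it makes better use of cache and memory bandwidth.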
cebtenzzre 22de3c56bd
convert scripts: fix AutoConfig typo (#1512) 11 months ago
Aaron Miller 10f9b49313 update mini-orca 3b to gguf2, license
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
11 months ago