Commit Graph

124 Commits (0f046cf905067219b4030800beee778c98eae007)

Author SHA1 Message Date
Adam Treat aa33419c6e Fallback to CPU more robustly. 10 months ago
Adam Treat 9013a089bd Bump to new llama with new bugfix. 10 months ago
Adam Treat 3076e0bf26 Only show GPU when we're actually using it. 10 months ago
Adam Treat cf4eb530ce Sync to a newer version of llama.cpp with bugfix for vulkan. 10 months ago
Adam Treat 4b9a345aee Update the submodule. 10 months ago
Aaron Miller 6f038c136b init at most one vulkan device, submodule update 10 months ago
  Fixes issues with multiple of the same GPU.
Adam Treat 8f99dca70f Bring the vulkan backend to the GUI. 10 months ago
Aaron Miller f0735efa7d vulkan python bindings on windows fixes 10 months ago
Adam Treat c953b321b7 Don't link against libvulkan. 10 months ago
Aaron Miller c4d23512e4 remove extra dynamic linker deps when building with vulkan 10 months ago
Adam Treat 85e34598f9 more circleci 10 months ago
Adam Treat f578fa6cdf Fix for windows. 10 months ago
Adam Treat 17d3e4976c Add a comment indicating future work. 10 months ago
Adam Treat 987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 10 months ago
Adam Treat d55cbbee32 Update to newer llama.cpp and disable older forks. 10 months ago
Aaron Miller 0bc2274869 bump llama.cpp version + needed fixes for that 10 months ago
aaron miller 33c22be2aa starcoder: use ggml_graph_plan 10 months ago
Cosmic Snow 108d950874 Fix Windows being unable to load models on older Windows builds 11 months ago
  - Replace high-level IsProcessorFeaturePresent
  - Reintroduce low-level compiler intrinsics implementation
Adam Treat 6d03b3e500 Add starcoder support. 12 months ago
cosmic-snow 2d02c65177 Handle edge cases when generating embeddings (#1215) 12 months ago
  * Handle edge cases when generating embeddings
  * Improve Python handling & add llmodel_c.h note
    - In the Python bindings, fail fast with a ValueError when the text is empty
    - Advise other bindings authors to do likewise in llmodel_c.h
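The fail-fast behaviour described in the commit above can be sketched as follows. This is a hedged illustration, not the actual bindings code: the function name `generate_embedding` and the placeholder return value are hypothetical; the real bindings call into the C backend via llmodel_c.

```python
def generate_embedding(text: str) -> list[float]:
    # Per the commit note: fail fast with a ValueError when the text
    # is empty, instead of passing bad input down to the backend.
    if not text:
        raise ValueError("text must not be empty")
    # Hypothetical placeholder embedding; the real bindings delegate
    # to the llmodel_c embedding API.
    return [float(len(text))]
```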
Aaron Miller 1c4a244291 bump mem allocation a bit 1 year ago
Adam Treat ee4186d579 Fixup bert python bindings. 1 year ago
cosmic-snow 6200900677 Fix Windows MSVC arch detection (#1194) 1 year ago
  - in llmodel.cpp, to fix AVX-only handling
  Signed-off-by: cosmic-snow <134004613+cosmic-snow@users.noreply.github.com>
Adam Treat 4963db8f43 Bump the version numbers for both python and c backend. 1 year ago
Adam Treat 0efdbfcffe Bert 1 year ago
Adam Treat 315a1f2aa2 Move it back as internal class. 1 year ago
Adam Treat ae8eb297ac Add sbert backend. 1 year ago
Adam Treat 1f749d7633 Clean up backend code a bit and hide impl. details. 1 year ago
Adam Treat 33557b1f39 Move the implementation out of llmodel class. 1 year ago
Aaron Miller 432b7ebbd7 include windows.h just to be safe 1 year ago
Aaron Miller 95b8fb312e windows/msvc: use high-level processor feature detection API 1 year ago
  See https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-isprocessorfeaturepresent
Aaron Miller f0faa23ad5 cmakelists: always export build commands (#1179) 1 year ago
  Friendly for editors with clangd integration that don't also manage the build themselves.
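The commit above refers to CMake's standard mechanism for emitting `compile_commands.json`; a minimal sketch of the relevant setting (the exact placement in the project's CMakeLists is not shown in this log):

```cmake
# Always write compile_commands.json so clangd-based editors can pick
# up include paths and compiler flags without driving the build.
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
```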
Aaron Miller 4a24b586df llama.cpp: metal buffer freeing 1 year ago
Aaron Miller 137bc2c367 replit: free metal context 1 year ago
Aaron Miller 57dc0c8953 adjust eval buf sizes to pass long input test 1 year ago
Aaron Miller 7a5f6e4726 limit prompt batch size to 128 1 year ago
Aaron Miller 883775bc5f move 230511 submodule to nomic fork, fix alibi assert 1 year ago
Andriy Mulyar 46a0762bd5 Python Bindings: Improved unit tests, documentation and unification of API (#1090) 1 year ago
  * Makefiles, black, isort
  * Black and isort
  * Unit tests and generation method
  * Chat context provider
  * Context does not reset
  * Current state
  * Fixup
  * Python bindings with unit tests
  * GPT4All Python Bindings: chat contexts, tests
  * New python bindings and backend fixes
  * Black and isort
  * Documentation error
  * Preserved n_predict for backwards compat with langchain
  Co-authored-by: Adam Treat <treat.adam@gmail.com>
Aaron Miller 40a3faeb05 Use ggml scratch bufs for mpt and gptj models (#1104) 1 year ago
  * backend/gptj: use scratch buffers; reduces total memory required and keeps the eval buf from growing with n_past
  * backend/mpt: use scratch bufs
  * fix format-related compile warnings
Aaron Miller 8d19ef3909 backend: factor out common elements in model code (#1089) 1 year ago
  * backend: factor out common structs in model code, prepping to hack on these by hopefully leaving fewer places to fix the same bug
  * rename
  * use common buffer wrapper instead of manual malloc
  * fix replit compile warnings
Aaron Miller 28d41d4f6d falcon: use *model-local* eval & scratch bufs (#1079) 1 year ago
  Fixes memory leaks copied from the ggml/examples-based implementation.
Zach Nussbaum 2565f6a94a feat: add conversion script 1 year ago
Aaron Miller 198b5e4832 add Falcon 7B model 1 year ago
  Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin
Aaron Miller db34a2f670 llmodel: skip attempting Metal if model+kvcache > 53% of system ram 1 year ago
Aaron Miller b19a3e5b2c add requiredMem method to llmodel impls 1 year ago
  Most of these can just shortcut out of the model loading logic. llama is a bit worse to deal with because we submodule it, so I have to at least parse the hparams; then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway).
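The two memory heuristics in the commits above (file size on disk as a memory estimate for mmap()ed llama files, and skipping Metal when model plus kv cache exceed 53% of system RAM) can be sketched as follows. This is a hedged illustration in Python, not the C++ `requiredMem` implementation; the function names and the byte-count parameters are hypothetical stand-ins, while the 53% threshold is taken from the commit message.

```python
import os

def required_mem(model_path: str) -> int:
    # Estimate required memory as the size on disk; reasonable for
    # llama files because they are mmap()ed rather than fully copied.
    return os.path.getsize(model_path)

def should_try_metal(model_and_kv_bytes: int, system_ram_bytes: int) -> bool:
    # Skip attempting Metal if model + kv cache exceed 53% of system
    # RAM (threshold from the llmodel commit above).
    return model_and_kv_bytes <= 0.53 * system_ram_bytes
```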
Adam Treat a0f80453e5 Use sysinfo in backend. 1 year ago
niansa/tuxifan 47323f8591 Update replit.cpp 1 year ago
  replit_tokenizer_detokenize returns std::string now
  Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
niansa 0855c0df1d Fixed Replit implementation compile warnings 1 year ago
Aaron Miller 1290b32451 update to latest mainline llama.cpp 1 year ago
  Adds the max_size param to ggml_metal_add_buffer, introduced in https://github.com/ggerganov/llama.cpp/pull/1826
niansa/tuxifan 5eee16c97c Do not specify "success" as error for unsupported models 1 year ago
  Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>