Commit Graph

125 Commits (045f6e6cdc6dedba996d5f3c630baeb18b3739ab)

Author SHA1 Message Date
Adam Treat 045f6e6cdc Link against ggml in bin so we can get the available devices without loading a model. 1 year ago
Adam Treat aa33419c6e Fallback to CPU more robustly. 1 year ago
Adam Treat 9013a089bd Bump to new llama with new bugfix. 1 year ago
Adam Treat 3076e0bf26 Only show GPU when we're actually using it. 1 year ago
Adam Treat cf4eb530ce Sync to a newer version of llama.cpp with bugfix for vulkan. 1 year ago
Adam Treat 4b9a345aee Update the submodule. 1 year ago
Aaron Miller 6f038c136b init at most one vulkan device, submodule update
fixes issues with multiples of the same GPU
1 year ago
Adam Treat 8f99dca70f Bring the vulkan backend to the GUI. 1 year ago
Aaron Miller f0735efa7d vulkan python bindings on windows fixes 1 year ago
Adam Treat c953b321b7 Don't link against libvulkan. 1 year ago
Aaron Miller c4d23512e4 remove extra dynamic linker deps when building with vulkan 1 year ago
Adam Treat 85e34598f9 more circleci 1 year ago
Adam Treat f578fa6cdf Fix for windows. 1 year ago
Adam Treat 17d3e4976c Add a comment indicating future work. 1 year ago
Adam Treat 987546c63b Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 1 year ago
Adam Treat d55cbbee32 Update to newer llama.cpp and disable older forks. 1 year ago
Aaron Miller 0bc2274869 bump llama.cpp version + needed fixes for that 1 year ago
aaron miller 33c22be2aa starcoder: use ggml_graph_plan 1 year ago
Cosmic Snow 108d950874 Fix models failing to load on older Windows builds
- Replace high-level IsProcessorFeaturePresent
- Reintroduce low-level compiler intrinsics implementation
1 year ago
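The two commits above trade off Windows' high-level `IsProcessorFeaturePresent` API against low-level compiler intrinsics for CPU feature detection. A minimal, portable sketch of the intrinsics-style runtime check, using GCC/Clang builtins rather than the project's actual llmodel.cpp code:

```cpp
// Sketch of low-level CPU feature detection; assumes a GCC/Clang
// toolchain. Commits like the one above fall back to CPUID-based checks
// because the high-level Windows API is unreliable on older builds.
static bool cpuSupportsAvx2() {
#if defined(__GNUC__) || defined(__clang__)
    // __builtin_cpu_supports queries CPUID at runtime.
    return __builtin_cpu_supports("avx2");
#else
    return false; // conservative fallback on other compilers
#endif
}
```

A loader can use such a predicate to pick an AVX2, AVX-only, or scalar implementation at startup.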
Adam Treat 6d03b3e500 Add starcoder support. 1 year ago
cosmic-snow 2d02c65177
Handle edge cases when generating embeddings (#1215)
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings, fail fast with a ValueError when text is empty
- Advise other bindings' authors to do likewise in llmodel_c.h
1 year ago
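The edge-case handling in #1215 amounts to rejecting empty input before it reaches the model. A minimal C++ analogue of the fail-fast behaviour the Python bindings adopt (the function name and return shape here are illustrative, not the actual llmodel API):

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative guard, not the real llmodel API: reject empty text up
// front, mirroring the ValueError raised by the Python bindings.
std::vector<float> embedText(const std::string &text) {
    if (text.empty())
        throw std::invalid_argument("text must not be empty");
    // A real implementation would tokenize and run the model here;
    // a fixed-size placeholder vector stands in for the embedding.
    return std::vector<float>(4, 0.0f);
}
```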
Aaron Miller 1c4a244291 bump mem allocation a bit 1 year ago
Adam Treat ee4186d579 Fixup bert python bindings. 1 year ago
cosmic-snow 6200900677
Fix Windows MSVC arch detection (#1194)
- in llmodel.cpp to fix AVX-only handling

Signed-off-by: cosmic-snow <134004613+cosmic-snow@users.noreply.github.com>
1 year ago
Adam Treat 4963db8f43 Bump the version numbers for both python and c backend. 1 year ago
Adam Treat 0efdbfcffe Bert 1 year ago
Adam Treat 315a1f2aa2 Move it back as internal class. 1 year ago
Adam Treat ae8eb297ac Add sbert backend. 1 year ago
Adam Treat 1f749d7633 Clean up backend code a bit and hide impl. details. 1 year ago
Adam Treat 33557b1f39 Move the implementation out of llmodel class. 1 year ago
Aaron Miller 432b7ebbd7 include windows.h just to be safe 1 year ago
Aaron Miller 95b8fb312e windows/msvc: use high level processor feature detection API
see https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-isprocessorfeaturepresent
1 year ago
Aaron Miller f0faa23ad5
cmakelists: always export build commands (#1179)
friendly for editors with clangd integration that don't also
manage the build themselves
1 year ago
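The CMake change in #1179 comes down to one setting; its effect can be reproduced in any project with a fragment like this (not the repo's actual CMakeLists.txt):

```cmake
# Always generate compile_commands.json so clangd-based editors can
# index the project without driving the build themselves.
set(CMAKE_EXPORT_COMPILE_COMMANDS ON CACHE BOOL "" FORCE)
```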
Aaron Miller 4a24b586df llama.cpp: metal buffer freeing 1 year ago
Aaron Miller 137bc2c367 replit: free metal context 1 year ago
Aaron Miller 57dc0c8953 adjust eval buf sizes to pass long input test 1 year ago
Aaron Miller 7a5f6e4726 limit prompt batch size to 128 1 year ago
Aaron Miller 883775bc5f move 230511 submodule to nomic fork, fix alibi assert 1 year ago
Andriy Mulyar 46a0762bd5
Python Bindings: Improved unit tests, documentation and unification of API (#1090)
* Makefiles, black, isort

* Black and isort

* unit tests and generation method

* chat context provider

* context does not reset

* Current state

* Fixup

* Python bindings with unit tests

* GPT4All Python Bindings: chat contexts, tests

* New python bindings and backend fixes

* Black and Isort

* Documentation error

* preserved n_predict for backwards compat with langchain

---------

Co-authored-by: Adam Treat <treat.adam@gmail.com>
1 year ago
Aaron Miller 40a3faeb05
Use ggml scratch bufs for mpt and gptj models (#1104)
* backend/gptj: use scratch buffers

reduces total memory required and makes eval buf not grow with n_past

* backend/mpt: use scratch bufs

* fix format-related compile warnings
1 year ago
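The point of the scratch buffers in #1104 is to keep the eval buffer from growing with n_past: one pre-sized allocation is reused for every eval instead of growing per step. Stripped of ggml specifics, the pattern is a bump-allocator arena that is reset between evals; a self-contained sketch of that idea (not the ggml API itself):

```cpp
#include <cstddef>
#include <vector>

// Minimal bump-allocator arena illustrating the scratch-buffer idea:
// one fixed allocation is reused for every eval, so peak memory no
// longer scales with how much context (n_past) has accumulated.
class ScratchArena {
public:
    explicit ScratchArena(std::size_t bytes) : buf_(bytes), offs_(0) {}

    void *alloc(std::size_t bytes) {
        if (offs_ + bytes > buf_.size())
            return nullptr; // caller must size the arena up front
        void *p = buf_.data() + offs_;
        offs_ += bytes;
        return p;
    }

    void reset() { offs_ = 0; }               // call at the start of each eval
    std::size_t used() const { return offs_; }
    std::size_t capacity() const { return buf_.size(); }

private:
    std::vector<unsigned char> buf_;
    std::size_t offs_;
};
```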
Aaron Miller 8d19ef3909
backend: factor out common elements in model code (#1089)
* backend: factor out common structs in model code

prepping to hack on these by reducing the number of places the same bug has to be fixed

rename

* use common buffer wrapper instead of manual malloc

* fix replit compile warnings
1 year ago
Aaron Miller 28d41d4f6d
falcon: use *model-local* eval & scratch bufs (#1079)
fixes memory leaks copied from ggml/examples based implementation
1 year ago
Zach Nussbaum 2565f6a94a feat: add conversion script 1 year ago
Aaron Miller 198b5e4832 add Falcon 7B model
Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin
1 year ago
Aaron Miller db34a2f670 llmodel: skip attempting Metal if model+kvcache > 53% of system ram 1 year ago
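The Metal gate described above is a simple proportion check; a hedged sketch of the predicate (the 53% threshold comes from the commit message, the function name is illustrative):

```cpp
#include <cstdint>

// Decide whether to attempt the Metal backend: skip it when the model
// plus kv-cache would exceed 53% of system RAM, per the commit above.
bool shouldTryMetal(std::uint64_t modelBytes, std::uint64_t kvCacheBytes,
                    std::uint64_t systemRamBytes) {
    // Multiply before dividing to avoid integer-division truncation.
    return (modelBytes + kvCacheBytes) * 100 <= systemRamBytes * 53;
}
```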
Aaron Miller b19a3e5b2c add requiredMem method to llmodel impls
most of these can just shortcut out of the model loading logic; llama is a bit worse to deal with because we submodule it, so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
1 year ago
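For llama models the commit above uses the size on disk as the memory estimate, which is reasonable because the files are mmap()ed. A small sketch of that estimate using std::filesystem (C++17; names are illustrative, not the actual llmodel interface):

```cpp
#include <cstdint>
#include <filesystem>

// Estimate required memory for an mmap()ed model by its size on disk,
// as the commit above does for llama; returns 0 if the file is missing
// or unreadable rather than throwing.
std::uint64_t requiredMemEstimate(const std::filesystem::path &modelPath) {
    std::error_code ec;
    auto size = std::filesystem::file_size(modelPath, ec);
    return ec ? 0 : static_cast<std::uint64_t>(size);
}
```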
Adam Treat a0f80453e5 Use sysinfo in backend. 1 year ago
niansa/tuxifan 47323f8591 Update replit.cpp
replit_tokenizer_detokenize returning std::string now

Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
1 year ago
niansa 0855c0df1d Fixed Replit implementation compile warnings 1 year ago
Aaron Miller 1290b32451 update to latest mainline llama.cpp
add max_size param to ggml_metal_add_buffer - introduced in https://github.com/ggerganov/llama.cpp/pull/1826
1 year ago