gpt4all

mirror of https://github.com/nomic-ai/gpt4all synced 2024-11-10 01:10:35 +00:00

Author	SHA1	Message	Date
Aaron Miller	1c4a244291	bump mem allocation a bit	2023-07-14 09:48:57 -04:00
Adam Treat	ee4186d579	Fixup bert python bindings.	2023-07-14 09:48:57 -04:00
cosmic-snow	6200900677	Fix Windows MSVC arch detection (#1194 ) - in llmodel.cpp to fix AVX-only handling Signed-off-by: cosmic-snow <134004613+cosmic-snow@users.noreply.github.com>	2023-07-13 14:44:17 -04:00
Adam Treat	4963db8f43	Bump the version numbers for both python and c backend.	2023-07-13 14:21:46 -04:00
Adam Treat	0efdbfcffe	Bert	2023-07-13 14:21:46 -04:00
Adam Treat	315a1f2aa2	Move it back as internal class.	2023-07-13 14:21:46 -04:00
Adam Treat	ae8eb297ac	Add sbert backend.	2023-07-13 14:21:46 -04:00
Adam Treat	1f749d7633	Clean up backend code a bit and hide impl. details.	2023-07-13 14:21:46 -04:00
Adam Treat	33557b1f39	Move the implementation out of llmodel class.	2023-07-13 14:21:46 -04:00
Aaron Miller	432b7ebbd7	include windows.h just to be safe	2023-07-12 12:46:46 -04:00
Aaron Miller	95b8fb312e	windows/msvc: use high level processor feature detection API see https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-isprocessorfeaturepresent	2023-07-12 12:46:46 -04:00
Aaron Miller	f0faa23ad5	cmakelists: always export build commands (#1179 ) friendly for using editors with clangd integration that don't also manage the build themselves	2023-07-12 10:49:24 -04:00
Aaron Miller	4a24b586df	llama.cpp: metal buffer freeing	2023-06-30 21:07:21 -03:00
Aaron Miller	137bc2c367	replit: free metal context	2023-06-30 21:07:21 -03:00
Aaron Miller	57dc0c8953	adjust eval buf sizes to pass long input test	2023-06-30 21:07:21 -03:00
Aaron Miller	7a5f6e4726	limit prompt batch size to 128	2023-06-30 21:07:21 -03:00
Aaron Miller	883775bc5f	move 230511 submodule to nomic fork, fix alibi assert	2023-06-30 21:07:21 -03:00
Andriy Mulyar	46a0762bd5	Python Bindings: Improved unit tests, documentation and unification of API (#1090 ) * Makefiles, black, isort * Black and isort * unit tests and generation method * chat context provider * context does not reset * Current state * Fixup * Python bindings with unit tests * GPT4All Python Bindings: chat contexts, tests * New python bindings and backend fixes * Black and Isort * Documentation error * preserved n_predict for backwards compat with langchain --------- Co-authored-by: Adam Treat <treat.adam@gmail.com>	2023-06-30 16:02:02 -04:00
Aaron Miller	40a3faeb05	Use ggml scratch bufs for mpt and gptj models (#1104 ) * backend/gptj: use scratch buffers reduces total memory required and makes eval buf not grow with n_past * backend/mpt: use scratch bufs * fix format-related compile warnings	2023-06-30 10:53:45 -07:00
Aaron Miller	8d19ef3909	backend: factor out common elements in model code (#1089 ) * backend: factor out common structs in model code prepping to hack on these by hopefully making there be fewer places to fix the same bug rename * use common buffer wrapper instead of manual malloc * fix replit compile warnings	2023-06-28 17:35:07 -07:00
Aaron Miller	28d41d4f6d	falcon: use model-local eval & scratch bufs (#1079 ) fixes memory leaks copied from ggml/examples based implementation	2023-06-27 16:09:11 -07:00
Zach Nussbaum	2565f6a94a	feat: add conversion script	2023-06-27 14:06:39 -03:00
Aaron Miller	198b5e4832	add Falcon 7B model Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin	2023-06-27 14:06:39 -03:00
Aaron Miller	db34a2f670	llmodel: skip attempting Metal if model+kvcache > 53% of system ram	2023-06-26 19:46:49 -03:00
Aaron Miller	b19a3e5b2c	add requiredMem method to llmodel impls most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)	2023-06-26 18:27:58 -03:00
Adam Treat	a0f80453e5	Use sysinfo in backend.	2023-06-26 14:14:49 -04:00
niansa/tuxifan	47323f8591	Update replit.cpp replit_tokenizer_detokenize returns std::string now Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-26 14:49:58 -03:00
niansa	0855c0df1d	Fixed Replit implementation compile warnings	2023-06-26 14:49:58 -03:00
Aaron Miller	1290b32451	update to latest mainline llama.cpp add max_size param to ggml_metal_add_buffer - introduced in https://github.com/ggerganov/llama.cpp/pull/1826	2023-06-26 14:40:52 -03:00
niansa/tuxifan	5eee16c97c	Do not specify "success" as error for unsupported models Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-22 09:28:40 +02:00
Adam Treat	bd58c46da0	Initialize these to nullptr to prevent double deletion when a model fails to load.	2023-06-20 18:23:45 -04:00
niansa/tuxifan	68f9786ed9	Use operator ""_MiB (#991 )	2023-06-16 15:56:22 -04:00
Aaron Miller	abc081e48d	fix llama.cpp k-quants (#988 ) * enable k-quants on all mainline builds	2023-06-15 14:06:14 -07:00
Aaron Miller	c4319d2c8e	dlhandle: prevent libs from using each other's symbols (#977 ) use RTLD_LOCAL so that symbols are only exposed via dlsym without this all symbols exported by the libs are available for symbol resolution, resulting in different lib versions potentially resolving each other's symbols, causing incredibly cursed behavior such as https://gist.github.com/apage43/085c1ff69f6dd05387793ebc301840f6	2023-06-13 14:52:11 -04:00
Aaron Miller	f71d8efc71	metal replit (#931 ) metal+replit makes replit work with Metal and removes its use of `mem_per_token` in favor of fixed size scratch buffers (closer to llama.cpp)	2023-06-13 07:29:14 -07:00
Aaron Miller	85964a7635	bump llama.cpp mainline to latest (#964 )	2023-06-13 08:40:38 -04:00
Tim Miller	797891c995	Initial Library Loader for .NET Bindings / Update bindings to support newest changes (#763 ) * Initial Library Loader * Load library as part of Model factory * Dynamically search and find the dlls * Update tests to use locally built runtimes * Fix dylib loading, add macos runtime support for sample/tests * Bypass automatic loading by default. * Only set CMAKE_OSX_ARCHITECTURES if not already set, allow cross-compile * Switch Loading again * Update build scripts for mac/linux * Update bindings to support newest breaking changes * Fix build * Use llmodel for Windows * Actually, it does need to be libllmodel * Name * Remove TFMs, bypass loading by default * Fix script * Delete mac script --------- Co-authored-by: Tim Miller <innerlogic4321@ghmail.com>	2023-06-13 14:05:34 +02:00
Aaron Miller	88616fde7f	llmodel: change tokenToString to not use string_view (#968 ) fixes a definite use-after-free and likely avoids some other potential ones - std::string will convert to a std::string_view automatically but as soon as the std::string in question goes out of scope it is already freed and the string_view is pointing at freed memory - this is mostly fine if its returning a reference to the tokenizer's internal vocab table but it's, imo, too easy to return a reference to a dynamically constructed string with this as replit is doing (and unfortunately needs to do to convert the internal whitespace replacement symbol back to a space)	2023-06-13 07:14:02 -04:00
Adam Treat	84deebd223	Fix compile for windows and linux again. PLEASE DON'T REVERT THIS!	2023-06-12 17:08:55 -04:00
Juuso Alasuutari	5cfb1bda89	llmodel: add model wrapper destructor, fix mem leak in golang bindings (#862 ) Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>	2023-06-12 09:41:22 -07:00
Cosmic Snow	ae4a275bcd	Fix Windows MSVC AVX builds - bug introduced in `0cb2b86730` - currently getting: `warning C5102: ignoring invalid command-line macro definition '/arch:AVX2'` - solution is to use `_options(...)` not `_definitions(...)`	2023-06-12 08:55:55 -07:00
Adam Treat	b906fb4057	When recalculating context we can't erase the BOS.	2023-06-12 08:43:20 -07:00
Aaron Miller	d3ba1295a7	Metal+LLama take two (#929 ) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 16:48:46 -04:00
Adam Treat	b162b5c64e	Revert "llama on Metal (#885 )" This reverts commit `c55f81b860`.	2023-06-09 15:08:46 -04:00
Aaron Miller	c55f81b860	llama on Metal (#885 ) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 14:58:12 -04:00
niansa/tuxifan	14e9ccbc6a	Do auto detection by default in C++ API Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 17:01:19 +02:00
niansa/tuxifan	f03da8d732	Removed double-static from variables in replit.cpp The anonymous namespace already makes it static. Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 08:55:15 -04:00
niansa	0cb2b86730	Synced llama.cpp.cmake with upstream	2023-06-08 18:21:32 -04:00
Aaron Miller	47fbc0e309	non-llama: explicitly greedy sampling for temp<=0 (#901 ) copied directly from llama.cpp - without this temp=0.0 will just scale all the logits to infinity and give bad output	2023-06-08 11:08:30 -07:00
Aaron Miller	b14953e136	sampling: remove incorrect offset for n_vocab (#900 ) no effect, but avoids a potential bug later if we use actualVocabSize - which is for when a model has a larger embedding tensor/# of output logits than actually trained tokens to allow room for adding extras in finetuning - presently all of our models have had "placeholder" tokens in the vocab so this hasn't broken anything, but if the sizes did differ we want the equivalent of `logits[:actualVocabSize]` (the start point is unchanged), not `logits[-actualVocabSize:]` (this.)	2023-06-08 11:08:10 -07:00
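
The two sampling commits at the end of this page (`47fbc0e309` and `b14953e136`) describe behaviour that is easy to show in a few lines. The sketch below is illustrative only and is not code from the repository: the function names `candidate_logits` and `next_token_probs` are hypothetical, and it reads "the start point is unchanged" as keeping the first `actual_vocab_size` logits rather than the last ones.

```python
import math


def candidate_logits(logits, actual_vocab_size):
    """Keep the first actual_vocab_size logits; the start stays at index 0.

    Hypothetical helper: slicing with logits[-actual_vocab_size:] would
    instead keep the tail, which only matches when both sizes are equal.
    """
    return logits[:actual_vocab_size]


def next_token_probs(logits, actual_vocab_size, temp):
    """Return a probability list over the candidate tokens.

    For temp <= 0 this is explicitly greedy: all mass goes to the argmax
    and the logits are never divided by the temperature.
    """
    cand = candidate_logits(logits, actual_vocab_size)
    if temp <= 0:
        best = max(range(len(cand)), key=cand.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(cand))]
    # Ordinary temperature scaling followed by a numerically stable softmax.
    scaled = [x / temp for x in cand]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Five output logits, of which only the first three are trained tokens.
    logits = [0.1, 2.0, -1.0, 9.0, 9.0]
    print(next_token_probs(logits, 3, 0.0))
    print(next_token_probs(logits, 3, 0.7))
```

With these made-up logits, slicing from the end would pick up the two untrained trailing entries, which is the potential bug the second commit message warns about.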


104 Commits