- custom callbacks & session improvements PR (v1.0.6) had one too many checks
- remove the problematic config['url'] check
- add a crude test
- fixes#1261
* Added the following features: \n 1) Now prompt_model uses the positional argument callback to return the response tokens. \n 2) Due to the callback argument of prompt_model, prompt_model_streaming only manages the queue and threading now, which reduces duplication of the code. \n 3) Added optional verbose argument to prompt_model which prints out the prompt that is passed to the model. \n 4) Chat sessions can now have a header, i.e. an instruction before the transcript of the conversation. The header is set at the creation of the chat session context. \n 5) generate function now accepts an optional callback. \n 6) When streaming and using chat session, the user doesn't need to save assistant's messages by himself. This is done automatically.
* added _empty_response_callback so I don't have to check if callback is None
* added docs
* now if the callback stop generation, the last token is ignored
* fixed type hints, reimplemented chat session header as a system prompt, minor refactoring, docs: removed section about manual update of chat session for streaming
* forgot to add some type hints!
* keep the config of the model in GPT4All class which is taken from models.json if the download is allowed
* During chat sessions, the model-specific systemPrompt and promptTemplate are applied.
* implemented the changes
* Fixed typing. Now the user can set a prompt template that will be applied even outside of a chat session. The template can also have multiple placeholders that can be filled by passing a dictionary to the generate function
* reversed some changes concerning the prompt templates and their functionality
* fixed some type hints, changed list[float] to List[Float]
* fixed type hints, changed List[Float] to List[float]
* fix typo in the comment: Pepare => Prepare
---------
Signed-off-by: 385olt <385olt@gmail.com>
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings fail fast with a ValueError when text is empty
- Advice other bindings authors to do likewise in llmodel_c.h
* python: do not mutate locals()
* python: fix (some) typing complaints
* python: queue sentinel need not be a str
* python: make long inference tests opt in
* Makefiles, black, isort
* Black and isort
* unit tests and generation method
* chat context provider
* context does not reset
* Current state
* Fixup
* Python bindings with unit tests
* GPT4All Python Bindings: chat contexts, tests
* New python bindings and backend fixes
* Black and Isort
* Documentation error
* preserved n_predict for backwords compat with langchain
---------
Co-authored-by: Adam Treat <treat.adam@gmail.com>
* Update gpt4all_chat.md
Cleaned up and made the sideloading part more readable, also moved Replit architecture to supported ones. (+ renamed all "ggML" to "GGML" because who calls it "ggML"??)
Signed-off-by: AMOGUS <137312610+Amogus8P@users.noreply.github.com>
* Removed the prefixing part
Signed-off-by: AMOGUS <137312610+Amogus8P@users.noreply.github.com>
* Bump version
Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com>
---------
Signed-off-by: AMOGUS <137312610+Amogus8P@users.noreply.github.com>
Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com>
Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com>
most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
* Add gpt4all-bindings/cli/README.md
* Unify version information
- Was previously split; base one on the other
- Add VERSION_INFO as the "source of truth":
- Modelled after sys.version_info.
- Implemented as a tuple, because it's much easier for (partial)
programmatic comparison.
- Previous API is kept intact.
* Add gpt4all-bindings/cli/developer_notes.md
- A few notes on what's what, especially regarding docs
* Add gpt4all-bindings/python/docs/gpt4all_cli.md
- The CLI user documentation
* Bump CLI version to 0.3.5
* Finalise docs & add to index.md
- Amend where necessary
- Fix typo in gpt4all_cli.md
- Mention and add link to CLI doc in index.md
* Add docstings to gpt4all-bindings/cli/app.py
* Better 'groovy' link & fix typo
- Documentation: point to the Hugging Face model card for 'groovy'
- Correct typo in app.py
- Add some notes about common Windows problems when trying to make a local build (MinGW and MSVC).
Signed-off-by: cosmic-snow <134004613+cosmic-snow@users.noreply.github.com>
* generator method
* cleanup
* bump version number for clarity
* added replace in decode to avoid unicodedecode exception
* revert back to _build_prompt
* porting over replit code model to gpt4all
* replaced memory with kv_self struct
* continuing debug
* welp it built but lot of sus things
* working model loading and somewhat working generate.. need to format response?
* revert back to semi working version
* finally got rid of weird formatting
* figured out problem is with python bindings - this is good to go for testing
* addressing PR feedback
* output refactor
* fixed prompt reponse collection
* cleanup
* addressing PR comments
* building replit backend with new ggmlver code
* chatllm replit and clean python files
* cleanup
* updated replit to match new llmodel api
* match llmodel api and change size_t to Token
* resolve PR comments
* replit model commit comment
Fix occurrences of the prompt callback being incorrectly specified, or
the response callback's prototype being incorrectly used in its place.
Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>