* Added the following features: \n 1) Now prompt_model uses the positional argument callback to return the response tokens. \n 2) Due to the callback argument of prompt_model, prompt_model_streaming only manages the queue and threading now, which reduces duplication of the code. \n 3) Added optional verbose argument to prompt_model which prints out the prompt that is passed to the model. \n 4) Chat sessions can now have a header, i.e. an instruction before the transcript of the conversation. The header is set at the creation of the chat session context. \n 5) generate function now accepts an optional callback. \n 6) When streaming and using chat session, the user doesn't need to save assistant's messages by himself. This is done automatically.
* added _empty_response_callback so I don't have to check if callback is None
* added docs
* now if the callback stop generation, the last token is ignored
* fixed type hints, reimplemented chat session header as a system prompt, minor refactoring, docs: removed section about manual update of chat session for streaming
* forgot to add some type hints!
* keep the config of the model in GPT4All class which is taken from models.json if the download is allowed
* During chat sessions, the model-specific systemPrompt and promptTemplate are applied.
* implemented the changes
* Fixed typing. Now the user can set a prompt template that will be applied even outside of a chat session. The template can also have multiple placeholders that can be filled by passing a dictionary to the generate function
* reversed some changes concerning the prompt templates and their functionality
* fixed some type hints, changed list[float] to List[Float]
* fixed type hints, changed List[Float] to List[float]
* fix typo in the comment: Pepare => Prepare
---------
Signed-off-by: 385olt <385olt@gmail.com>
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings fail fast with a ValueError when text is empty
- Advice other bindings authors to do likewise in llmodel_c.h
* Javav binding - Add check for Model file be Readable.
* add todo for java binding.
---------
Co-authored-by: Feliks Zaslavskiy <feliks.zaslavskiy@optum.com>
Co-authored-by: felix <felix@zaslavskiy.net>
* Fix CLI to work with 1.x.y version of the Python bindings (tentative)
- Adapted to bindings API changes
- Version selection based on package information
- Does not currently work with 1.x.y however, as it's not fully implemented:
"NotImplementedError: Streaming tokens in a chat session is not currently supported."
* Adapt to the completed streaming API with session support
* Bump CLI version to 1.0.2
add macos metal files
Add check for Prompt is too long.
add logging statement for gpt4all version of the binding
add version string, readme update
Add unit tests for Java code of the java bindings.
* python: do not mutate locals()
* python: fix (some) typing complaints
* python: queue sentinel need not be a str
* python: make long inference tests opt in
* Makefiles, black, isort
* Black and isort
* unit tests and generation method
* chat context provider
* context does not reset
* Current state
* Fixup
* Python bindings with unit tests
* GPT4All Python Bindings: chat contexts, tests
* New python bindings and backend fixes
* Black and Isort
* Documentation error
* preserved n_predict for backwords compat with langchain
---------
Co-authored-by: Adam Treat <treat.adam@gmail.com>
* Update gpt4all_chat.md
Cleaned up and made the sideloading part more readable, also moved Replit architecture to supported ones. (+ renamed all "ggML" to "GGML" because who calls it "ggML"??)
Signed-off-by: AMOGUS <137312610+Amogus8P@users.noreply.github.com>
* Removed the prefixing part
Signed-off-by: AMOGUS <137312610+Amogus8P@users.noreply.github.com>
* Bump version
Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com>
---------
Signed-off-by: AMOGUS <137312610+Amogus8P@users.noreply.github.com>
Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com>
Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com>
most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
* Add gpt4all-bindings/cli/README.md
* Unify version information
- Was previously split; base one on the other
- Add VERSION_INFO as the "source of truth":
- Modelled after sys.version_info.
- Implemented as a tuple, because it's much easier for (partial)
programmatic comparison.
- Previous API is kept intact.
* Add gpt4all-bindings/cli/developer_notes.md
- A few notes on what's what, especially regarding docs
* Add gpt4all-bindings/python/docs/gpt4all_cli.md
- The CLI user documentation
* Bump CLI version to 0.3.5
* Finalise docs & add to index.md
- Amend where necessary
- Fix typo in gpt4all_cli.md
- Mention and add link to CLI doc in index.md
* Add docstings to gpt4all-bindings/cli/app.py
* Better 'groovy' link & fix typo
- Documentation: point to the Hugging Face model card for 'groovy'
- Correct typo in app.py