* feat: local inference server
* fix: source to use bash + vars
* chore: isort and black
* fix: make file + inference mode
* chore: logging
* refactor: remove old links
* fix: add new env vars
* feat: hf inference server
* refactor: remove old links
* test: batch and single response
* chore: black + isort
* separate gpu and cpu dockerfiles
* moved gpu to separate dockerfile
* Fixed test endpoints
* Edits to API. Server won't start due to a failed instantiation error
* Method signature
* fix: gpu_infer
* tests: fix tests
---------
Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com>
* Added the following features:
1) prompt_model now takes a positional callback argument through which it returns the response tokens.
2) Because prompt_model takes a callback, prompt_model_streaming now only manages the queue and threading, which reduces code duplication.
3) Added an optional verbose argument to prompt_model that prints the prompt passed to the model.
4) Chat sessions can now have a header, i.e. an instruction placed before the conversation transcript. The header is set when the chat session context is created.
5) The generate function now accepts an optional callback.
6) When streaming within a chat session, the user no longer needs to save the assistant's messages manually; this is done automatically.
* added _empty_response_callback so I don't have to check if callback is None
* added docs
* now, if the callback stops generation, the last token is ignored
* fixed type hints, reimplemented chat session header as a system prompt, minor refactoring, docs: removed section about manual update of chat session for streaming
* forgot to add some type hints!
* keep the model config in the GPT4All class; it is taken from models.json if downloading is allowed
* During chat sessions, the model-specific systemPrompt and promptTemplate are applied.
* implemented the changes
* Fixed typing. Now the user can set a prompt template that will be applied even outside of a chat session. The template can also have multiple placeholders that can be filled by passing a dictionary to the generate function
* reversed some changes concerning the prompt templates and their functionality
* fixed some type hints, changed list[float] to List[Float]
* fixed type hints, changed List[Float] to List[float]
* fix typo in the comment: Pepare => Prepare
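The callback-driven design described above (prompt_model reports tokens through a callback, prompt_model_streaming only owns the queue and thread, _empty_response_callback avoids None checks, and a stopping callback drops the last token) can be sketched roughly as follows. This is a minimal, hypothetical illustration: the callback signature `(token_id, token) -> bool`, the `ResponseCallback` alias, and the word-splitting stand-in for the model are all assumptions, not the bindings' actual code.

```python
import queue
import threading
from typing import Callable, Iterator, Optional

# Hypothetical callback signature: (token_id, token_text) -> keep generating?
ResponseCallback = Callable[[int, str], bool]


def _empty_response_callback(token_id: int, response: str) -> bool:
    """Default callback that never stops generation, so callers never pass None."""
    return True


def prompt_model(prompt: str, callback: ResponseCallback = _empty_response_callback) -> None:
    """Hypothetical stand-in for the native model: emits the prompt's words as tokens."""
    for token_id, token in enumerate(prompt.split()):
        if not callback(token_id, token):
            return  # the callback asked to stop generation


def prompt_model_streaming(
    prompt: str, callback: ResponseCallback = _empty_response_callback
) -> Iterator[str]:
    """Streams tokens; this wrapper only manages the queue and the worker thread."""
    output: "queue.Queue[Optional[str]]" = queue.Queue()

    def _forwarding_callback(token_id: int, token: str) -> bool:
        keep_going = callback(token_id, token)
        if keep_going:
            output.put(token)  # the token that triggers a stop is not emitted
        return keep_going

    def _worker() -> None:
        try:
            prompt_model(prompt, _forwarding_callback)
        finally:
            output.put(None)  # sentinel: generation has finished

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    while True:
        token = output.get()
        if token is None:
            break
        yield token
    thread.join()
```

For example, `list(prompt_model_streaming("hello world from a model", lambda i, t: i < 2))` gives `['hello', 'world']`: the callback returns False on the third token, so that token is dropped rather than streamed.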
---------
Signed-off-by: 385olt <385olt@gmail.com>
TopP 0.1 was found to be somewhat too aggressive, so a more moderate default of 0.4 would be better suited for general use.
Signed-off-by: AMOGUS <137312610+Amogus8P@users.noreply.github.com>
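To make the TopP change concrete, here is a minimal sketch of the candidate filter used by nucleus (top-p) sampling under its usual definition: keep the smallest set of highest-probability tokens whose cumulative probability reaches top_p. The function name `top_p_candidates` and the toy distribution are illustrative assumptions, not the backend's implementation.

```python
from typing import Dict, List


def top_p_candidates(probs: Dict[str, float], top_p: float) -> List[str]:
    """Smallest set of most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: List[str] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break  # enough probability mass collected
    return kept


# Toy next-token distribution (made up for illustration).
toy = {"the": 0.35, "a": 0.25, "this": 0.2, "that": 0.2}
print(top_p_candidates(toy, 0.1))  # ['the']
print(top_p_candidates(toy, 0.4))  # ['the', 'a']
```

With this toy distribution, TopP 0.1 leaves only the single most likely token, so sampling is effectively greedy, while 0.4 keeps a second candidate.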
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings, fail fast with a ValueError when text is empty
- Advise other bindings' authors to do likewise in llmodel_c.h
* Java binding - Add check that the model file is readable.
* add todo for java binding.
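The fail-fast rule for empty embedding input could look roughly like the sketch below. The wrapper name `embed` and the injected `backend` callable are hypothetical placeholders, not the Python bindings' actual API; the point is that the ValueError is raised before any call reaches the native layer.

```python
from typing import Callable, List


def embed(text: str, backend: Callable[[str], List[float]]) -> List[float]:
    """Hypothetical wrapper: reject empty text before calling the native backend."""
    if not text:
        raise ValueError("text must be a non-empty string")
    return backend(text)


# Usage with a dummy backend that returns a fixed-size vector.
print(embed("hello", lambda t: [float(len(t))]))  # [5.0]
```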
---------
Co-authored-by: Feliks Zaslavskiy <feliks.zaslavskiy@optum.com>
Co-authored-by: felix <felix@zaslavskiy.net>