gpt4all/gpt4all-backend/llmodel.h

#ifndef LLMODEL_H
#define LLMODEL_H
#include <string>
#include <functional>
#include <vector>
#include <cstddef>  // size_t, used by the state functions below
#include <cstdint>


class LLModel {
public:
    explicit LLModel() {}
    virtual ~LLModel() {}

    static LLModel *construct(const std::string &modelPath, std::string buildVariant = "default");

    virtual bool loadModel(const std::string &modelPath) = 0;
    virtual bool isModelLoaded() const = 0;
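    // Optional state persistence. The defaults report no state, so backends
    // that cannot save or restore their state need not override them.
    virtual size_t stateSize() const { return 0; }
    virtual size_t saveState(uint8_t */*dest*/) const { return 0; }
    virtual size_t restoreState(const uint8_t */*src*/) { return 0; }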
    struct PromptContext {
        std::vector<float> logits;      // logits of current context
        std::vector<int32_t> tokens;    // current tokens in the context window
        int32_t n_past = 0;             // number of tokens in past conversation
        int32_t n_ctx = 0;              // number of tokens possible in context window
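        int32_t n_predict = 200;        // maximum number of tokens to generate
        int32_t top_k = 40;             // sample only from the k most likely tokens
        float   top_p = 0.9f;           // nucleus sampling: cumulative probability cutoff
        float   temp = 0.9f;            // sampling temperature
        int32_t n_batch = 9;            // number of prompt tokens processed per batch
        float   repeat_penalty = 1.10f; // penalty applied to recently seen tokens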
        int32_t repeat_last_n = 64;     // last n tokens to penalize
        float   contextErase = 0.75f;   // fraction of the context to erase when the
                                        // context window is exceeded
    };
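    // Feeds the prompt to the model and generates a response. Each callback
    // returns true to continue and false to stop: promptCallback receives each
    // prompt token id, responseCallback receives each generated token id and
    // its text, and recalculateCallback is notified while the context is
    // being recalculated.
    virtual void prompt(const std::string &prompt,
        std::function<bool(int32_t)> promptCallback,
        std::function<bool(int32_t, const std::string&)> responseCallback,
        std::function<bool(bool)> recalculateCallback,
        PromptContext &ctx) = 0;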
    virtual void setThreadCount(int32_t /*n_threads*/) {}
    virtual int32_t threadCount() const { return 1; }

    const char *getModelType() const {
        return modelType;
    }

protected:
    virtual void recalculateContext(PromptContext &promptCtx,
        std::function<bool(bool)> recalculate) = 0;

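    // Set by each backend; initialized so getModelType() never returns an
    // indeterminate pointer.
    const char *modelType = nullptr;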
};

#endif // LLMODEL_H
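
The callback contract of `prompt()` is easier to see with a toy backend. The block below is a minimal sketch, not code from this repository: `MiniContext`, `MiniModel`, `EchoModel`, and `runEcho` are hypothetical names, the interface is a trimmed stand-in for `LLModel` (no `recalculateCallback`, no sampling parameters), and the "model" simply echoes the prompt one byte per token. It assumes that a callback returning `false` means "stop".

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Trimmed stand-in for LLModel::PromptContext (hypothetical, illustration only).
struct MiniContext {
    std::vector<int32_t> tokens; // tokens currently in the window
    int32_t n_past = 0;          // tokens consumed so far
    int32_t n_predict = 200;     // cap on generated tokens
};

// Trimmed stand-in for the LLModel interface (hypothetical).
class MiniModel {
public:
    virtual ~MiniModel() {}
    virtual void prompt(const std::string &prompt,
        std::function<bool(int32_t)> promptCallback,
        std::function<bool(int32_t, const std::string&)> responseCallback,
        MiniContext &ctx) = 0;
};

// Toy backend: one "token" per input byte; the response echoes the prompt.
class EchoModel : public MiniModel {
public:
    void prompt(const std::string &prompt,
        std::function<bool(int32_t)> promptCallback,
        std::function<bool(int32_t, const std::string&)> responseCallback,
        MiniContext &ctx) override
    {
        // Ingest phase: report each prompt token; a false return aborts.
        for (unsigned char c : prompt) {
            ctx.tokens.push_back(c);
            ctx.n_past++;
            if (!promptCallback(c))
                return;
        }
        // Generation phase: emit at most n_predict tokens; a false return
        // from the response callback stops generation early.
        int32_t produced = 0;
        for (unsigned char c : prompt) {
            if (produced++ >= ctx.n_predict)
                break;
            ctx.tokens.push_back(c);
            ctx.n_past++;
            if (!responseCallback(c, std::string(1, static_cast<char>(c))))
                break;
        }
    }
};

// Drives the model and collects the response, asking it to stop once
// maxChars characters have been received.
std::string runEcho(const std::string &text, std::size_t maxChars) {
    EchoModel model;
    MiniContext ctx;
    std::string out;
    model.prompt(text,
        [](int32_t) { return true; },
        [&](int32_t, const std::string &piece) {
            out += piece;
            return out.size() < maxChars;
        },
        ctx);
    return out;
}
```

Calling `runEcho("hello", 3)` collects three characters before the response callback returns `false`. A UI can use the same mechanism to implement a stop button without the backend knowing anything about the UI.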
Add an abstraction around gpt-j that will allow other arch models to be loaded in ui. 2023-04-14 02:15:40 +00:00			`#ifndef LLMODEL_H`
			`#define LLMODEL_H`
			`#include <string>`
			`#include <functional>`
			`#include <vector>`
include <cstdint> in llmodel.h 2023-05-05 00:01:32 +00:00			`#include <cstdint>`
Add an abstraction around gpt-j that will allow other arch models to be loaded in ui. 2023-04-14 02:15:40 +00:00
Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. 2023-05-31 21:04:01 +00:00
Add an abstraction around gpt-j that will allow other arch models to be loaded in ui. 2023-04-14 02:15:40 +00:00			`class LLModel {`
			`public:`
			`explicit LLModel() {}`
			`virtual ~LLModel() {}`

Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. 2023-05-31 21:04:01 +00:00			`static LLModel *construct(const std::string &modelPath, std::string buildVariant = "default");`

Add llama.cpp support for loading llama based models in the gui. We now support loading both gptj derived models and llama derived models. 2023-04-15 19:57:32 +00:00			`virtual bool loadModel(const std::string &modelPath) = 0;`
Add an abstraction around gpt-j that will allow other arch models to be loaded in ui. 2023-04-14 02:15:40 +00:00			`virtual bool isModelLoaded() const = 0;`
First attempt at providing a persistent chat list experience. Limitations: 1) Context is not restored for gpt-j models 2) When you switch between different model types in an existing chat the context and all the conversation is lost 3) The settings are not chat or conversation specific 4) The sizes of the chat persisted files are very large due to how much data the llama.cpp backend tries to persist. Need to investigate how we can shrink this. 2023-05-04 19:31:41 +00:00			`virtual size_t stateSize() const { return 0; }`
Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. 2023-05-31 21:04:01 +00:00			`virtual size_t saveState(uint8_t */*dest*/) const { return 0; }`
			`virtual size_t restoreState(const uint8_t */*src*/) { return 0; }`
Add an abstraction around gpt-j that will allow other arch models to be loaded in ui. 2023-04-14 02:15:40 +00:00			`struct PromptContext {`
Implement repeat penalty for both llama and gptj in gui. 2023-04-25 12:38:29 +00:00			`std::vector<float> logits; // logits of current context`
			`std::vector<int32_t> tokens; // current tokens in the context window`
			`int32_t n_past = 0; // number of tokens in past conversation`
			`int32_t n_ctx = 0; // number of tokens possible in context window`
			`int32_t n_predict = 200;`
			`int32_t top_k = 40;`
			`float top_p = 0.9f;`
			`float temp = 0.9f;`
			`int32_t n_batch = 9;`
			`float repeat_penalty = 1.10f;`
			`int32_t repeat_last_n = 64; // last n tokens to penalize`
Infinite context window through trimming. 2023-04-25 15:20:51 +00:00			`float contextErase = 0.75f; // percent of context to erase if we exceed the context`
			`// window`
Add an abstraction around gpt-j that will allow other arch models to be loaded in ui. 2023-04-14 02:15:40 +00:00			`};`
Implement repeat penalty for both llama and gptj in gui. 2023-04-25 12:38:29 +00:00			`virtual void prompt(const std::string &prompt,`
Move the promptCallback to own function. 2023-04-27 15:08:15 +00:00			`std::function<bool(int32_t)> promptCallback,`
			`std::function<bool(int32_t, const std::string&)> responseCallback,`
			`std::function<bool(bool)> recalculateCallback,`
Implement repeat penalty for both llama and gptj in gui. 2023-04-25 12:38:29 +00:00			`PromptContext &ctx) = 0;`
Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. 2023-05-31 21:04:01 +00:00			`virtual void setThreadCount(int32_t /*n_threads*/) {}`
llmodel: constify LLModel::threadCount() 2023-05-21 20:45:29 +00:00			`virtual int32_t threadCount() const { return 1; }`
Infinite context window through trimming. 2023-04-25 15:20:51 +00:00
Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. 2023-05-31 21:04:01 +00:00			`const char *getModelType() const {`
			`return modelType;`
			`}`

Infinite context window through trimming. 2023-04-25 15:20:51 +00:00			`protected:`
			`virtual void recalculateContext(PromptContext &promptCtx,`
			`std::function<bool(bool)> recalculate) = 0;`
Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. 2023-05-31 21:04:01 +00:00
			`const char *modelType;`
Add an abstraction around gpt-j that will allow other arch models to be loaded in ui. 2023-04-14 02:15:40 +00:00			`};`

Add thread count setting 2023-04-18 13:46:03 +00:00			`#endif // LLMODEL_H`
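
The `stateSize()` / `saveState()` / `restoreState()` trio suggests a size-then-copy pattern for persisting a chat. The block below is a rough sketch of that pattern under stated assumptions: `CounterModel`, `snapshot`, and `roundTrip` are hypothetical, the model's whole state is a single integer, and the return values are assumed to mean "bytes written" and "bytes read", which the header itself does not document.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical backend whose entire state is one counter.
class CounterModel {
public:
    int32_t counter = 0;

    // Number of bytes a caller must provide to saveState().
    std::size_t stateSize() const { return sizeof(counter); }

    // Copies the state into dest; assumed to return the bytes written.
    std::size_t saveState(uint8_t *dest) const {
        std::memcpy(dest, &counter, sizeof(counter));
        return sizeof(counter);
    }

    // Loads the state from src; assumed to return the bytes read.
    std::size_t restoreState(const uint8_t *src) {
        std::memcpy(&counter, src, sizeof(counter));
        return sizeof(counter);
    }
};

// Size the buffer first, then let the model fill it.
std::vector<uint8_t> snapshot(const CounterModel &m) {
    std::vector<uint8_t> buf(m.stateSize());
    std::size_t written = m.saveState(buf.data());
    buf.resize(written); // keep only what was actually written
    return buf;
}

// Saves one model's state and restores it into a fresh instance.
int32_t roundTrip(int32_t value) {
    CounterModel a;
    a.counter = value;
    std::vector<uint8_t> buf = snapshot(a);

    CounterModel b;
    b.restoreState(buf.data());
    return b.counter;
}
```

With the defaults in the header, `stateSize()` returns 0, so a caller following this pattern gets an empty buffer for backends that do not support persistence.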