# GPT4All Node.js API

```sh
yarn add gpt4all@alpha

npm install gpt4all@alpha

pnpm install gpt4all@alpha
```

The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.

* New bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
* The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
* Everything should work out of the box.
* See [API Reference](#api-reference)

### Chat Completion (alpha)

```js
import { createCompletion, loadModel } from '../src/gpt4all.js'

// Load the model by name
const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });

// Send a system message and a user message, then wait for the completion
const response = await createCompletion(model, [
    { role : 'system', content: 'You are meant to be annoying and unhelpful.' },
    { role : 'user', content: 'What is 1 + 1?' }
]);
```

### Embedding (alpha)

```js
import { createEmbedding, loadModel } from '../src/gpt4all.js'

// Load an embedding model by name
const model = await loadModel('ggml-all-MiniLM-L6-v2-f16', { verbose: true });

// Embed a single string
const fltArray = createEmbedding(model, "Pain is inevitable, suffering optional");
```

### Build Instructions

* binding.gyp is the compile config
* Tested on Ubuntu. Everything seems to work fine
* Tested on Windows. Everything works fine.
* Sparse testing on macOS.
* MinGW works as well to build the gpt4all-backend. **HOWEVER**, this package works only with MSVC built dlls.

### Requirements

* git
* [node.js >= 18.0.0](https://nodejs.org/en)
* [yarn](https://yarnpkg.com/)
* [node-gyp](https://github.com/nodejs/node-gyp)
    * all of its requirements.
* (unix) gcc version 12
* (win) msvc version 143
    * Can be obtained with visual studio 2022 build tools
* python 3

### Build (from source)

```sh
git clone https://github.com/nomic-ai/gpt4all.git
cd gpt4all/gpt4all-bindings/typescript
```

* The below shell commands assume the current working directory is `typescript`.
* To Build and Rebuild:

```sh
yarn
```

* The llama.cpp git submodule for gpt4all may be absent. If so, run the following in the llama.cpp parent directory:

```sh
git submodule update --init --depth 1 --recursive
```

**AS OF NEW BACKEND**, to build the backend:

```sh
yarn build:backend
```

This will build platform-dependent dynamic libraries, which will be located in runtimes/(platform)/native. The only current way to use them is to put them in the current working directory of your application. That is, **WHEREVER YOU RUN YOUR NODE APPLICATION**

* llama-xxxx.dll is required.
* Depending on the model you are using, you'll need to select the proper model loader.
    * For example, if you are running a Mosaic MPT model, you will need to select the mpt-(buildvariant).(dynamiclibrary)

### Test

```sh
yarn test
```

### Source Overview

#### src/

* Extra functions to help aid devex
* Typings for the native node addon
* The javascript interface

#### test/

* Simple unit tests for some of the exported functions.
* More advanced ai testing is not handled

#### spec/

* Average look and feel of the api
* Should work assuming a model and libraries are installed locally in the working directory

#### index.cc

* The bridge between nodejs and c. Where the bindings are.

#### prompt.cc

* Handling prompting and inference of models in a threadsafe, asynchronous way.

### Known Issues

* why your model may be spewing bull 💩
    * The downloaded model is broken (just reinstall or download from official site)
* That's it so far

### Roadmap

This package is in active development, and breaking changes may happen until the api stabilizes. Here's the todo list:

* \[x] prompt models via a threadsafe function in order to have proper non blocking behavior in nodejs
* \[ ] ~~createTokenStream, an async iterator that streams each token emitted from the model.
Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)~~ May not implement unless someone else can complete
* \[x] proper unit testing (integrate with circle ci)
* \[x] publish to npm under alpha tag `gpt4all@alpha`
* \[x] have more people test on other platforms (mac tester needed)
* \[x] switch to new pluggable backend
* \[ ] NPM bundle size reduction via optionalDependencies strategy (need help)
    * Should include prebuilds to avoid painful node-gyp errors
* \[ ] createChatSession ( the python equivalent to create\_chat\_session )

### API Reference

##### Table of Contents

* [ModelType](#modeltype)
* [ModelFile](#modelfile)
    * [gptj](#gptj)
    * [llama](#llama)
    * [mpt](#mpt)
    * [replit](#replit)
* [type](#type)
* [LLModel](#llmodel)
    * [constructor](#constructor)
        * [Parameters](#parameters)
    * [type](#type-1)
    * [name](#name)
    * [stateSize](#statesize)
    * [threadCount](#threadcount)
    * [setThreadCount](#setthreadcount)
        * [Parameters](#parameters-1)
    * [raw\_prompt](#raw_prompt)
        * [Parameters](#parameters-2)
    * [embed](#embed)
        * [Parameters](#parameters-3)
    * [isModelLoaded](#ismodelloaded)
    * [setLibraryPath](#setlibrarypath)
        * [Parameters](#parameters-4)
    * [getLibraryPath](#getlibrarypath)
* [loadModel](#loadmodel)
    * [Parameters](#parameters-5)
* [createCompletion](#createcompletion)
    * [Parameters](#parameters-6)
* [createEmbedding](#createembedding)
    * [Parameters](#parameters-7)
* [CompletionOptions](#completionoptions)
    * [verbose](#verbose)
    * [systemPromptTemplate](#systemprompttemplate)
    * [promptTemplate](#prompttemplate)
    * [promptHeader](#promptheader)
    * [promptFooter](#promptfooter)
* [PromptMessage](#promptmessage)
    * [role](#role)
    * [content](#content)
* [prompt\_tokens](#prompt_tokens)
* [completion\_tokens](#completion_tokens)
* [total\_tokens](#total_tokens)
* [CompletionReturn](#completionreturn)
    * [model](#model)
    * [usage](#usage)
    * [choices](#choices)
* [CompletionChoice](#completionchoice)
    * [message](#message)
* [LLModelPromptContext](#llmodelpromptcontext)
    * [logitsSize](#logitssize)
    * [tokensSize](#tokenssize)
    * [nPast](#npast)
    * [nCtx](#nctx)
    * [nPredict](#npredict)
    * [topK](#topk)
    * [topP](#topp)
    * [temp](#temp)
    * [nBatch](#nbatch)
    * [repeatPenalty](#repeatpenalty)
    * [repeatLastN](#repeatlastn)
    * [contextErase](#contexterase)
* [createTokenStream](#createtokenstream)
    * [Parameters](#parameters-8)
* [DEFAULT\_DIRECTORY](#default_directory)
* [DEFAULT\_LIBRARIES\_DIRECTORY](#default_libraries_directory)
* [DEFAULT\_MODEL\_CONFIG](#default_model_config)
* [DEFAULT\_PROMT\_CONTEXT](#default_promt_context)
* [DEFAULT\_MODEL\_LIST\_URL](#default_model_list_url)
* [downloadModel](#downloadmodel)
    * [Parameters](#parameters-9)
    * [Examples](#examples)
* [DownloadModelOptions](#downloadmodeloptions)
    * [modelPath](#modelpath)
    * [verbose](#verbose-1)
    * [url](#url)
    * [md5sum](#md5sum)
* [DownloadController](#downloadcontroller)
    * [cancel](#cancel)
    * [promise](#promise)

#### ModelType

Type of the model

Type: (`"gptj"` | `"llama"` | `"mpt"` | `"replit"`)

#### ModelFile

Full list of models available

@deprecated These model names are outdated and this type will not be maintained, please use a string literal instead

##### gptj

List of GPT-J Models

Type: (`"ggml-gpt4all-j-v1.3-groovy.bin"` | `"ggml-gpt4all-j-v1.2-jazzy.bin"` | `"ggml-gpt4all-j-v1.1-breezy.bin"` | `"ggml-gpt4all-j.bin"`)

##### llama

List of Llama Models

Type: (`"ggml-gpt4all-l13b-snoozy.bin"` | `"ggml-vicuna-7b-1.1-q4_2.bin"` | `"ggml-vicuna-13b-1.1-q4_2.bin"` | `"ggml-wizardLM-7B.q4_2.bin"` | `"ggml-stable-vicuna-13B.q4_2.bin"` | `"ggml-nous-gpt4-vicuna-13b.bin"` | `"ggml-v3-13b-hermes-q5_1.bin"`)

##### mpt

List of MPT Models

Type: (`"ggml-mpt-7b-base.bin"` | `"ggml-mpt-7b-chat.bin"` | `"ggml-mpt-7b-instruct.bin"`)

##### replit

List of Replit Models

Type: `"ggml-replit-code-v1-3b.bin"`

#### type

Model architecture.
This argument currently does not have any functionality and is only used as a descriptive identifier for the user.

Type: [ModelType](#modeltype)

#### LLModel

LLModel class representing a language model.
This is a base class that provides common functionality for different types of language models.

##### constructor

Initialize a new LLModel.

###### Parameters

* `path` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** Absolute path to the model file.
* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model file does not exist.

##### type

Either 'gptj', 'mpt', or 'llama', or undefined

Returns **([ModelType](#modeltype) | [undefined](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined))**

##### name

The name of the model.

Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**

##### stateSize

Get the size of the internal state of the model.
NOTE: This state data is specific to the type of model you have created.

Returns **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)** the size in bytes of the internal state of the model

##### threadCount

Get the number of threads used for model inference.
The default is the number of physical cores your computer has.

Returns **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)** The number of threads used for model inference.

##### setThreadCount

Set the number of threads used for model inference.

###### Parameters

* `newNumber` **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)** The new number of threads.

Returns **void**

##### raw\_prompt

Prompt the model with a given input and optional parameters.
This is the raw output from the model.
Use the exported prompt function to get a value.

###### Parameters

* `q` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The prompt input.
* `params` **Partial<[LLModelPromptContext](#llmodelpromptcontext)>** Optional parameters for the prompt context.
* `callback` **function (res: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)): void**

Returns **void** The result of the model prompt.

##### embed

Embed text with the model. Keep in mind that
not all models can embed text (only bert can embed as of 07/16/2023 (mm/dd/yyyy)).
Use the exported prompt function to get a value.

###### Parameters

* `text` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**
* `q` The prompt input.
* `params` Optional parameters for the prompt context.

Returns **[Float32Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Float32Array)** The result of the model prompt.

##### isModelLoaded

Whether the model is loaded or not.

Returns **[boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)**

##### setLibraryPath

Where to search for the pluggable backend libraries

###### Parameters

* `s` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**

Returns **void**

##### getLibraryPath

Where to get the pluggable backend libraries

Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**

#### loadModel

Loads a machine learning model with the specified name. The de facto way to create a model.
By default this will download a model from the official GPT4All website if a model is not present at the given path.

##### Parameters

* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The name of the model to load.
* `options` **(LoadModelOptions | [undefined](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined))?** (Optional) Additional options for loading the model.

Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<(InferenceModel | EmbeddingModel)>** A promise that resolves to an instance of the loaded LLModel.

#### createCompletion

The nodejs equivalent to the python binding's chat\_completion

##### Parameters

* `model` **InferenceModel** The language model object.
* `messages` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>** The array of messages for the conversation.
* `options` **[CompletionOptions](#completionoptions)** The options for creating the completion.

Returns **[CompletionReturn](#completionreturn)** The completion result.

#### createEmbedding

The nodejs equivalent to the python binding's Embed4All().embed()

##### Parameters

* `model` **EmbeddingModel** The language model object.
* `text` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** text to embed

Returns **[Float32Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Float32Array)** The embedding of the text.

#### CompletionOptions

**Extends Partial**

The options for creating the completion.

##### verbose

Indicates if verbose logging is enabled.

Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)

##### systemPromptTemplate

Template for the system message. Will be put before the conversation with %1 being replaced by all system messages.
Note that if this is not defined, system messages will not be included in the prompt.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptTemplate

Template for user messages, with %1 being replaced by the message.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptHeader

The initial instruction for the model, on top of the prompt

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptFooter

The last instruction for the model, appended to the end of the prompt.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### PromptMessage

A message in the conversation, identical to OpenAI's chat message.

##### role

The role of the message.

Type: (`"system"` | `"assistant"` | `"user"`)

##### content

The message content.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### prompt\_tokens

The number of tokens used in the prompt.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### completion\_tokens

The number of tokens used in the completion.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### total\_tokens

The total number of tokens used.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### CompletionReturn

The result of the completion, similar to OpenAI's format.

##### model

The model used for the completion.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### usage

Token usage report.

Type: {prompt\_tokens: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number), completion\_tokens: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number), total\_tokens: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)}

##### choices

The generated completions.

Type: [Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[CompletionChoice](#completionchoice)>

#### CompletionChoice

A completion choice, similar to OpenAI's format.

##### message

Response message

Type: [PromptMessage](#promptmessage)

#### LLModelPromptContext

Model inference arguments for generating completions.

##### logitsSize

The size of the raw logits vector.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### tokensSize

The size of the raw tokens vector.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nPast

The number of tokens in the past conversation.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nCtx

The number of tokens possible in the context window.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nPredict

The number of tokens to predict.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### topK

The top-k logits to sample from.
Top-K sampling selects the next token only from the top K most likely tokens predicted by the model.
It helps reduce the risk of generating low-probability or nonsensical tokens, but it may also limit
the diversity of the output. A higher value for top-K (e.g., 100) will consider more tokens and lead
to more diverse text, while a lower value (e.g., 10) will focus on the most probable tokens and generate
more conservative text. 30 - 60 is a good range for most tasks.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### topP

The nucleus sampling probability threshold.
Top-P limits the selection of the next token to a subset of tokens with a cumulative probability
above a threshold P.
This method, also known as nucleus sampling, finds a balance between diversity
and quality by considering both token probabilities and the number of tokens available for sampling.
When using a higher value for top-P (e.g., 0.95), the generated text becomes more diverse.
On the other hand, a lower value (e.g., 0.1) produces more focused and conservative text.
The default value is 0.4, which aims to be a middle ground between focus and diversity. For
more creative tasks a higher top-p value is beneficial; about 0.5-0.9 is a good range for that.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### temp

The temperature to adjust the model's output distribution.
Temperature is like a knob that adjusts how creative or focused the output becomes. Higher temperatures
(e.g., 1.2) increase randomness, resulting in more imaginative and diverse text. Lower temperatures (e.g., 0.5)
make the output more focused, predictable, and conservative. When the temperature is set to 0, the output
becomes completely deterministic, always selecting the most probable next token and producing identical results
each time. A safe range would be around 0.6 - 0.85, but you are free to search for the value that fits best for you.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nBatch

The number of predictions to generate in parallel.
By splitting the prompt every N tokens, prompt-batch-size reduces RAM usage during processing. However,
this can increase the processing time as a trade-off. If the N value is set too low (e.g., 10), long prompts
with 500+ tokens will be most affected, requiring numerous processing runs to complete the prompt processing.
To ensure optimal performance, setting the prompt-batch-size to 2048 allows processing of all tokens in a single run.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### repeatPenalty

The penalty factor for repeated tokens.
Repeat-penalty can help penalize tokens based on how frequently they occur in the text, including the input prompt.
A token that has already appeared five times is penalized more heavily than a token that has appeared only one time.
A value of 1 means that there is no penalty and values larger than 1 discourage repeated tokens.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### repeatLastN

The number of last tokens to penalize.
The repeat-penalty-tokens N option controls the number of tokens in the history to consider for penalizing repetition.
A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only
consider recent tokens.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### contextErase

The percentage of context to erase if the context window is exceeded.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### createTokenStream

TODO: Help wanted to implement this

##### Parameters

* `llmodel` **[LLModel](#llmodel)**
* `messages` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>**
* `options` **[CompletionOptions](#completionoptions)**

Returns **function (ll: [LLModel](#llmodel)): AsyncGenerator<[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)>**

#### DEFAULT\_DIRECTORY

From python api:
models will be stored in (homedir)/.cache/gpt4all/

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### DEFAULT\_LIBRARIES\_DIRECTORY

From python api:
The default path for dynamic libraries to be stored.
You may separate paths by a semicolon to search in multiple areas.
This searches DEFAULT\_DIRECTORY/libraries, cwd/libraries, and finally cwd.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### DEFAULT\_MODEL\_CONFIG

Default model configuration.

Type: ModelConfig

#### DEFAULT\_PROMT\_CONTEXT

Default prompt context.

Type: [LLModelPromptContext](#llmodelpromptcontext)

#### DEFAULT\_MODEL\_LIST\_URL

Default model list url.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### downloadModel

Initiates the download of a model file.
By default this downloads without waiting. Use the returned controller to alter this behavior.

##### Parameters

* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The model to be downloaded.
* `options` **DownloadOptions** Options to pass into the downloader. Default is { location: (cwd), verbose: false }.

##### Examples

```javascript
// Start the download; the call returns a controller immediately
const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin')
// The controller's promise settles once the download is done
download.promise.then(() => console.log('Downloaded!'))
```

* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model already exists in the specified location.
* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model cannot be found at the specified url.

Returns **[DownloadController](#downloadcontroller)** object that allows controlling the download process.

#### DownloadModelOptions

Options for the model download process.

##### modelPath

Location to download the model to.
Default is process.cwd(), the current working directory.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### verbose

Debug mode -- check how long it took to download in seconds

Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)

##### url

Remote download url. Defaults to `https://gpt4all.io/models/`

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### md5sum

MD5 sum of the model file. If this is provided, the downloaded file will be checked against this sum.
If the sums do not match, an error will be thrown and the file will be deleted.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### DownloadController

Model download controller.

##### cancel

Cancel the request to download if this is called.

Type: function (): void

##### promise

A promise resolving to the downloaded model's config once the download is done

Type: [Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)
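
The `{ cancel, promise }` shape of the DownloadController described above can be illustrated with a minimal sketch. It does not use gpt4all itself: `createController` and `fakeDownload` are hypothetical names invented for this example, the timer stands in for a real download, and the library may well implement cancellation differently.

```js
// Hypothetical helper, not part of gpt4all: wraps a cancellable task in an
// object shaped like the DownloadController described above.
function createController(task) {
    const aborter = new AbortController();
    return {
        // Aborting the signal asks the task to stop.
        cancel: () => aborter.abort(),
        // Settles when the task finishes or is cancelled.
        promise: task(aborter.signal),
    };
}

// Stand-in for a real download (an assumption for this sketch): resolves
// with a made-up config object after `ms` milliseconds unless cancelled.
function fakeDownload(signal, ms = 50) {
    return new Promise((resolve, reject) => {
        const timer = setTimeout(() => resolve({ name: 'example-model' }), ms);
        signal.addEventListener('abort', () => {
            clearTimeout(timer);
            reject(new Error('download cancelled'));
        }, { once: true });
    });
}

const download = createController((signal) => fakeDownload(signal));
download.promise
    .then((config) => console.log('Downloaded', config.name))
    .catch((err) => console.log(err.message));
download.cancel(); // the promise rejects, so "download cancelled" is logged
```

Returning the promise alongside `cancel` lets the caller decide whether to await the result or let the work run in the background, which matches the "downloads without waiting" behavior described for downloadModel.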