mirror of https://github.com/nomic-ai/gpt4all
* feat(typescript)/dynamic template (#1287) * remove packaged yarn * prompt templates update wip * prompt template update * system prompt template, update types, remove embed promises, cleanup * support both snakecased and camelcased prompt context * fix #1277 libbert, libfalcon and libreplit libs not being moved into the right folder after build * added support for modelConfigFile param, allowing the user to specify a local file instead of downloading the remote models.json. added a warning message if code fails to load a model config. included prompt context docs by amogus. * snakecase warning, put logic for loading local models.json into listModels, added constant for the default remote model list url, test improvements, simpler hasOwnProperty call * add DEFAULT_PROMPT_CONTEXT, export new constants * add md5sum testcase and fix constants export * update types * throw if attempting to list models without a source * rebuild docs * fix download logging undefined url, toFixed typo, pass config filesize in for future progress report * added overload with union types * bump to 2.2.0, remove alpha * code speling --------- Co-authored-by: Andreas Obersteiner <8959303+iimez@users.noreply.github.com>pull/1341/head
parent
4d855afe97
commit
4e55940edf
File diff suppressed because one or more lines are too long
@ -1 +0,0 @@
|
||||
yarnPath: .yarn/releases/yarn-3.6.1.cjs
|
@ -0,0 +1,38 @@
|
||||
const { normalizePromptContext, warnOnSnakeCaseKeys } = require('./util');
|
||||
|
||||
/**
 * Wraps a loaded native LLModel binding together with the model config
 * it was loaded from, and exposes text generation.
 */
class InferenceModel {
    llm;
    config;

    /**
     * @param {object} llmodel - native LLModel binding instance
     * @param {object} config - model config entry this model was created from
     */
    constructor(llmodel, config) {
        this.llm = llmodel;
        this.config = config;
    }

    /**
     * Generate a completion for a prompt.
     * @param {string} prompt - the prompt text
     * @param {object} [promptContext] - generation options; camelCased keys
     *   are normalized to snake_case before being passed to the binding
     * @returns {Promise<*>} the result of the native raw_prompt call
     */
    async generate(prompt, promptContext) {
        // Warn callers still passing snake_cased keys, then normalize.
        warnOnSnakeCaseKeys(promptContext);
        const normalizedContext = normalizePromptContext(promptContext);
        // Token callback is a no-op here; callers get the final result only.
        return this.llm.raw_prompt(prompt, normalizedContext, () => {});
    }
}
|
||||
|
||||
/**
 * Wraps a loaded native LLModel binding together with its model config
 * and exposes text embedding.
 */
class EmbeddingModel {
    llm;
    config;

    /**
     * @param {object} llmodel - native LLModel binding instance
     * @param {object} config - model config entry this model was created from
     */
    constructor(llmodel, config) {
        this.llm = llmodel;
        this.config = config;
    }

    /**
     * Embed a piece of text.
     * @param {string} text - text to embed
     * @returns {*} whatever the native embed call returns
     */
    embed(text) {
        const { llm } = this;
        return llm.embed(text);
    }
}
|
||||
|
||||
|
||||
module.exports = {
|
||||
InferenceModel,
|
||||
EmbeddingModel,
|
||||
};
|
@ -1,79 +1,228 @@
|
||||
const path = require('node:path');
|
||||
const os = require('node:os');
|
||||
const { LLModel } = require('node-gyp-build')(path.resolve(__dirname, '..'));
|
||||
const path = require("node:path");
|
||||
const os = require("node:os");
|
||||
const fsp = require("node:fs/promises");
|
||||
const { LLModel } = require("node-gyp-build")(path.resolve(__dirname, ".."));
|
||||
const {
|
||||
listModels,
|
||||
downloadModel,
|
||||
appendBinSuffixIfMissing,
|
||||
} = require('../src/util.js');
|
||||
listModels,
|
||||
downloadModel,
|
||||
appendBinSuffixIfMissing,
|
||||
normalizePromptContext,
|
||||
} = require("../src/util.js");
|
||||
const {
|
||||
DEFAULT_DIRECTORY,
|
||||
DEFAULT_LIBRARIES_DIRECTORY,
|
||||
} = require('../src/config.js');
|
||||
DEFAULT_DIRECTORY,
|
||||
DEFAULT_LIBRARIES_DIRECTORY,
|
||||
DEFAULT_MODEL_LIST_URL,
|
||||
} = require("../src/config.js");
|
||||
const {
|
||||
loadModel,
|
||||
createPrompt,
|
||||
createCompletion,
|
||||
} = require('../src/gpt4all.js');
|
||||
|
||||
|
||||
global.fetch = jest.fn(() =>
|
||||
Promise.resolve({
|
||||
json: () => Promise.resolve([{}, {}, {}]),
|
||||
})
|
||||
);
|
||||
|
||||
jest.mock('../src/util.js', () => {
|
||||
const actualModule = jest.requireActual('../src/util.js');
|
||||
return {
|
||||
...actualModule,
|
||||
downloadModel: jest.fn(() =>
|
||||
({ cancel: jest.fn(), promise: jest.fn() })
|
||||
)
|
||||
|
||||
}
|
||||
})
|
||||
|
||||
beforeEach(() => {
|
||||
downloadModel.mockClear()
|
||||
});
|
||||
loadModel,
|
||||
createPrompt,
|
||||
createCompletion,
|
||||
} = require("../src/gpt4all.js");
|
||||
const { mock } = require("node:test");
|
||||
|
||||
// Reset call counts on the global fetch stub and clear every registered
// mock between tests so assertions don't leak across cases.
afterEach(() => {
    fetch.mockClear();
    jest.clearAllMocks();
});
|
||||
|
||||
describe('utils', () => {
|
||||
test("appendBinSuffixIfMissing", () => {
|
||||
expect(appendBinSuffixIfMissing("filename")).toBe("filename.bin")
|
||||
expect(appendBinSuffixIfMissing("filename.bin")).toBe("filename.bin")
|
||||
})
|
||||
test("default paths", () => {
|
||||
expect(DEFAULT_DIRECTORY).toBe(path.resolve(os.homedir(), ".cache/gpt4all"))
|
||||
describe("config", () => {
    test("default paths constants are available and correct", () => {
        expect(DEFAULT_DIRECTORY).toBe(
            path.resolve(os.homedir(), ".cache/gpt4all")
        );
        // DEFAULT_LIBRARIES_DIRECTORY is a ';'-joined search path; the
        // expected entries, in search order:
        const paths = [
            path.join(DEFAULT_DIRECTORY, "libraries"),
            path.resolve("./libraries"),
            path.resolve(
                __dirname,
                "..",
                `runtimes/${process.platform}-${process.arch}/native`
            ),
            process.cwd(),
        ];
        expect(typeof DEFAULT_LIBRARIES_DIRECTORY).toBe("string");
        expect(DEFAULT_LIBRARIES_DIRECTORY).toBe(paths.join(";"));
    });
});
|
||||
|
||||
describe("listModels", () => {
    const fakeModels = require("./models.json");
    const fakeModel = fakeModels[0];
    const serializedModelList = JSON.stringify([fakeModel]);

    let fetchMock, realFetch;

    beforeAll(() => {
        // Stub global fetch for this whole suite; it resolves with a
        // response whose json() yields the single fake model entry.
        fetchMock = jest.fn().mockResolvedValue({
            ok: true,
            json: () => JSON.parse(serializedModelList),
        });
        realFetch = global.fetch;
        global.fetch = fetchMock;
    });

    afterEach(() => {
        // Reset call counts between tests.
        fetchMock.mockClear();
    });

    afterAll(() => {
        // Put the real fetch back.
        global.fetch = realFetch;
    });

    it("should load the model list from remote when called without args", async () => {
        const models = await listModels();
        expect(fetch).toHaveBeenCalledTimes(1);
        expect(fetch).toHaveBeenCalledWith(DEFAULT_MODEL_LIST_URL);
        expect(models[0]).toEqual(fakeModel);
    });

    it("should load the model list from a local file, if specified", async () => {
        const models = await listModels({
            file: path.resolve(__dirname, "models.json"),
        });
        expect(fetch).toHaveBeenCalledTimes(0);
        expect(models[0]).toEqual(fakeModel);
    });

    it("should throw an error if neither url nor file is specified", async () => {
        await expect(listModels(null)).rejects.toThrow(
            "No model list source specified. Please specify either a url or a file."
        );
    });
});
|
||||
|
||||
describe("appendBinSuffixIfMissing", () => {
    it("should make sure the suffix is there", () => {
        // Suffix is appended when absent and left untouched when present.
        for (const input of ["filename", "filename.bin"]) {
            expect(appendBinSuffixIfMissing(input)).toBe("filename.bin");
        }
    });
});
|
||||
|
||||
describe("downloadModel", () => {
    let mockAbortController, mockFetch;
    const fakeModelName = "fake-model";

    // Builds a fetch mock whose response body is a tiny readable stream,
    // so downloadModel has real bytes to write to disk.
    const createMockFetch = () => {
        const mockData = new Uint8Array([1, 2, 3, 4]);
        const mockResponse = new ReadableStream({
            start(controller) {
                controller.enqueue(mockData);
                controller.close();
            },
        });
        const mockFetchImplementation = jest.fn(() =>
            Promise.resolve({
                ok: true,
                body: mockResponse,
            })
        );
        return mockFetchImplementation;
    };

    beforeEach(() => {
        // Mocking the AbortController constructor
        mockAbortController = jest.fn();
        global.AbortController = mockAbortController;
        mockAbortController.mockReturnValue({
            signal: "signal",
            abort: jest.fn(),
        });
        mockFetch = createMockFetch();
        jest.spyOn(global, "fetch").mockImplementation(mockFetch);
    });

    afterEach(() => {
        // Clean up mocks
        mockAbortController.mockReset();
        mockFetch.mockClear();
        global.fetch.mockRestore();
    });

    test("should successfully download a model file", async () => {
        const downloadController = downloadModel(fakeModelName);
        const modelFilePath = await downloadController.promise;
        expect(modelFilePath).toBe(`${DEFAULT_DIRECTORY}/${fakeModelName}.bin`);

        expect(global.fetch).toHaveBeenCalledTimes(1);
        expect(global.fetch).toHaveBeenCalledWith(
            "https://gpt4all.io/models/fake-model.bin",
            {
                signal: "signal",
                headers: {
                    "Accept-Ranges": "arraybuffer",
                    "Response-Type": "arraybuffer",
                },
            }
        );

        // final model file should be present
        // (awaited: async matchers return a promise jest won't track otherwise)
        await expect(fsp.access(modelFilePath)).resolves.not.toThrow();

        // remove the testing model file
        await fsp.unlink(modelFilePath);
    });

    test("should error and cleanup if md5sum is not matching", async () => {
        const downloadController = downloadModel(fakeModelName, {
            md5sum: "wrong-md5sum",
        });
        // the promise should reject with a mismatch
        await expect(downloadController.promise).rejects.toThrow(
            `Model "${fakeModelName}" failed verification: Hashes mismatch.`
        );
        // fetch should have been called
        expect(global.fetch).toHaveBeenCalledTimes(1);
        // the file should be missing (awaited: .rejects is async)
        await expect(
            fsp.access(`${DEFAULT_DIRECTORY}/${fakeModelName}.bin`)
        ).rejects.toThrow();
        // partial file should also be missing
        await expect(
            fsp.access(`${DEFAULT_DIRECTORY}/${fakeModelName}.part`)
        ).rejects.toThrow();
    });

    // TODO
    // test("should be able to cancel and resume a download", async () => {
    // });
});
|
||||
|
||||
describe("normalizePromptContext", () => {
    it("should convert a dict with camelCased keys to snake_case", () => {
        const camelCased = {
            topK: 20,
            repeatLastN: 10,
        };

        const expectedSnakeCased = {
            top_k: 20,
            repeat_last_n: 10,
        };

        const result = normalizePromptContext(camelCased);
        expect(result).toEqual(expectedSnakeCased);
    });

    it("should convert a mixed case dict to snake_case, last value taking precedence", () => {
        const mixedCased = {
            topK: 20,
            top_k: 10,
            repeatLastN: 10,
        };

        const expectedSnakeCased = {
            top_k: 10,
            repeat_last_n: 10,
        };

        const result = normalizePromptContext(mixedCased);
        expect(result).toEqual(expectedSnakeCased);
    });

    it("should not modify already snake cased dict", () => {
        // NOTE: fixed fixture typo ("repeast_last_n" -> "repeat_last_n") so
        // this case exercises a real prompt-context key.
        const snakeCased = {
            top_k: 10,
            repeat_last_n: 10,
        };

        const result = normalizePromptContext(snakeCased);
        expect(result).toEqual(snakeCased);
    });
});
|
||||
|
@ -0,0 +1,10 @@
|
||||
[
|
||||
{
|
||||
"order": "a",
|
||||
"md5sum": "08d6c05a21512a79a1dfeb9d2a8f262f",
|
||||
"name": "Not a real model",
|
||||
"filename": "fake-model.bin",
|
||||
"filesize": "4",
|
||||
"systemPrompt": " "
|
||||
}
|
||||
]
|
File diff suppressed because it is too large
Load Diff
Loading…
Reference in New Issue