Add GPT4All embeddings (#7743)

Support for [GPT4All
embeddings](https://docs.gpt4all.io/gpt4all_python_embedding.html)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>

@@ -0,0 +1,117 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d63d56c2",
"metadata": {},
"source": [
"# GPT4All\n",
"\n",
"This notebook explains how to use [GPT4All embeddings](https://docs.gpt4all.io/gpt4all_python_embedding.html#gpt4all.gpt4all.Embed4All) with LangChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cdd68231",
"metadata": {},
"outputs": [],
"source": [
"! pip install gpt4all"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "08f267d6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import GPT4AllEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0120e939",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████| 45.5M/45.5M [00:02<00:00, 18.5MiB/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model downloaded at: /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"objc[45711]: Class GGMLMetalClass is implemented in both /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libreplit-mainline-metal.dylib (0x29fe18208) and /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllamamodel-mainline-metal.dylib (0x2a0244208). One of the two will be used. Which one is undefined.\n"
]
}
],
"source": [
"gpt4all_embd = GPT4AllEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "53134a38",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a55adf9f",
"metadata": {},
"outputs": [],
"source": [
"query_result = gpt4all_embd.embed_query(text)"
]
},
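{
"cell_type": "markdown",
"id": "b7e2f0a9",
"metadata": {},
"source": [
"As a quick sanity check, inspect the dimensionality of the query embedding. The default `Embed4All` model is based on `all-MiniLM-L6-v2`, which should produce 384-dimensional vectors:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f7b9c0a",
"metadata": {},
"outputs": [],
"source": [
"len(query_result)"
]
},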
{
"cell_type": "code",
"execution_count": 5,
"id": "6ebd42d7",
"metadata": {},
"outputs": [],
"source": [
"doc_result = gpt4all_embd.embed_documents([text])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@@ -0,0 +1,405 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "3ea857b1",
"metadata": {},
"source": [
"# Running LLMs locally\n",
"\n",
"The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally.\n",
"\n",
"LangChain has integrations with many open source LLMs that can be run locally.\n",
"\n",
"For example, here we show how to run GPT4All locally using both gpt4all embeddings and model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11514b36",
"metadata": {},
"outputs": [],
"source": [
"! pip install gpt4all"
]
},
{
"cell_type": "markdown",
"id": "5e7543fa",
"metadata": {},
"source": [
"Load and split an example docucment."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f8cf5765",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader\n",
"\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()\n",
"\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"all_splits = text_splitter.split_documents(data)"
]
},
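{
"cell_type": "markdown",
"id": "a1d2c3b4",
"metadata": {},
"source": [
"Optionally, confirm how many chunks the splitter produced (each chunk is at most 500 characters, given the settings above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b3c2d1a",
"metadata": {},
"outputs": [],
"source": [
"len(all_splits)"
]
},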
{
"cell_type": "markdown",
"id": "131d5059",
"metadata": {},
"source": [
"This will download the `GPT4All` embeddings locally if you don't already have them.\n",
"\n",
"For example, mine are here:\n",
" \n",
"```\n",
"Model downloaded at: /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "fdce8923",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found model file at /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n",
"llama_new_context_with_model: max tensor size = 87.89 MB\n"
]
}
],
"source": [
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import GPT4AllEmbeddings\n",
"\n",
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())"
]
},
{
"cell_type": "markdown",
"id": "29137915",
"metadata": {},
"source": [
"Test similarity search is working with our local embeddings."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b0c55e98",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What are the approaches to Task Decomposition?\"\n",
"docs = vectorstore.similarity_search(question)\n",
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "32b43339",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'})"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "0d9579a7",
"metadata": {},
"source": [
"[Download the GPT4All model binary]((https://python.langchain.com/docs/modules/model_io/models/llms/integrations/gpt4all)).\n",
"\n",
"Then, specify the path."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4a24eef1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found model file at /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"objc[47842]: Class GGMLMetalClass is implemented in both /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libreplit-mainline-metal.dylib (0x29f48c208) and /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllamamodel-mainline-metal.dylib (0x29f970208). One of the two will be used. Which one is undefined.\n",
"llama.cpp: using Metal\n",
"llama.cpp: loading model from /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n",
"llama_model_load_internal: format = ggjt v3 (latest)\n",
"llama_model_load_internal: n_vocab = 32001\n",
"llama_model_load_internal: n_ctx = 2048\n",
"llama_model_load_internal: n_embd = 5120\n",
"llama_model_load_internal: n_mult = 256\n",
"llama_model_load_internal: n_head = 40\n",
"llama_model_load_internal: n_layer = 40\n",
"llama_model_load_internal: n_rot = 128\n",
"llama_model_load_internal: ftype = 2 (mostly Q4_0)\n",
"llama_model_load_internal: n_ff = 13824\n",
"llama_model_load_internal: n_parts = 1\n",
"llama_model_load_internal: model size = 13B\n",
"llama_model_load_internal: ggml ctx size = 0.09 MB\n",
"llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)\n",
"llama_new_context_with_model: kv self size = 1600.00 MB\n",
"ggml_metal_init: allocating\n",
"ggml_metal_init: using MPS\n",
"ggml_metal_init: loading '/Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/ggml-metal.metal'\n",
"ggml_metal_init: loaded kernel_add 0x115fcbfb0\n",
"ggml_metal_init: loaded kernel_mul 0x115fcd4a0\n",
"ggml_metal_init: loaded kernel_mul_row 0x115fce850\n",
"ggml_metal_init: loaded kernel_scale 0x115fcd700\n",
"ggml_metal_init: loaded kernel_silu 0x115fcd960\n",
"ggml_metal_init: loaded kernel_relu 0x115fcfd50\n",
"ggml_metal_init: loaded kernel_gelu 0x115fd03c0\n",
"ggml_metal_init: loaded kernel_soft_max 0x115fcf640\n",
"ggml_metal_init: loaded kernel_diag_mask_inf 0x115fd07f0\n",
"ggml_metal_init: loaded kernel_get_rows_f16 0x1147b2450\n",
"ggml_metal_init: loaded kernel_get_rows_q4_0 0x11479d1d0\n",
"ggml_metal_init: loaded kernel_get_rows_q4_1 0x1147ad1f0\n",
"ggml_metal_init: loaded kernel_get_rows_q2_k 0x1147aef50\n",
"ggml_metal_init: loaded kernel_get_rows_q3_k 0x1147af1b0\n",
"ggml_metal_init: loaded kernel_get_rows_q4_k 0x1147af410\n",
"ggml_metal_init: loaded kernel_get_rows_q5_k 0x1147affa0\n",
"ggml_metal_init: loaded kernel_get_rows_q6_k 0x1147b0200\n",
"ggml_metal_init: loaded kernel_rms_norm 0x1147b0460\n",
"ggml_metal_init: loaded kernel_norm 0x1147bfc90\n",
"ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x1147c0230\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x1147c0490\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x1147c06f0\n",
"ggml_metal_init: loaded kernel_mul_mat_q2_k_f32 0x1147c0950\n",
"ggml_metal_init: loaded kernel_mul_mat_q3_k_f32 0x1147c0bb0\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_k_f32 0x1147c0e10\n",
"ggml_metal_init: loaded kernel_mul_mat_q5_k_f32 0x1147c1070\n",
"ggml_metal_init: loaded kernel_mul_mat_q6_k_f32 0x1147c13d0\n",
"ggml_metal_init: loaded kernel_rope 0x1147c1a00\n",
"ggml_metal_init: loaded kernel_alibi_f32 0x1147c2120\n",
"ggml_metal_init: loaded kernel_cpy_f32_f16 0x115fd1690\n",
"ggml_metal_init: loaded kernel_cpy_f32_f32 0x115fd1c60\n",
"ggml_metal_init: loaded kernel_cpy_f16_f16 0x115fd2d40\n",
"ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB\n",
"ggml_metal_init: hasUnifiedMemory = true\n",
"ggml_metal_init: maxTransferRate = built-in GPU\n",
"ggml_metal_add_buffer: allocated 'data ' buffer, size = 6984.06 MB, ( 6984.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'eval ' buffer, size = 1024.00 MB, ( 8008.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1602.00 MB, ( 9610.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 512.00 MB, (10122.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, (10634.45 / 21845.34)\n"
]
}
],
"source": [
"from langchain.llms import GPT4All\n",
"\n",
"llm = GPT4All(\n",
" model=\"/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\",\n",
" max_tokens=2048,\n",
")"
]
},
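{
"cell_type": "markdown",
"id": "f2a7c9b3",
"metadata": {},
"source": [
"Optionally, sanity-check the model with a direct call before wiring it into a chain (a minimal sketch; generation can be slow on CPU, so output is omitted here):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5b1d8c2",
"metadata": {},
"outputs": [],
"source": [
"llm(\"What is the capital of France?\")"
]
},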
{
"cell_type": "markdown",
"id": "d58838ae",
"metadata": {},
"source": [
"Run an `LLMChain` (see [here](https://python.langchain.com/docs/modules/chains/foundational/llm_chain)) by passing in the retrieved docs and a simple prompt.\n",
"\n",
"It formats the prompt template using the input key values provided and passes the formatted string to `GPT4All`.\n",
" \n",
"In this case, the list of retrieved documents above are pass into `{context}`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "18a3716d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\nThe main themes in this context are task decomposition and building agents with large language models (LLM) as their core controller. The document summarizes how task decomposition can be done using LLM prompting or human inputs, and the challenges faced by LLMs in long-term planning and task decomposition. Finally, it discusses how expert models execute on specific tasks and log results during instruction execution.'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import PromptTemplate, LLMChain\n",
"\n",
"# Prompt\n",
"prompt_template = \"Summarize the main themes in this context: {context}?\"\n",
"# Chain\n",
"llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_template))\n",
"# Run\n",
"result = llm_chain(docs)\n",
"# Output\n",
"result[\"text\"]"
]
},
{
"cell_type": "markdown",
"id": "ed9cecf8",
"metadata": {},
"source": [
"We can use a `QA chain` to handle our question above.\n",
"\n",
"`chain_type=\"stuff\"` (see [here](https://python.langchain.com/docs/modules/chains/document/stuff)) means that all the docs will be added (stuffed) into a prompt."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c01c1725",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' There are three main approaches to task decomposition: (1) using language model prompts with simple instructions like \"Steps for XYZ.\\\\n1.\", (2) using task-specific instructions, such as \"Write a story outline.\" for writing a novel, or (3) combining human inputs. However, challenges remain in long-term planning and adjusting plans when faced with unexpected errors, making LLMs less robust compared to humans who learn from trial and error during execution.'}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Prompt\n",
"template = \"\"\"Use the following pieces of context to answer the question at the end. \n",
"If you don't know the answer, just say that you don't know, don't try to make up an answer. \n",
"Use three sentences maximum and keep the answer as concise as possible. \n",
"Always say \"thanks for asking!\" at the end of the answer. \n",
"{context}\n",
"Question: {question}\n",
"Helpful Answer:\"\"\"\n",
"QA_CHAIN_PROMPT = PromptTemplate(\n",
" input_variables=[\"context\", \"question\"],\n",
" template=template,\n",
")\n",
"\n",
"# Chain\n",
"chain = load_qa_chain(llm, chain_type=\"stuff\", prompt=QA_CHAIN_PROMPT)\n",
"\n",
"# Run\n",
"chain({\"input_documents\": docs, \"question\": question}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "821729cb",
"metadata": {},
"source": [
"For an even simpler flow, use `RetrievalQA`.\n",
"\n",
"This will use a QA default prompt (shown [here](https://github.com/hwchase17/langchain/blob/275b926cf745b5668d3ea30236635e20e7866442/langchain/chains/retrieval_qa/prompt.py#L4)) and will retrieve from the vectorDB.\n",
"\n",
"But, you can still pass in a prompt, as before, if desired."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "86c7a349",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"\n",
"qa_chain = RetrievalQA.from_chain_type(\n",
" llm,\n",
" retriever=vectorstore.as_retriever(),\n",
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "112ca227",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What are the approaches to Task Decomposition?',\n",
" 'result': ' There are three main approaches to task decomposition: (1) using language model prompts with simple instructions like \"Steps for XYZ.\\\\n1.\", (2) using task-specific instructions, such as \"Write a story outline.\" for writing a novel, or (3) combining human inputs. However, challenges remain in long-term planning and adjusting plans when faced with unexpected errors, making LLMs less robust compared to humans who learn from trial and error during execution.'}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qa_chain({\"query\": question})"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@@ -15,6 +15,7 @@ from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings
 from langchain.embeddings.embaas import EmbaasEmbeddings
 from langchain.embeddings.fake import FakeEmbeddings
 from langchain.embeddings.google_palm import GooglePalmEmbeddings
+from langchain.embeddings.gpt4all import GPT4AllEmbeddings
 from langchain.embeddings.huggingface import (
     HuggingFaceEmbeddings,
     HuggingFaceInstructEmbeddings,
@@ -70,6 +71,7 @@ __all__ = [
     "EmbaasEmbeddings",
     "OctoAIEmbeddings",
     "SpacyEmbeddings",
+    "GPT4AllEmbeddings",
 ]

@@ -0,0 +1,62 @@
"""Wrapper around GPT4All embedding models."""
from typing import Any, Dict, List
from pydantic import BaseModel, root_validator
from langchain.embeddings.base import Embeddings
class GPT4AllEmbeddings(BaseModel, Embeddings):
"""Wrapper around GPT4All embedding models.
To use, you should have the gpt4all python package installed
Example:
.. code-block:: python
from langchain.embeddings import GPT4AllEmbeddings
embeddings = GPT4AllEmbeddings()
"""
client: Any #: :meta private:
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that GPT4All library is installed."""
try:
from gpt4all import Embed4All
values["client"] = Embed4All()
except ImportError:
raise ModuleNotFoundError(
"Could not import gpt4all library. "
"Please install the gpt4all library to "
"use this embedding model: pip install gpt4all"
)
return values
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed a list of documents using GPT4All.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
embeddings = [self.client.embed(text) for text in texts]
return [list(map(float, e)) for e in embeddings]
def embed_query(self, text: str) -> List[float]:
"""Embed a query using GPT4All.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
return self.embed_documents([text])[0]
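A minimal usage sketch of the new wrapper (assuming the `gpt4all` package is installed; instantiation downloads the default `Embed4All` model on first use):

```python
from langchain.embeddings import GPT4AllEmbeddings

# Instantiation triggers the Embed4All model download on first use.
embeddings = GPT4AllEmbeddings()

# embed_query returns one vector of floats for a single string.
query_vec = embeddings.embed_query("This is a test document.")

# embed_documents returns one vector per input text.
doc_vecs = embeddings.embed_documents(["first doc", "second doc"])

print(len(query_vec), len(doc_vecs))  # embedding dimension, number of docs
```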