{
"cells": [
{
"cell_type": "markdown",
"id": "3ea857b1",
"metadata": {},
"source": [
"# Running LLMs locally\n",
"\n",
"The popularity of projects like [PrivateGPT](https://github.com/imartinez/privateGPT), [llama.cpp](https://github.com/ggerganov/llama.cpp), and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally.\n",
"\n",
"LangChain has [integrations](https://integrations.langchain.com/) with many open source LLMs that can be run locally.\n",
"\n",
"For example, here we show how to run `GPT4All` locally (e.g., on your laptop) using local embeddings and a local LLM."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11514b36",
"metadata": {},
"outputs": [],
"source": [
"! pip install gpt4all"
]
},
{
"cell_type": "markdown",
"id": "5e7543fa",
"metadata": {},
"source": [
"Load and split an example docucment."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f8cf5765",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader\n",
"\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()\n",
"\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"all_splits = text_splitter.split_documents(data)"
]
},
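{
"cell_type": "markdown",
"id": "a1c2e3b4",
"metadata": {},
"source": [
"As an optional sanity check, you can inspect the splits before indexing them (using the `all_splits` variable from the cell above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3d4f5a6",
"metadata": {},
"outputs": [],
"source": [
"# Optional: check how many chunks we got and peek at the first one\n",
"print(len(all_splits))\n",
"print(all_splits[0].page_content[:200])"
]
},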
{
"cell_type": "markdown",
"id": "131d5059",
"metadata": {},
"source": [
"This will download the `GPT4All` embeddings locally if you don't already have them.\n",
"\n",
"For example, mine are here:\n",
" \n",
"```\n",
"Model downloaded at: /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "fdce8923",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found model file at /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n",
"llama_new_context_with_model: max tensor size = 87.89 MB\n"
]
}
],
"source": [
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import GPT4AllEmbeddings\n",
"\n",
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())"
]
},
{
"cell_type": "markdown",
"id": "29137915",
"metadata": {},
"source": [
"Test similarity search is working with our local embeddings."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b0c55e98",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What are the approaches to Task Decomposition?\"\n",
"docs = vectorstore.similarity_search(question)\n",
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "32b43339",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'})"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
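{
"cell_type": "markdown",
"id": "c5e6a7b8",
"metadata": {},
"source": [
"If you also want relevance scores alongside the retrieved documents, `Chroma` exposes `similarity_search_with_score` (a minimal sketch; with Chroma's default distance metric, lower scores mean more similar):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7f8b9c0",
"metadata": {},
"outputs": [],
"source": [
"# Optional: retrieve documents together with their distance scores\n",
"docs_and_scores = vectorstore.similarity_search_with_score(question)\n",
"for doc, score in docs_and_scores:\n",
" print(score, doc.page_content[:80])"
]
},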
{
"cell_type": "markdown",
"id": "0d9579a7",
"metadata": {},
"source": [
"[Download the GPT4All model binary](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/gpt4all).\n",
"\n",
"The Model Explorer on the [GPT4All](https://gpt4all.io/index.html) is a great way to choose and download a model.\n",
"\n",
"Then, specify the path that you downloaded to to.\n",
"\n",
"E.g., for me, the model lives here:\n",
"\n",
"`/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin`"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4a24eef1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found model file at /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"objc[47842]: Class GGMLMetalClass is implemented in both /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libreplit-mainline-metal.dylib (0x29f48c208) and /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllamamodel-mainline-metal.dylib (0x29f970208). One of the two will be used. Which one is undefined.\n",
"llama.cpp: using Metal\n",
"llama.cpp: loading model from /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n",
"llama_model_load_internal: format = ggjt v3 (latest)\n",
"llama_model_load_internal: n_vocab = 32001\n",
"llama_model_load_internal: n_ctx = 2048\n",
"llama_model_load_internal: n_embd = 5120\n",
"llama_model_load_internal: n_mult = 256\n",
"llama_model_load_internal: n_head = 40\n",
"llama_model_load_internal: n_layer = 40\n",
"llama_model_load_internal: n_rot = 128\n",
"llama_model_load_internal: ftype = 2 (mostly Q4_0)\n",
"llama_model_load_internal: n_ff = 13824\n",
"llama_model_load_internal: n_parts = 1\n",
"llama_model_load_internal: model size = 13B\n",
"llama_model_load_internal: ggml ctx size = 0.09 MB\n",
"llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)\n",
"llama_new_context_with_model: kv self size = 1600.00 MB\n",
"ggml_metal_init: allocating\n",
"ggml_metal_init: using MPS\n",
"ggml_metal_init: loading '/Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/ggml-metal.metal'\n",
"ggml_metal_init: loaded kernel_add 0x115fcbfb0\n",
"ggml_metal_init: loaded kernel_mul 0x115fcd4a0\n",
"ggml_metal_init: loaded kernel_mul_row 0x115fce850\n",
"ggml_metal_init: loaded kernel_scale 0x115fcd700\n",
"ggml_metal_init: loaded kernel_silu 0x115fcd960\n",
"ggml_metal_init: loaded kernel_relu 0x115fcfd50\n",
"ggml_metal_init: loaded kernel_gelu 0x115fd03c0\n",
"ggml_metal_init: loaded kernel_soft_max 0x115fcf640\n",
"ggml_metal_init: loaded kernel_diag_mask_inf 0x115fd07f0\n",
"ggml_metal_init: loaded kernel_get_rows_f16 0x1147b2450\n",
"ggml_metal_init: loaded kernel_get_rows_q4_0 0x11479d1d0\n",
"ggml_metal_init: loaded kernel_get_rows_q4_1 0x1147ad1f0\n",
"ggml_metal_init: loaded kernel_get_rows_q2_k 0x1147aef50\n",
"ggml_metal_init: loaded kernel_get_rows_q3_k 0x1147af1b0\n",
"ggml_metal_init: loaded kernel_get_rows_q4_k 0x1147af410\n",
"ggml_metal_init: loaded kernel_get_rows_q5_k 0x1147affa0\n",
"ggml_metal_init: loaded kernel_get_rows_q6_k 0x1147b0200\n",
"ggml_metal_init: loaded kernel_rms_norm 0x1147b0460\n",
"ggml_metal_init: loaded kernel_norm 0x1147bfc90\n",
"ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x1147c0230\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x1147c0490\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x1147c06f0\n",
"ggml_metal_init: loaded kernel_mul_mat_q2_k_f32 0x1147c0950\n",
"ggml_metal_init: loaded kernel_mul_mat_q3_k_f32 0x1147c0bb0\n",
"ggml_metal_init: loaded kernel_mul_mat_q4_k_f32 0x1147c0e10\n",
"ggml_metal_init: loaded kernel_mul_mat_q5_k_f32 0x1147c1070\n",
"ggml_metal_init: loaded kernel_mul_mat_q6_k_f32 0x1147c13d0\n",
"ggml_metal_init: loaded kernel_rope 0x1147c1a00\n",
"ggml_metal_init: loaded kernel_alibi_f32 0x1147c2120\n",
"ggml_metal_init: loaded kernel_cpy_f32_f16 0x115fd1690\n",
"ggml_metal_init: loaded kernel_cpy_f32_f32 0x115fd1c60\n",
"ggml_metal_init: loaded kernel_cpy_f16_f16 0x115fd2d40\n",
"ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB\n",
"ggml_metal_init: hasUnifiedMemory = true\n",
"ggml_metal_init: maxTransferRate = built-in GPU\n",
"ggml_metal_add_buffer: allocated 'data ' buffer, size = 6984.06 MB, ( 6984.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'eval ' buffer, size = 1024.00 MB, ( 8008.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1602.00 MB, ( 9610.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 512.00 MB, (10122.45 / 21845.34)\n",
"ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, (10634.45 / 21845.34)\n"
]
}
],
"source": [
"from langchain.llms import GPT4All\n",
"\n",
"llm = GPT4All(\n",
" model=\"/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\",\n",
" max_tokens=2048,\n",
")"
]
},
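{
"cell_type": "markdown",
"id": "e9a0c1d2",
"metadata": {},
"source": [
"As a quick (optional) smoke test, you can call the LLM directly on a prompt string before wiring it into a chain:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1b2d3e4",
"metadata": {},
"outputs": [],
"source": [
"# Optional: verify the local model loads and generates text\n",
"llm(\"What is task decomposition? Answer in one sentence.\")"
]
},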
{
"cell_type": "markdown",
"id": "d58838ae",
"metadata": {},
"source": [
"Run an `LLMChain` (see [here](https://python.langchain.com/docs/modules/chains/foundational/llm_chain)) by passing in the retrieved docs and a simple prompt.\n",
"\n",
"It formats the prompt template using the input key values provided and passes the formatted string to `GPT4All`.\n",
" \n",
"In this case, the list of retrieved documents above are pass into `{context}`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "18a3716d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\nThe main themes in this context are task decomposition and building agents with large language models (LLM) as their core controller. The document summarizes how task decomposition can be done using LLM prompting or human inputs, and the challenges faced by LLMs in long-term planning and task decomposition. Finally, it discusses how expert models execute on specific tasks and log results during instruction execution.'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import PromptTemplate, LLMChain\n",
"\n",
"# Prompt\n",
"prompt_template = \"Summarize the main themes in this context: {context}?\"\n",
"# Chain\n",
"llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_template))\n",
"# Run\n",
"result = llm_chain(docs)\n",
"# Output\n",
"result[\"text\"]"
]
},
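{
"cell_type": "markdown",
"id": "a3c4e5f6",
"metadata": {},
"source": [
"To see exactly what string the LLM receives, you can (optionally) format the prompt yourself; the retrieved documents are simply stringified into `{context}`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5d6f7a8",
"metadata": {},
"outputs": [],
"source": [
"# Optional: inspect the start of the formatted prompt sent to the model\n",
"print(llm_chain.prompt.format(context=docs)[:500])"
]
},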
{
"cell_type": "markdown",
"id": "ed9cecf8",
"metadata": {},
"source": [
"We can use a `QA chain` to handle our question above.\n",
"\n",
"`chain_type=\"stuff\"` (see [here](https://python.langchain.com/docs/modules/chains/document/stuff)) means that all the docs will be added (stuffed) into a prompt."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c01c1725",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' There are three main approaches to task decomposition: (1) using language model prompts with simple instructions like \"Steps for XYZ.\\\\n1.\", (2) using task-specific instructions, such as \"Write a story outline.\" for writing a novel, or (3) combining human inputs. However, challenges remain in long-term planning and adjusting plans when faced with unexpected errors, making LLMs less robust compared to humans who learn from trial and error during execution.'}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Prompt\n",
"template = \"\"\"Use the following pieces of context to answer the question at the end. \n",
"If you don't know the answer, just say that you don't know, don't try to make up an answer. \n",
"Use three sentences maximum and keep the answer as concise as possible. \n",
"Always say \"thanks for asking!\" at the end of the answer. \n",
"{context}\n",
"Question: {question}\n",
"Helpful Answer:\"\"\"\n",
"QA_CHAIN_PROMPT = PromptTemplate(\n",
" input_variables=[\"context\", \"question\"],\n",
" template=template,\n",
")\n",
"\n",
"# Chain\n",
"chain = load_qa_chain(llm, chain_type=\"stuff\", prompt=QA_CHAIN_PROMPT)\n",
"\n",
"# Run\n",
"chain({\"input_documents\": docs, \"question\": question}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "821729cb",
"metadata": {},
"source": [
"For an even simpler flow, use `RetrievalQA`.\n",
"\n",
"This will use a QA default prompt (shown [here](https://github.com/hwchase17/langchain/blob/275b926cf745b5668d3ea30236635e20e7866442/langchain/chains/retrieval_qa/prompt.py#L4)) and will retrieve from the vectorDB.\n",
"\n",
"But, you can still pass in a prompt, as before, if desired."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "86c7a349",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"\n",
"qa_chain = RetrievalQA.from_chain_type(\n",
" llm,\n",
" retriever=vectorstore.as_retriever(),\n",
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "112ca227",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What are the approaches to Task Decomposition?',\n",
" 'result': ' There are three main approaches to task decomposition: (1) using language model prompts with simple instructions like \"Steps for XYZ.\\\\n1.\", (2) using task-specific instructions, such as \"Write a story outline.\" for writing a novel, or (3) combining human inputs. However, challenges remain in long-term planning and adjusting plans when faced with unexpected errors, making LLMs less robust compared to humans who learn from trial and error during execution.'}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qa_chain({\"query\": question})"
]
}
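,
{
"cell_type": "markdown",
"id": "c7e8a9b0",
"metadata": {},
"source": [
"As an optional extension, `RetrievalQA.from_chain_type` accepts `return_source_documents=True`, which returns the retrieved documents alongside the answer so you can check what the model was grounded on:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9f0b1c2",
"metadata": {},
"outputs": [],
"source": [
"# Optional: return the retrieved source documents with the answer\n",
"qa_chain_with_sources = RetrievalQA.from_chain_type(\n",
" llm,\n",
" retriever=vectorstore.as_retriever(),\n",
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n",
" return_source_documents=True,\n",
")\n",
"result = qa_chain_with_sources({\"query\": question})\n",
"len(result[\"source_documents\"])"
]
}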
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}