mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
87e502c6bc
Co-authored-by: jacoblee93 <jacoblee93@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
667 lines
27 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"# Use LangChain, GPT and Deep Lake to work with a code base\n",
"In this tutorial, we use LangChain and Deep Lake with GPT to analyze the code base of LangChain itself."
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Design"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"1. Prepare data:\n",
"   1. Load all Python project files using `langchain.document_loaders.TextLoader`. We will call these files the **documents**.\n",
"   2. Split all documents into chunks using `langchain.text_splitter.CharacterTextSplitter`.\n",
"   3. Embed the chunks and upload them to Deep Lake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain.vectorstores.DeepLake`.\n",
"2. Question answering:\n",
"   1. Build a chain from `langchain.chat_models.ChatOpenAI` and `langchain.chains.ConversationalRetrievalChain`.\n",
"   2. Prepare questions.\n",
"   3. Get answers by running the chain.\n"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Implementation"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"source": [
|
|
"### Integration preparations"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"We need to set up keys for external services and install the necessary Python libraries."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#!python3 -m pip install --upgrade langchain deeplake openai"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Set up OpenAI embeddings and the Deep Lake multi-modal vector store API, then authenticate.\n",
"\n",
"For the full Deep Lake documentation, see https://docs.activeloop.ai/ and the API reference at https://docs.deeplake.ai/en/latest/"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" ········\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import os\n",
|
|
"from getpass import getpass\n",
|
|
"\n",
|
|
"os.environ[\"OPENAI_API_KEY\"] = getpass()\n",
|
|
"# Please manually enter OpenAI Key"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Authenticate with Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at [app.activeloop.ai](https://app.activeloop.ai)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" ········\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
"os.environ[\"ACTIVELOOP_TOKEN\"] = getpass(\"Activeloop Token:\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Prepare data "
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Load all repository files. Here we assume this notebook was downloaded as part of a fork of the `langchain` repo, and we work with the Python files of that repo.\n",
"\n",
"If you want to use files from a different repo, change `root_dir` to the root directory of that repo."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"1147\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain.document_loaders import TextLoader\n",
|
|
"\n",
|
|
"root_dir = \"../../../..\"\n",
|
|
"\n",
|
|
"docs = []\n",
|
|
"for dirpath, dirnames, filenames in os.walk(root_dir):\n",
"    for file in filenames:\n",
"        if file.endswith(\".py\") and \"/.venv/\" not in dirpath:\n",
"            try:\n",
"                loader = TextLoader(os.path.join(dirpath, file), encoding=\"utf-8\")\n",
"                docs.extend(loader.load_and_split())\n",
"            except Exception:\n",
"                # Skip files that cannot be read or decoded\n",
"                pass\n",
|
|
"print(f\"{len(docs)}\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Then split the loaded documents into chunks."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Created a chunk of size 1620, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1213, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1263, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1448, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1120, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1148, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1826, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1260, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1195, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2147, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1410, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1269, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1030, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1046, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1024, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1026, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1285, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1370, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1031, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1999, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1029, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1120, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1033, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1143, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1416, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2482, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1890, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1418, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1848, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1069, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2369, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1045, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1501, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1208, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1950, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1283, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1414, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1304, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1224, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1060, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2461, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1099, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1178, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1449, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1345, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 3359, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2248, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1589, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2104, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1505, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1387, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1215, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1240, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1635, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1075, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2180, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1791, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1555, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1082, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1225, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1287, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1085, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1117, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1966, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1150, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1285, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1150, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1585, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1208, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1267, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1542, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1183, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2424, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1017, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1304, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1379, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1324, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1205, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1056, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1195, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 3608, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1058, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1075, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1217, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1109, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1440, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1046, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1220, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1403, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1241, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1427, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1049, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1580, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1565, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1131, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1425, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1054, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1027, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2559, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1028, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1382, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1888, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1475, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1652, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1891, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1899, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1021, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1085, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1854, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1672, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2537, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1251, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1734, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1642, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1376, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1253, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1642, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1419, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1438, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1427, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1684, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1760, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1157, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2504, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1082, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2268, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1784, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1311, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2972, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1144, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1825, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1508, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2901, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1715, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1062, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1206, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1102, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1184, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1002, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1065, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1871, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1754, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2413, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1771, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2054, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2000, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2061, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1066, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1419, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1368, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1008, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1227, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1745, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 2296, which is longer than the specified 1000\n",
|
|
"Created a chunk of size 1083, which is longer than the specified 1000\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"3477\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain.text_splitter import CharacterTextSplitter\n",
|
|
"\n",
|
|
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
|
"texts = text_splitter.split_documents(docs)\n",
|
|
"print(f\"{len(texts)}\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Then embed the chunks and upload them to Deep Lake.\n",
"\n",
"This can take several minutes."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=8191, openai_api_key=None, openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6)"
|
|
]
|
|
},
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
|
"\n",
|
|
"embeddings = OpenAIEmbeddings()\n",
|
|
"embeddings"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.vectorstores import DeepLake\n",
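"\n",
"# Assumption: placeholder value; replace with your own Activeloop account or organization name\n",
"DEEPLAKE_ACCOUNT_NAME = \"user_name\"\n",
"\n",
"db = DeepLake.from_documents(\n",
"    texts, embeddings, dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\"\n",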
|
|
")\n",
|
|
"db"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"### Question Answering\n",
"First load the dataset and construct the retriever, then construct the conversational chain."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"-"
|
|
]
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/user_name/langchain-code\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"/"
|
|
]
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"hub://user_name/langchain-code loaded successfully.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Deep Lake Dataset in hub://user_name/langchain-code already exists, loading from the storage\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Dataset(path='hub://user_name/langchain-code', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
|
|
"\n",
"  tensor      htype       shape       dtype   compression\n",
"  -------    -------     -------     -------    -------\n",
" embedding   generic  (3477, 1536)   float32     None\n",
"    ids       text      (3477, 1)      str       None\n",
" metadata     json      (3477, 1)      str       None\n",
"   text       text      (3477, 1)      str       None\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"db = DeepLake(\n",
"    dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\",\n",
"    read_only=True,\n",
"    embedding_function=embeddings,\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"retriever = db.as_retriever()\n",
|
|
"retriever.search_kwargs[\"distance_metric\"] = \"cos\"\n",
|
|
"retriever.search_kwargs[\"fetch_k\"] = 20\n",
|
|
"retriever.search_kwargs[\"maximal_marginal_relevance\"] = True\n",
|
|
"retriever.search_kwargs[\"k\"] = 20"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"You can also filter results with user-defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 18,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def filter(x):\n",
"    # filter based on source code\n",
"    if \"something\" in x[\"text\"].data()[\"value\"]:\n",
"        return False\n",
"\n",
"    # filter based on path e.g. extension\n",
"    metadata = x[\"metadata\"].data()[\"value\"]\n",
"    return \"only_this\" in metadata[\"source\"] or \"also_that\" in metadata[\"source\"]\n",
|
|
"\n",
|
|
"\n",
|
|
"### turn on below for custom filtering\n",
|
|
"# retriever.search_kwargs['filter'] = filter"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 19,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.chat_models import ChatOpenAI\n",
|
|
"from langchain.chains import ConversationalRetrievalChain\n",
|
|
"\n",
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo\")  # or another chat model, e.g. \"gpt-4\"\n",
|
|
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"questions = [\n",
"    \"What is the class hierarchy?\",\n",
"    # \"What classes are derived from the Chain class?\",\n",
"    # \"What classes and functions in the ./langchain/utilities/ folder are not covered by unit tests?\",\n",
"    # \"What one improvement do you propose in code in relation to the class hierarchy for the Chain class?\",\n",
|
|
"]\n",
|
|
"chat_history = []\n",
|
|
"\n",
|
|
"for question in questions:\n",
"    result = qa({\"question\": question, \"chat_history\": chat_history})\n",
"    chat_history.append((question, result[\"answer\"]))\n",
"    print(f\"-> **Question**: {question} \\n\")\n",
"    print(f\"**Answer**: {result['answer']} \\n\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"source": [
|
|
"-> **Question**: What is the class hierarchy? \n",
|
|
"\n",
|
|
"**Answer**: There are several class hierarchies in the provided code, so I'll list a few:\n",
|
|
"\n",
|
|
"1. `BaseModel` -> `ConstitutionalPrinciple`: `ConstitutionalPrinciple` is a subclass of `BaseModel`.\n",
|
|
"2. `BasePromptTemplate` -> `StringPromptTemplate`, `AIMessagePromptTemplate`, `BaseChatPromptTemplate`, `ChatMessagePromptTemplate`, `ChatPromptTemplate`, `HumanMessagePromptTemplate`, `MessagesPlaceholder`, `SystemMessagePromptTemplate`, `FewShotPromptTemplate`, `FewShotPromptWithTemplates`, `Prompt`, `PromptTemplate`: All of these classes are subclasses of `BasePromptTemplate`.\n",
|
|
"3. `APIChain`, `Chain`, `MapReduceDocumentsChain`, `MapRerankDocumentsChain`, `RefineDocumentsChain`, `StuffDocumentsChain`, `HypotheticalDocumentEmbedder`, `LLMChain`, `LLMBashChain`, `LLMCheckerChain`, `LLMMathChain`, `LLMRequestsChain`, `PALChain`, `QAWithSourcesChain`, `VectorDBQAWithSourcesChain`, `VectorDBQA`, `SQLDatabaseChain`: All of these classes are subclasses of `Chain`.\n",
|
|
"4. `BaseLoader`: `BaseLoader` is a subclass of `ABC`.\n",
|
|
"5. `BaseTracer` -> `ChainRun`, `LLMRun`, `SharedTracer`, `ToolRun`, `Tracer`, `TracerException`, `TracerSession`: All of these classes are subclasses of `BaseTracer`.\n",
|
|
"6. `OpenAIEmbeddings`, `HuggingFaceEmbeddings`, `CohereEmbeddings`, `JinaEmbeddings`, `LlamaCppEmbeddings`, `HuggingFaceHubEmbeddings`, `TensorflowHubEmbeddings`, `SagemakerEndpointEmbeddings`, `HuggingFaceInstructEmbeddings`, `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, `SelfHostedHuggingFaceInstructEmbeddings`, `FakeEmbeddings`, `AlephAlphaAsymmetricSemanticEmbedding`, `AlephAlphaSymmetricSemanticEmbedding`: All of these classes are subclasses of `BaseLLM`. \n",
|
|
"\n",
|
|
"\n",
|
|
"-> **Question**: What classes are derived from the Chain class? \n",
|
|
"\n",
|
|
"**Answer**: There are multiple classes that are derived from the Chain class. Some of them are:\n",
|
|
"- APIChain\n",
|
|
"- AnalyzeDocumentChain\n",
|
|
"- ChatVectorDBChain\n",
|
|
"- CombineDocumentsChain\n",
|
|
"- ConstitutionalChain\n",
|
|
"- ConversationChain\n",
|
|
"- GraphQAChain\n",
|
|
"- HypotheticalDocumentEmbedder\n",
|
|
"- LLMChain\n",
|
|
"- LLMCheckerChain\n",
|
|
"- LLMRequestsChain\n",
|
|
"- LLMSummarizationCheckerChain\n",
|
|
"- MapReduceChain\n",
|
|
"- OpenAPIEndpointChain\n",
|
|
"- PALChain\n",
|
|
"- QAWithSourcesChain\n",
|
|
"- RetrievalQA\n",
|
|
"- RetrievalQAWithSourcesChain\n",
|
|
"- SequentialChain\n",
|
|
"- SQLDatabaseChain\n",
|
|
"- TransformChain\n",
|
|
"- VectorDBQA\n",
|
|
"- VectorDBQAWithSourcesChain\n",
|
|
"\n",
|
|
"There might be more classes that are derived from the Chain class as it is possible to create custom classes that extend the Chain class.\n",
|
|
"\n",
|
|
"\n",
"-> **Question**: What classes and functions in the ./langchain/utilities/ folder are not covered by unit tests? \n",
|
|
"\n",
|
|
"**Answer**: All classes and functions in the `./langchain/utilities/` folder seem to have unit tests written for them. \n"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
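The chunking cell in the notebook above logs many `Created a chunk of size N, which is longer than the specified 1000` warnings. One likely reason is sketched below, assuming a splitter that only cuts at a separator (here a blank line) and then greedily merges pieces up to `chunk_size`. A piece with no separator inside it cannot be cut further, so it comes out oversized. The function `split_on_separator` is a hypothetical name written for illustration. It is not LangChain's implementation.

```python
from typing import List


def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A piece that is itself longer than chunk_size is kept whole, which is
    how an oversized chunk can appear.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may exceed chunk_size on its own
    if current:
        chunks.append(current)
    return chunks


# Two short paragraphs followed by one long paragraph without blank lines.
sample = "a" * 30 + "\n\n" + "b" * 30 + "\n\n" + "c" * 150
demo_chunks = split_on_separator(sample, chunk_size=100)
oversized = [len(c) for c in demo_chunks if len(c) > 100]
print([len(c) for c in demo_chunks], oversized)
```

In this sketch the two short paragraphs are merged into one chunk, while the 150-character paragraph stays whole and exceeds the limit. Python source files often contain long stretches without blank lines, which would fit the warnings seen in the notebook.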
|
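The retriever cell in the notebook above sets `distance_metric` to `"cos"` and turns on `maximal_marginal_relevance`. The pure-Python sketch below shows the general idea behind those two settings. It is not how Deep Lake or LangChain implement them. The names `cosine`, `mmr_select`, and `lambda_mult`, as well as the toy vectors, are assumptions made for illustration. Each step picks the candidate that balances similarity to the query against similarity to the results already chosen.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = remaining[0], float("-inf")
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Penalize closeness to anything already selected.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


# Toy embeddings: candidates 0 and 1 are near-duplicates, candidate 2 differs.
query = [1.0, 0.2]
candidates = [[1.0, 0.0], [1.0, 0.1], [0.2, 1.0]]
diverse = mmr_select(query, candidates, k=2)
relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns both near-duplicates. With the balanced default, the second pick is the dissimilar candidate. This trade-off matters when many code chunks in a repository look alike.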