mirror of
https://github.com/hwchase17/langchain
synced 2024-11-18 09:25:54 +00:00
update ragatouille integration (#15658)
parent 98c6c9603e
commit 38ae4df3a1
@@ -8,7 +8,17 @@
 "# RAGatouille\n",
 "\n",
 "\n",
-"This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. RAGatouille makes it as simple as can be to use ColBERT! [ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds. Since the Ragatouille wrapper is so ergonomic we will only write a simple wrapper class so it conforms to the LangChain api."
+"This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. RAGatouille makes it as simple as can be to use ColBERT! [ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.\n",
+"\n",
+"We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vectorstore as part of a larger chain.\n",
+"\n",
+"## Setup\n",
+"\n",
+"The integration lives in the `ragatouille` package.\n",
+"\n",
+"```bash\n",
+"pip install -U ragatouille\n",
+"```"
 ]
 },
 {
@@ -16,14 +26,14 @@
 "id": "0b16a1cf",
 "metadata": {},
 "source": [
-"## Load RAGatouille\n",
+"## Usage\n",
 "\n",
 "This example is taken from their documentation"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 2,
 "id": "00de63d0",
 "metadata": {},
 "outputs": [],
@@ -35,7 +45,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 3,
 "id": "9653b742",
 "metadata": {},
 "outputs": [],
@@ -75,7 +85,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 4,
 "id": "da2a13e2",
 "metadata": {},
 "outputs": [],
@@ -85,7 +95,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 5,
 "id": "6a582959",
 "metadata": {},
 "outputs": [
@@ -95,11 +105,11 @@
 "text": [
 "\n",
 "\n",
-"[Jan 04, 13:19:13] #> Creating directory .ragatouille/colbert/indexes/Miyazaki \n",
+"[Jan 07, 10:38:18] #> Creating directory .ragatouille/colbert/indexes/Miyazaki-123 \n",
 "\n",
 "\n",
 "#> Starting...\n",
-"[Jan 04, 13:19:17] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n"
+"[Jan 07, 10:38:23] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n"
 ]
 },
 {
@@ -117,51 +127,51 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[Jan 04, 13:19:18] [0] \t\t #> Encoding 81 passages..\n"
+"[Jan 07, 10:38:24] [0] \t\t #> Encoding 81 passages..\n"
 ]
 },
 {
 "name": "stderr",
 "output_type": "stream",
 "text": [
-" 50%|█████ | 1/2 [00:03<00:03, 3.00s/it]/Users/harrisonchase/.pyenv/versions/3.10.1/envs/langchain/lib/python3.10/site-packages/torch/amp/autocast_mode.py:250: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling\n",
+" 50%|█████ | 1/2 [00:02<00:02, 2.85s/it]/Users/harrisonchase/.pyenv/versions/3.10.1/envs/langchain/lib/python3.10/site-packages/torch/amp/autocast_mode.py:250: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling\n",
 " warnings.warn(\n",
-"100%|██████████| 2/2 [00:03<00:00, 1.82s/it]\n",
-"WARNING clustering 10000 points to 1024 centroids: please provide at least 39936 training points\n"
+"100%|██████████| 2/2 [00:03<00:00, 1.74s/it]\n",
+"WARNING clustering 10001 points to 1024 centroids: please provide at least 39936 training points\n"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[Jan 04, 13:19:21] [0] \t\t avg_doclen_est = 129.95062255859375 \t len(local_sample) = 81\n",
-"[Jan 04, 13:19:21] [0] \t\t Creating 1,024 partitions.\n",
-"[Jan 04, 13:19:21] [0] \t\t *Estimated* 10,526 embeddings.\n",
-"[Jan 04, 13:19:21] [0] \t\t #> Saving the indexing plan to .ragatouille/colbert/indexes/Miyazaki/plan.json ..\n",
-"Clustering 10000 points in 128D to 1024 clusters, redo 1 times, 20 iterations\n",
+"[Jan 07, 10:38:27] [0] \t\t avg_doclen_est = 129.9629669189453 \t len(local_sample) = 81\n",
+"[Jan 07, 10:38:27] [0] \t\t Creating 1,024 partitions.\n",
+"[Jan 07, 10:38:27] [0] \t\t *Estimated* 10,527 embeddings.\n",
+"[Jan 07, 10:38:27] [0] \t\t #> Saving the indexing plan to .ragatouille/colbert/indexes/Miyazaki-123/plan.json ..\n",
+"Clustering 10001 points in 128D to 1024 clusters, redo 1 times, 20 iterations\n",
 " Preprocessing in 0.00 s\n",
-" Iteration 0 (0.02 s, search 0.02 s): objective=3678.67 imbalance=1.583 nsplit=0 \r",
-" Iteration 1 (0.02 s, search 0.02 s): objective=2381.02 imbalance=1.462 nsplit=0 \r",
-" Iteration 2 (0.03 s, search 0.03 s): objective=2240.91 imbalance=1.453 nsplit=0 \r",
-" Iteration 3 (0.04 s, search 0.04 s): objective=2177.91 imbalance=1.459 nsplit=0 \r",
-" Iteration 4 (0.05 s, search 0.05 s): objective=2145.76 imbalance=1.465 nsplit=0 \r",
-" Iteration 5 (0.06 s, search 0.05 s): objective=2127.39 imbalance=1.465 nsplit=0 \r",
-" Iteration 6 (0.06 s, search 0.06 s): objective=2115.28 imbalance=1.467 nsplit=0 \r",
-" Iteration 7 (0.07 s, search 0.07 s): objective=2107.16 imbalance=1.469 nsplit=0 \r",
-" Iteration 8 (0.08 s, search 0.08 s): objective=2102.22 imbalance=1.468 nsplit=0 \r",
-" Iteration 9 (0.09 s, search 0.08 s): objective=2099.1 imbalance=1.467 nsplit=0 \r",
-" Iteration 10 (0.09 s, search 0.09 s): objective=2097.68 imbalance=1.466 nsplit=0 \r",
-" Iteration 11 (0.10 s, search 0.10 s): objective=2096.89 imbalance=1.465 nsplit=0 \r",
-" Iteration 12 (0.11 s, search 0.11 s): objective=2096.21 imbalance=1.464 nsplit=0 \r",
-" Iteration 13 (0.12 s, search 0.11 s): objective=2095.73 imbalance=1.464 nsplit=0 \r",
-" Iteration 14 (0.13 s, search 0.12 s): objective=2095.42 imbalance=1.465 nsplit=0 \r",
-" Iteration 15 (0.13 s, search 0.13 s): objective=2095.38 imbalance=1.465 nsplit=0 \r",
-" Iteration 16 (0.14 s, search 0.14 s): objective=2095.36 imbalance=1.465 nsplit=0 \r",
-" Iteration 17 (0.15 s, search 0.15 s): objective=2095.36 imbalance=1.465 nsplit=0 \r",
-" Iteration 18 (0.16 s, search 0.15 s): objective=2095.36 imbalance=1.465 nsplit=0 \r",
-" Iteration 19 (0.17 s, search 0.16 s): objective=2095.36 imbalance=1.465 nsplit=0 \r",
-"[0.037, 0.038, 0.039, 0.033, 0.033, 0.04, 0.033, 0.034, 0.032, 0.033, 0.033, 0.036, 0.033, 0.038, 0.037, 0.038, 0.035, 0.033, 0.034, 0.036, 0.037, 0.035, 0.032, 0.036, 0.037, 0.032, 0.037, 0.034, 0.036, 0.036, 0.034, 0.035, 0.039, 0.032, 0.035, 0.033, 0.035, 0.034, 0.034, 0.04, 0.034, 0.037, 0.033, 0.032, 0.035, 0.032, 0.036, 0.036, 0.036, 0.034, 0.034, 0.034, 0.033, 0.037, 0.035, 0.036, 0.039, 0.038, 0.043, 0.032, 0.033, 0.035, 0.035, 0.034, 0.038, 0.037, 0.034, 0.037, 0.033, 0.033, 0.034, 0.034, 0.034, 0.034, 0.036, 0.035, 0.033, 0.037, 0.036, 0.035, 0.035, 0.039, 0.033, 0.039, 0.033, 0.035, 0.037, 0.036, 0.033, 0.042, 0.036, 0.039, 0.037, 0.038, 0.036, 0.035, 0.04, 0.033, 0.036, 0.036, 0.037, 0.04, 0.035, 0.036, 0.036, 0.034, 0.035, 0.033, 0.036, 0.033, 0.035, 0.036, 0.037, 0.028, 0.034, 0.035, 0.036, 0.034, 0.037, 0.038, 0.033, 0.034, 0.033, 0.036, 0.034, 0.036, 0.035, 0.035]\n",
-"[Jan 04, 13:19:22] [0] \t\t #> Encoding 81 passages..\n"
+" Iteration 0 (0.02 s, search 0.02 s): objective=3772.41 imbalance=1.562 nsplit=0 \r",
+" Iteration 1 (0.02 s, search 0.02 s): objective=2408.99 imbalance=1.470 nsplit=1 \r",
+" Iteration 2 (0.03 s, search 0.03 s): objective=2243.87 imbalance=1.450 nsplit=0 \r",
+" Iteration 3 (0.04 s, search 0.04 s): objective=2168.48 imbalance=1.444 nsplit=0 \r",
+" Iteration 4 (0.05 s, search 0.05 s): objective=2134.26 imbalance=1.449 nsplit=0 \r",
+" Iteration 5 (0.06 s, search 0.05 s): objective=2117.18 imbalance=1.449 nsplit=0 \r",
+" Iteration 6 (0.06 s, search 0.06 s): objective=2108.43 imbalance=1.449 nsplit=0 \r",
+" Iteration 7 (0.07 s, search 0.07 s): objective=2102.62 imbalance=1.450 nsplit=0 \r",
+" Iteration 8 (0.08 s, search 0.08 s): objective=2100.68 imbalance=1.451 nsplit=0 \r",
+" Iteration 9 (0.09 s, search 0.08 s): objective=2099.66 imbalance=1.451 nsplit=0 \r",
+" Iteration 10 (0.10 s, search 0.09 s): objective=2099.03 imbalance=1.451 nsplit=0 \r",
+" Iteration 11 (0.10 s, search 0.10 s): objective=2098.67 imbalance=1.453 nsplit=0 \r",
+" Iteration 12 (0.11 s, search 0.11 s): objective=2097.78 imbalance=1.455 nsplit=0 \r",
+" Iteration 13 (0.12 s, search 0.12 s): objective=2097.31 imbalance=1.455 nsplit=0 \r",
+" Iteration 14 (0.13 s, search 0.12 s): objective=2097.13 imbalance=1.455 nsplit=0 \r",
+" Iteration 15 (0.14 s, search 0.13 s): objective=2097.09 imbalance=1.455 nsplit=0 \r",
+" Iteration 16 (0.14 s, search 0.14 s): objective=2097.09 imbalance=1.455 nsplit=0 \r",
+" Iteration 17 (0.15 s, search 0.15 s): objective=2097.09 imbalance=1.455 nsplit=0 \r",
+" Iteration 18 (0.16 s, search 0.15 s): objective=2097.09 imbalance=1.455 nsplit=0 \r",
+" Iteration 19 (0.17 s, search 0.16 s): objective=2097.09 imbalance=1.455 nsplit=0 \r",
+"[0.037, 0.038, 0.041, 0.036, 0.035, 0.036, 0.034, 0.036, 0.034, 0.034, 0.036, 0.037, 0.032, 0.039, 0.035, 0.039, 0.033, 0.035, 0.035, 0.037, 0.037, 0.037, 0.037, 0.037, 0.038, 0.034, 0.037, 0.035, 0.036, 0.037, 0.036, 0.04, 0.037, 0.037, 0.036, 0.034, 0.037, 0.035, 0.038, 0.039, 0.037, 0.039, 0.035, 0.036, 0.036, 0.035, 0.035, 0.038, 0.037, 0.033, 0.036, 0.032, 0.034, 0.035, 0.037, 0.037, 0.041, 0.037, 0.039, 0.033, 0.035, 0.033, 0.036, 0.036, 0.038, 0.036, 0.037, 0.038, 0.035, 0.035, 0.033, 0.034, 0.032, 0.038, 0.037, 0.037, 0.036, 0.04, 0.033, 0.037, 0.035, 0.04, 0.036, 0.04, 0.032, 0.037, 0.036, 0.037, 0.034, 0.042, 0.037, 0.035, 0.035, 0.038, 0.034, 0.036, 0.041, 0.035, 0.036, 0.037, 0.041, 0.04, 0.036, 0.036, 0.035, 0.036, 0.034, 0.033, 0.036, 0.033, 0.037, 0.038, 0.036, 0.033, 0.038, 0.037, 0.038, 0.037, 0.039, 0.04, 0.034, 0.034, 0.036, 0.039, 0.033, 0.037, 0.035, 0.037]\n",
+"[Jan 07, 10:38:27] [0] \t\t #> Encoding 81 passages..\n"
 ]
 },
 {
@@ -173,18 +183,18 @@
 " 50%|█████ | 1/2 [00:02<00:02, 2.53s/it]\u001b[A\n",
 "100%|██████████| 2/2 [00:03<00:00, 1.56s/it]\u001b[A\n",
 "1it [00:03, 3.16s/it]\n",
-"100%|██████████| 1/1 [00:00<00:00, 4462.03it/s]\n",
-"100%|██████████| 1024/1024 [00:00<00:00, 255166.78it/s]\n"
+"100%|██████████| 1/1 [00:00<00:00, 4017.53it/s]\n",
+"100%|██████████| 1024/1024 [00:00<00:00, 306105.57it/s]\n"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[Jan 04, 13:19:25] #> Optimizing IVF to store map from centroids to list of pids..\n",
-"[Jan 04, 13:19:25] #> Building the emb2pid mapping..\n",
-"[Jan 04, 13:19:25] len(emb2pid) = 10526\n",
-"[Jan 04, 13:19:25] #> Saved optimized IVF to .ragatouille/colbert/indexes/Miyazaki/ivf.pid.pt\n",
+"[Jan 07, 10:38:30] #> Optimizing IVF to store map from centroids to list of pids..\n",
+"[Jan 07, 10:38:30] #> Building the emb2pid mapping..\n",
+"[Jan 07, 10:38:30] len(emb2pid) = 10527\n",
+"[Jan 07, 10:38:30] #> Saved optimized IVF to .ragatouille/colbert/indexes/Miyazaki-123/ivf.pid.pt\n",
 "\n",
 "#> Joined...\n",
 "Done indexing!\n"
@@ -194,7 +204,7 @@
 "source": [
 "RAG.index(\n",
 " collection=[full_document],\n",
-" index_name=\"Miyazaki\",\n",
+" index_name=\"Miyazaki-123\",\n",
 " max_document_length=180,\n",
 " split_documents=True,\n",
 ")"
@@ -202,7 +212,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 6,
 "id": "fba3b082",
 "metadata": {},
 "outputs": [
@@ -210,11 +220,11 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Loading searcher for index Miyazaki for the first time... This may take a few seconds\n",
-"[Jan 04, 13:19:48] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n",
-"[Jan 04, 13:19:49] #> Loading codec...\n",
-"[Jan 04, 13:19:49] #> Loading IVF...\n",
-"[Jan 04, 13:19:49] Loading segmented_lookup_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n"
+"Loading searcher for index Miyazaki-123 for the first time... This may take a few seconds\n",
+"[Jan 07, 10:38:34] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n",
+"[Jan 07, 10:38:35] #> Loading codec...\n",
+"[Jan 07, 10:38:35] #> Loading IVF...\n",
+"[Jan 07, 10:38:35] Loading segmented_lookup_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n"
 ]
 },
 {
@@ -229,21 +239,21 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[Jan 04, 13:19:55] #> Loading doclens...\n"
+"[Jan 07, 10:38:35] #> Loading doclens...\n"
 ]
 },
 {
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 376.07it/s]"
+"100%|███████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3872.86it/s]"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[Jan 04, 13:19:55] #> Loading codes and residuals...\n"
+"[Jan 07, 10:38:35] #> Loading codes and residuals...\n"
 ]
 },
 {
@@ -251,14 +261,14 @@
 "output_type": "stream",
 "text": [
 "\n",
-"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 152.81it/s]"
+"100%|████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 604.89it/s]"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[Jan 04, 13:19:55] Loading filter_pids_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n"
+"[Jan 07, 10:38:35] Loading filter_pids_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n"
 ]
 },
 {
@@ -272,7 +282,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"[Jan 04, 13:20:00] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n",
+"[Jan 07, 10:38:35] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\n",
 "Searcher loaded!\n",
 "\n",
 "#> QueryTokenizer.tensorize(batch_text[0], batch_background[0], bsize) ==\n",
@@ -301,7 +311,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": 7,
 "id": "145b7edf",
 "metadata": {},
 "outputs": [
@@ -309,17 +319,17 @@
 "data": {
 "text/plain": [
 "[{'content': 'In April 1984, Miyazaki opened his own office in Suginami Ward, naming it Nibariki.\\n\\n\\n=== Studio Ghibli ===\\n\\n\\n==== Early films (1985–1996) ====\\nIn June 1985, Miyazaki, Takahata, Tokuma and Suzuki founded the animation production company Studio Ghibli, with funding from Tokuma Shoten. Studio Ghibli\\'s first film, Laputa: Castle in the Sky (1986), employed the same production crew of Nausicaä. Miyazaki\\'s designs for the film\\'s setting were inspired by Greek architecture and \"European urbanistic templates\".',\n",
-" 'score': 25.906391143798828,\n",
+" 'score': 25.90749740600586,\n",
 " 'rank': 1},\n",
 " {'content': 'Hayao Miyazaki (宮崎 駿 or 宮﨑 駿, Miyazaki Hayao, [mijaꜜzaki hajao]; born January 5, 1941) is a Japanese animator, filmmaker, and manga artist. A co-founder of Studio Ghibli, he has attained international acclaim as a masterful storyteller and creator of Japanese animated feature films, and is widely regarded as one of the most accomplished filmmakers in the history of animation.\\nBorn in Tokyo City in the Empire of Japan, Miyazaki expressed interest in manga and animation from an early age, and he joined Toei Animation in 1963. During his early years at Toei Animation he worked as an in-between artist and later collaborated with director Isao Takahata.',\n",
-" 'score': 25.472686767578125,\n",
+" 'score': 25.4748477935791,\n",
 " 'rank': 2},\n",
 " {'content': 'Glen Keane said Miyazaki is a \"huge influence\" on Walt Disney Animation Studios and has been \"part of our heritage\" ever since The Rescuers Down Under (1990). The Disney Renaissance era was also prompted by competition with the development of Miyazaki\\'s films. Artists from Pixar and Aardman Studios signed a tribute stating, \"You\\'re our inspiration, Miyazaki-san!\"',\n",
-" 'score': 24.847288131713867,\n",
+" 'score': 24.84897232055664,\n",
 " 'rank': 3}]"
 ]
 },
-"execution_count": 10,
+"execution_count": 7,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -330,53 +340,26 @@
 },
 {
 "cell_type": "markdown",
-"id": "2f5c7235",
+"id": "a0fdf0f9",
 "metadata": {},
 "source": [
-"## Use in LangChain\n",
-"\n",
-"In order to use as a retriever in LangChain, we need to make a really simple wrapper."
+"We can then convert easily to a LangChain retriever! We can pass in any kwargs we want when creating (like `k`)"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 15,
-"id": "a88d9112",
+"execution_count": 8,
+"id": "f3d5cfaa",
 "metadata": {},
 "outputs": [],
 "source": [
-"from typing import List\n",
-"\n",
-"from langchain_core.callbacks import CallbackManagerForRetrieverRun\n",
-"from langchain_core.documents import Document\n",
-"from langchain_core.retrievers import BaseRetriever\n",
-"\n",
-"\n",
-"class CustomRetriever(BaseRetriever):\n",
-" rag: RAGPretrainedModel\n",
-" k: int = 3\n",
-"\n",
-" def _get_relevant_documents(\n",
-" self, query: str, *, run_manager: CallbackManagerForRetrieverRun\n",
-" ) -> List[Document]:\n",
-" results = self.rag.search(query=query, k=self.k)\n",
-" return [Document(page_content=doc[\"content\"]) for doc in results]"
+"retriever = RAG.as_langchain_retriever(k=3)"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 16,
-"id": "63da0b8e",
-"metadata": {},
-"outputs": [],
-"source": [
-"retriever = CustomRetriever(rag=RAG)"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 17,
-"id": "65954ee4",
+"execution_count": 10,
+"id": "98be34c4",
 "metadata": {},
 "outputs": [
 {
@@ -395,18 +378,28 @@
 " Document(page_content='Glen Keane said Miyazaki is a \"huge influence\" on Walt Disney Animation Studios and has been \"part of our heritage\" ever since The Rescuers Down Under (1990). The Disney Renaissance era was also prompted by competition with the development of Miyazaki\\'s films. Artists from Pixar and Aardman Studios signed a tribute stating, \"You\\'re our inspiration, Miyazaki-san!\"')]"
 ]
 },
-"execution_count": 17,
+"execution_count": 10,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"retriever.get_relevant_documents(\"What animation studio did Miyazaki found?\")"
+"retriever.invoke(\"What animation studio did Miyazaki found?\")"
 ]
 },
 {
+"cell_type": "markdown",
+"id": "2f5c7235",
+"metadata": {},
+"source": [
+"## Chaining\n",
+"\n",
+"We can easily combine this retriever in to a chain."
+]
+},
+{
 "cell_type": "code",
-"execution_count": 22,
+"execution_count": 11,
 "id": "550e73e2",
 "metadata": {},
 "outputs": [],
@@ -434,7 +427,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 23,
+"execution_count": 12,
 "id": "0e58ee1d",
 "metadata": {},
 "outputs": [
@@ -456,7 +449,7 @@
 " 'answer': 'Miyazaki founded Studio Ghibli.'}"
 ]
 },
-"execution_count": 23,
+"execution_count": 12,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -467,7 +460,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 25,
+"execution_count": 13,
 "id": "d0134f73",
 "metadata": {},
 "outputs": [
@@ -1,7 +1,6 @@
 {
 "cells": [
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -163,7 +162,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.1"
+"version": "3.10.1"
 }
 },
 "nbformat": 4,