{ "cells": [ { "cell_type": "markdown", "id": "70b333e6", "metadata": {}, "source": [ "[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/developer/products/atlas/advanced-rag-langchain-mongodb/)\n" ] }, { "cell_type": "markdown", "id": "d84a72ea", "metadata": {}, "source": [ "# Adding Semantic Caching and Memory to your RAG Application using MongoDB and LangChain\n", "\n", "In this notebook, we will see how to use the new MongoDBCache and MongoDBChatMessageHistory in your RAG application.\n" ] }, { "cell_type": "markdown", "id": "65527202", "metadata": {}, "source": [ "## Step 1: Install required libraries\n", "\n", "- **datasets**: Python library to get access to datasets available on Hugging Face Hub\n", "\n", "- **langchain**: Python toolkit for LangChain\n", "\n", "- **langchain-mongodb**: Python package to use MongoDB as a vector store, semantic cache, chat history store etc. in LangChain\n", "\n", "- **langchain-openai**: Python package to use OpenAI models with LangChain\n", "\n", "- **pymongo**: Python toolkit for MongoDB\n", "\n", "- **pandas**: Python library for data analysis, exploration, and manipulation" ] }, { "cell_type": "code", "execution_count": 1, "id": "cbc22fa4", "metadata": {}, "outputs": [], "source": [ "! pip install -qU datasets langchain langchain-mongodb langchain-openai pymongo pandas" ] }, { "cell_type": "markdown", "id": "39c41e87", "metadata": {}, "source": [ "## Step 2: Setup pre-requisites\n", "\n", "* Set the MongoDB connection string. Follow the steps [here](https://www.mongodb.com/docs/manual/reference/connection-string/) to get the connection string from the Atlas UI.\n", "\n", "* Set the OpenAI API key. 
Steps to obtain an API key are [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)" ] }, { "cell_type": "code", "execution_count": 2, "id": "b56412ae", "metadata": {}, "outputs": [], "source": [ "import getpass" ] }, { "cell_type": "code", "execution_count": 3, "id": "16a20d7a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Enter your MongoDB connection string:········\n" ] } ], "source": [ "MONGODB_URI = getpass.getpass(\"Enter your MongoDB connection string:\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "978682d4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Enter your OpenAI API key:········\n" ] } ], "source": [ "OPENAI_API_KEY = getpass.getpass(\"Enter your OpenAI API key:\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "606081c5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "········\n" ] } ], "source": [ "# Optional: enable LangSmith tracing (useful for debugging)\n", "import os\n", "\n", "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()" ] }, { "cell_type": "markdown", "id": "f6b8302c", "metadata": {}, "source": [ "## Step 3: Download the dataset\n", "\n", "We will be using MongoDB's [embedded_movies](https://huggingface.co/datasets/MongoDB/embedded_movies) dataset." ] }, { "cell_type": "code", "execution_count": 6, "id": "1a3433a6", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from datasets import load_dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "aee5311b", "metadata": {}, "outputs": [], "source": [ "# Ensure you have an HF_TOKEN in your development environment:\n", "# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)\n", "\n", "# Load MongoDB's embedded_movies dataset from Hugging Face\n", "# 
https://huggingface.co/datasets/MongoDB/embedded_movies\n", "\n", "data = load_dataset(\"MongoDB/embedded_movies\")" ] }, { "cell_type": "code", "execution_count": 8, "id": "1d630a26", "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(data[\"train\"])" ] }, { "cell_type": "markdown", "id": "a1f94f43", "metadata": {}, "source": [ "## Step 4: Data analysis\n", "\n", "Make sure the length of the dataset is what we expect, drop missing values, etc." ] }, { "cell_type": "code", "execution_count": 10, "id": "b276df71", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | fullplot | type | plot_embedding | num_mflix_comments | runtime | writers | imdb | countries | rated | plot | title | languages | metacritic | directors | awards | genres | poster | cast |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 0 | Young Pauline is left a lot of money when her ... | movie | [0.00072939653, -0.026834568, 0.013515796, -0.... | 0 | 199.0 | [Charles W. Goddard (screenplay), Basil Dickey... | {'id': 4465, 'rating': 7.6, 'votes': 744} | [USA] | None | Young Pauline is left a lot of money when her ... | The Perils of Pauline | [English] | NaN | [Louis J. Gasnier, Donald MacKenzie] | {'nominations': 0, 'text': '1 win.', 'wins': 1} | [Action] | https://m.media-amazon.com/images/M/MV5BMzgxOD... | [Pearl White, Crane Wilbur, Paul Panzer, Edwar... |\n", "