{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/agents/agent_fireworks_ai_langchain_mongodb.ipynb)\n", "\n", "[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/developer/products/atlas/agent-fireworksai-mongodb-langchain/)" ] }, { "cell_type": "markdown", "metadata": { "id": "3kMALXaMv-MS" }, "source": [ "## Install Libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cxTXczeTghzU", "outputId": "ae3a81b2-cba6-42fc-f593-8646bff77b14" }, "outputs": [], "source": [ "!pip install langchain langchain_openai langchain-fireworks langchain-mongodb arxiv pymupdf datasets pymongo" ] }, { "cell_type": "markdown", "metadata": { "id": "RM8rg08YhqZe" }, "source": [ "## Set Environment Variables" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "oXLWCWEghuOX" }, "outputs": [], "source": [ "import os\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = \"\"\n", "os.environ[\"FIREWORKS_API_KEY\"] = \"\"\n", "os.environ[\"MONGO_URI\"] = \"\"\n", "\n", "FIREWORKS_API_KEY = os.environ.get(\"FIREWORKS_API_KEY\")\n", "OPENAI_API_KEY = os.environ.get(\"OPENAI_API_KEY\")\n", "MONGO_URI = os.environ.get(\"MONGO_URI\")" ] }, { "cell_type": "markdown", "metadata": { "id": "UUf3jtFzO4-V" }, "source": [ "## Data Ingestion into MongoDB Vector Database\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "referenced_widgets": [ "cebfba144ba6418092df949783f93455", "09dcf4ce88064f11980bbefaad1ebc75", "f2bd7bda4d0c4d93b88e53aeb4e1b62d", "278513c5a8b04a24b1823d38107f1e50", "d3941c633788427abb858b21e285088f", "39563df9477648398456675ec51075aa", "f4353368efbd4c3891f805ddc3d05e1b", "30fe0bcd02cb47f3ba23bb480e2eaaea", 
"d17d8c8f45ee44cd87dcd787c05dbdc3", "62e196b6d30746578e137c50b661f946", "ced7f9d61e06442a960dcda95852048e", "7dbfebff68ff45628da832fac5233c93", "164d16df28d24ab796b7c9cf85174800", "e70e0d317f1e4e73bd95349ed1510cce", "41056c822b9d44559147d2b21416b956", "b1929fb112174c0abcd8004f6be0f880", "95e4af5b420242b7a6b74a18cad98961", "dff65b579f0746ffae8739ecb0aa5a41", "f73ae771c24645c79fd41409a8fc7b34", "20d693a09c534414a5c4c0dd58cf94ed", "a43c349d171e469c8cc94d48060f775b", "373ed3b6307741859ab297c270cf42c8" ] }, "id": "pq4SA6r7O30i", "outputId": "904f4112-79fb-45cc-954b-d2b818cb2748" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/richmondalake/miniconda3/envs/langchain_workarea/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "Downloading readme: 100%|██████████| 701/701 [00:00<00:00, 2.04MB/s]\n", "Repo card metadata block was not found. Setting CardData to empty.\n", "Downloading data: 100%|██████████| 102M/102M [00:15<00:00, 6.41MB/s] \n", "Generating train split: 50000 examples [00:01, 38699.64 examples/s]\n" ] } ], "source": [ "import pandas as pd\n", "from datasets import load_dataset\n", "\n", "data = load_dataset(\"MongoDB/subset_arxiv_papers_with_emebeddings\")\n", "dataset_df = pd.DataFrame(data[\"train\"])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jsuj3jOgFimi", "outputId": "5e92750a-4053-46d8-c3b3-9bba5b1180ba" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "50000\n" ] }, { "data": { "text/html": [ "
|  | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed | embedding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 704.0001 | Pavel Nadolsky | C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-... | Calculation of prompt diphoton production cros... | 37 pages, 15 figures; published version | Phys.Rev.D76:013009,2007 | 10.1103/PhysRevD.76.013009 | ANL-HEP-PR-07-12 | hep-ph | None | A fully differential calculation in perturba... | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... | 2008-11-26 | [[Balázs, C., ], [Berger, E. L., ], [Nadolsky,... | [0.0594153292, -0.0440569334, -0.0487333685, -... |
| 1 | 704.0002 | Louis Theran | Ileana Streinu and Louis Theran | Sparsity-certifying Graph Decompositions | To appear in Graphs and Combinatorics | None | None | None | math.CO cs.CG | http://arxiv.org/licenses/nonexclusive-distrib... | We describe a new algorithm, the $(k,\\ell)$-... | [{'version': 'v1', 'created': 'Sat, 31 Mar 200... | 2008-12-13 | [[Streinu, Ileana, ], [Theran, Louis, ]] | [0.0247399714, -0.065658465, 0.0201423876, -0.... |
| 2 | 704.0003 | Hongjun Pan | Hongjun Pan | The evolution of the Earth-Moon system based o... | 23 pages, 3 figures | None | None | None | physics.gen-ph | None | The evolution of Earth-Moon system is descri... | [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... | 2008-01-13 | [[Pan, Hongjun, ]] | [0.0491479263, 0.0728017688, 0.0604138002, 0.0... |
| 3 | 704.0004 | David Callan | David Callan | A determinant of Stirling cycle numbers counts... | 11 pages | None | None | None | math.CO | None | We show that a determinant of Stirling cycle... | [{'version': 'v1', 'created': 'Sat, 31 Mar 200... | 2007-05-23 | [[Callan, David, ]] | [0.0389556214, -0.0410280302, 0.0410280302, -0... |
| 4 | 704.0005 | Alberto Torchinsky | Wael Abu-Shammala and Alberto Torchinsky | From dyadic $\\Lambda_{\\alpha}$ to $\\Lambda_{\\a... | None | Illinois J. Math. 52 (2008) no.2, 681-689 | None | None | math.CA math.FA | None | In this paper we show how to compute the $\\L... | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... | 2013-10-15 | [[Abu-Shammala, Wael, ], [Torchinsky, Alberto, ]] | [0.118412666, -0.0127423415, 0.1185125113, 0.0... |