From f59e5d48edb7954034598e9551e459ed3dc59ab7 Mon Sep 17 00:00:00 2001
From: Philippe PRADOS
Date: Mon, 4 Sep 2023 00:54:42 +0200
Subject: [PATCH] Google drive integration (lite) (#9999)

My other [pull request](https://github.com/langchain-ai/langchain/pull/5135) is too big to be accepted.

I propose another 'lite' version. It only updates the notebooks, to propose an integration with the external project [`langchain-googledrive`](https://github.com/pprados/langchain-googledrive).

---------

Co-authored-by: Harrison Chase
---
 .../document_loaders/google_drive.ipynb      |  10 +-
 .../retrievers/google_drive.ipynb            |   2 +-
 .../integrations/toolkits/google_drive.ipynb | 215 ++++++++++++++++++
 3 files changed, 221 insertions(+), 6 deletions(-)
 create mode 100644 docs/extras/integrations/toolkits/google_drive.ipynb

diff --git a/docs/extras/integrations/document_loaders/google_drive.ipynb b/docs/extras/integrations/document_loaders/google_drive.ipynb
index 9d17e5df97..35b856153e 100644
--- a/docs/extras/integrations/document_loaders/google_drive.ipynb
+++ b/docs/extras/integrations/document_loaders/google_drive.ipynb
@@ -210,7 +210,7 @@
    "id": "83ac576b-48c9-4aad-a35e-e978ea32f746",
    "metadata": {},
    "source": [
-    "# Extended usage\n",
+    "## Extended usage\n",
     "An external component can manage the complexity of Google Drive : `langchain-googledrive`\n",
     "It's compatible with the ̀`langchain.document_loaders.GoogleDriveLoader` and can be used\n",
     "in its place.\n",
@@ -319,7 +319,7 @@
    "id": "cd13d7d1-db7a-498d-ac98-76ccd9ad9019",
    "metadata": {},
    "source": [
-    "## Customize the search pattern\n",
+    "### Customize the search pattern\n",
     "\n",
     "All parameter compatible with Google [`list()`](https://developers.google.com/drive/api/v3/reference/files/list)\n",
     "API can be set.\n",
@@ -398,7 +398,7 @@
    "id": "375bb465-8f69-407b-94bd-ffa3718ef500",
    "metadata": {},
    "source": [
-    "### Modes for GSlide and GSheet\n",
+    "#### Modes for GSlide and GSheet\n",
     "The parameter mode accepts different values:\n",
     "\n",
     "- \"document\": return the body of each document\n",
@@ -469,7 +469,7 @@
    "id": "09acb864-e919-4add-9e06-deba6f7f0cd8",
    "metadata": {},
    "source": [
-    "## Advanced usage\n",
+    "### Advanced usage\n",
     "All Google File have a 'description' in the metadata. This field can be used to memorize a summary of the document or others indexed tags (See method `lazy_update_description_with_summary()`).\n",
     "\n",
     "If you use the `mode=\"snippet\"`, only the description will be used for the body. Else, the `metadata['summary']` has the field.\n",
@@ -525,7 +525,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.10.12"
   }
  },
  "nbformat": 4,
diff --git a/docs/extras/integrations/retrievers/google_drive.ipynb b/docs/extras/integrations/retrievers/google_drive.ipynb
index 3acb14cbc1..caf0bf3092 100644
--- a/docs/extras/integrations/retrievers/google_drive.ipynb
+++ b/docs/extras/integrations/retrievers/google_drive.ipynb
@@ -59,7 +59,7 @@
    },
    "outputs": [],
    "source": [
-    "from langchain.retrievers import GoogleDriveRetriever"
+    "from langchain_googledrive.retrievers import GoogleDriveRetriever"
    ]
   },
   {
diff --git a/docs/extras/integrations/toolkits/google_drive.ipynb b/docs/extras/integrations/toolkits/google_drive.ipynb
new file mode 100644
index 0000000000..38ee843d43
--- /dev/null
+++ b/docs/extras/integrations/toolkits/google_drive.ipynb
@@ -0,0 +1,215 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Google Drive tool\n",
+    "\n",
+    "This notebook walks through connecting LangChain to the Google Drive API.\n",
+    "\n",
+    "## Prerequisites\n",
+    "\n",
+    "1. Create a Google Cloud project or use an existing project\n",
+    "1. Enable the [Google Drive API](https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com)\n",
+    "1. [Authorize credentials for desktop app](https://developers.google.com/drive/api/quickstart/python#authorize_credentials_for_a_desktop_application)\n",
+    "1. `pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib`\n",
+    "\n",
+    "## Instructions for retrieving your Google Docs data\n",
+    "By default, the Google Drive tools and `GoogleDriveAPIWrapper` expect the `credentials.json` file to be at `~/.credentials/credentials.json`, but this is configurable with the `GOOGLE_ACCOUNT_FILE` environment variable.\n",
+    "`token.json` is stored in the same directory (or at the path given by the `token_path` parameter). Note that `token.json` is created automatically the first time you use the tool.\n",
+    "\n",
+    "`GoogleDriveSearchTool` can retrieve a selection of files matching a query.\n",
+    "\n",
+    "By default, if you set a `folder_id`, every file inside that folder whose name matches the query can be retrieved as a `Document`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#!pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can obtain the folder and document IDs from their URLs:\n",
+    "* Folder: https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5 -> folder id is `\"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5\"`\n",
+    "* Document: https://docs.google.com/document/d/1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit -> document id is `\"1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw\"`\n",
+    "\n",
+    "The special value `root` designates your personal home folder (the root of My Drive)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "folder_id = \"root\"\n",
+    "# folder_id = '1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "By default, files with the following MIME types can be converted to `Document`:\n",
+    "- text/text\n",
+    "- text/plain\n",
+    "- text/html\n",
+    "- text/csv\n",
+    "- text/markdown\n",
+    "- image/png\n",
+    "- image/jpeg\n",
+    "- application/epub+zip\n",
+    "- application/pdf\n",
+    "- application/rtf\n",
+    "- application/vnd.google-apps.document (GDoc)\n",
+    "- application/vnd.google-apps.presentation (GSlide)\n",
+    "- application/vnd.google-apps.spreadsheet (GSheet)\n",
+    "- application/vnd.google.colaboratory (Colab notebook)\n",
+    "- application/vnd.openxmlformats-officedocument.presentationml.presentation (PPTX)\n",
+    "- application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)\n",
+    "\n",
+    "This list can be updated or customized; see the documentation of `GoogleDriveAPIWrapper`.\n",
+    "\n",
+    "However, the corresponding packages must be installed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#!pip install unstructured"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_googledrive.utilities.google_drive import GoogleDriveAPIWrapper\n",
+    "from langchain_googledrive.tools.google_drive.tool import GoogleDriveSearchTool\n",
+    "\n",
+    "# By default, search only in the filename.\n",
+    "tool = GoogleDriveSearchTool(\n",
+    "    api_wrapper=GoogleDriveAPIWrapper(\n",
+    "        folder_id=folder_id,\n",
+    "        num_results=2,\n",
+    "        template=\"gdrive-query-in-folder\",  # Search in the body of documents\n",
+    "    )\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import logging\n",
+    "logging.basicConfig(level=logging.INFO)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tool.run(\"machine learning\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tool.description"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.agents import load_tools\n",
+    "tools = load_tools([\"google-drive-search\"],\n",
+    "                   folder_id=folder_id,\n",
+    "                   template=\"gdrive-query-in-folder\",\n",
+    "                  )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Use within an Agent"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain import OpenAI\n",
+    "from langchain.agents import initialize_agent, AgentType\n",
+    "llm = OpenAI(temperature=0)\n",
+    "agent = initialize_agent(\n",
+    "    tools=tools,\n",
+    "    llm=llm,\n",
+    "    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
+    ")"
+   ]
+  },
{ + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "agent.run(\n", + " \"Search in google drive, who is 'Yann LeCun' ?\"\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}