Harrison/psychic (#5063)

Co-authored-by: Ayan Bandyopadhyay <ayanb9440@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
pull/5064/head
Harrison Chase 1 year ago committed by GitHub
parent 8c661baefb
commit b0431c672b

@ -0,0 +1,20 @@
# Psychic
This page covers how to use [Psychic](https://www.psychic.dev/) within LangChain.
## What is Psychic?
Psychic is a platform for integrating with your customers' SaaS tools like Notion, Zendesk, Confluence, and Google Drive via OAuth and syncing documents from these applications to your SQL or vector database. You can think of it like Plaid for unstructured data. Psychic is easy to set up: you import the React library and configure it with your Sidekick API key, which you can get from the [Psychic dashboard](https://dashboard.psychic.dev/). When your users connect their applications, you can view these connections from the dashboard and retrieve data using the server-side libraries.
## Quick start
1. Create an account in the [dashboard](https://dashboard.psychic.dev/).
2. Use the [React library](https://docs.psychic.dev/sidekick-link) to add the Psychic link modal to your frontend React app. Users will use this to connect their SaaS apps.
3. Once your user has created a connection, you can use the LangChain `PsychicLoader` by following the [example notebook](../modules/indexes/document_loaders/examples/psychic.ipynb).
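Under the hood, `PsychicLoader.load()` turns each record returned by Psychic's `get_documents` into a LangChain `Document`, using the record's `content` as the page text and its `title` and `uri` as metadata. The sketch below reproduces only that mapping, without network access. The `Document` dataclass is a minimal stand-in for `langchain.docstore.document.Document`, and the sample records are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in for langchain.docstore.document.Document (illustration only)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_documents(psychic_docs: List[Dict[str, str]]) -> List[Document]:
    """Map Psychic-style records (keys: content, title, uri) to Documents."""
    return [
        Document(
            page_content=doc["content"],
            # The loader exposes Psychic's `uri` under the conventional `source` key.
            metadata={"title": doc["title"], "source": doc["uri"]},
        )
        for doc in psychic_docs
    ]


# Made-up records shaped like the ones the loader expects.
records = [
    {"content": "Onboarding checklist ...", "title": "Onboarding", "uri": "https://example.com/doc/1"},
    {"content": "Refund policy ...", "title": "Refunds", "uri": "https://example.com/doc/2"},
]
docs = to_documents(records)
print(docs[0].metadata["source"])  # https://example.com/doc/1
```

Storing the `uri` under `source` is what allows a sources-aware chain such as `RetrievalQAWithSourcesChain`, used in the example notebook, to report where an answer came from.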
## Advantages vs Other Document Loaders
1. **Universal API:** Instead of building OAuth flows and learning the APIs for every SaaS app, you integrate Psychic once and leverage our universal API to retrieve data.
2. **Data Syncs:** Data in your customers' SaaS apps can get stale fast. With Psychic you can configure webhooks to keep your documents up to date on a daily or real-time basis.
3. **Simplified OAuth:** Psychic handles OAuth end-to-end so that you don't have to spend time creating OAuth clients for each integration, keeping access tokens fresh, and handling OAuth redirect logic.
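When the loader is re-run after a sync, a consumer will usually want to replace its stale local copies rather than accumulate duplicates. A minimal sketch, assuming documents are keyed by the `source` metadata that `PsychicLoader` fills from Psychic's `uri`. `merge_by_source` is a hypothetical helper, not part of Psychic or LangChain, and plain dicts stand in for `Document` objects.

```python
from typing import Any, Dict, List

# Plain dicts stand in for LangChain Documents: {"page_content": ..., "metadata": {...}}.
DocDict = Dict[str, Any]


def merge_by_source(existing: Dict[str, DocDict], fresh: List[DocDict]) -> Dict[str, DocDict]:
    """Hypothetical helper: upsert freshly loaded docs keyed by metadata["source"].

    Newer copies overwrite older ones with the same source URI; documents that
    were not part of this sync are kept unchanged.
    """
    merged = dict(existing)
    for doc in fresh:
        merged[doc["metadata"]["source"]] = doc
    return merged


store = {
    "https://example.com/doc/1": {"page_content": "old text", "metadata": {"source": "https://example.com/doc/1"}},
}
fresh = [
    {"page_content": "new text", "metadata": {"source": "https://example.com/doc/1"}},
    {"page_content": "another doc", "metadata": {"source": "https://example.com/doc/2"}},
]
store = merge_by_source(store, fresh)
print(len(store), store["https://example.com/doc/1"]["page_content"])  # 2 new text
```

This sketch does not detect documents deleted upstream; handling those would need either a full reload or a deletion signal from the source.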

@ -0,0 +1,134 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Psychic\n",
"This notebook covers how to load documents from `Psychic`. See [here](../../../../ecosystem/psychic.md) for more details.\n",
"\n",
"## Prerequisites\n",
"1. Follow the Quick Start section in [this document](../../../../ecosystem/psychic.md)\n",
"2. Log into the [Psychic dashboard](https://dashboard.psychic.dev/) and get your secret key\n",
"3. Install the frontend React library into your web app and have a user authenticate a connection. The connection will be created using the connection id that you specify."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading documents\n",
"\n",
"Use the `PsychicLoader` class to load in documents from a connection. Each connection has a connector id (corresponding to the SaaS app that was connected) and a connection id (which you passed in to the frontend library)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment this to install psychicapi if you don't already have it installed\n",
"# !pip -q install psychicapi"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PsychicLoader\n",
"from psychicapi import ConnectorId\n",
"\n",
"# Create a document loader for Google Drive. We can also load from other connectors by setting the connector_id to the appropriate value, e.g. ConnectorId.notion.value\n",
"# This loader uses our test credentials\n",
"google_drive_loader = PsychicLoader(\n",
" api_key=\"7ddb61c1-8b6a-4d31-a58e-30d1c9ea480e\",\n",
" connector_id=ConnectorId.gdrive.value,\n",
" connection_id=\"google-test\"\n",
")\n",
"\n",
"documents = google_drive_loader.load()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting the docs to embeddings\n",
"\n",
"We can now convert these documents into embeddings and store them in a vector database like Chroma."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import RetrievalQAWithSourcesChain\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = Chroma.from_documents(texts, embeddings)\n",
"chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type=\"stuff\", retriever=docsearch.as_retriever())\n",
"chain({\"question\": \"what is psychic?\"}, return_only_outputs=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -69,6 +69,7 @@ from langchain.document_loaders.pdf import (
    UnstructuredPDFLoader,
)
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.psychic import PsychicLoader
from langchain.document_loaders.python import PythonLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.reddit import RedditPostsLoader
@ -215,4 +216,5 @@ __all__ = [
    "YoutubeLoader",
    "TelegramChatLoader",
    "ToMarkdownLoader",
    "PsychicLoader",
]

@ -0,0 +1,34 @@
"""Loader that loads documents from Psychic.dev."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class PsychicLoader(BaseLoader):
    """Loader that loads documents from Psychic.dev."""

    def __init__(self, api_key: str, connector_id: str, connection_id: str):
        """Initialize with API key, connector id, and connection id."""
        try:
            from psychicapi import ConnectorId, Psychic  # noqa: F401
        except ImportError:
            raise ImportError(
                "`psychicapi` package not found, please run `pip install psychicapi`"
            )
        self.psychic = Psychic(secret_key=api_key)
        self.connector_id = ConnectorId(connector_id)
        self.connection_id = connection_id

    def load(self) -> List[Document]:
        """Load documents."""
        psychic_docs = self.psychic.get_documents(self.connector_id, self.connection_id)
        return [
            Document(
                page_content=doc["content"],
                metadata={"title": doc["title"], "source": doc["uri"]},
            )
            for doc in psychic_docs
        ]
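`PsychicLoader` accepts `connector_id` as a plain string and converts it with `ConnectorId(connector_id)`, and the notebook accordingly passes `ConnectorId.gdrive.value`. The sketch below assumes `ConnectorId` behaves like a standard `enum.Enum` with string values, which the `.value` usage suggests but this PR does not show. The stand-in enum, its member values, and the `is_known_connector` helper are hypothetical; only `"notion"` appears as a connector string in the tests.

```python
from enum import Enum


class ConnectorId(Enum):
    """Hypothetical stand-in for psychicapi.ConnectorId; real member values may differ."""

    notion = "notion"
    gdrive = "gdrive"


def is_known_connector(value: str) -> bool:
    """Hypothetical helper: True if `value` maps to a ConnectorId member."""
    try:
        # Enum lookup by value, as PsychicLoader.__init__ does with its string argument.
        ConnectorId(value)
    except ValueError:
        # Unknown strings raise ValueError, so a bad id fails when the loader is built.
        return False
    return True


# The string value round-trips back to the same enum member.
print(ConnectorId(ConnectorId.gdrive.value) is ConnectorId.gdrive)  # True
print(is_known_connector("not-a-connector"))  # False
```

Because the conversion happens in `__init__`, a mistyped connector id surfaces as soon as the loader is constructed rather than on the first `load()` call.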

poetry.lock generated

@ -6214,6 +6214,21 @@ files = [
[package.extras]
test = ["enum34", "ipaddress", "mock", "pywin32", "wmi"]
[[package]]
name = "psychicapi"
version = "0.2"
description = "Psychic.dev is an open-source universal data connector for knowledgebases."
category = "main"
optional = true
python-versions = "*"
files = [
{file = "psychicapi-0.2-py3-none-any.whl", hash = "sha256:712c6a1615dfad11d65241c179e96a5058ed1ada47463d1208e5a55a2bfdb4ff"},
{file = "psychicapi-0.2.tar.gz", hash = "sha256:3db62c2665c1485d0f68f3c1c57590691f20ee868d1f40fdeb59a6eeb15ed26a"},
]
[package.dependencies]
requests = "*"
[[package]]
name = "psycopg2-binary"
version = "2.9.6"
@ -10348,7 +10363,7 @@ all = ["O365", "aleph-alpha-client", "anthropic", "arxiv", "atlassian-python-api
azure = ["azure-core", "azure-cosmos", "azure-identity", "openai"]
cohere = ["cohere"]
embeddings = ["sentence-transformers"]
extended-testing = ["atlassian-python-api", "beautifulsoup4", "beautifulsoup4", "chardet", "gql", "html2text", "jq", "lxml", "pandas", "pdfminer-six", "pymupdf", "pypdf", "pypdfium2", "requests-toolbelt", "telethon", "tqdm", "zep-python"]
extended-testing = ["atlassian-python-api", "beautifulsoup4", "beautifulsoup4", "chardet", "gql", "html2text", "jq", "lxml", "pandas", "pdfminer-six", "psychicapi", "pymupdf", "pypdf", "pypdfium2", "requests-toolbelt", "telethon", "tqdm", "zep-python"]
hnswlib = ["docarray", "hnswlib", "protobuf"]
in-memory-store = ["docarray"]
llms = ["anthropic", "cohere", "huggingface_hub", "manifest-ml", "nlpcloud", "openai", "torch", "transformers"]
@ -10359,4 +10374,4 @@ text-helpers = ["chardet"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "5202794df913184aee17f9c6c798edbaa102d5b5152cac885a623ebc93d1e2a3"
content-hash = "086b4d4d5ca5d0be9d12105f926d667926170bca706a6c6ee152637389d2a22d"

@ -88,6 +88,7 @@ pypdfium2 = {version = "^4.10.0", optional = true}
gql = {version = "^3.4.1", optional = true}
pandas = {version = "^2.0.1", optional = true}
telethon = {version = "^1.28.5", optional = true}
psychicapi = {version = "^0.2", optional = true}
zep-python = {version="^0.25", optional=true}
chardet = {version="^5.1.0", optional=true}
requests-toolbelt = {version = "^1.0.0", optional = true}
@ -203,6 +204,7 @@ extended_testing = [
    "beautifulsoup4",
    "pandas",
    "telethon",
    "psychicapi",
    "zep-python",
    "gql",
"requests_toolbelt",

@ -0,0 +1,66 @@
from typing import Dict
from unittest.mock import MagicMock, patch

import pytest

from langchain.docstore.document import Document
from langchain.document_loaders.psychic import PsychicLoader


@pytest.fixture
def mock_psychic():  # type: ignore
    with patch("psychicapi.Psychic") as mock_psychic:
        yield mock_psychic


@pytest.fixture
def mock_connector_id():  # type: ignore
    with patch("psychicapi.ConnectorId") as mock_connector_id:
        yield mock_connector_id


@pytest.mark.requires("psychicapi")
class TestPsychicLoader:
    MOCK_API_KEY = "api_key"
    MOCK_CONNECTOR_ID = "notion"
    MOCK_CONNECTION_ID = "connection_id"

    def test_psychic_loader_initialization(
        self, mock_psychic: MagicMock, mock_connector_id: MagicMock
    ) -> None:
        PsychicLoader(
            api_key=self.MOCK_API_KEY,
            connector_id=self.MOCK_CONNECTOR_ID,
            connection_id=self.MOCK_CONNECTION_ID,
        )

        mock_psychic.assert_called_once_with(secret_key=self.MOCK_API_KEY)
        mock_connector_id.assert_called_once_with(self.MOCK_CONNECTOR_ID)

    def test_psychic_loader_load_data(self, mock_psychic: MagicMock) -> None:
        mock_psychic.get_documents.return_value = [
            self._get_mock_document("123"),
            self._get_mock_document("456"),
        ]

        psychic_loader = self._get_mock_psychic_loader(mock_psychic)
        documents = psychic_loader.load()

        assert mock_psychic.get_documents.call_count == 1
        assert len(documents) == 2
        assert all(isinstance(doc, Document) for doc in documents)
        assert documents[0].page_content == "Content 123"
        assert documents[1].page_content == "Content 456"

    def _get_mock_psychic_loader(self, mock_psychic: MagicMock) -> PsychicLoader:
        psychic_loader = PsychicLoader(
            api_key=self.MOCK_API_KEY,
            connector_id=self.MOCK_CONNECTOR_ID,
            connection_id=self.MOCK_CONNECTION_ID,
        )
        psychic_loader.psychic = mock_psychic
        return psychic_loader

    def _get_mock_document(self, uri: str) -> Dict:
        return {"uri": f"{uri}", "title": f"Title {uri}", "content": f"Content {uri}"}
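The `mock_psychic` and `mock_connector_id` fixtures can patch `psychicapi.Psychic` and `psychicapi.ConnectorId` on the module itself because `PsychicLoader.__init__` imports them lazily, so each construction looks the names up on `psychicapi` at call time. The self-contained sketch below reproduces that interaction with a fake module registered in `sys.modules`; `fake_sdk`, `Client`, and `Loader` are hypothetical names used only for illustration.

```python
import sys
import types
from unittest.mock import patch

# Register a throwaway module so `from fake_sdk import Client` resolves below.
fake_sdk = types.ModuleType("fake_sdk")


class Client:
    def __init__(self, secret_key: str) -> None:
        self.secret_key = secret_key


fake_sdk.Client = Client
sys.modules["fake_sdk"] = fake_sdk


class Loader:
    def __init__(self, api_key: str) -> None:
        # Lazy import, mirroring PsychicLoader: the name is resolved on every call.
        from fake_sdk import Client

        self.client = Client(secret_key=api_key)


with patch("fake_sdk.Client") as mock_client:
    loader = Loader(api_key="api_key")
    # The loader received the mock, not the real class.
    mock_client.assert_called_once_with(secret_key="api_key")

# Once the patch exits, the real class is used again.
print(isinstance(Loader(api_key="k").client, Client))  # True
```

Had the loader imported `Psychic` at module top level, the name would already be bound inside `langchain.document_loaders.psychic`, and the test would need to patch `langchain.document_loaders.psychic.Psychic` instead of `psychicapi.Psychic`.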