mirror of
https://github.com/hwchase17/langchain
synced 2024-11-13 19:10:52 +00:00
CrateDB: Documentation about Vector Store, Document Loader, and Memory
parent
0606aabfa3
commit
5f04f9bc80
3
docs/docs/.gitignore
vendored
@ -4,4 +4,5 @@ node_modules/
|
||||
|
||||
.docusaurus
|
||||
.cache-loader
|
||||
docs/api
|
||||
docs/api
|
||||
example.sqlite
|
||||
|
276
docs/docs/integrations/document_loaders/cratedb.ipynb
Normal file
@ -0,0 +1,276 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CrateDB Document Loader\n",
|
||||
"\n",
|
||||
"> [CrateDB] is capable of performing both vector and lexical search.\n",
|
||||
"> It is built on top of the Apache Lucene library, talks SQL,\n",
|
||||
"> is PostgreSQL-compatible, and scales like Elasticsearch.\n",
|
||||
"\n",
|
||||
"This notebook covers how to get started with the CrateDB document loader.\n",
|
||||
"\n",
|
||||
"The CrateDB document loader is based on [SQLAlchemy], and uses LangChain's\n",
|
||||
"SQLDatabaseLoader. It loads the result of a database query with one document\n",
|
||||
"per row.\n",
|
||||
"\n",
|
||||
"[CrateDB]: https://github.com/crate/crate\n",
|
||||
"[SQLAlchemy]: https://www.sqlalchemy.org/\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"The `CrateDBLoader` class helps you get your unstructured content from CrateDB\n",
|
||||
"into LangChain's `Document` format.\n",
|
||||
"\n",
|
||||
"You must provide an SQLAlchemy-compatible connection string, and a query\n",
|
||||
"expression in SQL format. \n",
|
||||
"\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"| Class | Package | Local | Serializable | JS support|\n",
|
||||
"|:-----------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------| :---: | :---: | :---: |\n",
|
"| [CrateDBLoader](https://python.langchain.com/api_reference/cratedb/document_loaders/langchain_cratedb.document_loaders.cratedb.CrateDBLoader.html) | [langchain_cratedb](https://python.langchain.com/api_reference/cratedb/index.html) | ✅ | ❌ | ❌ |\n",
"\n",
|
||||
"### Loader features\n",
|
"| Source | Document Lazy Loading | Async Support |\n",
"| :---: | :---: | :---: |\n",
"| CrateDBLoader | ✅ | ❌ |\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You can run CrateDB Community Edition on your premises, or you can use CrateDB Cloud.\n",
|
||||
"\n",
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"You will supply credentials through a regular SQLAlchemy connection string, like\n",
|
||||
"`crate://username:password@cratedb.example.org/`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"Install the **langchain-community** and **sqlalchemy-cratedb** packages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-community sqlalchemy-cratedb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"Now, initialize the loader and start loading documents. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import CrateDBLoader\n",
|
||||
"\n",
|
||||
"loader = CrateDBLoader(\"SELECT * FROM sys.summits\", url=\"crate://crate@localhost/\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": "## Load"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"documents = loader.load()\n",
|
||||
"print(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": "## Lazy Load\n"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
"page = []\n",
"for doc in loader.lazy_load():\n",
"    page.append(doc)\n",
"    if len(page) >= 10:\n",
"        # do some paged operation, e.g.\n",
"        # index.upsert(page)\n",
"\n",
"        page = []"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
"For detailed documentation of all CrateDBLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/cratedb/document_loaders/langchain_cratedb.document_loaders.cratedb.CrateDBLoader.html"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Tutorial\n",
|
||||
"\n",
|
"### Populate the database"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!crash < ./example_data/mlb_teams_2012.sql\n",
|
||||
"!crash --command \"REFRESH TABLE mlb_teams_2012;\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": "### Usage"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
"from pprint import pprint\n",
"\n",
"from langchain_community.document_loaders import CrateDBLoader\n",
"\n",
"CONNECTION_STRING = \"crate://crate@localhost/\"\n",
"\n",
"loader = CrateDBLoader(\n",
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
"    url=CONNECTION_STRING,\n",
")\n",
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pprint(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": "### Specifying Which Columns are Content vs Metadata"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
"loader = CrateDBLoader(\n",
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
"    url=CONNECTION_STRING,\n",
"    page_content_columns=[\"Team\"],\n",
"    metadata_columns=[\"Payroll (millions)\"],\n",
")\n",
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pprint(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": "### Adding Source to Metadata"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
"loader = CrateDBLoader(\n",
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
"    url=CONNECTION_STRING,\n",
"    source_columns=[\"Team\"],\n",
")\n",
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pprint(documents)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
@ -1,6 +1,7 @@
|
||||
-- Provisioning table "mlb_teams_2012".
|
||||
--
|
||||
-- psql postgresql://postgres@localhost < mlb_teams_2012.sql
|
||||
-- crash < mlb_teams_2012.sql
|
||||
|
||||
DROP TABLE IF EXISTS mlb_teams_2012;
|
||||
CREATE TABLE mlb_teams_2012 ("Team" VARCHAR, "Payroll (millions)" FLOAT, "Wins" BIGINT);
|
||||
|
359
docs/docs/integrations/memory/cratedb_chat_message_history.ipynb
Normal file
@ -0,0 +1,359 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f22eab3f84cbeb37",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"# CrateDB Chat Message History\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to use the `CrateDBChatMessageHistory`\n",
|
||||
"to manage chat history in CrateDB, for supporting conversational memory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7fb27b941602401d91542211134fc71a",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Prerequisites"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "acae54e37e7d407bbb7b55eff062a284",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
"%pip install -qU langchain sqlalchemy-cratedb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f8f2830ee9ca1e01",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Configuration\n",
|
||||
"\n",
|
||||
"To use the storage wrapper, you will need to configure two details.\n",
|
||||
"\n",
|
"1. Session id: A unique identifier of the session, like a user name, email address, or chat id.\n",
"2. Database connection string: An SQLAlchemy-compatible URI that specifies the database\n",
"   connection. It is passed to SQLAlchemy's `create_engine` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 52,
|
||||
"id": "9a63283cbaf04dbcab1f6479b197f3a8",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.chat_message_histories import CrateDBChatMessageHistory\n",
|
||||
"\n",
|
||||
"CONNECTION_STRING = \"crate://crate@localhost:4200/?schema=example\"\n",
|
||||
"\n",
|
"chat_message_history = CrateDBChatMessageHistory(\n",
"    session_id=\"test_session\", connection_string=CONNECTION_STRING\n",
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8dd0d8092fe74a7c96281538738b07e2",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Basic Usage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 53,
|
||||
"id": "4576e914a866fb40",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-28T10:04:38.077748Z",
|
||||
"start_time": "2023-08-28T10:04:36.105894Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_message_history.add_user_message(\"Hello\")\n",
|
||||
"chat_message_history.add_ai_message(\"Hi\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 61,
|
||||
"id": "b476688cbb32ba90",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-28T10:04:38.929396Z",
|
||||
"start_time": "2023-08-28T10:04:38.915727Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "[HumanMessage(content='Hello', additional_kwargs={}, example=False),\n AIMessage(content='Hi', additional_kwargs={}, example=False)]"
|
||||
},
|
||||
"execution_count": 61,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_message_history.messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2e5337719d5614fd",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Custom Storage Model\n",
|
||||
"\n",
|
"The default data model stores only two details per conversation message:\n",
"the session id and the message dictionary.\n",
|
||||
"\n",
|
||||
"If you want to store additional information, like message date, author, language etc.,\n",
|
||||
"please provide an implementation for a custom message converter.\n",
|
||||
"\n",
|
||||
"This example demonstrates how to create a custom message converter, by implementing\n",
|
||||
"the `BaseMessageConverter` interface."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 55,
|
||||
"id": "fdfde84c07d071bb",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-28T10:04:41.510498Z",
|
||||
"start_time": "2023-08-28T10:04:41.494912Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from datetime import datetime\n",
|
||||
"from typing import Any\n",
|
||||
"\n",
|
||||
"import sqlalchemy as sa\n",
|
||||
"from langchain.memory.chat_message_histories.sql import BaseMessageConverter\n",
|
||||
"from langchain.schema import AIMessage, BaseMessage, HumanMessage, SystemMessage\n",
|
||||
"from sqlalchemy.orm import declarative_base\n",
|
||||
"\n",
|
||||
"Base = declarative_base()\n",
|
||||
"\n",
|
||||
"\n",
|
"class CustomMessage(Base):\n",
"    __tablename__ = \"custom_message_store\"\n",
"\n",
"    id = sa.Column(sa.BigInteger, primary_key=True, server_default=sa.func.now())\n",
"    session_id = sa.Column(sa.Text)\n",
"    type = sa.Column(sa.Text)\n",
"    content = sa.Column(sa.Text)\n",
"    created_at = sa.Column(sa.DateTime)\n",
"    author_email = sa.Column(sa.Text)\n",
"\n",
"\n",
"class CustomMessageConverter(BaseMessageConverter):\n",
"    def __init__(self, author_email: str):\n",
"        self.author_email = author_email\n",
"\n",
"    def from_sql_model(self, sql_message: Any) -> BaseMessage:\n",
"        if sql_message.type == \"human\":\n",
"            return HumanMessage(\n",
"                content=sql_message.content,\n",
"            )\n",
"        elif sql_message.type == \"ai\":\n",
"            return AIMessage(\n",
"                content=sql_message.content,\n",
"            )\n",
"        elif sql_message.type == \"system\":\n",
"            return SystemMessage(\n",
"                content=sql_message.content,\n",
"            )\n",
"        else:\n",
"            raise ValueError(f\"Unknown message type: {sql_message.type}\")\n",
"\n",
"    def to_sql_model(self, message: BaseMessage, session_id: str) -> Any:\n",
"        now = datetime.now()\n",
"        return CustomMessage(\n",
"            session_id=session_id,\n",
"            type=message.type,\n",
"            content=message.content,\n",
"            created_at=now,\n",
"            author_email=self.author_email,\n",
"        )\n",
"\n",
"    def get_sql_model_class(self) -> Any:\n",
"        return CustomMessage\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
"    Base.metadata.drop_all(bind=sa.create_engine(CONNECTION_STRING))\n",
"\n",
"    chat_message_history = CrateDBChatMessageHistory(\n",
"        session_id=\"test_session\",\n",
"        connection_string=CONNECTION_STRING,\n",
"        custom_message_converter=CustomMessageConverter(\n",
"            author_email=\"test@example.com\"\n",
"        ),\n",
"    )\n",
"\n",
"    chat_message_history.add_user_message(\"Hello\")\n",
"    chat_message_history.add_ai_message(\"Hi\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 60,
|
||||
"id": "4a6a54d8a9e2856f",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-28T10:04:43.497990Z",
|
||||
"start_time": "2023-08-28T10:04:43.492517Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "[HumanMessage(content='Hello', additional_kwargs={}, example=False),\n AIMessage(content='Hi', additional_kwargs={}, example=False)]"
|
||||
},
|
||||
"execution_count": 60,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_message_history.messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "622aded629a1adeb",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Custom Name for Session Column\n",
|
||||
"\n",
|
||||
"The session id, a unique token identifying the session, is an important property of\n",
|
||||
"this subsystem. If your database table stores it in a different column, you can use\n",
|
||||
"the `session_id_field_name` keyword argument to adjust the name correspondingly."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"id": "72eea5119410473aa328ad9291626812",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"import typing as t\n",
|
||||
"\n",
|
||||
"from langchain.memory.chat_message_histories.cratedb import CrateDBMessageConverter\n",
|
||||
"from langchain.schema import _message_to_dict\n",
|
||||
"\n",
|
||||
"Base = declarative_base()\n",
|
||||
"\n",
|
||||
"\n",
|
"class MessageWithDifferentSessionIdColumn(Base):\n",
"    __tablename__ = \"message_store_different_session_id\"\n",
"    id = sa.Column(sa.BigInteger, primary_key=True, server_default=sa.func.now())\n",
"    custom_session_id = sa.Column(sa.Text)\n",
"    message = sa.Column(sa.Text)\n",
"\n",
"\n",
"class CustomMessageConverterWithDifferentSessionIdColumn(CrateDBMessageConverter):\n",
"    def __init__(self):\n",
"        self.model_class = MessageWithDifferentSessionIdColumn\n",
"\n",
"    def to_sql_model(self, message: BaseMessage, custom_session_id: str) -> t.Any:\n",
"        return self.model_class(\n",
"            custom_session_id=custom_session_id,\n",
"            message=json.dumps(_message_to_dict(message)),\n",
"        )\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
"    Base.metadata.drop_all(bind=sa.create_engine(CONNECTION_STRING))\n",
"\n",
"    chat_message_history = CrateDBChatMessageHistory(\n",
"        session_id=\"test_session\",\n",
"        connection_string=CONNECTION_STRING,\n",
"        custom_message_converter=CustomMessageConverterWithDifferentSessionIdColumn(),\n",
"        session_id_field_name=\"custom_session_id\",\n",
"    )\n",
"\n",
"    chat_message_history.add_user_message(\"Hello\")\n",
"    chat_message_history.add_ai_message(\"Hi\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 58,
|
||||
"id": "8edb47106e1a46a883d545849b8ab81b",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "[HumanMessage(content='Hello', additional_kwargs={}, example=False),\n AIMessage(content='Hi', additional_kwargs={}, example=False)]"
|
||||
},
|
||||
"execution_count": 58,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_message_history.messages"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
203
docs/docs/integrations/providers/cratedb.mdx
Normal file
@ -0,0 +1,203 @@
|
||||
# CrateDB
|
||||
|
||||
This documentation section shows how to use the CrateDB vector store
|
||||
functionality around [`FLOAT_VECTOR`] and [`KNN_MATCH`]. You will learn
|
||||
how to use it for similarity search and other purposes.
|
||||
|
||||
|
||||
## What is CrateDB?
|
||||
|
||||
[CrateDB] is an open-source, distributed, and scalable SQL analytics database
|
||||
for storing and analyzing massive amounts of data in near real-time, even with
|
||||
complex queries. It is PostgreSQL-compatible, based on [Lucene], and inherits
|
||||
the shared-nothing distribution layer of [Elasticsearch].
|
||||
|
||||
It provides a distributed, multi-tenant-capable relational database and search
|
||||
engine with HTTP and PostgreSQL interfaces, and schema-free objects. It supports
|
||||
sharding, partitioning, and replication out of the box.
|
||||
|
||||
CrateDB enables you to efficiently store billions of records, and terabytes of
|
||||
data, and query it using SQL.
|
||||
|
||||
- Provides a standards-based SQL interface for querying relational data, nested
|
||||
documents, geospatial constraints, and vector embeddings at the same time.
|
||||
- Improves your operations by storing time-series data, relational metadata,
|
||||
and vector embeddings within a single database.
|
||||
- Builds upon proven technologies from Lucene and Elasticsearch.
|
||||
|
||||
|
||||
## CrateDB Cloud
|
||||
|
||||
- Offers on-demand CrateDB clusters without operational overhead,
|
||||
with enterprise-grade features and [ISO 27001] certification.
|
||||
- The entrypoint to [CrateDB Cloud] is the [CrateDB Cloud Console].
|
||||
- Crate.io offers a free tier via [CrateDB Cloud CRFREE].
|
||||
- To get started, [sign up] for CrateDB Cloud, deploy a database cluster,
  and follow the instructions below.
|
||||
|
||||
|
||||
## Features
|
||||
|
||||
The CrateDB adapter supports the _Vector Store_, _Document Loader_,
|
||||
and _Conversational Memory_ subsystems of LangChain.
|
||||
|
||||
### Vector Store
|
||||
|
||||
`CrateDBVectorSearch` is an API wrapper around CrateDB's `FLOAT_VECTOR` type
|
||||
and the corresponding `KNN_MATCH` function, based on SQLAlchemy and CrateDB's
|
||||
SQLAlchemy dialect. It provides an interface to store and retrieve floating
|
||||
point vectors, and to conduct similarity searches.
|
||||
|
||||
Supports:
|
||||
- Approximate nearest neighbor search.
|
||||
- Euclidean distance.
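
To make the idea of similarity search by Euclidean distance concrete, here is a minimal sketch in plain Python. It is an exhaustive (exact) scan over a handful of made-up vectors, shown for illustration only. It is not how CrateDB or `KNN_MATCH` is implemented, and the function name `nearest_neighbors` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbors(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs of the k vectors closest to `query`."""
    # Exhaustive scan: compute the Euclidean distance to every stored vector.
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    # A smaller distance means more similar, so sort ascending and keep the top k.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Made-up three-dimensional "embeddings", for illustration only.
stored = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.9, 1.1, 1.0), (5.0, 5.0, 5.0)]
hits = nearest_neighbors((1.0, 1.0, 1.0), stored, k=2)
print(hits)
```

An approximate nearest neighbor search trades a little accuracy for speed by avoiding the full scan; the ranking criterion (smallest distance first) stays the same.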
|
||||
|
||||
### Document Loader
|
||||
|
||||
`CrateDBLoader` loads documents from a database table, using either an SQL
query expression or an SQLAlchemy selectable instance.
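
The "one document per row" idea can be sketched without a database. The snippet below is a simplified illustration, not the loader's actual code: `Doc` is a hypothetical stand-in for LangChain's `Document`, the `key: value` layout of `page_content` is an assumption, and the sample rows are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's `Document`."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_to_documents(
    rows: List[Dict[str, Any]],
    page_content_columns: Optional[List[str]] = None,
    metadata_columns: Optional[List[str]] = None,
) -> List[Doc]:
    """Convert each result row into one document."""
    documents = []
    for row in rows:
        # Without an explicit selection, every column contributes to the content.
        content_columns = page_content_columns or list(row.keys())
        page_content = "\n".join(f"{col}: {row[col]}" for col in content_columns)
        # Only explicitly requested columns end up in the metadata.
        metadata = {col: row[col] for col in (metadata_columns or [])}
        documents.append(Doc(page_content, metadata))
    return documents


# Made-up rows in the shape of a query result.
rows = [
    {"Team": "Team A", "Payroll (millions)": 100.0, "Wins": 80},
    {"Team": "Team B", "Payroll (millions)": 50.0, "Wins": 60},
]
docs = rows_to_documents(rows, ["Team"], ["Payroll (millions)"])
print(docs[0])
```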
|
||||
|
||||
### Conversational Memory
|
||||
|
||||
`CrateDBChatMessageHistory` uses CrateDB to manage conversation history.
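
As a rough mental model, a SQL-backed chat history keeps one row per message, keyed by a session id. The sketch below illustrates that model with the standard library's `sqlite3` as a stand-in database. The class `SessionHistory` and the table layout are hypothetical and simplified; they are not the `CrateDBChatMessageHistory` implementation.

```python
import json
import sqlite3
from typing import Dict, List


class SessionHistory:
    """Hypothetical, simplified stand-in for a SQL-backed chat history."""

    def __init__(self, connection: sqlite3.Connection, session_id: str):
        self.connection = connection
        self.session_id = session_id
        # Two payload columns: the session id and the serialized message.
        connection.execute(
            "CREATE TABLE IF NOT EXISTS message_store "
            "(id INTEGER PRIMARY KEY, session_id TEXT, message TEXT)"
        )

    def add_message(self, message_type: str, content: str) -> None:
        payload = json.dumps({"type": message_type, "content": content})
        self.connection.execute(
            "INSERT INTO message_store (session_id, message) VALUES (?, ?)",
            (self.session_id, payload),
        )

    @property
    def messages(self) -> List[Dict[str, str]]:
        # Only messages of this session are returned, in insertion order.
        rows = self.connection.execute(
            "SELECT message FROM message_store WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        )
        return [json.loads(message) for (message,) in rows]


connection = sqlite3.connect(":memory:")
history = SessionHistory(connection, "test_session")
history.add_message("human", "Hello")
history.add_message("ai", "Hi")
other = SessionHistory(connection, "other_session")
print(history.messages)
```

Keying every row by session id is what lets several conversations share one table without seeing each other's messages.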
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There are multiple ways to get started with CrateDB.
|
||||
|
||||
### Install CrateDB on your local machine
|
||||
|
||||
You can [download CrateDB], or use the [OCI image] to run CrateDB on Docker or Podman.
|
||||
Note that this is not recommended for production use.
|
||||
|
||||
```shell
|
||||
docker run --rm -it --name=cratedb --publish=4200:4200 --publish=5432:5432 \
|
||||
--env=CRATE_HEAP_SIZE=4g crate/crate:nightly \
|
||||
-Cdiscovery.type=single-node
|
||||
```
|
||||
|
||||
### Deploy a cluster on CrateDB Cloud
|
||||
|
||||
[CrateDB Cloud] is a managed CrateDB service. Sign up for a [free trial].
|
||||
|
||||
### Install Client
|
||||
|
||||
```bash
|
||||
pip install crash langchain langchain-openai sqlalchemy-cratedb
|
||||
```
|
||||
|
||||
|
||||
## Usage » Vector Store
|
||||
|
||||
For a more detailed walkthrough of the `CrateDBVectorSearch` wrapper, there is also
|
||||
a corresponding [Jupyter notebook](/docs/extras/integrations/vectorstores/cratedb.html).
|
||||
|
||||
### Provide input data
|
||||
The example uses the canonical `state_of_the_union.txt`.
|
||||
```shell
|
||||
wget https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt
|
||||
```
|
||||
|
||||
### Set environment variables
|
||||
Use a valid OpenAI API key and an SQLAlchemy connection string. The values below are placeholders; the connection string fits a local instance of CrateDB.
|
||||
```shell
|
||||
export OPENAI_API_KEY=foobar
|
||||
export CRATEDB_CONNECTION_STRING=crate://crate@localhost
|
||||
```
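
On the Python side, the connection string can be read from the environment, with a fallback to the local default. This is a small sketch using only the standard library; the helper name `get_connection_string` is hypothetical, and the fallback value simply mirrors the local connection string used on this page.

```python
import os
from typing import Mapping
from urllib.parse import urlsplit

# Fallback for a CrateDB instance running on the local machine.
DEFAULT_CONNECTION_STRING = "crate://crate@localhost"


def get_connection_string(environ: Mapping[str, str] = os.environ) -> str:
    """Return the configured connection string, or the local default."""
    return environ.get("CRATEDB_CONNECTION_STRING", DEFAULT_CONNECTION_STRING)


# Inspect the individual parts of the default connection string.
parts = urlsplit(get_connection_string({}))
print(parts.scheme, parts.username, parts.hostname)
```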
|
||||
|
||||
### Example
|
||||
|
||||
Load and index documents, then run a similarity query.
|
||||
```python
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import CrateDBVectorSearch


def main():
    # Load the document, split it into chunks, embed each chunk and load it into the vector store.
    # `UnstructuredURLLoader` expects a list of URLs.
    raw_documents = UnstructuredURLLoader(
        urls=[
            "https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt"
        ]
    ).load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    documents = text_splitter.split_documents(raw_documents)
    db = CrateDBVectorSearch.from_documents(documents, OpenAIEmbeddings())

    query = "What did the president say about Ketanji Brown Jackson"
    docs = db.similarity_search(query)
    print(docs[0].page_content)


if __name__ == "__main__":
    main()
```
|
||||
|
||||
|
||||
## Usage » Document Loader
|
||||
|
||||
For a more detailed walkthrough of the `CrateDBLoader`, there is also a corresponding
|
||||
[Jupyter notebook](/docs/extras/integrations/document_loaders/cratedb.html).
|
||||
|
||||
|
||||
### Provide input data
|
||||
```shell
wget https://github.com/crate-workbench/langchain/raw/cratedb/docs/docs/integrations/document_loaders/example_data/mlb_teams_2012.sql
crash < ./mlb_teams_2012.sql
crash --command "REFRESH TABLE mlb_teams_2012;"
```
|
||||
|
||||
### Load documents by SQL query
|
||||
```python
from langchain.document_loaders import CrateDBLoader
from pprint import pprint


def main():
    loader = CrateDBLoader(
        'SELECT * FROM mlb_teams_2012 ORDER BY "Team" LIMIT 5;',
        url="crate://crate@localhost/",
    )
    documents = loader.load()
    pprint(documents)


if __name__ == "__main__":
    main()
```
|
||||
|
||||
|
||||
## Usage » Conversational Memory
|
||||
|
||||
For a more detailed walkthrough of the `CrateDBChatMessageHistory`, there is also a corresponding
|
||||
[Jupyter notebook](/docs/extras/integrations/memory/cratedb_chat_message_history.html).
|
||||
|
||||
```python
from langchain.memory.chat_message_histories import CrateDBChatMessageHistory
from pprint import pprint


def main():
    chat_message_history = CrateDBChatMessageHistory(
        session_id="test_session",
        connection_string="crate://crate@localhost/",
    )
    chat_message_history.add_user_message("Hello")
    chat_message_history.add_ai_message("Hi")
    # Print the stored messages rather than the history object itself.
    pprint(chat_message_history.messages)


if __name__ == "__main__":
    main()
```
|
||||
|
||||
|
||||
[CrateDB]: https://github.com/crate/crate
|
||||
[CrateDB Cloud]: https://cratedb.com/product
|
||||
[CrateDB Cloud Console]: https://console.cratedb.cloud/
|
||||
[CrateDB Cloud CRFREE]: https://community.crate.io/t/new-cratedb-cloud-edge-feature-cratedb-cloud-free-tier/1402
|
||||
[CrateDB SQLAlchemy dialect]: https://cratedb.com/docs/sqlalchemy-cratedb/
|
||||
[download CrateDB]: https://cratedb.com/download
|
||||
[Elasticsearch]: https://github.com/elastic/elasticsearch
|
||||
[`FLOAT_VECTOR`]: https://cratedb.com/docs/crate/reference/en/master/general/ddl/data-types.html#float-vector
|
||||
[free trial]: https://cratedb.com/lp-crfree?utm_source=langchain
|
||||
[ISO 27001]: https://cratedb.com/blog/cratedb-elevates-its-security-standards-and-achieves-iso-27001-certification
|
||||
[`KNN_MATCH`]: https://cratedb.com/docs/crate/reference/en/master/general/builtins/scalar-functions.html#scalar-knn-match
|
||||
[Lucene]: https://github.com/apache/lucene
|
||||
[OCI image]: https://hub.docker.com/_/crate
|
||||
[sign up]: https://console.cratedb.cloud/
|
589
docs/docs/integrations/vectorstores/cratedb.ipynb
Normal file
@ -0,0 +1,589 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CrateDB\n",
|
||||
"\n",
|
||||
"> [CrateDB] is capable of performing both vector and lexical search.\n",
|
||||
"> It is built on top of the Apache Lucene library, talks SQL,\n",
|
||||
"> is PostgreSQL-compatible, and scales like Elasticsearch.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the CrateDB vector store functionality around\n",
|
||||
"[`FLOAT_VECTOR`] and [`KNN_MATCH`]. You will learn how to use LangChain's\n",
|
||||
"`CrateDBVectorSearch` adapter for similarity search and other purposes.\n",
|
||||
"\n",
|
||||
"It supports:\n",
|
||||
"- Similarity Search with Euclidean Distance\n",
|
||||
"- Maximal Marginal Relevance Search (MMR)\n",
|
||||
"\n",
|
||||
"## What is CrateDB?\n",
|
||||
"\n",
|
||||
"[CrateDB] is an open-source, distributed, and scalable SQL analytics database\n",
|
||||
"for storing and analyzing massive amounts of data in near real-time, even with\n",
|
||||
"complex queries. It is PostgreSQL-compatible, based on [Lucene], and inherits\n",
|
||||
"the shared-nothing distribution layer of [Elasticsearch].\n",
|
||||
"\n",
|
||||
"This example uses the [Python client driver for CrateDB]. For more documentation,\n",
|
||||
"see also [LangChain with CrateDB].\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"[CrateDB]: https://github.com/crate/crate\n",
|
||||
"[Elasticsearch]: https://github.com/elastic/elasticsearch\n",
|
||||
"[`FLOAT_VECTOR`]: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#float-vector\n",
|
||||
"[`KNN_MATCH`]: https://cratedb.com/docs/crate/reference/en/latest/general/builtins/scalar-functions.html#scalar-knn-match\n",
|
||||
"[LangChain with CrateDB]: /docs/extras/integrations/providers/cratedb.html\n",
|
||||
"[Lucene]: https://github.com/apache/lucene\n",
|
||||
"[Python client driver for CrateDB]: https://cratedb.com/docs/python/"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
"To use the CrateDB vector store, install the `sqlalchemy-cratedb` package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
"# Install required packages: LangChain, OpenAI SDK, python-dotenv, and the CrateDB SQLAlchemy adapter.\n",
"%pip install -qU langchain-community langchain-openai python-dotenv sqlalchemy-cratedb"
|
||||
]
|
||||
},
|
||||
{
|
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"You will supply credentials through a regular SQLAlchemy connection string, like\n",
|
||||
"`crate://username:password@cratedb.example.org/`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"### OpenAI API key\n",
|
||||
"\n",
|
||||
"You need to provide an OpenAI API key, optionally using the environment\n",
|
||||
"variable `OPENAI_API_KEY`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-09-09T08:02:16.802456Z",
|
||||
"start_time": "2023-09-09T08:02:07.065604Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from dotenv import find_dotenv, load_dotenv\n",
|
||||
"\n",
|
||||
"# Run `export OPENAI_API_KEY=sk-YOUR_OPENAI_API_KEY`.\n",
|
||||
"# Get OpenAI api key from `.env` file.\n",
|
||||
"# Otherwise, prompt for it.\n",
|
||||
"_ = load_dotenv(find_dotenv())\n",
|
||||
"OPENAI_API_KEY = os.environ.get(\"OPENAI_API_KEY\", getpass.getpass(\"OpenAI API key:\"))\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"You also need to provide a connection string to your CrateDB database cluster,\n",
|
||||
"optionally using the environment variable `CRATEDB_CONNECTION_STRING`.\n",
|
||||
"\n",
|
||||
"This example uses a CrateDB instance on your workstation, which you can start by\n",
|
||||
"running [CrateDB using Docker]. Alternatively, you can also connect to a cluster\n",
|
||||
"running on [CrateDB Cloud].\n",
|
||||
"\n",
|
||||
"[CrateDB Cloud]: https://console.cratedb.cloud/\n",
|
||||
"[CrateDB using Docker]: https://cratedb.com/docs/guide/install/container/\n",
|
||||
"\n",
|
||||
"### CrateDB connection string\n",
|
||||
"\n",
|
||||
"You will need to supply an SQLAlchemy-compatible connection string."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"CONNECTION_STRING = os.environ.get(\n",
|
||||
" \"CRATEDB_CONNECTION_STRING\",\n",
|
||||
" \"crate://crate@localhost:4200/?schema=langchain\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# For CrateDB Cloud, use:\n",
|
||||
"# CONNECTION_STRING = os.environ.get(\n",
|
||||
"# \"CRATEDB_CONNECTION_STRING\",\n",
|
||||
"# \"crate://username:password@hostname:4200/?ssl=true&schema=langchain\",\n",
|
||||
"# )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-09-09T08:02:28.174088Z",
|
||||
"start_time": "2023-09-09T08:02:28.162698Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\"\"\"\n",
|
||||
"# Alternatively, the connection string can be assembled from individual\n",
|
||||
"# environment variables.\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"CONNECTION_STRING = CrateDBVectorSearch.connection_string_from_db_params(\n",
|
||||
" driver=os.environ.get(\"CRATEDB_DRIVER\", \"crate\"),\n",
|
||||
" host=os.environ.get(\"CRATEDB_HOST\", \"localhost\"),\n",
|
||||
" port=int(os.environ.get(\"CRATEDB_PORT\", \"4200\")),\n",
|
||||
" database=os.environ.get(\"CRATEDB_DATABASE\", \"langchain\"),\n",
|
||||
" user=os.environ.get(\"CRATEDB_USER\", \"crate\"),\n",
|
||||
" password=os.environ.get(\"CRATEDB_PASSWORD\", \"\"),\n",
|
||||
")\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"### Import Python Modules\n",
|
||||
"\n",
|
||||
"You will start by importing all required modules."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"from langchain.document_loaders import UnstructuredURLLoader\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import CrateDBVectorSearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"In the example above, you created a vector store from scratch. When\n",
|
||||
"aiming to work with an existing vector store, you can initialize it directly."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
"store = CrateDBVectorSearch(\n",
|
||||
" collection_name=\"testdrive\",\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"You can also add documents to an existing vector store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"store.add_documents([Document(page_content=\"foo\")])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"jupyter": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_with_score = store.similarity_search_with_score(\"foo\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_with_score[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_with_score[1]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Update items in vector store\n",
|
||||
"\n",
|
||||
"FIXME"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Foo."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store\n",
|
||||
"FIXME"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"store.delete(ids=[\"foo\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"### Load and Index Documents\n",
|
||||
"\n",
|
||||
"Next, you will read input data, and tokenize it. The module will create a table\n",
|
||||
"with the name of the collection. Make sure the collection name is unique, and\n",
|
||||
"that you have the permission to create a table."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredURLLoader(\n",
|
||||
" \"https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt\"\n",
|
||||
")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"COLLECTION_NAME = \"state_of_the_union_test\"\n",
|
||||
"\n",
|
||||
"db = CrateDBVectorSearch.from_documents(\n",
|
||||
" embedding=embeddings,\n",
|
||||
" documents=docs,\n",
|
||||
" collection_name=COLLECTION_NAME,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Overwriting a Vector Store\n",
|
||||
"\n",
|
||||
"If you have an existing collection, you can overwrite it by using `from_documents`,\n",
|
||||
"aad setting `pre_delete_collection = True`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = CrateDBVectorSearch.from_documents(\n",
|
||||
" documents=docs,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" collection_name=COLLECTION_NAME,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
" pre_delete_collection=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_with_score = db.similarity_search_with_score(\"foo\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_with_score[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"Searching by euclidean distance is the default."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-09-09T08:05:11.104135Z",
|
||||
"start_time": "2023-09-09T08:05:10.548998Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs_with_score = db.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-09-09T08:05:13.532334Z",
|
||||
"start_time": "2023-09-09T08:05:13.523191Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for doc, score in docs_with_score:\n",
|
||||
" print(\"-\" * 80)\n",
|
||||
" print(\"Score: \", score)\n",
|
||||
" print(doc.page_content)\n",
|
||||
" print(\"-\" * 80)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"#### Maximal Marginal Relevance Search (MMR)\n",
|
||||
"Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-09-09T08:05:23.276819Z",
|
||||
"start_time": "2023-09-09T08:05:21.972256Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_with_score = db.max_marginal_relevance_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-09-09T08:05:27.478580Z",
|
||||
"start_time": "2023-09-09T08:05:27.470138Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for doc, score in docs_with_score:\n",
|
||||
" print(\"-\" * 80)\n",
|
||||
" print(\"Score: \", score)\n",
|
||||
" print(doc.page_content)\n",
|
||||
" print(\"-\" * 80)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"#### Searching in Multiple Collections\n",
|
||||
"`CrateDBVectorSearchMultiCollection` is a special adapter which provides similarity search across\n",
|
||||
"multiple collections. It can not be used for indexing documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores.cratedb import CrateDBVectorSearchMultiCollection\n",
|
||||
"\n",
|
||||
"multisearch = CrateDBVectorSearchMultiCollection(\n",
|
||||
" collection_names=[\"test_collection_1\", \"test_collection_2\"],\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
")\n",
|
||||
"docs_with_score = multisearch.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": "### Query by turning into retriever"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = store.as_retriever()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(retriever)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `CrateDBVectorSearch` features and configurations,\n",
|
||||
"head to the API reference:\n",
|
||||
"https://python.langchain.com/api_reference/cratedb/vectorstores/langchain_cratedb.vectorstores.CrateDBVectorSearch.html"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|