CrateDB: Documentation about Vector Store, Document Loader, and Memory

2024-11-13 19:10:52 +00:00 · 2024-10-29 14:25:01 +01:00 · 2024-10-29 14:25:01 +01:00 · 5f04f9bc80
commit 5f04f9bc80
parent 0606aabfa3
6 changed files with 1430 additions and 1 deletions
--- a/docs/docs/.gitignore
+++ b/docs/docs/.gitignore
@@ -4,4 +4,5 @@ node_modules/

 .docusaurus
 .cache-loader
-docs/api
+docs/api
+example.sqlite
--- a/docs/docs/integrations/document_loaders/cratedb.ipynb
+++ b/docs/docs/integrations/document_loaders/cratedb.ipynb
@@ -0,0 +1,276 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# CrateDB Document Loader\n",
+    "\n",
+    "> [CrateDB] is capable of performing both vector and lexical search.\n",
+    "> It is built on top of the Apache Lucene library, talks SQL,\n",
+    "> is PostgreSQL-compatible, and scales like Elasticsearch.\n",
+    "\n",
+    "This notebook covers how to get started with the CrateDB document loader.\n",
+    "\n",
+    "The CrateDB document loader is based on [SQLAlchemy], and uses LangChain's\n",
+    "SQLDatabaseLoader. It loads the result of a database query with one document\n",
+    "per row.\n",
+    "\n",
+    "[CrateDB]: https://github.com/crate/crate\n",
+    "[SQLAlchemy]: https://www.sqlalchemy.org/\n",
+    "\n",
+    "## Overview\n",
+    "\n",
+    "The `CrateDBLoader` class helps you get your unstructured content from CrateDB\n",
+    "into LangChain's `Document` format.\n",
+    "\n",
+    "You must provide an SQLAlchemy-compatible connection string, and a query\n",
+    "expression in SQL format. \n",
+    "\n",
+    "### Integration details\n",
+    "\n",
+    "| Class                                                                                                                                          | Package                                                                        | Local | Serializable | JS support|\n",
+    "|:-----------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------| :---: | :---: |  :---: |\n",
+    "| [CrateDBLoader](https://python.langchain.com/api_reference/cratedb/document_loaders/langchain_cratedb.document_loaders.cratedb.CrateDBLoader.html) | [langchain_cratedb](https://python.langchain.com/api_reference/cratedb/index.html) | ✅ | ❌ | ❌ |\n",
+    "### Loader features\n",
+    "| Source | Document Lazy Loading | Async Support |\n",
+    "| :---: | :---: | :---: |\n",
+    "| CrateDBLoader | ✅ | ❌ |\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "You can run CrateDB Community Edition on your premises, or you can use CrateDB Cloud.\n",
+    "\n",
+    "### Credentials\n",
+    "\n",
+    "You will supply credentials through a regular SQLAlchemy connection string, like\n",
+    "`crate://username:password@cratedb.example.org/`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Installation\n",
+    "\n",
+    "Install the **langchain-community** and **sqlalchemy-cratedb** packages."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -qU langchain-community sqlalchemy-cratedb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization\n",
+    "\n",
+    "Now, initialize the loader and start loading documents. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.document_loaders import CrateDBLoader\n",
+    "\n",
+    "loader = CrateDBLoader(\"SELECT * FROM sys.summits\", url=\"crate://crate@localhost/\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": "## Load"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "documents = loader.load()\n",
+    "print(documents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "## Lazy Load\n"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "page = []\n",
+    "for doc in loader.lazy_load():\n",
+    "    page.append(doc)\n",
+    "    if len(page) >= 10:\n",
+    "        # do some paged operation, e.g.\n",
+    "        # index.upsert(page)\n",
+    "\n",
+    "        page = []"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## API reference\n",
+    "\n",
+    "For detailed documentation of all `CrateDBLoader` features and configurations, head to the API reference: https://python.langchain.com/api_reference/cratedb/document_loaders/langchain_cratedb.document_loaders.cratedb.CrateDBLoader.html"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Tutorial\n",
+    "\n",
+    "### Populate database"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "!crash < ./example_data/mlb_teams_2012.sql\n",
+    "!crash --command \"REFRESH TABLE mlb_teams_2012;\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": "### Usage"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from pprint import pprint\n",
+    "\n",
+    "from langchain_community.document_loaders import CrateDBLoader\n",
+    "\n",
+    "CONNECTION_STRING = \"crate://crate@localhost/\"\n",
+    "\n",
+    "loader = CrateDBLoader(\n",
+    "    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
+    "    url=CONNECTION_STRING,\n",
+    ")\n",
+    "documents = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "pprint(documents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "### Specifying Which Columns are Content vs Metadata"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = CrateDBLoader(\n",
+    "    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
+    "    url=CONNECTION_STRING,\n",
+    "    page_content_columns=[\"Team\"],\n",
+    "    metadata_columns=[\"Payroll (millions)\"],\n",
+    ")\n",
+    "documents = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pprint(documents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "### Adding Source to Metadata"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = CrateDBLoader(\n",
+    "    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
+    "    url=CONNECTION_STRING,\n",
+    "    source_columns=[\"Team\"],\n",
+    ")\n",
+    "documents = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pprint(documents)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
--- a/docs/docs/integrations/document_loaders/example_data/mlb_teams_2012.sql
+++ b/docs/docs/integrations/document_loaders/example_data/mlb_teams_2012.sql
@@ -1,6 +1,7 @@
 -- Provisioning table "mlb_teams_2012".
 --
 -- psql postgresql://postgres@localhost < mlb_teams_2012.sql
+-- crash < mlb_teams_2012.sql

 DROP TABLE IF EXISTS mlb_teams_2012;
 CREATE TABLE mlb_teams_2012 ("Team" VARCHAR, "Payroll (millions)" FLOAT, "Wins" BIGINT);
--- a/docs/docs/integrations/memory/cratedb_chat_message_history.ipynb
+++ b/docs/docs/integrations/memory/cratedb_chat_message_history.ipynb
@@ -0,0 +1,359 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "f22eab3f84cbeb37",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "# CrateDB Chat Message History\n",
+    "\n",
+    "This notebook demonstrates how to use the `CrateDBChatMessageHistory`\n",
+    "to manage chat history in CrateDB, for supporting conversational memory."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7fb27b941602401d91542211134fc71a",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Prerequisites"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "acae54e37e7d407bbb7b55eff062a284",
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "%pip install -qU langchain sqlalchemy-cratedb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f8f2830ee9ca1e01",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Configuration\n",
+    "\n",
+    "To use the storage wrapper, you need to configure two details.\n",
+    "\n",
+    "1. Session id: A unique identifier of the session, like a user name, email address, or chat id.\n",
+    "2. Database connection string: An SQLAlchemy-compatible URI that specifies the database\n",
+    "   connection. It is passed to SQLAlchemy's `create_engine` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "id": "9a63283cbaf04dbcab1f6479b197f3a8",
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from langchain.memory.chat_message_histories import CrateDBChatMessageHistory\n",
+    "\n",
+    "CONNECTION_STRING = \"crate://crate@localhost:4200/?schema=example\"\n",
+    "\n",
+    "chat_message_history = CrateDBChatMessageHistory(\n",
+    "    session_id=\"test_session\", connection_string=CONNECTION_STRING\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8dd0d8092fe74a7c96281538738b07e2",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Basic Usage"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "id": "4576e914a866fb40",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-08-28T10:04:38.077748Z",
+     "start_time": "2023-08-28T10:04:36.105894Z"
+    },
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "chat_message_history.add_user_message(\"Hello\")\n",
+    "chat_message_history.add_ai_message(\"Hi\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "id": "b476688cbb32ba90",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-08-28T10:04:38.929396Z",
+     "start_time": "2023-08-28T10:04:38.915727Z"
+    },
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "[HumanMessage(content='Hello', additional_kwargs={}, example=False),\n AIMessage(content='Hi', additional_kwargs={}, example=False)]"
+     },
+     "execution_count": 61,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "chat_message_history.messages"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e5337719d5614fd",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Custom Storage Model\n",
+    "\n",
+    "The default data model stores only the essentials of a conversation message.\n",
+    "It has two slots: the session id and the message dictionary.\n",
+    "\n",
+    "If you want to store additional information, like message date, author, or language,\n",
+    "provide an implementation of a custom message converter.\n",
+    "\n",
+    "This example demonstrates how to create a custom message converter, by implementing\n",
+    "the `BaseMessageConverter` interface."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "fdfde84c07d071bb",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-08-28T10:04:41.510498Z",
+     "start_time": "2023-08-28T10:04:41.494912Z"
+    },
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from datetime import datetime\n",
+    "from typing import Any\n",
+    "\n",
+    "import sqlalchemy as sa\n",
+    "from langchain.memory.chat_message_histories.sql import BaseMessageConverter\n",
+    "from langchain.schema import AIMessage, BaseMessage, HumanMessage, SystemMessage\n",
+    "from sqlalchemy.orm import declarative_base\n",
+    "\n",
+    "Base = declarative_base()\n",
+    "\n",
+    "\n",
+    "class CustomMessage(Base):\n",
+    "    __tablename__ = \"custom_message_store\"\n",
+    "\n",
+    "    id = sa.Column(sa.BigInteger, primary_key=True, server_default=sa.func.now())\n",
+    "    session_id = sa.Column(sa.Text)\n",
+    "    type = sa.Column(sa.Text)\n",
+    "    content = sa.Column(sa.Text)\n",
+    "    created_at = sa.Column(sa.DateTime)\n",
+    "    author_email = sa.Column(sa.Text)\n",
+    "\n",
+    "\n",
+    "class CustomMessageConverter(BaseMessageConverter):\n",
+    "    def __init__(self, author_email: str):\n",
+    "        self.author_email = author_email\n",
+    "\n",
+    "    def from_sql_model(self, sql_message: Any) -> BaseMessage:\n",
+    "        if sql_message.type == \"human\":\n",
+    "            return HumanMessage(\n",
+    "                content=sql_message.content,\n",
+    "            )\n",
+    "        elif sql_message.type == \"ai\":\n",
+    "            return AIMessage(\n",
+    "                content=sql_message.content,\n",
+    "            )\n",
+    "        elif sql_message.type == \"system\":\n",
+    "            return SystemMessage(\n",
+    "                content=sql_message.content,\n",
+    "            )\n",
+    "        else:\n",
+    "            raise ValueError(f\"Unknown message type: {sql_message.type}\")\n",
+    "\n",
+    "    def to_sql_model(self, message: BaseMessage, session_id: str) -> Any:\n",
+    "        now = datetime.now()\n",
+    "        return CustomMessage(\n",
+    "            session_id=session_id,\n",
+    "            type=message.type,\n",
+    "            content=message.content,\n",
+    "            created_at=now,\n",
+    "            author_email=self.author_email,\n",
+    "        )\n",
+    "\n",
+    "    def get_sql_model_class(self) -> Any:\n",
+    "        return CustomMessage\n",
+    "\n",
+    "\n",
+    "if __name__ == \"__main__\":\n",
+    "    Base.metadata.drop_all(bind=sa.create_engine(CONNECTION_STRING))\n",
+    "\n",
+    "    chat_message_history = CrateDBChatMessageHistory(\n",
+    "        session_id=\"test_session\",\n",
+    "        connection_string=CONNECTION_STRING,\n",
+    "        custom_message_converter=CustomMessageConverter(\n",
+    "            author_email=\"test@example.com\"\n",
+    "        ),\n",
+    "    )\n",
+    "\n",
+    "    chat_message_history.add_user_message(\"Hello\")\n",
+    "    chat_message_history.add_ai_message(\"Hi\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "id": "4a6a54d8a9e2856f",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-08-28T10:04:43.497990Z",
+     "start_time": "2023-08-28T10:04:43.492517Z"
+    },
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "[HumanMessage(content='Hello', additional_kwargs={}, example=False),\n AIMessage(content='Hi', additional_kwargs={}, example=False)]"
+     },
+     "execution_count": 60,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "chat_message_history.messages"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "622aded629a1adeb",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Custom Name for Session Column\n",
+    "\n",
+    "The session id, a unique token identifying the session, is an important property of\n",
+    "this subsystem. If your database table stores it in a different column, you can use\n",
+    "the `session_id_field_name` keyword argument to adjust the name correspondingly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "id": "72eea5119410473aa328ad9291626812",
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import typing as t\n",
+    "\n",
+    "from langchain.memory.chat_message_histories.cratedb import CrateDBMessageConverter\n",
+    "from langchain.schema import _message_to_dict\n",
+    "\n",
+    "Base = declarative_base()\n",
+    "\n",
+    "\n",
+    "class MessageWithDifferentSessionIdColumn(Base):\n",
+    "    __tablename__ = \"message_store_different_session_id\"\n",
+    "    id = sa.Column(sa.BigInteger, primary_key=True, server_default=sa.func.now())\n",
+    "    custom_session_id = sa.Column(sa.Text)\n",
+    "    message = sa.Column(sa.Text)\n",
+    "\n",
+    "\n",
+    "class CustomMessageConverterWithDifferentSessionIdColumn(CrateDBMessageConverter):\n",
+    "    def __init__(self):\n",
+    "        self.model_class = MessageWithDifferentSessionIdColumn\n",
+    "\n",
+    "    def to_sql_model(self, message: BaseMessage, custom_session_id: str) -> t.Any:\n",
+    "        return self.model_class(\n",
+    "            custom_session_id=custom_session_id,\n",
+    "            message=json.dumps(_message_to_dict(message)),\n",
+    "        )\n",
+    "\n",
+    "\n",
+    "if __name__ == \"__main__\":\n",
+    "    Base.metadata.drop_all(bind=sa.create_engine(CONNECTION_STRING))\n",
+    "\n",
+    "    chat_message_history = CrateDBChatMessageHistory(\n",
+    "        session_id=\"test_session\",\n",
+    "        connection_string=CONNECTION_STRING,\n",
+    "        custom_message_converter=CustomMessageConverterWithDifferentSessionIdColumn(),\n",
+    "        session_id_field_name=\"custom_session_id\",\n",
+    "    )\n",
+    "\n",
+    "    chat_message_history.add_user_message(\"Hello\")\n",
+    "    chat_message_history.add_ai_message(\"Hi\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 58,
+   "id": "8edb47106e1a46a883d545849b8ab81b",
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "[HumanMessage(content='Hello', additional_kwargs={}, example=False),\n AIMessage(content='Hi', additional_kwargs={}, example=False)]"
+     },
+     "execution_count": 58,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "chat_message_history.messages"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/docs/docs/integrations/providers/cratedb.mdx
+++ b/docs/docs/integrations/providers/cratedb.mdx
@@ -0,0 +1,203 @@
+# CrateDB
+
+This documentation section shows how to use the CrateDB vector store
+functionality around [`FLOAT_VECTOR`] and [`KNN_MATCH`]. You will learn
+how to use it for similarity search and other purposes.
+
+
+## What is CrateDB?
+
+[CrateDB] is an open-source, distributed, and scalable SQL analytics database
+for storing and analyzing massive amounts of data in near real-time, even with
+complex queries. It is PostgreSQL-compatible, based on [Lucene], and inherits
+the shared-nothing distribution layer of [Elasticsearch].
+
+It provides a distributed, multi-tenant-capable relational database and search
+engine with HTTP and PostgreSQL interfaces, and schema-free objects. It supports
+sharding, partitioning, and replication out of the box.
+
+CrateDB enables you to efficiently store billions of records, and terabytes of
+data, and query it using SQL.
+
+- Provides a standards-based SQL interface for querying relational data, nested
+  documents, geospatial constraints, and vector embeddings at the same time.
+- Improves your operations by storing time-series data, relational metadata,
+  and vector embeddings within a single database.
+- Builds upon proven technologies from Lucene and Elasticsearch.
+
+
+## CrateDB Cloud
+
+- Offers on-demand CrateDB clusters without operational overhead, 
+  with enterprise-grade features and [ISO 27001] certification.
+- The entrypoint to [CrateDB Cloud] is the [CrateDB Cloud Console].
+- Crate.io offers a free tier via [CrateDB Cloud CRFREE].
+- To get started, [sign up] to CrateDB Cloud, deploy a database cluster,
+  and follow the upcoming instructions.
+
+
+## Features
+
+The CrateDB adapter supports the _Vector Store_, _Document Loader_,
+and _Conversational Memory_ subsystems of LangChain.
+
+### Vector Store
+
+`CrateDBVectorSearch` is an API wrapper around CrateDB's `FLOAT_VECTOR` type
+and the corresponding `KNN_MATCH` function, based on SQLAlchemy and the
+[CrateDB SQLAlchemy dialect]. It provides an interface to store and retrieve
+floating point vectors, and to conduct similarity searches.
+
+Supports:
+- Approximate nearest neighbor search.
+- Euclidean distance.
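
To make the distance metric concrete, here is a minimal pure-Python sketch that ranks stored vectors by Euclidean distance to a query vector. It is an exact brute-force scan for illustration only, not how CrateDB's `KNN_MATCH` works internally (that performs approximate search inside the database). The helper name `nearest_neighbors` and the sample vectors are assumptions.

```python
import math


def nearest_neighbors(query, vectors, k=2):
    """Return the `k` (label, distance) pairs closest to `query`.

    Hypothetical helper: exact brute-force scan over all stored vectors,
    ranked by Euclidean distance (smaller means more similar).
    """
    scored = [(label, math.dist(query, vector)) for label, vector in vectors.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Made-up sample embeddings, for illustration only.
store = {
    "doc-a": [0.0, 0.0],
    "doc-b": [3.0, 4.0],
    "doc-c": [1.0, 1.0],
}

print(nearest_neighbors([0.0, 0.0], store, k=2))
```

A scan like this touches every vector, which is why a database-side approximate index is preferable once the collection grows.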
+
+### Document Loader
+
+`CrateDBLoader` loads documents from a database table, using an SQL
+query expression or an SQLAlchemy selectable instance.
+
+### Conversational Memory
+
+`CrateDBChatMessageHistory` uses CrateDB to manage conversation history.
+
+
+## Installation and Setup
+
+There are multiple ways to get started with CrateDB.
+
+### Install CrateDB on your local machine
+
+You can [download CrateDB], or use the [OCI image] to run CrateDB on Docker or Podman.
+Note that this is not recommended for production use.
+
+```shell
+docker run --rm -it --name=cratedb --publish=4200:4200 --publish=5432:5432 \
+    --env=CRATE_HEAP_SIZE=4g crate/crate:nightly \
+    -Cdiscovery.type=single-node
+```
+
+### Deploy a cluster on CrateDB Cloud
+
+[CrateDB Cloud] is a managed CrateDB service. Sign up for a [free trial].
+
+### Install Client
+
+```bash
+pip install crash langchain langchain-openai sqlalchemy-cratedb
+```
+
+
+## Usage » Vector Store
+
+For a more detailed walkthrough of the `CrateDBVectorSearch` wrapper, there is also
+a corresponding [Jupyter notebook](/docs/extras/integrations/vectorstores/cratedb.html).
+
+### Provide input data
+The example uses the canonical `state_of_the_union.txt`.
+```shell
+wget https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt
+```
+
+### Set environment variables
+Use a valid OpenAI API key and SQL connection string. This one fits a local instance of CrateDB.
+```shell
+export OPENAI_API_KEY=foobar
+export CRATEDB_CONNECTION_STRING=crate://crate@localhost
+```
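
The connection string is a regular URL, so its parts can be inspected with Python's standard library before handing it to SQLAlchemy. The sketch below reads the `CRATEDB_CONNECTION_STRING` variable and falls back to a local default. The helper name `describe_connection` and the fallback URL with its `schema` option are assumptions for illustration.

```python
import os
from urllib.parse import parse_qs, urlsplit


def describe_connection(url):
    """Split a connection URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        # Query parameters such as `schema` arrive as lists of values.
        "options": parse_qs(parts.query),
    }


# Fall back to a local instance when the variable is not set.
url = os.environ.get(
    "CRATEDB_CONNECTION_STRING",
    "crate://crate@localhost:4200/?schema=langchain",
)
print(describe_connection(url))
```

The helper deliberately leaves the password out of its result, so the printed summary is safe to share when debugging.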
+
+### Example
+
+Load and index documents, then run a similarity query.
+```python
+from langchain.document_loaders import UnstructuredURLLoader
+from langchain.embeddings.openai import OpenAIEmbeddings
+from langchain.text_splitter import CharacterTextSplitter
+from langchain.vectorstores import CrateDBVectorSearch
+
+
+def main():
+  # Load the document, split it into chunks, embed each chunk and load it into the vector store.
+  raw_documents = UnstructuredURLLoader(urls=["https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt"]).load()
+  text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
+  documents = text_splitter.split_documents(raw_documents)
+  db = CrateDBVectorSearch.from_documents(documents, OpenAIEmbeddings())
+
+  query = "What did the president say about Ketanji Brown Jackson"
+  docs = db.similarity_search(query)
+  print(docs[0].page_content)
+
+
+if __name__ == "__main__":
+  main()
+```
+
+
+## Usage » Document Loader
+
+For a more detailed walkthrough of the `CrateDBLoader`, there is also a corresponding
+[Jupyter notebook](/docs/extras/integrations/document_loaders/cratedb.html).
+
+
+### Provide input data
+```shell
+wget https://github.com/crate-workbench/langchain/raw/cratedb/docs/docs/integrations/document_loaders/example_data/mlb_teams_2012.sql
+crash < ./mlb_teams_2012.sql
+crash --command "REFRESH TABLE mlb_teams_2012;"
+```
+
+### Load documents by SQL query
+```python
+from langchain.document_loaders import CrateDBLoader
+from pprint import pprint
+
+def main():
+  loader = CrateDBLoader(
+      'SELECT * FROM mlb_teams_2012 ORDER BY "Team" LIMIT 5;',
+      url="crate://crate@localhost/",
+  )
+  documents = loader.load()
+  pprint(documents)
+
+if __name__ == "__main__":
+  main()
+```
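
Conceptually, the loader turns each result row into one document, with some columns forming the text and others the metadata. The following sketch imitates that mapping with the standard library's `sqlite3` module instead of CrateDB, so it runs without a database server. The `Doc` class and the `rows_to_docs` function are illustrative stand-ins, not the `CrateDBLoader` implementation, and the `key: value` text layout and sample rows are assumptions.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for LangChain's `Document` (illustrative only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_docs(cursor, page_content_columns, metadata_columns):
    """Map each result row to one `Doc`."""
    names = [column[0] for column in cursor.description]
    docs = []
    for row in cursor:
        record = dict(zip(names, row))
        text = "\n".join(f"{name}: {record[name]}" for name in page_content_columns)
        meta = {name: record[name] for name in metadata_columns}
        docs.append(Doc(page_content=text, metadata=meta))
    return docs


# In-memory SQLite table with made-up rows, standing in for `mlb_teams_2012`.
connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE mlb_teams_2012 ("Team" TEXT, "Payroll (millions)" REAL, "Wins" INTEGER)'
)
connection.executemany(
    "INSERT INTO mlb_teams_2012 VALUES (?, ?, ?)",
    [("Alpha", 10.5, 90), ("Beta", 20.0, 80)],
)
cursor = connection.execute('SELECT * FROM mlb_teams_2012 ORDER BY "Team"')
docs = rows_to_docs(cursor, ["Team"], ["Payroll (millions)"])
print(docs)
```

Columns that appear in neither list, like `Wins` here, are simply left out of the resulting documents in this sketch.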
+
+
+## Usage » Conversational Memory
+
+For a more detailed walkthrough of the `CrateDBChatMessageHistory`, there is also a corresponding
+[Jupyter notebook](/docs/extras/integrations/memory/cratedb_chat_message_history.html).
+
+```python
+from langchain.memory.chat_message_histories import CrateDBChatMessageHistory
+from pprint import pprint
+
+def main():
+  chat_message_history = CrateDBChatMessageHistory(
+      session_id="test_session",
+      connection_string="crate://crate@localhost/",
+  )
+  chat_message_history.add_user_message("Hello")
+  chat_message_history.add_ai_message("Hi")
+  pprint(chat_message_history.messages)
+
+if __name__ == "__main__":
+  main()
+```
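
Conceptually, a chat history of this kind boils down to rows keyed by a session id, each holding one serialized message. Here is a minimal sketch of that storage model using `sqlite3` and `json` from the standard library. The table layout and the `SketchHistory` class are assumptions for illustration and do not mirror the actual `CrateDBChatMessageHistory` schema.

```python
import json
import sqlite3


class SketchHistory:
    """Illustrative per-session message store (not the LangChain class)."""

    def __init__(self, connection, session_id):
        self.connection = connection
        self.session_id = session_id
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS message_store "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT, message TEXT)"
        )

    def add_message(self, role, content):
        # Each message is stored as one JSON document in the `message` column.
        payload = json.dumps({"type": role, "content": content})
        self.connection.execute(
            "INSERT INTO message_store (session_id, message) VALUES (?, ?)",
            (self.session_id, payload),
        )

    @property
    def messages(self):
        # Only rows of this session are returned, in insertion order.
        rows = self.connection.execute(
            "SELECT message FROM message_store WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        )
        return [json.loads(row[0]) for row in rows]


connection = sqlite3.connect(":memory:")
history = SketchHistory(connection, "test_session")
history.add_message("human", "Hello")
history.add_message("ai", "Hi")
other = SketchHistory(connection, "other_session")
other.add_message("human", "Unrelated")
print(history.messages)
```

Because every query filters on the session id, two conversations can share one table without seeing each other's messages.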
+
+
+[CrateDB]: https://github.com/crate/crate
+[CrateDB Cloud]: https://cratedb.com/product
+[CrateDB Cloud Console]: https://console.cratedb.cloud/
+[CrateDB Cloud CRFREE]: https://community.crate.io/t/new-cratedb-cloud-edge-feature-cratedb-cloud-free-tier/1402
+[CrateDB SQLAlchemy dialect]: https://cratedb.com/docs/sqlalchemy-cratedb/
+[download CrateDB]: https://cratedb.com/download
+[Elasticsearch]: https://github.com/elastic/elasticsearch
+[`FLOAT_VECTOR`]: https://cratedb.com/docs/crate/reference/en/master/general/ddl/data-types.html#float-vector
+[free trial]: https://cratedb.com/lp-crfree?utm_source=langchain
+[ISO 27001]: https://cratedb.com/blog/cratedb-elevates-its-security-standards-and-achieves-iso-27001-certification
+[`KNN_MATCH`]: https://cratedb.com/docs/crate/reference/en/master/general/builtins/scalar-functions.html#scalar-knn-match
+[Lucene]: https://github.com/apache/lucene
+[OCI image]: https://hub.docker.com/_/crate
+[sign up]: https://console.cratedb.cloud/
--- a/docs/docs/integrations/vectorstores/cratedb.ipynb
+++ b/docs/docs/integrations/vectorstores/cratedb.ipynb
@@ -0,0 +1,589 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# CrateDB\n",
+    "\n",
+    "> [CrateDB] is capable of performing both vector and lexical search.\n",
+    "> It is built on top of the Apache Lucene library, talks SQL,\n",
+    "> is PostgreSQL-compatible, and scales like Elasticsearch.\n",
+    "\n",
+    "This notebook shows how to use the CrateDB vector store functionality around\n",
+    "[`FLOAT_VECTOR`] and [`KNN_MATCH`]. You will learn how to use LangChain's\n",
+    "`CrateDBVectorSearch` adapter for similarity search and other purposes.\n",
+    "\n",
+    "It supports:\n",
+    "- Similarity Search with Euclidean Distance\n",
+    "- Maximal Marginal Relevance Search (MMR)\n",
+    "\n",
+    "## What is CrateDB?\n",
+    "\n",
+    "[CrateDB] is an open-source, distributed, and scalable SQL analytics database\n",
+    "for storing and analyzing massive amounts of data in near real-time, even with\n",
+    "complex queries. It is PostgreSQL-compatible, based on [Lucene], and inherits\n",
+    "the shared-nothing distribution layer of [Elasticsearch].\n",
+    "\n",
+    "This example uses the [Python client driver for CrateDB]. For more documentation,\n",
+    "see also [LangChain with CrateDB].\n",
+    "\n",
+    "\n",
+    "[CrateDB]: https://github.com/crate/crate\n",
+    "[Elasticsearch]: https://github.com/elastic/elasticsearch\n",
+    "[`FLOAT_VECTOR`]: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#float-vector\n",
+    "[`KNN_MATCH`]: https://cratedb.com/docs/crate/reference/en/latest/general/builtins/scalar-functions.html#scalar-knn-match\n",
+    "[LangChain with CrateDB]: /docs/extras/integrations/providers/cratedb.html\n",
+    "[Lucene]: https://github.com/apache/lucene\n",
+    "[Python client driver for CrateDB]: https://cratedb.com/docs/python/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "To use the CrateDB vector store, you must install the `sqlalchemy-cratedb` package."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "pycharm": {
+     "is_executing": true
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Install required packages: LangChain, OpenAI SDK, python-dotenv, and the CrateDB SQLAlchemy adapter.\n",
+    "%pip install -qU langchain-community langchain-openai python-dotenv sqlalchemy-cratedb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Credentials\n",
+    "\n",
+    "You will supply credentials through a regular SQLAlchemy connection string, like\n",
+    "`crate://username:password@cratedb.example.org/`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization\n",
+    "\n",
+    "### OpenAI API key\n",
+    "\n",
+    "You need to provide an OpenAI API key, optionally using the environment\n",
+    "variable `OPENAI_API_KEY`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-09-09T08:02:16.802456Z",
+     "start_time": "2023-09-09T08:02:07.065604Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import getpass\n",
+    "import os\n",
+    "\n",
+    "from dotenv import find_dotenv, load_dotenv\n",
+    "\n",
+    "# Use the OpenAI API key from the environment, e.g. after running\n",
+    "# `export OPENAI_API_KEY=sk-YOUR_OPENAI_API_KEY`, or from a `.env` file.\n",
+    "# Otherwise, prompt for it.\n",
+    "_ = load_dotenv(find_dotenv())\n",
+    "OPENAI_API_KEY = os.environ.get(\"OPENAI_API_KEY\") or getpass.getpass(\"OpenAI API key:\")\n",
+    "os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "You also need to provide a connection string to your CrateDB database cluster,\n",
+    "optionally using the environment variable `CRATEDB_CONNECTION_STRING`.\n",
+    "\n",
+    "This example uses a CrateDB instance on your workstation, which you can start by\n",
+    "running [CrateDB using Docker]. Alternatively, you can also connect to a cluster\n",
+    "running on [CrateDB Cloud].\n",
+    "\n",
+    "[CrateDB Cloud]: https://console.cratedb.cloud/\n",
+    "[CrateDB using Docker]: https://cratedb.com/docs/guide/install/container/\n",
+    "\n",
+    "### CrateDB connection string\n",
+    "\n",
+    "You will need to supply an SQLAlchemy-compatible connection string."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "CONNECTION_STRING = os.environ.get(\n",
+    "    \"CRATEDB_CONNECTION_STRING\",\n",
+    "    \"crate://crate@localhost:4200/?schema=langchain\",\n",
+    ")\n",
+    "\n",
+    "# For CrateDB Cloud, use:\n",
+    "# CONNECTION_STRING = os.environ.get(\n",
+    "#     \"CRATEDB_CONNECTION_STRING\",\n",
+    "#     \"crate://username:password@hostname:4200/?ssl=true&schema=langchain\",\n",
+    "# )"
+   ]
+  },
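+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a minimal sketch of how such a connection string is composed, the following\n",
+    "cell takes the default URL apart using only Python's standard library. It is an\n",
+    "illustration, not how the CrateDB adapter processes the URL internally. The names\n",
+    "`example_url` and `parts` are made up for this example."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from urllib.parse import parse_qs, urlsplit\n",
+    "\n",
+    "# Hypothetical example value, mirroring the default connection string.\n",
+    "example_url = \"crate://crate@localhost:4200/?schema=langchain\"\n",
+    "parts = urlsplit(example_url)\n",
+    "\n",
+    "print(\"driver:  \", parts.scheme)  # SQLAlchemy dialect name\n",
+    "print(\"user:    \", parts.username)\n",
+    "print(\"host:    \", parts.hostname)\n",
+    "print(\"port:    \", parts.port)\n",
+    "print(\"options: \", parse_qs(parts.query))  # e.g. the schema to use"
+   ]
+  },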
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-09-09T08:02:28.174088Z",
+     "start_time": "2023-09-09T08:02:28.162698Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "\"\"\"\n",
+    "# Alternatively, the connection string can be assembled from individual\n",
+    "# environment variables.\n",
+    "import os\n",
+    "\n",
+    "CONNECTION_STRING = CrateDBVectorSearch.connection_string_from_db_params(\n",
+    "    driver=os.environ.get(\"CRATEDB_DRIVER\", \"crate\"),\n",
+    "    host=os.environ.get(\"CRATEDB_HOST\", \"localhost\"),\n",
+    "    port=int(os.environ.get(\"CRATEDB_PORT\", \"4200\")),\n",
+    "    database=os.environ.get(\"CRATEDB_DATABASE\", \"langchain\"),\n",
+    "    user=os.environ.get(\"CRATEDB_USER\", \"crate\"),\n",
+    "    password=os.environ.get(\"CRATEDB_PASSWORD\", \"\"),\n",
+    ")\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "### Import Python Modules\n",
+    "\n",
+    "You will start by importing all required modules."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from langchain.docstore.document import Document\n",
+    "from langchain.document_loaders import UnstructuredURLLoader\n",
+    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
+    "from langchain.text_splitter import CharacterTextSplitter\n",
+    "from langchain.vectorstores import CrateDBVectorSearch"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Manage vector store\n",
+    "\n",
+    "When you want to work with an existing vector store, you can initialize it\n",
+    "directly instead of creating it from documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "embeddings = OpenAIEmbeddings()\n",
+    "\n",
+    "store = CrateDBVectorSearch(\n",
+    "    collection_name=\"testdrive\",\n",
+    "    connection_string=CONNECTION_STRING,\n",
+    "    embedding_function=embeddings,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Add items to vector store\n",
+    "\n",
+    "You can also add documents to an existing vector store."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "store.add_documents([Document(page_content=\"foo\")])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "jupyter": {
+     "is_executing": true
+    }
+   },
+   "outputs": [],
+   "source": [
+    "docs_with_score = store.similarity_search_with_score(\"foo\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "docs_with_score[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "docs_with_score[1]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Update items in vector store\n",
+    "\n",
+    "FIXME"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Foo."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Delete items from vector store\n",
+    "\n",
+    "You can delete items from the vector store by their IDs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "store.delete(ids=[\"foo\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "### Load and Index Documents\n",
+    "\n",
+    "Next, you will load the input data and split it into chunks. The module will create a table\n",
+    "with the name of the collection. Make sure the collection name is unique, and\n",
+    "that you have the permission to create a table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "is_executing": true
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# `UnstructuredURLLoader` expects a list of URLs.\n",
+    "loader = UnstructuredURLLoader(\n",
+    "    urls=[\n",
+    "        \"https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt\"\n",
+    "    ]\n",
+    ")\n",
+    "documents = loader.load()\n",
+    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
+    "docs = text_splitter.split_documents(documents)\n",
+    "\n",
+    "COLLECTION_NAME = \"state_of_the_union_test\"\n",
+    "\n",
+    "db = CrateDBVectorSearch.from_documents(\n",
+    "    embedding=embeddings,\n",
+    "    documents=docs,\n",
+    "    collection_name=COLLECTION_NAME,\n",
+    "    connection_string=CONNECTION_STRING,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Overwriting a Vector Store\n",
+    "\n",
+    "If you have an existing collection, you can overwrite it by using `from_documents`\n",
+    "and setting `pre_delete_collection=True`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "db = CrateDBVectorSearch.from_documents(\n",
+    "    documents=docs,\n",
+    "    embedding=embeddings,\n",
+    "    collection_name=COLLECTION_NAME,\n",
+    "    connection_string=CONNECTION_STRING,\n",
+    "    pre_delete_collection=True,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "docs_with_score = db.similarity_search_with_score(\"foo\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "docs_with_score[0]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Query vector store\n",
+    "\n",
+    "### Query directly\n",
+    "\n",
+    "#### Similarity search\n",
+    "Searching by Euclidean distance is the default."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-09-09T08:05:11.104135Z",
+     "start_time": "2023-09-09T08:05:10.548998Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
+    "docs_with_score = db.similarity_search_with_score(query)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-09-09T08:05:13.532334Z",
+     "start_time": "2023-09-09T08:05:13.523191Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "for doc, score in docs_with_score:\n",
+    "    print(\"-\" * 80)\n",
+    "    print(\"Score: \", score)\n",
+    "    print(doc.page_content)\n",
+    "    print(\"-\" * 80)"
+   ]
+  },
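+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For intuition, the next cell sketches how the Euclidean distance between two\n",
+    "vectors is computed. It uses made-up three-dimensional vectors instead of real\n",
+    "embeddings, and the `euclidean_distance` helper is hypothetical and not part of\n",
+    "the CrateDB adapter. A smaller distance means the vectors are closer together."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import math\n",
+    "\n",
+    "\n",
+    "def euclidean_distance(a, b):\n",
+    "    # Straight-line distance between two vectors of equal length.\n",
+    "    if len(a) != len(b):\n",
+    "        raise ValueError(\"vectors must have the same length\")\n",
+    "    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))\n",
+    "\n",
+    "\n",
+    "# Made-up vectors for illustration only.\n",
+    "query_vector = [1.0, 0.0, 0.0]\n",
+    "near_vector = [1.0, 1.0, 0.0]\n",
+    "far_vector = [4.0, 4.0, 0.0]\n",
+    "\n",
+    "print(euclidean_distance(query_vector, near_vector))  # 1.0\n",
+    "print(euclidean_distance(query_vector, far_vector))  # 5.0"
+   ]
+  },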
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "#### Maximal Marginal Relevance Search (MMR)\n",
+    "Maximal marginal relevance optimizes for both similarity to the query and diversity among the selected documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-09-09T08:05:23.276819Z",
+     "start_time": "2023-09-09T08:05:21.972256Z"
+    },
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "docs_with_score = db.max_marginal_relevance_search_with_score(query)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-09-09T08:05:27.478580Z",
+     "start_time": "2023-09-09T08:05:27.470138Z"
+    },
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "for doc, score in docs_with_score:\n",
+    "    print(\"-\" * 80)\n",
+    "    print(\"Score: \", score)\n",
+    "    print(doc.page_content)\n",
+    "    print(\"-\" * 80)"
+   ]
+  },
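+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make the trade-off between similarity and diversity concrete, the next cell\n",
+    "sketches a greedy MMR selection on made-up two-dimensional vectors. This is an\n",
+    "illustration under stated assumptions, not the adapter's implementation: it uses\n",
+    "cosine similarity, and the names `mmr_select`, `lambda_mult`, `toy_query`, and\n",
+    "`toy_candidates` are hypothetical. With `lambda_mult=1.0`, only relevance counts;\n",
+    "lower values penalize candidates that resemble already selected ones."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import math\n",
+    "\n",
+    "\n",
+    "def cosine_similarity(a, b):\n",
+    "    dot = sum(x * y for x, y in zip(a, b))\n",
+    "    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))\n",
+    "    return dot / norm\n",
+    "\n",
+    "\n",
+    "def mmr_select(query_vec, candidates, k=2, lambda_mult=0.5):\n",
+    "    # Greedily pick `k` candidate indices, balancing relevance and diversity.\n",
+    "    selected = []\n",
+    "    remaining = list(range(len(candidates)))\n",
+    "    while remaining and len(selected) < k:\n",
+    "\n",
+    "        def mmr_score(i):\n",
+    "            relevance = cosine_similarity(query_vec, candidates[i])\n",
+    "            # Highest similarity to anything already selected (0.0 if none yet).\n",
+    "            redundancy = max(\n",
+    "                (cosine_similarity(candidates[i], candidates[j]) for j in selected),\n",
+    "                default=0.0,\n",
+    "            )\n",
+    "            return lambda_mult * relevance - (1 - lambda_mult) * redundancy\n",
+    "\n",
+    "        best = max(remaining, key=mmr_score)\n",
+    "        selected.append(best)\n",
+    "        remaining.remove(best)\n",
+    "    return selected\n",
+    "\n",
+    "\n",
+    "# Made-up vectors: candidates 0 and 1 are near-duplicates, candidate 2 differs.\n",
+    "toy_query = [1.0, 0.0]\n",
+    "toy_candidates = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]\n",
+    "\n",
+    "print(mmr_select(toy_query, toy_candidates, lambda_mult=1.0))  # [0, 1]\n",
+    "print(mmr_select(toy_query, toy_candidates, lambda_mult=0.5))  # [0, 2]"
+   ]
+  },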
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "#### Searching in Multiple Collections\n",
+    "`CrateDBVectorSearchMultiCollection` is a special adapter which provides similarity search across\n",
+    "multiple collections. It cannot be used for indexing documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from langchain.vectorstores.cratedb import CrateDBVectorSearchMultiCollection\n",
+    "\n",
+    "multisearch = CrateDBVectorSearchMultiCollection(\n",
+    "    collection_names=[\"test_collection_1\", \"test_collection_2\"],\n",
+    "    embedding_function=embeddings,\n",
+    "    connection_string=CONNECTION_STRING,\n",
+    ")\n",
+    "docs_with_score = multisearch.similarity_search_with_score(query)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "### Query by turning into retriever"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "retriever = store.as_retriever()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(retriever)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Usage for retrieval-augmented generation\n",
+    "\n",
+    "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
+    "\n",
+    "- [Tutorials: working with external knowledge](https://python.langchain.com/docs/tutorials/#working-with-external-knowledge)\n",
+    "- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
+    "- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## API reference\n",
+    "\n",
+    "For detailed documentation of all `CrateDBVectorSearch` features and configurations,\n",
+    "head to the API reference:\n",
+    "https://python.langchain.com/api_reference/cratedb/vectorstores/langchain_cratedb.vectorstores.CrateDBVectorSearch.html"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}