mirror of
https://github.com/hwchase17/langchain
synced 2024-11-06 03:20:49 +00:00
docs: vectorstore upgrades 2 (#6796)
updated vectorstores/ notebooks; added new integrations into ecosystem/integrations/ @dev2049 @rlancemartin, @eyurtsev
This commit is contained in:
parent
d7dbf4aefe
commit
49c864fa18
23
docs/extras/ecosystem/integrations/hologres.mdx
Normal file
@ -0,0 +1,23 @@
# Hologres

>[Hologres](https://www.alibabacloud.com/help/en/hologres/latest/introduction) is a unified real-time data warehousing service developed by Alibaba Cloud. You can use Hologres to write, update, process, and analyze large amounts of data in real time.
>`Hologres` supports standard `SQL` syntax, is compatible with `PostgreSQL`, and supports most PostgreSQL functions. Hologres supports online analytical processing (OLAP) and ad hoc analysis for up to petabytes of data, and provides high-concurrency and low-latency online data services.

>`Hologres` provides **vector database** functionality by adopting [Proxima](https://www.alibabacloud.com/help/en/hologres/latest/vector-processing).
>`Proxima` is a high-performance software library developed by `Alibaba DAMO Academy`. It allows you to search for the nearest neighbors of vectors. Proxima provides higher stability and performance than similar open source software such as Faiss. Proxima allows you to search for similar text or image embeddings with high throughput and low latency. Hologres is deeply integrated with Proxima to provide a high-performance vector search service.

## Installation and Setup

Click [here](https://www.alibabacloud.com/zh/product/hologres) to quickly deploy a Hologres cloud instance.

```bash
pip install psycopg2
```
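
Because Hologres is PostgreSQL-compatible, a client such as `psycopg2` can usually reach it with an ordinary libpq connection string. The sketch below only assembles such a string from the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`). The helper name `build_dsn` and the fallback values are assumptions made for illustration; they are not part of Hologres or LangChain.

```python
import os


def _quote(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string."""
    # libpq requires backslashes and single quotes inside a quoted value to be escaped.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(host: str, port: int, dbname: str, user: str, password: str) -> str:
    """Hypothetical helper: join connection parameters into a libpq DSN."""
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
    }
    return " ".join(f"{key}={_quote(str(value))}" for key, value in params.items())


# Fallback values are placeholders, not real credentials.
dsn = build_dsn(
    host=os.environ.get("PGHOST", "localhost"),
    port=int(os.environ.get("PGPORT", "80")),
    dbname=os.environ.get("PGDATABASE", "postgres"),
    user=os.environ.get("PGUSER", "user"),
    password=os.environ.get("PGPASSWORD", "password"),
)
```

Once the package above is installed, the resulting string can be passed to `psycopg2.connect(dsn)`.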

## Vector Store

See a [usage example](/docs/modules/data_connection/vectorstores/integrations/hologres.html).

```python
from langchain.vectorstores import Hologres
```
19
docs/extras/ecosystem/integrations/rockset.mdx
Normal file
@ -0,0 +1,19 @@
# Rockset

>[Rockset](https://rockset.com/product/) is a real-time analytics database service for serving low latency, high concurrency analytical queries at scale. It builds a Converged Index™ on structured and semi-structured data with an efficient store for vector embeddings. Its support for running SQL on schemaless data makes it a perfect choice for running vector search with metadata filters.

## Installation and Setup

Make sure you have a Rockset account, then go to the web console to get an API key. Details can be found on [the website](https://rockset.com/docs/rest-api/).

```bash
pip install rockset
```

## Vector Store

See a [usage example](/docs/modules/data_connection/vectorstores/integrations/rockset.html).

```python
from langchain.vectorstores import RocksetDB
```
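
To make the idea of vector search with metadata filters concrete, here is a minimal pure-Python sketch that first filters rows on a metadata field and then ranks the remaining rows by dot-product similarity. It is a brute-force stand-in, not Rockset's query engine or the LangChain wrapper; the row layout and the `filtered_search` helper are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(
    rows: List[Dict],
    query: Sequence[float],
    predicate: Callable[[Dict], bool],
    k: int = 2,
) -> List[str]:
    """Hypothetical helper: apply the metadata filter, then rank by similarity."""
    candidates = [row for row in rows if predicate(row["metadata"])]
    candidates.sort(key=lambda row: dot(query, row["embedding"]), reverse=True)
    return [row["text"] for row in candidates[:k]]


# Toy data: two-dimensional embeddings with a "year" metadata field.
rows = [
    {"text": "intro", "embedding": [0.9, 0.1], "metadata": {"year": 2021}},
    {"text": "update", "embedding": [0.8, 0.2], "metadata": {"year": 2023}},
    {"text": "recap", "embedding": [0.1, 0.9], "metadata": {"year": 2023}},
]

# Only rows from 2023 onward are considered, so "intro" is excluded.
hits = filtered_search(rows, [1.0, 0.0], lambda meta: meta["year"] >= 2023, k=1)
```

In a real deployment the filter and the similarity ranking would be pushed down to the database rather than evaluated in application code.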
20
docs/extras/ecosystem/integrations/singlestoredb.mdx
Normal file
@ -0,0 +1,20 @@
# SingleStoreDB

>[SingleStoreDB](https://singlestore.com/) is a high-performance distributed SQL database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. It provides vector storage, and vector functions including [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html) and [euclidean_distance](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/euclidean_distance.html), thereby supporting AI applications that require text similarity matching.

## Installation and Setup

There are several ways to establish a [connection](https://singlestoredb-python.labs.singlestore.com/generated/singlestoredb.connect.html) to the database. You can either set up environment variables or pass named parameters to the `SingleStoreDB` constructor.
Alternatively, you may provide these parameters to the `from_documents` and `from_texts` methods.

```bash
pip install singlestoredb
```
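
As one way to follow the environment-variable route, the sketch below assembles a connection URL and exports it. It assumes that the `singlestoredb` client reads the `SINGLESTOREDB_URL` variable in the form `user:password@host:port/database`; the helper name `build_url` and the credentials are placeholders, not values from the original page.

```python
import os


def build_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Hypothetical helper: build a `user:password@host:port/database` URL."""
    return f"{user}:{password}@{host}:{port}/{database}"


# Placeholder credentials; replace them with your own.
# Assumption: the client picks this variable up when no parameters are passed.
os.environ["SINGLESTOREDB_URL"] = build_url("admin", "secret", "localhost", 3306, "demo")
```

Credentials that contain characters such as `@` or `:` would need extra care, which this sketch does not attempt.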

## Vector Store

See a [usage example](/docs/modules/data_connection/vectorstores/integrations/singlestoredb.html).

```python
from langchain.vectorstores import SingleStoreDB
```
@ -1,15 +1,14 @@
# scikit-learn

This page covers how to use the scikit-learn package within LangChain.
It is broken into two parts: installation and setup, and then references to specific scikit-learn wrappers.
>[scikit-learn](https://scikit-learn.org/stable/) is an open source collection of machine learning algorithms,
> including some implementations of the [k nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html). `SKLearnVectorStore` wraps this implementation and adds the possibility to persist the vector store in json, bson (binary json) or Apache Parquet format.

## Installation and Setup

- Install the Python package with `pip install scikit-learn`

## Wrappers

### VectorStore
## Vector Store

`SKLearnVectorStore` provides a simple wrapper around the nearest neighbor implementation in the
scikit-learn package, allowing you to use it as a vectorstore.
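
The idea of a nearest-neighbor store that can be persisted is sketched below in plain Python: vectors are kept in a list, written to a JSON file, reloaded, and searched by brute-force cosine similarity. This is an illustration only, not the `SKLearnVectorStore` implementation; the `TinyVectorStore` class and its methods are hypothetical.

```python
import json
import math
import tempfile
from pathlib import Path
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store with JSON persistence."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        # Reload previously persisted items if the file already exists.
        self.items = json.loads(self.path.read_text()) if self.path.exists() else []

    def add(self, text: str, vector: Sequence[float]) -> None:
        self.items.append({"text": text, "vector": list(vector)})

    def persist(self) -> None:
        self.path.write_text(json.dumps(self.items))

    def search(self, query: Sequence[float], k: int = 1) -> List[str]:
        ranked = sorted(
            self.items, key=lambda item: cosine(query, item["vector"]), reverse=True
        )
        return [item["text"] for item in ranked[:k]]


with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "store.json"
    store = TinyVectorStore(store_path)
    store.add("cats", [1.0, 0.0])
    store.add("dogs", [0.0, 1.0])
    store.persist()

    # A fresh instance reads the persisted vectors back from disk.
    reloaded = TinyVectorStore(store_path)
    nearest = reloaded.search([0.9, 0.1], k=1)
```

A production store would delegate the neighbor search to an indexed implementation instead of sorting every item on each query.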
21
docs/extras/ecosystem/integrations/starrocks.mdx
Normal file
@ -0,0 +1,21 @@
# StarRocks

>[StarRocks](https://www.starrocks.io/) is a High-Performance Analytical Database.
`StarRocks` is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.

>Usually `StarRocks` is categorized into OLAP, and it has shown excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it could also be used as a fast vectordb.

## Installation and Setup

```bash
pip install pymysql
```

## Vector Store

See a [usage example](/docs/modules/data_connection/vectorstores/integrations/starrocks.html).

```python
from langchain.vectorstores import StarRocks
```
19
docs/extras/ecosystem/integrations/tigris.mdx
Normal file
@ -0,0 +1,19 @@
# Tigris

> [Tigris](https://tigrisdata.com) is an open source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.
> `Tigris` eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead.

## Installation and Setup

```bash
pip install tigrisdb openapi-schema-pydantic openai tiktoken
```

## Vector Store

See a [usage example](/docs/modules/data_connection/vectorstores/integrations/tigris.html).

```python
from langchain.vectorstores import Tigris
```
22
docs/extras/ecosystem/integrations/typesense.mdx
Normal file
@ -0,0 +1,22 @@
# Typesense

> [Typesense](https://typesense.org) is an open source, in-memory search engine that you can either
> [self-host](https://typesense.org/docs/guide/install-typesense.html#option-2-local-machine-self-hosting) or run
> on [Typesense Cloud](https://cloud.typesense.org/).
> `Typesense` focuses on performance by storing the entire index in RAM (with a backup on disk) and also
> focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.

## Installation and Setup

```bash
pip install typesense openapi-schema-pydantic openai tiktoken
```

## Vector Store

See a [usage example](/docs/modules/data_connection/vectorstores/integrations/typesense.html).

```python
from langchain.vectorstores import Typesense
```
@ -2,28 +2,34 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Alibaba Cloud OpenSearch\n",
|
||||
"\n",
|
||||
">[Alibaba Cloud Opensearch](https://www.alibabacloud.com/product/opensearch) OpenSearch is a one-stop platform to develop intelligent search services. OpenSearch was built based on the large-scale distributed search engine developed by Alibaba. OpenSearch serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.\n",
|
||||
">[Alibaba Cloud Opensearch](https://www.alibabacloud.com/product/opensearch) is a one-stop platform to develop intelligent search services. `OpenSearch` was built on the large-scale distributed search engine developed by `Alibaba`. `OpenSearch` serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. `OpenSearch` helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.\n",
|
||||
"\n",
|
||||
">OpenSearch helps you develop high quality, maintenance-free, and high performance intelligent search services to provide your users with high search efficiency and accuracy.\n",
|
||||
">`OpenSearch` helps you develop high quality, maintenance-free, and high performance intelligent search services to provide your users with high search efficiency and accuracy.\n",
|
||||
"\n",
|
||||
">OpenSearch provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results. This topic describes the syntax and usage notes of vector indexes.\n",
|
||||
">`OpenSearch` provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results. This topic describes the syntax and usage notes of vector indexes.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Alibaba Cloud OpenSearch Vector Search Edition`.\n",
|
||||
"To run, you should have an [OpenSearch Vector Search Edition](https://opensearch.console.aliyun.com) instance up and running:\n",
|
||||
"- Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize and configure OpenSearch Vector Search Edition instance.\n"
|
||||
"\n",
|
||||
"Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize and configure OpenSearch Vector Search Edition instance.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install alibabacloud-ha3engine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"After completing the configuration, follow these steps to connect to the instance, index documents, and perform vector retrieval."
|
||||
]
|
||||
@ -33,6 +39,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -49,9 +58,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Split documents and get embeddings by call OpenAI API"
|
||||
]
|
||||
@ -61,6 +68,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -80,7 +90,6 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
@ -94,6 +103,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -133,9 +145,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Create an opensearch access instance by settings."
|
||||
]
|
||||
@ -145,6 +155,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -159,9 +172,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"or"
|
||||
]
|
||||
@ -171,6 +182,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -183,9 +197,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Add texts and build index."
|
||||
]
|
||||
@ -195,6 +207,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -208,9 +223,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Query and retrieve data."
|
||||
]
|
||||
@ -220,6 +233,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -233,9 +249,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Query and retrieve data with metadata\n"
|
||||
]
|
||||
@ -245,6 +259,9 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@ -260,7 +277,6 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
@ -272,23 +288,23 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
@ -6,8 +6,9 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# AwaDB\n",
|
||||
"[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
|
||||
"This notebook shows how to use functionality related to the AwaDB."
|
||||
">[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `AwaDB`."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -184,7 +185,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1,19 +1,19 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure Cognitive Search"
|
||||
"# Azure Cognitive Search\n",
|
||||
"\n",
|
||||
">[Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Install Azure Cognitive Search SDK"
|
||||
"## Install Azure Cognitive Search SDK"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -27,7 +27,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -49,7 +48,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -74,7 +72,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -95,7 +92,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -120,7 +116,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -148,7 +143,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -187,7 +181,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -226,7 +219,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.9.13 ('.venv': venv)",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@ -240,9 +233,8 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.10.6"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "645053d6307d413a1a75681b5ebb6449bb2babba4bcb0bf65a1ddc3dbefb108a"
|
||||
@ -250,5 +242,5 @@
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
@ -9,20 +9,6 @@
|
||||
"\n",
|
||||
[Chroma]">
">[Chroma](https://docs.trychroma.com/getting-started) is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.\n",
|
||||
"\n",
|
||||
"<a href=\"https://discord.gg/MMeYNTmh3x\" target=\"_blank\">\n",
|
||||
" <img src=\"https://img.shields.io/discord/1073293645303795742\" alt=\"Discord\" />\n",
|
||||
"</a> \n",
|
||||
"<a href=\"https://github.com/chroma-core/chroma/blob/master/LICENSE\" target=\"_blank\">\n",
|
||||
" <img src=\"https://img.shields.io/static/v1?label=license&message=Apache 2.0&color=white\" alt=\"License\" />\n",
|
||||
"</a> \n",
|
||||
"<img src=\"https://github.com/chroma-core/chroma/actions/workflows/chroma-integration-test.yml/badge.svg?branch=main\" alt=\"Integration Tests\" />\n",
|
||||
"\n",
|
||||
"- [Website](https://www.trychroma.com/)\n",
|
||||
"- [Documentation](https://docs.trychroma.com/)\n",
|
||||
"- [Twitter](https://twitter.com/trychroma)\n",
|
||||
"- [Discord](https://discord.gg/MMeYNTmh3x)\n",
|
||||
"\n",
|
||||
"Chroma is fully-typed, fully-tested and fully-documented.\n",
|
||||
"\n",
|
||||
"Install Chroma with:\n",
|
||||
"\n",
|
||||
@ -47,19 +33,6 @@
|
||||
"View full docs at [docs](https://docs.trychroma.com/reference/Collection). To access these methods directly, you can do `._collection_.method()`\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12e83df7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# first install dependencies\n",
|
||||
"!pip install langchain\n",
|
||||
"!pip install langchainplus_sdk\n",
|
||||
"!pip install chromadb\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b5ffbf8",
|
||||
@ -576,7 +549,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -14,22 +14,12 @@
|
||||
"This notebook shows how to use functionality related to the `Elasticsearch` database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# ElasticVectorSearch class"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409"
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Installation"
|
||||
@ -104,8 +94,8 @@
|
||||
"execution_count": null,
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c"
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -117,9 +107,9 @@
|
||||
"execution_count": null,
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"outputId": "fd16b37f-cb76-40a9-b83f-eab58dd0d912"
|
||||
"outputId": "fd16b37f-cb76-40a9-b83f-eab58dd0d912",
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
@ -141,8 +131,8 @@
|
||||
"cell_type": "markdown",
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0"
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Example"
|
||||
@ -153,8 +143,8 @@
|
||||
"execution_count": null,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "aac9563e"
|
||||
"id": "aac9563e",
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -169,8 +159,8 @@
|
||||
"execution_count": null,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "a3c3999a"
|
||||
"id": "a3c3999a",
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -189,8 +179,8 @@
|
||||
"execution_count": null,
|
||||
"id": "12eb86d8",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "12eb86d8"
|
||||
"id": "12eb86d8",
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -235,43 +225,49 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# ElasticKnnSearch Class\n",
|
||||
"The `ElasticKnnSearch` implements features allowing storing vectors and documents in Elasticsearch for use with approximate [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html)"
|
||||
],
|
||||
"id": "FheGPztJsrRB",
|
||||
"metadata": {
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
"id": "FheGPztJsrRB"
|
||||
"source": [
|
||||
"# ElasticKnnSearch Class\n",
|
||||
"The `ElasticKnnSearch` implements features allowing storing vectors and documents in Elasticsearch for use with approximate [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!pip install langchain elasticsearch"
|
||||
],
|
||||
"execution_count": null,
|
||||
"id": "gRVcbh5zqCJQ",
|
||||
"metadata": {
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
"source": [
|
||||
"!pip install langchain elasticsearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "TJtqiw5AqBp8",
|
||||
"metadata": {
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores.elastic_vector_search import ElasticKnnSearch\n",
|
||||
"from langchain.embeddings import ElasticsearchEmbeddings\n",
|
||||
"import elasticsearch"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TJtqiw5AqBp8"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "XHfC0As6qN3T",
|
||||
"metadata": {
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize ElasticsearchEmbeddings\n",
|
||||
"model_id = \"<model_id_from_es>\"\n",
|
||||
@ -281,16 +277,16 @@
|
||||
"es_password = \"es_pass\"\n",
|
||||
"test_index = \"<index_name>\"\n",
|
||||
"# input_field = \"your_input_field\" # if different from 'text_field'"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "XHfC0As6qN3T"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "UkTipx1lqc3h",
|
||||
"metadata": {
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Generate embedding object\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_credentials(\n",
|
||||
@ -300,16 +296,16 @@
|
||||
" es_user=es_user,\n",
|
||||
" es_password=es_password,\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "UkTipx1lqc3h"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "74psgD0oqjYK",
|
||||
"metadata": {
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
@ -319,26 +315,26 @@
|
||||
" index_name=test_index,\n",
|
||||
" embedding=embeddings,\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "74psgD0oqjYK"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test adding vectors"
|
||||
],
|
||||
"id": "7AfgIKLWqnQl",
|
||||
"metadata": {
|
||||
"id": "7AfgIKLWqnQl"
|
||||
},
|
||||
"id": "7AfgIKLWqnQl"
|
||||
"source": [
|
||||
"## Test adding vectors"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "yNUUIaL9qmze",
|
||||
"metadata": {
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `add_texts` method\n",
|
||||
"texts = [\"Hello, world!\", \"Machine learning is fun.\", \"I love Python.\"]\n",
|
||||
@ -351,26 +347,26 @@
|
||||
" \"Python is great for data analysis.\",\n",
|
||||
"]\n",
|
||||
"knn_search.from_texts(new_texts, dims=dims)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "yNUUIaL9qmze"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test knn search using query vector builder "
|
||||
],
|
||||
"id": "0zdR-Iubquov",
|
||||
"metadata": {
|
||||
"id": "0zdR-Iubquov"
|
||||
},
|
||||
"id": "0zdR-Iubquov"
|
||||
"source": [
|
||||
"## Test knn search using query vector builder "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bwR4jYvqqxTo",
|
||||
"metadata": {
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@ -387,26 +383,26 @@
|
||||
"print(\n",
|
||||
" f\"The 'text' field value from the top hit is: '{hybrid_result['hits']['hits'][0]['_source']['text']}'\"\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bwR4jYvqqxTo"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test knn search using pre generated vector \n"
|
||||
],
|
||||
"id": "ltXYqp0qqz7R",
|
||||
"metadata": {
|
||||
"id": "ltXYqp0qqz7R"
|
||||
},
|
||||
"id": "ltXYqp0qqz7R"
|
||||
"source": [
|
||||
"## Test knn search using pre generated vector \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "O5COtpTqq23t",
|
||||
"metadata": {
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Generate embedding for tests\n",
|
||||
"query_text = \"Hello\"\n",
|
||||
@ -428,26 +424,26 @@
|
||||
"print(\n",
|
||||
" f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\"\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "O5COtpTqq23t"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test source option"
|
||||
],
|
||||
"id": "0dnmimcJq42C",
|
||||
"metadata": {
|
||||
"id": "0dnmimcJq42C"
|
||||
},
|
||||
"id": "0dnmimcJq42C"
|
||||
"source": [
|
||||
"## Test source option"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "v4_B72nHq7g1",
|
||||
"metadata": {
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@ -460,26 +456,26 @@
|
||||
" query=query, model_id=model_id, k=2, source=False\n",
|
||||
")\n",
|
||||
"assert not \"_source\" in hybrid_result[\"hits\"][\"hits\"][0].keys()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "v4_B72nHq7g1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test fields option "
|
||||
],
|
||||
"id": "teHgJgrlq-Jb",
|
||||
"metadata": {
|
||||
"id": "teHgJgrlq-Jb"
|
||||
},
|
||||
"id": "teHgJgrlq-Jb"
|
||||
"source": [
|
||||
"## Test fields option "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "utNBbpZYrAYW",
|
||||
"metadata": {
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@ -492,72 +488,72 @@
|
||||
" query=query, model_id=model_id, k=2, fields=[\"text\"]\n",
|
||||
")\n",
|
||||
"assert \"text\" in hybrid_result[\"hits\"][\"hits\"][0][\"fields\"].keys()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "utNBbpZYrAYW"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### Test with es client connection rather than cloud_id "
|
||||
],
|
||||
"id": "hddsIFferBy1",
|
||||
"metadata": {
|
||||
"id": "hddsIFferBy1"
|
||||
},
|
||||
"id": "hddsIFferBy1"
|
||||
"source": [
|
||||
"### Test with es client connection rather than cloud_id "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bXqrUnoirFia",
|
||||
"metadata": {
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create Elasticsearch connection\n",
|
||||
"es_connection = Elasticsearch(\n",
|
||||
" hosts=[\"https://es_cluster_url:port\"], basic_auth=(\"user\", \"password\")\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bXqrUnoirFia"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "TIM__Hm8rSEW",
|
||||
"metadata": {
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Instantiate ElasticsearchEmbeddings using es_connection\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_es_connection(\n",
|
||||
" model_id,\n",
|
||||
" es_connection,\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TIM__Hm8rSEW"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1-CdnOrArVc_",
|
||||
"metadata": {
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
" es_connection=es_connection, index_name=test_index, embedding=embeddings\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "1-CdnOrArVc_"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0kgyaL6QrYVF",
|
||||
"metadata": {
|
||||
"id": "0kgyaL6QrYVF"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@ -566,16 +562,13 @@
|
||||
"print(\n",
|
||||
" f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\"\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0kgyaL6QrYVF"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "0kgyaL6QrYVF"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
@ -592,11 +585,8 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
},
|
||||
"colab": {
|
||||
"provenance": []
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
|
@ -16,6 +16,15 @@
|
||||
"Click [here](https://www.alibabacloud.com/zh/product/hologres) to quickly deploy a Hologres cloud instance."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install psycopg2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
@ -149,7 +158,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -5,7 +5,7 @@
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MongoDB Atlas Vector Search\n",
|
||||
"# MongoDB Atlas\n",
|
||||
"\n",
|
||||
">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a fully-managed cloud database available in AWS, Azure, and GCP. It now has support for native Vector Search on your MongoDB document data.\n",
|
||||
"\n",
|
||||
@ -214,7 +214,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
@ -96,7 +96,7 @@
|
||||
"id": "01a9a035",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### similarity_search using Approximate k-NN\n",
|
||||
"## similarity_search using Approximate k-NN\n",
|
||||
"\n",
|
||||
"`similarity_search` using `Approximate k-NN` Search with Custom Parameters"
|
||||
]
|
||||
@ -182,7 +182,7 @@
|
||||
"id": "0d0cd877",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### similarity_search using Script Scoring\n",
|
||||
"## similarity_search using Script Scoring\n",
|
||||
"\n",
|
||||
"`similarity_search` using `Script Scoring` with Custom Parameters"
|
||||
]
|
||||
@ -221,7 +221,7 @@
|
||||
"id": "a4af96cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### similarity_search using Painless Scripting\n",
|
||||
"## similarity_search using Painless Scripting\n",
|
||||
"\n",
|
||||
"`similarity_search` using `Painless Scripting` with Custom Parameters"
|
||||
]
|
||||
@ -258,32 +258,35 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4f8fb0d0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Maximum marginal relevance search (MMR)\n",
|
||||
"## Maximum marginal relevance search (MMR)\n",
|
||||
"If you’d like to look up similar documents but also want diverse results, MMR is a method you should consider. Maximal marginal relevance optimizes for similarity to the query AND diversity among the selected documents."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ba85e092",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10, lambda_param=0.5)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73264864",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using a preexisting OpenSearch instance\n",
|
||||
"## Using a preexisting OpenSearch instance\n",
|
||||
"\n",
|
||||
"It's also possible to use a preexisting OpenSearch instance with documents that already have vectors present."
|
||||
]
|
||||
@ -330,7 +333,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -201,14 +201,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity search with score"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search with Euclidean Distance (Default)"
|
||||
"## Similarity Search with Euclidean Distance (Default)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -303,14 +296,14 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Working with vectorstore in PG"
|
||||
"## Working with vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Uploading a vectorstore in PG "
|
||||
"### Uploading a vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -336,7 +329,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieving a vectorstore in PG"
|
||||
"### Retrieving a vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -498,7 +491,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.7"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1,20 +1,18 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "20b588b4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Rockset Vector Search\n",
|
||||
"# Rockset\n",
|
||||
"\n",
|
||||
"[Rockset](https://rockset.com/product/) is a real-time analytics database service for serving low latency, high concurrency analytical queries at scale. It builds a Converged Index™ on structured and semi-structured data with an efficient store for vector embeddings. Its support for running SQL on schemaless data makes it a perfect choice for running vector search with metadata filters. \n",
|
||||
">[Rockset](https://rockset.com/product/) is a real-time analytics database service for serving low latency, high concurrency analytical queries at scale. It builds a Converged Index™ on structured and semi-structured data with an efficient store for vector embeddings. Its support for running SQL on schemaless data makes it a perfect choice for running vector search with metadata filters. \n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to use Rockset as a vectorstore in langchain. To get started, make sure you have a Rockset account and an API key available."
|
||||
"This notebook demonstrates how to use `Rockset` as a vectorstore in langchain. To get started, make sure you have a `Rockset` account and an API key available."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e290ddc0",
|
||||
"metadata": {},
|
||||
@ -25,7 +23,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7d77bbbe",
|
||||
"metadata": {},
|
||||
@ -52,7 +49,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7951c9cd",
|
||||
"metadata": {},
|
||||
@ -71,7 +67,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8600900d",
|
||||
"metadata": {},
|
||||
@ -80,12 +75,11 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "3bf2f818",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using Rockset langchain vectorstore"
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -109,7 +103,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "474636a2",
|
||||
"metadata": {},
|
||||
@ -138,7 +131,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1404cada",
|
||||
"metadata": {},
|
||||
@ -173,7 +165,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f1290844",
|
||||
"metadata": {},
|
||||
@ -205,7 +196,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5e15d630",
|
||||
"metadata": {},
|
||||
@ -243,7 +233,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "0765b822",
|
||||
"metadata": {},
|
||||
@ -266,7 +255,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "03fa12a9",
|
||||
"metadata": {},
|
||||
@ -277,6 +265,14 @@
|
||||
"\n",
|
||||
"Keep an eye on https://rockset.com/blog/introducing-vector-search-on-rockset/ for future updates in this space!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2763dddb-e87d-4d3b-b0bf-c246b0573d87",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@ -295,7 +291,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.6"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
@ -6,7 +6,9 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SingleStoreDB\n",
|
||||
"[SingleStoreDB](https://singlestore.com/) is a high-performance distributed SQL database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. It provides vector storage, and vector functions including [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html) and [euclidean_distance](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/euclidean_distance.html), thereby supporting AI applications that require text similarity matching. This tutorial illustrates how to [work with vector data in SingleStoreDB](https://docs.singlestore.com/managed-service/en/developer-resources/functional-extensions/working-with-vector-data.html)."
|
||||
">[SingleStoreDB](https://singlestore.com/) is a high-performance distributed SQL database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. It provides vector storage, and vector functions including [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html) and [euclidean_distance](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/euclidean_distance.html), thereby supporting AI applications that require text similarity matching. \n",
|
||||
"\n",
|
||||
"This tutorial illustrates how to [work with vector data in SingleStoreDB](https://docs.singlestore.com/managed-service/en/developer-resources/functional-extensions/working-with-vector-data.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -129,7 +131,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.2"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1,13 +1,12 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SKLearnVectorStore\n",
|
||||
"# scikit-learn\n",
|
||||
"\n",
|
||||
"[scikit-learn](https://scikit-learn.org/stable/) is an open source collection of machine learning algorithms, including some implementations of the [k nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html). `SKLearnVectorStore` wraps this implementation and adds the possibility to persist the vector store in json, bson (binary json) or Apache Parquet format.\n",
|
||||
">[scikit-learn](https://scikit-learn.org/stable/) is an open source collection of machine learning algorithms, including some implementations of the [k nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html). `SKLearnVectorStore` wraps this implementation and adds the possibility to persist the vector store in json, bson (binary json) or Apache Parquet format.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the `SKLearnVectorStore` vector database."
|
||||
]
|
||||
@ -28,7 +27,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -48,7 +46,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -76,7 +73,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -120,7 +116,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -190,7 +185,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@ -209,7 +203,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "sofia",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@ -223,10 +217,9 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.16"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
@ -7,11 +7,10 @@
|
||||
"source": [
|
||||
"# StarRocks\n",
|
||||
"\n",
|
||||
"[StarRocks | A High-Performance Analytical Database](https://www.starrocks.io/)\n",
|
||||
">[StarRocks](https://www.starrocks.io/) is a High-Performance Analytical Database.\n",
|
||||
"`StarRocks` is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.\n",
|
||||
"\n",
|
||||
"StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.\n",
|
||||
"\n",
|
||||
"Usually StarRocks is categorized as OLAP, and it has shown excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it can also be used as a fast vectordb.\n",
|
||||
">Usually `StarRocks` is categorized as OLAP, and it has shown excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it can also be used as a fast vectordb.\n",
|
||||
"\n",
|
||||
"Here we'll show how to use the StarRocks Vector Store."
|
||||
]
|
||||
@ -21,8 +20,17 @@
|
||||
"id": "1685854f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Import all used modules"
|
||||
"## Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "311d44bb-4aca-4f3b-8f97-5e1f29238e40",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install pymysql"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -305,7 +313,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -2,68 +2,67 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tigris\n",
|
||||
"\n",
|
||||
"> [Tigris](https://tigrisdata.com) is an open source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.\n",
|
||||
"> Tigris eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
"> `Tigris` eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook shows you how to use Tigris as your VectorStore."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Prerequisites**\n",
|
||||
"1. An OpenAI account. You can sign up for an account [here](https://platform.openai.com/)\n",
|
||||
"2. [Sign up for a free Tigris account](https://console.preview.tigrisdata.cloud). Once you have signed up for the Tigris account, create a new project called `vectordemo`. Next, make a note of the *Uri* for the region you've created your project in, the **clientId** and **clientSecret**. You can get all this information from the **Application Keys** section of the project."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install tigrisdb openapi-schema-pydantic openai tiktoken"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will load the `OpenAI` api key and `Tigris` credentials in our environment"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
@ -73,38 +72,42 @@
|
||||
"os.environ[\"TIGRIS_PROJECT\"] = getpass.getpass(\"Tigris Project Name:\")\n",
|
||||
"os.environ[\"TIGRIS_CLIENT_ID\"] = getpass.getpass(\"Tigris Client Id:\")\n",
|
||||
"os.environ[\"TIGRIS_CLIENT_SECRET\"] = getpass.getpass(\"Tigris Client Secret:\")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Tigris\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Initialize Tigris vector store\n",
|
||||
"Let's import our test dataset:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
@ -113,87 +116,89 @@
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store = Tigris.from_documents(docs, embeddings, index_name=\"my_embeddings\")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = vector_store.similarity_search(query)\n",
|
||||
"print(found_docs)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search with score (vector distance)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = vector_store.similarity_search_with_score(query)\n",
|
||||
"for doc, score in result:\n",
|
||||
" print(f\"document={doc}, score={score}\")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
@ -2,6 +2,7 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Typesense\n",
|
||||
"\n",
|
||||
@ -10,97 +11,105 @@
|
||||
"> Typesense focuses on performance by storing the entire index in RAM (with a backup on disk) and also focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.\n",
|
||||
">\n",
|
||||
"> It also lets you combine attribute-based filtering together with vector queries, to fetch the most relevant documents."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook shows you how to use Typesense as your VectorStore."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install typesense openapi-schema-pydantic openai tiktoken"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:48:02.968822Z",
|
||||
"start_time": "2023-05-23T22:47:48.574094Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:48:02.968822Z",
|
||||
"start_time": "2023-05-23T22:47:48.574094Z"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:50:34.775893Z",
|
||||
"start_time": "2023-05-23T22:50:34.771889Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Typesense\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:50:34.775893Z",
|
||||
"start_time": "2023-05-23T22:50:34.771889Z"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's import our test dataset:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:56:19.093489Z",
|
||||
"start_time": "2023-05-23T22:56:19.089Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
@ -109,18 +118,17 @@
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:56:19.093489Z",
|
||||
"start_time": "2023-05-23T22:56:19.089Z"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docsearch = Typesense.from_documents(\n",
|
||||
@ -134,98 +142,103 @@
|
||||
" \"typesense_collection_name\": \"lang-chain\",\n",
|
||||
" },\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity Search"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = docsearch.similarity_search(query)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(found_docs[0].page_content)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Typesense as a Retriever\n",
|
||||
"\n",
|
||||
"Typesense, like all the other vector stores, can be used as a LangChain Retriever; it uses cosine similarity."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = docsearch.as_retriever()\n",
|
||||
"retriever"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"retriever.get_relevant_documents(query)[0]"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|