langchain/docs/modules/indexes/vectorstores/examples/awadb.ipynb
Harrison Chase 9218684759
Add a new vector store - AwaDB (#5971) (#5992)
Added the AwaDB vector store, a wrapper over AwaDB that can be used as
vector storage and provides efficient similarity search. Added
integration tests for the vector store.
Added a Jupyter notebook with an example.

Deleted an unneeded empty file and resolved the
conflict (https://github.com/hwchase17/langchain/pull/5886).

Please check, thanks!

@dev2049
@hwchase17

---------


Co-authored-by: ljeagle <vincent_jieli@yeah.net>
Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-10 15:42:32 -07:00

{
"cells": [
{
"cell_type": "markdown",
"id": "833c4789",
"metadata": {},
"source": [
"# AwaDB\n",
"[AwaDB](https://github.com/awa-ai/awadb) is an AI-native database for searching and storing the embedding vectors used by LLM applications.\n",
"This notebook shows how to use functionality related to AwaDB."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "252930ea",
"metadata": {},
"outputs": [],
"source": [
"!pip install awadb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2b71a47",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import AwaDB\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49be0bac",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18714278",
"metadata": {},
"outputs": [],
"source": [
"db = AwaDB.from_documents(docs)\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62b7a4c5",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "a9b4be48",
"metadata": {},
"source": [
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence."
]
},
{
"cell_type": "markdown",
"id": "87fec6b5",
"metadata": {},
"source": [
"## Similarity search with score"
]
},
{
"cell_type": "markdown",
"id": "17231924",
"metadata": {},
"source": [
"The returned distance score ranges from 0 to 1: 0 means dissimilar and 1 means most similar."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f40ddae1",
"metadata": {},
"outputs": [],
"source": [
"docs = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0045583",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0])"
]
},
{
"cell_type": "markdown",
"id": "8c2da99d",
"metadata": {},
"source": [
"(Document(page_content='And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}), 0.561813814013747)"
]
},
{
"cell_type": "markdown",
"id": "0b49fb59",
"metadata": {},
"source": [
"## Restore a previously created table and its data"
]
},
{
"cell_type": "markdown",
"id": "1bfa6e25",
"metadata": {},
"source": [
"AwaDB automatically persists added document data."
]
},
{
"cell_type": "markdown",
"id": "2a0f3b35",
"metadata": {},
"source": [
"To restore a table that you previously created and populated, load it by name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1fd4b5b0",
"metadata": {},
"outputs": [],
"source": [
"import awadb\n",
"\n",
"# Load the previously persisted table by name\n",
"awadb_client = awadb.Client()\n",
"ret = awadb_client.Load('langchain_awadb')\n",
"if ret:\n",
"    print('awadb load table success')\n",
"else:\n",
"    print('awadb load table failed')"
]
},
{
"cell_type": "markdown",
"id": "5ae9a9dd",
"metadata": {},
"source": [
"awadb load table success"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}