Resolve: VectorSearch enabled SQLChain? (#10177)
Squashed from #7454 with updated features
We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and
put everything into `experimental/`.
Below is the original PR message from #7454.
-------
We have been working on features to fill up the gap among SQL, vector
search and LLM applications. Some inspiring works like self-query
retrievers for VectorStores (for example
[Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html)
and
[others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html))
really turn those vector search databases into a powerful knowledge
base! 🚀🚀
We are thinking if we can merge all in one, like SQL and vector search
and LLMChains, making this SQL vector database memory as the only source
of your data. Here are some benefits we can think of for now, maybe you
have more 👀:
With ALL data you have: since you store all your pasta in the database,
you don't need to worry about the foreign keys or links between names
from other data source.
Flexible data structure: Even if you have changed your schema, for
example added a table, the LLM will know how to JOIN those tables and
use those as filters.
SQL compatibility: We found that vector databases that supports SQL in
the marketplace have similar interfaces, which means you can change your
backend with no pain, just change the name of the distance function in
your DB solution and you are ready to go!
### Issue resolved:
- [Feature Proposal: VectorSearch enabled
SQLChain?](https://github.com/hwchase17/langchain/issues/5122)
### Change made in this PR:
- An improved schema handling that ignore `types.NullType` columns
- A SQL output Parser interface in `SQLDatabaseChain` to enable Vector
SQL capability and further more
- A Retriever based on `SQLDatabaseChain` to retrieve data from the
database for RetrievalQAChains and many others
- Allow `SQLDatabaseChain` to retrieve data in python native format
- Includes PR #6737
- Vector SQL Output Parser for `SQLDatabaseChain` and
`SQLDatabaseChainRetriever`
- Prompts that can implement text to VectorSQL
- Corresponding unit-tests and notebook
### Twitter handle:
- @MyScaleDB
### Tag Maintainer:
Prompts / General: @hwchase17, @baskaryan
DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
### Dependencies:
No dependency added
2023-09-07 00:08:12 +00:00
{
"cells": [
{
"cell_type": "markdown",
"id": "245065c6",
"metadata": {},
"source": [
"# Vector SQL Retriever with MyScale\n",
"\n",
">[MyScale](https://docs.myscale.com/en/) is an integrated vector database. You can access your database in SQL and also from here, LangChain. MyScale can make a use of [various data types and functions for filters](https://blog.myscale.com/2023/06/06/why-integrated-database-solution-can-boost-your-llm-apps/#filter-on-anything-without-constraints). It will boost up your LLM app no matter if you are scaling up your data or expand your system to broader application."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0246c5bf",
"metadata": {},
"outputs": [],
"source": [
"!pip3 install clickhouse-sqlalchemy InstructorEmbedding sentence_transformers openai langchain-experimental"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7585d2c3",
"metadata": {},
"outputs": [],
"source": [
"from os import environ\n",
"import getpass\n",
"from typing import Dict, Any\n",
2023-10-29 22:50:09 +00:00
"from langchain.llms import OpenAI\n",
"from langchain.utilities import SQLDatabase\n",
"from langchain.chains import LLMChain\n",
Resolve: VectorSearch enabled SQLChain? (#10177)
Squashed from #7454 with updated features
We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and
put everything into `experimental/`.
Below is the original PR message from #7454.
-------
We have been working on features to fill up the gap among SQL, vector
search and LLM applications. Some inspiring works like self-query
retrievers for VectorStores (for example
[Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html)
and
[others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html))
really turn those vector search databases into a powerful knowledge
base! 🚀🚀
We are thinking if we can merge all in one, like SQL and vector search
and LLMChains, making this SQL vector database memory as the only source
of your data. Here are some benefits we can think of for now, maybe you
have more 👀:
With ALL data you have: since you store all your pasta in the database,
you don't need to worry about the foreign keys or links between names
from other data source.
Flexible data structure: Even if you have changed your schema, for
example added a table, the LLM will know how to JOIN those tables and
use those as filters.
SQL compatibility: We found that vector databases that supports SQL in
the marketplace have similar interfaces, which means you can change your
backend with no pain, just change the name of the distance function in
your DB solution and you are ready to go!
### Issue resolved:
- [Feature Proposal: VectorSearch enabled
SQLChain?](https://github.com/hwchase17/langchain/issues/5122)
### Change made in this PR:
- An improved schema handling that ignore `types.NullType` columns
- A SQL output Parser interface in `SQLDatabaseChain` to enable Vector
SQL capability and further more
- A Retriever based on `SQLDatabaseChain` to retrieve data from the
database for RetrievalQAChains and many others
- Allow `SQLDatabaseChain` to retrieve data in python native format
- Includes PR #6737
- Vector SQL Output Parser for `SQLDatabaseChain` and
`SQLDatabaseChainRetriever`
- Prompts that can implement text to VectorSQL
- Corresponding unit-tests and notebook
### Twitter handle:
- @MyScaleDB
### Tag Maintainer:
Prompts / General: @hwchase17, @baskaryan
DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
### Dependencies:
No dependency added
2023-09-07 00:08:12 +00:00
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
"from sqlalchemy import create_engine, Column, MetaData\n",
2023-09-17 00:22:48 +00:00
"from langchain.prompts import PromptTemplate\n",
Resolve: VectorSearch enabled SQLChain? (#10177)
Squashed from #7454 with updated features
We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and
put everything into `experimental/`.
Below is the original PR message from #7454.
-------
We have been working on features to fill up the gap among SQL, vector
search and LLM applications. Some inspiring works like self-query
retrievers for VectorStores (for example
[Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html)
and
[others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html))
really turn those vector search databases into a powerful knowledge
base! 🚀🚀
We are thinking if we can merge all in one, like SQL and vector search
and LLMChains, making this SQL vector database memory as the only source
of your data. Here are some benefits we can think of for now, maybe you
have more 👀:
With ALL data you have: since you store all your pasta in the database,
you don't need to worry about the foreign keys or links between names
from other data source.
Flexible data structure: Even if you have changed your schema, for
example added a table, the LLM will know how to JOIN those tables and
use those as filters.
SQL compatibility: We found that vector databases that supports SQL in
the marketplace have similar interfaces, which means you can change your
backend with no pain, just change the name of the distance function in
your DB solution and you are ready to go!
### Issue resolved:
- [Feature Proposal: VectorSearch enabled
SQLChain?](https://github.com/hwchase17/langchain/issues/5122)
### Change made in this PR:
- An improved schema handling that ignore `types.NullType` columns
- A SQL output Parser interface in `SQLDatabaseChain` to enable Vector
SQL capability and further more
- A Retriever based on `SQLDatabaseChain` to retrieve data from the
database for RetrievalQAChains and many others
- Allow `SQLDatabaseChain` to retrieve data in python native format
- Includes PR #6737
- Vector SQL Output Parser for `SQLDatabaseChain` and
`SQLDatabaseChainRetriever`
- Prompts that can implement text to VectorSQL
- Corresponding unit-tests and notebook
### Twitter handle:
- @MyScaleDB
### Tag Maintainer:
Prompts / General: @hwchase17, @baskaryan
DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
### Dependencies:
No dependency added
2023-09-07 00:08:12 +00:00
"\n",
"\n",
"from sqlalchemy import create_engine\n",
"\n",
myscale notebook url change (#12810)
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
2023-11-03 00:56:26 +00:00
"MYSCALE_HOST = \"msc-4a9e710a.us-east-1.aws.staging.myscale.cloud\"\n",
Resolve: VectorSearch enabled SQLChain? (#10177)
Squashed from #7454 with updated features
We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and
put everything into `experimental/`.
Below is the original PR message from #7454.
-------
We have been working on features to fill up the gap among SQL, vector
search and LLM applications. Some inspiring works like self-query
retrievers for VectorStores (for example
[Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html)
and
[others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html))
really turn those vector search databases into a powerful knowledge
base! 🚀🚀
We are thinking if we can merge all in one, like SQL and vector search
and LLMChains, making this SQL vector database memory as the only source
of your data. Here are some benefits we can think of for now, maybe you
have more 👀:
With ALL data you have: since you store all your pasta in the database,
you don't need to worry about the foreign keys or links between names
from other data source.
Flexible data structure: Even if you have changed your schema, for
example added a table, the LLM will know how to JOIN those tables and
use those as filters.
SQL compatibility: We found that vector databases that supports SQL in
the marketplace have similar interfaces, which means you can change your
backend with no pain, just change the name of the distance function in
your DB solution and you are ready to go!
### Issue resolved:
- [Feature Proposal: VectorSearch enabled
SQLChain?](https://github.com/hwchase17/langchain/issues/5122)
### Change made in this PR:
- An improved schema handling that ignore `types.NullType` columns
- A SQL output Parser interface in `SQLDatabaseChain` to enable Vector
SQL capability and further more
- A Retriever based on `SQLDatabaseChain` to retrieve data from the
database for RetrievalQAChains and many others
- Allow `SQLDatabaseChain` to retrieve data in python native format
- Includes PR #6737
- Vector SQL Output Parser for `SQLDatabaseChain` and
`SQLDatabaseChainRetriever`
- Prompts that can implement text to VectorSQL
- Corresponding unit-tests and notebook
### Twitter handle:
- @MyScaleDB
### Tag Maintainer:
Prompts / General: @hwchase17, @baskaryan
DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
### Dependencies:
No dependency added
2023-09-07 00:08:12 +00:00
"MYSCALE_PORT = 443\n",
"MYSCALE_USER = \"chatdata\"\n",
"MYSCALE_PASSWORD = \"myscale_rocks\"\n",
"OPENAI_API_KEY = getpass.getpass(\"OpenAI API Key:\")\n",
"\n",
"engine = create_engine(\n",
" f\"clickhouse://{MYSCALE_USER}:{MYSCALE_PASSWORD}@{MYSCALE_HOST}:{MYSCALE_PORT}/default?protocol=https\"\n",
")\n",
"metadata = MetaData(bind=engine)\n",
"environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e08d9ddc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import HuggingFaceInstructEmbeddings\n",
"from langchain_experimental.sql.vector_sql import VectorSQLOutputParser\n",
"\n",
"output_parser = VectorSQLOutputParser.from_embeddings(\n",
" model=HuggingFaceInstructEmbeddings(\n",
" model_name=\"hkunlp/instructor-xl\", model_kwargs={\"device\": \"cpu\"}\n",
" )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84b705b2",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.callbacks import StdOutCallbackHandler\n",
"\n",
"from langchain.utilities.sql_database import SQLDatabase\n",
"from langchain_experimental.sql.prompt import MYSCALE_PROMPT\n",
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
"\n",
"chain = VectorSQLDatabaseChain(\n",
" llm_chain=LLMChain(\n",
" llm=OpenAI(openai_api_key=OPENAI_API_KEY, temperature=0),\n",
" prompt=MYSCALE_PROMPT,\n",
" ),\n",
" top_k=10,\n",
" return_direct=True,\n",
" sql_cmd_parser=output_parser,\n",
" database=SQLDatabase(engine, None, metadata),\n",
")\n",
"\n",
"import pandas as pd\n",
"\n",
"pd.DataFrame(\n",
" chain.run(\n",
" \"Please give me 10 papers to ask what is PageRank?\",\n",
" callbacks=[StdOutCallbackHandler()],\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"id": "6c09cda0",
"metadata": {},
"source": [
"## SQL Database as Retriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "734d7ff5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains.qa_with_sources.retrieval import RetrievalQAWithSourcesChain\n",
"\n",
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
2023-10-29 22:50:09 +00:00
"from langchain_experimental.retrievers.vector_sql_database import (\n",
" VectorSQLDatabaseChainRetriever,\n",
")\n",
Resolve: VectorSearch enabled SQLChain? (#10177)
Squashed from #7454 with updated features
We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and
put everything into `experimental/`.
Below is the original PR message from #7454.
-------
We have been working on features to fill up the gap among SQL, vector
search and LLM applications. Some inspiring works like self-query
retrievers for VectorStores (for example
[Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html)
and
[others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html))
really turn those vector search databases into a powerful knowledge
base! 🚀🚀
We are thinking if we can merge all in one, like SQL and vector search
and LLMChains, making this SQL vector database memory as the only source
of your data. Here are some benefits we can think of for now, maybe you
have more 👀:
With ALL data you have: since you store all your pasta in the database,
you don't need to worry about the foreign keys or links between names
from other data source.
Flexible data structure: Even if you have changed your schema, for
example added a table, the LLM will know how to JOIN those tables and
use those as filters.
SQL compatibility: We found that vector databases that supports SQL in
the marketplace have similar interfaces, which means you can change your
backend with no pain, just change the name of the distance function in
your DB solution and you are ready to go!
### Issue resolved:
- [Feature Proposal: VectorSearch enabled
SQLChain?](https://github.com/hwchase17/langchain/issues/5122)
### Change made in this PR:
- An improved schema handling that ignore `types.NullType` columns
- A SQL output Parser interface in `SQLDatabaseChain` to enable Vector
SQL capability and further more
- A Retriever based on `SQLDatabaseChain` to retrieve data from the
database for RetrievalQAChains and many others
- Allow `SQLDatabaseChain` to retrieve data in python native format
- Includes PR #6737
- Vector SQL Output Parser for `SQLDatabaseChain` and
`SQLDatabaseChainRetriever`
- Prompts that can implement text to VectorSQL
- Corresponding unit-tests and notebook
### Twitter handle:
- @MyScaleDB
### Tag Maintainer:
Prompts / General: @hwchase17, @baskaryan
DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
### Dependencies:
No dependency added
2023-09-07 00:08:12 +00:00
"from langchain_experimental.sql.prompt import MYSCALE_PROMPT\n",
"from langchain_experimental.sql.vector_sql import VectorSQLRetrieveAllOutputParser\n",
"\n",
"output_parser_retrieve_all = VectorSQLRetrieveAllOutputParser.from_embeddings(\n",
" output_parser.model\n",
")\n",
"\n",
"chain = VectorSQLDatabaseChain.from_llm(\n",
" llm=OpenAI(openai_api_key=OPENAI_API_KEY, temperature=0),\n",
" prompt=MYSCALE_PROMPT,\n",
" top_k=10,\n",
" return_direct=True,\n",
" db=SQLDatabase(engine, None, metadata),\n",
" sql_cmd_parser=output_parser_retrieve_all,\n",
" native_format=True,\n",
")\n",
"\n",
"# You need all those keys to get docs\n",
2023-10-29 22:50:09 +00:00
"retriever = VectorSQLDatabaseChainRetriever(\n",
" sql_db_chain=chain, page_content_key=\"abstract\"\n",
")\n",
Resolve: VectorSearch enabled SQLChain? (#10177)
Squashed from #7454 with updated features
We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and
put everything into `experimental/`.
Below is the original PR message from #7454.
-------
We have been working on features to fill up the gap among SQL, vector
search and LLM applications. Some inspiring works like self-query
retrievers for VectorStores (for example
[Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html)
and
[others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html))
really turn those vector search databases into a powerful knowledge
base! 🚀🚀
We are thinking if we can merge all in one, like SQL and vector search
and LLMChains, making this SQL vector database memory as the only source
of your data. Here are some benefits we can think of for now, maybe you
have more 👀:
With ALL data you have: since you store all your pasta in the database,
you don't need to worry about the foreign keys or links between names
from other data source.
Flexible data structure: Even if you have changed your schema, for
example added a table, the LLM will know how to JOIN those tables and
use those as filters.
SQL compatibility: We found that vector databases that supports SQL in
the marketplace have similar interfaces, which means you can change your
backend with no pain, just change the name of the distance function in
your DB solution and you are ready to go!
### Issue resolved:
- [Feature Proposal: VectorSearch enabled
SQLChain?](https://github.com/hwchase17/langchain/issues/5122)
### Change made in this PR:
- An improved schema handling that ignore `types.NullType` columns
- A SQL output Parser interface in `SQLDatabaseChain` to enable Vector
SQL capability and further more
- A Retriever based on `SQLDatabaseChain` to retrieve data from the
database for RetrievalQAChains and many others
- Allow `SQLDatabaseChain` to retrieve data in python native format
- Includes PR #6737
- Vector SQL Output Parser for `SQLDatabaseChain` and
`SQLDatabaseChainRetriever`
- Prompts that can implement text to VectorSQL
- Corresponding unit-tests and notebook
### Twitter handle:
- @MyScaleDB
### Tag Maintainer:
Prompts / General: @hwchase17, @baskaryan
DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
### Dependencies:
No dependency added
2023-09-07 00:08:12 +00:00
"\n",
"document_with_metadata_prompt = PromptTemplate(\n",
" input_variables=[\"page_content\", \"id\", \"title\", \"authors\", \"pubdate\", \"categories\"],\n",
" template=\"Content:\\n\\tTitle: {title}\\n\\tAbstract: {page_content}\\n\\tAuthors: {authors}\\n\\tDate of Publication: {pubdate}\\n\\tCategories: {categories}\\nSOURCE: {id}\",\n",
")\n",
"\n",
"chain = RetrievalQAWithSourcesChain.from_chain_type(\n",
" ChatOpenAI(\n",
" model_name=\"gpt-3.5-turbo-16k\", openai_api_key=OPENAI_API_KEY, temperature=0.6\n",
" ),\n",
" retriever=retriever,\n",
" chain_type=\"stuff\",\n",
" chain_type_kwargs={\n",
" \"document_prompt\": document_with_metadata_prompt,\n",
" },\n",
" return_source_documents=True,\n",
")\n",
2023-10-29 22:50:09 +00:00
"ans = chain(\n",
" \"Please give me 10 papers to ask what is PageRank?\",\n",
" callbacks=[StdOutCallbackHandler()],\n",
")\n",
Resolve: VectorSearch enabled SQLChain? (#10177)
Squashed from #7454 with updated features
We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and
put everything into `experimental/`.
Below is the original PR message from #7454.
-------
We have been working on features to fill up the gap among SQL, vector
search and LLM applications. Some inspiring works like self-query
retrievers for VectorStores (for example
[Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html)
and
[others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html))
really turn those vector search databases into a powerful knowledge
base! 🚀🚀
We are thinking if we can merge all in one, like SQL and vector search
and LLMChains, making this SQL vector database memory as the only source
of your data. Here are some benefits we can think of for now, maybe you
have more 👀:
With ALL data you have: since you store all your pasta in the database,
you don't need to worry about the foreign keys or links between names
from other data source.
Flexible data structure: Even if you have changed your schema, for
example added a table, the LLM will know how to JOIN those tables and
use those as filters.
SQL compatibility: We found that vector databases that supports SQL in
the marketplace have similar interfaces, which means you can change your
backend with no pain, just change the name of the distance function in
your DB solution and you are ready to go!
### Issue resolved:
- [Feature Proposal: VectorSearch enabled
SQLChain?](https://github.com/hwchase17/langchain/issues/5122)
### Change made in this PR:
- An improved schema handling that ignore `types.NullType` columns
- A SQL output Parser interface in `SQLDatabaseChain` to enable Vector
SQL capability and further more
- A Retriever based on `SQLDatabaseChain` to retrieve data from the
database for RetrievalQAChains and many others
- Allow `SQLDatabaseChain` to retrieve data in python native format
- Includes PR #6737
- Vector SQL Output Parser for `SQLDatabaseChain` and
`SQLDatabaseChainRetriever`
- Prompts that can implement text to VectorSQL
- Corresponding unit-tests and notebook
### Twitter handle:
- @MyScaleDB
### Tag Maintainer:
Prompts / General: @hwchase17, @baskaryan
DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
### Dependencies:
No dependency added
2023-09-07 00:08:12 +00:00
"print(ans[\"answer\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4948ff25",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}