Integrate Rockset as a document loader (#7681)
Integrate [Rockset](https://rockset.com/docs/) as a document loader.
Issue: None
Dependencies: Nothing new (rockset's dependency was already added
[here](https://github.com/hwchase17/langchain/pull/6216))
Tag maintainer: @rlancemartin
I have added a test for the integration and an example notebook showing
its use. I ran `make lint` and everything looks good.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-14 14:58:13 +00:00
import logging
import os

from langchain_core.documents import Document

from langchain_community.document_loaders import RocksetLoader

logger = logging.getLogger(__name__)


def test_sql_query() -> None:
    """Run a simple SQL query"""
    # Imported lazily so that collecting this module does not require rockset.
    import rockset

    # This is an integration test: it needs real Rockset credentials.
    assert os.environ.get("ROCKSET_API_KEY") is not None
    assert os.environ.get("ROCKSET_REGION") is not None

    api_key = os.environ.get("ROCKSET_API_KEY")
    region = os.environ.get("ROCKSET_REGION")

    # Map the short region name to the matching Rockset API host.
    if region == "use1a1":
        host = rockset.Regions.use1a1
    elif region == "usw2a1":
        host = rockset.Regions.usw2a1
    elif region == "euc1a1":
        host = rockset.Regions.euc1a1
    elif region == "dev":
        host = rockset.DevRegions.usw2a1
    else:
        # Unknown value: pass it through unchanged and treat it as the host.
        logger.warning(
            "Using ROCKSET_REGION:%s as it is. "
            "You should know what you're doing...",
            region,
        )

        host = region

    client = rockset.RocksetClient(host, api_key)

    # Literal values that the query returns as a single row.
    col_1 = "Rockset is a real-time analytics database"
    col_2 = 2
    col_3 = "e903e069-b0b5-4b80-95e2-86471b41f55f"
    doc_id = 7320132  # selected as the "id" column; avoids shadowing builtin id()

    loader = RocksetLoader(
        client,
        rockset.models.QueryRequestSql(
            query=(
                f"SELECT '{col_1}' AS col_1, {col_2} AS col_2, '{col_3}' AS col_3,"
                f" {doc_id} AS id"
            )
        ),
        ["col_1"],  # columns used as the document's page content
        metadata_keys=["col_2", "col_3", "id"],
    )

    output = loader.load()

    # One row in, one Document out, with content and metadata split as configured.
    assert len(output) == 1
    assert isinstance(output[0], Document)
    assert output[0].page_content == col_1
    assert output[0].metadata == {"col_2": col_2, "col_3": col_3, "id": doc_id}