langchain/tests/integration_tests/document_loaders/test_rocksetdb.py
Aarav Borthakur 210296a71f
Integrate Rockset as a document loader (#7681)

Integrate [Rockset](https://rockset.com/docs/) as a document loader.

Issue: None
Dependencies: Nothing new (rockset's dependency was already added
[here](https://github.com/hwchase17/langchain/pull/6216))
Tag maintainer: @rlancemartin

I have added a test for the integration and an example notebook showing
its use. I ran `make lint` and everything looks good.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-14 07:58:13 -07:00

61 lines
2.1 KiB
Python

import logging
import os

from langchain.docstore.document import Document
from langchain.document_loaders import RocksetLoader

logger = logging.getLogger(__name__)


def test_sql_query() -> None:
    """Run a simple SQL query."""
    import rockset

    # This is an integration test: it needs real Rockset credentials.
    assert os.environ.get("ROCKSET_API_KEY") is not None
    assert os.environ.get("ROCKSET_REGION") is not None

    api_key = os.environ.get("ROCKSET_API_KEY")
    region = os.environ.get("ROCKSET_REGION")

    # Map the well-known region names to the hosts defined by the client.
    if region == "use1a1":
        host = rockset.Regions.use1a1
    elif region == "usw2a1":
        host = rockset.Regions.usw2a1
    elif region == "euc1a1":
        host = rockset.Regions.euc1a1
    elif region == "dev":
        host = rockset.DevRegions.usw2a1
    else:
        # Unknown value: pass it through unchanged as the host.
        logger.warning(
            "Using ROCKSET_REGION:%s as it is. "
            "You should know what you're doing...",
            region,
        )
        host = region

    client = rockset.RocksetClient(host, api_key)

    # Literal values that the query selects and the assertions check.
    col_1 = "Rockset is a real-time analytics database which enables queries on massive, semi-structured data without operational burden. Rockset is serverless and fully managed. It offloads the work of managing configuration, cluster provisioning, denormalization, and shard / index management. Rockset is also SOC 2 Type II compliant and offers encryption at rest and in flight, securing and protecting any sensitive data. Most teams can ingest data into Rockset and start executing queries in less than 15 minutes."  # noqa: E501
    col_2 = 2
    col_3 = "e903e069-b0b5-4b80-95e2-86471b41f55f"
    id = 7320132

    # col_1 becomes the page content; the other columns become metadata.
    loader = RocksetLoader(
        client,
        rockset.models.QueryRequestSql(
            query=(
                f"SELECT '{col_1}' AS col_1, {col_2} AS col_2, '{col_3}' AS col_3,"
                f" {id} AS id"
            )
        ),
        ["col_1"],
        metadata_keys=["col_2", "col_3", "id"],
    )

    output = loader.load()

    # The query returns a single row, so exactly one Document is expected.
    assert len(output) == 1
    assert isinstance(output[0], Document)
    assert output[0].page_content == col_1
    assert output[0].metadata == {"col_2": col_2, "col_3": col_3, "id": id}
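
The region handling in the test above could also be written as a table lookup with a pass-through fallback. The following is only a minimal sketch of that idea, not part of the committed test: `resolve_host` and `example_hosts` are hypothetical names, and the host strings are placeholders standing in for `rockset.Regions.*` and `rockset.DevRegions.usw2a1`.

```python
import logging
from typing import Mapping

logger = logging.getLogger(__name__)


def resolve_host(region: str, known_hosts: Mapping[str, str]) -> str:
    """Return the host mapped to `region`, or `region` itself if unmapped."""
    if region in known_hosts:
        return known_hosts[region]
    # Same fallback as the test: warn and use the value unchanged as the host.
    logger.warning("Using ROCKSET_REGION:%s as it is.", region)
    return region


# Placeholder values for illustration only (assumption): in the real test
# these would be rockset.Regions.use1a1, rockset.Regions.usw2a1,
# rockset.Regions.euc1a1 and rockset.DevRegions.usw2a1.
example_hosts = {
    "use1a1": "host-for-use1a1",
    "usw2a1": "host-for-usw2a1",
    "euc1a1": "host-for-euc1a1",
    "dev": "host-for-dev",
}
```

Keeping the mapping in one dictionary would let a new region be added without touching the control flow, while an unknown value still falls through unchanged.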