langchain/libs/community/langchain_community/utilities/rememberizer.py

"""Wrapper for Rememberizer APIs."""

from typing import Any, Dict, List, Optional, cast

import requests
from langchain_core.documents import Document
from langchain_core.utils import get_from_dict_or_env
from pydantic import BaseModel, model_validator


class RememberizerAPIWrapper(BaseModel):
    """Wrapper for Rememberizer APIs."""

    top_k_results: int = 10
    rememberizer_api_key: Optional[str] = None

    @model_validator(mode="before")
    @classmethod
    def validate_environment(cls, values: Dict) -> Any:
        """Validate that api key in environment."""
        rememberizer_api_key = get_from_dict_or_env(
            values, "rememberizer_api_key", "REMEMBERIZER_API_KEY"
        )
        values["rememberizer_api_key"] = rememberizer_api_key

        return values

    def search(self, query: str) -> dict:
        """Search for a query in the Rememberizer API."""
        url = f"https://api.rememberizer.ai/api/v1/documents/search?q={query}&n={self.top_k_results}"
        response = requests.get(
            url, headers={"x-api-key": cast(str, self.rememberizer_api_key)}
        )
        data = response.json()

        if response.status_code != 200:
            raise ValueError(f"API Error: {data}")

        matched_chunks = data.get("matched_chunks", [])
        return matched_chunks

    def load(self, query: str) -> List[Document]:
        matched_chunks = self.search(query)
        docs = []
        for matched_chunk in matched_chunks:
            docs.append(
                Document(
                    page_content=matched_chunk["matched_content"],
                    metadata=matched_chunk["document"],
                )
            )
        return docs
community[minor]: Rememberizer retriever (#20052) Description: This pull request introduces a new feature for LangChain: the integration with the Rememberizer API through a custom retriever. This enables LangChain applications to allow users to load and sync their data from Dropbox, Google Drive, Slack, their hard drive into a vector database that LangChain can query. Queries involve sending text chunks generated within LangChain and retrieving a collection of semantically relevant user data for inclusion in LLM prompts. User knowledge dramatically improved AI applications. The Rememberizer integration will also allow users to access general purpose vectorized data such as Reddit channel discussions and US patents. Issue: N/A Dependencies: N/A Twitter handle: https://twitter.com/Rememberizer 2024-05-01 14:41:44 +00:00			`"""Wrapper for Rememberizer APIs."""`
infra: update mypy 1.10, ruff 0.5 (#23721) ```python """python scripts/update_mypy_ruff.py""" import glob import tomllib from pathlib import Path import toml import subprocess import re ROOT_DIR = Path(__file__).parents[1] def main(): for path in glob.glob(str(ROOT_DIR / "libs/*/pyproject.toml"), recursive=True): print(path) with open(path, "rb") as f: pyproject = tomllib.load(f) try: pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = ( "^1.10" ) pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = ( "^0.5" ) except KeyError: continue with open(path, "w") as f: toml.dump(pyproject, f) cwd = "/".join(path.split("/")[:-1]) completed = subprocess.run( "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color", cwd=cwd, shell=True, capture_output=True, text=True, ) logs = completed.stdout.split("\n") to_ignore = {} for l in logs: if re.match("^(.)\:(\d+)\: error:.\[(.)\]", l): path, line_no, error_type = re.match( "^(.)\:(\d+)\: error:.\[(.*)\]", l ).groups() if (path, line_no) in to_ignore: to_ignore[(path, line_no)].append(error_type) else: to_ignore[(path, line_no)] = [error_type] print(len(to_ignore)) for (error_path, line_no), error_types in to_ignore.items(): all_errors = ", ".join(error_types) full_path = f"{cwd}/{error_path}" try: with open(full_path, "r") as f: file_lines = f.readlines() except FileNotFoundError: continue file_lines[int(line_no) - 1] = ( file_lines[int(line_no) - 1][:-1] + f" # type: ignore[{all_errors}]\n" ) with open(full_path, "w") as f: f.write("".join(file_lines)) subprocess.run( "poetry run ruff format .; poetry run ruff --select I --fix .", cwd=cwd, shell=True, capture_output=True, text=True, ) if __name__ == "__main__": main() ``` 2024-07-03 17:33:27 +00:00
multiple: pydantic 2 compatibility, v0.3 (#26443) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com> 2024-09-13 21:38:45 +00:00			`from typing import Any, Dict, List, Optional, cast`
community[minor]: Rememberizer retriever (#20052) Description: This pull request introduces a new feature for LangChain: the integration with the Rememberizer API through a custom retriever. This enables LangChain applications to allow users to load and sync their data from Dropbox, Google Drive, Slack, their hard drive into a vector database that LangChain can query. Queries involve sending text chunks generated within LangChain and retrieving a collection of semantically relevant user data for inclusion in LLM prompts. User knowledge dramatically improved AI applications. The Rememberizer integration will also allow users to access general purpose vectorized data such as Reddit channel discussions and US patents. Issue: N/A Dependencies: N/A Twitter handle: https://twitter.com/Rememberizer 2024-05-01 14:41:44 +00:00
			`import requests`
			`from langchain_core.documents import Document`
			`from langchain_core.utils import get_from_dict_or_env`
multiple: pydantic 2 compatibility, v0.3 (#26443) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com> 2024-09-13 21:38:45 +00:00			`from pydantic import BaseModel, model_validator`
community[minor]: Rememberizer retriever (#20052) Description: This pull request introduces a new feature for LangChain: the integration with the Rememberizer API through a custom retriever. This enables LangChain applications to allow users to load and sync their data from Dropbox, Google Drive, Slack, their hard drive into a vector database that LangChain can query. Queries involve sending text chunks generated within LangChain and retrieving a collection of semantically relevant user data for inclusion in LLM prompts. User knowledge dramatically improved AI applications. The Rememberizer integration will also allow users to access general purpose vectorized data such as Reddit channel discussions and US patents. Issue: N/A Dependencies: N/A Twitter handle: https://twitter.com/Rememberizer 2024-05-01 14:41:44 +00:00

			`class RememberizerAPIWrapper(BaseModel):`
			`"""Wrapper for Rememberizer APIs."""`

			`top_k_results: int = 10`
			`rememberizer_api_key: Optional[str] = None`

multiple: pydantic 2 compatibility, v0.3 (#26443) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com> 2024-09-13 21:38:45 +00:00			`@model_validator(mode="before")`
			`@classmethod`
			`def validate_environment(cls, values: Dict) -> Any:`
community[minor]: Rememberizer retriever (#20052) Description: This pull request introduces a new feature for LangChain: the integration with the Rememberizer API through a custom retriever. This enables LangChain applications to allow users to load and sync their data from Dropbox, Google Drive, Slack, their hard drive into a vector database that LangChain can query. Queries involve sending text chunks generated within LangChain and retrieving a collection of semantically relevant user data for inclusion in LLM prompts. User knowledge dramatically improved AI applications. The Rememberizer integration will also allow users to access general purpose vectorized data such as Reddit channel discussions and US patents. Issue: N/A Dependencies: N/A Twitter handle: https://twitter.com/Rememberizer 2024-05-01 14:41:44 +00:00			`"""Validate that api key in environment."""`
			`rememberizer_api_key = get_from_dict_or_env(`
			`values, "rememberizer_api_key", "REMEMBERIZER_API_KEY"`
			`)`
			`values["rememberizer_api_key"] = rememberizer_api_key`

			`return values`

			`def search(self, query: str) -> dict:`
			`"""Search for a query in the Rememberizer API."""`
			`url = f"https://api.rememberizer.ai/api/v1/documents/search?q={query}&n={self.top_k_results}"`
community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676) This PR add supports for Azure Cosmos DB for NoSQL vector store. Summary: Description: added vector store integration for Azure Cosmos DB for NoSQL Vector Store, Dependencies: azure-cosmos dependency, Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> 2024-06-11 17:34:01 +00:00			`response = requests.get(`
			`url, headers={"x-api-key": cast(str, self.rememberizer_api_key)}`
			`)`
community[minor]: Rememberizer retriever (#20052) Description: This pull request introduces a new feature for LangChain: the integration with the Rememberizer API through a custom retriever. This enables LangChain applications to allow users to load and sync their data from Dropbox, Google Drive, Slack, their hard drive into a vector database that LangChain can query. Queries involve sending text chunks generated within LangChain and retrieving a collection of semantically relevant user data for inclusion in LLM prompts. User knowledge dramatically improved AI applications. The Rememberizer integration will also allow users to access general purpose vectorized data such as Reddit channel discussions and US patents. Issue: N/A Dependencies: N/A Twitter handle: https://twitter.com/Rememberizer 2024-05-01 14:41:44 +00:00			`data = response.json()`

			`if response.status_code != 200:`
			`raise ValueError(f"API Error: {data}")`

			`matched_chunks = data.get("matched_chunks", [])`
			`return matched_chunks`

			`def load(self, query: str) -> List[Document]:`
			`matched_chunks = self.search(query)`
			`docs = []`
			`for matched_chunk in matched_chunks:`
			`docs.append(`
			`Document(`
			`page_content=matched_chunk["matched_content"],`
			`metadata=matched_chunk["document"],`
			`)`
			`)`
			`return docs`