mirror of https://github.com/hwchase17/langchain
Clarifai integration (#5954)
# Changes
This PR adds [Clarifai](https://www.clarifai.com/) integration to LangChain. Clarifai is an end-to-end AI platform. It offers users the ability to use many types of LLMs (OpenAI, Cohere, etc., and other open-source models), and a Clarifai app can also be treated as a vector database for uploading and retrieving data. The integration includes:
- Clarifai LLM integration: Clarifai supports many types of language models that users can utilize in their applications
- Clarifai VectorDB: a Clarifai application can hold data and embeddings, and you can run semantic search against those embeddings

#### Before submitting
- [x] Added integration test for LLM
- [x] Added integration test for VectorDB
- [x] Added notebook for LLM
- [x] Added notebook for VectorDB

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
parent
7f6f5c2a6a
commit
6e57306a13
@ -0,0 +1,210 @@
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9597802c",
   "metadata": {},
   "source": [
    "# Clarifai\n",
    "\n",
    ">[Clarifai](https://www.clarifai.com/) is an AI platform that provides the full AI lifecycle, ranging from data exploration and data labeling to model building and inference.\n",
    "\n",
    "This example goes over how to use LangChain to interact with `Clarifai` [models](https://clarifai.com/explore/models)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2a773d8d",
   "metadata": {},
   "source": [
    "# Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91ea14ce-831d-409a-a88f-30353acdabd1",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Install required dependencies\n",
    "!pip install clarifai"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "426f1156",
   "metadata": {},
   "source": [
    "# Imports\n",
    "Here we will be setting the personal access token. You can find your PAT under settings/security on the Clarifai platform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3f5dc9d7-65e3-4b5b-9086-3327d016cfe0",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Please log in and get your API key from https://clarifai.com/settings/security\n",
    "from getpass import getpass\n",
    "\n",
    "CLARIFAI_PAT_KEY = getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6fb585dd",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Import the required modules\n",
    "from langchain.llms import Clarifai\n",
    "from langchain import PromptTemplate, LLMChain"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "16521ed2",
   "metadata": {},
   "source": [
    "# Input\n",
    "Create a prompt template to be used with the LLM chain:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "035dea0f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "template = \"\"\"Question: {question}\n",
    "\n",
    "Answer: Let's think step by step.\"\"\"\n",
    "\n",
    "prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c8905eac",
   "metadata": {},
   "source": [
    "# Setup\n",
    "Set up the user ID and app ID where the model resides. You can find a list of public models on https://clarifai.com/explore/models\n",
    "\n",
    "You will also have to initialize the model ID and, if needed, the model version ID. Some models have many versions; you can choose the one appropriate for your task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1fe9bf15",
   "metadata": {},
   "outputs": [],
   "source": [
    "USER_ID = 'openai'\n",
    "APP_ID = 'chat-completion'\n",
    "MODEL_ID = 'chatgpt-3_5-turbo'\n",
    "\n",
    "# You can provide a specific model version\n",
    "# model_version_id = \"MODEL_VERSION_ID\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "3f3458d9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Initialize a Clarifai LLM\n",
    "clarifai_llm = Clarifai(clarifai_pat_key=CLARIFAI_PAT_KEY, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a641dbd9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Create LLM chain\n",
    "llm_chain = LLMChain(prompt=prompt, llm=clarifai_llm)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3e87c71a",
   "metadata": {},
   "source": [
    "# Run Chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9f844993",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Justin Bieber was born on March 1, 1994. So, we need to look at the Super Bowl that was played in the year 1994. \\n\\nThe Super Bowl in 1994 was Super Bowl XXVIII (28). It was played on January 30, 1994, between the Dallas Cowboys and the Buffalo Bills. \\n\\nThe Dallas Cowboys won the Super Bowl in 1994, defeating the Buffalo Bills by a score of 30-13. \\n\\nTherefore, the Dallas Cowboys are the NFL team that won the Super Bowl in the year Justin Bieber was born.'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "question = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
    "\n",
    "llm_chain.run(question)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
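The notebook above needs Clarifai credentials to run, but the prompt-plus-chain pattern it demonstrates can be sketched offline. `FakeLLM` below is a hypothetical stand-in for the Clarifai LLM, and `run_chain` is an illustrative reduction of what `PromptTemplate` + `LLMChain` do: format the template, then call the model with the resulting string.

```python
# Minimal offline sketch of the prompt -> LLM chain pattern from the
# notebook above. FakeLLM is a hypothetical stand-in, so no Clarifai
# credentials or network access are needed.
class FakeLLM:
    def __call__(self, prompt: str) -> str:
        # Echo the prompt so we can see what the chain sends to the model.
        return f"[model saw] {prompt}"


template = """Question: {question}

Answer: Let's think step by step."""


def run_chain(llm, template: str, **inputs) -> str:
    # PromptTemplate + LLMChain boil down to: substitute the input
    # variables into the template, then call the model on the result.
    return llm(template.format(**inputs))


result = run_chain(FakeLLM(), template, question="What is 2 + 2?")
print(result)
```

Swapping `FakeLLM()` for the `clarifai_llm` instance from the notebook yields the real behavior.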
@ -0,0 +1,176 @@
"""Wrapper around Clarifai's APIs."""
import logging
from typing import Any, Dict, List, Optional

from pydantic import Extra, root_validator

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env

logger = logging.getLogger(__name__)


class Clarifai(LLM):
    """Wrapper around Clarifai's large language models.

    To use, you should have an account on the Clarifai platform,
    the ``clarifai`` python package installed, and the
    environment variable ``CLARIFAI_PAT_KEY`` set with your PAT key,
    or pass it as a named parameter to the constructor.

    Example:
        .. code-block:: python

            from langchain.llms import Clarifai
            clarifai_llm = Clarifai(clarifai_pat_key=CLARIFAI_PAT_KEY, \
                user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
    """

    stub: Any  #: :meta private:
    metadata: Any
    userDataObject: Any

    model_id: Optional[str] = None
    """Model id to use."""

    model_version_id: Optional[str] = None
    """Model version id to use."""

    app_id: Optional[str] = None
    """Clarifai application id to use."""

    user_id: Optional[str] = None
    """Clarifai user id to use."""

    clarifai_pat_key: Optional[str] = None

    api_base: str = "https://api.clarifai.com"

    stop: Optional[List[str]] = None

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that we have all required info to access the Clarifai
        platform and that the python package exists in the environment."""
        values["clarifai_pat_key"] = get_from_dict_or_env(
            values, "clarifai_pat_key", "CLARIFAI_PAT_KEY"
        )
        user_id = values.get("user_id")
        app_id = values.get("app_id")
        model_id = values.get("model_id")

        if values["clarifai_pat_key"] is None:
            raise ValueError("Please provide a clarifai_pat_key.")
        if user_id is None:
            raise ValueError("Please provide a user_id.")
        if app_id is None:
            raise ValueError("Please provide an app_id.")
        if model_id is None:
            raise ValueError("Please provide a model_id.")
        return values

    @property
    def _default_params(self) -> Dict[str, Any]:
        """Get the default parameters for calling the Clarifai API."""
        return {}

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Get the identifying parameters."""
        return {"model_id": self.model_id}

    @property
    def _llm_type(self) -> str:
        """Return type of llm."""
        return "clarifai"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """Call out to Clarifai's PostModelOutputs endpoint.

        Args:
            prompt: The prompt to pass into the model.
            stop: Optional list of stop words to use when generating.

        Returns:
            The string generated by the model.

        Example:
            .. code-block:: python

                response = clarifai_llm("Tell me a joke.")
        """

        try:
            from clarifai.auth.helper import ClarifaiAuthHelper
            from clarifai.client import create_stub
            from clarifai_grpc.grpc.api import (
                resources_pb2,
                service_pb2,
            )
            from clarifai_grpc.grpc.api.status import status_code_pb2
        except ImportError:
            raise ImportError(
                "Could not import clarifai python package. "
                "Please install it with `pip install clarifai`."
            )

        auth = ClarifaiAuthHelper(
            user_id=self.user_id,
            app_id=self.app_id,
            pat=self.clarifai_pat_key,
            base=self.api_base,
        )
        self.userDataObject = auth.get_user_app_id_proto()
        self.stub = create_stub(auth)

        params = self._default_params
        if self.stop is not None and stop is not None:
            raise ValueError("`stop` found in both the input and default params.")
        elif self.stop is not None:
            params["stop_sequences"] = self.stop
        else:
            params["stop_sequences"] = stop

        # The userDataObject is required when using a PAT.
        # If model_version_id is None, the latest model version is used.
        post_model_outputs_request = service_pb2.PostModelOutputsRequest(
            user_app_id=self.userDataObject,
            model_id=self.model_id,
            version_id=self.model_version_id,
            inputs=[
                resources_pb2.Input(
                    data=resources_pb2.Data(text=resources_pb2.Text(raw=prompt))
                )
            ],
        )
        post_model_outputs_response = self.stub.PostModelOutputs(
            post_model_outputs_request
        )

        if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
            logger.error(post_model_outputs_response.status)
            raise Exception(
                "Post model outputs failed, status: "
                + post_model_outputs_response.status.description
            )

        text = post_model_outputs_response.outputs[0].data.text.raw

        # To make this consistent with other endpoints, strip stop tokens.
        if stop is not None or self.stop is not None:
            text = enforce_stop_tokens(text, params["stop_sequences"])
        return text
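The stop-sequence handling at the end of `_call` can be sketched offline. `enforce_stop_tokens` below re-implements the behavior of `langchain.llms.utils.enforce_stop_tokens` (with `re.escape` added so stop strings are treated literally), and `resolve_stop` is a hypothetical helper that captures the precedence rule: the instance-level `self.stop` and the per-call `stop` may not both be set, and whichever is present wins.

```python
import re


def enforce_stop_tokens(text: str, stop: list) -> str:
    # Cut the text at the first occurrence of any stop sequence.
    # Mirrors langchain.llms.utils.enforce_stop_tokens, with re.escape
    # added so stop strings are matched literally.
    return re.split("|".join(map(re.escape, stop)), text)[0]


def resolve_stop(default_stop, call_stop):
    # Hypothetical helper: same precedence logic as in Clarifai._call.
    if default_stop is not None and call_stop is not None:
        raise ValueError("`stop` found in both the input and default params.")
    return default_stop if default_stop is not None else call_stop


stop = resolve_stop(None, ["\nQuestion:"])
raw = "The answer is 4.\nQuestion: next one"
print(enforce_stop_tokens(raw, stop))  # -> "The answer is 4."
```

This is why the wrapper can truncate model output consistently regardless of whether the stop list came from the constructor or the call site.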
@ -0,0 +1,355 @@
from __future__ import annotations

import logging
import os
import traceback
from typing import Any, Iterable, List, Optional, Tuple

import requests

from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
from langchain.vectorstores.base import VectorStore

logger = logging.getLogger(__name__)


class Clarifai(VectorStore):
    """Wrapper around Clarifai AI platform's vector store.

    To use, you should have the ``clarifai`` python package installed.

    Example:
        .. code-block:: python

            from langchain.vectorstores import Clarifai
            from langchain.embeddings.openai import OpenAIEmbeddings

            embeddings = OpenAIEmbeddings()
            vectorstore = Clarifai("langchain_store", embeddings.embed_query)
    """

    def __init__(
        self,
        user_id: Optional[str] = None,
        app_id: Optional[str] = None,
        pat: Optional[str] = None,
        number_of_docs: Optional[int] = None,
        api_base: Optional[str] = None,
    ) -> None:
        """Initialize with Clarifai client.

        Args:
            user_id (Optional[str], optional): User ID. Defaults to None.
            app_id (Optional[str], optional): App ID. Defaults to None.
            pat (Optional[str], optional): Personal access token. Defaults to None.
            number_of_docs (Optional[int], optional): Number of documents to return
            during vector search. Defaults to None.
            api_base (Optional[str], optional): API base. Defaults to None.

        Raises:
            ValueError: If user ID, app ID or personal access token is not provided.
        """
        try:
            from clarifai.auth.helper import DEFAULT_BASE, ClarifaiAuthHelper
            from clarifai.client import create_stub
        except ImportError:
            raise ImportError(
                "Could not import clarifai python package. "
                "Please install it with `pip install clarifai`."
            )

        self._api_base = api_base if api_base is not None else DEFAULT_BASE

        self._user_id = user_id or os.environ.get("CLARIFAI_USER_ID")
        self._app_id = app_id or os.environ.get("CLARIFAI_APP_ID")
        self._pat = pat or os.environ.get("CLARIFAI_PAT_KEY")
        if self._user_id is None or self._app_id is None or self._pat is None:
            raise ValueError(
                "Could not find CLARIFAI_USER_ID, CLARIFAI_APP_ID or "
                "CLARIFAI_PAT_KEY in your environment. "
                "Please set those env variables with a valid user ID, "
                "app ID and personal access token "
                "from https://clarifai.com/settings/security."
            )

        self._auth = ClarifaiAuthHelper(
            user_id=self._user_id,
            app_id=self._app_id,
            pat=self._pat,
            base=self._api_base,
        )
        self._stub = create_stub(self._auth)
        self._userDataObject = self._auth.get_user_app_id_proto()
        self._number_of_docs = number_of_docs

    def _post_text_input(self, text: str, metadata: dict) -> str:
        """Post text to Clarifai and return the ID of the input.

        Args:
            text (str): Text to post.
            metadata (dict): Metadata to post.

        Returns:
            str: ID of the input.
        """
        try:
            from clarifai_grpc.grpc.api import resources_pb2, service_pb2
            from clarifai_grpc.grpc.api.status import status_code_pb2
            from google.protobuf.struct_pb2 import Struct  # type: ignore
        except ImportError as e:
            raise ImportError(
                "Could not import clarifai python package. "
                "Please install it with `pip install clarifai`."
            ) from e

        input_metadata = Struct()
        input_metadata.update(metadata)

        post_inputs_response = self._stub.PostInputs(
            service_pb2.PostInputsRequest(
                user_app_id=self._userDataObject,
                inputs=[
                    resources_pb2.Input(
                        data=resources_pb2.Data(
                            text=resources_pb2.Text(raw=text),
                            metadata=input_metadata,
                        )
                    )
                ],
            )
        )

        if post_inputs_response.status.code != status_code_pb2.SUCCESS:
            logger.error(post_inputs_response.status)
            raise Exception(
                "Post inputs failed, status: " + post_inputs_response.status.description
            )

        input_id = post_inputs_response.inputs[0].id

        return input_id

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        """Add texts to the Clarifai vectorstore. This will push the texts
        to a Clarifai application.
        The application uses a base workflow that creates and stores an
        embedding for each text. Make sure you are using a base workflow
        that is compatible with text (such as Language Understanding).

        Args:
            texts (Iterable[str]): Texts to add to the vectorstore.
            metadatas (Optional[List[dict]], optional): Optional list of metadatas.
            ids (Optional[List[str]], optional): Optional list of IDs.

        Returns:
            List[str]: List of IDs of the added texts.
        """

        # Materialize the iterable once so it can be measured and iterated.
        texts = list(texts)
        assert len(texts) > 0, "No texts provided to add to the vectorstore."

        if metadatas is not None:
            assert len(texts) == len(
                metadatas
            ), "Number of texts and metadatas should be the same."

        input_ids = []
        for idx, text in enumerate(texts):
            try:
                metadata = metadatas[idx] if metadatas else {}
                input_id = self._post_text_input(text, metadata)
                input_ids.append(input_id)
                logger.debug(f"Input {input_id} posted successfully.")
            except Exception as error:
                logger.warning(f"Post inputs failed: {error}")
                traceback.print_exc()

        return input_ids

    def similarity_search_with_score(
        self,
        query: str,
        k: int = 4,
        filter: Optional[dict] = None,
        namespace: Optional[str] = None,
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Run similarity search with score using Clarifai.

        Args:
            query (str): Query text to search for.
            k (int): Number of results to return. Defaults to 4.
            filter (Optional[Dict[str, str]]): Filter by metadata.
            Defaults to None.

        Returns:
            List[Document]: List of documents most similar to the query text.
        """
        try:
            from clarifai_grpc.grpc.api import resources_pb2, service_pb2
            from clarifai_grpc.grpc.api.status import status_code_pb2
            from google.protobuf import json_format  # type: ignore
        except ImportError as e:
            raise ImportError(
                "Could not import clarifai python package. "
                "Please install it with `pip install clarifai`."
            ) from e

        # Get the number of docs to return
        if self._number_of_docs is not None:
            k = self._number_of_docs

        post_annotations_searches_response = self._stub.PostAnnotationsSearches(
            service_pb2.PostAnnotationsSearchesRequest(
                user_app_id=self._userDataObject,
                searches=[
                    resources_pb2.Search(
                        query=resources_pb2.Query(
                            ranks=[
                                resources_pb2.Rank(
                                    annotation=resources_pb2.Annotation(
                                        data=resources_pb2.Data(
                                            text=resources_pb2.Text(raw=query),
                                        )
                                    )
                                )
                            ]
                        )
                    )
                ],
                pagination=service_pb2.Pagination(page=1, per_page=k),
            )
        )

        # Check if the search was successful
        if post_annotations_searches_response.status.code != status_code_pb2.SUCCESS:
            raise Exception(
                "Post searches failed, status: "
                + post_annotations_searches_response.status.description
            )

        # Retrieve hits
        hits = post_annotations_searches_response.hits

        docs_and_scores = []
        # Iterate over hits and retrieve metadata and text
        for hit in hits:
            metadata = json_format.MessageToDict(hit.input.data.metadata)
            request = requests.get(hit.input.data.text.url)

            # Override the encoding with an educated guess as provided by chardet
            request.encoding = request.apparent_encoding
            requested_text = request.text

            logger.debug(
                f"\tScore {hit.score:.2f} for annotation: {hit.annotation.id} "
                f"off input: {hit.input.id}, text: {requested_text[:125]}"
            )

            docs_and_scores.append(
                (Document(page_content=requested_text, metadata=metadata), hit.score)
            )

        return docs_and_scores

    def similarity_search(
        self,
        query: str,
        k: int = 4,
        **kwargs: Any,
    ) -> List[Document]:
        """Run similarity search using Clarifai.

        Args:
            query: Text to look up documents similar to.
            k: Number of Documents to return. Defaults to 4.

        Returns:
            List of Documents most similar to the query.
        """
        docs_and_scores = self.similarity_search_with_score(query, k=k, **kwargs)
        return [doc for doc, _ in docs_and_scores]

    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Optional[Embeddings] = None,
        metadatas: Optional[List[dict]] = None,
        user_id: Optional[str] = None,
        app_id: Optional[str] = None,
        pat: Optional[str] = None,
        number_of_docs: Optional[int] = None,
        api_base: Optional[str] = None,
        **kwargs: Any,
    ) -> Clarifai:
        """Create a Clarifai vectorstore from a list of texts.

        Args:
            user_id (str): User ID.
            app_id (str): App ID.
            texts (List[str]): List of texts to add.
            pat (Optional[str]): Personal access token. Defaults to None.
            number_of_docs (Optional[int]): Number of documents to return
            during vector search. Defaults to None.
            api_base (Optional[str]): API base. Defaults to None.
            metadatas (Optional[List[dict]]): Optional list of metadatas.
            Defaults to None.

        Returns:
            Clarifai: Clarifai vectorstore.
        """
        clarifai_vector_db = cls(
            user_id=user_id,
            app_id=app_id,
            pat=pat,
            number_of_docs=number_of_docs,
            api_base=api_base,
        )
        clarifai_vector_db.add_texts(texts=texts, metadatas=metadatas)
        return clarifai_vector_db

    @classmethod
    def from_documents(
        cls,
        documents: List[Document],
        embedding: Optional[Embeddings] = None,
        user_id: Optional[str] = None,
        app_id: Optional[str] = None,
        pat: Optional[str] = None,
        number_of_docs: Optional[int] = None,
        api_base: Optional[str] = None,
        **kwargs: Any,
    ) -> Clarifai:
        """Create a Clarifai vectorstore from a list of documents.

        Args:
            user_id (str): User ID.
            app_id (str): App ID.
            documents (List[Document]): List of documents to add.
            pat (Optional[str]): Personal access token. Defaults to None.
            number_of_docs (Optional[int]): Number of documents to return
            during vector search. Defaults to None.
            api_base (Optional[str]): API base. Defaults to None.

        Returns:
            Clarifai: Clarifai vectorstore.
        """
        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return cls.from_texts(
            user_id=user_id,
            app_id=app_id,
            texts=texts,
            pat=pat,
            number_of_docs=number_of_docs,
            api_base=api_base,
            metadatas=metadatas,
        )
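The `from_documents` path above reduces documents to parallel `page_content` and `metadata` lists before handing them to `from_texts`. That conversion can be sketched without any Clarifai dependency; the `Document` dataclass and `split_documents` helper below are minimal stand-ins (the real `Document` lives in `langchain.docstore.document`).

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Minimal stand-in for langchain.docstore.document.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_documents(documents):
    # Same reduction as in Clarifai.from_documents: parallel lists of
    # text content and per-document metadata.
    texts = [doc.page_content for doc in documents]
    metadatas = [doc.metadata for doc in documents]
    return texts, metadatas


docs = [Document("foo", {"page": "0"}), Document("bar", {"page": "1"})]
texts, metadatas = split_documents(docs)
print(texts)      # -> ['foo', 'bar']
print(metadatas)  # -> [{'page': '0'}, {'page': '1'}]
```

Keeping the two lists index-aligned is what lets `add_texts` later pair `metadatas[idx]` with each posted text.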
@ -0,0 +1,29 @@
"""Test Clarifai API wrapper.

In order to run this test, you need to have an account on Clarifai.
You can sign up for free at https://clarifai.com/signup.

pip install clarifai

You'll need to set the env variable CLARIFAI_PAT_KEY to your personal access token.
"""

from langchain.llms.clarifai import Clarifai


def test_clarifai_call() -> None:
    """Test valid call to Clarifai."""
    llm = Clarifai(
        user_id="google-research",
        app_id="summarization",
        model_id="text-summarization-english-pegasus",
    )
    output = llm(
        "A chain is a serial assembly of connected pieces, called links, "
        "typically made of metal, with an overall character similar to that "
        "of a rope in that it is flexible and curved in compression but "
        "linear, rigid, and load-bearing in tension. A chain may consist "
        "of two or more links."
    )

    assert isinstance(output, str)
    assert llm._llm_type == "clarifai"
    assert llm.model_id == "text-summarization-english-pegasus"
@ -0,0 +1,86 @@
"""Test Clarifai vector store functionality."""
import time

from langchain.docstore.document import Document
from langchain.vectorstores import Clarifai


def test_clarifai_with_from_texts() -> None:
    """Test end-to-end construction and search."""
    texts = ["foo", "bar", "baz"]
    USER_ID = "minhajul"
    APP_ID = "test-lang-2"
    NUMBER_OF_DOCS = 1
    docsearch = Clarifai.from_texts(
        user_id=USER_ID,
        app_id=APP_ID,
        texts=texts,
        pat=None,
        number_of_docs=NUMBER_OF_DOCS,
    )
    time.sleep(2.5)
    output = docsearch.similarity_search("foo")
    assert output == [Document(page_content="foo")]


def test_clarifai_with_from_documents() -> None:
    """Test end-to-end construction and search."""
    # Initial document content and id
    initial_content = "foo"

    # Create an instance of Document with initial content and metadata
    original_doc = Document(page_content=initial_content, metadata={"page": "0"})
    USER_ID = "minhajul"
    APP_ID = "test-lang-2"
    NUMBER_OF_DOCS = 1
    docsearch = Clarifai.from_documents(
        user_id=USER_ID,
        app_id=APP_ID,
        documents=[original_doc],
        pat=None,
        number_of_docs=NUMBER_OF_DOCS,
    )
    time.sleep(2.5)
    output = docsearch.similarity_search("foo")
    assert output == [Document(page_content=initial_content, metadata={"page": "0"})]


def test_clarifai_with_metadatas() -> None:
    """Test end-to-end construction and search with metadata."""
    texts = ["oof", "rab", "zab"]
    metadatas = [{"page": str(i)} for i in range(len(texts))]
    USER_ID = "minhajul"
    APP_ID = "test-lang-2"
    NUMBER_OF_DOCS = 1
    docsearch = Clarifai.from_texts(
        user_id=USER_ID,
        app_id=APP_ID,
        texts=texts,
        pat=None,
        number_of_docs=NUMBER_OF_DOCS,
        metadatas=metadatas,
    )
    time.sleep(2.5)
    output = docsearch.similarity_search("oof", k=1)
    assert output == [Document(page_content="oof", metadata={"page": "0"})]


def test_clarifai_with_metadatas_with_scores() -> None:
    """Test end-to-end construction and scored search."""
    texts = ["oof", "rab", "zab"]
    metadatas = [{"page": str(i)} for i in range(len(texts))]
    USER_ID = "minhajul"
    APP_ID = "test-lang-2"
    NUMBER_OF_DOCS = 1
    docsearch = Clarifai.from_texts(
        user_id=USER_ID,
        app_id=APP_ID,
        texts=texts,
        pat=None,
        number_of_docs=NUMBER_OF_DOCS,
        metadatas=metadatas,
    )
    time.sleep(2.5)
    output = docsearch.similarity_search_with_score("oof", k=1)
    assert output[0][0] == Document(page_content="oof", metadata={"page": "0"})
    assert abs(output[0][1] - 1.0) < 0.001