Clarifai integration (#5954)

# Changes
This PR adds [Clarifai](https://www.clarifai.com/) integration to
LangChain. Clarifai is an end-to-end AI platform that gives users access
to many types of LLMs (OpenAI, Cohere, and other open-source models). A
Clarifai app can also be treated as a vector database for uploading and
retrieving data. The integration includes the following (a short usage
sketch follows the list):
- Clarifai LLM integration: Clarifai hosts many types of language
models that users can utilize in their applications
- Clarifai VectorDB: A Clarifai application can hold data and
embeddings, and you can run semantic search over those embeddings
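
A minimal usage sketch of both integrations (the PAT and the user/app IDs below are placeholders; see the notebooks in this PR for full walkthroughs):

```python
# Minimal sketch, assuming `pip install clarifai` and a valid personal
# access token; USER_ID / APP_ID and the PAT string are placeholders.
from langchain.llms import Clarifai as ClarifaiLLM
from langchain.vectorstores import Clarifai as ClarifaiVectorDB

# LLM: call a model hosted on Clarifai.
llm = ClarifaiLLM(
    clarifai_pat_key="CLARIFAI_PAT_KEY",
    user_id="openai",
    app_id="chat-completion",
    model_id="chatgpt-3_5-turbo",
)
print(llm("Say hello."))

# VectorDB: upload texts to a Clarifai app, then search semantically.
vector_db = ClarifaiVectorDB.from_texts(
    user_id="USER_ID",
    app_id="APP_ID",
    texts=["I love soccer", "I went to the movies"],
    pat="CLARIFAI_PAT_KEY",
)
print(vector_db.similarity_search("sports"))
```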

#### Before submitting
- [x] Added integration test for LLM 
- [x] Added integration test for VectorDB 
- [x] Added notebook for LLM 
- [x] Added notebook for VectorDB 

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
minhajul-clarifai 2023-06-22 11:00:15 -04:00 committed by GitHub
parent 7f6f5c2a6a
commit 6e57306a13
10 changed files with 1224 additions and 4 deletions


@@ -0,0 +1,286 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Clarifai\n",
"\n",
">[Clarifai](https://www.clarifai.com/) is a AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model building and inference. A Clarifai application can be used as a vector database after uploading inputs. \n",
"\n",
"This notebook shows how to use functionality related to the `Clarifai` vector database.\n",
"\n",
"To use Clarifai, you must have an account and a Personal Access Token key. \n",
"Here are the [installation instructions](https://clarifai.com/settings/security )."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1eecfb1c",
"metadata": {},
"source": [
"# Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4c41cad-08ef-4f72-a545-2151e4598efe",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Install required dependencies\n",
"!pip install clarifai"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "93039ada",
"metadata": {},
"source": [
"# Imports\n",
"Here we will be setting the personal access token. You can find your PAT under settings/security on the platform."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
"metadata": {},
"outputs": [],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT_KEY = getpass()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "320af802-9271-46ee-948f-d2453933d44b",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aac9563e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Import the required modules\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.document_loaders import TextLoader\n",
"from langchain.vectorstores import Clarifai"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "edcf5159",
"metadata": {},
"source": [
"# Setup\n",
"Setup the user id and app id where the text data will be uploaded. \n",
"\n",
"You will have to first create an account on [Clarifai](https://clarifai.com/login) and then create an application."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4d853395",
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'USERNAME_ID'\n",
"APP_ID = 'APPLICATION_ID'\n",
"NUMBER_OF_DOCS = 4"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5631bdd5",
"metadata": {},
"source": [
"## From Texts\n",
"Create a Clarifai vectorstore from a list of texts. This section will upload each text with its respective metadata to a Clarifai Application. The Clarifai Application can then be used for semantic search to find relevant texts."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1d828f77",
"metadata": {},
"outputs": [],
"source": [
"texts = [\"I really enjoy spending time with you\", \"I hate spending time with my dog\", \"I want to go for a run\", \\\n",
" \"I went to the movies yesterday\", \"I love playing soccer with my friends\"]\n",
"\n",
"metadatas = [{\"id\": i, \"text\": text} for i, text in enumerate(texts)]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "738bff27",
"metadata": {},
"outputs": [],
"source": [
"clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT_KEY, number_of_docs=NUMBER_OF_DOCS, metadatas = metadatas)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e755cdce",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 0.0}),\n",
" Document(page_content='I went to the movies yesterday', metadata={'text': 'I went to the movies yesterday', 'id': 3.0}),\n",
" Document(page_content='zab', metadata={'page': '2'}),\n",
" Document(page_content='zab', metadata={'page': '2'})]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = clarifai_vector_db.similarity_search(\"I would love to see you\")\n",
"docs"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c39504e4",
"metadata": {},
"source": [
"## From Documents\n",
"Create a Clarifai vectorstore from a list of Documents. This section will upload each document with its respective metadata to a Clarifai Application. The Clarifai Application can then be used for semantic search to find relevant documents."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "69ae7e35",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" Document(page_content='Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \\n\\nIn this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \\n\\nLet each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \\n\\nPlease rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \\n\\nThroughout our history weve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. \\n\\nThey keep moving. \\n\\nAnd the costs and the threats to America and the world keep rising. \\n\\nThats why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \\n\\nThe United States is a member along with 29 other nations. \\n\\nIt matters. American diplomacy matters. American resolve matters.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" Document(page_content='Putins latest attack on Ukraine was premeditated and unprovoked. \\n\\nHe rejected repeated efforts at diplomacy. \\n\\nHe thought the West and NATO wouldnt respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did. \\n\\nWe prepared extensively and carefully. \\n\\nWe spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. \\n\\nI spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression. \\n\\nWe countered Russias lies with truth. \\n\\nAnd now that he has acted the free world is holding him accountable. \\n\\nAlong with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" Document(page_content='We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. \\n\\nTogether with our allies we are right now enforcing powerful economic sanctions. \\n\\nWe are cutting off Russias largest banks from the international financial system. \\n\\nPreventing Russias central bank from defending the Russian Ruble making Putins $630 Billion “war fund” worthless. \\n\\nWe are choking off Russias access to technology that will sap its economic strength and weaken its military for years to come. \\n\\nTonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \\n\\nThe U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. \\n\\nWe are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains.', metadata={'source': '../../../state_of_the_union.txt'})]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[:4]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "40bf1305",
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'USERNAME_ID'\n",
"APP_ID = 'APPLICATION_ID'\n",
"NUMBER_OF_DOCS = 4"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "6e104aee",
"metadata": {},
"outputs": [],
"source": [
"clarifai_vector_db = Clarifai.from_documents(user_id=USER_ID, app_id=APP_ID, documents=docs, pat=CLARIFAI_PAT_KEY, number_of_docs=NUMBER_OF_DOCS)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9c608226",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='And I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and cant be traced. \\n\\nAnd I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon? \\n\\nBan assault weapons and high-capacity magazines. \\n\\nRepeal the liability shield that makes gun manufacturers the only industry in America that cant be sued. \\n\\nThese laws dont infringe on the Second Amendment. They save lives. \\n\\nThe most fundamental right in America is the right to vote and to have it counted. And its under assault. \\n\\nIn state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" Document(page_content='We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \\n\\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \\n\\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \\n\\nOfficer Mora was 27 years old. \\n\\nOfficer Rivera was 22. \\n\\nBoth Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \\n\\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \\n\\nIve worked on these issues a long time. \\n\\nI know what works: Investing in crime preventionand community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" Document(page_content='So lets not abandon our streets. Or choose between safety and equal justice. \\n\\nLets come together to protect our communities, restore trust, and hold law enforcement accountable. \\n\\nThats why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. \\n\\nThats why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope. \\n\\nWe should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities. \\n\\nI ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.', metadata={'source': '../../../state_of_the_union.txt'})]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = clarifai_vector_db.similarity_search(\"Texts related to criminals and violence\")\n",
"docs"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
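
The notebook above only demonstrates `similarity_search`; the vectorstore in this PR also exposes a scored variant. A small sketch, reusing the `clarifai_vector_db` object built in the notebook:

```python
# Sketch only: reuses the clarifai_vector_db built in the notebook above.
# Note: a number_of_docs set at construction overrides k at query time.
docs_and_scores = clarifai_vector_db.similarity_search_with_score(
    "I would love to see you"
)
for doc, score in docs_and_scores:
    print(f"{score:.3f} {doc.page_content}")
```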


@@ -0,0 +1,210 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# Clarifai\n",
"\n",
">[Clarifai](https://www.clarifai.com/) is a AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model building and inference.\n",
"\n",
"This example goes over how to use LangChain to interact with `Clarifai` [models](https://clarifai.com/explore/models)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2a773d8d",
"metadata": {},
"source": [
"# Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91ea14ce-831d-409a-a88f-30353acdabd1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Install required dependencies\n",
"!pip install clarifai"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "426f1156",
"metadata": {},
"source": [
"# Imports\n",
"Here we will be setting the personal access token. You can find your PAT under settings/security on the platform."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3f5dc9d7-65e3-4b5b-9086-3327d016cfe0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT_KEY = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6fb585dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Import the required modules\n",
"from langchain.llms import Clarifai\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "16521ed2",
"metadata": {},
"source": [
"# Input\n",
"Create a prompt template to be used with the LLM Chain"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "035dea0f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c8905eac",
"metadata": {},
"source": [
"# Setup\n",
"Setup the user id and app id where the model resides. You can find a list of public models on https://clarifai.com/explore/models\n",
"\n",
"You will have to also initialize the model id and if needed, the model version id. Some models have many versions, you can choose the one appropriate for your task."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1fe9bf15",
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'openai'\n",
"APP_ID = 'chat-completion'\n",
"MODEL_ID = 'chatgpt-3_5-turbo'\n",
"\n",
"# You can provide a specific model version\n",
"# model_version_id = \"MODEL_VERSION_ID\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3f3458d9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Initialize a Clarifai LLM\n",
"clarifai_llm = Clarifai(clarifai_pat_key=CLARIFAI_PAT_KEY, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a641dbd9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Create LLM chain\n",
"llm_chain = LLMChain(prompt=prompt, llm=clarifai_llm)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3e87c71a",
"metadata": {},
"source": [
"# Run Chain"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9f844993",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Justin Bieber was born on March 1, 1994. So, we need to look at the Super Bowl that was played in the year 1994. \\n\\nThe Super Bowl in 1994 was Super Bowl XXVIII (28). It was played on January 30, 1994, between the Dallas Cowboys and the Buffalo Bills. \\n\\nThe Dallas Cowboys won the Super Bowl in 1994, defeating the Buffalo Bills by a score of 30-13. \\n\\nTherefore, the Dallas Cowboys are the NFL team that won the Super Bowl in the year Justin Bieber was born.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
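
The notebook mentions model versions but leaves `model_version_id` commented out. A sketch of pinning a specific version (the version ID below is a placeholder):

```python
# Sketch only: "MODEL_VERSION_ID" is a placeholder; real version IDs are
# listed on each model's page at https://clarifai.com/explore/models.
from langchain.llms import Clarifai

clarifai_llm = Clarifai(
    clarifai_pat_key=CLARIFAI_PAT_KEY,
    user_id="openai",
    app_id="chat-completion",
    model_id="chatgpt-3_5-turbo",
    model_version_id="MODEL_VERSION_ID",  # pin a specific model version
)
```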


@@ -13,6 +13,7 @@ from langchain.llms.baseten import Baseten
from langchain.llms.beam import Beam
from langchain.llms.bedrock import Bedrock
from langchain.llms.cerebriumai import CerebriumAI
from langchain.llms.clarifai import Clarifai
from langchain.llms.cohere import Cohere
from langchain.llms.ctransformers import CTransformers
from langchain.llms.databricks import Databricks
@@ -63,6 +64,7 @@ __all__ = [
"Bedrock",
"CTransformers",
"CerebriumAI",
"Clarifai",
"Cohere",
"Databricks",
"DeepInfra",
@@ -113,6 +115,7 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"baseten": Baseten,
"beam": Beam,
"cerebriumai": CerebriumAI,
"clarifai": Clarifai,
"cohere": Cohere,
"ctransformers": CTransformers,
"databricks": Databricks,

langchain/llms/clarifai.py (new file)

@@ -0,0 +1,176 @@
"""Wrapper around Clarifai's APIs."""
import logging
from typing import Any, Dict, List, Optional
from pydantic import Extra, root_validator
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
class Clarifai(LLM):
"""Wrapper around Clarifai's large language models.
To use, you should have an account on the Clarifai platform,
the ``clarifai`` python package installed, and the
environment variable ``CLARIFAI_PAT_KEY`` set with your PAT key,
or pass it as a named parameter to the constructor.
Example:
.. code-block:: python
from langchain.llms import Clarifai
clarifai_llm = Clarifai(clarifai_pat_key=CLARIFAI_PAT_KEY, \
user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
"""
stub: Any #: :meta private:
metadata: Any
userDataObject: Any
model_id: Optional[str] = None
"""Model id to use."""
model_version_id: Optional[str] = None
"""Model version id to use."""
app_id: Optional[str] = None
"""Clarifai application id to use."""
user_id: Optional[str] = None
"""Clarifai user id to use."""
clarifai_pat_key: Optional[str] = None
api_base: str = "https://api.clarifai.com"
stop: Optional[List[str]] = None
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that we have all required info to access Clarifai
platform and python package exists in environment."""
values["clarifai_pat_key"] = get_from_dict_or_env(
values, "clarifai_pat_key", "CLARIFAI_PAT_KEY"
)
user_id = values.get("user_id")
app_id = values.get("app_id")
model_id = values.get("model_id")
if values["clarifai_pat_key"] is None:
raise ValueError("Please provide a clarifai_pat_key.")
if user_id is None:
raise ValueError("Please provide a user_id.")
if app_id is None:
raise ValueError("Please provide a app_id.")
if model_id is None:
raise ValueError("Please provide a model_id.")
return values
@property
def _default_params(self) -> Dict[str, Any]:
"""Get the default parameters for calling Cohere API."""
return {}
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Get the identifying parameters."""
return {**{"model_id": self.model_id}}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "clarifai"
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any
) -> str:
"""Call out to Clarfai's PostModelOutputs endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = clarifai_llm("Tell me a joke.")
"""
try:
from clarifai.auth.helper import ClarifaiAuthHelper
from clarifai.client import create_stub
from clarifai_grpc.grpc.api import (
resources_pb2,
service_pb2,
)
from clarifai_grpc.grpc.api.status import status_code_pb2
except ImportError:
raise ImportError(
"Could not import clarifai python package. "
"Please install it with `pip install clarifai`."
)
auth = ClarifaiAuthHelper(
user_id=self.user_id,
app_id=self.app_id,
pat=self.clarifai_pat_key,
base=self.api_base,
)
self.userDataObject = auth.get_user_app_id_proto()
self.stub = create_stub(auth)
params = self._default_params
if self.stop is not None and stop is not None:
raise ValueError("`stop` found in both the input and default params.")
elif self.stop is not None:
params["stop_sequences"] = self.stop
else:
params["stop_sequences"] = stop
# The userDataObject from the auth helper is required when using a PAT.
# If version_id is None, the latest model version is used.
post_model_outputs_request = service_pb2.PostModelOutputsRequest(
user_app_id=self.userDataObject,
model_id=self.model_id,
version_id=self.model_version_id,
inputs=[
resources_pb2.Input(
data=resources_pb2.Data(text=resources_pb2.Text(raw=prompt))
)
],
)
post_model_outputs_response = self.stub.PostModelOutputs(
post_model_outputs_request
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
logger.error(post_model_outputs_response.status)
raise Exception(
"Post model outputs failed, status: "
+ post_model_outputs_response.status.description
)
text = post_model_outputs_response.outputs[0].data.text.raw
# To be consistent with other LLM endpoints, strip off the stop tokens.
if stop is not None or self.stop is not None:
text = enforce_stop_tokens(text, params["stop_sequences"])
return text
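
A short usage sketch for this wrapper, assuming `CLARIFAI_PAT_KEY` is exported in the environment (the root validator falls back to that variable when no key is passed):

```python
# Sketch only: assumes `pip install clarifai` and CLARIFAI_PAT_KEY set
# in the environment; IDs point at the public OpenAI chat-completion app.
from langchain.llms import Clarifai

llm = Clarifai(
    user_id="openai",
    app_id="chat-completion",
    model_id="chatgpt-3_5-turbo",
    stop=["\n\n"],  # optional: stop sequences enforced client-side
)
print(llm("Tell me a joke."))
```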


@@ -11,6 +11,7 @@ from langchain.vectorstores.azuresearch import AzureSearch
from langchain.vectorstores.base import VectorStore
from langchain.vectorstores.cassandra import Cassandra
from langchain.vectorstores.chroma import Chroma
from langchain.vectorstores.clarifai import Clarifai
from langchain.vectorstores.clickhouse import Clickhouse, ClickhouseSettings
from langchain.vectorstores.deeplake import DeepLake
from langchain.vectorstores.docarray import DocArrayHnswSearch, DocArrayInMemorySearch
@@ -59,6 +60,14 @@ __all__ = [
"LanceDB",
"MatchingEngine",
"Milvus",
"Zilliz",
"SingleStoreDB",
"Chroma",
"Clarifai",
"OpenSearchVectorSearch",
"AtlasDB",
"DeepLake",
"Annoy",
"MongoDBAtlasVectorSearch",
"MyScale",
"MyScaleSettings",


@@ -0,0 +1,355 @@
from __future__ import annotations
import logging
import os
import traceback
from typing import Any, Iterable, List, Optional, Tuple
import requests
from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
from langchain.vectorstores.base import VectorStore
logger = logging.getLogger(__name__)
class Clarifai(VectorStore):
"""Wrapper around Clarifai AI platform's vector store.
To use, you should have the ``clarifai`` python package installed.
Example:
.. code-block:: python
from langchain.vectorstores import Clarifai
clarifai_vector_db = Clarifai(
user_id=USER_ID, app_id=APP_ID, pat=CLARIFAI_PAT_KEY
)
"""
def __init__(
self,
user_id: Optional[str] = None,
app_id: Optional[str] = None,
pat: Optional[str] = None,
number_of_docs: Optional[int] = None,
api_base: Optional[str] = None,
) -> None:
"""Initialize with Clarifai client.
Args:
user_id (Optional[str], optional): User ID. Defaults to None.
app_id (Optional[str], optional): App ID. Defaults to None.
pat (Optional[str], optional): Personal access token. Defaults to None.
number_of_docs (Optional[int], optional): Number of documents to return
during vector search. Defaults to None.
api_base (Optional[str], optional): API base. Defaults to None.
Raises:
ValueError: If user ID, app ID or personal access token is not provided.
"""
try:
from clarifai.auth.helper import DEFAULT_BASE, ClarifaiAuthHelper
from clarifai.client import create_stub
except ImportError:
raise ImportError(
"Could not import clarifai python package. "
"Please install it with `pip install clarifai`."
)
self._api_base = api_base if api_base is not None else DEFAULT_BASE
self._user_id = user_id or os.environ.get("CLARIFAI_USER_ID")
self._app_id = app_id or os.environ.get("CLARIFAI_APP_ID")
self._pat = pat or os.environ.get("CLARIFAI_PAT_KEY")
if self._user_id is None or self._app_id is None or self._pat is None:
raise ValueError(
"Could not find CLARIFAI_USER_ID, CLARIFAI_APP_ID or\
CLARIFAI_PAT in your environment. "
"Please set those env variables with a valid user ID, \
app ID and personal access token \
from https://clarifai.com/settings/security."
)
self._auth = ClarifaiAuthHelper(
user_id=self._user_id,
app_id=self._app_id,
pat=self._pat,
base=self._api_base,
)
self._stub = create_stub(self._auth)
self._userDataObject = self._auth.get_user_app_id_proto()
self._number_of_docs = number_of_docs
def _post_text_input(self, text: str, metadata: dict) -> str:
"""Post text to Clarifai and return the ID of the input.
Args:
text (str): Text to post.
metadata (dict): Metadata to post.
Returns:
str: ID of the input.
"""
try:
from clarifai_grpc.grpc.api import resources_pb2, service_pb2
from clarifai_grpc.grpc.api.status import status_code_pb2
from google.protobuf.struct_pb2 import Struct # type: ignore
except ImportError as e:
raise ImportError(
"Could not import clarifai python package. "
"Please install it with `pip install clarifai`."
) from e
input_metadata = Struct()
input_metadata.update(metadata)
post_inputs_response = self._stub.PostInputs(
service_pb2.PostInputsRequest(
user_app_id=self._userDataObject,
inputs=[
resources_pb2.Input(
data=resources_pb2.Data(
text=resources_pb2.Text(raw=text),
metadata=input_metadata,
)
)
],
)
)
if post_inputs_response.status.code != status_code_pb2.SUCCESS:
logger.error(post_inputs_response.status)
raise Exception(
"Post inputs failed, status: " + post_inputs_response.status.description
)
input_id = post_inputs_response.inputs[0].id
return input_id
def add_texts(
self,
texts: Iterable[str],
metadatas: Optional[List[dict]] = None,
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""Add texts to the Clarifai vectorstore. This will push the text
to a Clarifai application.
The application uses its base workflow to create and store an embedding for each text.
Make sure you are using a base workflow that is compatible with text
(such as Language Understanding).
Args:
texts (Iterable[str]): Texts to add to the vectorstore.
metadatas (Optional[List[dict]], optional): Optional list of metadatas.
ids (Optional[List[str]], optional): Optional list of IDs.
Returns:
List[str]: List of IDs of the added texts.
"""
texts = list(texts)
assert len(texts) > 0, "No texts provided to add to the vectorstore."
if metadatas is not None:
assert len(texts) == len(
metadatas
), "Number of texts and metadatas should be the same."
input_ids = []
for idx, text in enumerate(texts):
try:
metadata = metadatas[idx] if metadatas else {}
input_id = self._post_text_input(text, metadata)
input_ids.append(input_id)
logger.debug(f"Input {input_id} posted successfully.")
except Exception as error:
logger.warning(f"Post inputs failed: {error}")
traceback.print_exc()
return input_ids
def similarity_search_with_score(
self,
query: str,
k: int = 4,
filter: Optional[dict] = None,
namespace: Optional[str] = None,
**kwargs: Any,
) -> List[Tuple[Document, float]]:
"""Run similarity search with score using Clarifai.
Args:
query (str): Query text to search for.
k (int): Number of results to return. Defaults to 4.
filter (Optional[Dict[str, str]]): Filter by metadata.
Defaults to None.
Returns:
List[Document]: List of documents most similar to the query text.
"""
try:
from clarifai_grpc.grpc.api import resources_pb2, service_pb2
from clarifai_grpc.grpc.api.status import status_code_pb2
from google.protobuf import json_format # type: ignore
except ImportError as e:
raise ImportError(
"Could not import clarifai python package. "
"Please install it with `pip install clarifai`."
) from e
# Get number of docs to return
if self._number_of_docs is not None:
k = self._number_of_docs
post_annotations_searches_response = self._stub.PostAnnotationsSearches(
service_pb2.PostAnnotationsSearchesRequest(
user_app_id=self._userDataObject,
searches=[
resources_pb2.Search(
query=resources_pb2.Query(
ranks=[
resources_pb2.Rank(
annotation=resources_pb2.Annotation(
data=resources_pb2.Data(
text=resources_pb2.Text(raw=query),
)
)
)
]
)
)
],
pagination=service_pb2.Pagination(page=1, per_page=k),
)
)
# Check if search was successful
if post_annotations_searches_response.status.code != status_code_pb2.SUCCESS:
raise Exception(
"Post searches failed, status: "
+ post_annotations_searches_response.status.description
)
# Retrieve hits
hits = post_annotations_searches_response.hits
docs_and_scores = []
# Iterate over hits and retrieve metadata and text
for hit in hits:
metadata = json_format.MessageToDict(hit.input.data.metadata)
request = requests.get(hit.input.data.text.url)
# Override the declared encoding with chardet's educated guess.
request.encoding = request.apparent_encoding
requested_text = request.text
logger.debug(
f"\tScore {hit.score:.2f} for annotation: {hit.annotation.id}\
off input: {hit.input.id}, text: {requested_text[:125]}"
)
docs_and_scores.append(
(Document(page_content=requested_text, metadata=metadata), hit.score)
)
return docs_and_scores
def similarity_search(
self,
query: str,
k: int = 4,
**kwargs: Any,
) -> List[Document]:
"""Run similarity search using Clarifai.
Args:
query: Text to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
Returns:
List of Documents most similar to the query text.
"""
docs_and_scores = self.similarity_search_with_score(query, k=k, **kwargs)
return [doc for doc, _ in docs_and_scores]
@classmethod
def from_texts(
cls,
texts: List[str],
embedding: Optional[Embeddings] = None,
metadatas: Optional[List[dict]] = None,
user_id: Optional[str] = None,
app_id: Optional[str] = None,
pat: Optional[str] = None,
number_of_docs: Optional[int] = None,
api_base: Optional[str] = None,
**kwargs: Any,
) -> Clarifai:
"""Create a Clarifai vectorstore from a list of texts.
Args:
user_id (str): User ID.
app_id (str): App ID.
texts (List[str]): List of texts to add.
pat (Optional[str]): Personal access token. Defaults to None.
number_of_docs (Optional[int]): Number of documents to return
during vector search. Defaults to None.
api_base (Optional[str]): API base. Defaults to None.
metadatas (Optional[List[dict]]): Optional list of metadatas.
Defaults to None.
Returns:
Clarifai: Clarifai vectorstore.
"""
clarifai_vector_db = cls(
user_id=user_id,
app_id=app_id,
pat=pat,
number_of_docs=number_of_docs,
api_base=api_base,
)
clarifai_vector_db.add_texts(texts=texts, metadatas=metadatas)
return clarifai_vector_db
@classmethod
def from_documents(
cls,
documents: List[Document],
embedding: Optional[Embeddings] = None,
user_id: Optional[str] = None,
app_id: Optional[str] = None,
pat: Optional[str] = None,
number_of_docs: Optional[int] = None,
api_base: Optional[str] = None,
**kwargs: Any,
) -> Clarifai:
"""Create a Clarifai vectorstore from a list of documents.
Args:
user_id (str): User ID.
app_id (str): App ID.
documents (List[Document]): List of documents to add.
pat (Optional[str]): Personal access token. Defaults to None.
number_of_docs (Optional[int]): Number of documents to return
during vector search. Defaults to None.
api_base (Optional[str]): API base. Defaults to None.
Returns:
Clarifai: Clarifai vectorstore.
"""
texts = [doc.page_content for doc in documents]
metadatas = [doc.metadata for doc in documents]
return cls.from_texts(
user_id=user_id,
app_id=app_id,
texts=texts,
pat=pat,
number_of_docs=number_of_docs,
api_base=api_base,
metadatas=metadatas,
)
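
A minimal usage sketch for this vectorstore, assuming the `CLARIFAI_*` environment variables are set and the target app's base workflow is text-compatible:

```python
# Sketch only: assumes CLARIFAI_USER_ID, CLARIFAI_APP_ID and
# CLARIFAI_PAT_KEY are set in the environment, and that the app
# uses a text-compatible base workflow.
from langchain.vectorstores import Clarifai

vector_db = Clarifai(number_of_docs=2)
vector_db.add_texts(
    texts=["the quick brown fox", "a lazy dog"],
    metadatas=[{"source": "demo"}, {"source": "demo"}],
)
for doc, score in vector_db.similarity_search_with_score("fox"):
    print(score, doc.page_content)
```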

poetry.lock (generated)

@@ -1606,6 +1606,39 @@ tornado = ">=5.0.2"
[package.extras]
test = ["coverage", "flake8 (==2.1.0)", "gevent", "mock", "nose2", "pyyaml", "tox"]
[[package]]
name = "clarifai"
version = "9.1.0"
description = "Clarifai Python Utilities"
category = "main"
optional = true
python-versions = ">=3.7"
files = [
{file = "clarifai-9.1.0-py3-none-any.whl", hash = "sha256:a22b6c34d18067eb6902111bdbd9627dc2b72b743ac50b3f3178dc7663016003"},
{file = "clarifai-9.1.0.tar.gz", hash = "sha256:f6e65fd81a810c4063f23a066ded68306423da1be0bbf61b32c5ef01214f607f"},
]
[package.dependencies]
clarifai-grpc = ">=9.1.0"
[[package]]
name = "clarifai-grpc"
version = "9.1.1"
description = "Clarifai gRPC API Client"
category = "main"
optional = true
python-versions = ">=3.6"
files = [
{file = "clarifai-grpc-9.1.1.tar.gz", hash = "sha256:d347b64f8d8dcf4dee1c51c5eced3c3f489e3be595ca4c8374323fdf934bae57"},
{file = "clarifai_grpc-9.1.1-py3-none-any.whl", hash = "sha256:84a49e3d4fa57937ab38fb365c535a8ae255acac4666134d188f5dbe10e865ba"},
]
[package.dependencies]
googleapis-common-protos = ">=1.53.0"
grpcio = ">=1.44.0"
protobuf = ">=3.12"
requests = ">=2.25.1"
[[package]]
name = "click"
version = "8.1.3"
@@ -5445,6 +5478,22 @@ files = [
{file = "mypy_extensions-1.0.0.tar.gz", hash = "sha256:75dbf8955dc00442a438fc4d0666508a9a97b6bd41aa2f0ffe9d2f2725af0782"},
]
[[package]]
name = "mypy-protobuf"
version = "3.3.0"
description = "Generate mypy stub files from protobuf specs"
category = "dev"
optional = false
python-versions = ">=3.7"
files = [
{file = "mypy-protobuf-3.3.0.tar.gz", hash = "sha256:24f3b0aecb06656e983f58e07c732a90577b9d7af3e1066fc2b663bbf0370248"},
{file = "mypy_protobuf-3.3.0-py3-none-any.whl", hash = "sha256:15604f6943b16c05db646903261e3b3e775cf7f7990b7c37b03d043a907b650d"},
]
[package.dependencies]
protobuf = ">=3.19.4"
types-protobuf = ">=3.19.12"
[[package]]
name = "myst-nb"
version = "0.17.2"
@@ -10901,6 +10950,18 @@ files = [
{file = "types_chardet-5.0.4.6-py3-none-any.whl", hash = "sha256:ea832d87e798abf1e4dfc73767807c2b7fee35d0003ae90348aea4ae00fb004d"},
]
[[package]]
name = "types-protobuf"
version = "4.23.0.1"
description = "Typing stubs for protobuf"
category = "dev"
optional = false
python-versions = "*"
files = [
{file = "types-protobuf-4.23.0.1.tar.gz", hash = "sha256:7bd5ea122a057b11a82b785d9de464932a1e9175fe977a4128adef11d7f35547"},
{file = "types_protobuf-4.23.0.1-py3-none-any.whl", hash = "sha256:c926104f69ea62103846681b35b690d8d100ecf86c6cdda16c850a1313a272e4"},
]
[[package]]
name = "types-pyopenssl"
version = "23.2.0.0"
@@ -12040,13 +12101,14 @@ cffi = {version = ">=1.11", markers = "platform_python_implementation == \"PyPy\
cffi = ["cffi (>=1.11)"]
[extras]
all = ["O365", "aleph-alpha-client", "anthropic", "arxiv", "atlassian-python-api", "awadb", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "azure-cosmos", "azure-identity", "beautifulsoup4", "clickhouse-connect", "cohere", "deeplake", "docarray", "duckduckgo-search", "elasticsearch", "faiss-cpu", "google-api-python-client", "google-auth", "google-search-results", "gptcache", "html2text", "huggingface_hub", "jina", "jinja2", "jq", "lancedb", "langkit", "lark", "lxml", "manifest-ml", "momento", "nebula3-python", "neo4j", "networkx", "nlpcloud", "nltk", "nomic", "openai", "openlm", "opensearch-py", "pdfminer-six", "pexpect", "pgvector", "pinecone-client", "pinecone-text", "psycopg2-binary", "pymongo", "pyowm", "pypdf", "pytesseract", "pyvespa", "qdrant-client", "redis", "requests-toolbelt", "sentence-transformers", "singlestoredb", "spacy", "steamship", "tensorflow-text", "tigrisdb", "tiktoken", "torch", "transformers", "weaviate-client", "wikipedia", "wolframalpha"]
all = ["O365", "aleph-alpha-client", "anthropic", "arxiv", "atlassian-python-api", "awadb", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "azure-cosmos", "azure-identity", "beautifulsoup4", "clarifai", "clickhouse-connect", "cohere", "deeplake", "docarray", "duckduckgo-search", "elasticsearch", "faiss-cpu", "google-api-python-client", "google-auth", "google-search-results", "gptcache", "html2text", "huggingface_hub", "jina", "jinja2", "jq", "lancedb", "langkit", "lark", "lxml", "manifest-ml", "momento", "nebula3-python", "neo4j", "networkx", "nlpcloud", "nltk", "nomic", "openai", "openlm", "opensearch-py", "pdfminer-six", "pexpect", "pgvector", "pinecone-client", "pinecone-text", "psycopg2-binary", "pymongo", "pyowm", "pypdf", "pytesseract", "pyvespa", "qdrant-client", "redis", "requests-toolbelt", "sentence-transformers", "singlestoredb", "spacy", "steamship", "tensorflow-text", "tigrisdb", "tiktoken", "torch", "transformers", "weaviate-client", "wikipedia", "wolframalpha"]
azure = ["azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "azure-core", "azure-cosmos", "azure-identity", "azure-search-documents", "openai"]
clarifai = ["clarifai"]
cohere = ["cohere"]
docarray = ["docarray"]
embeddings = ["sentence-transformers"]
extended-testing = ["atlassian-python-api", "beautifulsoup4", "beautifulsoup4", "bibtexparser", "chardet", "gql", "html2text", "jq", "lxml", "openai", "pandas", "pdfminer-six", "pgvector", "psychicapi", "py-trello", "pymupdf", "pypdf", "pypdfium2", "pyspark", "requests-toolbelt", "scikit-learn", "telethon", "tqdm", "zep-python"]
llms = ["anthropic", "cohere", "huggingface_hub", "manifest-ml", "nlpcloud", "openai", "openllm", "openlm", "torch", "transformers"]
llms = ["anthropic", "clarifai", "cohere", "huggingface_hub", "manifest-ml", "nlpcloud", "openai", "openllm", "openlm", "torch", "transformers"]
openai = ["openai", "tiktoken"]
qdrant = ["qdrant-client"]
text-helpers = ["chardet"]
@@ -12054,4 +12116,4 @@ text-helpers = ["chardet"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "4ef80074fedf100df7216da7231f96008c62fe84abf0619322debf7a5d9078bb"
content-hash = "df0aa59ee541224667b5cefebec83d2ff48813218f959294eb7612148dc5ef82"


@@ -103,6 +103,7 @@ momento = {version = "^1.5.0", optional = true}
bibtexparser = {version = "^1.4.0", optional = true}
singlestoredb = {version = "^0.7.1", optional = true}
pyspark = {version = "^3.4.0", optional = true}
clarifai = {version = "9.1.0", optional = true}
tigrisdb = {version = "^1.0.0b6", optional = true}
nebula3-python = {version = "^3.4.0", optional = true}
langchainplus-sdk = ">=0.0.13"
@@ -198,6 +199,7 @@ types-toml = "^0.10.8.1"
types-redis = "^4.3.21.6"
black = "^23.1.0"
types-chardet = "^5.0.4.6"
mypy-protobuf = "^3.0.0"
[tool.poetry.group.typing.dependencies]
mypy = "^0.991"
@@ -213,10 +215,11 @@ playwright = "^1.28.0"
setuptools = "^67.6.1"
[tool.poetry.extras]
llms = ["anthropic", "cohere", "openai", "openllm", "openlm", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
llms = ["anthropic", "clarifai", "cohere", "openai", "openllm", "openlm", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
qdrant = ["qdrant-client"]
openai = ["openai", "tiktoken"]
text_helpers = ["chardet"]
clarifai = ["clarifai"]
cohere = ["cohere"]
docarray = ["docarray"]
embeddings = ["sentence-transformers"]
@@ -232,6 +235,7 @@ azure = [
]
all = [
"anthropic",
"clarifai",
"cohere",
"openai",
"nlpcloud",


@@ -0,0 +1,29 @@
"""Test Clarifai API wrapper.
In order to run this test, you need to have an account on Clarifai.
You can sign up for free at https://clarifai.com/signup.
pip install clarifai
You'll need to set the env variable CLARIFAI_PAT_KEY to your personal access token.
"""
from langchain.llms.clarifai import Clarifai
def test_clarifai_call() -> None:
"""Test valid call to clarifai."""
llm = Clarifai(
user_id="google-research",
app_id="summarization",
model_id="text-summarization-english-pegasus",
)
output = llm(
"A chain is a serial assembly of connected pieces, called links, \
typically made of metal, with an overall character similar to that\
of a rope in that it is flexible and curved in compression but \
linear, rigid, and load-bearing in tension. A chain may consist\
of two or more links."
)
assert isinstance(output, str)
assert llm._llm_type == "clarifai"
assert llm.model_id == "text-summarization-english-pegasus"
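
A hypothetical companion test, not part of this PR, that would exercise the wrapper's client-side stop-token handling under the same environment setup:

```python
# Hypothetical test (not in this PR): exercises enforce_stop_tokens via
# the `stop` field. Assumes CLARIFAI_PAT_KEY is set in the environment.
from langchain.llms.clarifai import Clarifai

def test_clarifai_call_with_stop() -> None:
    llm = Clarifai(
        user_id="openai",
        app_id="chat-completion",
        model_id="chatgpt-3_5-turbo",
        stop=["\n"],  # output is truncated at the first newline
    )
    output = llm("Count from 1 to 5, one number per line.")
    assert isinstance(output, str)
    assert "\n" not in output
```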


@@ -0,0 +1,86 @@
"""Test Clarifai vectore store functionality."""
import time
from langchain.docstore.document import Document
from langchain.vectorstores import Clarifai
def test_clarifai_with_from_texts() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
USER_ID = "minhajul"
APP_ID = "test-lang-2"
NUMBER_OF_DOCS = 1
docsearch = Clarifai.from_texts(
user_id=USER_ID,
app_id=APP_ID,
texts=texts,
pat=None,
number_of_docs=NUMBER_OF_DOCS,
)
time.sleep(2.5)
output = docsearch.similarity_search("foo")
assert output == [Document(page_content="foo")]
def test_clarifai_with_from_documents() -> None:
"""Test end to end construction and search."""
# Initial document content and id
initial_content = "foo"
# Create an instance of Document with initial content and metadata
original_doc = Document(page_content=initial_content, metadata={"page": "0"})
USER_ID = "minhajul"
APP_ID = "test-lang-2"
NUMBER_OF_DOCS = 1
docsearch = Clarifai.from_documents(
user_id=USER_ID,
app_id=APP_ID,
documents=[original_doc],
pat=None,
number_of_docs=NUMBER_OF_DOCS,
)
time.sleep(2.5)
output = docsearch.similarity_search("foo")
assert output == [Document(page_content=initial_content, metadata={"page": "0"})]
def test_clarifai_with_metadatas() -> None:
"""Test end to end construction and search with metadata."""
texts = ["oof", "rab", "zab"]
metadatas = [{"page": str(i)} for i in range(len(texts))]
USER_ID = "minhajul"
APP_ID = "test-lang-2"
NUMBER_OF_DOCS = 1
docsearch = Clarifai.from_texts(
user_id=USER_ID,
app_id=APP_ID,
texts=texts,
pat=None,
number_of_docs=NUMBER_OF_DOCS,
metadatas=metadatas,
)
time.sleep(2.5)
output = docsearch.similarity_search("oof", k=1)
assert output == [Document(page_content="oof", metadata={"page": "0"})]
def test_clarifai_with_metadatas_with_scores() -> None:
"""Test end to end construction and scored search."""
texts = ["oof", "rab", "zab"]
metadatas = [{"page": str(i)} for i in range(len(texts))]
USER_ID = "minhajul"
APP_ID = "test-lang-2"
NUMBER_OF_DOCS = 1
docsearch = Clarifai.from_texts(
user_id=USER_ID,
app_id=APP_ID,
texts=texts,
pat=None,
number_of_docs=NUMBER_OF_DOCS,
metadatas=metadatas,
)
time.sleep(2.5)
output = docsearch.similarity_search_with_score("oof", k=1)
assert output[0][0] == Document(page_content="oof", metadata={"page": "0"})
assert abs(output[0][1] - 1.0) < 0.001