mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
305 lines
16 KiB
Plaintext
305 lines
16 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "683953b3",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Clarifai\n",
|
||
"\n",
|
||
">[Clarifai](https://www.clarifai.com/) is an AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model training, evaluation, and inference. A Clarifai application can be used as a vector database after uploading inputs. \n",
|
||
"\n",
|
||
"This notebook shows how to use functionality related to the `Clarifai` vector database.\n",
|
||
"\n",
|
||
"To use Clarifai, you must have an account and a Personal Access Token (PAT) key. \n",
|
||
"[Check here](https://clarifai.com/settings/security) to get or create a PAT."
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "1eecfb1c",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Dependencies"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "b4c41cad-08ef-4f72-a545-2151e4598efe",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Install required dependencies\n",
|
||
"!pip install clarifai"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "93039ada",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Imports\n",
|
||
"Here we will be setting the personal access token. You can find your PAT under settings/security on the platform."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Please login and get your API key from https://clarifai.com/settings/security\n",
|
||
"from getpass import getpass\n",
|
||
"\n",
|
||
"CLARIFAI_PAT = getpass()"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "320af802-9271-46ee-948f-d2453933d44b",
|
||
"metadata": {},
|
||
"source": [
|
||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "aac9563e",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Import the required modules\n",
|
||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||
"from langchain.document_loaders import TextLoader\n",
|
||
"from langchain.vectorstores import Clarifai"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "edcf5159",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Setup\n",
|
||
"Setup the user id and app id where the text data will be uploaded. Note: when creating that application please select an appropriate base workflow for indexing your text documents such as the Language-Understanding workflow.\n",
|
||
"\n",
|
||
"You will have to first create an account on [Clarifai](https://clarifai.com/login) and then create an application."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "4d853395",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"USER_ID = \"USERNAME_ID\"\n",
|
||
"APP_ID = \"APPLICATION_ID\"\n",
|
||
"NUMBER_OF_DOCS = 4"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "5631bdd5",
|
||
"metadata": {},
|
||
"source": [
|
||
"## From Texts\n",
|
||
"Create a Clarifai vectorstore from a list of texts. This section will upload each text with its respective metadata to a Clarifai Application. The Clarifai Application can then be used for semantic search to find relevant texts."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "1d828f77",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"texts = [\n",
|
||
" \"I really enjoy spending time with you\",\n",
|
||
" \"I hate spending time with my dog\",\n",
|
||
" \"I want to go for a run\",\n",
|
||
" \"I went to the movies yesterday\",\n",
|
||
" \"I love playing soccer with my friends\",\n",
|
||
"]\n",
|
||
"\n",
|
||
"metadatas = [{\"id\": i, \"text\": text} for i, text in enumerate(texts)]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "738bff27",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"clarifai_vector_db = Clarifai.from_texts(\n",
|
||
" user_id=USER_ID,\n",
|
||
" app_id=APP_ID,\n",
|
||
" texts=texts,\n",
|
||
" pat=CLARIFAI_PAT,\n",
|
||
" number_of_docs=NUMBER_OF_DOCS,\n",
|
||
" metadatas=metadatas,\n",
|
||
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "e755cdce",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 0.0}),\n",
|
||
" Document(page_content='I went to the movies yesterday', metadata={'text': 'I went to the movies yesterday', 'id': 3.0}),\n",
|
||
" Document(page_content='zab', metadata={'page': '2'}),\n",
|
||
" Document(page_content='zab', metadata={'page': '2'})]"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"docs = clarifai_vector_db.similarity_search(\"I would love to see you\")\n",
|
||
"docs"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "c39504e4",
|
||
"metadata": {},
|
||
"source": [
|
||
"## From Documents\n",
|
||
"Create a Clarifai vectorstore from a list of Documents. This section will upload each document with its respective metadata to a Clarifai Application. The Clarifai Application can then be used for semantic search to find relevant documents."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "a3c3999a",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||
"documents = loader.load()\n",
|
||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||
"docs = text_splitter.split_documents(documents)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"id": "69ae7e35",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
|
||
" Document(page_content='Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \\n\\nIn this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \\n\\nLet each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \\n\\nPlease rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \\n\\nThroughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. \\n\\nThey keep moving. \\n\\nAnd the costs and the threats to America and the world keep rising. \\n\\nThat’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \\n\\nThe United States is a member along with 29 other nations. \\n\\nIt matters. American diplomacy matters. American resolve matters.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
|
||
" Document(page_content='Putin’s latest attack on Ukraine was premeditated and unprovoked. \\n\\nHe rejected repeated efforts at diplomacy. \\n\\nHe thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did. \\n\\nWe prepared extensively and carefully. \\n\\nWe spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. \\n\\nI spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression. \\n\\nWe countered Russia’s lies with truth. \\n\\nAnd now that he has acted the free world is holding him accountable. \\n\\nAlong with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
|
||
" Document(page_content='We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. \\n\\nTogether with our allies –we are right now enforcing powerful economic sanctions. \\n\\nWe are cutting off Russia’s largest banks from the international financial system. \\n\\nPreventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless. \\n\\nWe are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come. \\n\\nTonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \\n\\nThe U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. \\n\\nWe are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains.', metadata={'source': '../../../state_of_the_union.txt'})]"
|
||
]
|
||
},
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"docs[:4]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"id": "40bf1305",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"USER_ID = \"USERNAME_ID\"\n",
|
||
"APP_ID = \"APPLICATION_ID\"\n",
|
||
"NUMBER_OF_DOCS = 4"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "6e104aee",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"clarifai_vector_db = Clarifai.from_documents(\n",
|
||
" user_id=USER_ID,\n",
|
||
" app_id=APP_ID,\n",
|
||
" documents=docs,\n",
|
||
" pat=CLARIFAI_PAT_KEY,\n",
|
||
" number_of_docs=NUMBER_OF_DOCS,\n",
|
||
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "9c608226",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[Document(page_content='And I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and can’t be traced. \\n\\nAnd I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon? \\n\\nBan assault weapons and high-capacity magazines. \\n\\nRepeal the liability shield that makes gun manufacturers the only industry in America that can’t be sued. \\n\\nThese laws don’t infringe on the Second Amendment. They save lives. \\n\\nThe most fundamental right in America is the right to vote – and to have it counted. And it’s under assault. \\n\\nIn state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
|
||
" Document(page_content='We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \\n\\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \\n\\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \\n\\nOfficer Mora was 27 years old. \\n\\nOfficer Rivera was 22. \\n\\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \\n\\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \\n\\nI’ve worked on these issues a long time. \\n\\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
|
||
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
|
||
" Document(page_content='So let’s not abandon our streets. Or choose between safety and equal justice. \\n\\nLet’s come together to protect our communities, restore trust, and hold law enforcement accountable. \\n\\nThat’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. \\n\\nThat’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope. \\n\\nWe should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities. \\n\\nI ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.', metadata={'source': '../../../state_of_the_union.txt'})]"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"docs = clarifai_vector_db.similarity_search(\"Texts related to criminals and violence\")\n",
|
||
"docs"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.9.16"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|