Added deeplake use case examples of the new features (#6528)

1. Added use case examples for the new Deep Lake features
2. Refactored some code

---------

Co-authored-by: Ivo Stranic <istranic@gmail.com>
pull/7486/head
Adilkhan Sarsen 1 year ago committed by GitHub
parent 9b615022e2
commit 5debd5043e

@@ -1,14 +1,15 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deep Lake\n",
"# Activeloop's Deep Lake\n",
"\n",
">[Deep Lake](https://docs.activeloop.ai/) as a Multi-Modal Vector Store that stores embeddings and their metadata including text, jsons, images, audio, video, and more. It saves the data locally, in your cloud, or on Activeloop storage. It performs hybrid search including embeddings and their attributes.\n",
">[Activeloop's Deep Lake](https://docs.activeloop.ai/) as a Multi-Modal Vector Store that stores embeddings and their metadata including text, jsons, images, audio, video, and more. It saves the data locally, in your cloud, or on Activeloop storage. It performs hybrid search including embeddings and their attributes.\n",
"\n",
"This notebook showcases basic functionality related to `Deep Lake`. While `Deep Lake` can store embeddings, it is capable of storing any type of data. It is a fully fledged serverless data lake with version control, query engine and streaming dataloader to deep learning frameworks. \n",
"This notebook showcases basic functionality related to `Activeloop's Deep Lake`. While `Deep Lake` can store embeddings, it is capable of storing any type of data. It is a serverless data lake with version control, query engine and streaming dataloaders to deep learning frameworks. \n",
"\n",
"For more information, please see the Deep Lake [documentation](https://docs.activeloop.ai) or [api reference](https://docs.deeplake.ai)"
]
@@ -16,12 +17,10 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"!pip install openai deeplake tiktoken"
"!pip install openai 'deeplake[enterprise]' tiktoken"
]
},
{
@@ -61,7 +60,7 @@
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"docs/modules/state_of_the_union.txt\")\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
@@ -70,6 +69,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -78,31 +78,9 @@
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": []
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='./my_deeplake/', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (42, 1536) float32 None \n",
" id text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = DeepLake(\n",
" dataset_path=\"./my_deeplake/\", embedding_function=embeddings, overwrite=True\n",
@@ -116,30 +94,15 @@
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -148,19 +111,9 @@
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Deep Lake Dataset in ./my_deeplake/ already exists, loading from the storage\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = DeepLake(\n",
" dataset_path=\"./my_deeplake/\", embedding_function=embeddings, read_only=True\n",
@@ -169,6 +122,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -176,6 +130,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -184,20 +139,9 @@
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/adilkhansarsen/Documents/work/LangChain/langchain/langchain/llms/openai.py:751: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`\n",
" warnings.warn(\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAIChat\n",
@@ -211,64 +155,35 @@
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'The President nominated Ketanji Brown Jackson to serve on the United States Supreme Court and spoke highly of her legal expertise and reputation as a consensus builder.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"qa.run(query)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Attribute based filtering in metadata"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create another vector store containing metadata with the year the documents were created."
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": []
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='./my_deeplake/', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (4, 1536) float32 None \n",
" id text (4, 1) str None \n",
" metadata json (4, 1) str None \n",
" text text (4, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": []
}
],
"outputs": [],
"source": [
"import random\n",
"\n",
@@ -282,29 +197,9 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 4/4 [00:00<00:00, 3300.00it/s]\n"
]
},
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 
\\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. 
\\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013})]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"db.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson\",\n",
@@ -313,6 +208,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -322,23 +218,9 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 
\\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. 
\\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2012}}, page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. 
\\n\\nFirst, beat the opioid epidemic.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2012})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"db.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson?\", distance_metric=\"cos\"\n",
@@ -346,6 +228,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -355,23 +238,9 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. 
\\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 
\\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2012}}, page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. 
\\n\\nFirst, beat the opioid epidemic.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2012})]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"db.max_marginal_relevance_search(\n",
" \"What did the president say about Ketanji Brown Jackson?\"\n",
@ -379,6 +248,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -401,6 +271,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -423,11 +294,12 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep Lake datasets on cloud (Activeloop, AWS, GCS, etc.) or in memory\n",
"By default deep lake datasets are stored locally, in case you want to store them in memory, in the Deep Lake Managed DB, or in any object storage, you can provide the [corresponding path to the dataset](https://docs.activeloop.ai/storage-and-credentials/storage-options). You can retrieve your user token from [app.activeloop.ai](https://app.activeloop.ai/)"
"By default, Deep Lake datasets are stored locally. To store them in memory, in the Deep Lake Managed DB, or in any object storage, you can provide the [corresponding path and credentials when creating the vector store](https://docs.activeloop.ai/storage-and-credentials/storage-options). Some paths require registration with Activeloop and creation of an API token that can be [retrieved here](https://app.activeloop.ai/)"
]
},
{
@ -439,106 +311,11 @@
"os.environ[\"ACTIVELOOP_TOKEN\"] = activeloop_token"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deeplake now supports running the inference in 3 modes. `python` naive way of searching inside of the data, `tensor_db` which is managed database, it runs tql on a remote optimized engine and sends results back, and `compute_engine` which is C++ implementation of search that runs locally."
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://adilkhan/langchain_testing_python', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (42, 1536) float32 None \n",
" id text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/plain": [
"['d604b1ac-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b238-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b260-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b27e-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b29c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b2ba-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b2d8-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b2f6-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b314-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b332-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b350-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b36e-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b38c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3a0-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3be-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3dc-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3fa-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b418-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b436-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b454-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b472-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b490-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4a4-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4c2-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4e0-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4fe-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b51c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b53a-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b558-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b576-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b594-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b5b2-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b5c6-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b5e4-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b602-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b620-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b63e-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b65c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b67a-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b698-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b6b6-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b6d4-093c-11ee-bdba-76d8a30504e0']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"# Embed and store the texts\n",
"username = \"<username>\" # your username on app.activeloop.ai\n",
@ -553,126 +330,40 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `tensor_db` execution option "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to utilize Deep Lake's Managed Tensor Database, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"|"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://adilkhan/langchain_testing', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (42, 1536) float32 None \n",
" id text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/plain": [
"['6584c33a-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c3ee-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c420-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c43e-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c466-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c484-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4a2-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4c0-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4de-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4fc-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c51a-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c538-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c556-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c574-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c592-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c5b0-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c5ce-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c5f6-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c614-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c632-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c646-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c66e-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c682-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c6a0-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c6be-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c6e6-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c704-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c722-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c740-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c75e-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c77c-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c79a-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c7ae-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c7cc-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c7ea-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c808-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c826-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c844-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c862-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c876-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c894-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c8bc-093d-11ee-bdba-76d8a30504e0']"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"# Embed and store the texts\n",
"username = \"adilkhan\" # your username on app.activeloop.ai\n",
"dataset_path = f\"hub://{username}/langchain_testing\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"dataset_path = f\"hub://{username}/langchain_testing\"\n",
"\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
@ -681,44 +372,13 @@
" dataset_path=dataset_path,\n",
" embedding_function=embeddings,\n",
" overwrite=True,\n",
" exec_option=\"tensor_db\",\n",
" runtime={\"tensor_db\": True},\n",
")\n",
"db.add_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query, exec_option=\"tensor_db\")\n",
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### The difference will be apparent on a bigger datasets (~10000 rows)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -726,15 +386,16 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"now we can use tql search with DeepLake"
"Furthermore, the execution of queries is also supported within the similarity_search method, whereby the query can be specified utilizing Deep Lake's Tensor Query Language (TQL)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
@ -743,42 +404,31 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"docs = db.similarity_search(\n",
" query=None,\n",
" tql_query=f\"SELECT * WHERE id == '{search_id[0]}'\",\n",
" exec_option=\"tensor_db\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt'}}, page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. 
\\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source': 'docs/modules/state_of_the_union.txt'})]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"docs"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating dataset on AWS S3"
"### Creating vector stores on AWS S3"
]
},
{
@ -841,11 +491,12 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep Lake API\n",
"you can access the Deep Lake dataset at `db.ds`"
"you can access the Deep Lake dataset at `db.vectorstore`"
]
},
{
@ -884,6 +535,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [

@ -5,8 +5,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use LangChain, GPT and Deep Lake to work with code base\n",
"In this tutorial, we are going to use Langchain + Deep Lake with GPT to analyze the code base of the LangChain itself. "
"# Use LangChain, GPT and Activeloop's Deep Lake to work with code base\n",
"In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT to analyze the code base of the LangChain itself. "
]
},
{
@ -60,7 +60,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {
"tags": []
},
@ -81,19 +81,11 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
@ -112,21 +104,14 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"outputs": [],
"source": [
"os.environ[\"ACTIVELOOP_TOKEN\"] = getpass.getpass(\"Activeloop Token:\")"
"activeloop_token = getpass(\"Activeloop Token:\")\n",
"os.environ[\"ACTIVELOOP_TOKEN\"] = activeloop_token"
]
},
{
@ -149,19 +134,20 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!ls \"../../../..\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1147\n"
]
}
],
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
@ -189,180 +175,11 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 1620, which is longer than the specified 1000\n",
"Created a chunk of size 1213, which is longer than the specified 1000\n",
"Created a chunk of size 1263, which is longer than the specified 1000\n",
"Created a chunk of size 1448, which is longer than the specified 1000\n",
"Created a chunk of size 1120, which is longer than the specified 1000\n",
"Created a chunk of size 1148, which is longer than the specified 1000\n",
"Created a chunk of size 1826, which is longer than the specified 1000\n",
"Created a chunk of size 1260, which is longer than the specified 1000\n",
"Created a chunk of size 1195, which is longer than the specified 1000\n",
"Created a chunk of size 2147, which is longer than the specified 1000\n",
"Created a chunk of size 1410, which is longer than the specified 1000\n",
"Created a chunk of size 1269, which is longer than the specified 1000\n",
"Created a chunk of size 1030, which is longer than the specified 1000\n",
"Created a chunk of size 1046, which is longer than the specified 1000\n",
"Created a chunk of size 1024, which is longer than the specified 1000\n",
"Created a chunk of size 1026, which is longer than the specified 1000\n",
"Created a chunk of size 1285, which is longer than the specified 1000\n",
"Created a chunk of size 1370, which is longer than the specified 1000\n",
"Created a chunk of size 1031, which is longer than the specified 1000\n",
"Created a chunk of size 1999, which is longer than the specified 1000\n",
"Created a chunk of size 1029, which is longer than the specified 1000\n",
"Created a chunk of size 1120, which is longer than the specified 1000\n",
"Created a chunk of size 1033, which is longer than the specified 1000\n",
"Created a chunk of size 1143, which is longer than the specified 1000\n",
"Created a chunk of size 1416, which is longer than the specified 1000\n",
"Created a chunk of size 2482, which is longer than the specified 1000\n",
"Created a chunk of size 1890, which is longer than the specified 1000\n",
"Created a chunk of size 1418, which is longer than the specified 1000\n",
"Created a chunk of size 1848, which is longer than the specified 1000\n",
"Created a chunk of size 1069, which is longer than the specified 1000\n",
"Created a chunk of size 2369, which is longer than the specified 1000\n",
"Created a chunk of size 1045, which is longer than the specified 1000\n",
"Created a chunk of size 1501, which is longer than the specified 1000\n",
"Created a chunk of size 1208, which is longer than the specified 1000\n",
"Created a chunk of size 1950, which is longer than the specified 1000\n",
"Created a chunk of size 1283, which is longer than the specified 1000\n",
"Created a chunk of size 1414, which is longer than the specified 1000\n",
"Created a chunk of size 1304, which is longer than the specified 1000\n",
"Created a chunk of size 1224, which is longer than the specified 1000\n",
"Created a chunk of size 1060, which is longer than the specified 1000\n",
"Created a chunk of size 2461, which is longer than the specified 1000\n",
"Created a chunk of size 1099, which is longer than the specified 1000\n",
"Created a chunk of size 1178, which is longer than the specified 1000\n",
"Created a chunk of size 1449, which is longer than the specified 1000\n",
"Created a chunk of size 1345, which is longer than the specified 1000\n",
"Created a chunk of size 3359, which is longer than the specified 1000\n",
"Created a chunk of size 2248, which is longer than the specified 1000\n",
"Created a chunk of size 1589, which is longer than the specified 1000\n",
"Created a chunk of size 2104, which is longer than the specified 1000\n",
"Created a chunk of size 1505, which is longer than the specified 1000\n",
"Created a chunk of size 1387, which is longer than the specified 1000\n",
"Created a chunk of size 1215, which is longer than the specified 1000\n",
"Created a chunk of size 1240, which is longer than the specified 1000\n",
"Created a chunk of size 1635, which is longer than the specified 1000\n",
"Created a chunk of size 1075, which is longer than the specified 1000\n",
"Created a chunk of size 2180, which is longer than the specified 1000\n",
"Created a chunk of size 1791, which is longer than the specified 1000\n",
"Created a chunk of size 1555, which is longer than the specified 1000\n",
"Created a chunk of size 1082, which is longer than the specified 1000\n",
"Created a chunk of size 1225, which is longer than the specified 1000\n",
"Created a chunk of size 1287, which is longer than the specified 1000\n",
"Created a chunk of size 1085, which is longer than the specified 1000\n",
"Created a chunk of size 1117, which is longer than the specified 1000\n",
"Created a chunk of size 1966, which is longer than the specified 1000\n",
"Created a chunk of size 1150, which is longer than the specified 1000\n",
"Created a chunk of size 1285, which is longer than the specified 1000\n",
"Created a chunk of size 1150, which is longer than the specified 1000\n",
"Created a chunk of size 1585, which is longer than the specified 1000\n",
"Created a chunk of size 1208, which is longer than the specified 1000\n",
"Created a chunk of size 1267, which is longer than the specified 1000\n",
"Created a chunk of size 1542, which is longer than the specified 1000\n",
"Created a chunk of size 1183, which is longer than the specified 1000\n",
"Created a chunk of size 2424, which is longer than the specified 1000\n",
"Created a chunk of size 1017, which is longer than the specified 1000\n",
"Created a chunk of size 1304, which is longer than the specified 1000\n",
"Created a chunk of size 1379, which is longer than the specified 1000\n",
"Created a chunk of size 1324, which is longer than the specified 1000\n",
"Created a chunk of size 1205, which is longer than the specified 1000\n",
"Created a chunk of size 1056, which is longer than the specified 1000\n",
"Created a chunk of size 1195, which is longer than the specified 1000\n",
"Created a chunk of size 3608, which is longer than the specified 1000\n",
"Created a chunk of size 1058, which is longer than the specified 1000\n",
"Created a chunk of size 1075, which is longer than the specified 1000\n",
"Created a chunk of size 1217, which is longer than the specified 1000\n",
"Created a chunk of size 1109, which is longer than the specified 1000\n",
"Created a chunk of size 1440, which is longer than the specified 1000\n",
"Created a chunk of size 1046, which is longer than the specified 1000\n",
"Created a chunk of size 1220, which is longer than the specified 1000\n",
"Created a chunk of size 1403, which is longer than the specified 1000\n",
"Created a chunk of size 1241, which is longer than the specified 1000\n",
"Created a chunk of size 1427, which is longer than the specified 1000\n",
"Created a chunk of size 1049, which is longer than the specified 1000\n",
"Created a chunk of size 1580, which is longer than the specified 1000\n",
"Created a chunk of size 1565, which is longer than the specified 1000\n",
"Created a chunk of size 1131, which is longer than the specified 1000\n",
"Created a chunk of size 1425, which is longer than the specified 1000\n",
"Created a chunk of size 1054, which is longer than the specified 1000\n",
"Created a chunk of size 1027, which is longer than the specified 1000\n",
"Created a chunk of size 2559, which is longer than the specified 1000\n",
"Created a chunk of size 1028, which is longer than the specified 1000\n",
"Created a chunk of size 1382, which is longer than the specified 1000\n",
"Created a chunk of size 1888, which is longer than the specified 1000\n",
"Created a chunk of size 1475, which is longer than the specified 1000\n",
"Created a chunk of size 1652, which is longer than the specified 1000\n",
"Created a chunk of size 1891, which is longer than the specified 1000\n",
"Created a chunk of size 1899, which is longer than the specified 1000\n",
"Created a chunk of size 1021, which is longer than the specified 1000\n",
"Created a chunk of size 1085, which is longer than the specified 1000\n",
"Created a chunk of size 1854, which is longer than the specified 1000\n",
"Created a chunk of size 1672, which is longer than the specified 1000\n",
"Created a chunk of size 2537, which is longer than the specified 1000\n",
"Created a chunk of size 1251, which is longer than the specified 1000\n",
"Created a chunk of size 1734, which is longer than the specified 1000\n",
"Created a chunk of size 1642, which is longer than the specified 1000\n",
"Created a chunk of size 1376, which is longer than the specified 1000\n",
"Created a chunk of size 1253, which is longer than the specified 1000\n",
"Created a chunk of size 1642, which is longer than the specified 1000\n",
"Created a chunk of size 1419, which is longer than the specified 1000\n",
"Created a chunk of size 1438, which is longer than the specified 1000\n",
"Created a chunk of size 1427, which is longer than the specified 1000\n",
"Created a chunk of size 1684, which is longer than the specified 1000\n",
"Created a chunk of size 1760, which is longer than the specified 1000\n",
"Created a chunk of size 1157, which is longer than the specified 1000\n",
"Created a chunk of size 2504, which is longer than the specified 1000\n",
"Created a chunk of size 1082, which is longer than the specified 1000\n",
"Created a chunk of size 2268, which is longer than the specified 1000\n",
"Created a chunk of size 1784, which is longer than the specified 1000\n",
"Created a chunk of size 1311, which is longer than the specified 1000\n",
"Created a chunk of size 2972, which is longer than the specified 1000\n",
"Created a chunk of size 1144, which is longer than the specified 1000\n",
"Created a chunk of size 1825, which is longer than the specified 1000\n",
"Created a chunk of size 1508, which is longer than the specified 1000\n",
"Created a chunk of size 2901, which is longer than the specified 1000\n",
"Created a chunk of size 1715, which is longer than the specified 1000\n",
"Created a chunk of size 1062, which is longer than the specified 1000\n",
"Created a chunk of size 1206, which is longer than the specified 1000\n",
"Created a chunk of size 1102, which is longer than the specified 1000\n",
"Created a chunk of size 1184, which is longer than the specified 1000\n",
"Created a chunk of size 1002, which is longer than the specified 1000\n",
"Created a chunk of size 1065, which is longer than the specified 1000\n",
"Created a chunk of size 1871, which is longer than the specified 1000\n",
"Created a chunk of size 1754, which is longer than the specified 1000\n",
"Created a chunk of size 2413, which is longer than the specified 1000\n",
"Created a chunk of size 1771, which is longer than the specified 1000\n",
"Created a chunk of size 2054, which is longer than the specified 1000\n",
"Created a chunk of size 2000, which is longer than the specified 1000\n",
"Created a chunk of size 2061, which is longer than the specified 1000\n",
"Created a chunk of size 1066, which is longer than the specified 1000\n",
"Created a chunk of size 1419, which is longer than the specified 1000\n",
"Created a chunk of size 1368, which is longer than the specified 1000\n",
"Created a chunk of size 1008, which is longer than the specified 1000\n",
"Created a chunk of size 1227, which is longer than the specified 1000\n",
"Created a chunk of size 1745, which is longer than the specified 1000\n",
"Created a chunk of size 2296, which is longer than the specified 1000\n",
"Created a chunk of size 1083, which is longer than the specified 1000\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"3477\n"
]
}
],
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
@ -383,22 +200,11 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=8191, openai_api_key=None, openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"\n",
@ -417,11 +223,33 @@
"from langchain.vectorstores import DeepLake\n",
"\n",
"db = DeepLake.from_documents(\n",
" texts, embeddings, dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\"\n",
" texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\"\n",
")\n",
"db"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# from langchain.vectorstores import DeepLake\n",
"\n",
"# db = DeepLake.from_documents(\n",
"# texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\", runtime={\"tensor_db\": True}\n",
"# )\n",
"# db"
]
},
{
"attachments": {},
"cell_type": "markdown",
@ -433,66 +261,14 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/user_name/langchain-code\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://user_name/langchain-code loaded successfully.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Deep Lake Dataset in hub://user_name/langchain-code already exists, loading from the storage\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://user_name/langchain-code', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (3477, 1536) float32 None \n",
" ids text (3477, 1) str None \n",
" metadata json (3477, 1) str None \n",
" text text (3477, 1) str None \n"
]
}
],
"outputs": [],
"source": [
"db = DeepLake(\n",
" dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\",\n",
" dataset_path=f\"hub://{<org_id>}/langchain-code\",\n",
" read_only=True,\n",
" embedding_function=embeddings,\n",
")"
@ -500,7 +276,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {
"tags": []
},
@ -523,7 +299,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {
"tags": []
},
@ -545,7 +321,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {
"tags": []
},
@ -658,7 +434,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.6"
}
},
"nbformat": 4,

@ -5,8 +5,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analysis of Twitter the-algorithm source code with LangChain, GPT4 and Deep Lake\n",
"In this tutorial, we are going to use Langchain + Deep Lake with GPT4 to analyze the code base of the twitter algorithm. "
"# Analysis of Twitter the-algorithm source code with LangChain, GPT4 and Activeloop's Deep Lake\n",
"In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to analyze the code base of the twitter algorithm. "
]
},
{
@ -15,7 +15,7 @@
"metadata": {},
"outputs": [],
"source": [
"!python3 -m pip install --upgrade langchain deeplake openai tiktoken"
"!python3 -m pip install --upgrade langchain 'deeplake[enterprise]' openai tiktoken"
]
},
{
@ -41,7 +41,8 @@
"from langchain.vectorstores import DeepLake\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
"os.environ[\"ACTIVELOOP_TOKEN\"] = getpass.getpass(\"Activeloop Token:\")"
"activeloop_token = getpass.getpass(\"Activeloop Token:\")\n",
"os.environ[\"ACTIVELOOP_TOKEN\"] = activeloop_token"
]
},
{
@ -149,6 +150,29 @@
"db.add_documents(texts)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# username = \"davitbun\" # replace with your username from app.activeloop.ai\n",
"# db = DeepLake(\n",
"# dataset_path=f\"hub://{username}/twitter-algorithm\",\n",
"# embedding_function=embeddings,\n",
"# runtime={\"tensor_db\": True}\n",
"# )\n",
"# db.add_documents(texts)"
]
},
{
"attachments": {},
"cell_type": "markdown",
@ -176,6 +200,7 @@
" dataset_path=\"hub://davitbun/twitter-algorithm\",\n",
" read_only=True,\n",
" embedding_function=embeddings,\n",
" \n",
")"
]
},
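The optional cell above passes `runtime={"tensor_db": True}` only for datasets hosted by Deep Lake. A minimal sketch of that decision, assuming that a `hub://` prefix is what marks a dataset as eligible; `runtime_for` is a hypothetical helper and not part of LangChain or Deep Lake:

```python
from typing import Dict, Optional


def runtime_for(dataset_path: str) -> Optional[Dict[str, bool]]:
    """Return a runtime setting for a dataset path.

    Assumption: only paths starting with "hub://" can use the Managed
    Tensor Database. Local paths and in-memory ("mem://") paths get None.
    """
    if dataset_path.startswith("hub://"):
        return {"tensor_db": True}
    return None


if __name__ == "__main__":
    for path in ("hub://my_org/twitter-algorithm", "mem://test_path", "./test_path"):
        print(path, "->", runtime_for(path))
```

The returned value could then be passed as the `runtime` argument when the vector store is created, as the commented-out cell does.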

@ -1,16 +1,18 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Question answering over a group chat messages\n",
"In this tutorial, we are going to use Langchain + Deep Lake with GPT4 to semantically search and ask questions over a group chat.\n",
"# Question answering over a group chat messages using Activeloop's DeepLake\n",
"In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to semantically search and ask questions over a group chat.\n",
"\n",
"View a working demo [here](https://twitter.com/thisissukh_/status/1647223328363679745)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -23,10 +25,11 @@
"metadata": {},
"outputs": [],
"source": [
"!python3 -m pip install --upgrade langchain deeplake openai tiktoken"
"!python3 -m pip install --upgrade langchain 'deeplake[enterprise]' openai tiktoken"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -34,6 +37,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": []
@ -58,16 +62,18 @@
"from langchain.llms import OpenAI\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
"os.environ[\"ACTIVELOOP_TOKEN\"] = getpass.getpass(\"Activeloop Token:\")\n",
"activeloop_token = getpass.getpass(\"Activeloop Token:\")\n",
"os.environ[\"ACTIVELOOP_TOKEN\"] = activeloop_token\n",
"os.environ[\"ACTIVELOOP_ORG\"] = getpass.getpass(\"Activeloop Org:\")\n",
"\n",
"org = os.environ[\"ACTIVELOOP_ORG\"]\n",
"org_id = os.environ[\"ACTIVELOOP_ORG\"]\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"dataset_path = \"hub://\" + org + \"/data\""
"dataset_path = \"hub://\" + org_id + \"/data\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -77,6 +83,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -117,6 +124,38 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# with open(\"messages.txt\") as f:\n",
"# state_of_the_union = f.read()\n",
"# text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"# pages = text_splitter.split_text(state_of_the_union)\n",
"\n",
"# text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)\n",
"# texts = text_splitter.create_documents(pages)\n",
"\n",
"# print(texts)\n",
"\n",
"# dataset_path = \"hub://\" + org + \"/data\"\n",
"# embeddings = OpenAIEmbeddings()\n",
"# db = DeepLake.from_documents(\n",
"# texts, embeddings, dataset_path=dataset_path, overwrite=True, runtime=\"tensor_db\"\n",
"# )"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [

@ -61,7 +61,7 @@ class DeepLake(VectorStore):
ingestion_batch_size: int = 1000,
num_workers: int = 0,
verbose: bool = True,
exec_option: str = "python",
exec_option: Optional[str] = None,
**kwargs: Any,
) -> None:
"""Creates an empty DeepLakeVectorStore or loads an existing one.
@ -96,19 +96,21 @@ class DeepLake(VectorStore):
Default is 0.
verbose (bool): Print dataset summary after each operation.
Default is True.
exec_option (str): DeepLakeVectorStore supports 3 ways to perform
searching - "python", "compute_engine", "tensor_db".
Default is "python".
exec_option (str, optional): DeepLakeVectorStore supports 4 ways to perform
searching - "auto", "python", "compute_engine", "tensor_db".
Default is None.
- ``auto`` - Selects the best execution method based on the storage
location of the Vector Store. It is the default option.
- ``python`` - Pure-python implementation that runs on the client.
WARNING: using this with big datasets can lead to memory
issues. Data can be stored anywhere.
WARNING: using this with big datasets can lead to memory
issues. Data can be stored anywhere.
- ``compute_engine`` - C++ implementation of the Deep Lake Compute
Engine that runs on the client. Can be used for any data stored in
or connected to Deep Lake. Not for in-memory or local datasets.
Engine that runs on the client. Can be used for any data stored in
or connected to Deep Lake. Not for in-memory or local datasets.
- ``tensor_db`` - Hosted Managed Tensor Database that is
responsible for storage and query execution. Only for data stored in
the Deep Lake Managed Database. Use runtime = {"db_engine": True} during
dataset creation.
responsible for storage and query execution. Only for data stored in
the Deep Lake Managed Database. Use runtime = {"tensor_db": True}
during dataset creation.
**kwargs: Other optional keyword arguments.
Raises:
@ -122,15 +124,18 @@ class DeepLake(VectorStore):
if _DEEPLAKE_INSTALLED is False:
raise ValueError(
"Could not import deeplake python package. "
"Please install it with `pip install deeplake`."
"Please install it with `pip install deeplake[enterprise]`."
)
if version_compare(deeplake.__version__, "3.6.2") == -1:
if (
kwargs.get("runtime") == {"tensor_db": True}
and version_compare(deeplake.__version__, "3.6.7") == -1
):
raise ValueError(
"deeplake version should be >= 3.6.3, but you've installed"
f" {deeplake.__version__}. Consider upgrading deeplake version \
pip install --upgrade deeplake."
"To use tensor_db option you need to update deeplake to `3.6.7`. "
f"Currently installed deeplake version is {deeplake.__version__}. "
)
self.dataset_path = dataset_path
self.vectorstore = DeepLakeVectorStore(
@ -181,6 +186,14 @@ class DeepLake(VectorStore):
if texts is None:
raise ValueError("`texts` parameter shouldn't be None.")
if not isinstance(texts, list):
texts = list(texts)
if len(texts) == 0:
raise ValueError("`texts` parameter shouldn't be empty.")
if metadatas is None:
metadatas = [{}] * len(texts)
return self.vectorstore.add(
text=texts,
metadata=metadatas,
@ -196,8 +209,8 @@ class DeepLake(VectorStore):
self,
tql_query: Optional[str],
exec_option: Optional[str] = None,
return_score: bool = False,
) -> Any[List[Document], List[Tuple[Document, float]]]:
**kwargs: Any,
) -> List[Document]:
"""Function for performing tql_search.
Args:
@ -216,7 +229,9 @@ class DeepLake(VectorStore):
return_score (bool): Return score with document. Default is False.
Returns:
List[Document] - A list of documents
Tuple[List[Document], List[Tuple[Document, float]]] - A tuple of two lists.
The first list contains Documents, and the second list contains
tuples of Document and float score.
Raises:
ValueError: If return_score is True but some condition is not met.
@ -236,8 +251,13 @@ class DeepLake(VectorStore):
for text, metadata in zip(texts, metadatas)
]
if return_score:
raise ValueError("scores can't be returned with tql search")
if kwargs:
unsupported_argument = next(iter(kwargs))
if kwargs[unsupported_argument] is not False:
raise ValueError(
f"specifying {unsupported_argument} is "
"not supported with tql search."
)
return docs
@ -301,6 +321,11 @@ class DeepLake(VectorStore):
tql_query=kwargs["tql_query"],
exec_option=exec_option,
return_score=return_score,
embedding=embedding,
embedding_function=embedding_function,
distance_metric=distance_metric,
use_maximal_marginal_relevance=use_maximal_marginal_relevance,
filter=filter,
)
if embedding_function:
@ -384,7 +409,8 @@ class DeepLake(VectorStore):
... exec_option=<preferred_exec_option>,
... )
>>> # Run tql search:
>>> data = vector_store.tql_search(
>>> data = vector_store.similarity_search(
... query=None,
... tql_query="SELECT * WHERE id == <id>",
... exec_option="compute_engine",
... )
@ -787,3 +813,10 @@ class DeepLake(VectorStore):
def delete_dataset(self) -> None:
"""Delete the collection."""
self.delete(delete_all=True)
def ds(self) -> Any:
logger.warning(
"this method is deprecated and will be removed, "
"better to use `db.vectorstore.dataset` instead."
)
return self.vectorstore.dataset
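The constructor change above raises only when `runtime == {"tensor_db": True}` and the installed deeplake is older than 3.6.7. A minimal standalone sketch of that gate, assuming plain dotted version strings; `parse_version` and `check_tensor_db_support` are hypothetical names standing in for the module's own `version_compare` logic:

```python
import itertools
from typing import Any, Dict, Optional, Tuple

MIN_TENSOR_DB_VERSION = "3.6.7"  # minimum taken from the diff above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "3.6.7" into (3, 6, 7), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_tensor_db_support(
    installed_version: str, runtime: Optional[Dict[str, Any]]
) -> None:
    """Raise ValueError only when tensor_db is requested on an older deeplake."""
    too_old = parse_version(installed_version) < parse_version(MIN_TENSOR_DB_VERSION)
    if runtime == {"tensor_db": True} and too_old:
        raise ValueError(
            "To use tensor_db option you need to update deeplake to "
            f"`{MIN_TENSOR_DB_VERSION}`. "
            f"Currently installed deeplake version is {installed_version}."
        )


if __name__ == "__main__":
    check_tensor_db_support("3.6.8", {"tensor_db": True})  # new enough, no error
    check_tensor_db_support("3.6.2", None)  # old, but tensor_db not requested
```

Comparing integer tuples rather than raw strings keeps "3.6.10" ordered after "3.6.7", which a plain string comparison would get wrong.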

597
poetry.lock generated

File diff suppressed because it is too large

@ -61,7 +61,8 @@ arxiv = {version = "^1.4", optional = true}
pypdf = {version = "^3.4.0", optional = true}
networkx = {version="^2.6.3", optional = true}
aleph-alpha-client = {version="^2.15.0", optional = true}
deeplake = {version = "^3.6.2", optional = true}
deeplake = {version = "^3.6.8", optional = true}
libdeeplake = {version = "^0.0.60", optional = true}
pgvector = {version = "^0.1.6", optional = true}
psycopg2-binary = {version = "^2.9.5", optional = true}
pyowm = {version = "^3.3.0", optional = true}
@ -180,7 +181,8 @@ pinecone-text = "^0.4.2"
pymongo = "^4.3.3"
clickhouse-connect = "^0.5.14"
transformers = "^4.27.4"
deeplake = "^3.2.21"
deeplake = "^3.6.8"
libdeeplake = "^0.0.60"
weaviate-client = "^3.15.5"
torch = "^1.0.0"
chromadb = "^0.3.21"
@ -278,6 +280,7 @@ all = [
"nomic",
"aleph-alpha-client",
"deeplake",
"libdeeplake",
"pgvector",
"psycopg2-binary",
"pyowm",

@ -13,10 +13,11 @@ def deeplake_datastore() -> DeepLake:
texts = ["foo", "bar", "baz"]
metadatas = [{"page": str(i)} for i in range(len(texts))]
docsearch = DeepLake.from_texts(
dataset_path="mem://test_path",
dataset_path="./test_path",
texts=texts,
metadatas=metadatas,
embedding=FakeEmbeddings(),
overwrite=True,
)
return docsearch
@ -131,6 +132,15 @@ def test_similarity_search(deeplake_datastore: DeepLake, distance_metric: str) -
"foo", k=1, distance_metric=distance_metric
)
assert output == [Document(page_content="foo", metadata={"page": "0"})]
tql_query = (
f"SELECT * WHERE "
f"id=='{deeplake_datastore.vectorstore.dataset.id[0].numpy()[0]}'"
)
with pytest.raises(ValueError):
output = deeplake_datastore.similarity_search(
query="foo", tql_query=tql_query, k=1, distance_metric=distance_metric
)
deeplake_datastore.delete_dataset()
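The new test expects a `ValueError` when a TQL query is combined with other search arguments. A minimal sketch of that kind of check, assuming extra arguments arrive as a plain dict; `reject_unsupported_tql_kwargs` is a hypothetical name, and unlike the version in the diff it inspects every argument rather than only the first one:

```python
from typing import Any, Dict


def reject_unsupported_tql_kwargs(kwargs: Dict[str, Any]) -> None:
    """Raise ValueError if any extra argument is set to something other than False."""
    for name, value in kwargs.items():
        if value is not False:
            raise ValueError(
                f"specifying {name} is not supported with tql search."
            )


if __name__ == "__main__":
    reject_unsupported_tql_kwargs({"return_score": False})  # accepted
    try:
        reject_unsupported_tql_kwargs({"distance_metric": "cos"})
    except ValueError as err:
        print(err)
```

Note that a value of `None` also counts as "set" under this rule, because only the literal `False` is accepted.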
