{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Atlas\n",
"\n",
"\n",
">[Atlas](https://docs.nomic.ai/index.html) is a platform by Nomic for interacting with both small and internet-scale unstructured datasets. It enables anyone to visualize, search, and share massive datasets in their browser.\n",
"\n",
"This notebook shows you how to use functionality related to the `AtlasDB` vectorstore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install spacy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": true
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"!python3 -m spacy download en_core_web_sm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install nomic"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Packages"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"pycharm": {
"is_executing": true
},
"tags": []
},
"outputs": [],
"source": [
"import time\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import SpacyTextSplitter\n",
"from langchain.vectorstores import AtlasDB\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ATLAS_TEST_API_KEY = \"7xDPkYXSYDc1_ErdTPIcoAR9RNd8YDlkS3nVNXcVoIMZ6\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare the Data"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = SpacyTextSplitter(separator=\"|\")\n",
"texts = []\n",
"for doc in text_splitter.split_documents(documents):\n",
" texts.extend(doc.page_content.split(\"|\"))\n",
"\n",
"texts = [e.strip() for e in texts]"
]
},
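{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`SpacyTextSplitter` merges the sentences that spaCy detects into larger chunks, joining consecutive sentences with its `separator`; that is why the loop above splits every chunk on `|` a second time. The next cell is a minimal, spaCy-free sketch of that post-processing step, assuming the chunks are already available as plain strings. The `split_on_separator` helper and the sample chunks are hypothetical, and unlike the loop above the sketch also drops empty strings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Iterable, List\n",
"\n",
"\n",
"def split_on_separator(chunks: Iterable[str], separator: str = '|') -> List[str]:\n",
"    # Hypothetical helper mirroring the loop above: break each chunk on the\n",
"    # separator that SpacyTextSplitter placed between sentences\n",
"    pieces: List[str] = []\n",
"    for chunk in chunks:\n",
"        pieces.extend(chunk.split(separator))\n",
"    # Strip surrounding whitespace and (unlike the notebook) drop empty pieces\n",
"    return [p.strip() for p in pieces if p.strip()]\n",
"\n",
"\n",
"# Hypothetical chunks standing in for doc.page_content values\n",
"example_chunks = ['First sentence.|Second sentence. ', ' Third sentence.']\n",
"print(split_on_separator(example_chunks))  # prints ['First sentence.', 'Second sentence.', 'Third sentence.']"
]
},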
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Map the Data using Nomic's Atlas"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": true
},
"tags": []
},
"outputs": [],
"source": [
"db = AtlasDB.from_texts(\n",
" texts=texts,\n",
" name=\"test_index_\" + str(time.time()), # unique name for your vector store\n",
" description=\"test_index\", # a description for your vector store\n",
" api_key=ATLAS_TEST_API_KEY,\n",
" index_kwargs={\"build_topic_model\": True},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
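"# wait_for_project_lock() is a context manager, so it only blocks when entered with `with`;\n",
"# this waits until Atlas has finished building the project before it is inspected\n",
"with db.project.wait_for_project_lock():\n",
"    pass"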
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db.project"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the map produced by running this code. It displays the texts extracted from the State of the Union address:\n",
"https://atlas.nomic.ai/map/3e4de075-89ff-486a-845c-36c23f30bb67/d8ce2284-8edb-4050-8b9b-9bb543d7f647"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}