Updates to Nomic Atlas and GPT4All documentation (#9414)

Description: Updates the documentation for the Nomic AI Atlas and GPT4All
integrations.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Lakshay Kansal 11 months ago committed by GitHub
parent 342087bdfa
commit a8c916955f

@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -30,6 +31,14 @@
"%pip install gpt4all > /dev/null"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import GPT4All"
]
},
{
"cell_type": "code",
"execution_count": 2,
@ -43,6 +52,14 @@
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set Up Question to pass to LLM"
]
},
{
"cell_type": "code",
"execution_count": 3,
@ -59,6 +76,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -66,18 +84,14 @@
"\n",
"To run locally, download a compatible ggml-formatted model. \n",
" \n",
"**Download option 1**: The [gpt4all page](https://gpt4all.io/index.html) has a useful `Model Explorer` section:\n",
"The [gpt4all page](https://gpt4all.io/index.html) has a useful `Model Explorer` section:\n",
"\n",
"* Select a model of interest\n",
"* Download using the UI and move the `.bin` to the `local_path` (noted below)\n",
"\n",
"For more info, visit https://github.com/nomic-ai/gpt4all.\n",
"\n",
"--- \n",
"\n",
"**Download option 2**: Uncomment the below block to download a model. \n",
"\n",
"* You may want to update `url` to a new version, which can be browsed using the [gpt4all page](https://gpt4all.io/index.html)."
"---"
]
},
{
@ -88,27 +102,7 @@
"source": [
"local_path = (\n",
" \"./models/ggml-gpt4all-l13b-snoozy.bin\" # replace with your desired local file path\n",
")\n",
"\n",
"# import requests\n",
"\n",
"# from pathlib import Path\n",
"# from tqdm import tqdm\n",
"\n",
"# Path(local_path).parent.mkdir(parents=True, exist_ok=True)\n",
"\n",
"# # Example model. Check https://github.com/nomic-ai/gpt4all for the latest models.\n",
"# url = 'http://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin'\n",
"\n",
"# # send a GET request to the URL to download the file. Stream since it's large\n",
"# response = requests.get(url, stream=True)\n",
"\n",
"# # open the file in binary mode and write the contents of the response to it in chunks\n",
"# # This is a large file, so be prepared to wait.\n",
"# with open(local_path, 'wb') as f:\n",
"# for chunk in tqdm(response.iter_content(chunk_size=8192)):\n",
"# if chunk:\n",
"# f.write(chunk)"
")"
]
},
{
@ -147,6 +141,14 @@
"\n",
"llm_chain.run(question)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Justin Bieber was born on March 1, 1994. In 1994, The Cowboys won Super Bowl XXVIII."
]
}
],
"metadata": {

@ -1,15 +1,27 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "d63d56c2",
"metadata": {},
"source": [
"# GPT4All\n",
"\n",
"[GPT4All](https://gpt4all.io/index.html) is a free-to-use, locally running, privacy-aware chatbot. There is no GPU or internet required. It features popular models and its own models such as GPT4All Falcon, Wizard, etc.\n",
"\n",
"This notebook explains how to use [GPT4All embeddings](https://docs.gpt4all.io/gpt4all_python_embedding.html#gpt4all.gpt4all.Embed4All) with LangChain."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "46b7aa85",
"metadata": {},
"source": [
"## Install GPT4All's Python Bindings"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -17,7 +29,16 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install gpt4all"
"%pip install gpt4all > /dev/null"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d80f4b92",
"metadata": {},
"source": [
"Note: you may need to restart the kernel to use updated packages."
]
},
{
@ -72,6 +93,15 @@
"text = \"This is a test document.\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "eef36bde",
"metadata": {},
"source": [
"## Embed the Textual Data"
]
},
{
"cell_type": "code",
"execution_count": 4,
@ -82,6 +112,15 @@
"query_result = gpt4all_embd.embed_query(text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "12b24e69",
"metadata": {},
"source": [
"With embed_documents you can embed multiple pieces of text. You can also map these embeddings with [Nomic's Atlas](https://docs.nomic.ai/index.html) to see a visual representation of your data."
]
},
{
"cell_type": "code",
"execution_count": 5,

@ -1,13 +1,14 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Atlas\n",
"\n",
"\n",
">[Atlas](https://docs.nomic.ai/index.html) is a platform for interacting with both small and internet scale unstructured datasets by `Nomic`. \n",
">[Atlas](https://docs.nomic.ai/index.html) is a platform by Nomic made for interacting with both small and internet scale unstructured datasets. It enables anyone to visualize, search, and share massive datasets in their browser.\n",
"\n",
"This notebook shows you how to use functionality related to the `AtlasDB` vectorstore."
]
@ -49,6 +50,14 @@
"!pip install nomic"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Packages"
]
},
{
"cell_type": "code",
"execution_count": 6,
@ -78,6 +87,14 @@
"ATLAS_TEST_API_KEY = \"7xDPkYXSYDc1_ErdTPIcoAR9RNd8YDlkS3nVNXcVoIMZ6\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare the Data"
]
},
{
"cell_type": "code",
"execution_count": 8,
@ -96,6 +113,14 @@
"texts = [e.strip() for e in texts]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Map the Data using Nomic's Atlas"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -127,78 +152,21 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <strong><a href=\"https://atlas.nomic.ai/dashboard/project/ee2354a3-7f9a-4c6b-af43-b0cda09d7198\">test_index_1677255228.136989</strong></a>\n",
" <br>\n",
" A description for your project 508 datums inserted.\n",
" <br>\n",
" 1 index built.\n",
" <br><strong>Projections</strong>\n",
"<ul>\n",
"<li>test_index_1677255228.136989_index. Status Completed. <a target=\"_blank\" href=\"https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\">view online</a></li></ul><hr><script>\n",
" destroy = function() {\n",
" document.getElementById(\"iframedb996d77-8981-48a0-897a-ff2c22bbf541\").remove()\n",
" }\n",
" </script>\n",
"\n",
" <h4>Projection ID: db996d77-8981-48a0-897a-ff2c22bbf541</h4>\n",
" <div class=\"actions\">\n",
" <div id=\"hide\" class=\"action\" onclick=\"destroy()\">Hide embedded project</div>\n",
" <div class=\"action\" id=\"out\">\n",
" <a href=\"https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\" target=\"_blank\">Explore on atlas.nomic.ai</a>\n",
" </div>\n",
" </div>\n",
" \n",
" <iframe class=\"iframe\" id=\"iframedb996d77-8981-48a0-897a-ff2c22bbf541\" allow=\"clipboard-read; clipboard-write\" src=\"https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\">\n",
" </iframe>\n",
"\n",
" <style>\n",
" .iframe {\n",
" /* vh can be **very** large in vscode html. */\n",
" height: min(75vh, 66vw);\n",
" width: 100%;\n",
" }\n",
" </style>\n",
" \n",
" <style>\n",
" .actions {\n",
" display: block;\n",
" }\n",
" .action {\n",
" min-height: 18px;\n",
" margin: 5px;\n",
" transition: all 500ms ease-in-out;\n",
" }\n",
" .action:hover {\n",
" cursor: pointer;\n",
" }\n",
" #hide:hover::after {\n",
" content: \" X\";\n",
" }\n",
" #out:hover::after {\n",
" content: \"\";\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"AtlasProject: <{'id': 'ee2354a3-7f9a-4c6b-af43-b0cda09d7198', 'owner': '9c29afbb-a002-4d49-958e-ecf5ae1351ac', 'project_name': 'test_index_1677255228.136989', 'creator': 'auth0|63efc4b5462246f4d9a6ecf2', 'description': 'A description for your project', 'opensearch_index_id': 'f61fb8dd-0abf-4f31-9130-41870e443902', 'is_public': True, 'project_fields': ['atlas_id', 'text'], 'unique_id_field': 'atlas_id', 'modality': 'text', 'total_datums_in_project': 508, 'created_timestamp': '2023-02-24T16:13:50.313363+00:00', 'atlas_indices': [{'id': 'b1b01833-0964-4597-a4bc-a2d60700949d', 'project_id': 'ee2354a3-7f9a-4c6b-af43-b0cda09d7198', 'index_name': 'test_index_1677255228.136989_index', 'indexed_field': 'text', 'created_timestamp': '2023-02-24T16:13:52.957101+00:00', 'updated_timestamp': '2023-02-24T16:14:03.469621+00:00', 'atoms': ['charchunk', 'document'], 'colorable_fields': [], 'embedders': [{'id': '7ec0868a-4eed-4414-a482-25cce9803e1b', 'atlas_index_id': 'b1b01833-0964-4597-a4bc-a2d60700949d', 'ready': True, 'model_name': 'NomicEmbed', 'hyperparameters': {'norm': 'both', 'batch_size': 20, 'polymerize_by': 'charchunk', 'dataset_buffer_size': 1000}}], 'nearest_neighbor_indices': [{'id': '86f8e3ff-e07c-4678-a4d7-144db4b0301d', 'index_name': 'NomicOrganize', 'ready': True, 'hyperparameters': {'dim': 384, 'space': 'l2'}, 'atom_strategies': ['document']}], 'projections': [{'id': 'db996d77-8981-48a0-897a-ff2c22bbf541', 'projection_name': 'NomicProject', 'ready': True, 'hyperparameters': {'spread': 1.0, 'n_epochs': 50, 'n_neighbors': 15}, 'atom_strategies': ['document'], 'created_timestamp': '2023-02-24T16:13:52.979561+00:00', 'updated_timestamp': '2023-02-24T16:14:03.466309+00:00'}]}], 'insert_update_delete_lock': False}>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"db.project"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a map with the result of this code. This map displays the texts of the State of the Union.\n",
"https://atlas.nomic.ai/map/3e4de075-89ff-486a-845c-36c23f30bb67/d8ce2284-8edb-4050-8b9b-9bb543d7f647"
]
}
],
"metadata": {
