docs: Update nomic AI embeddings integration docs (#25308)

Issue: https://github.com/langchain-ai/langchain/issues/24856

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
This commit is contained in:
Eugene Yurtsev 2024-08-13 20:32:07 -04:00 committed by GitHub
parent f82c3f622a
commit b4e3bdb714
2 changed files with 219 additions and 80 deletions


@ -12,121 +12,254 @@
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"id": "9a3d6f34",
"metadata": {},
"source": [
"# NomicEmbeddings\n",
"\n",
"This notebook covers how to get started with Nomic embedding models.\n",
"This guide will help you get started with Nomic embedding models using LangChain. For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n",
"\n",
"## Installation"
"## Overview\n",
"### Integration details\n",
"\n",
"import { ItemTable } from \"@theme/FeatureTables\";\n",
"\n",
"<ItemTable category=\"text_embedding\" item=\"Nomic\" />\n",
"\n",
"## Setup\n",
"\n",
"To access Nomic embedding models you'll need to create a Nomic account, get an API key, and install the `langchain-nomic` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [https://atlas.nomic.ai/](https://atlas.nomic.ai/) to sign up for Nomic and generate an API key. Once you've done this, set the `NOMIC_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c3bef91",
"execution_count": 2,
"id": "36521c2a",
"metadata": {},
"outputs": [],
"source": [
"# install package\n",
"!pip install -U langchain-nomic"
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"NOMIC_API_KEY\"):\n",
" os.environ[\"NOMIC_API_KEY\"] = getpass.getpass(\"Enter your Nomic API key: \")"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"id": "c84fb993",
"metadata": {},
"source": [
"## Environment Setup\n",
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "39a4953b",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
"cell_type": "markdown",
"id": "d9664366",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Make sure to set the following environment variables:\n",
"The LangChain Nomic integration lives in the `langchain-nomic` package:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "64853226",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain-nomic"
]
},
{
"cell_type": "markdown",
"id": "45dd1724",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"- `NOMIC_API_KEY`\n",
"Now we can instantiate our model object and generate embeddings:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9ea7a09b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_nomic import NomicEmbeddings\n",
"\n",
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62e0dbc3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_nomic.embeddings import NomicEmbeddings\n",
"\n",
"embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12fcfb4b",
"metadata": {},
"outputs": [],
"source": [
"embeddings.embed_query(\"My query to look up\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f2e6104",
"metadata": {},
"outputs": [],
"source": [
"embeddings.embed_documents(\n",
" [\"This is a content of the document\", \"This is another document\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46739f68",
"metadata": {},
"outputs": [],
"source": [
"# async embed query\n",
"await embeddings.aembed_query(\"My query to look up\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e48632ea",
"metadata": {},
"outputs": [],
"source": [
"# async embed documents\n",
"await embeddings.aembed_documents(\n",
" [\"This is a content of the document\", \"This is another document\"]\n",
"embeddings = NomicEmbeddings(\n",
" model=\"nomic-embed-text-v1.5\",\n",
" # dimensionality=256,\n",
" # Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka)\n",
" # to enable variable-length embeddings with a single model.\n",
" # This means that you can specify the dimensionality of the embeddings at inference time.\n",
" # The model supports dimensionality from 64 to 768.\n",
" # inference_mode=\"remote\",\n",
" # One of `remote`, `local` (Embed4All), or `dynamic` (automatic). Defaults to `remote`.\n",
" # api_key=... , # if using remote inference,\n",
" # device=\"cpu\",\n",
" # The device to use for local embeddings. Choices include\n",
" # `cpu`, `gpu`, `nvidia`, `amd`, or a specific device name. See\n",
" # the docstring for `GPT4All.__init__` for more info. Typically\n",
" # defaults to CPU. Do not use on macOS.\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7a331dc3",
"id": "77d271b6",
"metadata": {},
"source": [
"### Custom Dimensionality\n",
"## Indexing and Retrieval\n",
"\n",
"Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka) to enable variable-length embeddings with a single model. This means that you can specify the dimensionality of the embeddings at inference time. The model supports dimensionality from 64 to 768."
"Embedding models are often used in retrieval-augmented generation (RAG) flows, both for indexing data and for retrieving it later. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
"\n",
"Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "993f65c8",
"execution_count": 5,
"id": "d817716b",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"'LangChain is the framework for building context-aware reasoning applications'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\", dimensionality=256)\n",
"# Create a vector store with a sample text\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"\n",
"embeddings.embed_query(\"My query to look up\")"
"text = \"LangChain is the framework for building context-aware reasoning applications\"\n",
"\n",
"vectorstore = InMemoryVectorStore.from_texts(\n",
" [text],\n",
" embedding=embeddings,\n",
")\n",
"\n",
"# Use the vectorstore as a retriever\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"# Retrieve the most similar text\n",
"retrieved_documents = retriever.invoke(\"What is LangChain?\")\n",
"\n",
"# show the retrieved document's content\n",
"retrieved_documents[0].page_content"
]
},
{
"cell_type": "markdown",
"id": "e02b9855",
"metadata": {},
"source": [
"## Direct Usage\n",
"\n",
"Under the hood, the vectorstore and retriever implementations are calling `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.\n",
"\n",
"You can directly call these methods to get embeddings for your own use cases.\n",
"\n",
"### Embed single texts\n",
"\n",
"You can embed single texts or documents with `embed_query`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0d2befcd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.024642944, 0.029083252, -0.14013672, -0.09082031, 0.058898926, -0.07489014, -0.0138168335, 0.0037\n"
]
}
],
"source": [
"single_vector = embeddings.embed_query(text)\n",
"print(str(single_vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "1b5a7d03",
"metadata": {},
"source": [
"### Embed multiple texts\n",
"\n",
"You can embed multiple texts with `embed_documents`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "2f4d6e97",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.012771606, 0.023727417, -0.12365723, -0.083740234, 0.06530762, -0.07110596, -0.021896362, -0.0068\n",
"[-0.019058228, 0.04058838, -0.15222168, -0.06842041, -0.012130737, -0.07128906, -0.04534912, 0.00522\n"
]
}
],
"source": [
"text2 = (\n",
" \"LangGraph is a library for building stateful, multi-actor applications with LLMs\"\n",
")\n",
"two_vectors = embeddings.embed_documents([text, text2])\n",
"for vector in two_vectors:\n",
" print(str(vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "98785c12",
"metadata": {},
"source": [
"## API Reference\n",
"\n",
"For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n"
]
}
],
@ -146,7 +279,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.9.6"
}
},
"nbformat": 4,


@ -340,6 +340,12 @@ const FEATURE_TABLES = {
package: "langchain-cohere",
apiLink: "https://api.python.langchain.com/en/latest/embeddings/langchain_cohere.embeddings.CohereEmbeddings.html#langchain_cohere.embeddings.CohereEmbeddings"
},
{
name: "Nomic",
link: "nomic",
package: "langchain-nomic",
apiLink: "https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html#langchain_nomic.embeddings.NomicEmbeddings"
},
]
},
document_retrievers: {