docs: updating NIM documentation (#22258)

Updating NVIDIA NIM notebooks and readme file.

Thanks!
Daniel

@@ -7,18 +7,24 @@
"id": "cc6caafa"
},
"source": [
"# NVIDIA AI Foundation Endpoints\n",
"# NVIDIA NIMs\n",
"\n",
"The `ChatNVIDIA` class is a LangChain chat model that connects to [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/).\n",
"The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on \n",
"NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models \n",
"from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA \n",
"accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt containers that deploy anywhere using a single \n",
"command on NVIDIA accelerated infrastructure.\n",
"\n",
"NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, \n",
"NIMs can be exported from NVIDIAs API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, \n",
"giving enterprises ownership and full control of their IP and AI application.\n",
"\n",
"> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA API catalog](https://build.nvidia.com/), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.\n",
"> \n",
"> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).\n",
"> \n",
"> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below.\n",
"NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. \n",
"At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.\n",
"\n",
"This example goes over how to use LangChain to interact with and develop LLM-powered systems using the publicly-accessible AI Foundation endpoints."
"This example goes over how to use LangChain to interact with NVIDIA supported via the `ChatNVIDIA` class.\n",
"\n",
"For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation."
]
},
{
@@ -50,9 +56,9 @@
"\n",
"**To get started:**\n",
"\n",
"1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models\n",
"1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n",
"\n",
"2. Click on your model of choice\n",
"2. Click on your model of choice.\n",
"\n",
"3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.\n",
"\n",
@@ -69,12 +75,23 @@
"import getpass\n",
"import os\n",
"\n",
"if not os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n",
" nvapi_key = getpass.getpass(\"Enter your NVIDIA API key: \")\n",
"# del os.environ['NVIDIA_API_KEY'] ## delete key and reset\n",
"if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n",
" print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n",
"else:\n",
" nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n",
" assert nvapi_key.startswith(\"nvapi-\"), f\"{nvapi_key[:5]}... is not a valid key\"\n",
" os.environ[\"NVIDIA_API_KEY\"] = nvapi_key"
]
},
{
"cell_type": "markdown",
"id": "af0ce26b",
"metadata": {},
"source": [
"## Working with NVIDIA API Catalog"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -96,6 +113,30 @@
"print(result.content)"
]
},
{
"cell_type": "markdown",
"id": "9d35686b",
"metadata": {},
"source": [
"## Working with NVIDIA NIMs\n",
"When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n",
"\n",
"[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49838930",
"metadata": {},
"outputs": [],
"source": [
"from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
"\n",
"# connect to an embedding NIM running at localhost:8000, specifying a specific model\n",
"llm = ChatNVIDIA(base_url=\"http://localhost:8000/v1\", model=\"meta-llama3-8b-instruct\")"
]
},
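{
"cell_type": "markdown",
"id": "1a2b3c4d",
"metadata": {},
"source": [
"Once connected, the self-hosted model is used exactly like a hosted one. Below is a minimal sketch, assuming the NIM above is running at `localhost:8000` and serving `meta-llama3-8b-instruct`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d3c2b1a",
"metadata": {},
"outputs": [],
"source": [
"# assumes the self-hosted `llm` defined above is reachable\n",
"result = llm.invoke(\"Briefly, what is a NIM?\")\n",
"print(result.content)"
]
},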
{
"cell_type": "markdown",
"id": "71d37987-d568-4a73-9d2a-8bd86323f8bf",
@@ -252,81 +293,6 @@
" print(txt, end=\"\")"
]
},
{
"cell_type": "markdown",
"id": "642a618a-faa3-443e-99c3-67b8142f3c51",
"metadata": {},
"source": [
"## Steering LLMs\n",
"\n",
"> [SteerLM-optimized models](https://developer.nvidia.com/blog/announcing-steerlm-a-simple-and-practical-technique-to-customize-llms-during-inference/) supports \"dynamic steering\" of model outputs at inference time.\n",
"\n",
"This lets you \"control\" the complexity, verbosity, and creativity of the model via integer labels on a scale from 0 to 9. Under the hood, these are passed as a special type of assistant message to the model.\n",
"\n",
"The \"steer\" models support this type of input, such as `nemotron_steerlm_8b`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36a96b1a-e3e7-4ae3-b4b0-9331b5eca04f",
"metadata": {},
"outputs": [],
"source": [
"from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
"\n",
"llm = ChatNVIDIA(model=\"nemotron_steerlm_8b\")\n",
"# Try making it uncreative and not verbose\n",
"complex_result = llm.invoke(\n",
" \"What's a PB&J?\", labels={\"creativity\": 0, \"complexity\": 3, \"verbosity\": 0}\n",
")\n",
"print(\"Un-creative\\n\")\n",
"print(complex_result.content)\n",
"\n",
"# Try making it very creative and verbose\n",
"print(\"\\n\\nCreative\\n\")\n",
"creative_result = llm.invoke(\n",
" \"What's a PB&J?\", labels={\"creativity\": 9, \"complexity\": 3, \"verbosity\": 9}\n",
")\n",
"print(creative_result.content)"
]
},
{
"cell_type": "markdown",
"id": "75849e7a-2adf-4038-8d9d-8a9e12417789",
"metadata": {},
"source": [
"#### Use within LCEL\n",
"\n",
"The labels are passed as invocation params. You can `bind` these to the LLM using the `bind` method on the LLM to include it within a declarative, functional chain. Below is an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae1105c3-2a0c-4db3-916e-24d5e427bd01",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", \"You are a helpful AI assistant named Fred.\"), (\"user\", \"{input}\")]\n",
")\n",
"chain = (\n",
" prompt\n",
" | ChatNVIDIA(model=\"nemotron_steerlm_8b\").bind(\n",
" labels={\"creativity\": 9, \"complexity\": 0, \"verbosity\": 9}\n",
" )\n",
" | StrOutputParser()\n",
")\n",
"\n",
"for txt in chain.stream({\"input\": \"Why is a PB&J?\"}):\n",
" print(txt, end=\"\")"
]
},
{
"cell_type": "markdown",
"id": "7f465ff6-5922-41d8-8abb-1d1e4095cc27",
@@ -334,7 +300,7 @@
"source": [
"## Multimodal\n",
"\n",
"NVIDIA also supports multimodal inputs, meaning you can provide both images and text for the model to reason over. An example model supporting multimodal inputs is `playground_neva_22b`.\n",
"NVIDIA also supports multimodal inputs, meaning you can provide both images and text for the model to reason over. An example model supporting multimodal inputs is `nvidia/neva-22b`.\n",
"\n",
"\n",
"These models accept LangChain's standard image formats, and accept `labels`, similar to the Steering LLMs above. In addition to `creativity`, `complexity`, and `verbosity`, these models support a `quality` toggle.\n",
@@ -367,7 +333,7 @@
"source": [
"from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
"\n",
"llm = ChatNVIDIA(model=\"playground_neva_22b\")"
"llm = ChatNVIDIA(model=\"nvidia/neva-22b\")"
]
},
{
@@ -500,7 +466,7 @@
"source": [
"from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
"\n",
"kosmos = ChatNVIDIA(model=\"kosmos_2\")\n",
"kosmos = ChatNVIDIA(model=\"microsoft/kosmos-2\")\n",
"\n",
"from langchain_core.messages import HumanMessage\n",
"\n",
@@ -544,7 +510,7 @@
"\n",
"\n",
"## Override the payload passthrough. Default is to pass through the payload as is.\n",
"kosmos = ChatNVIDIA(model=\"kosmos_2\")\n",
"kosmos = ChatNVIDIA(model=\"microsoft/kosmos-2\")\n",
"kosmos.client.payload_fn = drop_streaming_key\n",
"\n",
"kosmos.invoke(\n",
@@ -567,43 +533,6 @@
"For more advanced or custom use-cases (i.e. supporting the diffusion models), you may be interested in leveraging the `NVEModel` client as a requests backbone. The `NVIDIAEmbeddings` class is a good source of inspiration for this. "
]
},
{
"cell_type": "markdown",
"id": "1cd6249a-7ffa-4886-b7e8-5778dc93499e",
"metadata": {},
"source": [
"## RAG: Context models\n",
"\n",
"NVIDIA also has Q&A models that support a special \"context\" chat message containing retrieved context (such as documents within a RAG chain). This is useful to avoid prompt-injecting the model. The `_qa_` models like `nemotron_qa_8b` support this.\n",
"\n",
"**Note:** Only \"user\" (human) and \"context\" chat messages are supported for these models; System or AI messages that would useful in conversational flows are not supported."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f994b4d3-c1b0-4e87-aad0-a7b487e2aa43",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import ChatMessage\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" ChatMessage(\n",
" role=\"context\", content=\"Parrots and Cats have signed the peace accord.\"\n",
" ),\n",
" (\"user\", \"{input}\"),\n",
" ]\n",
")\n",
"llm = ChatNVIDIA(model=\"nemotron_qa_8b\")\n",
"chain = prompt | llm | StrOutputParser()\n",
"chain.invoke({\"input\": \"What was signed?\"})"
]
},
{
"cell_type": "markdown",
"id": "137662a6",
@@ -708,14 +637,6 @@
"source": [
"conversation.invoke(\"Tell me about yourself.\")[\"response\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a719bd3-755d-4a05-bda2-de132bf99314",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -723,9 +644,9 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python (venvoss)",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "venvoss"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -737,7 +658,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
"version": "3.10.13"
}
},
"nbformat": 4,

@@ -1,63 +1,82 @@
# NVIDIA
The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for building applications with models on
NVIDIA NIM inference microservices. NIM supports models across domains like chat, embedding, and re-ranking
from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA
accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt container that deploys anywhere using a single
command on NVIDIA accelerated infrastructure.
>NVIDIA provides an integration package for LangChain: `langchain-nvidia-ai-endpoints`.
NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing,
NIMs can be exported from NVIDIA's API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud,
giving enterprises ownership and full control of their IP and AI applications.
## NVIDIA AI Foundation Endpoints
NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog.
At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.
> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for
> NVIDIA AI Foundation Models like `Mixtral 8x7B`, `Llama 2`, `Stable Diffusion`, etc. These models,
> hosted on the [NVIDIA API catalog](https://build.nvidia.com/), are optimized, tested, and hosted on
> the NVIDIA AI platform, making them fast and easy to evaluate, further customize,
> and seamlessly run at peak performance on any accelerated stack.
>
> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully
> accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these
> models can be deployed anywhere with enterprise-grade security, stability,
> and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).
Below is an example of how to use some common functionality surrounding text-generative and embedding models.
A selection of NVIDIA AI Foundation models is supported directly in LangChain with familiar APIs.
## Installation
The supported models can be found [in build.nvidia.com](https://build.nvidia.com/).
These models can be accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/)
package, as shown below.
```bash
pip install -U --quiet langchain-nvidia-ai-endpoints
```
### Setting up
## Setup
1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models
**To get started:**
2. Click on your model of choice
1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.
3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.
2. Click on your model of choice.
4. Copy and save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.
3. Under Input select the Python tab, and click `Get API Key`. Then click `Generate Key`.
```bash
export NVIDIA_API_KEY=nvapi-XXXXXXXXXXXXXXXXXXXXXXXXXX
```
4. Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints.
- Install a package:
```python
import getpass
import os
```bash
pip install -U langchain-nvidia-ai-endpoints
if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key
```
### Chat models
See a [usage example](/docs/integrations/chat/nvidia_ai_endpoints).
## Working with NVIDIA API Catalog
```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA
llm = ChatNVIDIA(model="mixtral_8x7b")
llm = ChatNVIDIA(model="mistralai/mixtral-8x22b-instruct-v0.1")
result = llm.invoke("Write a ballad about LangChain.")
print(result.content)
```
### Embedding models
Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM, which is part of NVIDIA AI Enterprise, as shown in the next section, [Working with NVIDIA NIMs](#working-with-nvidia-nims).
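
As a minimal sketch (assuming the `llm` client created above), the same class also supports streaming tokens as they are generated:

```python
# assumes the `llm` from the previous example
for chunk in llm.stream("Write a haiku about GPUs."):
    print(chunk.content, end="")
```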
## Working with NVIDIA NIMs
When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.
See a [usage example](/docs/integrations/text_embedding/nvidia_ai_endpoints).
[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)
```python
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank
# connect to a chat NIM running at localhost:8000, specifying a model
llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="meta-llama3-8b-instruct")
# connect to an embedding NIM running at localhost:8080
embedder = NVIDIAEmbeddings(base_url="http://localhost:8080/v1")
# connect to a reranking NIM running at localhost:2016
ranker = NVIDIARerank(base_url="http://localhost:2016/v1")
```
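
Once running, these clients behave like any other LangChain component. A minimal sketch, assuming the NIMs above are reachable at the given addresses:

```python
# assumes the `llm` and `embedder` clients from the block above
print(llm.invoke("Briefly, what is a NIM?").content)
print(embedder.embed_query("What is a NIM?")[:3])  # first few embedding values
```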
## Using NVIDIA AI Foundation Endpoints
A selection of NVIDIA AI Foundation models is supported directly in LangChain with familiar APIs.
The active models can be found in the [API Catalog](https://build.nvidia.com/).
**The following may be useful examples to help you get started:**
- **[`ChatNVIDIA` Model](/docs/integrations/chat/nvidia_ai_endpoints).**
- **[`NVIDIAEmbeddings` Model for RAG Workflows](/docs/integrations/text_embedding/nvidia_ai_endpoints).**

@@ -6,17 +6,24 @@
"id": "GDDVue_1cq6d"
},
"source": [
"# NVIDIA AI Foundation Endpoints \n",
"# NVIDIA NIMs \n",
"\n",
"> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA API catalog](https://build.nvidia.com/), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.\n",
"> \n",
"> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).\n",
"> \n",
"> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below.\n",
"The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on \n",
"NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models \n",
"from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA \n",
"accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt containers that deploy anywhere using a single \n",
"command on NVIDIA accelerated infrastructure.\n",
"\n",
"NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, \n",
"NIMs can be exported from NVIDIAs API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, \n",
"giving enterprises ownership and full control of their IP and AI application.\n",
"\n",
"NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. \n",
"At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.\n",
"\n",
"This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n",
"\n",
"For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation."
"For more information on accessing the chat models through this API, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation."
]
},
{
@@ -45,9 +52,9 @@
"\n",
"**To get started:**\n",
"\n",
"1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models\n",
"1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n",
"\n",
"2. Select the `Retrieval` tab, then select your model of choice\n",
"2. Select the `Retrieval` tab, then select your model of choice.\n",
"\n",
"3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.\n",
"\n",
@@ -84,16 +91,16 @@
"id": "l185et2kc8pS"
},
"source": [
"We should be able to see an embedding model among that list which can be used in conjunction with an LLM for effective RAG solutions. We can interface with this model pretty easily with the help of the `NVIDIAEmbeddings` model."
"We should be able to see an embedding model among that list which can be used in conjunction with an LLM for effective RAG solutions. We can interface with this model as well as other embedding models supported by NIM through the `NVIDIAEmbeddings` class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"## Working with NIMs on the NVIDIA API Catalog\n",
"\n",
"When initializing an embedding model you can select a model by passing it, e.g. `ai-embed-qa-4` below, or use the default by not passing any arguments."
"When initializing an embedding model you can select a model by passing it, e.g. `NV-Embed-QA` below, or use the default by not passing any arguments."
]
},
{
@@ -106,7 +113,7 @@
"source": [
"from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
"\n",
"embedder = NVIDIAEmbeddings(model=\"ai-embed-qa-4\")"
"embedder = NVIDIAEmbeddings(model=\"NV-Embed-QA\")"
]
},
{
@@ -121,7 +128,29 @@
"\n",
"- `embed_documents`: Generate passage embeddings for a list of documents which you would like to search over.\n",
"\n",
"- `aembed_quey`/`embed_documents`: Asynchronous versions of the above."
"- `aembed_query`/`aembed_documents`: Asynchronous versions of the above."
]
},
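{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch, assuming the `embedder` initialized above, the query and document methods can be exercised directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# assumes `embedder` from the initialization cell above\n",
"q_emb = embedder.embed_query(\"What is a NIM?\")\n",
"d_embs = embedder.embed_documents([\"NIMs are prebuilt, optimized inference containers.\"])\n",
"print(len(q_emb), len(d_embs[0]))  # embedding dimensionality"
]
},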
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with self-hosted NVIDIA NIMs\n",
"When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n",
"\n",
"[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
"\n",
"# connect to an embedding NIM running at localhost:8080\n",
"embedder = NVIDIAEmbeddings(base_url=\"http://localhost:8080/v1\")"
]
},
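{
"cell_type": "markdown",
"metadata": {},
"source": [
"The self-hosted client is a drop-in replacement. A minimal sketch, assuming the embedding NIM above is running at `localhost:8080`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# assumes the self-hosted `embedder` from the cell above is reachable\n",
"print(embedder.embed_query(\"What is a NIM?\")[:3])"
]
},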
{
@@ -382,7 +411,7 @@
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain faiss-cpu tiktoken\n",
"%pip install --upgrade --quiet langchain faiss-cpu tiktoken langchain_community\n",
"\n",
"from operator import itemgetter\n",
"\n",
@@ -408,7 +437,7 @@
"source": [
"vectorstore = FAISS.from_texts(\n",
" [\"harrison worked at kensho\"],\n",
" embedding=NVIDIAEmbeddings(model=\"ai-embed-qa-4\"),\n",
" embedding=NVIDIAEmbeddings(model=\"NV-Embed-QA\"),\n",
")\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
@@ -478,9 +507,9 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python (venvoss)",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "venvoss"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -492,7 +521,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
"version": "3.10.13"
}
},
"nbformat": 4,
