docs[patch]: Update integration docs for AzureOpenAIEmbeddings (#25311)

https://github.com/langchain-ai/langchain/issues/24856

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Eugene Yurtsev 2024-08-13 20:33:13 -04:00 committed by GitHub
parent b4e3bdb714
commit 27def6bddb


@@ -2,195 +2,261 @@
"cells": [
{
"cell_type": "raw",
"id": "0aed0743",
"id": "afaf8039",
"metadata": {},
"source": [
"---\n",
"keywords: [AzureOpenAIEmbeddings]\n",
"sidebar_label: AzureOpenAI\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "c3852491",
"id": "9a3d6f34",
"metadata": {},
"source": [
"# Azure OpenAI\n",
"# AzureOpenAIEmbeddings\n",
"\n",
"Let's load the Azure OpenAI Embedding class with environment variables set to indicate to use Azure endpoints."
"This will help you get started with AzureOpenAI embedding models using LangChain. For detailed documentation on `AzureOpenAIEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.azure.AzureOpenAIEmbeddings.html).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"import { ItemTable } from \"@theme/FeatureTables\";\n",
"\n",
"<ItemTable category=\"text_embedding\" item=\"AzureOpenAI\" />\n",
"\n",
"## Setup\n",
"\n",
"To access AzureOpenAI embedding models you'll need to create an Azure account, get an API key, and install the `langchain-openai` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Youll need to have an Azure OpenAI instance deployed. You can deploy a version on Azure Portal following this [guide](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal).\n",
"\n",
"Once you have your instance running, make sure you have the name of your instance and key. You can find the key in the Azure Portal, under the “Keys and Endpoint” section of your instance.\n",
"\n",
"```bash\n",
"AZURE_OPENAI_ENDPOINT=<YOUR API ENDPOINT>\n",
"AZURE_OPENAI_API_KEY=<YOUR_KEY>\n",
"AZURE_OPENAI_API_VERSION=\"2024-02-01\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "36521c2a",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your AzureOpenAI API key: \")"
]
},
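{
"cell_type": "markdown",
"id": "7c1f4b2e",
"metadata": {},
"source": [
"If you prefer to set the endpoint and API version from Python as well, you can export the remaining environment variables the same way. The values below are placeholders for your own instance:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d8e9f0a",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Placeholder values; replace with your instance's endpoint and preferred API version\n",
"os.environ[\"AZURE_OPENAI_ENDPOINT\"] = \"https://<your-endpoint>.openai.azure.com/\"\n",
"os.environ[\"AZURE_OPENAI_API_VERSION\"] = \"2024-02-01\""
]
},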
{
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "39a4953b",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
"cell_type": "markdown",
"id": "d9664366",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"The LangChain AzureOpenAI integration lives in the `langchain-openai` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "228faf0c",
"id": "64853226",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-openai"
"%pip install -qU langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8a6ed30d-806f-4800-b5fd-d04126be9060",
"cell_type": "markdown",
"id": "45dd1724",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"## Instantiation\n",
"\n",
"os.environ[\"AZURE_OPENAI_API_KEY\"] = \"...\"\n",
"os.environ[\"AZURE_OPENAI_ENDPOINT\"] = \"https://<your-endpoint>.openai.azure.com/\""
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "20179bc7-3f71-4909-be12-d38bce009b18",
"execution_count": 11,
"id": "9ea7a09b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import AzureOpenAIEmbeddings\n",
"\n",
"embeddings = AzureOpenAIEmbeddings(\n",
" azure_deployment=\"<your-embeddings-deployment-name>\",\n",
" openai_api_version=\"2023-05-15\",\n",
" model=\"text-embedding-3-large\",\n",
" # dimensions: Optional[int] = None, # Can specify dimensions with new text-embedding-3 models\n",
" # azure_endpoint=\"https://<your-endpoint>.openai.azure.com/\", If not provided, will read env variable AZURE_OPENAI_ENDPOINT\n",
" # api_key=... # Can provide an API key directly. If missing read env variable AZURE_OPENAI_API_KEY\n",
" # openai_api_version=..., # If not provided, will read env variable AZURE_OPENAI_API_VERSION\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f8cb9dca-738b-450f-9986-5c3efd3c6eb3",
"cell_type": "markdown",
"id": "77d271b6",
"metadata": {},
"outputs": [],
"source": [
"text = \"this is a test document\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0fae0295-b117-4a5a-8b98-500c79306551",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
"## Indexing and Retrieval\n",
"\n",
"Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
"\n",
"Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "65a01ddd-0bbf-444f-a87f-93af25ef902c",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "45771052-68ca-4e03-9c4f-a0c7796d9442",
"id": "d817716b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-0.012222584727053133,\n",
" 0.0072103982392216145,\n",
" -0.014818063280923775,\n",
" -0.026444746872933557,\n",
" -0.0034330499700826883]"
"'LangChain is the framework for building context-aware reasoning applications'"
]
},
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"doc_result[0][:5]"
"# Create a vector store with a sample text\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"\n",
"text = \"LangChain is the framework for building context-aware reasoning applications\"\n",
"\n",
"vectorstore = InMemoryVectorStore.from_texts(\n",
" [text],\n",
" embedding=embeddings,\n",
")\n",
"\n",
"# Use the vectorstore as a retriever\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"# Retrieve the most similar text\n",
"retrieved_documents = retriever.invoke(\"What is LangChain?\")\n",
"\n",
"# show the retrieved document's content\n",
"retrieved_documents[0].page_content"
]
},
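{
"cell_type": "markdown",
"id": "3b9e6c71",
"metadata": {},
"source": [
"If you don't need a retriever object, you can also query the vector store directly. Here is a minimal sketch using `similarity_search` on the same `vectorstore`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d2a5f13",
"metadata": {},
"outputs": [],
"source": [
"# Query the vector store directly instead of going through a retriever\n",
"docs = vectorstore.similarity_search(\"What is LangChain?\", k=1)\n",
"docs[0].page_content"
]
},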
{
"cell_type": "markdown",
"id": "e66ec1f2-6768-4ee5-84bf-a2d76adc20c8",
"id": "e02b9855",
"metadata": {},
"source": [
"## [Legacy] When using `openai<1`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b40f827",
"metadata": {},
"outputs": [],
"source": [
"# set the environment variables needed for openai package to know to reach out to azure\n",
"import os\n",
"## Direct Usage\n",
"\n",
"os.environ[\"OPENAI_API_TYPE\"] = \"azure\"\n",
"os.environ[\"OPENAI_API_BASE\"] = \"https://<your-endpoint.openai.azure.com/\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"your AzureOpenAI key\"\n",
"os.environ[\"OPENAI_API_VERSION\"] = \"2023-05-15\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb36d16c",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import OpenAIEmbeddings\n",
"Under the hood, the vectorstore and retriever implementations are calling `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.\n",
"\n",
"embeddings = OpenAIEmbeddings(deployment=\"your-embeddings-deployment-name\")"
"You can directly call these methods to get embeddings for your own use cases.\n",
"\n",
"### Embed single texts\n",
"\n",
"You can embed single texts or documents with `embed_query`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "228abcbb",
"execution_count": 6,
"id": "0d2befcd",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.0011676070280373096, 0.007125577889382839, -0.014674457721412182, -0.034061674028635025, 0.01128\n"
]
}
],
"source": [
"text = \"This is a test document.\""
"single_vector = embeddings.embed_query(text)\n",
"print(str(single_vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60dd7fad",
"cell_type": "markdown",
"id": "1b5a7d03",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
"### Embed multiple texts\n",
"\n",
"You can embed multiple texts with `embed_documents`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83bc1a72",
"execution_count": 7,
"id": "2f4d6e97",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.0011966148158535361, 0.007160289213061333, -0.014659193344414234, -0.03403077274560928, 0.011280\n",
"[-0.005595256108790636, 0.016757294535636902, -0.011055258102715015, -0.031094247475266457, -0.00363\n"
]
}
],
"source": [
"doc_result = embeddings.embed_documents([text])"
"text2 = (\n",
" \"LangGraph is a library for building stateful, multi-actor applications with LLMs\"\n",
")\n",
"two_vectors = embeddings.embed_documents([text, text2])\n",
"for vector in two_vectors:\n",
" print(str(vector)[:100]) # Show the first 100 characters of the vector"
]
},
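{
"cell_type": "markdown",
"id": "c4e7a2d9",
"metadata": {},
"source": [
"To work with the raw vectors yourself, you can, for example, compare the two document embeddings with cosine similarity. This is a minimal sketch and assumes `numpy` is installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0b3d6e8",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Cosine similarity between the two document vectors from the cell above\n",
"v1, v2 = (np.array(v) for v in two_vectors)\n",
"cosine = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))\n",
"print(f\"Cosine similarity: {cosine:.3f}\")"
]
},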
{
"cell_type": "code",
"execution_count": null,
"id": "aaad49f8",
"cell_type": "markdown",
"id": "98785c12",
"metadata": {},
"outputs": [],
"source": []
"source": [
"## API Reference\n",
"\n",
"For detailed documentation on `AzureOpenAIEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.azure.AzureOpenAIEmbeddings.html).\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -204,7 +270,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.9.6"
}
},
"nbformat": 4,