docs: together ai embeddings integration docs (#25252)

Update together AI embedding integration docs

Related issue: https://github.com/langchain-ai/langchain/issues/24856

```json
[
   {
      "provider": "together",
      "js":  true,
      "local": false,
     "serializable": false,
   }
]
```

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
This commit is contained in:
Eugene Yurtsev 2024-08-13 20:24:02 -04:00 committed by GitHub
parent 8645a49f31
commit 0f6217f507
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -12,101 +12,244 @@
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"id": "9a3d6f34",
"metadata": {},
"source": [
"# TogetherEmbeddings\n",
"\n",
"This notebook covers how to get started with open source embedding models hosted in the Together AI API.\n",
"This will help you get started with Together embedding models using LangChain. For detailed documentation on `TogetherEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_together.embeddings.TogetherEmbeddings.html).\n",
"\n",
"## Installation"
"## Overview\n",
"### Integration details\n",
"\n",
"import { ItemTable } from \"@theme/FeatureTables\";\n",
"\n",
"<ItemTable category=\"text_embedding\" item=\"Together\" />\n",
"\n",
"## Setup\n",
"\n",
"To access Together embedding models you'll need to create a/an Together account, get an API key, and install the `langchain-together` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [https://api.together.xyz/](https://api.together.xyz/) to sign up to Together and generate an API key. Once you've done this set the TOGETHER_API_KEY environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c3bef91",
"execution_count": 1,
"id": "36521c2a",
"metadata": {},
"outputs": [],
"source": [
"# install package\n",
"%pip install --upgrade --quiet langchain-together"
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"TOGETHER_API_KEY\"):\n",
" os.environ[\"TOGETHER_API_KEY\"] = getpass.getpass(\"Enter your Together API key: \")"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"id": "c84fb993",
"metadata": {},
"source": [
"## Environment Setup\n",
"\n",
"Make sure to set the following environment variables:\n",
"\n",
"- `TOGETHER_API_KEY`\n",
"\n",
"## Usage\n",
"\n",
"First, select a supported model from [this list](https://docs.together.ai/docs/embedding-models). In the following example, we will use `togethercomputer/m2-bert-80M-8k-retrieval`."
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62e0dbc3",
"metadata": {
"tags": []
},
"execution_count": 2,
"id": "39a4953b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_together.embeddings import TogetherEmbeddings\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
"cell_type": "markdown",
"id": "d9664366",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"embeddings = TogetherEmbeddings(model=\"togethercomputer/m2-bert-80M-8k-retrieval\")"
"The LangChain Together integration lives in the `langchain-together` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12fcfb4b",
"execution_count": 3,
"id": "64853226",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"embeddings.embed_query(\"My query to look up\")"
"%pip install -qU langchain-together"
]
},
{
"cell_type": "markdown",
"id": "45dd1724",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f2e6104",
"execution_count": 5,
"id": "9ea7a09b",
"metadata": {},
"outputs": [],
"source": [
"embeddings.embed_documents(\n",
" [\"This is a content of the document\", \"This is another document\"]\n",
"from langchain_together import TogetherEmbeddings\n",
"\n",
"embeddings = TogetherEmbeddings(\n",
" model=\"togethercomputer/m2-bert-80M-8k-retrieval\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46739f68",
"cell_type": "markdown",
"id": "77d271b6",
"metadata": {},
"outputs": [],
"source": [
"# async embed query\n",
"await embeddings.aembed_query(\"My query to look up\")"
"## Indexing and Retrieval\n",
"\n",
"Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
"\n",
"Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e48632ea",
"execution_count": 6,
"id": "d817716b",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"'LangChain is the framework for building context-aware reasoning applications'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# async embed documents\n",
"await embeddings.aembed_documents(\n",
" [\"This is a content of the document\", \"This is another document\"]\n",
")"
"# Create a vector store with a sample text\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"\n",
"text = \"LangChain is the framework for building context-aware reasoning applications\"\n",
"\n",
"vectorstore = InMemoryVectorStore.from_texts(\n",
" [text],\n",
" embedding=embeddings,\n",
")\n",
"\n",
"# Use the vectorstore as a retriever\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"# Retrieve the most similar text\n",
"retrieved_documents = retriever.invoke(\"What is LangChain?\")\n",
"\n",
"# show the retrieved document's content\n",
"retrieved_documents[0].page_content"
]
},
{
"cell_type": "markdown",
"id": "e02b9855",
"metadata": {},
"source": [
"## Direct Usage\n",
"\n",
"Under the hood, the vectorstore and retriever implementations are calling `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.\n",
"\n",
"You can directly call these methods to get embeddings for your own use cases.\n",
"\n",
"### Embed single texts\n",
"\n",
"You can embed single texts or documents with `embed_query`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0d2befcd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.3812227, -0.052848946, -0.10564975, 0.03480297, 0.2878488, 0.0084609175, 0.11605915, 0.05303011, \n"
]
}
],
"source": [
"single_vector = embeddings.embed_query(text)\n",
"print(str(single_vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "1b5a7d03",
"metadata": {},
"source": [
"### Embed multiple texts\n",
"\n",
"You can embed multiple texts with `embed_documents`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2f4d6e97",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.3812227, -0.052848946, -0.10564975, 0.03480297, 0.2878488, 0.0084609175, 0.11605915, 0.05303011, \n",
"[0.066308185, -0.032866564, 0.115751594, 0.19082588, 0.14017, -0.26976448, -0.056340694, -0.26923394\n"
]
}
],
"source": [
"text2 = (\n",
" \"LangGraph is a library for building stateful, multi-actor applications with LLMs\"\n",
")\n",
"two_vectors = embeddings.embed_documents([text, text2])\n",
"for vector in two_vectors:\n",
" print(str(vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "98785c12",
"metadata": {},
"source": [
"## API Reference\n",
"\n",
"For detailed documentation on `TogetherEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_together.embeddings.TogetherEmbeddings.html).\n"
]
}
],
@ -126,7 +269,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.11.4"
}
},
"nbformat": 4,