Docs: Ollama (LLM, Chat Model & Text Embedding) (#22321)

- [x] Docs Update: Ollama
  - llm/ollama
      - Switched to using llama3 as the model, with reference to templating and prompting
      - Added concurrency notes to llm/ollama docs
  - chat_models/ollama
      - Added concurrency notes to chat_models/ollama docs
  - text_embedding/ollama
      - Included an example for specific embedding models from Ollama
KhoPhi 3 months ago committed by GitHub
parent 10b12e1c08
commit c64b0a3095

@ -54,12 +54,12 @@
"\n",
"Here are a few ways to interact with pulled local models\n",
"\n",
"#### directly in the terminal:\n",
"#### In the terminal:\n",
"\n",
"* All of your local models are automatically served on `localhost:11434`\n",
"* Run `ollama run <name-of-model>` to start interacting via the command line directly\n",
"\n",
"### via an API\n",
"#### Via an API\n",
"\n",
"Send an `application/json` request to the API endpoint of Ollama to interact.\n",
"\n",
@ -72,9 +72,11 @@
"\n",
"See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n",
"\n",
"#### via LangChain\n",
"#### Via LangChain\n",
"\n",
"See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application."
"See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application. \n",
"\n",
"View the [API Reference for ChatOllama](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.ollama.ChatOllama.html#langchain_community.chat_models.ollama.ChatOllama) for more."
]
},
{
@ -105,7 +107,7 @@
"\n",
"# using LangChain Expressive Language chain syntax\n",
"# learn more about the LCEL on\n",
"# /docs/expression_language/why\n",
"# /docs/concepts/#langchain-expression-language-lcel\n",
"chain = prompt | llm | StrOutputParser()\n",
"\n",
"# for brevity, response is printed in terminal\n",
@ -189,7 +191,7 @@
"\n",
"## Building from source\n",
"\n",
"For up to date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/jmorganca/ollama?tab=readme-ov-file#building)"
"For up-to-date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/ollama/ollama?tab=readme-ov-file#building)"
]
},
{
@ -333,7 +335,7 @@
}
],
"source": [
"pip install --upgrade --quiet pillow"
"!pip install --upgrade --quiet pillow"
]
},
{
@ -444,6 +446,24 @@
"\n",
"print(query_chain)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concurrency Features\n",
"\n",
"Ollama supports concurrent inference for a single model, as well as loading multiple models simultaneously (since [version 0.1.33](https://github.com/ollama/ollama/releases)).\n",
"\n",
"Start the Ollama server with:\n",
"\n",
"* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n",
"* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n",
"\n",
"Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n",
"\n",
"Learn more about configuring Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)."
]
}
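The parallel-request setting above can be exercised from client code with a thread pool. Below is a minimal sketch: the `invoke` function is a hypothetical stand-in for a real call such as `ChatOllama(model="llama3").invoke`, and it assumes a server started with `OLLAMA_NUM_PARALLEL=4` as in the example command.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real model call such as ChatOllama(model="llama3").invoke;
# replace the body with an actual invoke() against a running Ollama server.
def invoke(prompt: str) -> str:
    return f"response to: {prompt}"

prompts = [
    "Why is the sky blue?",
    "Tell me a joke.",
    "Name three colors.",
    "What is 2 + 2?",
]

# With OLLAMA_NUM_PARALLEL=4, up to four of these requests can be
# processed by the server at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(invoke, prompts))

print(len(results))  # → 4
```

Without `OLLAMA_NUM_PARALLEL`, the same client code still works, but the server queues the requests and answers them one at a time.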
],
"metadata": {

@ -12,16 +12,15 @@
"\n",
"It optimizes setup and configuration details, including GPU usage.\n",
"\n",
"For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n",
"For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/ollama/ollama#model-library).\n",
"\n",
"## Setup\n",
"\n",
"First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
"First, follow [these instructions](https://github.com/ollama/ollama) to set up and run a local Ollama instance:\n",
"\n",
"* [Download](https://ollama.ai/download) and install Ollama onto the available supported platforms (including Windows Subsystem for Linux)\n",
"* Fetch available LLM model via `ollama pull <name-of-model>`\n",
" * View a list of available models via the [model library](https://ollama.ai/library)\n",
" * e.g., `ollama pull llama3`\n",
" * View a list of available models via the [model library](https://ollama.ai/library) and pull one to use locally with a command such as `ollama pull llama3`\n",
"* This will download the default tagged version of the model. Typically, the default points to the latest model with the smallest parameter size.\n",
"\n",
"> On Mac, the models will be downloaded to `~/.ollama/models`\n",
@ -29,28 +28,29 @@
"> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n",
"\n",
"* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n",
"* To view all pulled models, use `ollama list`\n",
"* To view all pulled models on your local instance, use `ollama list`\n",
"* To chat directly with a model from the command line, use `ollama run <name-of-model>`\n",
"* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n",
"* View the [Ollama documentation](https://github.com/ollama/ollama) for more commands. \n",
"* Run `ollama help` in the terminal to see available commands too.\n",
"\n",
"## Usage\n",
"\n",
"You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html).\n",
"You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html).\n",
"\n",
"If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` interface.\n",
"If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` [interface](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/).\n",
"\n",
"This includes [special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) for system message and user input.\n",
"This includes [special tokens](https://ollama.com/library/llama3) for system message and user input.\n",
"\n",
"## Interacting with Models \n",
"\n",
"Here are a few ways to interact with pulled local models\n",
"\n",
"#### directly in the terminal:\n",
"#### In the terminal:\n",
"\n",
"* All of your local models are automatically served on `localhost:11434`\n",
"* Run `ollama run <name-of-model>` to start interacting via the command line directly\n",
"\n",
"### via an API\n",
"#### Via the API\n",
"\n",
"Send an `application/json` request to the API endpoint of Ollama to interact.\n",
"\n",
@ -61,11 +61,20 @@
"}'\n",
"```\n",
"\n",
"See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n",
"See the Ollama [API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) for all endpoints.\n",
"\n",
"#### via LangChain\n",
"\n",
"See a typical basic example of using Ollama chat model in your LangChain application."
"See a typical basic example of using [Ollama chat model](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/) in your LangChain application."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install langchain-community"
]
},
{
@ -87,7 +96,9 @@
"source": [
"from langchain_community.llms import Ollama\n",
"\n",
"llm = Ollama(model=\"llama3\")\n",
"llm = Ollama(\n",
" model=\"llama3\"\n",
") # assuming you have Ollama installed and have pulled the llama3 model with `ollama pull llama3`\n",
"\n",
"llm.invoke(\"Tell me a joke\")"
]
@ -280,6 +291,24 @@
"llm_with_image_context = bakllava.bind(images=[image_b64])\n",
"llm_with_image_context.invoke(\"What is the dollar based gross retention rate:\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concurrency Features\n",
"\n",
"Ollama supports concurrent inference for a single model, as well as loading multiple models simultaneously (since [version 0.1.33](https://github.com/ollama/ollama/releases)).\n",
"\n",
"Start the Ollama server with:\n",
"\n",
"* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n",
"* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n",
"\n",
"Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n",
"\n",
"Learn more about configuring Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)."
]
}
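With `OLLAMA_MAX_LOADED_MODELS` set, several loaded models can serve requests at the same time. The sketch below fans one prompt out to two models concurrently; `ask` is a hypothetical stand-in for `Ollama(model=name).invoke(prompt)`, and it assumes a server started as in the example command with both models already pulled.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for Ollama(model=name).invoke(prompt); swap in real calls
# when a server is running with OLLAMA_MAX_LOADED_MODELS >= 2.
def ask(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt}"

models = ["llama3", "mistral"]  # assumed to be pulled already
prompt = "Summarize the water cycle in one sentence."

# Each task targets a different loaded model, so the server can keep
# both models resident and answer the requests simultaneously.
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    answers = dict(zip(models, pool.map(lambda m: ask(m, prompt), models)))

print(sorted(answers))  # → ['llama3', 'mistral']
```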
],
"metadata": {

@ -7,36 +7,42 @@
"source": [
"# Ollama\n",
"\n",
"Let's load the Ollama Embeddings class."
"\"Ollama supports embedding models, making it possible to build retrieval augmented generation (RAG) applications that combine text prompts with existing documents or other data.\" Learn more in the [Ollama Embeddings](https://ollama.com/blog/embedding-models) blog post.\n",
"\n",
"To use Ollama Embeddings, first, install [LangChain Community](https://pypi.org/project/langchain-community/) package:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0be1af71",
"execution_count": null,
"id": "854d6a2e",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings import OllamaEmbeddings"
"!pip install langchain-community"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2c66e5da",
"cell_type": "markdown",
"id": "54fbb4cd",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OllamaEmbeddings()"
"Load the Ollama Embeddings class:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "01370375",
"execution_count": 1,
"id": "0be1af71",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings import OllamaEmbeddings\n",
"\n",
"embeddings = (\n",
" OllamaEmbeddings()\n",
") # by default, uses llama2. Run `ollama pull llama2` to pull down the model\n",
"\n",
"text = \"This is a test document.\""
]
},
@ -105,7 +111,13 @@
"id": "bb61bbeb",
"metadata": {},
"source": [
"Let's load the Ollama Embeddings class with smaller model (e.g. llama:7b). Note: See other supported models [https://ollama.ai/library](https://ollama.ai/library)"
"### Embedding Models\n",
"\n",
"Ollama offers dedicated embedding models that are lightweight, with the smallest around 25 MB in size. See the available [embedding models from Ollama](https://ollama.com/blog/embedding-models).\n",
"\n",
"Let's load the Ollama Embeddings class with a smaller model (e.g. `mxbai-embed-large`).\n",
"\n",
"> Note: See other supported models at [https://ollama.ai/library](https://ollama.ai/library)."
]
},
{
@ -115,26 +127,8 @@
"metadata": {},
"outputs": [],
"source": [
"embeddings = OllamaEmbeddings(model=\"llama2:7b\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "14aefb64",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3c39ed33",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OllamaEmbeddings(model=\"mxbai-embed-large\")\n",
"text = \"This is a test document.\"\n",
"query_result = embeddings.embed_query(text)"
]
},
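A common next step with the output of `embed_query` / `embed_documents` is ranking documents by cosine similarity to a query. The sketch below uses small hand-written vectors in place of real embeddings (which would come from `OllamaEmbeddings(model="mxbai-embed-large")`), so the numbers are illustrative only.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-ins for embeddings.embed_query(...) / embeddings.embed_documents(...)
query_vec = [0.1, 0.9, 0.2]
doc_vecs = {
    "doc about llamas": [0.1, 0.8, 0.3],
    "doc about taxes": [0.9, 0.1, 0.0],
}

# Rank documents by similarity to the query vector, most similar first
ranked = sorted(
    doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True
)
print(ranked[0])  # → doc about llamas
```

Real embedding vectors are much longer (e.g. hundreds to thousands of dimensions), but the ranking logic is the same.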
