{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Ollama\n", "\n", "[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as LLaMA2, locally.\n", "\n", "Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. \n", "\n", "It optimizes setup and configuration details, including GPU usage.\n", "\n", "For a complete list of supported models and model variants, see the [Ollama model library](https://ollama.ai/library).\n", "\n", "## Setup\n", "\n", "First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n", "\n", "* [Download](https://ollama.ai/download)\n", "* Fetch a model via `ollama pull `\n", "* e.g., for `Llama-7b`: `ollama pull llama2`\n", "* This will download the most basic version of the model (e.g., minimum # parameters and 4-bit quantization)\n", "* On Mac, it will download to:\n", "\n", "`~/.ollama/models/manifests/registry.ollama.ai/library//latest`\n", "\n", "* And we can specify a particular version, e.g., for `ollama pull vicuna:13b-v1.5-16k-q4_0`\n", "* The file is here with the model version in place of `latest`\n", "\n", "`~/.ollama/models/manifests/registry.ollama.ai/library/vicuna/13b-v1.5-16k-q4_0`\n", "\n", "You can easily access models in a few ways:\n", "\n", "1/ if the app is running:\n", "* All of your local models are automatically served on `localhost:11434`\n", "* Select your model when setting `llm = Ollama(..., model=\":\")`\n", "* If you set `llm = Ollama(..., model=\"> Use the following pieces of context to answer the question at the end. \n", "If you don't know the answer, just say that you don't know, don't try to make up an answer. \n", "Use three sentences maximum and keep the answer as concise as possible. <>\n", "{context}\n", "Question: {question}\n", "Helpful Answer:[/INST]\"\"\"\n", "QA_CHAIN_PROMPT = PromptTemplate(\n", " input_variables=[\"context\", \"question\"],\n", " template=template,\n", ")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Chat model\n", "from langchain.chat_models import ChatOllama\n", "from langchain.callbacks.manager import CallbackManager\n", "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n", "chat_model = ChatOllama(model=\"llama2:13b\",\n", " verbose=True,\n", " callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# QA chain\n", "from langchain.chains import RetrievalQA\n", "qa_chain = RetrievalQA.from_chain_type(\n", " chat_model,\n", " retriever=vectorstore.as_retriever(),\n", " chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n", ")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Based on the provided context, there are three approaches to task decomposition for AI agents:\n", "\n", "1. LLM with simple prompting, such as \"Steps for XYZ.\" or \"What are the subgoals for achieving XYZ?\"\n", "2. Task-specific instructions, such as \"Write a story outline\" for writing a novel.\n", "3. Human inputs." ] } ], "source": [ "question = \"What are the various approaches to Task Decomposition for AI Agents?\"\n", "result = qa_chain({\"query\": question})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also get logging for tokens." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Based on the given context, here is the answer to the question \"What are the approaches to Task Decomposition?\"\n", "\n", "There are three approaches to task decomposition:\n", "\n", "1. LLM with simple prompting, such as \"Steps for XYZ.\" or \"What are the subgoals for achieving XYZ?\"\n", "2. Using task-specific instructions, like \"Write a story outline\" for writing a novel.\n", "3. With human inputs.{'model': 'llama2:13b-chat', 'created_at': '2023-08-23T15:37:51.469127Z', 'done': True, 'context': [1, 29871, 1, 29961, 25580, 29962, 518, 25580, 29962, 518, 25580, 29962, 3532, 14816, 29903, 6778, 4803, 278, 1494, 12785, 310, 3030, 304, 1234, 278, 1139, 472, 278, 1095, 29889, 29871, 13, 3644, 366, 1016, 29915, 29873, 1073, 278, 1234, 29892, 925, 1827, 393, 366, 1016, 29915, 29873, 1073, 29892, 1016, 29915, 29873, 1018, 304, 1207, 701, 385, 1234, 29889, 29871, 13, 11403, 2211, 25260, 7472, 322, 3013, 278, 1234, 408, 3022, 895, 408, 1950, 29889, 529, 829, 14816, 29903, 6778, 13, 5398, 26227, 508, 367, 2309, 313, 29896, 29897, 491, 365, 26369, 411, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29936, 321, 29889, 29887, 29889, 376, 6113, 263, 5828, 27887, 1213, 363, 5007, 263, 9554, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 13, 13, 5398, 26227, 508, 367, 2309, 313, 29896, 29897, 491, 365, 26369, 411, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29936, 321, 29889, 29887, 29889, 376, 6113, 263, 5828, 27887, 1213, 363, 5007, 263, 9554, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 13, 13, 1451, 16047, 267, 297, 1472, 29899, 8489, 18987, 322, 3414, 26227, 29901, 1858, 9450, 975, 263, 3309, 29891, 4955, 322, 17583, 3902, 8253, 278, 1650, 2913, 3933, 18066, 292, 29889, 365, 26369, 29879, 21117, 304, 10365, 13900, 746, 20050, 411, 15668, 4436, 29892, 3907, 963, 3109, 16424, 9401, 304, 25618, 1058, 5110, 515, 14260, 322, 1059, 29889, 13, 13, 1451, 16047, 267, 297, 1472, 29899, 8489, 18987, 322, 3414, 26227, 29901, 1858, 9450, 975, 263, 3309, 29891, 4955, 322, 17583, 3902, 8253, 278, 1650, 2913, 3933, 18066, 292, 29889, 365, 26369, 29879, 21117, 304, 10365, 13900, 746, 20050, 411, 15668, 4436, 29892, 3907, 963, 3109, 16424, 9401, 304, 25618, 1058, 5110, 515, 14260, 322, 1059, 29889, 13, 16492, 29901, 1724, 526, 278, 13501, 304, 9330, 897, 510, 3283, 29973, 13, 29648, 1319, 673, 10834, 29914, 25580, 29962, 518, 29914, 25580, 29962, 518, 29914, 25580, 29962, 29871, 16564, 373, 278, 2183, 3030, 29892, 1244, 338, 278, 1234, 304, 278, 1139, 376, 5618, 526, 278, 13501, 304, 9330, 897, 510, 3283, 3026, 13, 13, 8439, 526, 2211, 13501, 304, 3414, 26227, 29901, 13, 13, 29896, 29889, 365, 26369, 411, 2560, 9508, 292, 29892, 1316, 408, 376, 7789, 567, 363, 1060, 29979, 29999, 1213, 470, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 3026, 13, 29906, 29889, 5293, 3414, 29899, 14940, 11994, 29892, 763, 376, 6113, 263, 5828, 27887, 29908, 363, 5007, 263, 9554, 29889, 13, 29941, 29889, 2973, 5199, 10970, 29889, 2], 'total_duration': 9514823750, 
'load_duration': 795542, 'sample_count': 99, 'sample_duration': 68732000, 'prompt_eval_count': 146, 'prompt_eval_duration': 6206275000, 'eval_count': 98, 'eval_duration': 3229641000}\n" ] } ], "source": [ "from langchain.schema import LLMResult\n", "from langchain.callbacks.base import BaseCallbackHandler\n", "\n", "class GenerationStatisticsCallback(BaseCallbackHandler):\n", "    \"\"\"Print Ollama's generation statistics (token counts, durations) when the run ends.\"\"\"\n", "    def on_llm_end(self, response: LLMResult, **kwargs) -> None:\n", "        print(response.generations[0][0].generation_info)\n", "\n", "callback_manager = CallbackManager([StreamingStdOutCallbackHandler(), GenerationStatisticsCallback()])\n", "\n", "chat_model = ChatOllama(model=\"llama2:13b-chat\",\n", "                        verbose=True,\n", "                        callback_manager=callback_manager)\n", "\n", "qa_chain = RetrievalQA.from_chain_type(\n", "    chat_model,\n", "    retriever=vectorstore.as_retriever(),\n", "    chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n", ")\n", "\n", "question = \"What are the approaches to Task Decomposition?\"\n", "result = qa_chain({\"query\": question})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`eval_count` / (`eval_duration` / 1e9) gives `tok / s` (`eval_duration` is reported in nanoseconds):" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "30.343929867127645" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 98 tokens generated in ~3.23 seconds\n", "98 / (3229641000/1000/1000/1000)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 2 }