{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Ollama\n", "\n", "[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.\n", "\n", "Ollama bundles model weights, configuration, and data into a single package defined by a Modelfile.\n", "\n", "It optimizes setup and configuration details, including GPU usage.\n", "\n", "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n", "\n", "## Setup\n", "\n", "First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n", "\n", "* [Download](https://ollama.ai/download)\n", "* Fetch a model via `ollama pull <model family>`\n", "* e.g., for Llama 2 7B: `ollama pull llama2` (see the full list [here](https://github.com/jmorganca/ollama))\n", "* This will typically download the most basic version of the model (e.g., the smallest number of parameters, with `q4_0` quantization)\n", "* On Mac, it will download to\n", "\n", "`~/.ollama/models/manifests/registry.ollama.ai/library/<model family>/latest`\n", "\n", "* We can also specify a particular version, e.g., `ollama pull vicuna:13b-v1.5-16k-q4_0`\n", "* The manifest is then at the same path, with the model version in place of `latest`:\n", "\n", "`~/.ollama/models/manifests/registry.ollama.ai/library/vicuna/13b-v1.5-16k-q4_0`\n", "\n", "You can easily access models in a few ways:\n", "\n", "1/ if the app is running:\n", "* All of your local models are automatically served on `localhost:11434`\n", "* Select your model when setting `llm = Ollama(..., model=\"<model family>:<version>\")`\n", "* If you set `llm = Ollama(..., model=\"<model family>\")` with no version tag, the `latest` tag is used\n", "\n", "2/ if running the binary directly:\n", "* Start the server with `ollama serve`\n", "* Models are then served on `localhost:11434` as above" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from langchain.callbacks.base import BaseCallbackHandler\n", "from langchain.callbacks.manager import CallbackManager\n", "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n", "from langchain.chains import RetrievalQA\n", "from langchain.llms import Ollama\n", "from langchain.schema import LLMResult\n", "\n", "# Print per-generation statistics (token counts, durations) when the LLM finishes\n", "class GenerationStatisticsCallback(BaseCallbackHandler):\n", "    def on_llm_end(self, response: LLMResult, **kwargs) -> None:\n", "        print(response.generations[0][0].generation_info)\n", "\n", "callback_manager = CallbackManager([StreamingStdOutCallbackHandler(), GenerationStatisticsCallback()])\n", "\n", "llm = Ollama(base_url=\"http://localhost:11434\",\n", "             model=\"llama2\",\n", "             verbose=True,\n", "             callback_manager=callback_manager)\n", "\n", 
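"# NOTE: a hypothetical sketch. The RetrievalQA chain below needs a `vectorstore`\n", "# and a `QA_CHAIN_PROMPT`, whose defining cells are missing here; the source URL,\n", "# splitter settings, embedding choice (GPT4AllEmbeddings), and prompt wording are\n", "# assumptions, not the original setup.\n", "from langchain.document_loaders import WebBaseLoader\n", "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", "from langchain.embeddings import GPT4AllEmbeddings\n", "from langchain.vectorstores import Chroma\n", "from langchain.prompts import PromptTemplate\n", "\n", "# Load and split a source document, then index the chunks locally\n", "loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n", "all_splits = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0).split_documents(loader.load())\n", "vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())\n", "\n", "# A generic QA prompt with the {context} and {question} variables the chain expects\n", "template = \"\"\"Use the following context to answer the question concisely.\n", "{context}\n", "Question: {question}\n", "Answer:\"\"\"\n", "QA_CHAIN_PROMPT = PromptTemplate(input_variables=[\"context\", \"question\"], template=template)\n", "\n", 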
"qa_chain = RetrievalQA.from_chain_type(\n", "    llm,\n", "    retriever=vectorstore.as_retriever(),\n", "    chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n", ")\n", "\n", "question = \"What are the approaches to Task Decomposition?\"\n", "result = qa_chain({\"query\": question})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`eval_count` / (`eval_duration` / 1e9) gives `tok / s` (`eval_duration` is reported in nanoseconds)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "47.22003469910937" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "62 / (1313002000/1000/1000/1000)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 2 }