{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "rT1cmV4qCa2X" }, "source": [ "# Using Apache Kafka to route messages\n", "\n", "---\n", "\n", "\n", "\n", "This notebook shows you how to use LangChain's standard chat features while passing the chat messages back and forth via Apache Kafka.\n", "\n", "The goal is to simulate an architecture where the chat front end and the LLM are running as separate services that need to communicate with one another over an internal network.\n", "\n", "It's an alternative to the typical pattern of requesting a response from the model via a REST API (there's more info on why you would want to do this at the end of the notebook)." ] }, { "cell_type": "markdown", "metadata": { "id": "UPYtfAR_9YxZ" }, "source": [ "### 1. Install the main dependencies\n", "\n", "Dependencies include:\n", "\n", "- The Quix Streams library for managing interactions with Apache Kafka (or Kafka-like tools such as Redpanda) in a \"Pandas-like\" way.\n", "- The LangChain library for managing interactions with Llama-2 and storing conversation state." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZX5tfKiy9cN-" }, "outputs": [], "source": [ "!pip install quixstreams==2.1.2a langchain==0.0.340 huggingface_hub==0.19.4 langchain-experimental==0.0.42 python-dotenv" ] }, { "cell_type": "markdown", "metadata": { "id": "losTSdTB9d9O" }, "source": [ "### 2. Build and install the llama-cpp-python library (with CUDA enabled so that we can take advantage of the Google Colab GPU)\n", "\n", "The `llama-cpp-python` library is a Python wrapper around the `llama.cpp` library, which lets you run quantized LLMs efficiently on just a CPU.\n", "\n", "When you use the standard `pip install llama-cpp-python` command, you do not get GPU support by default. 
Generation can be very slow if you rely on just the CPU in Google Colab, so the following command adds an extra option to build and install\n", "`llama-cpp-python` with GPU support (make sure you have a GPU-enabled runtime selected in Google Colab)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-JCQdl1G9tbl" }, "outputs": [], "source": [ "!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python" ] }, { "cell_type": "markdown", "metadata": { "id": "5_vjVIAh9rLl" }, "source": [ "### 3. Download and set up Kafka and ZooKeeper instances\n", "\n", "Download the Kafka binaries from the Apache website and start the servers as daemons. We'll use the default configurations (provided by Apache Kafka) for spinning up the instances." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "zFz7czGRW5Wr" }, "outputs": [], "source": [ "!curl -sSOL https://dlcdn.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz\n", "!tar -xzf kafka_2.13-3.6.1.tgz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Uf7NR_UZ9wye" }, "outputs": [], "source": [ "!./kafka_2.13-3.6.1/bin/zookeeper-server-start.sh -daemon ./kafka_2.13-3.6.1/config/zookeeper.properties\n", "!./kafka_2.13-3.6.1/bin/kafka-server-start.sh -daemon ./kafka_2.13-3.6.1/config/server.properties\n", "!echo \"Waiting for 10 secs until kafka and zookeeper services are up and running\"\n", "!sleep 10" ] }, { "cell_type": "markdown", "metadata": { "id": "H3SafFuS94p1" }, "source": [ "### 4. Check that the Kafka daemons are running\n", "\n", "List the running processes and filter them for Java processes (you should see two, one for each server)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CZDC2lQP99yp" }, "outputs": [], "source": [ "!ps aux | grep -E '[j]ava'" ] }, { "cell_type": "markdown", "metadata": { "id": "Snoxmjb5-V37" }, "source": [ "### 5. 
Import the required dependencies and initialize required variables\n", "\n", "Import the Quix Streams library for interacting with Kafka, and the necessary LangChain components for running a `ConversationChain`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "plR9e_MF-XL5" }, "outputs": [], "source": [ "# Import utility libraries\n", "import json\n", "import random\n", "import re\n", "import time\n", "import uuid\n", "from os import environ\n", "from pathlib import Path\n", "from random import choice, randint, random\n", "\n", "from dotenv import load_dotenv\n", "\n", "# Import a Hugging Face utility to download models directly from Hugging Face hub:\n", "from huggingface_hub import hf_hub_download\n", "from langchain.chains import ConversationChain\n", "\n", "# Import LangChain modules for managing prompts and conversation chains:\n", "from langchain.llms import LlamaCpp\n", "from langchain.memory import ConversationTokenBufferMemory\n", "from langchain.prompts import PromptTemplate, load_prompt\n", "from langchain_core.messages import SystemMessage\n", "from langchain_experimental.chat_models import Llama2Chat\n", "from quixstreams import Application, State, message_key\n", "\n", "# Import Quix dependencies\n", "from quixstreams.kafka import Producer\n", "\n", "# Initialize global variables.\n", "AGENT_ROLE = \"AI\"\n", "chat_id = \"\"\n", "\n", "# Set the current role to the role constant and initialize variables for supplementary customer metadata:\n", "role = AGENT_ROLE" ] }, { "cell_type": "markdown", "metadata": { "id": "HgJjJ9aZ-liy" }, "source": [ "### 6. Download the \"llama-2-7b-chat.Q4_K_M.gguf\" model\n", "\n", "Download the quantized Llama-2 7B model from Hugging Face, which we will use as a local LLM (rather than relying on REST API calls to an external service)."
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 67, "referenced_widgets": [ "969343cdbe604a26926679bbf8bd2dda", "d8b8370c9b514715be7618bfe6832844", "0def954cca89466b8408fadaf3b82e64", "462482accc664729980562e208ceb179", "80d842f73c564dc7b7cc316c763e2633", "fa055d9f2a9d4a789e9cf3c89e0214e5", "30ecca964a394109ac2ad757e3aec6c0", "fb6478ce2dac489bb633b23ba0953c5c", "734b0f5da9fc4307a95bab48cdbb5d89", "b32f3a86a74741348511f4e136744ac8", "e409071bff5a4e2d9bf0e9f5cc42231b" ] }, "id": "Qwu4YoSA-503", "outputId": "f956976c-7485-415b-ac93-4336ade31964" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The model path does not exist in state. Downloading model...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "969343cdbe604a26926679bbf8bd2dda", "version_major": 2, "version_minor": 0 }, "text/plain": [ "llama-2-7b-chat.Q4_K_M.gguf: 0%| | 0.00/4.08G [00:00