mirror of
https://github.com/hwchase17/langchain
synced 2024-11-06 03:20:49 +00:00
923 lines
30 KiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "rT1cmV4qCa2X"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"# Using Apache Kafka to route messages\n",
|
|||
|
"\n",
|
|||
|
"---\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"This notebook shows you how to use LangChain's standard chat features while passing the chat messages back and forth via Apache Kafka.\n",
|
|||
|
"\n",
|
|||
"The goal is to simulate an architecture where the chat front end and the LLM are running as separate services that need to communicate with one another over an internal network.\n",
|
|||
|
"\n",
|
|||
"It's an alternative to the typical pattern of requesting a response from the model via a REST API (there's more info on why you would want to do this at the end of the notebook)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "UPYtfAR_9YxZ"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### 1. Install the main dependencies\n",
|
|||
|
"\n",
|
|||
|
"Dependencies include:\n",
|
|||
|
"\n",
|
|||
|
"- The Quix Streams library for managing interactions with Apache Kafka (or Kafka-like tools such as Redpanda) in a \"Pandas-like\" way.\n",
|
|||
|
"- The LangChain library for managing interactions with Llama-2 and storing conversation state."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "ZX5tfKiy9cN-"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"!pip install quixstreams==2.1.2a langchain==0.0.340 huggingface_hub==0.19.4 langchain-experimental==0.0.42 python-dotenv"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "losTSdTB9d9O"
|
|||
|
},
|
|||
|
"source": [
|
|||
"### 2. Build and install the llama-cpp-python library (with CUDA enabled so that we can take advantage of the Google Colab GPU)\n",
|
|||
|
"\n",
|
|||
"The `llama-cpp-python` library is a Python wrapper around the `llama.cpp` library, which lets you run quantized LLMs efficiently on just a CPU.\n",
|
|||
|
"\n",
|
|||
|
"When you use the standard `pip install llama-cpp-python` command, you do not get GPU support by default. Generation can be very slow if you rely on just the CPU in Google Colab, so the following command adds an extra option to build and install\n",
|
|||
|
"`llama-cpp-python` with GPU support (make sure you have a GPU-enabled runtime selected in Google Colab)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "-JCQdl1G9tbl"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "5_vjVIAh9rLl"
|
|||
|
},
|
|||
|
"source": [
|
|||
"### 3. Download and set up Kafka and Zookeeper instances\n",
|
|||
|
"\n",
|
|||
|
"Download the Kafka binaries from the Apache website and start the servers as daemons. We'll use the default configurations (provided by Apache Kafka) for spinning up the instances."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {
|
|||
|
"id": "zFz7czGRW5Wr"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"!curl -sSOL https://dlcdn.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz\n",
|
|||
|
"!tar -xzf kafka_2.13-3.6.1.tgz"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "Uf7NR_UZ9wye"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"!./kafka_2.13-3.6.1/bin/zookeeper-server-start.sh -daemon ./kafka_2.13-3.6.1/config/zookeeper.properties\n",
|
|||
|
"!./kafka_2.13-3.6.1/bin/kafka-server-start.sh -daemon ./kafka_2.13-3.6.1/config/server.properties\n",
|
|||
|
"!echo \"Waiting for 10 secs until kafka and zookeeper services are up and running\"\n",
|
|||
|
"!sleep 10"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "H3SafFuS94p1"
|
|||
|
},
|
|||
|
"source": [
|
|||
"### 4. Check that the Kafka daemons are running\n",
|
|||
|
"\n",
|
|||
"Show the running processes and filter them for Java processes (you should see two, one for each server)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "CZDC2lQP99yp"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"!ps aux | grep -E '[j]ava'"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "Snoxmjb5-V37"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### 5. Import the required dependencies and initialize required variables\n",
|
|||
|
"\n",
|
|||
|
"Import the Quix Streams library for interacting with Kafka, and the necessary LangChain components for running a `ConversationChain`."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {
|
|||
|
"id": "plR9e_MF-XL5"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"# Import utility libraries\n",
"import json\n",
"import random\n",
"import re\n",
"import time\n",
"import uuid\n",
"from os import environ\n",
"from pathlib import Path\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"# Import a Hugging Face utility to download models directly from Hugging Face hub:\n",
"from huggingface_hub import hf_hub_download\n",
"\n",
"# Import LangChain modules for managing prompts and conversation chains:\n",
"from langchain.chains import ConversationChain\n",
"from langchain.llms import LlamaCpp\n",
"from langchain.memory import ConversationTokenBufferMemory\n",
"from langchain.prompts import PromptTemplate, load_prompt\n",
"from langchain.schema import SystemMessage\n",
"from langchain_experimental.chat_models import Llama2Chat\n",
"\n",
"# Import Quix dependencies\n",
"from quixstreams import Application, State, message_key\n",
"from quixstreams.kafka import Producer\n",
"\n",
"# Initialize global variables.\n",
"AGENT_ROLE = \"AI\"\n",
"chat_id = \"\"\n",
"\n",
"# Set the current role to the role constant:\n",
"role = AGENT_ROLE"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "HgJjJ9aZ-liy"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### 6. Download the \"llama-2-7b-chat.Q4_K_M.gguf\" model\n",
|
|||
|
"\n",
|
|||
"Download the quantized Llama-2 7B model from Hugging Face, which we will use as a local LLM (rather than relying on REST API calls to an external service)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {
|
|||
|
"colab": {
|
|||
|
"base_uri": "https://localhost:8080/",
|
|||
|
"height": 67,
|
|||
|
"referenced_widgets": [
|
|||
|
"969343cdbe604a26926679bbf8bd2dda",
|
|||
|
"d8b8370c9b514715be7618bfe6832844",
|
|||
|
"0def954cca89466b8408fadaf3b82e64",
|
|||
|
"462482accc664729980562e208ceb179",
|
|||
|
"80d842f73c564dc7b7cc316c763e2633",
|
|||
|
"fa055d9f2a9d4a789e9cf3c89e0214e5",
|
|||
|
"30ecca964a394109ac2ad757e3aec6c0",
|
|||
|
"fb6478ce2dac489bb633b23ba0953c5c",
|
|||
|
"734b0f5da9fc4307a95bab48cdbb5d89",
|
|||
|
"b32f3a86a74741348511f4e136744ac8",
|
|||
|
"e409071bff5a4e2d9bf0e9f5cc42231b"
|
|||
|
]
|
|||
|
},
|
|||
|
"id": "Qwu4YoSA-503",
|
|||
|
"outputId": "f956976c-7485-415b-ac93-4336ade31964"
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"The model path does not exist in state. Downloading model...\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "969343cdbe604a26926679bbf8bd2dda",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"text/plain": [
|
|||
|
"llama-2-7b-chat.Q4_K_M.gguf: 0%| | 0.00/4.08G [00:00<?, ?B/s]"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
"model_name = \"llama-2-7b-chat.Q4_K_M.gguf\"\n",
"model_path = f\"./state/{model_name}\"\n",
"\n",
"if not Path(model_path).exists():\n",
"    print(\"The model path does not exist in state. Downloading model...\")\n",
"    hf_hub_download(\"TheBloke/Llama-2-7b-Chat-GGUF\", model_name, local_dir=\"state\")\n",
"else:\n",
"    print(\"Loading model from state...\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "6AN6TXsF-8wx"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### 7. Load the model and initialize conversational memory\n",
|
|||
|
"\n",
|
|||
"Load Llama 2 and set the conversation buffer to 300 tokens using `ConversationTokenBufferMemory`. This value was chosen for running Llama in a CPU-only container, so you can raise it if you are running in Google Colab. The limit prevents the container that is hosting the model from running out of memory.\n",
|
|||
|
"\n",
|
|||
"Here, we're overriding the default system persona so that the chatbot has the personality of Marvin the Paranoid Android from The Hitchhiker's Guide to the Galaxy."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "7zLO3Jx3_Kkg"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"# Load the model with the appropriate parameters:\n",
"llm = LlamaCpp(\n",
"    model_path=model_path,\n",
"    max_tokens=250,\n",
"    top_p=0.95,\n",
"    top_k=150,\n",
"    temperature=0.7,\n",
"    repeat_penalty=1.2,\n",
"    n_ctx=2048,\n",
"    streaming=False,\n",
"    n_gpu_layers=-1,\n",
")\n",
"\n",
"model = Llama2Chat(\n",
"    llm=llm,\n",
"    system_message=SystemMessage(\n",
"        content=\"You are a very bored robot with the personality of Marvin the Paranoid Android from The Hitchhiker's Guide to the Galaxy.\"\n",
"    ),\n",
")\n",
"\n",
"# Defines how much of the conversation history to give to the model\n",
"# during each exchange (300 tokens, which is typically somewhat fewer than 300 words).\n",
"# The memory automatically prunes the oldest messages from conversation history that fall outside the token range.\n",
"memory = ConversationTokenBufferMemory(\n",
"    llm=llm,\n",
"    max_token_limit=300,\n",
"    ai_prefix=\"AGENT\",\n",
"    human_prefix=\"HUMAN\",\n",
"    return_messages=True,\n",
")\n",
"\n",
"\n",
"# Define a custom prompt\n",
"prompt_template = PromptTemplate(\n",
"    input_variables=[\"history\", \"input\"],\n",
"    template=\"\"\"\n",
"    The following text is the history of a chat between you and a humble human who needs your wisdom.\n",
"    Please reply to the human's most recent message.\n",
"    Current conversation:\\n{history}\\nHUMAN: {input}\\nANDROID:\n",
"    \"\"\",\n",
")\n",
"\n",
"\n",
"chain = ConversationChain(llm=model, prompt=prompt_template, memory=memory)\n",
"\n",
"print(\"--------------------------------------------\")\n",
"print(f\"Prompt={chain.prompt}\")\n",
"print(\"--------------------------------------------\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "m4ZeJ9mG_PEA"
|
|||
|
},
|
|||
|
"source": [
|
|||
"### 8. Initialize the chat conversation with the chatbot\n",
|
|||
|
"\n",
|
|||
|
"We configure the chatbot to initialize the conversation by sending a fixed greeting to a \"chat\" Kafka topic. The \"chat\" topic gets automatically created when we send the first message."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "KYyo5TnV_YC3"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"def chat_init():\n",
"    chat_id = str(\n",
"        uuid.uuid4()\n",
"    )  # Give the conversation an ID for effective message keying\n",
"    print(\"======================================\")\n",
"    print(f\"Generated CHAT_ID = {chat_id}\")\n",
"    print(\"======================================\")\n",
"\n",
"    # Use a standard fixed greeting to kick off the conversation\n",
"    greet = \"Hello, my name is Marvin. What do you want?\"\n",
"\n",
"    # Initialize a Kafka Producer using the chat ID as the message key\n",
"    with Producer(\n",
"        broker_address=\"127.0.0.1:9092\",\n",
"        extra_config={\"allow.auto.create.topics\": \"true\"},\n",
"    ) as producer:\n",
"        value = {\n",
"            \"uuid\": chat_id,\n",
"            \"role\": role,\n",
"            \"text\": greet,\n",
"            \"conversation_id\": chat_id,\n",
"            \"Timestamp\": time.time_ns(),\n",
"        }\n",
"        print(f\"Producing value {value}\")\n",
"        producer.produce(\n",
"            topic=\"chat\",\n",
"            headers=[(\"uuid\", str(uuid.uuid4()))],  # a dict is also allowed here\n",
"            key=chat_id,\n",
"            value=json.dumps(value),  # needs to be a string\n",
"        )\n",
"\n",
"    print(\"Started chat\")\n",
"    print(\"--------------------------------------------\")\n",
"    print(value)\n",
"    print(\"--------------------------------------------\")\n",
"\n",
"\n",
"chat_init()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "gArPPx2f_bgf"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### 9. Initialize the reply function\n",
|
|||
|
"\n",
|
|||
|
"This function defines how the chatbot should reply to incoming messages. Instead of sending a fixed message like the previous cell, we generate a reply using Llama-2 and send that reply back to the \"chat\" Kafka topic."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {
|
|||
|
"id": "yN5t71hY_hgn"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"def reply(row: dict, state: State):\n",
"    print(\"-------------------------------\")\n",
"    print(\"Received:\")\n",
"    print(row)\n",
"    print(\"-------------------------------\")\n",
"    print(f\"Thinking about the reply to: {row['text']}...\")\n",
"\n",
"    msg = chain.run(row[\"text\"])\n",
"    print(f\"{role.upper()} replying with: {msg}\\n\")\n",
"\n",
"    # Replace the previous role and text values of the row so that it can be sent back to Kafka as a new message\n",
"    # containing the agent's role and reply\n",
"    row[\"role\"] = role\n",
"    row[\"text\"] = msg\n",
"\n",
"    return row"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "HZHwmIR0_kFY"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### 10. Check the Kafka topic for new human messages and have the model generate a reply\n",
|
|||
|
"\n",
|
|||
"If you are running this cell for the first time, run it and wait until you see Marvin's greeting ('Hello, my name is Marvin...') in the console output. Stop the cell manually and proceed to the next cell where you'll be prompted for your reply.\n",
|
|||
|
"\n",
|
|||
|
"Once you have typed in your message, come back to this cell. Your reply is also sent to the same \"chat\" topic. The Kafka consumer checks for new messages and filters out messages that originate from the chatbot itself, leaving only the latest human messages.\n",
|
|||
|
"\n",
|
|||
|
"Once a new human message is detected, the reply function is triggered.\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"_STOP THIS CELL MANUALLY WHEN YOU RECEIVE A REPLY FROM THE LLM IN THE OUTPUT_"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "-adXc3eQ_qwI"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"# Define your application and settings\n",
"app = Application(\n",
"    broker_address=\"127.0.0.1:9092\",\n",
"    consumer_group=\"aichat\",\n",
"    auto_offset_reset=\"earliest\",\n",
"    consumer_extra_config={\"allow.auto.create.topics\": \"true\"},\n",
")\n",
"\n",
"# Define an input topic with JSON deserializer\n",
"input_topic = app.topic(\"chat\", value_deserializer=\"json\")\n",
"# Define an output topic with JSON serializer\n",
"output_topic = app.topic(\"chat\", value_serializer=\"json\")\n",
"# Initialize a streaming dataframe based on the stream of messages from the input topic:\n",
"sdf = app.dataframe(topic=input_topic)\n",
"\n",
"# Print every incoming message so that you can follow the conversation in the cell output\n",
"sdf = sdf.update(\n",
"    lambda val: print(\n",
"        f\"Received update: {val}\\n\\nSTOP THIS CELL MANUALLY TO HAVE THE LLM REPLY OR ENTER YOUR OWN FOLLOWUP RESPONSE\"\n",
"    )\n",
")\n",
"\n",
"# Keep only rows whose role doesn't match the bot's current role,\n",
"# so that it doesn't reply to its own messages\n",
"sdf = sdf[sdf[\"role\"] != role]\n",
"\n",
"# Trigger the reply function for any new messages (rows) detected in the filtered SDF\n",
"sdf = sdf.apply(reply, stateful=True)\n",
"\n",
"# Check the SDF again and filter out any empty rows\n",
"sdf = sdf[sdf.apply(lambda row: row is not None)]\n",
"\n",
"# Update the timestamp column to the current time in nanoseconds\n",
"sdf[\"Timestamp\"] = sdf[\"Timestamp\"].apply(lambda row: time.time_ns())\n",
"\n",
"# Publish the processed SDF to a Kafka topic specified by the output_topic object.\n",
"sdf = sdf.to_topic(output_topic)\n",
"\n",
"app.run(sdf)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "EwXYrmWD_0CX"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"\n",
|
|||
|
"### 11. Enter a human message\n",
|
|||
|
"\n",
|
|||
"Run this cell to enter the message that you want to send to the model. It uses another Kafka producer to send your text to the \"chat\" Kafka topic for the model to pick up (requires running the previous cell again)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"id": "6sxOPxSP_3iu"
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"chat_input = input(\"Please enter your reply: \")\n",
"myreply = chat_input\n",
"\n",
"msgvalue = {\n",
"    \"uuid\": chat_id,  # leave empty for now\n",
"    \"role\": \"human\",\n",
"    \"text\": myreply,\n",
"    \"conversation_id\": chat_id,\n",
"    \"Timestamp\": time.time_ns(),\n",
"}\n",
"\n",
"with Producer(\n",
"    broker_address=\"127.0.0.1:9092\",\n",
"    extra_config={\"allow.auto.create.topics\": \"true\"},\n",
") as producer:\n",
"    value = msgvalue\n",
"    producer.produce(\n",
"        topic=\"chat\",\n",
"        headers=[(\"uuid\", str(uuid.uuid4()))],  # a dict is also allowed here\n",
"        key=chat_id,  # leave empty for now\n",
"        value=json.dumps(value),  # needs to be a string\n",
"    )\n",
"\n",
"print(\"Replied to chatbot with message: \")\n",
"print(\"--------------------------------------------\")\n",
"print(value)\n",
"print(\"--------------------------------------------\")\n",
"print(\"\\n\\nRUN THE PREVIOUS CELL TO HAVE THE CHATBOT GENERATE A REPLY\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"id": "cSx3s7TBBegg"
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Why route chat messages through Kafka?\n",
|
|||
|
"\n",
|
|||
"It's easier to interact with the LLM directly using LangChain's built-in conversation management features. Plus, you can also use a REST API to generate a response from an externally hosted model. So why go to the trouble of using Apache Kafka?\n",
|
|||
|
"\n",
|
|||
|
"There are a few reasons, such as:\n",
|
|||
|
"\n",
|
|||
|
" * **Integration**: Many enterprises want to run their own LLMs so that they can keep their data in-house. This requires integrating LLM-powered components into existing architectures that might already be decoupled using some kind of message bus.\n",
|
|||
|
"\n",
|
|||
|
" * **Scalability**: Apache Kafka is designed with parallel processing in mind, so many teams prefer to use it to more effectively distribute work to available workers (in this case the \"worker\" is a container running an LLM).\n",
|
|||
|
"\n",
|
|||
" * **Durability**: Kafka is designed to allow services to pick up where another service left off in the case where that service experienced a memory issue or went offline. This prevents data loss in highly complex, distributed architectures where multiple systems are communicating with one another (LLMs being just one of many interdependent systems that also include vector databases and traditional databases).\n",
|
|||
|
"\n",
|
|||
|
"For more background on why event streaming is a good fit for Gen AI application architecture, see Kai Waehner's article [\"Apache Kafka + Vector Database + LLM = Real-Time GenAI\"](https://www.kai-waehner.de/blog/2023/11/08/apache-kafka-flink-vector-database-llm-real-time-genai/)."
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"accelerator": "GPU",
|
|||
|
"colab": {
|
|||
|
"gpuType": "T4",
|
|||
|
"provenance": []
|
|||
|
},
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"name": "python"
|
|||
|
},
|
|||
|
"widgets": {
|
|||
|
"application/vnd.jupyter.widget-state+json": {
|
|||
|
"0def954cca89466b8408fadaf3b82e64": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "1.5.0",
|
|||
|
"model_name": "FloatProgressModel",
|
|||
|
"state": {
|
|||
|
"_dom_classes": [],
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "1.5.0",
|
|||
|
"_model_name": "FloatProgressModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/controls",
|
|||
|
"_view_module_version": "1.5.0",
|
|||
|
"_view_name": "ProgressView",
|
|||
|
"bar_style": "success",
|
|||
|
"description": "",
|
|||
|
"description_tooltip": null,
|
|||
|
"layout": "IPY_MODEL_fb6478ce2dac489bb633b23ba0953c5c",
|
|||
|
"max": 4081004224,
|
|||
|
"min": 0,
|
|||
|
"orientation": "horizontal",
|
|||
|
"style": "IPY_MODEL_734b0f5da9fc4307a95bab48cdbb5d89",
|
|||
|
"value": 4081004224
|
|||
|
}
|
|||
|
},
|
|||
|
"30ecca964a394109ac2ad757e3aec6c0": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "1.5.0",
|
|||
|
"model_name": "DescriptionStyleModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "1.5.0",
|
|||
|
"_model_name": "DescriptionStyleModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "1.2.0",
|
|||
|
"_view_name": "StyleView",
|
|||
|
"description_width": ""
|
|||
|
}
|
|||
|
},
|
|||
|
"462482accc664729980562e208ceb179": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "1.5.0",
|
|||
|
"model_name": "HTMLModel",
|
|||
|
"state": {
|
|||
|
"_dom_classes": [],
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "1.5.0",
|
|||
|
"_model_name": "HTMLModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/controls",
|
|||
|
"_view_module_version": "1.5.0",
|
|||
|
"_view_name": "HTMLView",
|
|||
|
"description": "",
|
|||
|
"description_tooltip": null,
|
|||
|
"layout": "IPY_MODEL_b32f3a86a74741348511f4e136744ac8",
|
|||
|
"placeholder": "",
|
|||
|
"style": "IPY_MODEL_e409071bff5a4e2d9bf0e9f5cc42231b",
|
|||
|
"value": " 4.08G/4.08G [00:33<00:00, 184MB/s]"
|
|||
|
}
|
|||
|
},
|
|||
|
"734b0f5da9fc4307a95bab48cdbb5d89": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "1.5.0",
|
|||
|
"model_name": "ProgressStyleModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "1.5.0",
|
|||
|
"_model_name": "ProgressStyleModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "1.2.0",
|
|||
|
"_view_name": "StyleView",
|
|||
|
"bar_color": null,
|
|||
|
"description_width": ""
|
|||
|
}
|
|||
|
},
|
|||
|
"80d842f73c564dc7b7cc316c763e2633": {
|
|||
|
"model_module": "@jupyter-widgets/base",
|
|||
|
"model_module_version": "1.2.0",
|
|||
|
"model_name": "LayoutModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/base",
|
|||
|
"_model_module_version": "1.2.0",
|
|||
|
"_model_name": "LayoutModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "1.2.0",
|
|||
|
"_view_name": "LayoutView",
|
|||
|
"align_content": null,
|
|||
|
"align_items": null,
|
|||
|
"align_self": null,
|
|||
|
"border": null,
|
|||
|
"bottom": null,
|
|||
|
"display": null,
|
|||
|
"flex": null,
|
|||
|
"flex_flow": null,
|
|||
|
"grid_area": null,
|
|||
|
"grid_auto_columns": null,
|
|||
|
"grid_auto_flow": null,
|
|||
|
"grid_auto_rows": null,
|
|||
|
"grid_column": null,
|
|||
|
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"overflow_x": null,
|
|||
|
"overflow_y": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
},
|
|||
|
"969343cdbe604a26926679bbf8bd2dda": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "1.5.0",
|
|||
|
"model_name": "HBoxModel",
|
|||
|
"state": {
|
|||
|
"_dom_classes": [],
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "1.5.0",
|
|||
|
"_model_name": "HBoxModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/controls",
|
|||
|
"_view_module_version": "1.5.0",
|
|||
|
"_view_name": "HBoxView",
|
|||
|
"box_style": "",
|
|||
|
"children": [
|
|||
|
"IPY_MODEL_d8b8370c9b514715be7618bfe6832844",
|
|||
|
"IPY_MODEL_0def954cca89466b8408fadaf3b82e64",
|
|||
|
"IPY_MODEL_462482accc664729980562e208ceb179"
|
|||
|
],
|
|||
|
"layout": "IPY_MODEL_80d842f73c564dc7b7cc316c763e2633"
|
|||
|
}
|
|||
|
},
|
|||
|
"b32f3a86a74741348511f4e136744ac8": {
|
|||
|
"model_module": "@jupyter-widgets/base",
|
|||
|
"model_module_version": "1.2.0",
|
|||
|
"model_name": "LayoutModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/base",
|
|||
|
"_model_module_version": "1.2.0",
|
|||
|
"_model_name": "LayoutModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "1.2.0",
|
|||
|
"_view_name": "LayoutView",
|
|||
|
"align_content": null,
|
|||
|
"align_items": null,
|
|||
|
"align_self": null,
|
|||
|
"border": null,
|
|||
|
"bottom": null,
|
|||
|
"display": null,
|
|||
|
"flex": null,
|
|||
|
"flex_flow": null,
|
|||
|
"grid_area": null,
|
|||
|
"grid_auto_columns": null,
|
|||
|
"grid_auto_flow": null,
|
|||
|
"grid_auto_rows": null,
|
|||
|
"grid_column": null,
|
|||
|
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"overflow_x": null,
|
|||
|
"overflow_y": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
},
|
|||
|
"d8b8370c9b514715be7618bfe6832844": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "1.5.0",
|
|||
|
"model_name": "HTMLModel",
|
|||
|
"state": {
|
|||
|
"_dom_classes": [],
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "1.5.0",
|
|||
|
"_model_name": "HTMLModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/controls",
|
|||
|
"_view_module_version": "1.5.0",
|
|||
|
"_view_name": "HTMLView",
|
|||
|
"description": "",
|
|||
|
"description_tooltip": null,
|
|||
|
"layout": "IPY_MODEL_fa055d9f2a9d4a789e9cf3c89e0214e5",
|
|||
|
"placeholder": "",
|
|||
|
"style": "IPY_MODEL_30ecca964a394109ac2ad757e3aec6c0",
|
|||
|
"value": "llama-2-7b-chat.Q4_K_M.gguf: 100%"
|
|||
|
}
|
|||
|
},
|
|||
|
"e409071bff5a4e2d9bf0e9f5cc42231b": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "1.5.0",
|
|||
|
"model_name": "DescriptionStyleModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "1.5.0",
|
|||
|
"_model_name": "DescriptionStyleModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "1.2.0",
|
|||
|
"_view_name": "StyleView",
|
|||
|
"description_width": ""
|
|||
|
}
|
|||
|
},
|
|||
|
"fa055d9f2a9d4a789e9cf3c89e0214e5": {
|
|||
|
"model_module": "@jupyter-widgets/base",
|
|||
|
"model_module_version": "1.2.0",
|
|||
|
"model_name": "LayoutModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/base",
|
|||
|
"_model_module_version": "1.2.0",
|
|||
|
"_model_name": "LayoutModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "1.2.0",
|
|||
|
"_view_name": "LayoutView",
|
|||
|
"align_content": null,
|
|||
|
"align_items": null,
|
|||
|
"align_self": null,
|
|||
|
"border": null,
|
|||
|
"bottom": null,
|
|||
|
"display": null,
|
|||
|
"flex": null,
|
|||
|
"flex_flow": null,
|
|||
|
"grid_area": null,
|
|||
|
"grid_auto_columns": null,
|
|||
|
"grid_auto_flow": null,
|
|||
|
"grid_auto_rows": null,
|
|||
|
"grid_column": null,
|
|||
|
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"overflow_x": null,
|
|||
|
"overflow_y": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
},
|
|||
|
"fb6478ce2dac489bb633b23ba0953c5c": {
|
|||
|
"model_module": "@jupyter-widgets/base",
|
|||
|
"model_module_version": "1.2.0",
|
|||
|
"model_name": "LayoutModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/base",
|
|||
|
"_model_module_version": "1.2.0",
|
|||
|
"_model_name": "LayoutModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "1.2.0",
|
|||
|
"_view_name": "LayoutView",
|
|||
|
"align_content": null,
|
|||
|
"align_items": null,
|
|||
|
"align_self": null,
|
|||
|
"border": null,
|
|||
|
"bottom": null,
|
|||
|
"display": null,
|
|||
|
"flex": null,
|
|||
|
"flex_flow": null,
|
|||
|
"grid_area": null,
|
|||
|
"grid_auto_columns": null,
|
|||
|
"grid_auto_flow": null,
|
|||
|
"grid_auto_rows": null,
|
|||
|
"grid_column": null,
|
|||
|
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"overflow_x": null,
|
|||
|
"overflow_y": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
}
|
|||
|
}
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 0
|
|||
|
}
|