{
"cells": [
{
"cell_type": "markdown",
"id": "63785634",
"metadata": {},
"source": [
"# Power your products with ChatGPT and your own data\n",
"\n",
"This is a walkthrough taking readers through how to build starter Q&A and Chatbot applications using the ChatGPT API and their own data. \n",
"\n",
"It is laid out in these sections:\n",
"- **Setup:** \n",
" - Initiate variables and source the data\n",
"- **Lay the foundations:**\n",
" - Set up the vector database to accept vectors and data\n",
" - Load the dataset, chunk the data up for embedding and store in the vector database\n",
"- **Make it a product:**\n",
" - Add a retrieval step where users provide queries and we return the most relevant entries\n",
" - Summarise search results with GPT-3\n",
" - Test out this basic Q&A app in Streamlit\n",
"- **Build your moat:**\n",
" - Create an Assistant class to manage context and interact with our bot\n",
" - Use the Chatbot to answer questions using semantic search context\n",
" - Test out this basic Chatbot app in Streamlit\n",
" \n",
"Upon completion, you have the building blocks to create your own production chatbot or Q&A application using OpenAI APIs and a vector database.\n",
"\n",
"This notebook was originally presented with [these slides](https://drive.google.com/file/d/1dB-RQhZC_Q1iAsHkNNdkqtxxXqYODFYy/view?usp=share_link), which provide visual context for this journey."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "59f08ea7",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"id": "13649895",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"First we'll setup our libraries and environment variables"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7590fbfc",
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"import os\n",
"import requests\n",
"import numpy as np\n",
"import pandas as pd\n",
"from typing import Iterator\n",
"import tiktoken\n",
"import textract\n",
"from numpy import array, average\n",
"\n",
"from database import get_redis_connection\n",
"\n",
"# Set our default models and chunking size\n",
"from config import COMPLETIONS_MODEL, EMBEDDINGS_MODEL, CHAT_MODEL, TEXT_EMBEDDING_CHUNK_SIZE, VECTOR_FIELD_NAME\n",
"\n",
"# Ignore unclosed SSL socket warnings - optional in case you get these errors\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(action=\"ignore\", message=\"unclosed\", category=ImportWarning)\n",
"warnings.filterwarnings(\"ignore\", category=DeprecationWarning) "
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "760efc1e",
"metadata": {},
"outputs": [],
"source": [
"pd.set_option('display.max_colwidth', 0)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3f90817d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[\"FIA Practice Directions - Competitor's Staff Registration System.pdf\",\n",
" 'fia_2022_formula_1_sporting_regulations_-_issue_9_-_2022-10-19_0.pdf',\n",
" 'fia_2023_formula_1_technical_regulations_-_issue_4_-_2022-12-07.pdf',\n",
" 'fia_f1_power_unit_financial_regulations_issue_1_-_2022-08-16.pdf',\n",
" 'fia_formula_1_financial_regulations_iss.13.pdf']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_dir = os.path.join(os.curdir,'data')\n",
"pdf_files = sorted([x for x in os.listdir(data_dir) if 'DS_Store' not in x])\n",
"pdf_files"
]
},
{
"cell_type": "markdown",
"id": "5dc4018c",
"metadata": {},
"source": [
"## Laying the foundations"
]
},
{
"cell_type": "markdown",
"id": "632b82ed",
"metadata": {},
"source": [
"### Storage\n",
"\n",
"We're going to use Redis as our database for both document contents and the vector embeddings. You will need the full Redis Stack to enable use of Redisearch, which is the module that allows semantic search - more detail is in the [docs for Redis Stack](https://redis.io/docs/stack/get-started/install/docker/).\n",
"\n",
"To set this up locally, you will need to install Docker and then run the following command: ```docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest```.\n",
"\n",
"The code used here draws heavily on [this repo](https://github.com/RedisAI/vecsim-demo).\n",
"\n",
"After setting up the Docker instance of Redis Stack, you can follow the below instructions to initiate a Redis connection and create a Hierarchical Navigable Small World (HNSW) index for semantic search."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "17d6b886",
"metadata": {},
"outputs": [],
"source": [
"# Setup Redis\n",
"from redis import Redis\n",
"from redis.commands.search.query import Query\n",
"from redis.commands.search.field import (\n",
" TextField,\n",
" VectorField,\n",
" NumericField\n",
")\n",
"from redis.commands.search.indexDefinition import (\n",
" IndexDefinition,\n",
" IndexType\n",
")\n",
"\n",
"redis_client = get_redis_connection()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4f3d3e6b",
"metadata": {},
"outputs": [],
"source": [
"# Constants\n",
"VECTOR_DIM = 1536 #len(data['title_vector'][0]) # length of the vectors\n",
"#VECTOR_NUMBER = len(data) # initial number of vectors\n",
"PREFIX = \"sportsdoc\" # prefix for the document keys\n",
"DISTANCE_METRIC = \"COSINE\" # distance metric for the vectors (ex. COSINE, IP, L2)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d3c352ca",
"metadata": {},
"outputs": [],
"source": [
"# Create search index\n",
"\n",
"# Index\n",
"INDEX_NAME = \"f1-index\" # name of the search index\n",
"VECTOR_FIELD_NAME = 'content_vector'\n",
"\n",
"# Define RediSearch fields for each of the columns in the dataset\n",
"# This is where you should add any additional metadata you want to capture\n",
"filename = TextField(\"filename\")\n",
"text_chunk = TextField(\"text_chunk\")\n",
"file_chunk_index = NumericField(\"file_chunk_index\")\n",
"\n",
"# define RediSearch vector fields to use HNSW index\n",
"\n",
"text_embedding = VectorField(VECTOR_FIELD_NAME,\n",
" \"HNSW\", {\n",
" \"TYPE\": \"FLOAT32\",\n",
" \"DIM\": VECTOR_DIM,\n",
" \"DISTANCE_METRIC\": DISTANCE_METRIC\n",
" }\n",
")\n",
"# Add all our field objects to a list to be created as an index\n",
"fields = [filename,text_chunk,file_chunk_index,text_embedding]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a6c78b7e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"redis_client.ping()"
]
},
{
"cell_type": "code",
"execution_count": 304,
"id": "cf3ad41f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Not there yet. Creating\n"
]
}
],
"source": [
"# Optional step to drop the index if it already exists\n",
"#redis_client.ft(INDEX_NAME).dropindex()\n",
"\n",
"# Check if index exists\n",
"try:\n",
" redis_client.ft(INDEX_NAME).info()\n",
" print(\"Index already exists\")\n",
"except Exception as e:\n",
" print(e)\n",
" # Create RediSearch Index\n",
" print('Not there yet. Creating')\n",
" redis_client.ft(INDEX_NAME).create_index(\n",
" fields = fields,\n",
" definition = IndexDefinition(prefix=[PREFIX], index_type=IndexType.HASH)\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "f74ebeb5",
"metadata": {},
"source": [
"### Ingestion\n",
"\n",
"We'll load up our PDFs and do the following\n",
"- Initiate our tokenizer\n",
"- Run a processing pipeline to:\n",
" - Mine the text from each PDF\n",
" - Split them into chunks and embed them\n",
" - Store them in Redis"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "ed23bf9d",
"metadata": {},
"outputs": [],
"source": [
"# The transformers.py file contains all of the transforming functions, including ones to chunk, embed and load data\n",
"# For more details, check the file and work through each function individually\n",
"from transformers import handle_file_string"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "31f299f6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"./data/FIA Practice Directions - Competitor's Staff Registration System.pdf\n",
"./data/fia_2022_formula_1_sporting_regulations_-_issue_9_-_2022-10-19_0.pdf\n",
"./data/fia_2023_formula_1_technical_regulations_-_issue_4_-_2022-12-07.pdf\n",
"./data/fia_f1_power_unit_financial_regulations_issue_1_-_2022-08-16.pdf\n",
"./data/fia_formula_1_financial_regulations_iss.13.pdf\n",
"CPU times: user 3.16 s, sys: 372 ms, total: 3.54 s\n",
"Wall time: 45.8 s\n"
]
}
],
"source": [
"%%time\n",
"# This step takes about 5 minutes\n",
"\n",
"# Initialise tokenizer\n",
"tokenizer = tiktoken.get_encoding(\"cl100k_base\")\n",
"\n",
"# Process each PDF file and prepare for embedding\n",
"for pdf_file in pdf_files:\n",
" \n",
" pdf_path = os.path.join(data_dir,pdf_file)\n",
" print(pdf_path)\n",
" \n",
" # Extract the raw text from each PDF using textract\n",
" text = textract.process(pdf_path, method='pdfminer')\n",
" \n",
" # Chunk each document, embed the contents and load to Redis\n",
" handle_file_string((pdf_file,text.decode(\"utf-8\")),tokenizer,redis_client,VECTOR_FIELD_NAME,INDEX_NAME)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "22aff597",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'829'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check that our docs have been inserted\n",
"redis_client.ft(INDEX_NAME).info()['num_docs']"
]
},
{
"cell_type": "markdown",
"id": "6b12cb6e",
"metadata": {},
"source": [
"## Make it a product\n",
"\n",
"Now we can test that our search works as intended by:\n",
"- Querying our data in Redis using semantic search and verifying results\n",
"- Adding a step to pass the results to GPT-3 for summarisation"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "e921ac96",
"metadata": {},
"outputs": [],
"source": [
"from database import get_redis_results"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "cb9dfacf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5.85 ms, sys: 2.61 ms, total: 8.45 ms\n",
"Wall time: 240 ms\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>result</th>\n",
" <th>certainty</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>The IT will, therefore, be competent to establish the existence, or not, of a breach of the FIA regulations and to impose any sanction upon the person and Competitor concerned (see the process governed by the FIA Judicial and Disciplinary Rules). The President of the FIA, in its capacity as prosecuting authority, will ask, in respect of every disciplinary procedure: - - for the imposition of a suspension upon Competitors Staff Certificate of Registration holders who have contravened the FIA Code of Good Standing or the withdrawal of the Competitors Staff Certificate of Registration (any withdrawal can only be imposed for the remaining period of the current season of the FIA Formula One World Championship) and that these same people not be fined. The person and/or Competitor sanctioned may bring an appeal before the ICA against the ITs decision. ********* The FIA will inform the relevant Competitor of any proceedings instigated against any member of its staff. It is the responsibility to the relevant Competitor to send the IT a written request to be heard, and if granted, it shall be permitted to submit written observations. The FIA undertakes to support before the IT and/or the ICA any request from the Competitor to intervene as a third party within the framework of a disciplinary procedure. The right to deprive any duly registered member of a Competitors staff of access to the Reserved Areas at events forming part of the FIA Formula One World Championship is subject to the procedure set forth in the FIA Judicial and Disciplinary Rules. The Stewards during the course of an Event or otherwise will have no authority to suspend or withdraw a Competitors Staff Certificate of Registration for any breach or alleged breach of the FIA Code of Good Standing.</td>\n",
" <td>0.205749571323</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>The following sets out examples of the type of behaviours which might constitute an infringement of the FIA Code of Good Standing (non-exhaustive list of examples) in relation to a person who is subject to the Code of Good Standing: - - - - giving instructions to a driver or other member of a Competitors staff with the intention or with the likely result of causing an accident, collision or crash or a race to be stopped or suspended any action which is likely to endanger or materially compromise the safety of any driver, other members of the Competitors staff, other participants in a race, Officials or any spectators or other members of the public who attend an event giving instructions to make any changes to a car in breach of any safety requirements or regulations giving instructions to tamper with or adversely affect the set-up or performance of the car of any other Competitor 4 / 5 \f",
"FIA Legal Department Practice Directions - Competitors Staff Registration System 17 March 2011 - - giving instructions to a driver or otherwise taking any action by which the result or course of a race may be influenced or affected for the purpose of profiting or assisting someone to profit through betting on the outcome of a race or any part of a race or being convicted of a criminal offence (other than a driving offence) which carries a maximum prison sentence of five years. VII. AMENDMENTS TO THE COMPETITORS STAFF REGISTRATION SYSTEM The FIA will not make any amendments with regard to the Competitors Staff Registration System, either to the International Sporting Code or to the Practice Directions, prior consultation with the Competitors entered in the FIA Formula One World Championship and adequate opportunity to provide input on the proposed amendments.</td>\n",
" <td>0.206525266171</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id \\\n",
"0 0 \n",
"1 1 \n",
"\n",
" result \\\n",
"0 The IT will, therefore, be competent to establish the existence, or not, of a breach of the FIA regulations and to impose any sanction upon the person and Competitor concerned (see the process governed by the FIA Judicial and Disciplinary Rules). The President of the FIA, in its capacity as prosecuting authority, will ask, in respect of every disciplinary procedure: - - for the imposition of a suspension upon Competitors Staff Certificate of Registration holders who have contravened the FIA Code of Good Standing or the withdrawal of the Competitors Staff Certificate of Registration (any withdrawal can only be imposed for the remaining period of the current season of the FIA Formula One World Championship) and that these same people not be fined. The person and/or Competitor sanctioned may bring an appeal before the ICA against the ITs decision. ********* The FIA will inform the relevant Competitor of any proceedings instigated against any member of its staff. It is the responsibility to the relevant Competitor to send the IT a written request to be heard, and if granted, it shall be permitted to submit written observations. The FIA undertakes to support before the IT and/or the ICA any request from the Competitor to intervene as a third party within the framework of a disciplinary procedure. The right to deprive any duly registered member of a Competitors staff of access to the Reserved Areas at events forming part of the FIA Formula One World Championship is subject to the procedure set forth in the FIA Judicial and Disciplinary Rules. The Stewards during the course of an Event or otherwise will have no authority to suspend or withdraw a Competitors Staff Certificate of Registration for any breach or alleged breach of the FIA Code of Good Standing. \n",
"1 The following sets out examples of the type of behaviours which might constitute an infringement of the FIA Code of Good Standing (non-exhaustive list of examples) in relation to a person who is subject to the Code of Good Standing: - - - - giving instructions to a driver or other member of a Competitors staff with the intention or with the likely result of causing an accident, collision or crash or a race to be stopped or suspended any action which is likely to endanger or materially compromise the safety of any driver, other members of the Competitors staff, other participants in a race, Officials or any spectators or other members of the public who attend an event giving instructions to make any changes to a car in breach of any safety requirements or regulations giving instructions to tamper with or adversely affect the set-up or performance of the car of any other Competitor 4 / 5 \n",
"FIA Legal Department Practice Directions - Competitors Staff Registration System 17 March 2011 - - giving instructions to a driver or otherwise taking any action by which the result or course of a race may be influenced or affected for the purpose of profiting or assisting someone to profit through betting on the outcome of a race or any part of a race or being convicted of a criminal offence (other than a driving offence) which carries a maximum prison sentence of five years. VII. AMENDMENTS TO THE COMPETITORS STAFF REGISTRATION SYSTEM The FIA will not make any amendments with regard to the Competitors Staff Registration System, either to the International Sporting Code or to the Practice Directions, prior consultation with the Competitors entered in the FIA Formula One World Championship and adequate opportunity to provide input on the proposed amendments. \n",
"\n",
" certainty \n",
"0 0.205749571323 \n",
"1 0.206525266171 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"f1_query='what are the criteria for disqualification'\n",
"\n",
"result_df = get_redis_results(redis_client,f1_query,index_name=INDEX_NAME)\n",
"result_df.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "51340903",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"- Breach of FIA regulations can lead to disqualification\n",
"- Imposed sanctions may include suspension of Competitor's Staff Certificate of Registration\n",
"- Competitor's Staff Certificate of Registration may also be withdrawn for remaining duration of the season\n",
"- Competitor has right to appeal the IT's decision\n",
"- Competitor must make written request to be heard\n",
"- FIA will support Competitor's request\n",
"- Stewards have no authority to suspend or withdraw Competitor's Staff Certificate of Registration for any breach or alleged breach of the FIA Code of Good Standing\n"
]
}
],
"source": [
"# Build a prompt to provide the original query, the result and ask to summarise for the user\n",
"summary_prompt = '''Summarise this result in a bulleted list to answer the search query a customer has sent.\n",
"Search query: SEARCH_QUERY_HERE\n",
"Search result: SEARCH_RESULT_HERE\n",
"Summary:\n",
"'''\n",
"summary_prepped = summary_prompt.replace('SEARCH_QUERY_HERE',f1_query).replace('SEARCH_RESULT_HERE',result_df['result'][0])\n",
"summary = openai.Completion.create(engine=COMPLETIONS_MODEL,prompt=summary_prepped,max_tokens=500)\n",
"# Response provided by GPT-3\n",
"print(summary['choices'][0]['text'])"
]
},
{
"cell_type": "markdown",
"id": "d008ff23",
"metadata": {},
"source": [
"### Search\n",
"\n",
"Now that we've got our knowledge embedded and stored in Redis, we can now create an internal search application. Its not sophisticated but it'll get the job done for us.\n",
"\n",
"In the directory containing this app, execute ```streamlit run search.py```. This will open up a Streamlit app in your browser where you can ask questions of your embedded data.\n",
"\n",
"__Example Questions__:\n",
"- what is the cost cap for a power unit in 2023\n",
"- what should competitors include on their application form"
]
},
{
"cell_type": "markdown",
"id": "dd12b31e",
"metadata": {},
"source": [
"## Build your moat\n",
"\n",
"The Q&A was useful, but fairly limited in the complexity of interaction we can have - if the user asks a sub-optimal question, there is no assistance from the system to prompt them for more info or conversation to lead them down the right path.\n",
"\n",
"For the next step we'll make a Chatbot using the Chat Completions endpoint, which will:\n",
"- Be given instructions on how it should act and what the goals of its users are\n",
"- Be supplied some required information that it needs to collect\n",
"- Go back and forth with the customer until it has populated that information\n",
"- Say a trigger word that will kick off semantic search and summarisation of the response\n",
"\n",
"For more details on our Chat Completions endpoint and how to interact with it, please check out the docs [here](https://platform.openai.com/docs/guides/chat)."
]
},
{
"cell_type": "markdown",
"id": "34135886",
"metadata": {},
"source": [
"### Framework\n",
"\n",
"This section outlines a basic framework for working with the API and storing context of previous conversation \"turns\". Once this is established, we'll extend it to use our retrieval endpoint."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "45c0acc8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"assistant: \n",
"\n",
"As an AI language model developed by OpenAI, I can help you in several ways. Here are some examples:\n",
"\n",
"\n",
"- Answer questions: You can ask me any questions you have, and I'll do my best to provide accurate and helpful answers. I can help you with anything from math problems to historical facts to recommendations for restaurants or movies.\n",
"\n",
"- Assist in writing: Whether you need help with grammar, sentence structure, or word choice, I can assist you in writing. I can also help you generate ideas for writing assignments and provide writing prompts for inspiration.\n",
"\n",
"- Provide information: If you need information on a specific topic or want to learn more about a particular subject, I can provide you with useful information and resources.\n",
"\n",
"- Give suggestions: If you're unsure about something or need advice, I can give you suggestions and recommendations based on my knowledge and experience.\n",
"\n",
"- Chat and entertain: If you're feeling bored or lonely, I can chat with you and try to entertain you with jokes, stories, or interesting facts.\n"
]
}
],
"source": [
"# A basic example of how to interact with our ChatCompletion endpoint\n",
"# It requires a list of \"messages\", consisting of a \"role\" (one of system, user or assistant) and \"content\"\n",
"question = 'How can you help me'\n",
"\n",
"\n",
"completion = openai.ChatCompletion.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": question}\n",
" ]\n",
")\n",
"print(f\"{completion['choices'][0]['message']['role']}: {completion['choices'][0]['message']['content']}\")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "23e4fc55",
"metadata": {},
"outputs": [],
"source": [
"from termcolor import colored\n",
"\n",
"# A basic class to create a message as a dict for chat\n",
"class Message:\n",
" \n",
" \n",
" def __init__(self,role,content):\n",
" \n",
" self.role = role\n",
" self.content = content\n",
" \n",
" def message(self):\n",
" \n",
" return {\"role\": self.role,\"content\": self.content}\n",
" \n",
"# Our assistant class we'll use to converse with the bot\n",
"class Assistant:\n",
" \n",
" def __init__(self):\n",
" self.conversation_history = []\n",
"\n",
" def _get_assistant_response(self, prompt):\n",
" \n",
" try:\n",
" completion = openai.ChatCompletion.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=prompt\n",
" )\n",
" \n",
" response_message = Message(completion['choices'][0]['message']['role'],completion['choices'][0]['message']['content'])\n",
" return response_message.message()\n",
" \n",
" except Exception as e:\n",
" \n",
" return f'Request failed with exception {e}'\n",
"\n",
" def ask_assistant(self, next_user_prompt, colorize_assistant_replies=True):\n",
" [self.conversation_history.append(x) for x in next_user_prompt]\n",
" assistant_response = self._get_assistant_response(self.conversation_history)\n",
" self.conversation_history.append(assistant_response)\n",
" return assistant_response\n",
" \n",
" \n",
" def pretty_print_conversation_history(self, colorize_assistant_replies=True):\n",
" for entry in self.conversation_history:\n",
" if entry['role'] == 'system':\n",
" pass\n",
" else:\n",
" prefix = entry['role']\n",
" content = entry['content']\n",
" output = colored(prefix +':\\n' + content, 'green') if colorize_assistant_replies and entry['role'] == 'assistant' else prefix +':\\n' + content\n",
" print(output)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "e18c88b4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'role': 'system',\n",
" 'content': 'You are a helpful business assistant who has innovative ideas'},\n",
" {'role': 'user', 'content': 'What can you do to help me'}]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Initiate our Assistant class\n",
"conversation = Assistant()\n",
"\n",
"# Create a list to hold our messages and insert both a system message to guide behaviour and our first user question\n",
"messages = []\n",
"system_message = Message('system','You are a helpful business assistant who has innovative ideas')\n",
"user_message = Message('user','What can you do to help me')\n",
"messages.append(system_message.message())\n",
"messages.append(user_message.message())\n",
"messages"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "377243c9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"As a business assistant with innovative ideas, I can:\n",
"\n",
"1. Conduct market research: I can carry out thorough research on your target market, analyze consumer behavior, and gather insights to help you make informed decisions about your business.\n",
"\n",
"2. Develop a strategic plan: I can help you come up with a comprehensive business plan that includes a clear vision, goals, and actionable steps to help you achieve your business objectives.\n",
"\n",
"3. Create a strong online presence: I can help you create a compelling website, develop a social media strategy, and create engaging online content to help you connect with your customers.\n",
"\n",
"4. Improve customer service: I can help you identify areas where your customer service needs improvement and help you develop strategies for how to provide excellent service to your customers.\n",
"\n",
"5. Optimize your operations: I can help you streamline your business processes, identify inefficiencies, and create systems that will improve productivity, reduce costs, and increase profitability.\n",
"\n",
"6. Explore new revenue streams: I can help you identify new opportunities for revenue generation, such as introducing new products or services, or expanding into new markets.\n",
"\n",
"7. Develop innovative marketing: I can help you come up with innovative marketing strategies that will help you stand out from competitors and resonate with your target audience.\n"
]
}
],
"source": [
"# Get back a response from the Chatbot to our question\n",
"response_message = conversation.ask_assistant(messages)\n",
"print(response_message['content'])"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "f364c3b5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Developing a strategic plan is one of the most critical aspects of running a successful business. A strategic plan outlines a clear vision for your business and provides a roadmap for achieving your goals. As a business assistant, I can help you develop a comprehensive strategic plan that includes:\n",
"\n",
"1. A clear and concise vision statement that defines what you want to achieve with your business.\n",
"\n",
"2. An analysis of your competition and the market environment to identify opportunities and threats.\n",
"\n",
"3. A SWOT analysis to identify your business's strengths, weaknesses, opportunities, and threats.\n",
"\n",
"4. A clear set of objectives and goals that are specific, measurable, attainable, relevant, and time-bound.\n",
"\n",
"5. An action plan that outlines the steps you need to take to achieve your goals.\n",
"\n",
"6. A timeline that details when each objective will be completed.\n",
"\n",
"7. A monitoring and evaluation plan to keep track of your progress and identify areas that need improvement.\n",
"\n",
"By working together, we can develop a strategic plan that aligns with your long-term goals and helps you grow your business.\n"
]
}
],
"source": [
"next_question = 'Tell me more about option 2'\n",
"\n",
"# Initiate a fresh messages list and insert our next question\n",
"messages = []\n",
"user_message = Message('user',next_question)\n",
"messages.append(user_message.message())\n",
"response_message = conversation.ask_assistant(messages)\n",
"print(response_message['content'])"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "f62842a1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"user:\n",
"What can you do to help me\n",
"\u001b[32massistant:\n",
"As a business assistant with innovative ideas, I can:\n",
"\n",
"1. Conduct market research: I can carry out thorough research on your target market, analyze consumer behavior, and gather insights to help you make informed decisions about your business.\n",
"\n",
"2. Develop a strategic plan: I can help you come up with a comprehensive business plan that includes a clear vision, goals, and actionable steps to help you achieve your business objectives.\n",
"\n",
"3. Create a strong online presence: I can help you create a compelling website, develop a social media strategy, and create engaging online content to help you connect with your customers.\n",
"\n",
"4. Improve customer service: I can help you identify areas where your customer service needs improvement and help you develop strategies for how to provide excellent service to your customers.\n",
"\n",
"5. Optimize your operations: I can help you streamline your business processes, identify inefficiencies, and create systems that will improve productivity, reduce costs, and increase profitability.\n",
"\n",
"6. Explore new revenue streams: I can help you identify new opportunities for revenue generation, such as introducing new products or services, or expanding into new markets.\n",
"\n",
"7. Develop innovative marketing: I can help you come up with innovative marketing strategies that will help you stand out from competitors and resonate with your target audience.\u001b[0m\n",
"user:\n",
"Tell me more about option 2\n",
"\u001b[32massistant:\n",
"Developing a strategic plan is one of the most critical aspects of running a successful business. A strategic plan outlines a clear vision for your business and provides a roadmap for achieving your goals. As a business assistant, I can help you develop a comprehensive strategic plan that includes:\n",
"\n",
"1. A clear and concise vision statement that defines what you want to achieve with your business.\n",
"\n",
"2. An analysis of your competition and the market environment to identify opportunities and threats.\n",
"\n",
"3. A SWOT analysis to identify your business's strengths, weaknesses, opportunities, and threats.\n",
"\n",
"4. A clear set of objectives and goals that are specific, measurable, attainable, relevant, and time-bound.\n",
"\n",
"5. An action plan that outlines the steps you need to take to achieve your goals.\n",
"\n",
"6. A timeline that details when each objective will be completed.\n",
"\n",
"7. A monitoring and evaluation plan to keep track of your progress and identify areas that need improvement.\n",
"\n",
"By working together, we can develop a strategic plan that aligns with your long-term goals and helps you grow your business.\u001b[0m\n"
]
}
],
"source": [
"# Print out a log of our conversation so far\n",
"\n",
"conversation.pretty_print_conversation_history()"
]
},
{
"cell_type": "markdown",
"id": "f18d5b54",
"metadata": {},
"source": [
"### Knowledge retrieval\n",
"\n",
"Now we'll extend the class to call a downstream service when the Chatbot outputs a stop sequence.\n",
"\n",
"The main changes are:\n",
"- A more comprehensive system message, giving the Chatbot criteria for advancing the conversation\n",
"- An explicit stop sequence (the phrase \"searching for answers\") for the Chatbot to use once it has the information it needs\n",
"- A new function ```_get_search_results``` which retrieves search results from Redis"
]
},
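{
"cell_type": "markdown",
"id": "3d9a7c21",
"metadata": {},
"source": [
"As a rough illustration of the stop-sequence idea described above, the sketch below routes a reply either straight back to the user or through a search step. It is a simplified stand-in rather than the class used in this notebook: `TRIGGER_PHRASE`, `needs_retrieval`, `route_reply` and `fake_search` are hypothetical names introduced here, and `fake_search` only stands in for the Redis lookup.\n",
"\n",
"```python\n",
"TRIGGER_PHRASE = 'searching for answers'\n",
"\n",
"def needs_retrieval(assistant_reply: dict) -> bool:\n",
"    # True when the reply contains the trigger phrase (case-insensitive)\n",
"    return TRIGGER_PHRASE in assistant_reply['content'].lower()\n",
"\n",
"def route_reply(assistant_reply: dict, search_fn) -> dict:\n",
"    # Ordinary replies pass through; trigger replies are swapped for a retrieved answer\n",
"    if needs_retrieval(assistant_reply):\n",
"        return {'role': 'assistant', 'content': search_fn()}\n",
"    return assistant_reply\n",
"\n",
"# Hypothetical stand-in for the Redis lookup, used only for illustration\n",
"def fake_search() -> str:\n",
"    return 'stubbed search result'\n",
"\n",
"clarifying = {'role': 'assistant', 'content': 'Certainly, what year would you like this for?'}\n",
"triggered = {'role': 'assistant', 'content': 'Searching for answers.'}\n",
"\n",
"print(route_reply(clarifying, fake_search)['content'])  # the clarifying question, unchanged\n",
"print(route_reply(triggered, fake_search)['content'])   # stubbed search result\n",
"```\n",
"\n",
"Matching on a lowercased substring keeps the check tolerant of capitalisation and trailing punctuation in the Chatbot's reply."
]
},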
{
"cell_type": "code",
"execution_count": 76,
"id": "8a0cef87",
"metadata": {},
"outputs": [],
"source": [
"# Updated system prompt requiring Question and Year to be extracted from the user\n",
"system_prompt = '''\n",
"You are a helpful Formula 1 knowledge base assistant. You need to capture a Question and Year from each customer.\n",
"The Question is their query on Formula 1, and the Year is the year of the applicable Formula 1 season.\n",
"If they haven't provided the Year, ask them for it again.\n",
"Once you have the Year, say \"searching for answers\".\n",
"\n",
"Example 1:\n",
"\n",
"User: I'd like to know the cost cap for a power unit\n",
"\n",
"Assistant: Certainly, what year would you like this for?\n",
"\n",
"User: 2023 please.\n",
"\n",
"Assistant: Searching for answers.\n",
"'''\n",
"\n",
"# New Assistant class to add a vector database call to its responses\n",
"class RetrievalAssistant:\n",
"\n",
"    def __init__(self):\n",
"        self.conversation_history = []\n",
"\n",
"    def _get_assistant_response(self, prompt):\n",
"\n",
"        try:\n",
"            completion = openai.ChatCompletion.create(\n",
"                model=CHAT_MODEL,\n",
"                messages=prompt,\n",
"                temperature=0.1\n",
"            )\n",
"\n",
"            response_message = Message(completion['choices'][0]['message']['role'],completion['choices'][0]['message']['content'])\n",
"            return response_message.message()\n",
"\n",
"        except Exception as e:\n",
"\n",
"            # Return a message dict so callers can still read ['content'] when the request fails\n",
"            return {'role': 'assistant', 'content': f'Request failed with exception {e}'}\n",
"\n",
"    # The function to retrieve Redis search results\n",
"    def _get_search_results(self, prompt):\n",
"        latest_question = prompt\n",
"        search_content = get_redis_results(redis_client,latest_question,INDEX_NAME)['result'][0]\n",
"        return search_content\n",
"\n",
"\n",
"    def ask_assistant(self, next_user_prompt):\n",
"        self.conversation_history.extend(next_user_prompt)\n",
"        assistant_response = self._get_assistant_response(self.conversation_history)\n",
"\n",
"        # Answer normally unless the Chatbot uses the trigger phrase \"searching for answers\"\n",
"        if 'searching for answers' in assistant_response['content'].lower():\n",
"            question_extract = openai.Completion.create(model=COMPLETIONS_MODEL,prompt=f\"Extract the user's latest question and the year for that question from this conversation: {self.conversation_history}. Extract it as a sentence stating the Question and Year\")\n",
"            search_result = self._get_search_results(question_extract['choices'][0]['text'])\n",
"\n",
"            # We insert an extra system prompt here to give the Chatbot fresh context on how to use the Redis results\n",
"            # In this instance we add it to the conversation history, but in production it may be better to hide it\n",
"            self.conversation_history.insert(-1,{\"role\": 'system',\"content\": f\"Answer the user's question using this content: {search_result}. If you cannot answer the question, say 'Sorry, I don't know the answer to this one'\"})\n",
"\n",
"            assistant_response = self._get_assistant_response(self.conversation_history)\n",
"            print(next_user_prompt)\n",
"            print(assistant_response)\n",
"            self.conversation_history.append(assistant_response)\n",
"            return assistant_response\n",
"        else:\n",
"            self.conversation_history.append(assistant_response)\n",
"            return assistant_response\n",
"\n",
"\n",
"    def pretty_print_conversation_history(self, colorize_assistant_replies=True):\n",
"        for entry in self.conversation_history:\n",
"            # System messages are internal instructions, so they are left out of the log\n",
"            if entry['role'] == 'system':\n",
"                pass\n",
"            else:\n",
"                prefix = entry['role']\n",
"                content = entry['content']\n",
"                output = colored(prefix +':\\n' + content, 'green') if colorize_assistant_replies and entry['role'] == 'assistant' else prefix +':\\n' + content\n",
"                print(output)"
]
},
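{
"cell_type": "markdown",
"id": "7b41e0f6",
"metadata": {},
"source": [
"The `insert(-1, ...)` call in `ask_assistant` is easy to misread, so here is a small standalone sketch of what it does to the message order. The history and the `context` message below are made-up placeholders, not real Redis results. At the point of the call the history ends with the latest user message, because the trigger reply itself is never appended.\n",
"\n",
"```python\n",
"# Toy conversation history; the contents are illustrative only\n",
"history = [\n",
"    {'role': 'system', 'content': 'You are a helpful assistant.'},\n",
"    {'role': 'user', 'content': 'How can a competitor be disqualified'},\n",
"    {'role': 'assistant', 'content': 'Sure, what year would you like this information for?'},\n",
"    {'role': 'user', 'content': 'For 2023 please.'},\n",
"]\n",
"\n",
"# Hypothetical retrieved context standing in for a Redis result\n",
"context = {'role': 'system', 'content': 'Answer using this content: <search result>'}\n",
"\n",
"# insert(-1, x) places x before the current last element,\n",
"# so the latest user message stays at the end of the history\n",
"history.insert(-1, context)\n",
"\n",
"print([m['role'] for m in history])\n",
"# ['system', 'user', 'assistant', 'system', 'user']\n",
"```\n",
"\n",
"Keeping the user's message last means the model sees the retrieved context immediately before the question it is expected to answer."
]
},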
{
"cell_type": "code",
"execution_count": 77,
"id": "101d502c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'role': 'assistant',\n",
" 'content': 'Sure, what year would you like this information for?'}"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation = RetrievalAssistant()\n",
"messages = []\n",
"system_message = Message('system',system_prompt)\n",
"user_message = Message('user','How can a competitor be disqualified from competition')\n",
"messages.append(system_message.message())\n",
"messages.append(user_message.message())\n",
"response_message = conversation.ask_assistant(messages)\n",
"response_message"
]
},
{
"cell_type": "code",
"execution_count": 78,
"id": "702eb4fc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'role': 'user', 'content': 'For 2023 please.'}]\n",
"{'role': 'assistant', 'content': 'According to the FIA Sporting Regulations for the 2023 Formula One season, a competitor can be disqualified from the competition if they breach the FIA regulations. The FIA will investigate and establish the existence of the breach and impose any sanction upon the person and competitor concerned. The President of the FIA, in its capacity as prosecuting authority, will ask for the imposition of a suspension upon Competitors Staff Certificate of Registration holders who have contravened the FIA Code of Good Standing or the withdrawal of the Competitors Staff Certificate of Registration. The person and/or competitor sanctioned may bring an appeal before the ICA against the ITs decision.'}\n"
]
}
],
"source": [
"messages = []\n",
"user_message = Message('user','For 2023 please.')\n",
"messages.append(user_message.message())\n",
"response_message = conversation.ask_assistant(messages)\n",
"#response_message"
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "e2f2c812",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"user:\n",
"How can a competitor be disqualified from competition\n",
"\u001b[32massistant:\n",
"Sure, what year would you like this information for?\u001b[0m\n",
"user:\n",
"For 2023 please.\n",
"\u001b[32massistant:\n",
"According to the FIA Sporting Regulations for the 2023 Formula One season, a competitor can be disqualified from the competition if they breach the FIA regulations. The FIA will investigate and establish the existence of the breach and impose any sanction upon the person and competitor concerned. The President of the FIA, in its capacity as prosecuting authority, will ask for the imposition of a suspension upon Competitors Staff Certificate of Registration holders who have contravened the FIA Code of Good Standing or the withdrawal of the Competitors Staff Certificate of Registration. The person and/or competitor sanctioned may bring an appeal before the ICA against the ITs decision.\u001b[0m\n"
]
}
],
"source": [
"conversation.pretty_print_conversation_history()"
]
},
{
"cell_type": "markdown",
"id": "a9f9ef37",
"metadata": {},
"source": [
"### Chatbot\n",
"\n",
"Now we'll put all this into action with a real (basic) Chatbot.\n",
"\n",
"In the directory containing this app, execute ```streamlit run chat.py```. This will open up a Streamlit app in your browser where you can ask questions of your embedded data. \n",
"\n",
"__Example Questions__:\n",
"- what is the cost cap for a power unit in 2023\n",
"- what should competitors include on their application form\n",
"- how can a competitor be disqualified"
]
},
{
"cell_type": "markdown",
"id": "b8e6c4ca",
"metadata": {},
"source": [
"### Consolidation\n",
"\n",
"Over the course of this notebook you have:\n",
"- Laid the foundations of your product by embedding a knowledge base\n",
"- Created a Q&A application to serve basic use cases\n",
"- Extended this to be an interactive Chatbot\n",
"\n",
"These are the foundational building blocks of any Q&A or Chat application built on our APIs. They are your starting point, and we look forward to seeing what you build with them!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "oaibook",
"language": "python",
"name": "oaibook"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}