Powering your products with ChatGPT and your own data

The Chatbot Kickstarter is a starter repo to get you used to building a basic Chatbot using the ChatGPT API and your own knowledge base. The flow you're taken through was originally presented with these slides, which may be useful to refer to.

This repo contains one notebook and two basic Streamlit apps:

  • powering_your_products_with_chatgpt_and_your_data.ipynb: A notebook containing a step-by-step process of tokenising, chunking and embedding your data in a vector database, and building simple Q&A and Chatbot functionality on top.
  • search.py: A Streamlit app providing simple Q&A via a search bar to query your knowledge base.
  • chat.py: A Streamlit app providing a simple Chatbot via a search bar to query your knowledge base (a minimal sketch of this shared search-bar pattern follows the list).
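
As a rough idea of what both apps boil down to, here is a minimal, hypothetical Streamlit sketch of that search-bar pattern; answer_question is a stub you would swap for your own retrieval and completion logic, not code taken from this repo.

```python
# Minimal sketch of the search-bar flow shared by search.py and chat.py.
# Illustrative only; the real apps query Redis and the ChatGPT API.
import streamlit as st


def answer_question(query: str) -> str:
    """Stub: replace with your own retrieval + ChatGPT completion logic."""
    return f"(stub answer) You asked: {query}"


st.title("Knowledge base Q&A")
query = st.text_input("Ask a question about your data")

if query:
    with st.spinner("Searching..."):
        st.write(answer_question(query))
```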

To run either version of the app, please follow the instructions in the respective README.md files in the subdirectories.

How it works

The notebook is the best place to start, and is broadly laid out as follows (a condensed sketch of the end-to-end flow appears after the list):

  • Lay the foundations:
    • Set up the vector database to accept vectors and data
    • Load the dataset, chunk the data up for embedding and store in the vector database
  • Make it a product:
    • Add a retrieval step where users provide queries and we return the most relevant entries
    • Summarise search results with GPT-3
    • Test out this basic Q&A app in Streamlit
  • Build your moat:
    • Create an Assistant class to manage context and interact with our bot
    • Use the Chatbot to answer questions using semantic search context
    • Test out this basic Chatbot app in Streamlit
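
The sketch below condenses that flow into a single script, assuming the current openai Python client, example model names, and a small in-memory NumPy index standing in for Redis; the helper names are illustrative and not taken from the notebook.

```python
# Condensed sketch of the notebook's flow: chunk -> embed -> retrieve -> answer.
# Uses an in-memory NumPy index instead of Redis, purely to show the shape of
# each step; model names and helper functions are examples, not the repo's own.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size character chunking; the notebook chunks by tokens."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


# "Lay the foundations": chunk the source data and build a tiny index.
documents = ["<your source text here>"]
chunks = [c for doc in documents for c in chunk(doc)]
index = embed(chunks)


# "Make it a product": retrieve the most relevant chunks for a query...
def retrieve(query: str, k: int = 3) -> list[str]:
    q = embed([query])[0]
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]


# ...then have a chat model answer using only the retrieved context.
def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content


print(answer("What does the dataset say about X?"))
```

The notebook's Assistant class essentially wraps this retrieval step while also managing the ongoing conversation context, which is what turns the one-shot Q&A into a Chatbot.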

Once you've run the notebook and tried the two Streamlit apps, you should be in a position to strip out any useful snippets and start your own Q&A or Chat application.

Limitations

  • This app uses Redis as a vector database, but there are many other options highlighted in ../examples/vector_databases depending on your needs (a minimal sketch of the Redis index shape appears after this list).
  • This is a simple starting point - if you hit issues deploying your use case, you may need to tune (a non-exhaustive list):
    • The prompt and parameters for the model for it to answer accurately
    • Your search to return more relevant results
    • Your chunking/embedding approach to store the most relevant content effectively for retrieval
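
For reference, here is a minimal, hypothetical sketch of what a Redis vector index for this kind of app can look like, using redis-py's search commands against a Redis instance with the RediSearch module loaded (e.g. Redis Stack); the index name, field names and dimensions are illustrative rather than the ones this repo uses.

```python
# Hypothetical Redis vector index sketch (requires RediSearch / Redis Stack).
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Index over hashes prefixed "doc:", with one text field and one vector field.
r.ft("docs").create_index(
    fields=[
        TextField("content"),
        VectorField(
            "embedding",
            "HNSW",
            {"TYPE": "FLOAT32", "DIM": 1536, "DISTANCE_METRIC": "COSINE"},
        ),
    ],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Store one chunk: its text plus the embedding as raw float32 bytes.
vec = np.random.rand(1536).astype(np.float32)  # replace with a real embedding
r.hset("doc:1", mapping={"content": "example chunk", "embedding": vec.tobytes()})

# KNN query: the 3 chunks nearest to a query embedding.
q = (
    Query("*=>[KNN 3 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("content", "score")
    .dialect(2)
)
results = r.ft("docs").search(q, query_params={"vec": vec.tobytes()})
for doc in results.docs:
    print(doc.content, doc.score)
```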