diff --git a/examples/third_party/Automated_Finetuning_with_MindsDB.ipynb b/examples/third_party/Automated_Finetuning_with_MindsDB.ipynb new file mode 100644 index 00000000..1a72971f --- /dev/null +++ b/examples/third_party/Automated_Finetuning_with_MindsDB.ipynb @@ -0,0 +1,362 @@ +{ + "cells": [ + { + "attachments": { + "a746bc32-a593-4599-9ee0-86b4daefdd95.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "48a102bd-9fac-4422-a966-64c562fddcc9", + "metadata": {}, + "source": [ + "![image.png](attachment:a746bc32-a593-4599-9ee0-86b4daefdd95.png)" + ] + }, + { + "cell_type": "markdown", + "id": "dc004d0f-9a2e-41a4-9902-749bf5f45465", + "metadata": {}, + "source": [ + "[MindsDB](https://github.com/mindsdb/mindsdb) is the platform for customizing AI from enterprise data. With MindsDB and its nearly 200 integrations to data sources and AI/ML frameworks, any developer can deploy, serve, and fine-tune models in real time, and build AI-powered applications." + ] + }, + { + "cell_type": "markdown", + "id": "fd1a412b-bfa7-48c2-a92f-cdf81341e064", + "metadata": {}, + "source": [ + "[MindsDB integrates with OpenAI](https://docs.mindsdb.com/integrations/ai-engines/openai), enabling users to deploy, serve, and fine-tune OpenAI models within MindsDB, making them accessible to numerous data sources.\n", + "\n", + "In this example, we are going to teach an OpenAI model how to write MindsDB AI SQL queries." + ] + }, + { + "cell_type": "markdown", + "id": "d4f62724-f89d-418e-a3e5-6fe334940d51", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "Before proceeding, ensure the following prerequisites are met:\n", + "\n", + "- Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker).\n", + "- Obtain the OpenAI API key required to deploy and use OpenAI models within MindsDB." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a123da53-94c8-4137-af9b-07dd99484fe0", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "MindsDB provides the [OpenAI handler](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/openai_handler) that enables you to create OpenAI models within MindsDB.\n", + "\n", + "## AI Engine\n", + "\n", + "Before creating a model, it is required to create an AI engine based on the provided handler." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa7ec23f-37d3-4b6b-9884-410d1a399825", + "metadata": {}, + "outputs": [], + "source": [ + "CREATE ML_ENGINE openai_engine\n", + "FROM openai\n", + "USING\n", + " openai_api_key = 'your-openai-api-key';" + ] + }, + { + "cell_type": "markdown", + "id": "e055ce51-c94f-4adf-aeee-60da2f82e08a", + "metadata": {}, + "source": [ + "## AI Model" + ] + }, + { + "cell_type": "markdown", + "id": "cb50d7f1-d661-42b8-8a9a-144d3ee32d72", + "metadata": {}, + "source": [ + "Then, create a model to answer questions about MindsDB’s custom SQL syntax using this engine:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "52e5fe9e-8192-4118-ab0d-9240bff73ac1", + "metadata": {}, + "outputs": [], + "source": [ + "CREATE MODEL openai_davinci\n", + "PREDICT completion\n", + "USING\n", + " engine = 'openai_engine',\n", + " model_name = 'davinci-002',\n", + " prompt_template = 'Return a valid SQL string for the following question about MindsDB in-database machine learning: {{prompt}}';" + ] + }, + { + "cell_type": "markdown", + "id": "235057a2-45db-456f-9268-5c055e4050a6", + "metadata": {}, + "source": [ + "You can check model status with this command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e06cb6b9-8ace-43b5-b392-465bdabef80c", + "metadata": {}, + "outputs": [], + "source": [ + "DESCRIBE openai_davinci;" + ] + }, + { + "cell_type": "markdown", + "id": "b07b55a4-9427-420d-b793-f0f85b5f0201", + 
"metadata": {}, + "source": [ + "Once the status is complete, we can query for predictions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13185a09-20d4-4a29-ac68-eb144a5ab67d", + "metadata": {}, + "outputs": [], + "source": [ + "SELECT prompt, completion\n", + "FROM openai_davinci as m\n", + "WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'\n", + "USING max_tokens=400;" + ] + }, + { + "cell_type": "markdown", + "id": "64e0d76b-1881-4f93-a152-b2651c08b2f1", + "metadata": {}, + "source": [ + "On execution, we get:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc3fc557-1aec-4f3c-81de-5fffecbe0d6e", + "metadata": {}, + "outputs": [], + "source": [ + "+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+\n", + "| prompt | completion |\n", + "+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+\n", + "| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? 
| The SQL syntax is: SELECT * FROM input_data INNER JOIN predictions ON input_data.id = predictions.id |\n", + "+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+" + ] + }, + { + "cell_type": "markdown", + "id": "40c01e12-9f7d-4907-af7c-65d53cafcda9", + "metadata": {}, + "source": [ + "## Finetune Model" + ] + }, + { + "cell_type": "markdown", + "id": "16796956-7b36-43ce-908d-c2851c7050d1", + "metadata": {}, + "source": [ + "Now, we’ll fine-tune our model using a table that stores details about MindsDB’s custom SQL syntax.\n", + "\n", + "[Upload](https://docs.mindsdb.com/mindsdb_sql/sql/create/file) this [dataset](https://github.com/mindsdb/mindsdb/blob/staging/docs/use-cases/automated_finetuning/data.csv) as a file which you can use as a table." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bc394180-1cc0-4494-88b2-29e05a4af7d1", + "metadata": {}, + "outputs": [], + "source": [ + "FINETUNE openai_davinci\n", + "FROM files\n", + " (SELECT prompt, completion FROM openai_learninghub_ft);" + ] + }, + { + "cell_type": "markdown", + "id": "56d6adb5-8adc-4198-882f-56d2cd251922", + "metadata": {}, + "source": [ + "The `FINETUNE` command creates a new version of the openai_davinci model. You can query all available versions as below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "410063cc-9b64-462f-bc16-17b5c0b1583c", + "metadata": {}, + "outputs": [], + "source": [ + "SELECT *\n", + "FROM models_versions\n", + "WHERE name = 'openai_davinci';" + ] + }, + { + "cell_type": "markdown", + "id": "3fc43552-7dbd-44da-ba96-6ce626ceefa7", + "metadata": {}, + "source": [ + "Once the new version status is complete and active, we can query the model again, expecting a more accurate output." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c513479e-b8e0-4d6d-b3fd-a20c23c52ab7", + "metadata": {}, + "outputs": [], + "source": [ + "SELECT prompt, completion\n", + "FROM openai_davinci as m\n", + "WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'\n", + "USING max_tokens=400;" + ] + }, + { + "cell_type": "markdown", + "id": "3e9ece4a-09b8-440c-a9c7-c6f3401f4849", + "metadata": {}, + "source": [ + "On execution you get:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b3ebaaf-d95a-496a-8318-9ad6f1e8d150", + "metadata": {}, + "outputs": [], + "source": [ + "+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+\n", + "| prompt | completion |\n", + "+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+\n", + "| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? 
| SELECT * FROM mindsdb.models.my_model JOIN mindsdb.input_data_name; |\n", + "+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+" + ] + }, + { + "cell_type": "markdown", + "id": "2cc7e120-6195-4b89-b673-6766febf5463", + "metadata": {}, + "source": [ + "If you have dynamic data that gets updated regularly, you can set up automated fine-tuning as shown below.\n", + "\n", + "Note that the data source must contain an incremental column, such as a timestamp or an integer, so MindsDB can pick up only the recently added data.\n", + "\n", + "Create a view to store recently added data with the help of the `LAST` keyword:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "178e16e6-6243-4414-801f-a024fd7d3ac5", + "metadata": {}, + "outputs": [], + "source": [ + "CREATE VIEW recent_data (\n", + " SELECT *\n", + " FROM files.openai_learninghub_ft\n", + " WHERE timestamp > LAST\n", + ");" + ] + }, + { + "cell_type": "markdown", + "id": "1cd1446b-b910-47fa-984a-5b1190eeaea5", + "metadata": {}, + "source": [ + "Create a job to fine-tune the model periodically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a2c3877-e1ee-4dcd-b0f0-50e3b51643aa", + "metadata": {}, + "outputs": [], + "source": [ + "CREATE JOB automated_finetuning (\n", + "\n", + " FINETUNE openai_davinci\n", + " FROM mindsdb\n", + " (SELECT * FROM recent_data)\n", + ")\n", + "EVERY 1 day;" + ] + }, + { + "cell_type": "markdown", + "id": "3376252e-fa19-4777-9580-8e3b59138c58", + "metadata": {}, + "source": [ + "Now your model will be fine-tuned with newly added data every day." 
+ ] + }, + { + "cell_type": "markdown", + "id": "cbcbf589-061b-46ab-912b-f94526f36e82", + "metadata": {}, + "source": [ + "See more use cases below:\n", + "\n", + "- [Text Summarization with MindsDB and OpenAI using SQL](https://docs.mindsdb.com/use-cases/data_enrichment/text-summarization-inside-mysql-with-openai)\n", + "- [Sentiment Analysis with MindsDB and OpenAI using SQL](https://docs.mindsdb.com/use-cases/data_enrichment/sentiment-analysis-inside-mysql-with-openai)\n", + "- [Question Answering with MindsDB and OpenAI using SQL](https://docs.mindsdb.com/use-cases/data_enrichment/question-answering-inside-mysql-with-openai)\n", + "- [Extract JSON from Text](https://docs.mindsdb.com/use-cases/data_enrichment/json-from-text)\n", + "- [Text Summarization with MindsDB and OpenAI using MQL](https://docs.mindsdb.com/use-cases/data_enrichment/text-summarization-inside-mongodb-with-openai)\n", + "- [Sentiment Analysis with MindsDB and OpenAI using MQL](https://docs.mindsdb.com/use-cases/data_enrichment/sentiment-analysis-inside-mongodb-with-openai)\n", + "- [Question Answering with MindsDB and OpenAI using MQL](https://docs.mindsdb.com/use-cases/data_enrichment/question-answering-inside-mongodb-with-openai)" + ] + }, + { + "cell_type": "markdown", + "id": "dea49252-c6d8-4f49-b0a0-a96a6440a037", + "metadata": {}, + "source": [ + "Follow MindsDB's [OpenAI documentation](https://docs.mindsdb.com/integrations/ai-engines/openai) for more information." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}