Gladstone Snowflake Middleware GPT Action Library cookbook (#1375)
Co-authored-by: pap <pap@openai.com>
This commit is contained in:
parent
2cece5d500
commit
3691b4be78
@ -0,0 +1,687 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GPT Actions - Snowflake middleware\n",
"## Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This page provides instructions & guidance for developers building a GPT Action for a specific application. Before you proceed, make sure to first familiarize yourself with the following information:\n",
"\n",
"* [Introduction to GPT Actions](https://platform.openai.com/docs/actions)\n",
"* [Introduction to GPT Actions Library](https://platform.openai.com/docs/actions/actions-library)\n",
"* [Example of Building a GPT Action from Scratch](https://platform.openai.com/docs/actions/getting-started)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This guide provides details on how to connect ChatGPT with a Snowflake Data Warehouse for the purposes of returning a SQL query to ChatGPT for use with [Data Analysis](https://help.openai.com/en/articles/8437071-data-analysis-with-chatgpt). The GPT requires an action that interfaces with middleware (i.e., an Azure Function) so that the action can properly format the response from Snowflake for use in the Python notebook environment. Data must be [returned as a file](https://platform.openai.com/docs/actions/sending-files/returning-files), so the middleware function should transform the SQL response into a CSV/Excel file under 10MB in size.\n",
"\n",
"This document will outline the middleware function GPT action. For setting up the middleware function itself, see [GPT Actions library (Middleware) - Azure Functions](https://cookbook.openai.com/examples/chatgpt/gpt_actions_library/gpt_middleware_azure_function). You can combine this Snowflake middleware action with an action that connects to Snowflake directly to enable a GPT that can form and test SQL queries prior to executing them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Value + Example Business Use Cases"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Existing Snowflake customers can leverage these guidelines to query data from their data warehouse and load that data into the Data Analysis Python environment for further insights. This enables ChatGPT-powered analysis such as visualizing data sets, identifying patterns/anomalies, or identifying gaps for data cleansing purposes. This GPT can be used to drive business decisions from relatively small datasets, or to explore subsets of data with AI to generate hypotheses before analyzing the holistic dataset in your BI tool, saving time and money while identifying previously unseen patterns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Application Information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Application Key Links"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Check out these links from Snowflake and Azure before you get started:\n",
"\n",
"**Snowflake Action**\n",
"\n",
"* Application Website: [https://app.snowflake.com/](https://app.snowflake.com/)\n",
"* Application Python Connector Documentation: [https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-connect](https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-connect)\n",
"\n",
"**Azure Function**\n",
"\n",
"* Application Website: [https://learn.microsoft.com/en-us/azure/azure-functions/](https://learn.microsoft.com/en-us/azure/azure-functions/)\n",
"* Application API Documentation: [https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference/](https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference/)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Application Prerequisites"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Before you get started, make sure you go through the following steps in your application environment:\n",
"\n",
"* Provision a Snowflake Data Warehouse\n",
"* Ensure that the user authenticating into Snowflake via ChatGPT has access to the database, schemas, and tables with the necessary role\n",
"\n",
"In addition, before creating your application in Azure Function App, you’ll need to handle user authentication by setting up an OAuth App Registration in Azure Entra ID that can be linked with a Snowflake External OAuth security integration. Snowflake’s External OAuth security integrations allow external systems to issue access tokens that Snowflake can use for determining level of access. In this case, that external token provider is Azure Entra ID. Since ChatGPT will connect to Azure rather than Snowflake, the GPT user’s OAuth token will be provisioned by Azure and associated with their user in Entra ID. Thus you’ll need a way to map users in Snowflake to their corresponding user in Azure.\n",
"\n",
"All of the necessary steps for both the Azure side and the Snowflake side are laid out below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure the OAuth resource in Azure Entra ID"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"We’ll set up a new App Registration, configure the necessary Snowflake scopes in Azure, and retrieve the OAuth configuration parameters needed in both Snowflake and ChatGPT. This section takes place entirely in Azure, so that in the next section you’ll have the necessary info to link this App Registration to the Snowflake side."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Navigate to the [Microsoft Azure Portal](https://portal.azure.com/) and authenticate.\n",
"2. Navigate to Azure Entra ID (formerly Active Directory).\n",
"3. Click on **App Registrations** under **Manage**.\n",
"4. Click on **New Registration**.\n",
"5. Enter `Snowflake GPT OAuth Client`, or a similar value, as the **Name**.\n",
"6. Verify that **Supported account types** is set to **Single Tenant**.\n",
"7. Ignore the Redirect URI for now. You will come back to this when configuring your GPT.\n",
"8. Click **Register**.\n",
"9. Note down the **Directory (tenant) ID** (`TENANT_ID`) under **Essentials**. You will use this to generate your `AZURE_AD_ISSUER` and `AZURE_AD_JWS_KEY_ENDPOINT` (see the sketch after this list).\n",
"    * The `AZURE_AD_ISSUER` is `https://sts.windows.net/TENANT_ID/`\n",
"    * The `AZURE_AD_JWS_KEY_ENDPOINT` is `https://login.microsoftonline.com/TENANT_ID/discovery/v2.0/keys`\n",
"10. Click on **Endpoints** in the **Overview** interface.\n",
"11. On the right-hand side, note the **OAuth 2.0 authorization endpoint (v2)** as the `AZURE_AD_OAUTH_AUTHORIZATION_ENDPOINT` and **OAuth 2.0 token endpoint (v2)** as the `AZURE_AD_OAUTH_TOKEN_ENDPOINT`.\n",
"    * The endpoints should be similar to `https://login.microsoftonline.com/90288a9b-97df-4c6d-b025-95713f21cef9/oauth2/v2.0/authorize` and `https://login.microsoftonline.com/90288a9b-97df-4c6d-b025-95713f21cef9/oauth2/v2.0/token`.\n",
"12. Click on **Expose an API** under **Manage**.\n",
"13. Click on the **Set** link next to **Application ID URI** to set the `Application ID URI`.\n",
"    * The `Application ID URI` must be unique within your organization’s directory, such as `https://your.company.com/4d2a8c2b-a5f4-4b86-93ca-294185f45f2e`. This value will be referred to as the `<SNOWFLAKE_APPLICATION_ID_URI>` in the subsequent configuration steps.\n",
"14. To add a Snowflake Role as an OAuth scope for OAuth flows where the programmatic client acts on behalf of a user, click on **Add a scope** to add a scope representing the Snowflake role.\n",
"    * Enter the scope by prefixing the name of the Snowflake role with `session:scope:`. For example, for the Snowflake Analyst role, enter `session:scope:analyst`.\n",
"    * Select who can consent.\n",
"    * Enter a **display name** for the scope (e.g., Account Admin).\n",
"    * Enter a **description** for the scope (e.g., Can administer the Snowflake account).\n",
"    * Click **Add Scope**.\n",
"    * Save the scope as `AZURE_AD_SCOPE`. It should be a concatenation of your `Application ID URI` and your `Scope name`.\n",
"15. In the **Overview** section, copy the `ClientID` from the **Application (client) ID** field. This will be known as the `OAUTH_CLIENT_ID` in the following steps.\n",
"16. Click on **Certificates & secrets** and then **New client secret**.\n",
"17. Add a description of the secret.\n",
"18. Select **730 days (24 months)**. For testing purposes, choose a secret that won’t expire soon.\n",
"19. Click **Add**. Copy the secret. This will be known as the `OAUTH_CLIENT_SECRET` in the following steps.\n",
"20. For programmatic clients that will request an Access Token on behalf of a user, configure Delegated permissions for Applications as follows.\n",
"    * Click on **API Permissions**.\n",
"    * Click on **Add Permission**.\n",
"    * Click on **My APIs**.\n",
"    * Click on the **Snowflake OAuth Resource** that you created in [Configure the OAuth resource in Azure AD](https://docs.snowflake.com/en/user-guide/oauth-azure#configure-the-oauth-resource-in-azure-ad).\n",
"    * Click on the **Delegated Permissions** box.\n",
"    * Check the permission related to the scopes defined in the application that you wish to grant to this client.\n",
"    * Click **Add Permissions**.\n",
"    * Click on the **Grant Admin Consent** button to grant the permissions to the client. Permissions are configured this way here for testing purposes; in a production environment, granting permissions in this manner is not advisable.\n",
"    * Click **Yes**.\n"
]
},
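{
"cell_type": "markdown",
"metadata": {},
"source": [
"To keep track of everything collected above, here is a minimal Python sketch showing how the derived values fit together. The tenant ID and Application ID URI below are placeholders taken from the examples in this guide; substitute the values from your own App Registration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Placeholder values -- substitute the IDs collected from your App Registration\n",
"TENANT_ID = \"90288a9b-97df-4c6d-b025-95713f21cef9\"  # Directory (tenant) ID\n",
"SNOWFLAKE_APPLICATION_ID_URI = \"https://your.company.com/4d2a8c2b-a5f4-4b86-93ca-294185f45f2e\"\n",
"\n",
"# Derived values used by the Snowflake security integration (next section)\n",
"AZURE_AD_ISSUER = f\"https://sts.windows.net/{TENANT_ID}/\"\n",
"AZURE_AD_JWS_KEY_ENDPOINT = f\"https://login.microsoftonline.com/{TENANT_ID}/discovery/v2.0/keys\"\n",
"\n",
"# OAuth endpoints used when configuring authentication for your GPT\n",
"AZURE_AD_OAUTH_AUTHORIZATION_ENDPOINT = f\"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/authorize\"\n",
"AZURE_AD_OAUTH_TOKEN_ENDPOINT = f\"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token\"\n",
"\n",
"# Scope = Application ID URI + scope name (here, the example Snowflake Analyst role)\n",
"AZURE_AD_SCOPE = f\"{SNOWFLAKE_APPLICATION_ID_URI}/session:scope:analyst\""
]
},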
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a security integration in Snowflake"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the App Registration is complete in Azure Entra ID, the next step is to link that App Registration to Snowflake via an External OAuth security integration. The `external_oauth_audience_list` parameter of the security integration must match the **Application ID URI** that you specified while configuring Azure Entra ID.\n",
"\n",
"The **Issuer** and the **JWS Keys endpoint** will also come from values collected in the previous steps. The **User Mapping Attribute** can be set to either `EMAIL_ADDRESS` or `LOGIN_NAME`; this is how users’ Microsoft login credentials will be mapped to their user in Snowflake, ensuring that permissions in Snowflake are honored by the access token issued to ChatGPT."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "sql"
}
},
"outputs": [],
"source": [
"CREATE OR REPLACE SECURITY INTEGRATION AZURE_OAUTH_INTEGRATION\n",
"    TYPE = EXTERNAL_OAUTH\n",
"    ENABLED = TRUE\n",
"    EXTERNAL_OAUTH_TYPE = 'AZURE'\n",
"    EXTERNAL_OAUTH_ISSUER = '<AZURE_AD_ISSUER>'\n",
"    EXTERNAL_OAUTH_JWS_KEYS_URL = '<AZURE_AD_JWS_KEY_ENDPOINT>'\n",
"    EXTERNAL_OAUTH_AUDIENCE_LIST = ('<SNOWFLAKE_APPLICATION_ID_URI>')\n",
"    EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM = 'upn'\n",
"    EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE = 'EMAIL_ADDRESS';"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Middleware Information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure you go through the following steps in your Azure environment:\n",
"\n",
"* Ensure you have access to the Azure Portal or VS Code with permission to create Azure Function Apps and Azure Entra App Registrations\n",
"* There is [a detailed section in this guide](#azure-function-app) related to deploying and designing the function required to wrap the response from Snowflake in order to return the query results as a CSV to ChatGPT. The Azure Function App allows your GPT to ingest larger datasets, as ChatGPT can ingest more data from file responses than from application/json payloads. Additionally, those datasets will only be available for Data Analysis (aka Code Interpreter) when the response is formatted as a CSV file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure Function App"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have handled Azure and Snowflake authentication, we can create the Azure Function App itself to execute the SQL query and format the response, enabling the GPT to download the result as a CSV for use with Data Analysis.\n",
"\n",
"Follow this [Azure Cookbook Guide](https://cookbook.openai.com/examples/azure/functions) for further details on deploying an Azure Function App. Below you will find sample code to add to the function.\n",
"\n",
"This code is meant to be directional - while it should work out of the box, you should customize it based on the needs specific to your GPT and your IT setup."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Application Code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You’ll need to set up the following flows in your Azure Function App:\n",
"\n",
"* Extracting the token from the HTTP request and using it to connect to Snowflake\n",
"* Executing the SQL query and writing the results to a CSV\n",
"* Temporarily storing that CSV in Blob Storage*\n",
"* Generating a pre-signed URL to access that CSV securely*\n",
"* Responding with an openaiFileResponse\n",
"\n",
"*These steps may not be required if you use the [file stream](https://platform.openai.com/docs/actions/getting-started/inline-option) option instead of the [url](https://platform.openai.com/docs/actions/getting-started/url-option) option for returning files to your GPT. More on this below.\n",
"\n",
"Ensure you have the necessary libraries installed and imported into your script. In addition to Python standard libraries, this sample script leverages the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azure.functions as func\n",
"from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions, ContentSettings\n",
"import snowflake.connector\n",
"import jwt  # pyjwt for token decoding\n",
"# Standard-library modules used in the snippets below\n",
"import csv, datetime, json, logging, tempfile\n",
"logger = logging.getLogger(__name__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Connecting to Snowflake"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To connect to Snowflake, you’ll need to extract the access token issued by Azure Entra ID from the Authorization header and use that token when connecting to the Snowflake server.\n",
"\n",
"In this example, Snowflake usernames are email addresses, which simplifies the mapping of the Entra ID user extracted from the HTTP access token to the Snowflake user ID needed to connect. If this is not the case for your organization, you can map email addresses to Snowflake user IDs in your Python application (a sketch follows the connection code below).\n",
"\n",
"This sample application was built to interface with a single Snowflake account (e.g., ab12345.eastus2.azure) and warehouse. If you need to access multiple accounts or warehouses, consider passing these values as GPT Action parameters so you can extract them from the HTTP request."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Extract the token from the Authorization header\n",
"auth_header = req.headers.get('Authorization')\n",
"token_type, token = auth_header.split()\n",
"\n",
"try:\n",
"    # Extract email address from token to use for Snowflake user mapping\n",
"    # If Snowflake usernames are not emails, then identify the username accordingly\n",
"    decoded_token = jwt.decode(token, options={\"verify_signature\": False})\n",
"    email = decoded_token.get('upn')\n",
"\n",
"    conn = snowflake.connector.connect(\n",
"        user=email,  # Snowflake username, i.e., user's email in this example\n",
"        account=SNOWFLAKE_ACCOUNT,  # Snowflake account, i.e., ab12345.eastus2.azure\n",
"        authenticator=\"oauth\",\n",
"        token=token,\n",
"        warehouse=SNOWFLAKE_WAREHOUSE  # Replace with your Snowflake warehouse\n",
"    )\n",
"    logger.info(\"Successfully connected to Snowflake.\")\n",
"except Exception as e:\n",
"    logger.error(f\"Failed to connect to Snowflake: {e}\")"
]
},
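{
"cell_type": "markdown",
"metadata": {},
"source": [
"If your Snowflake usernames are not email addresses, the mapping mentioned above can be as simple as a lookup table. A minimal sketch follows; the dictionary contents are hypothetical and you would pass the resolved name to `snowflake.connector.connect` instead of `email`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical mapping of Entra ID emails to Snowflake usernames\n",
"EMAIL_TO_SNOWFLAKE_USER = {\n",
"    'jane.doe@your.company.com': 'JDOE',\n",
"    'john.smith@your.company.com': 'JSMITH',\n",
"}\n",
"\n",
"# Fall back to the email itself when no explicit mapping exists,\n",
"# then pass user=snowflake_user to snowflake.connector.connect\n",
"snowflake_user = EMAIL_TO_SNOWFLAKE_USER.get(email, email)"
]
},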
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Execute query and save CSV"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you connect to Snowflake, you’ll need to execute the query and store the results in a CSV. While the role in Snowflake should prevent any chance of harmful queries, you may want to sanitize the query in your application, just as you would any other programmatic SQL query execution (a minimal example of such a check follows the code below)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Extract SQL query from request parameters or body\n",
"sql_query = req.params.get('sql_query')\n",
"\n",
"try:\n",
"    # Use the specified warehouse\n",
"    cursor = conn.cursor()\n",
"\n",
"    # Execute the query\n",
"    cursor.execute(sql_query)\n",
"    results = cursor.fetchall()\n",
"    column_names = [desc[0] for desc in cursor.description]\n",
"    logger.info(f\"Query executed successfully: {sql_query}\")\n",
"\n",
"    # Convert results to CSV\n",
"    csv_file_path = write_results_to_csv(results, column_names)\n",
"except Exception as e:\n",
"    logger.error(f\"Error executing query or processing data: {e}\")\n",
"\n",
"\n",
"def write_results_to_csv(results, column_names):\n",
"    try:\n",
"        # Create a temporary file\n",
"        temp_file = tempfile.NamedTemporaryFile(delete=False, mode='w', newline='')\n",
"        csv_writer = csv.writer(temp_file)\n",
"        csv_writer.writerow(column_names)  # Write the column headers\n",
"        csv_writer.writerows(results)  # Write the data rows\n",
"        temp_file.close()  # Close the file to flush the contents\n",
"        return temp_file.name  # Return file path\n",
"    except Exception as e:\n",
"        logger.error(f\"Error writing results to CSV: {e}\")"
]
},
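{
"cell_type": "markdown",
"metadata": {},
"source": [
"As one illustration of the sanitization mentioned above, a simple allow-list check can reject anything other than a single read-only statement before execution. This is a minimal sketch, not a complete SQL sanitizer; Snowflake role permissions remain the primary access control, and you should adapt the rules to your own policies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def is_query_allowed(sql_query):\n",
"    # Basic guardrail: permit only a single read-only statement\n",
"    if not sql_query:\n",
"        return False\n",
"    statement = sql_query.strip().rstrip(';').strip()\n",
"    # Reject empty or multi-statement payloads\n",
"    if not statement or ';' in statement:\n",
"        return False\n",
"    # Allow only statements that start with SELECT or WITH (for CTEs)\n",
"    first_word = statement.split(None, 1)[0].upper()\n",
"    return first_word in ('SELECT', 'WITH')\n",
"\n",
"# Example usage before cursor.execute(sql_query):\n",
"# if not is_query_allowed(sql_query):\n",
"#     return func.HttpResponse(\"Query not permitted.\", status_code=400)"
]
},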
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Storing the file in Blob Storage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two methods for returning files to ChatGPT for processing. You can either [stream](https://platform.openai.com/docs/actions/getting-started/inline-option) the base64-encoded data along with the mimeType and file name in the openaiFileResponse list response, or you can return a [list of URLs](https://platform.openai.com/docs/actions/getting-started/url-option). In this solution we’ll focus on the latter.\n",
"\n",
"To do this, you’ll need to upload the CSV to Azure Blob Storage and return a pre-signed URL for accessing that file securely in ChatGPT. Note that for ChatGPT to download a URL, the response must include Content-Type and Content-Disposition headers, as in the example below. If you’d like to inspect whether a URL has the necessary headers, you can use `curl -I <url>` from any terminal.\n",
"\n",
"You’ll need to get a connection string for your Azure storage account, as per the instructions [here](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def upload_csv_to_azure(file_path, container_name, blob_name, connect_str):\n",
"    try:\n",
"        # Create the BlobServiceClient object which will be used to create a container client\n",
"        blob_service_client = BlobServiceClient.from_connection_string(connect_str)\n",
"\n",
"        # Create a blob client using the local file name as the name for the blob\n",
"        blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)\n",
"\n",
"        # Upload the file with specified content settings\n",
"        with open(file_path, \"rb\") as data:\n",
"            blob_client.upload_blob(data, overwrite=True, content_settings=ContentSettings(\n",
"                content_type='text/csv',\n",
"                content_disposition=f'attachment; filename=\"{blob_name}\"'\n",
"            ))\n",
"        logger.info(f\"Successfully uploaded {file_path} to {container_name}/{blob_name}\")\n",
"\n",
"        # Generate a SAS token for the blob\n",
"        sas_token = generate_blob_sas(\n",
"            account_name=blob_service_client.account_name,\n",
"            container_name=container_name,\n",
"            blob_name=blob_name,\n",
"            account_key=blob_service_client.credential.account_key,\n",
"            permission=BlobSasPermissions(read=True),\n",
"            expiry=datetime.datetime.utcnow() + datetime.timedelta(hours=1)  # Token valid for 1 hour\n",
"        )\n",
"\n",
"        # Generate a presigned URL using the SAS token\n",
"        url = f\"https://{blob_service_client.account_name}.blob.core.windows.net/{container_name}/{blob_name}?{sas_token}\"\n",
"        logger.info(f\"Generated presigned URL: {url}\")\n",
"\n",
"        return url\n",
"    except Exception as e:\n",
"        logger.error(f\"Error uploading file to Azure Blob Storage: {e}\")\n",
"        raise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Format openaiFileResponse"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, you’ll need to format the response appropriately to instruct ChatGPT to process that response as a file or series of files. The openaiFileResponse is a list which can include up to 10 URLs (or base64 encodings if using the [inline option](https://platform.openai.com/docs/actions/getting-started/inline-option))."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Format the response so ChatGPT treats it as a file\n",
"response = {\n",
"    'openaiFileResponse': [csv_url]\n",
"}\n",
"cursor.close()\n",
"conn.close()\n",
"return func.HttpResponse(\n",
"    json.dumps(response),\n",
"    status_code=200\n",
")"
]
},
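{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer the [inline option](https://platform.openai.com/docs/actions/getting-started/inline-option) over pre-signed URLs (skipping the Blob Storage steps above), the response instead carries the base64-encoded file contents. Here is a minimal sketch of that variant, assuming the `csv_file_path` produced earlier and using the field names described in the inline option documentation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"\n",
"# Read and base64-encode the CSV produced by write_results_to_csv\n",
"with open(csv_file_path, \"rb\") as f:\n",
"    encoded = base64.b64encode(f.read()).decode(\"utf-8\")\n",
"\n",
"# Inline variant of openaiFileResponse: the file contents travel in the response body\n",
"response = {\n",
"    'openaiFileResponse': [\n",
"        {\n",
"            'name': 'query_results.csv',\n",
"            'mime_type': 'text/csv',\n",
"            'content': encoded\n",
"        }\n",
"    ]\n",
"}\n",
"return func.HttpResponse(json.dumps(response), status_code=200)"
]
},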
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"There are a lot of moving pieces in this application, so testing your Azure Function App is important. ChatGPT can be a difficult testing ground, given that requests and responses can sometimes be more opaque than needed for debugging. Initially testing your application through cURL or Postman, invoking the HTTP request from a more controlled environment, will allow you to debug and triage issues more easily. Once you determine that responses are being returned as expected in those tools, you are ready to build your GPT."
]
},
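{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a quick test from a controlled environment might look like the following sketch. The function URL and access token are placeholders; obtain a real token via your OAuth flow before running it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# Placeholders -- substitute your deployed function URL and a valid Entra ID token\n",
"FUNCTION_URL = \"https://<server-name>.azurewebsites.net/api/<function_name>?code=<code>\"\n",
"ACCESS_TOKEN = \"<azure-entra-id-access-token>\"\n",
"\n",
"resp = requests.post(\n",
"    FUNCTION_URL,\n",
"    headers={\"Authorization\": f\"Bearer {ACCESS_TOKEN}\"},\n",
"    json={\"sql_query\": \"SELECT * FROM FLIGHTS.PUBLIC.JAN_2013_NYC LIMIT 10\"},\n",
")\n",
"print(resp.status_code)\n",
"print(resp.json())  # expect {\"openaiFileResponse\": [\"https://...\"]}"
]
},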
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ChatGPT Steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom GPT Instructions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you've created a Custom GPT, use the text below in the Instructions panel for inspiration. Have questions? Check out [Getting Started Example](https://platform.openai.com/docs/actions/getting-started) to see how this step works in more detail."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example Instructions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is important that ChatGPT understands your table schema to properly form SQL queries. There are different methods for doing so, and this instruction set represents the most direct way. We are working to publish additional instruction sets for other versions of Snowflake GPTs you may want to build, such as those that work across multiple tables, schemas, and databases, or that learn dynamically about schemas that change over time.\n",
"\n",
"Below are some basic instructions for working with a single schema and table. This GPT has been optimized for a single use case (analyzing flight data from January 2013 out of NYC), which allows the simplest instructions to provide the most reliable GPT performance."
]
},
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"source": [
"You are an expert at writing SQL queries to fetch data from Snowflake. You help users convert their prompts into SQL queries. Any question around flight data will be converted into a Snowflake SQL query that hits the table `FLIGHTS.PUBLIC.JAN_2013_NYC`. Pass any query into the \"sql_query\" parameter.\n",
"\n",
"\n",
"The schema of the table includes:\n",
"```\n",
"ID\tNUMBER\tA unique identifier for each flight\n",
"YEAR\tNUMBER\tThe year of the flight\n",
"MONTH\tNUMBER\tThe month of the flight\n",
"DAY\tNUMBER\tThe day of the month on which the flight departed\n",
"DEP_TIME\tNUMBER\tThe actual departure time of the flight\n",
"SCHED_DEP_TIME\tNUMBER\tThe scheduled departure time of the flight\n",
"DEP_DELAY\tNUMBER\tThe departure delay in minutes (negative values indicate early departures)\n",
"ARR_TIME\tNUMBER\tThe actual arrival time of the flight\n",
"SCHED_ARR_TIME\tNUMBER\tThe scheduled arrival time of the flight\n",
"ARR_DELAY\tNUMBER\tThe arrival delay in minutes (negative values indicate early arrivals)\n",
"CARRIER_CODE\tTEXT\tThe carrier code of the airline\n",
"FLIGHT\tNUMBER\tThe flight number\n",
"TAILNUM\tTEXT\tThe aircraft tail number\n",
"ORIGIN_AIRPORT_CODE\tTEXT\tThe origin airport code\n",
"DEST_AIRPORT_CODE\tTEXT\tThe destination airport code\n",
"AIR_TIME\tNUMBER\tThe total airtime of the flight in minutes\n",
"DISTANCE\tNUMBER\tThe distance traveled by the flight in miles\n",
"HOUR\tNUMBER\tThe hour part of the scheduled departure time\n",
"MINUTE\tNUMBER\tThe minute part of the scheduled departure time\n",
"TIME_HOUR\tNUMBER\tThe time at which the flight departed (rounded to the nearest hour)\n",
"CARRIER_NAME\tTEXT\tThe full name of the airline carrier\n",
"ORIGIN_AIRPORT_NAME\tTEXT\tThe full name of the origin airport\n",
"ORIGIN_REGION\tTEXT\tThe region code of the origin airport\n",
"ORIGIN_MUNICIPALITY\tTEXT\tThe city where the origin airport is located\n",
"ORIGIN_COORDINATES\tTEXT\tThe geographical coordinates of the origin airport\n",
"DEST_AIRPORT_NAME\tTEXT\tThe full name of the destination airport\n",
"DEST_REGION\tTEXT\tThe region code of the destination airport\n",
"DEST_MUNICIPALITY\tTEXT\tThe city where the destination airport is located\n",
"DEST_COORDINATES\tTEXT\tThe geographical coordinates of the destination airport\n",
"```\n",
"\n",
"\n",
"When a user asks for data around flights, perform the following:\n",
"\n",
"\n",
"1. Use the `executeSQL` action to send a POST request to the Azure function endpoint\n",
"2. Receive the file that is returned as part of the Action response. Display it as a spreadsheet\n",
"3. Perform analysis on the file and provide the necessary information that the user has asked for\n",
"\n",
"\n",
"The user will want to ask questions about the data in Code Interpreter, so use it for any data analysis insights from the dataset you pulled."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### OpenAPI Schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you've created a Custom GPT, copy the text below into the Actions panel, replacing the placeholder values with your specific function details and updating your parameters based on any additional inputs you built into your Azure Function App.\n",
"\n",
"Have questions? Check out [Getting Started Example](https://platform.openai.com/docs/actions/getting-started) to see how this step works in more detail."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "yaml"
}
},
"outputs": [],
"source": [
"openapi: 3.1.0\n",
"info:\n",
"  title: Snowflake GPT API\n",
"  description: API to execute SQL queries on Snowflake and get the results as a CSV file URL.\n",
"  version: 1.0.0\n",
"servers:\n",
"  - url: https://<server-name>.azurewebsites.net\n",
"    description: Azure Function App server running Snowflake integration application\n",
"paths:\n",
"  /api/<function_name>?code=<code>:\n",
"    post:\n",
"      operationId: executeSQL\n",
"      summary: Executes a SQL query on Snowflake and returns the result file URL as a CSV.\n",
"      requestBody:\n",
"        required: true\n",
"        content:\n",
"          application/json:\n",
"            schema:\n",
"              type: object\n",
"              properties:\n",
"                sql_query:\n",
"                  type: string\n",
"                  description: The SQL query to be executed on Snowflake.\n",
"              required:\n",
"                - sql_query\n",
"      responses:\n",
"        '200':\n",
"          description: Successfully executed the query.\n",
"          content:\n",
"            application/json:\n",
"              schema:\n",
"                type: object\n",
"                properties:\n",
"                  openaiFileResponse:\n",
"                    type: array\n",
"                    items:\n",
"                      type: string\n",
"                      format: uri\n",
"                    description: Array of URLs pointing to the result files.\n",
"        '401':\n",
"          description: Unauthorized. Missing or invalid authentication token.\n",
"        '400':\n",
"          description: Bad Request. The request was invalid or cannot be otherwise served.\n",
"        '500':\n",
"          description: Internal Server Error. An error occurred on the server.\n",
"components:\n",
"  schemas: {}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FAQ & Troubleshooting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Files returned to ChatGPT are limited to 10MB in size. Your request may fail if the file returned is larger than that. Be sure to include LIMIT clauses in your SQL commands if you run into this limitation.\n",
"* _Why is the Azure Function App required in the first place?_ ChatGPT’s Data Analysis feature (aka Code Interpreter) depends on a secure Python environment that is separate from the model’s context window. Today, data must be passed to Data Analysis by uploading a file, so GPT actions returning data must return that data as a CSV or other data file type. In order to return a file via GPT action, the response must be wrapped in an `openaiFileResponse` object, which requires custom code to properly format the response.\n",
"* _My company uses a different cloud provider than Azure._ For connecting other middleware functions to ChatGPT via GPT action, please refer to the [AWS](https://cookbook.openai.com/examples/chatgpt/gpt_actions_library/gpt_middleware_aws_function) or [GCP](https://cookbook.openai.com/examples/chatgpt/gpt_actions_library/gpt_middleware_google_cloud_function) middleware cookbooks. You can use the concepts discussed in this cookbook to advise on considerations when building your middleware app, but connecting that middleware to Snowflake may differ between cloud providers. For example, Snowflake built an External OAuth integration specifically for linking with Azure Entra ID.\n",
"* _How do I limit the datasets that my GPT has access to?_ It can be important to limit the scope of access ChatGPT has within Snowflake. There are a few ways to do this:\n",
"    * Snowflake roles can limit who has access to which tables, and will be respected by the GPT user’s access token provisioned by Azure Entra ID\n",
"    * In your middleware function you can add sanity checks to verify that the tables accessed are approved for that application\n",
"    * You may want to generate an entirely new Database/Warehouse specific to integrating with ChatGPT that is scrubbed of anything sensitive, such as PII\n",
"* _Schema calls the wrong warehouse or dataset:_ If ChatGPT calls the wrong warehouse or database, consider updating your instructions to make it more explicit either (a) which warehouse/database should be called or (b) that the user must provide those exact details before it runs the query."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"_Are there integrations that you’d like us to prioritize? Are there errors in our integrations? File a PR or issue in our GitHub, and we’ll take a look._\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@ -1550,3 +1550,12 @@
  tags:
    - gpt-actions-library
    - chatgpt

- title: GPT Actions library - Snowflake Middleware
  path: examples/chatgpt/gpt_actions_library/gpt_action_snowflake_middleware.ipynb
  date: 2024-08-14
  authors:
    - gladstone-openai
  tags:
    - gpt-actions-library
    - chatgpt