"This notebook showcases an agent designed to interact with a `SQL` databases. \n",
"The agent builds off of [SQLDatabaseChain](https://python.langchain.com/docs/use_cases/tabular/sqlite) and is designed to answer more general questions about a database, as well as recover from errors.\n",
"\n",
"Note that, as this agent is in active development, all answers might not be correct. Additionally, it is not guaranteed that the agent won't perform DML statements on your database given certain questions. Be careful running it on sensitive data!\n",
"The agent builds from [SQLDatabaseChain](https://python.langchain.com/docs/use_cases/tabular/sqlite). \n",
"\n",
"This uses the example `Chinook` database. To set it up follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the .db file in a notebooks folder at the root of this repository."
"It is designed to answer more general questions about a database, as well as recover from errors.\n",
"\n",
"Note that, as this agent is in active development, all answers might not be correct. \n",
"\n",
"Additionally, it is not guaranteed that the agent won't perform DML statements on your database given certain questions. \n",
"\n",
"Be careful running it on sensitive data!|\n",
"\n",
"This uses the example `Chinook` database. \n",
"\n",
"To set it up follow [these instructions](https://database.guide/2-sample-databases-sqlite/) and place the .db file in a notebooks folder at the root of this repository."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "39ee5811-7f7d-47a9-941a-83dcf4b74239",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mWARNING: There was an error checking the latest version of pip.\u001b[0m\u001b[33m\n",
"\u001b[0mNote: you may need to restart the kernel to use updated packages.\n"
"1\tLuís\tGonçalves\tEmbraer - Empresa Brasileira de Aeronáutica S.A.\tAv. Brigadeiro Faria Lima, 2170\tSão José dos Campos\tSP\tBrazil\t12227-000\t+55 (12) 3923-5555\t+55 (12) 3923-5566\tluisg@embraer.com.br\t3\n",
"Invoking: `sql_db_query` with `SELECT c.Country, SUM(i.Total) AS TotalSales FROM Invoice i JOIN Customer c ON i.CustomerId = c.CustomerId GROUP BY c.Country ORDER BY TotalSales DESC LIMIT 10;`\n",
"responded: To list the total sales per country, I can query the \"Invoice\" and \"Customer\" tables. I will join these tables on the \"CustomerId\" column and group the results by the \"BillingCountry\" column. Then, I will calculate the sum of the \"Total\" column to get the total sales per country. Finally, I will order the results in descending order of the total sales.\n",
"\n",
"Here is the SQL query:\n",
"\n",
"```sql\n",
"SELECT c.Country, SUM(i.Total) AS TotalSales\n",
"FROM Invoice i\n",
"JOIN Customer c ON i.CustomerId = c.CustomerId\n",
"GROUP BY c.Country\n",
"ORDER BY TotalSales DESC\n",
"LIMIT 10;\n",
"```\n",
"\n",
"Now, I will execute this query to get the total sales per country.\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m[('USA', 523.0600000000003), ('Canada', 303.9599999999999), ('France', 195.09999999999994), ('Brazil', 190.09999999999997), ('Germany', 156.48), ('United Kingdom', 112.85999999999999), ('Czech Republic', 90.24000000000001), ('Portugal', 77.23999999999998), ('India', 75.25999999999999), ('Chile', 46.62)]\u001b[0m\u001b[32;1m\u001b[1;3mThe total sales per country are as follows:\n",
"\n",
"1. USA: $523.06\n",
"2. Canada: $303.96\n",
"3. France: $195.10\n",
"4. Brazil: $190.10\n",
"5. Germany: $156.48\n",
"6. United Kingdom: $112.86\n",
"7. Czech Republic: $90.24\n",
"8. Portugal: $77.24\n",
"9. India: $75.26\n",
"10. Chile: $46.62\n",
"\n",
"To answer the second question, the country whose customers spent the most is the USA, with a total sales of $523.06.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': \"List the total sales per country. Which country's customers spent the most?\",\n",
" 'output': 'The total sales per country are as follows:\\n\\n1. USA: $523.06\\n2. Canada: $303.96\\n3. France: $195.10\\n4. Brazil: $190.10\\n5. Germany: $156.48\\n6. United Kingdom: $112.86\\n7. Czech Republic: $90.24\\n8. Portugal: $77.24\\n9. India: $75.26\\n10. Chile: $46.62\\n\\nTo answer the second question, the country whose customers spent the most is the USA, with a total sales of $523.06.'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.invoke(\n",
" \"List the total sales per country. Which country's customers spent the most?\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ec927ac6-9b2a-4e8a-9a6e-3e429191875c",
"metadata": {
"tags": []
},
"source": [
"## Toolkit\n",
"\n",
"We can look at what runs under the hood with the `create_sql_agent` helper above.\n",
"\n",
"We can also highlight how the toolkit is used with the agent specifically."
" JOIN \"public\".\"user_permissions\" ON \"public\".\"users\".id = \"public\".\"user_permissions\".user_id\n",
" JOIN \"public\".\"projects\" ON \"public\".\"users\".id = \"public\".\"projects\".user_id\n",
" JOIN \"public\".\"events\" ON \"public\".\"projects\".id = \"public\".\"events\".project_id;\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "5a4a9455",
"metadata": {},
"source": [
"```\n",
"\n",
"For a transactional SQL database, if one of the table above contains millions of rows, the query might cause trouble to other applications using the same database.\n",
"\n",
"Most datawarehouse oriented databases support user-level quota, for limiting resource usage."
]
},
{
"cell_type": "markdown",
"id": "36ae48c7-cb08-4fef-977e-c7d4b96a464b",
"metadata": {},
"source": [
"Most datawarehouse oriented databases support user-level quota, for limiting resource usage.\n",
"\n",
"## Example: describing a table"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ff70e83d-5ad0-4fc7-bb96-27d82ac166d7",
"metadata": {
"tags": []
},
"execution_count": 8,
"id": "f740fbef-d635-4121-856d-33593ae6bfc0",
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -155,54 +329,40 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"*/\u001b[0m\u001b[32;1m\u001b[1;3mThe `PlaylistTrack` table has two columns: `PlaylistId` and `TrackId`. It is a junction table that represents the relationship between playlists and tracks. \n",
"\n",
"Here is the schema of the `PlaylistTrack` table:\n",
"*/\u001b[0m\u001b[32;1m\u001b[1;3mThe schema of the \"Playlist\" table is as follows:\n",
"\n",
"Here are three sample rows from the `PlaylistTrack` table:\n",
"- PlaylistId: INTEGER (Primary Key)\n",
"- Name: NVARCHAR(120)\n",
"\n",
"```\n",
"PlaylistId TrackId\n",
"1 3402\n",
"1 3389\n",
"1 3390\n",
"```\n",
"Here are three sample rows from the \"Playlist\" table:\n",
"\n",
"Please let me know if there is anything else I can help you with.\u001b[0m\n",
"| PlaylistId | Name |\n",
"|------------|-----------|\n",
"| 1 | Music |\n",
"| 2 | Movies |\n",
"| 3 | TV Shows |\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@ -210,16 +370,17 @@
{
"data": {
"text/plain": [
"'The `PlaylistTrack` table has two columns: `PlaylistId` and `TrackId`. It is a junction table that represents the relationship between playlists and tracks. \\n\\nHere is the schema of the `PlaylistTrack` table:\\n\\n```\\nCREATE TABLE \"PlaylistTrack\" (\\n\\t\"PlaylistId\" INTEGER NOT NULL, \\n\\t\"TrackId\" INTEGER NOT NULL, \\n\\tPRIMARY KEY (\"PlaylistId\", \"TrackId\"), \\n\\tFOREIGN KEY(\"TrackId\") REFERENCES \"Track\" (\"TrackId\"), \\n\\tFOREIGN KEY(\"PlaylistId\") REFERENCES \"Playlist\" (\"PlaylistId\")\\n)\\n```\\n\\nHere are three sample rows from the `PlaylistTrack` table:\\n\\n```\\nPlaylistId TrackId\\n1 3402\\n1 3389\\n1 3390\\n```\\n\\nPlease let me know if there is anything else I can help you with.'"
"{'input': 'Describe the schema of the playlist table',\n",
" 'output': 'The schema of the \"Playlist\" table is as follows:\\n\\n- PlaylistId: INTEGER (Primary Key)\\n- Name: NVARCHAR(120)\\n\\nHere are three sample rows from the \"Playlist\" table:\\n\\n| PlaylistId | Name |\\n|------------|-----------|\\n| 1 | Music |\\n| 2 | Movies |\\n| 3 | TV Shows |'}"
]
},
"execution_count": 4,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"Describe the playlisttrack table\")"
"agent_executor.invoke({\"input\": \"Describe the schema of the playlist table\"})"
]
},
{
@ -229,16 +390,14 @@
"source": [
"## Example: describing a table, recovering from an error\n",
"\n",
"In this example, the agent tries to search for a table that doesn't exist, but finds the next best result"
"In this example, the agent tries to search for a table, `playlists`, that doesn't exist, but finds the next best result"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bea76658-a65b-47e2-b294-6d52c5556246",
"metadata": {
"tags": []
},
"execution_count": 10,
"id": "d78ddb6e-f499-4e90-9604-fe22a9c3592e",
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -247,36 +406,39 @@
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: The PlaylistTrack table contains two columns, PlaylistId and TrackId, which are both integers and are used to link Playlist and Track tables.\u001b[0m\n",
"/*\n",
"3 rows from Playlist table:\n",
"PlaylistId\tName\n",
"1\tMusic\n",
"2\tMovies\n",
"3\tTV Shows\n",
"*/\u001b[0m\u001b[32;1m\u001b[1;3mThe \"Playlists\" table has the following schema:\n",
"\n",
"- PlaylistId: INTEGER (Primary Key)\n",
"- Name: NVARCHAR(120)\n",
"\n",
"Here are three sample rows from the \"Playlists\" table:\n",
"\n",
"| PlaylistId | Name |\n",
"|------------|-----------|\n",
"| 1 | Music |\n",
"| 2 | Movies |\n",
"| 3 | TV Shows |\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@ -284,16 +446,17 @@
{
"data": {
"text/plain": [
"'The PlaylistTrack table contains two columns, PlaylistId and TrackId, which are both integers and are used to link Playlist and Track tables.'"
"{'input': 'Describe the playlists table',\n",
" 'output': 'The \"Playlists\" table has the following schema:\\n\\n- PlaylistId: INTEGER (Primary Key)\\n- Name: NVARCHAR(120)\\n\\nHere are three sample rows from the \"Playlists\" table:\\n\\n| PlaylistId | Name |\\n|------------|-----------|\\n| 1 | Music |\\n| 2 | Movies |\\n| 3 | TV Shows |'}"
]
},
"execution_count": 15,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"Describe the playlistsong table\")"
"agent_executor.invoke({\"input\": \"Describe the playlists table\"})"
]
},
{
@ -306,11 +469,9 @@
},
{
"cell_type": "code",
"execution_count": 8,
"id": "17bea710-4a23-4de0-b48e-21d57be48293",
"metadata": {
"tags": []
},
"execution_count": 11,
"id": "5457af9a-037d-4dba-9316-0abfe841f114",
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -319,13 +480,15 @@
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId\n",
"1 Luís Gonçalves Embraer - Empresa Brasileira de Aeronáutica S.A. Av. Brigadeiro Faria Lima, 2170 São José dos Campos SP Brazil 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 luisg@embraer.com.br 3\n",
"1\tLuís\tGonçalves\tEmbraer - Empresa Brasileira de Aeronáutica S.A.\tAv. Brigadeiro Faria Lima, 2170\tSão José dos Campos\tSP\tBrazil\t12227-000\t+55 (12) 3923-5555\t+55 (12) 3923-5566\tluisg@embraer.com.br\t3\n",
"Thought:\u001b[32;1m\u001b[1;3m I should query the Invoice and Customer tables to get the total sales per country.\n",
"Action: query_sql_db\n",
"Action Input: SELECT c.Country, SUM(i.Total) AS TotalSales FROM Invoice i INNER JOIN Customer c ON i.CustomerId = c.CustomerId GROUP BY c.Country ORDER BY TotalSales DESC LIMIT 10\u001b[0m\n",
"Invoking: `sql_db_query` with `SELECT c.Country, SUM(i.Total) AS TotalSales FROM Customer c JOIN Invoice i ON c.CustomerId = i.CustomerId GROUP BY c.Country ORDER BY TotalSales DESC LIMIT 1;`\n",
"responded: Based on the schema of the `Customer` and `Invoice` tables, we can join these two tables on the `CustomerId` column to get the total sales per customer. Then, we can group the results by the `Country` column to get the total sales per country. Finally, we can sort the results in descending order of total sales and select the country with the highest total sales.\n",
"\n",
"Here is the SQL query to achieve this:\n",
"\n",
"```sql\n",
"SELECT c.Country, SUM(i.Total) AS TotalSales\n",
"FROM Customer c\n",
"JOIN Invoice i ON c.CustomerId = i.CustomerId\n",
"GROUP BY c.Country\n",
"ORDER BY TotalSales DESC\n",
"LIMIT 1;\n",
"```\n",
"\n",
"Let me execute this query to find out which country's customers spent the most.\n",
"Thought:\u001b[32;1m\u001b[1;3m I can use a SELECT statement to get the total number of tracks in each playlist.\n",
"Action: query_checker_sql_db\n",
"Action Input: SELECT Playlist.Name, COUNT(PlaylistTrack.TrackId) AS TotalTracks FROM Playlist INNER JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId GROUP BY Playlist.Name\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3m\n",
"\n",
"SELECT Playlist.Name, COUNT(PlaylistTrack.TrackId) AS TotalTracks FROM Playlist INNER JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId GROUP BY Playlist.Name\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m The query looks correct, I can now execute it.\n",
"Action: query_sql_db\n",
"Action Input: SELECT Playlist.Name, COUNT(PlaylistTrack.TrackId) AS TotalTracks FROM Playlist INNER JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId GROUP BY Playlist.Name LIMIT 10\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m[('90’s Music', 1477), ('Brazilian Music', 39), ('Classical', 75), ('Classical 101 - Deep Cuts', 25), ('Classical 101 - Next Steps', 25), ('Classical 101 - The Basics', 25), ('Grunge', 15), ('Heavy Metal Classic', 26), ('Music', 6580), ('Music Videos', 1)]\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: The total number of tracks in each playlist are: '90’s Music' (1477), 'Brazilian Music' (39), 'Classical' (75), 'Classical 101 - Deep Cuts' (25), 'Classical 101 - Next Steps' (25), 'Classical 101 - The Basics' (25), 'Grunge' (15), 'Heavy Metal Classic' (26), 'Music' (6580), 'Music Videos' (1).\u001b[0m\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m[('USA', 523.0600000000003)]\u001b[0m\u001b[32;1m\u001b[1;3mThe country whose customers spent the most is the USA, with a total sales amount of $523.06.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@ -464,17 +561,20 @@
{
"data": {
"text/plain": [
"\"The total number of tracks in each playlist are: '90’s Music' (1477), 'Brazilian Music' (39), 'Classical' (75), 'Classical 101 - Deep Cuts' (25), 'Classical 101 - Next Steps' (25), 'Classical 101 - The Basics' (25), 'Grunge' (15), 'Heavy Metal Classic' (26), 'Music' (6580), 'Music Videos' (1).\""
"{'input': \"List the total sales per country. Which country's customers spent the most?\",\n",
" 'output': 'The country whose customers spent the most is the USA, with a total sales amount of $523.06.'}"
]
},
"execution_count": 7,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\n",
" \"Show the total number of tracks in each playlist. The Playlist name should be included in the result.\"\n",
"agent_executor.invoke(\n",
" {\n",
" \"input\": \"List the total sales per country. Which country's customers spent the most?\"\n",
" }\n",
")"
]
},
@ -490,11 +590,9 @@
},
{
"cell_type": "code",
"execution_count": 16,
"id": "9fe4901e-f9e1-4022-b6bc-80e2b2d6a3a4",
"metadata": {
"tags": []
},
"execution_count": 30,
"id": "4d8d8933-c1e5-4ba9-8143-1059606c4adb",
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -503,49 +601,15 @@
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I should query the database to get the top 3 best selling artists.\n",
"Action: query_sql_db\n",
"Action Input: SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Track ON Artist.ArtistId = Track.ArtistId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mError: (sqlite3.OperationalError) no such column: Track.ArtistId\n",
"[SQL: SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Track ON Artist.ArtistId = Track.ArtistId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3]\n",
"(Background on this error at: https://sqlalche.me/e/14/e3q8)\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I should double check my query before executing it.\n",
"Action: query_checker_sql_db\n",
"Action Input: SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Track ON Artist.ArtistId = Track.ArtistId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3m\n",
"\n",
"SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity \n",
"FROM Artist \n",
"INNER JOIN Track ON Artist.ArtistId = Track.ArtistId \n",
"INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId \n",
"GROUP BY Artist.Name \n",
"ORDER BY TotalQuantity DESC \n",
"LIMIT 3;\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Action: query_sql_db\n",
"Action Input: SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Album ON Artist.ArtistId = Album.ArtistId INNER JOIN Track ON Album.AlbumId = Track.AlbumId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3\u001b[0m\n",
"Invoking: `sql_db_query` with `SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist JOIN Album ON Artist.ArtistId = Album.ArtistId JOIN Track ON Album.AlbumId = Track.AlbumId JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m[('Iron Maiden', 140), ('U2', 107), ('Metallica', 91)]\u001b[0m\u001b[32;1m\u001b[1;3mThe top 3 best selling artists are:\n",
"\n",
"1. Iron Maiden - 140 total quantity sold\n",
"2. U2 - 107 total quantity sold\n",
"3. Metallica - 91 total quantity sold\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@ -593,17 +660,26 @@
{
"data": {
"text/plain": [
"'The top 3 best selling artists are Iron Maiden, U2, and Metallica.'"
"{'input': 'Who are the top 3 best selling artists?',\n",
" 'output': 'The top 3 best selling artists are:\\n\\n1. Iron Maiden - 140 total quantity sold\\n2. U2 - 107 total quantity sold\\n3. Metallica - 91 total quantity sold'}"
]
},
"execution_count": 16,
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"Who are the top 3 best selling artists?\")"
"agent_executor.invoke({\"input\": \"Who are the top 3 best selling artists?\"})"