"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/sql.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"Enterprise data is often stored in SQL databases.\n",
"\n",
"LLMs make it possible to interact with SQL databases using natural langugae.\n",
"\n",
"LangChain offers SQL Chains and Agents to build and run SQL queries based on natural language prompts. \n",
"\n",
"These are compatible with any SQL dialect supported by SQLAlchemy (e.g., MySQL, PostgreSQL, Oracle SQL, Databricks, SQLite).\n",
"\n",
"They enable use cases such as:\n",
"\n",
"- Generating queries that will be run based on natural language questions\n",
"- Creating chatbots that can answer questions based on database data\n",
"- Building custom dashboards based on insights a user wants to analyze\n",
"\n",
"## Overview\n",
"\n",
"LangChain provides tools to interact with SQL Databases:\n",
"\n",
"1. `Build SQL queries` based on natural language user questions\n",
"2. `Query a SQL database` using chains for query creation and execution\n",
"3. `Interact with a SQL database` using agents for robust and flexible querying \n",
"\n",
"![sql_usecase.png](/img/sql_usecase.png)\n",
"\n",
"## Quickstart\n",
"\n",
"First, get required packages and set environment variables:"
"The below example will use a SQLite connection with Chinook database. \n",
" \n",
"Follow [installation steps](https://database.guide/2-sample-databases-sqlite/) to create `Chinook.db` in the same directory as this notebook:\n",
"\n",
"* Save [this file](https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql) to the directory as `Chinook_Sqlite.sql`\n",
"* Run `sqlite3 Chinook.db`\n",
"* Run `.read Chinook_Sqlite.sql`\n",
"* Test `SELECT * FROM Artist LIMIT 10;`\n",
"\n",
"Now, `Chinhook.db` is in our directory.\n",
"\n",
"Let's create a `SQLDatabaseChain` to create and execute SQL queries."
"TEMPLATE = \"\"\"Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.\n",
"Use the following format:\n",
"\n",
"Question: \"Question here\"\n",
"SQLQuery: \"SQL Query to run\"\n",
"SQLResult: \"Result of the SQLQuery\"\n",
"Answer: \"Final answer here\"\n",
"\n",
"Only use the following tables:\n",
"\n",
"{table_info}.\n",
"\n",
"Some examples of SQL queries that corrsespond to questions are:\n",
"Answer:\u001b[32;1m\u001b[1;3mThere are 8 employees.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'There are 8 employees.'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db_chain.run(\"How many employees are there?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, we get the same result as the previous case.\n",
"\n",
"Here, the chain **also handles the query execution** and provides a final answer based on the user question and the query result.\n",
"\n",
"**Be careful** while using this approach as it is susceptible to `SQL Injection`:\n",
"\n",
"* The chain is executing queries that are created by an LLM, and weren't validated\n",
"* e.g. records may be created, modified or deleted unintentionally_\n",
"\n",
"This is why we see the `SQLDatabaseChain` is inside `langchain_experimental`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"**Looking under the hood**\n",
"\n",
"We can use the [LangSmith trace](https://smith.langchain.com/public/7f202a0c-1e35-42d6-a84a-6c2a58f697ef/r) to see what is happening under the hood:\n",
"\n",
"* As discussed above, first we create the query:\n",
"\n",
"```\n",
"text: ' SELECT COUNT(*) FROM \"Employee\";'\n",
"```\n",
"\n",
"* Then, it executes the query and passes the results to an LLM for synthesis.\n",
"- [Using Query Checker](/docs/integrations/tools/sqlite#use-query-checker) self-correct invalid SQL using parameter `use_query_checker=True`\n",
"- [Customizing the LLM Prompt](/docs/integrations/tools/sqlite#customize-prompt) include specific instructions or relevant information, using parameter `prompt=CUSTOM_PROMPT`\n",
"- [Get intermediate steps](/docs/integrations/tools/sqlite#return-intermediate-steps) access the SQL statement as well as the final result using parameter `return_intermediate_steps=True`\n",
"- [Limit the number of rows](/docs/integrations/tools/sqlite#choosing-how-to-limit-the-number-of-rows-returned) a query will return using parameter `top_k=5`\n",
"1\tLuís\tGonçalves\tEmbraer - Empresa Brasileira de Aeronáutica S.A.\tAv. Brigadeiro Faria Lima, 2170\tSão José dos Campos\tSP\tBrazil\t12227-000\t+55 (12) 3923-5555\t+55 (12) 3923-5566\tluisg@embraer.com.br\t3\n",
"Thought:\u001b[32;1m\u001b[1;3m I should query the total sales per country.\n",
"Action: sql_db_query\n",
"Action Input: SELECT Country, SUM(Total) AS TotalSales FROM Invoice INNER JOIN Customer ON Invoice.CustomerId = Customer.CustomerId GROUP BY Country ORDER BY TotalSales DESC LIMIT 10\u001b[0m\n",
"Thought: I should query the schema of the Invoice and Customer tables.\n",
"Action: sql_db_schema\n",
"Action Input: Invoice, Customer\n",
"```\n",
"\n",
"* It then formulates the query using the schema from tool `sql_db_schema`\n",
"\n",
"```\n",
"Thought: I should query the total sales per country.\n",
"Action: sql_db_query\n",
"Action Input: SELECT Country, SUM(Total) AS TotalSales FROM Invoice INNER JOIN Customer ON Invoice.CustomerId = Customer.CustomerId GROUP BY Country ORDER BY TotalSales DESC LIMIT 10\n",
"```\n",
"\n",
"* It finally executes the generated query using tool `sql_db_query`\n",
"\n",
"![sql_usecase.png](/img/SQLDatabaseToolkit.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Agent task example #2 - Describing a Table"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: The PlaylistTrack table contains two columns, PlaylistId and TrackId, which are both integers and form a primary key. It also has two foreign keys, one to the Track table and one to the Playlist table.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The PlaylistTrack table contains two columns, PlaylistId and TrackId, which are both integers and form a primary key. It also has two foreign keys, one to the Track table and one to the Playlist table.'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"Describe the playlisttrack table\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"To learn more about the SQL Agent and how it works we refer to the [SQL Agent Toolkit](/docs/integrations/toolkits/sql_database) documentation.\n",
"\n",
"You can also check Agents for other document types:\n",
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
"\n",
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
"\n",
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
"\n",
"Use the following format:\n",
"\n",
"Question: Question here\n",
"ESQuery: Elasticsearch Query formatted as json\n",