# Fixes SQLAlchemy truncating the result if you have a big/text column
with many chars.
SQLAlchemy truncates columns if you try to convert a Row or Sequence to
a string directly
For comparison:
- Before:
```[('Harrison', 'That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio ... (2 characters truncated) ... hat is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio ')]```
- After:
```[('Harrison', 'That is my Bio That is my Bio That is my Bio That is
my Bio That is my Bio That is my Bio That is my Bio That is my Bio That
is my Bio That is my Bio That is my Bio That is my Bio That is my Bio
That is my Bio That is my Bio That is my Bio That is my Bio That is my
Bio That is my Bio That is my Bio ')]```
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
I'm not sure who to tag for chains, maybe @vowelparrot ?
Seeing a lot of issues in Discord in which the LLM is not using the
correct LIMIT clause for different SQL dialects. ie, it's using `LIMIT`
for mssql instead of `TOP`, or instead of `ROWNUM` for Oracle, etc.
I think this could be due to us specifying the LIMIT statement in the
example rows portion of `table_info`. So the LLM is seeing the `LIMIT`
statement used in the prompt.
Since we can't specify each dialect's method here, I think it's fine to
just replace the `SELECT... LIMIT 3;` statement with `3 rows from
table_name table:`, and wrap everything in a block comment directly
following the `CREATE` statement. The Rajkumar et al paper wrapped the
example rows and `SELECT` statement in a block comment as well anyway.
Thoughts @fpingham?
This approach has several advantages:
* it improves the readability of the code
* removes incompatibilities between SQL dialects
* fixes a bug with `datetime` values in rows and `ast.literal_eval`
Huge thanks and credits to @jzluo for finding the weaknesses in the
current approach and for the thoughtful discussion on the best way to
implement this.
---------
Co-authored-by: Francisco Ingham <>
Co-authored-by: Jon Luo <20971593+jzluo@users.noreply.github.com>
Currently the chain is getting the column names and types on the one
side and the example rows on the other. It is easier for the llm to read
the table information if the column name and examples are shown together
so that it can easily understand to which columns do the examples refer
to. For an instantiation of this, please refer to the changes in the
`sqlite.ipynb` notebook.
Also changed `eval` for `ast.literal_eval` when interpreting the results
from the sample row query since it is a better practice.
---------
Co-authored-by: Francisco Ingham <>
---------
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
The agents usually benefit from understanding what the data looks like
to be able to filter effectively. Sending just one row in the table info
allows the agent to understand the data before querying and get better
results.
---------
Co-authored-by: Francisco Ingham <>
---------
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
`SQLDatabase` now accepts two `init` arguments:
1. `ignore_tables` to pass in a list of tables to not search over
2. `include_tables` to restrict to a list of tables to consider