Close dataframe column names are being treated as one by the LLM (#3611)

We send a sample of the dataframe to the LLM using df.head().
When column names sit close together in that output, the LLM can read two
column names as one and return incorrect results.


![image](https://user-images.githubusercontent.com/4707543/234678692-97851fa0-9e12-44db-92ec-9ad9f3545ae2.png)

In the above case the LLM uses **Org Week** as the column name instead
of **Week** if asked about a specific week.

Returning head() as Markdown separates the column names with explicit
delimiters, so the LLM uses the correct column name.


![image](https://user-images.githubusercontent.com/4707543/234678945-c6d7b218-143e-4e70-9e17-77dc64841a49.png)
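
The difference between the two renderings can be sketched without pandas. The snippet below is only an illustration under assumptions: the column names, the values, and the helper functions `render_plain` and `render_markdown` are hypothetical and are not the data from the screenshots or part of this commit. It mimics a whitespace-aligned header (similar in spirit to `str(df.head())`) and a pipe-delimited one (similar in spirit to `df.head().to_markdown()`), then shows that only the pipe-delimited header lets a reader recover the column boundaries.

```python
# Hypothetical sample data; not taken from the screenshots in this commit.
columns = ["Org", "Week", "Units Sold"]
rows = [["A", 14, 120], ["B", 15, 95]]


def render_plain(columns, rows):
    """Whitespace-aligned layout, similar in spirit to str(df.head())."""
    widths = [
        max(len(str(col)), *(len(str(row[i])) for row in rows))
        for i, col in enumerate(columns)
    ]
    lines = ["  ".join(str(col).rjust(w) for col, w in zip(columns, widths))]
    for row in rows:
        lines.append("  ".join(str(v).rjust(w) for v, w in zip(row, widths)))
    return "\n".join(lines)


def render_markdown(columns, rows):
    """Pipe-delimited layout, similar in spirit to df.head().to_markdown()."""
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(v) for v in row) + " |")
    return "\n".join(lines)


plain_header = render_plain(columns, rows).splitlines()[0]
md_header = render_markdown(columns, rows).splitlines()[0]

# Splitting on whitespace cannot tell where one column name ends.
plain_tokens = plain_header.split()

# Splitting on the pipe delimiter recovers each column name intact.
md_names = [name.strip() for name in md_header.strip("|").split("|")]

print(plain_tokens)
print(md_names)
```

In pandas itself, `DataFrame.to_markdown()` relies on the optional `tabulate` package, so that dependency needs to be installed for the change in this commit to work.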
Bhupendra Aole 2023-04-26 19:05:53 -04:00 committed by GitHub
parent 860fa59cd3
commit 568c4f0d81

@@ -35,7 +35,7 @@ def create_pandas_dataframe_agent(
     prompt = ZeroShotAgent.create_prompt(
         tools, prefix=prefix, suffix=suffix, input_variables=input_variables
     )
-    partial_prompt = prompt.partial(df=str(df.head()))
+    partial_prompt = prompt.partial(df=str(df.head().to_markdown()))
     llm_chain = LLMChain(
         llm=llm,
         prompt=partial_prompt,