improve the performance of base.py (#8610)

When path is a list, this replaces the loop that appended each DataFrame
to df with an up-front type check, a list comprehension, and a single
pd.concat call. The agent now receives one combined DataFrame (re-indexed
with ignore_index=True) instead of a list of DataFrames. The change is
intended to make loading faster and more memory-efficient.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Mohamad Zamini 2023-09-27 18:36:03 -06:00 committed by GitHub
parent 05b75f3f13
commit 9306394078

@@ -24,11 +24,13 @@ def create_csv_agent(
     if isinstance(path, (str, IOBase)):
         df = pd.read_csv(path, **_kwargs)
     elif isinstance(path, list):
-        df = []
-        for item in path:
-            if not isinstance(item, (str, IOBase)):
-                raise ValueError(f"Expected str or file-like object, got {type(path)}")
-            df.append(pd.read_csv(item, **_kwargs))
+        if not all(isinstance(item, (str, IOBase)) for item in path):
+            raise ValueError(
+                f"Expected all elements in the list to be strings, got {type(path)}."
+            )
+        dfs = [pd.read_csv(item, **_kwargs) for item in path]
+        df = pd.concat(dfs, ignore_index=True)
     else:
-        raise ValueError(f"Expected str, list, or file-like object, got {type(path)}")
+        raise ValueError(f"Expected str or list, got {type(path)}")
     return create_pandas_dataframe_agent(llm, df, **kwargs)
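
For readers who want to try the new list branch in isolation, here is a minimal standalone sketch. The helper name `load_csvs`, its `read_kwargs` parameter, and the sample CSV contents are hypothetical and not part of the commit; only `pd.read_csv`, `pd.concat(..., ignore_index=True)`, and the `isinstance` checks come from the diff. Unlike the committed code, the error message here reports the offending item types rather than `type(path)`.

```python
from io import IOBase, StringIO
from typing import List, Union

import pandas as pd

CsvSource = Union[str, IOBase]


def load_csvs(path: Union[CsvSource, List[CsvSource]], **read_kwargs) -> pd.DataFrame:
    """Hypothetical helper mirroring the loading branch of create_csv_agent."""
    if isinstance(path, (str, IOBase)):
        # A single path or file-like object yields a single DataFrame.
        return pd.read_csv(path, **read_kwargs)
    if isinstance(path, list):
        # Validate every element before reading anything.
        bad = [type(item).__name__ for item in path if not isinstance(item, (str, IOBase))]
        if bad:
            raise ValueError(f"Expected str or file-like items, got {bad}")
        frames = [pd.read_csv(item, **read_kwargs) for item in path]
        # ignore_index=True renumbers the rows 0..n-1 across all inputs.
        return pd.concat(frames, ignore_index=True)
    raise ValueError(f"Expected str, file-like object, or list, got {type(path)}")


# In-memory stand-ins for CSV files (made-up data).
first = StringIO("name,score\nana,1\nbo,2\n")
second = StringIO("name,score\ncy,3\n")
combined = load_csvs([first, second])
print(combined)
```

Two behaviors are worth keeping in mind when adapting this: `pd.concat` raises a `ValueError` when given an empty list, and inputs with different columns are combined on the union of their columns, with missing values filled in as `NaN`.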