From 8685d53adcdd0310e76349ecb4e2b87f980c4673 Mon Sep 17 00:00:00 2001
From: Harrison Chase
Date: Sat, 18 Mar 2023 11:12:18 -0700
Subject: [PATCH] querying tabular data (#1758)

---
 docs/index.rst            |  3 +++
 docs/use_cases/tabular.md | 31 +++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 docs/use_cases/tabular.md

diff --git a/docs/index.rst b/docs/index.rst
index 3b716f5a..8b8c8be7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -97,6 +97,8 @@ The above modules can be used in a variety of ways. LangChain also provides guid
 
 - `Summarization <./use_cases/summarization.html>`_: Summarizing longer documents into shorter, more condensed chunks of information. A type of Data Augmented Generation.
 
+- `Querying Tabular Data <./use_cases/tabular.html>`_: If you want to understand how to use LLMs to query data stored in a tabular format (CSVs, SQL, dataframes, etc.), you should read this page.
+
 - `Evaluation <./use_cases/evaluation.html>`_: Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
 
 - `Generate similar examples <./use_cases/generate_examples.html>`_: Generating similar examples to a given input. This is a common use case for many applications, and LangChain provides some prompts/chains for assisting in this.
@@ -117,6 +119,7 @@ The above modules can be used in a variety of ways. LangChain also provides guid
    ./use_cases/combine_docs.md
    ./use_cases/question_answering.md
    ./use_cases/summarization.md
+   ./use_cases/tabular.rst
    ./use_cases/evaluation.rst
    ./use_cases/model_laboratory.ipynb
 
diff --git a/docs/use_cases/tabular.md b/docs/use_cases/tabular.md
new file mode 100644
index 00000000..c4dd0dd2
--- /dev/null
+++ b/docs/use_cases/tabular.md
@@ -0,0 +1,31 @@
+# Querying Tabular Data
+
+A lot of data and information is stored in tabular form, whether in CSVs, Excel sheets, or SQL tables.
+This page covers all resources available in LangChain for working with data in this format.
+
+## Document Loading
+If you have text data stored in a tabular format, you may want to load the data into Documents and then index them as you would
+other text/unstructured data. For this, you should use a document loader like the [CSVLoader](../modules/document_loaders/examples/csv.ipynb),
+then [create an index](../modules/indexes.rst) over that data and [query it that way](../modules/indexes/chain_examples/vector_db_qa.ipynb).
+
+## Querying
+If your tabular data is mostly numeric, or you have a large amount of data that you don't want to index, you should start
+by looking at the chains and agents LangChain provides for working with this data.
+
+### Chains
+
+If you are just getting started and your tabular data is relatively small or simple, you should begin with chains.
+A chain is a sequence of predetermined steps, which makes chains a good starting point: they give you more control and let you
+better understand what is happening.
+
+- [SQL Database Chain](../modules/chains/examples/sqlite.ipynb)
+
+### Agents
+
+Agents are more complex and may make multiple calls to the LLM to decide what to do.
+The downside of agents is that you have less control. The upside is that they are more powerful,
+which allows you to use them on larger databases and more complex schemas.
+
+- [SQL Agent](../modules/agents/agent_toolkits/sql_database.ipynb)
+- [Pandas Agent](../modules/agents/agent_toolkits/pandas.ipynb)
+- [CSV Agent](../modules/agents/agent_toolkits/csv.ipynb)
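The new page's Document Loading section suggests loading each row of a CSV as a Document and then indexing it like other text. The stdlib-only sketch below illustrates that idea without calling LangChain. `Doc`, `load_csv_as_docs`, and `keyword_search` are hypothetical names; the `column: value` row rendering is an assumption modeled loosely on how a CSV loader might turn rows into text, and the keyword ranking is a deliberately naive stand-in for the vector index the page links to.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_as_docs(text: str, source: str = "inline.csv") -> list[Doc]:
    """Assumed format: one Doc per CSV row, rendered as 'column: value' lines."""
    docs = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
        docs.append(Doc(page_content=content, metadata={"source": source, "row": i}))
    return docs


def keyword_search(docs: list[Doc], query: str, k: int = 1) -> list[Doc]:
    """Naive retrieval: rank docs by how many query words they contain.
    A real setup would embed the docs into a vector store instead."""
    words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(words & set(d.page_content.lower().split())))
    return ranked[:k]


CSV_TEXT = """name,team,role
Alice,Data,Analyst
Bob,Platform,Engineer
"""

docs = load_csv_as_docs(CSV_TEXT)
print(docs[1].page_content)
print(keyword_search(docs, "who is the engineer")[0].metadata)
```

Keeping the column name next to each value means a row still reads as self-describing text once it is split off from the header, which is what makes it usable by a text index.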
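The Chains section describes a chain as a sequence of predetermined steps. As a rough, offline sketch of that shape for SQL, the code below reads the schema, has a model write SQL, runs it once, and has the model phrase the result. This is not LangChain's SQL Database Chain: `fake_llm` and `sql_chain` are hypothetical names, and the LLM is replaced by canned responses so the example runs without an API key.

```python
import sqlite3


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call so the sketch runs offline:
    returns canned SQL for the query step and echoes the result otherwise."""
    if prompt.startswith("Write a SQLite query"):
        return "SELECT COUNT(*) FROM employees WHERE team = 'Data'"
    # Answer step: the raw SQL result is on the last line of the prompt.
    return f"The query returned {prompt.splitlines()[-1]}."


def sql_chain(conn: sqlite3.Connection, question: str, llm=fake_llm) -> str:
    # Step 1: collect the CREATE TABLE statements so the model sees the schema.
    schema = "\n".join(
        row[0] for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    # Step 2: ask the model to translate the question into SQL.
    sql = llm(f"Write a SQLite query for this schema:\n{schema}\nQuestion: {question}")
    # Step 3: run the generated SQL exactly once.
    rows = conn.execute(sql).fetchall()
    # Step 4: ask the model to phrase the raw rows as an answer.
    return llm(f"Question: {question}\nSQL result:\n{rows}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", "Data"), ("Bob", "Platform"), ("Carol", "Data")],
)
print(sql_chain(conn, "How many employees are on the Data team?"))
```

Because every step is fixed in advance, you can log or validate the generated SQL before it runs, which is the kind of control the page attributes to chains.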
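The Agents section notes that an agent may call the LLM several times to decide what to do. The sketch below shows that loop under stated assumptions: the tool names (`list_tables`, `describe_table`, `run_query`) and the scripted `fake_llm` policy are hypothetical and only mimic, in spirit, the inspect-then-query behavior a SQL agent might follow. A real agent would send the question and history to a model and parse its chosen action instead of following a script.

```python
import sqlite3
from typing import Callable

History = list[tuple[str, str, str]]  # (action, argument, observation)


def make_tools(conn: sqlite3.Connection) -> dict[str, Callable[[str], str]]:
    """Hypothetical tools; each takes a string argument and returns a string observation."""
    return {
        "list_tables": lambda _: ", ".join(
            row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        ),
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        "describe_table": lambda name: ", ".join(
            row[1] for row in conn.execute(f"PRAGMA table_info({name})")
        ),
        "run_query": lambda sql: str(conn.execute(sql).fetchall()),
    }


def fake_llm(question: str, history: History) -> tuple[str, str]:
    """Hypothetical scripted policy standing in for the LLM's decisions."""
    if not history:
        return ("list_tables", "")
    last_action, _, observation = history[-1]
    if last_action == "list_tables":
        return ("describe_table", observation.split(", ")[0])
    if last_action == "describe_table":
        return ("run_query", "SELECT COUNT(*) FROM employees WHERE team = 'Data'")
    return ("final_answer", f"The query returned {observation}.")


def run_agent(conn: sqlite3.Connection, question: str, llm=fake_llm, max_steps: int = 5):
    tools = make_tools(conn)
    history: History = []
    for _ in range(max_steps):
        # Ask the "model" which action to take next, given everything seen so far.
        action, arg = llm(question, history)
        if action == "final_answer":
            return arg, history
        # Execute the chosen tool and record what it returned.
        history.append((action, arg, tools[action](arg)))
    raise RuntimeError("agent did not finish within max_steps")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", "Data"), ("Bob", "Platform"), ("Carol", "Data")],
)
answer, steps = run_agent(conn, "How many employees are on the Data team?")
for action, arg, observation in steps:
    print(f"{action}({arg!r}) -> {observation}")
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any agent loop: since the model, not the code, decides when to stop, an explicit bound prevents runaway calls.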