diff --git a/docs/index.rst b/docs/index.rst
index 8b8c8be7..4f7c4d8c 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -120,6 +120,7 @@ The above modules can be used in a variety of ways. LangChain also provides guid
    ./use_cases/question_answering.md
    ./use_cases/summarization.md
    ./use_cases/tabular.rst
+   ./use_cases/extraction.md
    ./use_cases/evaluation.rst
    ./use_cases/model_laboratory.ipynb
 
diff --git a/docs/use_cases/extraction.md b/docs/use_cases/extraction.md
new file mode 100644
index 00000000..a6dfe286
--- /dev/null
+++ b/docs/use_cases/extraction.md
@@ -0,0 +1,20 @@
+# Extraction
+
+Most APIs and databases still deal with structured information.
+To work with them, it is often useful to extract structured information from text.
+Examples of this include:
+
+- Extracting a structured row to insert into a database from a sentence
+- Extracting multiple rows to insert into a database from a long document
+- Extracting the correct API parameters from a user query
+
+This work is closely related to [output parsing](../modules/prompts/examples/output_parsers.ipynb).
+Output parsers are responsible for instructing the LLM to respond in a specific format.
+In this case, the output parser specifies the format of the data you would like to extract from the document.
+In addition to the output format instructions, the prompt should also contain the text you would like to extract information from.
+
+While normal output parsers are good enough for basic structuring of response data,
+extraction often calls for more complicated or nested structures.
+For a deep dive on extraction, we recommend checking out [`kor`](https://eyurtsev.github.io/kor/),
+a library that uses the existing LangChain chain and OutputParser abstractions
+but focuses on extracting more complicated schemas.
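
Note on the new page: the workflow that `extraction.md` describes (format instructions plus the source text in one prompt, then parsing the reply into a structured row) could be illustrated along the following lines. This is only a minimal sketch in plain Python and does not use the LangChain or `kor` APIs. The `FIELDS` schema, the helpers `format_instructions`, `build_prompt`, and `parse_row`, and the canned `fake_reply` are all hypothetical names introduced for illustration; a real LLM call would replace the canned reply.

```python
import json

# Hypothetical schema: the fields we want to pull out of free text.
FIELDS = {"name": "string", "age": "integer", "city": "string"}


def format_instructions(fields):
    """Describe the desired output format, in the spirit of an output parser."""
    spec = ", ".join(f'"{key}": <{kind}>' for key, kind in fields.items())
    return (
        "Respond with a single JSON object of the form {"
        + spec
        + "} and nothing else."
    )


def build_prompt(text, fields):
    """Combine the format instructions with the text to extract from."""
    return f"{format_instructions(fields)}\n\nText:\n{text}"


def parse_row(reply, fields):
    """Parse the model reply into a row and check that every field is present."""
    row = json.loads(reply)
    missing = [key for key in fields if key not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the requested fields, in schema order.
    return {key: row[key] for key in fields}


prompt = build_prompt("Alice is 30 years old and lives in Paris.", FIELDS)

# Stand-in for an LLM call (an assumption for this sketch); a real reply
# may not be valid JSON and would need error handling.
fake_reply = '{"name": "Alice", "age": 30, "city": "Paris"}'

row = parse_row(fake_reply, FIELDS)
print(row)
```

The resulting `row` dictionary is the kind of structured record that could then be inserted into a database or passed as API parameters. Nested schemas, which the page points to `kor` for, would need a richer schema description than this flat field map.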