
Eugene Yurtsev 2ceb807da2 Add PDF parser implementations (#4356 ) This PR separates data loading from parsing for a number of existing PDF loaders. The parser tests are designed to encourage developers to create a consistent interface for parsing PDFs. The interface can be made more consistent in the future by adding information to the initializer about the desired behavior, such as whether to split by page. This code is expected to be backwards compatible, with the exception of a bug fix for the pymupdf parser, which was returning `bytes` in the page content rather than strings. The lazy parser method of the document loader now returns an Iterator rather than an Iterable over documents.		2023-05-09 10:24:17 -04:00
.devcontainer	Visual Studio Code/Github Codespaces Dev Containers (#4035 ) (#4122 )	2023-05-04 11:37:00 -07:00
.github	Add Pull Request Template (#4247 )	2023-05-08 08:34:37 -07:00
docs	Fix grammar in Text Splitters docs (#4373 )	2023-05-08 22:38:40 -04:00
langchain	Add PDF parser implementations (#4356 )	2023-05-09 10:24:17 -04:00
tests	Add PDF parser implementations (#4356 )	2023-05-09 10:24:17 -04:00
.dockerignore	fix: tests with Dockerfile (#2382 )	2023-04-04 06:47:19 -07:00
.flake8	change run to use args and kwargs (#367 )	2022-12-18 15:54:56 -05:00
.gitignore	Harrison/relevancy score (#3907 )	2023-05-01 20:37:24 -07:00
.readthedocs.yaml	bring back ref (#4308 )	2023-05-07 17:32:28 -07:00
CITATION.cff	bump version to 0069 (#710 )	2023-01-24 00:24:54 -08:00
Dockerfile	make ARG POETRY_HOME available in multistage (#3882 )	2023-05-01 20:57:41 -07:00
LICENSE	add license (#50 )	2022-11-01 21:12:02 -07:00
Makefile	Add lint_diff command (#2449 )	2023-04-05 09:34:24 -07:00
poetry.lock	JSON loader (#4067 )	2023-05-05 14:48:13 -07:00
poetry.toml	fix Poetry 1.4.0+ installation (#1935 )	2023-03-27 08:27:54 -07:00
pyproject.toml	Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212 )	2023-05-08 16:15:09 -04:00
README.md	added GitHub star number (#4214 )	2023-05-09 09:39:53 -04:00

README.md

🦜️🔗 LangChain

⚡ Building applications with LLMs through composability ⚡

Looking for the JS/TS version? Check out LangChain.js.

Production Support: As you move your LangChains into production, we'd love to offer more comprehensive support. Please fill out this form and we'll set up a dedicated support Slack channel.

Quick Install

pip install langchain or conda install langchain -c conda-forge

🤔 What is this?

Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, using these LLMs in isolation is often insufficient for creating a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.

This library aims to assist in the development of those types of applications. Common examples of these applications include:

❓ Question Answering over specific documents

Documentation
End-to-end Example: Question Answering over Notion Database

💬 Chatbots

Documentation
End-to-end Example: Chat-LangChain

🤖 Agents

Documentation
End-to-end Example: GPT+WolframAlpha

📖 Documentation

Please see here for full documentation on:

Getting started (installation, setting up the environment, simple examples)
How-To examples (demos, integrations, helper functions)
Reference (full API docs)
Resources (high-level explanation of core concepts)

🚀 What can this help with?

There are six main areas that LangChain is designed to help with. These are, in increasing order of complexity:

📃 LLMs and Prompts:

This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
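As a rough illustration of what "prompt management" and "a generic interface for all LLMs" can mean, the sketch below uses only the Python standard library. It is not LangChain's API: the names `LLM`, `EchoLLM`, `ReverseLLM`, `PROMPT`, and `ask` are hypothetical, and the two model classes are toy stand-ins for real model providers.

```python
from string import Template
from typing import Protocol


class LLM(Protocol):
    """Hypothetical generic interface: anything with generate(prompt) -> str."""

    def generate(self, prompt: str) -> str: ...


class EchoLLM:
    """Toy stand-in for a real model: echoes the prompt back."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ReverseLLM:
    """A second toy model, showing that callers do not depend on the provider."""

    def generate(self, prompt: str) -> str:
        return prompt[::-1]


# A reusable prompt template with named variables (hypothetical example).
PROMPT = Template("Translate '$text' into $language.")


def ask(llm: LLM, **variables: str) -> str:
    """Fill the template, then send the result to whichever model was passed in."""
    return llm.generate(PROMPT.substitute(**variables))


out = ask(EchoLLM(), text="hello", language="French")
print(out)
print(ask(ReverseLLM(), text="hello", language="French"))
```

Because `ask` only relies on the `generate` method, swapping one model for another does not require changing the prompt or the calling code.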

🔗 Chains:

Chains go beyond a single LLM call and involve sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
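The idea of a chain as a sequence of calls can be sketched in plain Python. This is a conceptual example under stated assumptions, not LangChain's chain interface: `fake_llm`, `format_prompt`, `shout`, and `run_chain` are hypothetical names, and `fake_llm` stands in for a real model call.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return f"<answer to: {prompt}>"


def format_prompt(topic: str) -> str:
    """Step 1: turn raw input into a prompt."""
    return f"Write one sentence about {topic}."


def shout(text: str) -> str:
    """Step 3: a non-LLM utility applied to the model output."""
    return text.upper()


def run_chain(steps: List[Callable[[str], str]], value: str) -> str:
    """Feed the output of each step into the next one."""
    for step in steps:
        value = step(value)
    return value


result = run_chain([format_prompt, fake_llm, shout], "parrots")
print(result)
```

Each step shares the same string-in, string-out signature, which is what lets LLM calls and other utilities be composed in any order.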

📚 Data Augmented Generation:

Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question answering over specific data sources.
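The fetch-then-generate pattern described here can be sketched as follows. This is a simplified illustration, not LangChain code: the document store is an in-memory list, retrieval is naive word overlap rather than a real search index, and `DOCS`, `tokenize`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import re
from typing import List, Set

# Hypothetical in-memory "external data source".
DOCS = [
    "LangChain can be installed with pip install langchain.",
    "Agents decide which action to take next.",
    "Memory persists state between calls of a chain.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Place the fetched data ahead of the question for the generation step."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


question = "How do I install LangChain?"
context = retrieve(question, DOCS)
prompt = build_prompt(question, context)
print(prompt)  # this prompt would then be sent to an LLM
```

The key point is the ordering: data is fetched first, and only then is the prompt for the generation step assembled.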

🤖 Agents:

Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
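The decide, act, observe, repeat loop can be sketched in a few lines. This is only a conceptual example and not LangChain's agent interface: `fake_policy` is a hard-coded stand-in for the LLM's decision, and `TOOLS`, `run_agent`, and the `"finish"` action name are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: action name -> function taking and returning text.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

Step = Tuple[str, str, str]  # (action, action_input, observation)


def fake_policy(question: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: pick the next action from the question and history."""
    if not history:
        return ("add", question)  # first, use the tool
    return ("finish", history[-1][2])  # then answer with the last observation


def run_agent(question: str, max_steps: int = 5) -> str:
    """Repeat decide -> act -> observe until the policy says it is done."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, action_input = fake_policy(question, history)
        if action == "finish":
            return action_input
        observation = TOOLS[action](action_input)
        history.append((action, action_input, observation))
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent("2+3")
print(answer)
```

The `max_steps` limit is a design choice in this sketch: it keeps a policy that never chooses to finish from looping forever.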

🧠 Memory:

Memory refers to persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
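Persisting state between calls can be illustrated with a small buffer that is read before each call and written after it. This is a minimal sketch, not one of LangChain's memory implementations: `BufferMemory`, `chat`, and `fake_llm` are hypothetical names, and `fake_llm` simply reports how many earlier human messages it can see in the prompt.

```python
from typing import List


class BufferMemory:
    """Hypothetical memory: stores every turn of the conversation as text."""

    def __init__(self) -> None:
        self.turns: List[str] = []

    def save(self, user: str, ai: str) -> None:
        self.turns.append(f"Human: {user}")
        self.turns.append(f"AI: {ai}")

    def load(self) -> str:
        return "\n".join(self.turns)


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM: counts the earlier human turns visible in the prompt."""
    seen = prompt.count("Human:") - 1  # exclude the current message
    return f"I can see {seen} earlier message(s)."


def chat(memory: BufferMemory, user_input: str) -> str:
    """Load the history into the prompt, call the model, then save the new turn."""
    prompt = f"{memory.load()}\nHuman: {user_input}\nAI:"
    reply = fake_llm(prompt)
    memory.save(user_input, reply)
    return reply


memory = BufferMemory()
first = chat(memory, "Hi")
second = chat(memory, "Still there?")
print(first)
print(second)
```

The second call sees the first exchange only because the same `memory` object is passed to both calls; without it, each call would start from an empty history.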

🧐 Evaluation:

[BETA] Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.

For more information on these concepts, please see our full documentation.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see here.
