langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00

Author	SHA1	Message	Date
seamusp	16945c9922	docs: misc retrievers fixes (#9791 ) Various miscellaneous fixes to most pages in the 'Retrievers' section of the documentation: - "VectorStore" and "vectorstore" changed to "vector store" for consistency - Various spelling, grammar, and formatting improvements for readability Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-03 20:26:49 -07:00
Nate Nethercott	0024824a6e	docs: Fix spelling mistakes in retrievers/get_started.mdx (#9920 ) Description: Fix spelling mistakes in retrievers/get_started.mdx	2023-08-29 14:50:07 -07:00
seamusp	25f2c82ae8	docs:misc fixes (#9671 ) Improve internal consistency in LangChain documentation - Change occurrences of eg and eg. to e.g. - Fix headers containing unnecessary capital letters. - Change instances of "few shot" to "few-shot". - Add periods to end of sentences where missing. - Minor spelling and grammar fixes.	2023-08-23 22:36:54 -07:00
Martin Schade	0c8a88b3fa	AmazonTextractPDFLoader documentation updates (#9415 ) Description: Updating documentation to add AmazonTextractPDFLoader according to [comment](https://github.com/langchain-ai/langchain/pull/8661#issuecomment-1666572992) from [baskaryan](https://github.com/baskaryan) Adding one notebook and instructions to the modules/data_connection/document_loaders/pdf.mdx --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-20 16:40:15 -07:00
Chris Pappalardo	beab637f04	added filter kwarg to VectorStoreIndexWrapper query and query_with_so… (#8844 ) - Description: added filter to query methods in VectorStoreIndexWrapper for filtering by metadata (i.e. search_kwargs) - Tag maintainer: @rlancemartin, @eyurtsev Updated the doc snippet on this topic as well. It took me a long while to figure out how to filter the vectorstore by filename, so this might help someone else out. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-08 10:10:45 -07:00
Snehil Kumar	a6ee646ef3	Update get_started.mdx (#8744 ) - Description: Added a missing word and rearranged a sentence in the documentation of Self Query Retrievers., - Issue: NA, - Dependencies: NA, - Tag maintainer: @baskaryan, - Twitter handle: NA Thanks for your time.	2023-08-04 15:32:19 -04:00
Ilya	6f0bccfeb5	Add regex control over separators in character text splitter (#7933 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! Please make sure you're PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally. If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @baskaryan - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @baskaryan - Memory: @hwchase17 - Agents / Tools / Toolkits: @hinthornw - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #7854 Added the ability to use the `separator` ase a regex or a simple character. Fixed a bug where `start_index` was incorrectly counting from -1. Who can review? @eyurtsev @hwchase17 @mmz-001	2023-08-03 20:25:23 -07:00
William FH	0a16b3d84b	Update Integrations links (#8206 )	2023-07-24 21:20:32 -07:00
Jeff Huber	dc8b790214	Improve vector store onboarding exp (#6698 ) This PR - fixes the `similarity_search_by_vector` example, makes the code run and adds the example to mirror `similarity_search` - reverts back to chroma from faiss to remove sharp edges / create a happy path for new developers. (1) real metadata filtering, (2) expected functionality like `update`, `delete`, etc to serve beyond the most trivial use cases @hwchase17 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 13:48:42 -07:00
Kacper Łukawski	1ff5b67025	Implement async API for Qdrant vector store (#7704 ) Inspired by #5550, I implemented full async API support in Qdrant. The docs were extended to mention the existence of asynchronous operations in Langchain. I also used that chance to restructure the tests of Qdrant and provided a suite of tests for the async version. Async API requires the GRPC protocol to be enabled. Thus, it doesn't work on local mode yet, but we're considering including the support to be consistent.	2023-07-15 09:33:26 -04:00
Jason Fan	8effd90be0	Add new types of document transformers (#7379 ) - Description: Add two new document transformers that translates documents into different languages and converts documents into q&a format to improve vector search results. Uses OpenAI function calling via the [doctran](https://github.com/psychic-api/doctran/tree/main) library. - Issue: N/A - Dependencies: `doctran = "^0.0.5"` - Tag maintainer: @rlancemartin @eyurtsev @hwchase17 - Twitter handle: @psychicapi or @jfan001 Notes - Adheres to the `DocumentTransformer` abstraction set by @dev2049 in #3182 - refactored `EmbeddingsRedundantFilter` to put it in a file under a new `document_transformers` module - Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as well as the existing `EmbeddingsRedundantFilter` --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-12 23:53:30 -04:00
Yaohui Wang	d85c33a5c3	Fix the markdown rendering issue with a code block inside a markdown code block (#6625 ) ### Description - Fix the markdown rendering issue with a code block inside a markdown, using a different number of backticks for the delimiters. Current doc site: <https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/code_splitter#markdown> After fix: <img width="480" alt="image" src="https://github.com/hwchase17/langchain/assets/3115235/d9921d59-64e6-4a34-9c62-79743667f528"> ### Who can review PTAL @dev2049 Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>	2023-07-12 16:29:25 -04:00
Yaroslav Halchenko	0d92a7f357	codespell: workflow, config + some (quite a few) typos fixed (#6785 ) Probably the most boring PR to review ;) Individual commits might be easier to digest --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2023-07-12 16:20:08 -04:00
nikkie	2eb4a2ceea	docs(retrievers/get-started): Fix broken state_of_the_union.txt link (#7399 ) Thank you for this awesome library. - Description: Fix broken link in documentation - Issue: - https://python.langchain.com/docs/modules/data_connection/retrievers/#get-started - the URL: https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt - I think the right one is https://github.com/hwchase17/langchain/blob/master/docs/extras/modules/state_of_the_union.txt - Dependencies: - - Tag maintainer: @baskaryan - Twitter handle: -	2023-07-08 11:11:05 -04:00
Jeroen Van Goey	887bb12287	Use correct Language for html_splitter (#7274 ) `html_splitter` was using `Language.MARKDOWN`.	2023-07-06 09:24:25 -04:00
Sergey Kozlov	6d15854cda	Add JSON Lines support to JSONLoader (#6913 ) Description: The JSON Lines format is used by some services such as OpenAI and HuggingFace. It's also a convenient alternative to CSV. This PR adds JSON Lines support to `JSONLoader` and also updates related tests. Tag maintainer: @rlancemartin, @eyurtsev. PS I was not able to build docs locally so didn't update related section.	2023-07-02 12:32:41 -07:00
Bagatur	7acd524210	Rm retriever kwargs (#7013 ) Doesn't actually limit the Retriever interface but hopefully in practice it does	2023-07-02 08:22:24 -06:00
Johnny Lim	9dc77614e3	Polish reference docs (#7045 ) This PR fixes broken links in the reference docs.	2023-07-02 08:08:51 -06:00
Zander Chase	b0859c9b18	Add New Retriever Interface with Callbacks (#5962 ) Handle the new retriever events in a way that (I think) is entirely backwards compatible? Needs more testing for some of the chain changes and all. This creates an entire new run type, however. We could also just treat this as an event within a chain run presumably (same with memory) Adds a subclass initializer that upgrades old retriever implementations to the new schema, along with tests to ensure they work. First commit doesn't upgrade any of our retriever implementations (to show that we can pass the tests along with additional ones testing the upgrade logic). Second commit upgrades the known universe of retrievers in langchain. - [X] Add callback handling methods for retriever start/end/error (open to renaming to 'retrieval' if you want that) - [X] Update BaseRetriever schema to support callbacks - [X] Tests for upgrading old "v1" retrievers for backwards compatibility - [X] Update existing retriever implementations to implement the new interface - [X] Update calls within chains to .{a]get_relevant_documents to pass the child callback manager - [X] Update the notebooks/docs to reflect the new interface - [X] Test notebooks thoroughly Not handled: - Memory pass throughs: retrieval memory doesn't have a parent callback manager passed through the method --------- Co-authored-by: Nuno Campos <nuno@boringbits.io> Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	2023-06-30 14:44:03 -07:00
Davis Chase	3298bf4f00	docs/fix links (#6498 )	2023-06-20 14:06:50 -07:00
Davis Chase	87e502c6bc	Doc refactor (#6300 ) Co-authored-by: jacoblee93 <jacoblee93@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-16 11:52:56 -07:00

21 Commits