langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Ilya	d5b1608216	fix markdown text splitter horizontal lines (#5625 ) Fixes #5614 #### Issue The `**` combination produces an exception when used as a seperator in `re.split`. Instead `\\\` should be used for regex exprations. #### Who can review? @eyurtsev	2023-06-05 16:40:26 -07:00
Harrison Chase	5ce74b5958	code splitter docs (#5480 ) Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-31 07:11:53 -07:00
ByronHsu	9d658aaa5a	Add more code splitters (go, rst, js, java, cpp, scala, ruby, php, swift, rust) (#5171 ) As the title says, I added more code splitters. The implementation is trivial, so i don't add separate tests for each splitter. Let me know if any concerns. Fixes # (issue) https://github.com/hwchase17/langchain/issues/5170 ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @eyurtsev @hwchase17 --------- Signed-off-by: byhsu <byhsu@linkedin.com> Co-authored-by: byhsu <byhsu@linkedin.com>	2023-05-30 11:04:05 -04:00
Harrison Chase	72f99ff953	Harrison/text splitter (#5417 ) adds support for keeping separators around when using recursive text splitter	2023-05-29 16:56:31 -07:00
Eugene Yurtsev	d56313acba	Improve effeciency of TextSplitter.split_documents, iterate once (#5111 ) # Improve TextSplitter.split_documents, collect page_content and metadata in one iteration ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @eyurtsev In the case where documents is a generator that can only be iterated once making this change is a huge help. Otherwise a silent issue happens where metadata is empty for all documents when documents is a generator. So we expand the argument from `List[Document]` to `Union[Iterable[Document], Sequence[Document]]` --------- Co-authored-by: Steven Tartakovsky <tartakovsky.developer@gmail.com>	2023-05-22 23:00:24 -04:00
Roma	2b4e9a3efa	Add unit test for _merge_splits function (#3513 ) This commit adds a new unit test for the _merge_splits function in the text splitter. The new test verifies that the function merges text into chunks of the correct size and overlap, using a specified separator. The test passes on the current implementation of the function.	2023-04-25 10:02:59 -07:00
Harrison Chase	f95d551f7a	Harrison/shallow metadata (#1599 ) Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>	2023-03-11 09:18:25 -08:00
Harrison Chase	064741db58	Harrison/fix text splitter (#1511 ) Co-authored-by: ajaysolanky <ajsolanky@gmail.com> Co-authored-by: Ajay Solanky <ajaysolanky@saw-l14668307kd.myfiosgateway.com>	2023-03-07 15:42:28 -08:00
Harrison Chase	1511606799	Harrison/fix splitting (#563 ) fix issue where text splitting could possibly create empty docs	2023-01-08 19:19:32 -08:00
Harrison Chase	1192cc0767	smart text splitter (#530 ) smart text splitter that iteratively tries different separators until it works!	2023-01-08 15:11:10 -08:00
Harrison Chase	c104d507bf	Harrison/improve data augmented generation docs (#390 ) Co-authored-by: cameronccohen <cameron.c.cohen@gmail.com> Co-authored-by: Cameron Cohen <cameron.cohen@quantco.com>	2022-12-20 22:24:08 -05:00
Harrison Chase	e7b625fe03	fix text splitter (#375 )	2022-12-18 20:21:43 -05:00
Harrison Chase	160af4ba6b	Harrison/map reduce (#36 )	2022-10-31 20:17:22 -07:00

13 Commits