langchain/libs/community/langchain_community/document_transformers
Gabriel Petracca c6660df58e
community[minor]: Implement Doctran async execution (#22372)
**Description**

`DoctranTextTranslator` had an async transform method that was not
implemented, because [the doctran
library](https://github.com/psychic-api/doctran) uses a synchronous
`execute` method.

- I implemented the `DoctranTextTranslator.atransform_documents()`
method using `asyncio.to_thread` to run the blocking call in a separate
thread.
- I updated the example in the Notebook with the new async version.
- The performance improvement is noticeable when a large document is
divided into multiple chunks.
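The `asyncio.to_thread` approach described above can be sketched as follows. This is a minimal illustration of the general pattern, not the code from this PR: `translate_chunk` is a hypothetical blocking function standing in for doctran's synchronous `execute`, and the sleep only simulates network/LLM latency.

```python
import asyncio
import time
from typing import List


def translate_chunk(text: str, language: str) -> str:
    """Hypothetical stand-in for a blocking, synchronous call such as
    doctran's `execute()`; it only sleeps and tags the text."""
    time.sleep(0.2)  # simulate I/O latency that blocks the calling thread
    return f"[{language}] {text}"


async def atranslate_chunks(chunks: List[str], language: str) -> List[str]:
    """Run the synchronous function for each chunk in a worker thread so the
    event loop is not blocked, then await all chunks concurrently."""
    tasks = [asyncio.to_thread(translate_chunk, chunk, language) for chunk in chunks]
    # asyncio.gather returns results in the same order as the input tasks
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    chunks = ["chunk one", "chunk two", "chunk three", "chunk four"]
    start = time.perf_counter()
    results = asyncio.run(atranslate_chunks(chunks, "spanish"))
    elapsed = time.perf_counter() - start
    print(results)
    # With threads, elapsed should be close to one chunk's latency,
    # not the sum of all four as in a sequential loop.
    print(f"elapsed: {elapsed:.2f}s")
```

The speedup comes from overlapping waits: while one thread blocks on I/O, the others proceed. This is why splitting a large document into chunks helps. CPU-bound work would see little benefit, since threads in CPython share the GIL.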

Relates to:
- Issue #14645: https://github.com/langchain-ai/langchain/issues/14645
- Issue #14437: https://github.com/langchain-ai/langchain/issues/14437
- https://github.com/langchain-ai/langchain/pull/15264

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-18 18:17:37 +00:00
xsl community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
__init__.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
beautiful_soup_transformer.py community[patch]: add BeautifulSoupTransformer remove_unwanted_classnames method (#20467) 2024-04-25 17:04:04 +00:00
doctran_text_extract.py community[minor]: Adding asynchronous function implementation for Doctran (#15941) 2024-01-15 10:39:25 -08:00
doctran_text_qa.py community: Make doctran synchronous (#15264) 2023-12-28 08:05:24 -08:00
doctran_text_translate.py community[minor]: Implement Doctran async execution (#22372) 2024-06-18 18:17:37 +00:00
embeddings_redundant_filter.py langchain[minor]: Make EmbeddingsFilters async (#22737) 2024-06-12 12:27:26 -04:00
google_translate.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
html2text.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
long_context_reorder.py community[minor]: Fix long_context_reorder.py async (#22839) 2024-06-14 13:55:18 -04:00
markdownify.py community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247) 2024-05-08 14:45:13 -07:00
nuclia_text_transform.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
openai_functions.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00