langchain/libs/community/langchain_community/document_transformers
Gabriel Petracca c6660df58e
community[minor]: Implement Doctran async execution (#22372)
**Description**

`DoctranTextTranslator` has an async transform method that was left
unimplemented because [the doctran
library](https://github.com/psychic-api/doctran) only provides a synchronous
`execute` method.

- I implemented the `DoctranTextTranslator.atransform_documents()`
method using `asyncio.to_thread` to run the synchronous call in a
separate thread.
- I updated the example in the notebook to use the new async version.
- The performance improvement is most noticeable when a large document is
split into multiple chunks.
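The sketch below illustrates the general pattern of offloading a blocking call with `asyncio.to_thread` (Python 3.9+) and running one call per chunk concurrently. It is not the LangChain implementation. `Document` is a simplified stand-in for LangChain's document class, `translate_sync` is a hypothetical placeholder for doctran's synchronous `execute`, and `time.sleep` simulates API latency. Using `asyncio.gather` to process all chunks at once is an assumption about how the concurrency is obtained.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Document:
    # Simplified, hypothetical stand-in for LangChain's Document class.
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def translate_sync(text: str, language: str) -> str:
    # Hypothetical blocking call standing in for doctran's synchronous
    # execute(); the sleep simulates network latency to an LLM API.
    time.sleep(0.2)
    return f"[{language}] {text}"


def transform_documents(docs: Sequence[Document], language: str) -> List[Document]:
    # Sync path: chunks are translated one after another.
    return [
        Document(translate_sync(d.page_content, language), dict(d.metadata))
        for d in docs
    ]


async def atransform_documents(
    docs: Sequence[Document], language: str
) -> List[Document]:
    # Async path: each blocking call runs in a worker thread via
    # asyncio.to_thread, so the event loop is not blocked and the
    # chunks are translated concurrently. gather() preserves input order.
    translated = await asyncio.gather(
        *(asyncio.to_thread(translate_sync, d.page_content, language) for d in docs)
    )
    return [Document(t, dict(d.metadata)) for t, d in zip(translated, docs)]


if __name__ == "__main__":
    chunks = [Document(f"chunk {i}") for i in range(5)]
    start = time.perf_counter()
    result = asyncio.run(atransform_documents(chunks, "spanish"))
    print([d.page_content for d in result])
    print(f"async elapsed: {time.perf_counter() - start:.2f}s")
```

Because each simulated call sleeps for 0.2 s, the sequential `transform_documents` needs about 1 s for five chunks. The async version finishes in roughly the time of a single call, which mirrors the benefit described above for documents split into many chunks.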

Relates to:
- Issue #14645: https://github.com/langchain-ai/langchain/issues/14645
- Issue #14437: https://github.com/langchain-ai/langchain/issues/14437
- https://github.com/langchain-ai/langchain/pull/15264

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-18 18:17:37 +00:00
xsl
__init__.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
beautiful_soup_transformer.py
doctran_text_extract.py
doctran_text_qa.py
doctran_text_translate.py community[minor]: Implement Doctran async execution (#22372) 2024-06-18 18:17:37 +00:00
embeddings_redundant_filter.py langchain[minor]: Make EmbeddingsFilters async (#22737) 2024-06-12 12:27:26 -04:00
google_translate.py
html2text.py
long_context_reorder.py community[minor]: Fix long_context_reorder.py async (#22839) 2024-06-14 13:55:18 -04:00
markdownify.py community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247) 2024-05-08 14:45:13 -07:00
nuclia_text_transform.py
openai_functions.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00