langchain/libs/community/langchain_community/document_transformers
Gabriel Petracca c6660df58e
community[minor]: Implement Doctran async execution (#22372)
**Description**

`DoctranTextTranslator` declares an async transform method that was never
implemented, because [the doctran
library](https://github.com/psychic-api/doctran) only provides a
synchronous `execute` method.

- I implemented the `DoctranTextTranslator.atransform_documents()`
method using `asyncio.to_thread` to run the function in a separate
thread.
- I updated the example in the Notebook with the new async version.
- The performance improvement is most noticeable when a large document is
split into multiple chunks.
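
The approach described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `translate_sync` is a hypothetical stand-in for a blocking call such as doctran's `execute`, and `atranslate_all` is an assumed helper name. It shows how `asyncio.to_thread` (Python 3.9+) lets several blocking calls run in worker threads while the event loop stays free.

```python
import asyncio
import time
from typing import List


def translate_sync(text: str) -> str:
    """Hypothetical blocking call; stands in for a sync library method."""
    time.sleep(0.1)  # simulate blocking I/O (e.g. a network request)
    return text.upper()  # placeholder "translation"


async def atranslate_all(chunks: List[str]) -> List[str]:
    # Each blocking call is handed to a worker thread via asyncio.to_thread,
    # so the calls can overlap instead of running back to back.
    tasks = [asyncio.to_thread(translate_sync, chunk) for chunk in chunks]
    # gather returns results in the same order as the input chunks.
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    chunks = ["first chunk", "second chunk", "third chunk"]
    start = time.perf_counter()
    results = asyncio.run(atranslate_all(chunks))
    print(results, f"{time.perf_counter() - start:.2f}s")
```

With several chunks, the total wall-clock time in this sketch should be close to that of a single call rather than the sum of all calls, which is where a benefit for chunked documents would come from.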

Relates to:
- Issue #14645: https://github.com/langchain-ai/langchain/issues/14645
- Issue #14437: https://github.com/langchain-ai/langchain/issues/14437
- https://github.com/langchain-ai/langchain/pull/15264

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2 weeks ago

| Name | Last commit | Updated |
| --- | --- | --- |
| xsl | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 7 months ago |
| `__init__.py` | infra: rm unused # noqa violations (#22049) | 1 month ago |
| beautiful_soup_transformer.py | community[patch]: add BeautifulSoupTransformer remove_unwanted_classnames method (#20467) | 2 months ago |
| doctran_text_extract.py | community[minor]: Adding asynchronous function implementation for Doctran (#15941) | 6 months ago |
| doctran_text_qa.py | community: Make doctran synchronous (#15264) | 6 months ago |
| doctran_text_translate.py | community[minor]: Implement Doctran async execution (#22372) | 2 weeks ago |
| embeddings_redundant_filter.py | langchain[minor]: Make EmbeddingsFilters async (#22737) | 3 weeks ago |
| google_translate.py | (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) | 2 months ago |
| html2text.py | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 7 months ago |
| long_context_reorder.py | community[minor]: Fix long_context_reorder.py async (#22839) | 2 weeks ago |
| markdownify.py | community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247) | 2 months ago |
| nuclia_text_transform.py | community[patch]: docstrings update (#20301) | 3 months ago |
| openai_functions.py | infra: rm unused # noqa violations (#22049) | 1 month ago |