Bagatur/apify (#8008)

---------

Co-authored-by: Jiří Moravčík <jiri.moravcik@gmail.com>
Co-authored-by: Jan Čurn <jan.curn@gmail.com>
Bagatur 2023-07-20 08:36:01 -07:00 committed by GitHub
parent 1d7414a371
commit 7c24a6b9d1
GPG Key ID: 4AEE18F83AFDEB23

@@ -7,7 +7,23 @@ from langchain.document_loaders.base import BaseLoader
 
 
 class ApifyDatasetLoader(BaseLoader, BaseModel):
-    """Loading Documents from Apify datasets."""
+    """Loads datasets from Apify-a web scraping, crawling, and data extraction platform.
+    For details, see https://docs.apify.com/platform/integrations/langchain
+
+    Example:
+        .. code-block:: python
+
+            from langchain.document_loaders import ApifyDatasetLoader
+            from langchain.schema import Document
+
+            loader = ApifyDatasetLoader(
+                dataset_id="YOUR-DATASET-ID",
+                dataset_mapping_function=lambda dataset_item: Document(
+                    page_content=dataset_item["text"], metadata={"source": dataset_item["url"]}
+                ),
+            )
+            documents = loader.load()
+    """  # noqa: E501
 
     apify_client: Any
     """An instance of the ApifyClient class from the apify-client Python package."""