**Description:** This minor PR aims to add `llm_extraction` to Firecrawl
loader. This feature is supported on API and PythonSDK, but the
langchain loader omits adding this to the response.
**Twitter handle:** [scalable_pizza](https://x.com/scalablepizza)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
community:Add support for specifying document_loaders.firecrawl api url.
Add support for specifying document_loaders.firecrawl api url.
This is mainly to support the
[self-hosting](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md)
option firecrawl provides. Eg. now I can specify localhost:....
The corresponding firecrawl class already provides functionality to pass
the argument. See here:
4c9d62f6d3/apps/python-sdk/firecrawl/firecrawl.py (L29)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Change 'FIREWALL' to 'FIRECRAWL' as I believe this may have been in
error. Other docs refer to 'FIRECRAWL_API_KEY'.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Summary
I ran `ruff check --extend-select RUF100 -n` to identify `# noqa`
comments that weren't having any effect in Ruff, and then `ruff check
--extend-select RUF100 -n --fix` on select files to remove all of the
unnecessary `# noqa: F401` violations. It's possible that these were
needed at some point in the past, but they're not necessary in Ruff
v0.1.15 (used by LangChain) or in the latest release.
Co-authored-by: Erick Friis <erick@langchain.dev>
Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl
crawls and convert any website into LLM-ready data. It crawls all
accessible subpages and give you clean markdown for each.
- **Description:** Adds FireCrawl data loader
- **Dependencies:** firecrawl-py
- **Twitter handle:** @mendableai
ccing contributors: (@ericciarla @nickscamara)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>