community: Add llm-extraction option to FireCrawl Document Loader (#25231)

**Description:** This minor PR adds `llm_extraction` to the FireCrawl
loader. The Firecrawl API and Python SDK already support this feature, but
the LangChain loader does not include the extraction result in the documents it returns.
**Twitter handle:** [scalable_pizza](https://x.com/scalablepizza)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Shivendra Soni 2024-08-09 19:29:10 +05:30 committed by GitHub
parent c81c77b465
commit 66b7206ab6


@@ -63,7 +63,10 @@ class FireCrawlLoader(BaseLoader):
                 f"Unrecognized mode '{self.mode}'. Expected one of 'crawl', 'scrape'."
             )
         for doc in firecrawl_docs:
-            yield Document(
-                page_content=doc.get("markdown", ""),
-                metadata=doc.get("metadata", {}),
-            )
+            metadata = doc.get("metadata", {})
+            if (self.params is not None) and self.params.get(
+                "extractorOptions", {}
+            ).get("mode") == "llm-extraction":
+                metadata["llm_extraction"] = doc.get("llm_extraction")
+            yield Document(page_content=doc.get("markdown", ""), metadata=metadata)
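
The behavior this change introduces can be sketched without calling Firecrawl at all. The snippet below is a minimal, dependency-free illustration, not the loader's actual code: `Document` is a stand-in dataclass for LangChain's class, `to_documents` is a hypothetical helper that mirrors the loop in the diff, and the sample response (including the `sourceURL` key and the extracted values) is made up. Unlike the diff, the sketch copies the metadata dict before adding to it, so the input is left unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass
class Document:
    """Stand-in for LangChain's Document (assumption, keeps the sketch dependency-free)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_documents(
    firecrawl_docs: Iterable[Dict[str, Any]],
    params: Optional[Dict[str, Any]] = None,
) -> Iterator[Document]:
    """Hypothetical helper mirroring the loop in the diff above."""
    # Same condition as the diff: params is set and the extractor mode is "llm-extraction".
    extractor_options = (params or {}).get("extractorOptions", {})
    wants_extraction = extractor_options.get("mode") == "llm-extraction"

    for doc in firecrawl_docs:
        # Copy so the caller's dict is not mutated (a deviation from the diff).
        metadata = dict(doc.get("metadata", {}))
        if wants_extraction:
            metadata["llm_extraction"] = doc.get("llm_extraction")
        yield Document(page_content=doc.get("markdown", ""), metadata=metadata)


# Made-up response item shaped like the keys the diff reads.
sample = [
    {
        "markdown": "# Pricing",
        "metadata": {"sourceURL": "https://example.com"},
        "llm_extraction": {"plan": "free"},
    }
]
params = {"extractorOptions": {"mode": "llm-extraction"}}

docs = list(to_documents(sample, params))  # extraction requested
plain = list(to_documents(sample))         # no params: metadata passes through as-is
```

With the `extractorOptions` mode set to `"llm-extraction"`, the extracted data appears under `metadata["llm_extraction"]`; without it, the metadata passes through untouched, which matches the loader's behavior before this change.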