Commit Graph

6 Commits (804390ba4bcc306b90cb6d75b7f01a4231ab6463)

Author SHA1 Message Date
WilliamEspegren 804390ba4b
community: Spider integration (#20937)
Added the [Spider.cloud](https://spider.cloud) document loader.
[Spider](https://github.com/spider-rs/spider) is the
[fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md)
and cheapest crawler that returns LLM-ready data.

```
- **Description:** Adds Spider data loader
- **Dependencies:** spider-client
- **Twitter handle:** @WilliamEspegren 
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: = <=>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2 months ago
Nicolas ad04585e30
community[minor]: Firecrawl.dev integration (#20364)
Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl
crawls and convert any website into LLM-ready data. It crawls all
accessible subpages and give you clean markdown for each.

    - **Description:** Adds FireCrawl data loader
    - **Dependencies:** firecrawl-py
    - **Twitter handle:** @mendableai 

ccing contributors: (@ericciarla @nickscamara)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2 months ago
Erick Friis 7bc100fd43
docs: integration package pip installs (#15762)
More than 300 files - will fail check_diff. Will merge after Vercel
deploy succeeds

Still occurrences that need changing - will update more later
5 months ago
Bagatur fa5d49f2c1
docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429)
ran 
```bash
g grep -l "langchain.vectorstores" | xargs -L 1 sed -i '' "s/langchain\.vectorstores/langchain_community.vectorstores/g"
g grep -l "langchain.document_loaders" | xargs -L 1 sed -i '' "s/langchain\.document_loaders/langchain_community.document_loaders/g"
g grep -l "langchain.chat_loaders" | xargs -L 1 sed -i '' "s/langchain\.chat_loaders/langchain_community.chat_loaders/g"
g grep -l "langchain.document_transformers" | xargs -L 1 sed -i '' "s/langchain\.document_transformers/langchain_community.document_transformers/g"
g grep -l "langchain\.graphs" | xargs -L 1 sed -i '' "s/langchain\.graphs/langchain_community.graphs/g"
g grep -l "langchain\.memory\.chat_message_histories" | xargs -L 1 sed -i '' "s/langchain\.memory\.chat_message_histories/langchain_community.chat_message_histories/g"
gco master libs/langchain/tests/unit_tests/*/test_imports.py
gco master libs/langchain/tests/unit_tests/**/test_public_api.py
```
6 months ago
Predrag Gruevski 2ebd167dba
Lint Python notebooks with ruff. (#12677)
The new ruff version fixed the blocking bugs, and I was able to fairly
easily us to a passing state: ruff fixed some issues on its own, I fixed
a handful by hand, and I added a list of narrowly-targeted exclusions
for files that are currently failing ruff rules that we probably should
look into eventually.

I went pretty lenient on the docs / cookbooks rules, allowing dead code
and such things. Perhaps in the future we may want to tighten the rules
further, but this is already a good set of checks that found real issues
and will prevent them going forward.
7 months ago
Bagatur eedfddac2d
Restructure docs (#11620) 8 months ago