Commit Graph

8459 Commits

Author SHA1 Message Date
mwmajewsk
e192f6b6eb
community[patch]: fix, better error message in deeplake vectoriser (#18397)
If the document loader recieves Pathlib path instead of str, it reads
the file correctly, but the problem begins when the document is added to
Deeplake.
This problem arises from casting the path to str in the metadata.

```python
deeplake = True
fname = Path('./lorem_ipsum.txt')
loader = TextLoader(fname, encoding="utf-8")
docs = loader.load_and_split()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks= text_splitter.split_documents(docs)
if deeplake:
    db = DeepLake(dataset_path=ds_path, embedding=embeddings, token=activeloop_token)
    db.add_documents(chunks)
else:
    db = Chroma.from_documents(docs, embeddings)
```

So using this snippet of code the error message for deeplake looks like
this:

```
[part of error message omitted]

Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 53, in <module>
    db.add_documents(chunks)
  File "/home/mwm/repositories/sources/langchain/libs/core/langchain_core/vectorstores.py", line 139, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/deeplake.py", line 258, in add_texts
    return self.vectorstore.add(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 226, in add
    return self.dataset_handler.add(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/dataset_handlers/client_side_dataset_handler.py", line 139, in add
    dataset_utils.extend_or_ingest_dataset(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 544, in extend_or_ingest_dataset
    extend(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 505, in extend
    dataset.extend(batched_processed_tensors, progressbar=False)
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/dataset/dataset.py", line 3247, in extend
    raise SampleExtendError(str(e)) from e.__cause__
deeplake.util.exceptions.SampleExtendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback. If you wish to skip the samples that cause errors, please specify `ignore_errors=True`.
```

Which is does not explain the error well enough.
The same error for chroma looks like this 

```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 56, in <module>
    db = Chroma.from_documents(docs, embeddings)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 778, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 736, in from_texts
    chroma_collection.add_texts(
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 309, in add_texts
    raise ValueError(e.args[0] + "\n\n" + msg)
ValueError: Expected metadata value to be a str, int, float or bool, got lorem_ipsum.txt which is a <class 'pathlib.PosixPath'>

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
```

Which is way more user friendly, so I just added information about
possible mismatch of the type in the error message, the same way it is
covered in chroma
https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/chroma.py#L224
2024-03-01 11:21:21 -08:00
Daniel Chico
7d962278f6
community[patch]: type ignore fixes (#18395)
Related to #17048
2024-03-01 11:21:02 -08:00
Christophe Bornet
69be82c86d
community[patch]: Implement lazy_load() for CSVLoader (#18391)
Covered by `test_csv_loader.py`
2024-03-01 11:17:08 -08:00
Bagatur
c54d6eb5da
fireworks[patch]: support "any" tool_choice (#18343)
per https://readme.fireworks.ai/docs/function-calling
2024-03-01 11:12:28 -08:00
Leonid Ganeline
d937fa4f9c
docs: Tutorials update (#18230)
A big update of the `Tutorials` page. Cleaned it up. Added several new
resources.
2024-03-01 11:07:39 -08:00
Erick Friis
6afb135baa
astradb: move to langchain-datastax repo (#18354) 2024-03-01 19:04:43 +00:00
Akash A Desai
b641be2edf
templates: Lanceb RAG template (#17809)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-01 18:52:50 +00:00
Guangdong Liu
760a16ff32
community[patch]: Fix ChatModel for sparkllm Bug. (#18375)
**PR message**: ***Delete this entire checklist*** and replace with
    - **Description:** fix sparkllm paramer error
    - **Issue:**   close #18370
- **Dependencies:** change `IFLYTEK_SPARK_APP_URL` to
`IFLYTEK_SPARK_API_URL`
    - **Twitter handle:** No
2024-03-01 10:49:30 -08:00
Yujie Qian
cbb65741a7
community[patch]: Voyage AI updates default model and batch size (#17655)
- **Description:** update the default model and batch size in
VoyageEmbeddings
    - **Issue:** N/A
    - **Dependencies:** N/A
    - **Twitter handle:** N/A

---------

Co-authored-by: fodizoltan <zoltan@conway.expert>
2024-03-01 10:22:24 -08:00
Shengsheng Huang
ae471a7dcb
community[minor]: add BigDL-LLM integrations (#17953)
- **Description**:
[`bigdl-llm`](https://github.com/intel-analytics/BigDL) is a library for
running LLM on Intel XPU (from Laptop to GPU to Cloud) using
INT4/FP4/INT8/FP8 with very low latency (for any PyTorch model). This PR
adds bigdl-llm integrations to langchain.
- **Issue**: NA
- **Dependencies**: `bigdl-llm` library
- **Contribution maintainer**: @shane-huang 
 
Examples added:
- docs/docs/integrations/llms/bigdl.ipynb
2024-03-01 10:04:53 -08:00
Ethan Yang
f61cb8d407
community[minor]: Add openvino backend support (#11591)
- **Description:** add openvino backend support by HuggingFace Optimum
Intel,
  - **Dependencies:** “optimum[openvino]”,

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-01 10:04:24 -08:00
Leonid Ganeline
a89f007947
docs: runnable module description (#17966)
Added a module description. Added `batch` description.
2024-03-01 10:01:32 -08:00
Leonid Ganeline
6d0af4e805
docs: nvidia: provider page update (#18054)
Nvidia provider page is missing a Triton Inference Server package
reference.
Changes:
- added the Triton Inference Server reference
- copied the example notebook from the package into the doc files.
- added the Triton Inference Server description and links, the link to
the above example notebook
- formatted page to the consistent format

NOTE:
It seems that the [example
notebook](https://github.com/langchain-ai/langchain/blob/master/libs/partners/nvidia-trt/docs/llms.ipynb)
was originally created in wrong place. It should be in the LangChain
docs
[here](https://github.com/langchain-ai/langchain/tree/master/docs/docs/integrations/llms).
So, I've created a copy of this example. The original example is still
in the nvidia-trt package.
2024-03-01 10:00:42 -08:00
RadhikaBansal97
8bafd2df5e
community[patch]: Change github endpoint in GithubLoader (#17622)
Description- 
- Changed the GitHub endpoint as existing was not working and giving 404
not found error
- Also the existing function was failing if file_filter is not passed as
the tree api return all paths including directory as well, and when
get_file_content was iterating over these path, the function was failing
for directory as the api was returning list of files inside the
directory, so added a condition to ignore the paths if it a directory
- Fixes this issue -
https://github.com/langchain-ai/langchain/issues/17453

Co-authored-by: Radhika Bansal <Radhika.Bansal@veritas.com>
2024-03-01 09:36:31 -08:00
Yufei (Benny) Chen
2b93206f02
fireworks[patch]: Fix fireworks async stream (#18372)
- **Description:**  Fix the async stream issue with Fireworks
- **Dependencies:** fireworks >= 0.13.0

```
tests/integration_tests/test_chat_models.py ..........                                                                   [ 45%]
tests/integration_tests/test_compile.py .                                                                                [ 50%]
tests/integration_tests/test_embeddings.py ..                                                                            [ 59%]
tests/integration_tests/test_llms.py .........                                                                           [100%]
```
```
tests/unit_tests/test_embeddings.py .                                                                                    [ 16%]
tests/unit_tests/test_imports.py .                                                                                       [ 33%]
tests/unit_tests/test_llms.py ....                                                                                       [100%]
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-01 09:20:26 -08:00
William FH
1deb8cadd5
Add dataset version info (#18299) 2024-02-29 22:00:44 -08:00
Anush
9d663f31fa
community[patch]: FastEmbed to latest (#18040)
## Description

Updates the `langchain_community.embeddings.fastembed` provider as per
the recent updates to [`FastEmbed`](https://github.com/qdrant/fastembed)
library.
2024-02-29 21:15:51 -08:00
Jacob Lee
590d47bff4
docs[patch]: Add Neo4j GraphAcademy to tutorials section (#18353) 2024-02-29 20:50:24 -07:00
Erick Friis
3c8a115e21
fireworks[patch]: remove custom async and stream implementations (#18363) 2024-03-01 03:20:02 +00:00
Bagatur
4730ee2766
docs: update api ref nav (#18362) 2024-02-29 19:04:56 -08:00
Bagatur
12f19b8a6a
infra: update create_api_rst (#18361) 2024-02-29 19:04:44 -08:00
Erick Friis
1317578ad1
templates: use langchain-text-splitters (#18360)
- deps
- import
- import
2024-03-01 03:00:58 +00:00
Bagatur
f220af3dce
docs: text splitters readme (#18359) 2024-03-01 03:00:42 +00:00
Bagatur
0d7fb5f60a
langchain[patch]: langchain-text-splitters dep (#18357) 2024-02-29 18:48:55 -08:00
Eugene Yurtsev
51b661cfe8
community[patch]: BaseLoader load method should just delegate to lazy_load (#18289)
load() should just reference lazy_load()
2024-02-29 21:45:28 -05:00
Bagatur
5efb5c099f
text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346) 2024-02-29 18:33:21 -08:00
Nuno Campos
7891934173
Fix missing labels (#18356)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
2024-02-29 18:11:18 -08:00
William FH
fdab931fd3
[Core] Patch: rm dumpd of outputs from runnables/base (#18295)
It obstructs evaluations when your return a pydantic object.
2024-02-29 18:04:53 -08:00
Erick Friis
c7d5ed6f5c
infra: tolerate partner package move in ci (#18355) 2024-02-29 17:49:28 -08:00
William FH
f481cbb32d
fireworks[patch]: Fix fireworks bind tools (#18352)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-01 01:18:15 +00:00
Erick Friis
eefb49680f
multiple[patch]: fix deprecation versions (#18349) 2024-02-29 16:58:33 -08:00
Erick Friis
11cb42c2c1
core[patch]: deprecation docstring with lib (#18350) 2024-03-01 00:44:13 +00:00
Erick Friis
bce0684327
docs: airbyte deps note (#18243) 2024-02-29 16:02:13 -08:00
Erick Friis
7bbff98dc7
mongodb[patch]: core 0.1.5 dep (#18348) 2024-02-29 15:39:04 -08:00
Erick Friis
4e27e66938
infra: mongodb env vars (#18347) 2024-02-29 15:24:28 -08:00
Jib
72bfc1d3db
mongodb[minor]: MongoDB Partner Package -- Porting MongoDBAtlasVectorSearch (#17652)
This PR migrates the existing MongoDBAtlasVectorSearch abstraction from
the `langchain_community` section to the partners package section of the
codebase.
- [x] Run the partner package script as advised in the partner-packages
documentation.
- [x] Add Unit Tests
- [x] Migrate Integration Tests
- [x] Refactor `MongoDBAtlasVectorStore` (autogenerated) to
`MongoDBAtlasVectorSearch`
- [x] ~Remove~ deprecate the old `langchain_community` VectorStore
references.

## Additional Callouts
- Implemented the `delete` method
- Included any missing async function implementations
  - `amax_marginal_relevance_search_by_vector`
  - `adelete` 
- Added new Unit Tests that test for functionality of
`MongoDBVectorSearch` methods
- Removed [`del
res[self._embedding_key]`](e0c81e1cb0/libs/community/langchain_community/vectorstores/mongodb_atlas.py (L218))
in `_similarity_search_with_score` function as it would make the
`maximal_marginal_relevance` function fail otherwise. The `Document`
needs to store the embedding key in metadata to work.

Checklist:

- [x] PR title: Please title your PR "package: description", where
"package" is whichever of langchain, community, core, experimental, etc.
is being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"
- [x] PR message
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. Existing tests supplied in docs/docs do not change. Updated
docstrings for new functions like `delete`
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory. (This already exists)

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-29 23:09:48 +00:00
William De Vena
412148773c
Updated partners/fireworks README (#18267)
## PR title
partners: changed the README file for the Fireworks integration in the
libs/partners/fireworks folder

## PR message
Description: Changed the README file of partners/fireworks following the
docs on https://python.langchain.com/docs/integrations/llms/Fireworks

The README includes:

- Brief description
- Installation
- Setting-up instructions (API key, model id, ...)
- Basic usage

Issue: https://github.com/langchain-ai/langchain/issues/17545

Dependencies: None

Twitter handle: None
2024-02-29 14:55:03 -08:00
Kai Kugler
df234fb171
community[patch]: Fixing embedchain document mapping (#18255)
- **Description:** The current embedchain implementation seems to handle
document metadata differently than done in the current implementation of
langchain and a KeyError is thrown. I would love for someone else to
test this...

---------

Co-authored-by: KKUGLER <kai.kugler@mercedes-benz.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Deshraj Yadav <deshraj@gatech.edu>
2024-02-29 14:54:37 -08:00
Erick Friis
040271f33a
community[patch]: remove llmlingua extended tests (#18344) 2024-02-29 13:51:29 -08:00
William De Vena
87dca8e477
Updated partners/ibm README (#18268)
## PR title
partners: changed the README file for the IBM Watson AI integration in
the libs/partners/ibm folder.

## PR message
Description: Changed the README file of partners/ibm following the docs
on https://python.langchain.com/docs/integrations/llms/ibm_watsonx

The README includes:

- Brief description
- Installation
- Setting-up instructions (API key, project id, ...)
- Basic usage:
  - Loading the model
  - Direct inference
  - Chain invoking
  - Streaming the model output
  
Issue: https://github.com/langchain-ai/langchain/issues/17545

Dependencies: None

Twitter handle: None

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
2024-02-29 13:29:28 -08:00
Erick Friis
dfd9787388
infra: ci dirs in wrong order (#18340) 2024-02-29 21:13:29 +00:00
Bagatur
9e46535ebc
core[patch]: Release 0.1.28 (#18341) 2024-02-29 13:03:13 -08:00
Tomaz Bratanic
5999c4a240
Add support for parameters in neo4j retrieval query (#18310)
Sometimes, you want to use various parameters in the retrieval query of
Neo4j Vector to personalize/customize results. Before, when there were
only predefined chains, it didn't really make sense. Now that it's all
about custom chains and LCEL, it is worth adding since users can inject
any params they wish at query time. Isn't prone to SQL injection-type
attacks since we use parameters and not concatenating strings.
2024-02-29 13:00:54 -08:00
Hasan
15d1b73a00
Add optional output_parser param in create_react_agent (#18320)
**Description:** Add facility to pass the optional output parser to
customize the parsing logic

---------

Co-authored-by: hasan <hasan@m2sys.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-29 12:35:43 -08:00
Bagatur
a6f0506aaf
docs: query analysis use case (#17766)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-29 12:33:49 -08:00
kkdamowang
6782dac420
docs: remove duplicate quote in AzureOpenAIEmbeddings doc (#18315)
- **Description:** Remove duplicate quote in AzureOpenAIEmbeddings doc,
remove trailing spaces.
- **Issue:** No
- **Dependencies:** No
2024-02-29 11:25:50 -08:00
Filip Schouwenaars
4c62362eab
Add links to relevant DataCamp code alongs (#18332)
This PR adds links to some more free resources for people to get
acquainted with Langhchain without having to configure their system.

<!-- If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, hwchase17. -->

Co-authored-by: Filip Schouwenaars <filipsch@users.noreply.github.com>
2024-02-29 11:25:01 -08:00
Virat Singh
cd926ac3dd
community: Add PolygonFinancials Tool (#18324)
**Description:**
In this PR, I am adding a `PolygonFinancials` tool, which can be used to
get financials data for a given ticker. The financials data is the
fundamental data that is found in income statements, balance sheets, and
cash flow statements of public US companies.

**Twitter**: 
[@virattt](https://twitter.com/virattt)
2024-02-29 10:56:05 -08:00
Leonid Ganeline
d43fa2eab1
docs providers update (#18336)
Formatted pages into a consistent form. Added descriptions and links
when needed.
2024-02-29 10:53:12 -08:00
Erick Friis
68be5a7658
infra: skip ibm api docs (#18335) 2024-02-29 10:16:57 -08:00