Commit Graph

901 Commits

Author SHA1 Message Date
Peter Vandenabeele
e830a4e731
community[patch]: Add remove_comments option (default True): do not extract html comments (#13259)
- **Description:** add `remove_comments` option (default: True): do not
extract html _comments_,
  - **Issue:** None,
  - **Dependencies:** None,
  - **Tag maintainer:** @nfcampos ,
  - **Twitter handle:** peter_v

I ran `make format`, `make lint` and `make test`.

Discussion: I my use case, I prefer to not have the comments in the
extracted text:
* e.g. from a Google tag that is added in the html as comment
* e.g. content that the authors have temporarily hidden to make it non
visible to the regular reader

Removing the comments makes the extracted text more alike the intended
text to be seen by the reader.


**Choice to make:** do we prefer to make the default for this
`remove_comments` option to be True or False?
I have changed it to True in a second commit, since that is how I would
prefer to use it by default. Have the
cleaned text (without technical Google tags etc.) and also closer to the
actually visible and intended content.
I am not sure what is best aligned with the conventions of langchain in
general ...


INITIAL VERSION (new version above):
~**Choice to make:** do we prefer to make the default for this
`ignore_comments` option to be True or False?
I have set it to False now to be backwards compatible. On the other
hand, I would use it mostly with True.
I am not sure what is best aligned with the conventions of langchain in
general ...~

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-02 00:19:12 +00:00
Jamsheed Mistri
4f70bc119d
community[minor]: add Layerup Security integration (#19787)
**Description:** adds integration with [Layerup
Security](https://uselayerup.com). Docs can be found
[here](https://docs.uselayerup.com). Integrates directly with our Python
SDK.

**Dependencies:**
[LayerupSecurity](https://pypi.org/project/LayerupSecurity/)

**Note**: all methods for our product require a paid API key, so I only
included 1 test which checks for an invalid API key response. I have
tested extensively locally.

**Twitter handle**: [@layerup_](https://twitter.com/layerup_)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-01 23:49:00 +00:00
Anıl Berk Altuner
4384fa8e49
community[minor]: Add Dria retriever (#17098)
[Dria](https://dria.co/) is a hub of public RAG models for developers to
both contribute and utilize a shared embedding lake. This PR adds a
retriever that can retrieve documents from Dria.
2024-04-01 12:04:19 -07:00
Ethan Yang
48f84e253e
community[minor]: Add OpenVINO rerank model support (#19791)
@eaidova @AlexKoff88 Could you help to review, thanks

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-01 18:27:23 +00:00
Chenhui Zhang
a1f3e9f537
community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695)
Description: Update `ChatZhipuAI` to support the latest `glm-4` model.
Issue: N/A
Dependencies: httpx, httpx-sse, PyJWT

The previous `ChatZhipuAI` implementation requires the `zhipuai`
package, and cannot call the latest GLM model. This is because
- The old version `zhipuai==1.*` doesn't support the latest model.
- `zhipuai==2.*` requires `pydantic V2`, which is incompatible with
'langchain-community'.

This re-implementation invokes the GLM model by sending HTTP requests to
[open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx`
package, and uses the `httpx-sse` package to handle stream events.

---------

Co-authored-by: zR <2448370773@qq.com>
2024-04-01 18:11:21 +00:00
Bagatur
d62e84c4f5
community[patch]: Revert " Fix the bug that Chroma does not specify `e… (#19866)
…mbedding_function` (#19277)"

This reverts commit 7042934b5f.

Fixes #19848
2024-04-01 10:10:44 -07:00
hsuyuming
5ab6b39098
community[patch]: add attribution_token within GoogleVertexAISearchRetriever (#18520)
- **Description:** Add attribution_token within
GoogleVertexAISearchRetriever so user can provide this information to
Google support team or product team during debug session.
    
Reference:
https://cloud.google.com/generative-ai-app-builder/docs/view-analytics#user-events

Attribution tokens. Attribution tokens are unique IDs generated by
Vertex AI Search and returned with each search request. Make sure to
include that attribution token as UserEvent.attributionToken with any
user events resulting from a search. This is needed to identify if a
search is served by the API. Only user events with a Google-generated
attribution token are used to compute metrics.
    
    - **Issue:** No
    - **Dependencies:** No
    - **Twitter handle:** abehsu1992626
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-31 13:54:56 -07:00
Kenneth Choe
f98d7f7494
langchain[minor], community[minor]: add CrossEncoderReranker with HuggingFaceCrossEncoder and SagemakerEndpointCrossEncoder (#13687)
- **Description:** Support reranking based on cross encoder models
available from HuggingFace.
      - Added `CrossEncoder` schema
- Implemented `HuggingFaceCrossEncoder` and
`SagemakerEndpointCrossEncoder`
- Implemented `CrossEncoderReranker` that performs similar functionality
to `CohereRerank`
- Added `cross-encoder-reranker.ipynb` to demonstrate how to use it.
Please let me know if anything else needs to be done to make it visible
on the table-of-contents navigation bar on the left, or on the card list
on [retrievers documentation
page](https://python.langchain.com/docs/integrations/retrievers).
  - **Issue:** N/A
  - **Dependencies:** None other than the existing ones.

---------

Co-authored-by: Kenny Choe <kchoe@amazon.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-31 20:51:31 +00:00
Kamal Zhang
368e35c3b1
community[patch]: introduce convert_to_secret() to bananadev llm (#14283)
- **Description:** Per #12165, this PR add to BananaLLM the function
convert_to_secret_str() during environment variable validation.
- **Issue:** #12165
- **Tag maintainer:** @eyurtsev
- **Twitter handle:** @treewatcha75751

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-30 00:52:25 +00:00
anshaneel
0884e5de7f
community[minor]: Add Alpha Vantage API Tool (#14332)
### Description
This implementation adds functionality from the AlphaVantage API,
renowned for its comprehensive financial data. The class encapsulates
various methods, each dedicated to fetching specific types of financial
information from the API.

### Implemented Functions

- **`search_symbols`**: 
- Searches the AlphaVantage API for financial symbols using the provided
keywords.

- **`_get_market_news_sentiment`**: 
- Retrieves market news sentiment for a specified stock symbol from the
AlphaVantage API.

- **`_get_time_series_daily`**: 
- Fetches daily time series data for a specific symbol from the
AlphaVantage API.

- **`_get_quote_endpoint`**: 
- Obtains the latest price and volume information for a given symbol
from the AlphaVantage API.

- **`_get_time_series_weekly`**: 
- Gathers weekly time series data for a particular symbol from the
AlphaVantage API.

- **`_get_top_gainers_losers`**: 
- Provides details on top gainers, losers, and most actively traded
tickers in the US market from the AlphaVantage API.

  ### Issue: 
  - #11994 
  
### Dependencies: 
  - 'requests' library for HTTP requests. (import requests)
  - 'pytest' library for testing. (import pytest)

---------

Co-authored-by: Adam Badar <94140103+adam-badar@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-30 00:44:01 +00:00
Alex Sherstinsky
a9bc212bf2
community[minor]: fix failing Predibase integration (#19776)
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Langchain-Predibase integration was failing, because
it was not current with the Predibase SDK; in addition, Predibase
integration tests were instantiating the Langchain Community `Predibase`
class with one required argument (`model`) missing. This change updates
the Predibase SDK usage and fixes the integration tests.
    - **Twitter handle:** `@alexsherstinsky`


---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-30 00:38:13 +00:00
ethynic
e9caa22d47
community[patch]: Update minimax.py (#14384)
MiniMaxChat class _generate method shoud return a ChatResult object not
str

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 23:57:06 +00:00
M.Abdulrahman Alnaseer
ba54f1577f
community[minor]: add support for llmsherpa (#19741)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: added support for llmsherpa library"

- [x] **Add tests and docs**: 
1. Integration test:
'docs/docs/integrations/document_loaders/test_llmsherpa.py'.
2. an example notebook:
`docs/docs/integrations/document_loaders/llmsherpa.ipynb`.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 16:04:57 -07:00
Hrvoje Milković
b7344e3347
community[minor]: Infobip tool integration (#16805)
**Description:** Adding Tool that wraps Infobip API for sending sms or
emails and email validation.
**Dependencies:** None,
**Twitter handle:** @hmilkovic

Implementation:
```
libs/community/langchain_community/utilities/infobip.py
```

Integration tests:
```
libs/community/tests/integration_tests/utilities/test_infobip.py
```

Example notebook:
```
docs/docs/integrations/tools/infobip.ipynb
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 19:01:27 +00:00
Luka Krapic
727a2ea9f1
community[patch]: history size support for DynamoDBChatMessageHistory (#16794)
**Description:** PR adds support for limiting number of messages
preserved in a session history for DynamoDBChatMessageHistory

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 18:56:21 +00:00
Dt22
6dbf1a2de0
community[patch]: fix redis input type for index_schema field (#16874)
### Subject: Fix Type Misdeclaration for index_schema in redis/base.py

I noticed a type misdeclaration for the index_schema column in the
redis/base.py file.

When following the instructions outlined in [Redis Custom Metadata
Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis)
to create our own index_schema, it leads to a Pylance type error. <br/>
**The error message indicates that Dict[str, list[Dict[str, str]]] is
incompatible with the type Optional[Union[Dict[str, str], str,
os.PathLike]].**

```
index_schema = {
    "tag": [{"name": "credit_score"}],
    "text": [{"name": "user"}, {"name": "job"}],
    "numeric": [{"name": "age"}],
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users_modified",
    index_schema=index_schema,  
)
```
Therefore, I have created this pull request to rectify the type
declaration problem.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 18:55:54 +00:00
morgana
074ad5095f
community[patch]: mmr search for Rockset vectorstore integration (#16908)
- **Description:** Adding support for mmr search in the Rockset
vectorstore integration.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-29 18:45:22 +00:00
shahrin014
f51e6a35ba
community[patch]: OllamaEmbeddings - Pass headers to post request (#16880)
## Feature
- Set additional headers in constructor
- Headers will be sent in post request

This feature is useful if deploying Ollama on a cloud service such as
hugging face, which requires authentication tokens to be passed in the
request header.

## Tests
- Test if header is passed
- Test if header is not passed

Similar to https://github.com/langchain-ai/langchain/pull/15881

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 18:44:52 +00:00
Jan Chorowski
b8b42ccbc5
community[minor]: Pathway vectorstore(#14859)
- **Description:** Integration with pathway.com data processing pipeline
acting as an always updated vectorstore
  - **Issue:** not applicable
- **Dependencies:** optional dependency on
[`pathway`](https://pypi.org/project/pathway/)
  - **Twitter handle:** pathway_com

The PR provides and integration with `pathway` to provide an easy to use
always updated vector store:

```python
import pathway as pw
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer

data_sources = []
data_sources.append(
    pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True))

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
vector_server = PathwayVectorServer(
    *data_sources,
    embedder=embeddings_model,
    splitter=text_splitter,
)
vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False)
client = PathwayVectorClient(
    host="127.0.0.1",
    port="8765",
)
query = "What is Pathway?"
docs = client.similarity_search(query)
```

The `PathwayVectorServer` builds a data processing pipeline which
continusly scans documents in a given source connector (google drive,
s3, ...) and builds a vector store. The `PathwayVectorClient` implements
LangChain's `VectorStore` interface and connects to the server to
retrieve documents.

---------

Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com>
Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local>
Co-authored-by: Berke <berkecanrizai1@gmail.com>
Co-authored-by: Adrian Kosowski <adrian@pathway.com>
Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home>
Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home>
Co-authored-by: Szymon Dudycz <szymond@pathway.com>
Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-29 10:50:39 -07:00
Arturs Konfino
2319212d54
community[patch]: avoid executing toolkit.get_context() when not necessary (#19762)
If `prompt` is passed into `create_sql_agent()`, then
`toolkit.get_context()` shouldn't be executed against the database
unless relevant prompt variables (`table_info` or `table_names`) are
present .
2024-03-29 16:42:21 +00:00
高璟琦
ec7a59c96c
community[minor]: Add solar embedding (#19761)
Solar is a large language model developed by
[Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM.
You can visit the embedding service provided by Solar within this pr.

You may get **SOLAR_API_KEY** from
https://console.upstage.ai/services/embedding
You can refer to more details about accepted llm integration at
https://python.langchain.com/docs/integrations/llms/solar.
2024-03-29 09:36:05 -07:00
Tomaz Bratanic
dec00d3050
community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758)
Makes it easier to flatten complex values to text, so you don't have to
use a lot of Cypher to do it.
2024-03-29 08:33:48 -07:00
Robby
f7e8a382cc
community[minor]: add hugging face text-to-speech inference API (#18880)
Description: I implemented a tool to use Hugging Face text-to-speech
inference API.

Issue: n/a

Dependencies: n/a

Twitter handle: No Twitter, but do have
[LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol.

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-29 15:02:29 +00:00
DasDingoCodes
73eb3f8fd9
community[minor]: Implement DirectoryLoader lazy_load function (#19537)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: Implement DirectoryLoader lazy_load
function"

- [x] **Description**: The `lazy_load` function of the `DirectoryLoader`
yields each document separately. If the given `loader_cls` of the
`DirectoryLoader` also implemented `lazy_load`, it will be used to yield
subdocuments of the file.

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access:
`libs/community/tests/unit_tests/document_loaders/test_directory_loader.py`
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory:
`docs/docs/integrations/document_loaders/directory.ipynb`


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-29 14:46:52 +00:00
Jialei
f7c903e24a
community[minor]: add support for Moonshot llm and chat model (#17100) 2024-03-29 08:54:23 +00:00
Ethan Yang
7164015135
community[minor]: Add Openvino embedding support (#19632)
This PR is used to support both HF and BGE embeddings with openvino

---------

Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
2024-03-29 01:34:51 -07:00
T Cramer
540ebf35a9
community[patch]: Add explicit error message to Bedrock error output. (#17328)
- **Description:** Propagate Bedrock errors into Langchain explicitly.
Use-case: unset region error is hidden behind 'Could not load
credentials...' message
- **Issue:**
[17654](https://github.com/langchain-ai/langchain/issues/17654)
  - **Dependencies:** None

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-29 03:07:33 +00:00
Marcus Virginia
69bb96c80f
community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374)
- **Description:** Handle for empty metadata and allow collection names
with complex characters
  - **Issue:** #17057
  - **Dependencies:** `surrealdb`

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-29 01:04:27 +00:00
kYLe
124ab79c23
community[minor]: Add Anyscale embedding support (#17605)
**Description:** Add embedding model support for Anyscale Endpoint
**Dependencies:** openai

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 00:53:53 +00:00
Lance Martin
12843f292f
community[patch]: llama cpp embeddings reset default n_batch (#17594)
When testing Nomic embeddings --
```
from langchain_community.embeddings import LlamaCppEmbeddings
embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf"
embd_lc = LlamaCppEmbeddings(model_path=embd_model_path)
embedding_lc = embd_lc.embed_query(query)
```

We were seeing this error for strings > a certain size -- 
```
File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count)
    824     s_sizes = []
    826 # add to batch
--> 827 self._batch.add_sequence(tokens, len(s_sizes), False)
    828 t_batch += n_tokens
    829 s_sizes.append(n_tokens)

File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all)
    540 self.batch.token[j] = batch[i]
    541 self.batch.pos[j] = i
--> 542 self.batch.seq_id[j][0] = seq_id
    543 self.batch.n_seq_id[j] = 1
    544 self.batch.logits[j] = logits_all

ValueError: NULL pointer access
```

The default `n_batch` of llama-cpp-python's Llama is `512` but we were
explicitly setting it to `8`.
 
These need to be set to equal for embedding models. 
* The embedding.cpp example has an assertion to make sure these are
always equal.
* Apparently this is not being done properly in llama-cpp-python.

With `n_batch` set to 8, if more than 8 tokens are passed the batch runs
out of space and it crashes.

This also explains why the CPU compute buffer size was small:

raw client with default `n_batch=512`
```
llama_new_context_with_model:        CPU input buffer size   =     3.51 MiB
llama_new_context_with_model:        CPU compute buffer size =    21.00 MiB
```
langchain with `n_batch=8`
```
llama_new_context_with_model:        CPU input buffer size   =     0.04 MiB
llama_new_context_with_model:        CPU compute buffer size =     0.33 MiB
```

We can work around this by passing `n_batch=512`, but this will not be
obvious to some users:
```
    embedding = LlamaCppEmbeddings(model_path=embd_model_path,
                                   n_batch=512)
```

From discussion w/ @cebtenzzre. Related:

https://github.com/abetlen/llama-cpp-python/issues/1189

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 00:47:22 +00:00
Zijian Han
8e976545f3
community[patch]: support OpenAI whisper base url (#17695)
**Description:** The base URL for OpenAI is retrieved from the
environment variable "OPENAI_BASE_URL", whereas for langchain it is
obtained from "OPENAI_API_BASE". By adding `base_url =
os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute
correctly.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 00:35:27 +00:00
Paulo Nascimento
44a3484503
community[patch]: add NotebookLoader unit test (#17721)
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
https://github.com/langchain-ai/langchain/pull/17614
- **Issue:**
[#17614](https://github.com/langchain-ai/langchain/pull/17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 00:27:46 +00:00
Paulo Nascimento
4c3a67122f
community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771)
**Description:** Created a Langchain Tool for OpenAI DALLE Image
Generation.
**Issue:**
[#15901](https://github.com/langchain-ai/langchain/issues/15901)
**Dependencies:** n/a
**Twitter handle:** @paulodoestech

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-29 00:23:14 +00:00
Jiaming
3d3cc71287
community[patch]: fix bugs for bilibili Loader (#18036)
- **Description:** 
1. Fix the BiliBiliLoader that can receive cookie parameters, it
requires 3 other parameters to run. The change is backward compatible.
  2. Add test;      
  3. Add example in docs

- **Issue:** [#14213]

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-28 16:39:38 -07:00
Sachin Paryani
25c9f3d1d1
community[patch]: Support Streaming in Azure Machine Learning (#18246)
- [x] **PR title**: "community: Support streaming in Azure ML and few
naming changes"

- [x] **PR message**:
- **Description:** Added support for streaming for azureml_endpoint.
Also, renamed and AzureMLEndpointApiType.realtime to
AzureMLEndpointApiType.dedicated. Also, added new classes
CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and
updated the classes LlamaChatContentFormatter and LlamaContentFormatter
to now show a deprecated warning message when instantiated.

---------

Co-authored-by: Sachin Paryani <saparan@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-28 23:38:20 +00:00
Victor Adan
afa2d85405
community[patch]: Added missing from_documents method to KNNRetriever. (#18411)
- Description: Added missing `from_documents` method to `KNNRetriever`,
providing the ability to supply metadata to LangChain `Document`s, and
to give it parity to the other retrievers, which do have
`from_documents`.
- Issue: None
- Dependencies: None
- Twitter handle: None

Co-authored-by: Victor Adan <vadan@netroadshow.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-28 23:18:50 +00:00
Smit Parmar
dfc4177b50
community[patch]: mypy ignore fix (#18483)
Relates to #17048 
Description : Applied fix to dynamodb and elasticsearch file.

Error was : `Cannot override writeable attribute with read-only
property`
Suggestion:
instead of adding 
```
@messages.setter
def messages(self, messages: List[BaseMessage]) -> None:
    raise NotImplementedError("Use add_messages instead")
```

we can change base class property
`messages: List[BaseMessage]`
to
```
@property
def messages(self) -> List[BaseMessage]:...
```

then we don't need to add `@messages.setter` in all child classes.
2024-03-28 15:36:53 -07:00
Luca Dorigo
f19229c564
core[patch]: fix beta, deprecated typing (#18877)
**Description:** 

While not technically incorrect, the TypeVar used for the `@beta`
decorator prevented pyright (and thus most vscode users) from correctly
seeing the types of functions/classes decorated with `@beta`.

This is in part due to a small bug in pyright
(https://github.com/microsoft/pyright/issues/7448 ) - however, the
`Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not
doing anything - classes are `Callables` by default, so by my
understanding binding to `Type` does not actually provide any more
safety - the modified annotation still works correctly for both
functions, properties, and classes.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-28 22:33:43 +00:00
wulixuan
b7c8bc8268
community[patch]: fix yuan2 errors in LLMs (#19004)
1. fix yuan2 errors while invoke Yuan2.
2. update tests.
2024-03-28 14:37:44 -07:00
高远
688ca48019
community[patch]: Adding validation when vector does not exist (#19698)
Adding validation when vector does not exist

Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>
2024-03-28 13:58:23 -07:00
Chaunte W. Lacewell
4a49fc5a95
community[patch]: Fix bug in vdms (#19728)
**Description:** Fix embedding check in vdms
**Contribution maintainer:** [@cwlacewe](https://github.com/cwlacewe)
2024-03-28 12:54:24 -07:00
高璟琦
75173d31db
community[minor]: Add solar model chat model (#18556)
Add our solar chat models, available model choices:
* solar-1-mini-chat
* solar-1-mini-translate-enko
* solar-1-mini-translate-koen

More documents and pricing can be found at
https://console.upstage.ai/services/solar.

The references to our solar model can be found at
* https://arxiv.org/abs/2402.17032

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-28 12:31:11 -07:00
Davide Menini
f7042321f1
community[patch]: gather token usage info in BedrockChat during generation (#19127)
This PR allows to calculate token usage for prompts and completion
directly in the generation method of BedrockChat. The token usage
details are then returned together with the generations, so that other
downstream tasks can access them easily.

This allows to define a callback for tokens tracking and cost
calculation, similarly to what happens with OpenAI (see
[OpenAICallbackHandler](https://api.python.langchain.com/en/latest/_modules/langchain_community/callbacks/openai_info.html#OpenAICallbackHandler).
I plan on adding a BedrockCallbackHandler later.
Right now keeping track of tokens in the callback is already possible,
but it requires passing the llm, as done here:
https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html.
However, I find the approach of this PR cleaner.

Thanks for your reviews. FYI @baskaryan, @hwchase17

---------

Co-authored-by: taamedag <Davide.Menini@swisscom.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-28 18:58:46 +00:00
ligang-super
a662468dde
community[patch]: Fix the error of Baidu Qianfan not passing the stop parameter (#18666)
- [x] **PR title**: "community: fix baidu qianfan missing stop
parameter"
- [x] **PR message**:
- **Description: Baidu Qianfan lost the stop parameter when requesting
service due to extracting it from kwargs. This bug can cause the agent
to receive incorrect results

---------

Co-authored-by: ligang33 <ligang33@baidu.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-28 18:21:49 +00:00
kaijietti
9c4b6dc979
community[patch]: fix bug in cohere that async for a coroutine in ChatCohere (#19381)
Without `await`, the `stream` returned from the `async_client` is
actually a coroutine, which could not be used in `async for`.
2024-03-27 21:34:46 -07:00
Christian Galo
1adaa3c662
community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488)
This is a follow up to #18371. These are the changes:
- New **Azure AI Services** toolkit and tools to replace those of
**Azure Cognitive Services**.
- Updated documentation for Microsoft platform.
- The image analysis tool has been rewritten to use the new package
`azure-ai-vision-imageanalysis`, doing a proper replacement of
`azure-ai-vision`.

These changes:
- Update outdated naming from "Azure Cognitive Services" to "Azure AI
Services".
- Update documentation to use non-deprecated methods to create and use
agents.
- Removes need to depend on yanked python package (`azure-ai-vision`)

There is one new dependency that is needed as a replacement to
`azure-ai-vision`:
- `azure-ai-vision-imageanalysis`. This is optional and declared within
a function.

There is a new `azure_ai_services.ipynb` notebook showing usage; Changes
have been linted and formatted.

I am leaving the actions of adding deprecation notices and future
removal of Azure Cognitive Services up to the LangChain team, as I am
not sure what the current practice around this is.

---

If this PR makes it, my handle is  @galo@mastodon.social

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-03-28 03:19:02 +00:00
Shengsheng Huang
ac1dd8ad94
community[minor]: migrate bigdl-llm to ipex-llm (#19518)
- **Description**: `bigdl-llm` library has been renamed to
[`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR
migrates the `bigdl-llm` integration to `ipex-llm` .
- **Issue**: N/A. The original PR of `bigdl-llm` is
https://github.com/langchain-ai/langchain/pull/17953
- **Dependencies**: `ipex-llm` library
- **Contribution maintainer**: @shane-huang

Updated doc:   docs/docs/integrations/llms/ipex_llm.ipynb
Updated test:
libs/community/tests/integration_tests/llms/test_ipex_llm.py
2024-03-27 20:12:59 -07:00
Chaunte W. Lacewell
a31f692f4e
community[minor]: Add VDMS vectorstore (#19551)
- **Description:** Add support for Intel Lab's [Visual Data Management
System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store
- **Dependencies:** `vdms` library which requires protobuf = "4.24.2".
There is a conflict with dashvector in `langchain` package but conflict
is resolved in `community`.
- **Contribution maintainer:** [@cwlacewe](https://github.com/cwlacewe)
- **Added tests:**
libs/community/tests/integration_tests/vectorstores/test_vdms.py
- **Added docs:** docs/docs/integrations/vectorstores/vdms.ipynb
- **Added cookbook:** cookbook/multi_modal_RAG_vdms.ipynb

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-28 03:12:11 +00:00
William FH
b7b62e29fb
community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531)
If you use an embedding dist function in an eval loop, you get warned
every time. Would prefer to just check once and forget about it.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-28 03:11:02 +00:00
yongheng.liu
7e29b6061f
community[minor]: integrate China Mobile Ecloud vector search (#15298)
- **Description:** integrate China Mobile Ecloud vector search, 
  - **Dependencies:** elasticsearch==7.10.1

Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-27 23:02:40 +00:00
Hyeongchan Kim
9b70131aed
community[patch]: refactor the type hint of file_path in UnstructuredAPIFileLoader class (#18839)
* **Description**: add `None` type for `file_path` along with `str` and
`List[str]` types.
* `file_path`/`filename` arguments in `get_elements_from_api()` and
`partition()` can be `None`, however, there's no `None` type hint for
`file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader`
currently.
* calling the function with `file_path=None` is no problem, but my IDE
annoys me lol.
* **Issue**: N/A
* **Dependencies**: N/A

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-27 22:31:54 +00:00
CaroFG
cf96060ab7
community[patch]: update for compatibility with latest Meilisearch version (#18970)
- **Description:** Updates Meilisearch vectorstore for compatibility
with v1.6 and above. Adds embedders settings and embedder_name which are
now required.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-27 22:08:27 +00:00
chyroc
be2adb1083
community[patch]: support unstructured_kwargs for s3 loader (#15473)
fix https://github.com/langchain-ai/langchain/issues/15472

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-27 22:03:48 +00:00
Tomaz Bratanic
87d2a6b777
community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654) 2024-03-27 14:20:12 -04:00
Rajendra Kadam
0019d8a948
community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574)
**Description:**
PebbloSafeLoader: Add support for non-file-based Document Loaders

This pull request enhances PebbloSafeLoader by introducing support for
several non-file-based Document Loaders. With this update,
PebbloSafeLoader now seamlessly integrates with the following loaders:
- GoogleDriveLoader
- SlackDirectoryLoader
- Unstructured EmailLoader

**Issue:** NA
**Dependencies:** - None
**Twitter handle:** @Raj__725

---------

Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-03-27 17:39:52 +00:00
hulitaitai
dc2c9dd4d7
Update text2vec.py (#19657)
Add that URL of the embedding tool "text2vec".
Fix minor mistakes in the doc-string.
2024-03-27 13:13:30 -04:00
Guangdong Liu
7042934b5f
community[patch]: Fix the bug that Chroma does not specify embedding_function (#19277)
- **Issue:** close #18291
- @baskaryan, @eyurtsev PTAL
2024-03-27 11:43:38 -04:00
yuwenzho
3a7d2cf443
community[minor]: Add ITREX optimized Embeddings (#18474)
Introduction
[Intel® Extension for
Transformers](https://github.com/intel/intel-extension-for-transformers)
is an innovative toolkit designed to accelerate GenAI/LLM everywhere
with the optimal performance of Transformer-based models on various
Intel platforms

Description

adding ITREX runtime embeddings using intel-extension-for-transformers.
added mdx documentation and example notebooks
added embedding import testing.

---------

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-27 07:22:06 +00:00
Fabrizio Ruocco
f12cb0bea4
community[patch]: Microsoft Azure Document Intelligence updates (#16932)
- **Description:** Update Azure Document Intelligence implementation by
Microsoft team and RAG cookbook with Azure AI Search

---------

Co-authored-by: Lu Zhang (AI) <luzhan@microsoft.com>
Co-authored-by: Yateng Hong <yatengh@microsoft.com>
Co-authored-by: teethache <hongyateng2006@126.com>
Co-authored-by: Lu Zhang <44625949+luzhang06@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 23:36:59 -07:00
Timothy
ad77fa15ee
community[patch]: Adding try-except block for GCSDirectoryLoader (#19591)
- **Description:** Implemented try-except block for
`GCSDirectoryLoader`. Reason: Users processing large number of
unstructured files in a folder may experience many different errors. A
try-exception block is added to capture these errors. A new argument
`use_try_except=True` is added to enable *silent failure* so that error
caused by processing one file does not break the whole function.
- **Issue:** N/A
- **Dependencies:** no new dependencies
- **Twitter handle:** timothywong731

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-27 00:12:24 +00:00
xsai9101
160a8eb178
community[minor]: add oracle autonomous database doc loader integration (#19536)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Adding oracle autonomous database document loader
integration. This will allow users to connect to oracle autonomous
database through connection string or TNS configuration.
    https://www.oracle.com/autonomous-database/
    - **Issue:** None
    - **Dependencies:** oracledb python package 
    https://pypi.org/project/oracledb/
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
  Unit test and doc are added.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-26 17:02:18 -07:00
Adam Law
aeb7b6b11d
community[patch]: use semantic_configurations in AzureSearch (#19347)
- **Description:** Currently the semantic_configurations are not used
when creating an AzureSearch instance, instead creating a new one with
default values. This PR changes the behavior to use the passed
semantic_configurations if it is present, and the existing default
configuration if not.

---------

Co-authored-by: Adam Law <adamlaw@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-26 13:57:39 -07:00
Adrian Valente
2763d8cbe5
community: add len() implementation to Chroma (#19419)
Thank you for contributing to LangChain!

- [x] **Add len() implementation to Chroma**: "package: community"


- [x] **PR message**: 
- **Description:** add an implementation of the __len__() method for the
Chroma vectostore, for convenience.
- **Issue:** no exposed method to know the size of a Chroma vectorstore
    - **Dependencies:** None
    - **Twitter handle:** lowrank_adrian


- [x] **Add tests and docs**

- [x] **Lint and test**

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 12:53:10 -04:00
Tom Aarsen
e0a1278d2b
docs: HFEmbeddings: Add more information to model_kwargs/encode_kwargs (#19594)
- **Description:** Be more explicit with the `model_kwargs` and
`encode_kwargs` for `HuggingFaceEmbeddings`.
    - **Issue:** -
    - **Dependencies:** -

I received some reports by my users that they didn't realise that you
could change the default `batch_size` with `HuggingFaceEmbeddings`,
which may be attributed to how the `model_kwargs` and `encode_kwargs`
don't give much information about what you can specify.

I've added some parameter names & links to the Sentence Transformers
documentation to help clear it up. Let me know if you'd rather have
Markdown/Sphinx-style hyperlinks rather than a "bare URL".

- Tom Aarsen
2024-03-26 12:46:04 -04:00
Dobiichi-Origami
18e6f9376d
community[Qianfan]: add function_call in additional_kwargs (#19550)
- **Description:** add lacked `function_call` field in
`additional_kwargs` in previous version
- **Dependencies:** None of new dependency
2024-03-26 12:20:19 -04:00
mwmajewsk
f7a1fd91b8
community: better support of pathlib paths in document loaders (#18396)
So this arose from the
https://github.com/langchain-ai/langchain/pull/18397 problem of document
loaders not supporting `pathlib.Path`.

This pull request provides more uniform support for Path as an argument.
The core ideas for this upgrade: 
- if there is a local file path used as an argument, it should be
supported as `pathlib.Path`
- if there are some external calls that may or may not support Pathlib,
the argument is immidiately converted to `str`
- if there `self.file_path` is used in a way that it allows for it to
stay pathlib without conversion, is is only converted for the metadata.

Twitter handle: https://twitter.com/mwmajewsk
2024-03-26 11:51:52 -04:00
Yuki Watanabe
cfecbda48b
community[minor]: Allow passing allow_dangerous_deserialization when loading LLM chain (#18894)
### Issue
Recently, the new `allow_dangerous_deserialization` flag was introduced
for preventing unsafe model deserialization that relies on pickle
without user's notice (#18696). Since then some LLMs like Databricks
requires passing in this flag with true to instantiate the model.

However, this breaks existing functionality to loading such LLMs within
a chain using `load_chain` method, because the underlying loader
function
[load_llm_from_config](f96dd57501/libs/langchain/langchain/chains/loading.py (L40))
 (and load_llm) ignores keyword arguments passed in. 

### Solution
This PR fixes this issue by propagating the
`allow_dangerous_deserialization` argument to the class loader iff the
LLM class has that field.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 11:07:55 -04:00
hulitaitai
d7c14cb6f9
community[minor]: Add embeddings integration for text2vec (#19267)
Create a Class which allows to use the "text2vec" open source embedding
model.

It should install the model by running 'pip install -U text2vec'.
Example to call the model through LangChain:

from langchain_community.embeddings.text2vec import Text2vecEmbeddings

            embedding = Text2vecEmbeddings()
            bookend.embed_documents([
                "This is a CoSENT(Cosine Sentence) model.",
"It maps sentences to a 768 dimensional dense vector space.",
            ])
            bookend.embed_query(
                "It can be used for text matching or semantic search."
            )

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-26 11:06:58 -04:00
Kalyan Mudumby
d27600c6f7
community[patch]: GPTCache pydantic validation error on lookup (#19427)
Description:
this change fixes the pydantic validation error when looking up from
GPTCache, the `ChatOpenAI` class returns `ChatGeneration` as response
which is not handled.
use the existing `_loads_generations` and `_dumps_generations` functions
to handle it

Trace
```
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/development/scripts/chatbot-postgres-test.py", line 90, in <module>
    print(llm.invoke("tell me a joke"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke
    self.generate_prompt(
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate
    raise e
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate
    self._generate_with_cache(
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 585, in _generate_with_cache
    cache_val = llm_cache.lookup(prompt, llm_string)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 807, in lookup
    return [
           ^
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 808, in <listcomp>
    Generation(**generation_dict) for generation_dict in json.loads(res)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__
    super().__init__(**kwargs)
  File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation
type
  unexpected value; permitted: 'Generation' (type=value_error.const; given=ChatGeneration; permitted=('Generation',))
```


Although I don't seem to find any issues here, here's an
[issue](https://github.com/zilliztech/GPTCache/issues/585) raised in
GPTCache. Please let me know if I need to do anything else

Thank you

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 10:52:30 -04:00
Piyush Jain
72ba738bf5
community[minor]: Improvements for NeptuneRdfGraph, Improve discovery of graph schema using database statistics (#19546)
Fixes linting for PR
[19244](https://github.com/langchain-ai/langchain/pull/19244)

---------

Co-authored-by: mhavey <mchavey@gmail.com>
2024-03-26 10:36:51 -04:00
Christophe Bornet
8595c3ab59
community[minor]: Add InMemoryVectorStore to module level imports (#19576) 2024-03-26 14:07:44 +00:00
Aayush Kataria
03c38005cb
community[patch]: Fixing some caching issues for AzureCosmosDBSemanticCache (#18884)
Fixing some issues for AzureCosmosDBSemanticCache
- Added the entry for "AzureCosmosDBSemanticCache" which was missing in
langchain/cache.py
- Added application name when creating the MongoClient for the
AzureCosmosDBVectorSearch, for tracking purposes.

@baskaryan, can you please review this PR, we need this to go in asap.
These are just small fixes which we found today in our testing.
2024-03-25 19:06:17 -07:00
Clément Tamines
a6cbb755a7
community[patch]: fix semantic answer bug in AzureSearch vector store (#18938)
- **Description:** The `semantic_hybrid_search_with_score_and_rerank`
method of `AzureSearch` contains a hardcoded field name "metadata" for
the document metadata in the Azure AI Search Index. Adding such a field
is optional when creating an Azure AI Search Index, as other snippets
from `AzureSearch` test for the existence of this field before trying to
access it. Furthermore, the metadata field name shouldn't be hardcoded
as "metadata" and use the `FIELDS_METADATA` variable that defines this
field name instead. In the current implementation, any index without a
metadata field named "metadata" will yield an error if a semantic answer
is returned by the search in
`semantic_hybrid_search_with_score_and_rerank`.

- **Issue:** https://github.com/langchain-ai/langchain/issues/18731

- **Prior fix to this bug:** This bug was fixed in this PR
https://github.com/langchain-ai/langchain/pull/15642 by adding a check
for the existence of the metadata field named `FIELDS_METADATA` and
retrieving a value for the key called "key" in that metadata if it
exists. If the field named `FIELDS_METADATA` was not present, an empty
string was returned. This fix was removed in this PR
https://github.com/langchain-ai/langchain/pull/15659 (see
ed1ffca911#).
@lz-chen: could you confirm this wasn't intentional? 

- **New fix to this bug:** I believe there was an oversight in the logic
of the fix from
[#1564](https://github.com/langchain-ai/langchain/pull/15642) which I
explain below.
The `semantic_hybrid_search_with_score_and_rerank` method creates a
dictionary `semantic_answers_dict` with semantic answers returned by the
search as follows.

5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)
The keys in this dictionary are the unique document ids in the index, if
I understand the [documentation of semantic
answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers)
in Azure AI Search correctly. When the method transforms a search result
into a `Document` object, an "answer" key is added to the document's
metadata. The value for this "answer" key should be the semantic answer
returned by the search from this document, if such an answer is
returned. The match between a `Document` object and the semantic answers
returned by the search should be done through the unique document id,
which is used as a key for the `semantic_answers_dict` dictionary. This
id is defined in the search result's field named `FIELDS_ID`. I added a
check to avoid any error in case no field named `FIELDS_ID` exists in a
search result (which shouldn't happen in theory).
A benefit of this approach is that this fix should work whether or not
the Azure AI Search Index contains a metadata field.

@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?

Thanks for the help!
2024-03-25 18:51:54 -07:00
Anindyadeep
b2a11ce686
community[minor]: Prem AI langchain integration (#19113)
### Prem SDK integration in LangChain

This PR adds the integration with [PremAI's](https://www.premai.io/)
prem-sdk with langchain. User can now access to deployed models
(llms/embeddings) and use it with langchain's ecosystem. This PR adds
the following:

### This PR adds the following:

- [x]  Add chat support
- [X]  Adding embedding support
- [X]  writing integration tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  writing unit tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  Adding documentation
    - [X]  writing documentation for chat
    - [X]  writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff` 
- [X]  Final checks (spell check, lint, format and overall testing)

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 01:37:19 +00:00
Souhail Hanfi
cbec43afa9
community[patch]: avoid creating extension PGvector while using readOnly Databases (#19268)
- **Description:** PgVector class always runs "create extension" on init
and this statement crashes on ReadOnly databases (read only replicas).
but wierdly the next create collection etc work even in readOnly
databases
- **Dependencies:** no new dependencies
- **Twitter handle:** @VenOmaX666

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 01:25:01 +00:00
Barun Amalkumar Halder
9246ec6b36
community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293)
**Description:**
- minor PR to speed up onboarding by not trying to add a dataset, if a
model is already present.
- replace batch publish API with streaming when single events are
published.

**Dependencies:** any dependencies required for this change
**Twitter handle:** behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
2024-03-25 17:28:05 -07:00
JSDu
6e090280fd
community[patch]: milvus will autoflush, manual flush is slowly (#19300)
reference:


https://milvus.io/docs/configure_quota_limits.md#quotaAndLimitsflushRateenabled

https://github.com/milvus-io/milvus/issues/31407

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 00:26:58 +00:00
mackong
e65dc4b95b
community[patch]: clean warning when delete by ids (#19301)
* Description: rearrange to avoid variable overwrite, which cause
warning always.
* Issue: N/A
* Dependencies: N/A
2024-03-25 17:23:22 -07:00
Stefano Mosconi
01fc69c191
community[patch]: expanding version in confluence loader (#19324)
**Description:**
Expanding version in all the Confluence API calls so to get when the
page was last modified/created in all cases.

**Issue:** #12812 
**Twitter handle:** zzste
2024-03-25 17:08:01 -07:00
Dmitry Tyumentsev
08b769d539
community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341)
Fixed inability to work with [yandexcloud
SDK](https://pypi.org/project/yandexcloud/) version higher 0.265.0
2024-03-25 17:05:57 -07:00
Marlene
f1313339ac
community[patch]: Fixing incorrect base URLs for Azure Cognitive Search Retriever (#19352)
This PR adds code to make sure that the correct base URL is being
created for the Azure Cognitive Search retriever. At the moment an
incorrect base URL is being generated. I think this is happening because
the original code was based on a depreciated API version. No
dependencies need to be added. I've also added more context to the test
doc strings.

I should also note that ACS is now Azure AI Search. I will open a
separate PR to make these changes as that would be a breaking change and
should potentially be discussed.

Twitter: @marlene_zw



- No new tests added, however the current ACS retriever tests are now
passing when I run them.
- Code was linted.

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 00:04:59 +00:00
FinTech秋田
03ba1d4731
community[patch]: Add Support for GPU Index Types in Milvus 2.4 (#19468)
- **Description:** This commit introduces support for the newly
available GPU index types introduced in Milvus 2.4 within the LangChain
project's `milvus.py`. With the release of Milvus 2.4, a range of
GPU-accelerated index types have been added, offering enhanced search
capabilities and performance optimizations for vector search operations.
This update ensures LangChain users can fully utilize the new
performance benefits for vector search operations.
    - Reference: https://milvus.io/docs/gpu_index.md

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-25 23:39:54 +00:00
Ash Vardanian
d01bad5169
core[patch]: Convert SimSIMD back to NumPy (#19473)
This patch fixes the #18022 issue, converting the SimSIMD internal
zero-copy outputs to NumPy.

I've also noticed, that oftentimes `dtype=np.float32` conversion is used
before passing to SimSIMD. Which numeric types do LangChain users
generally care about? We support `float64`, `float32`, `float16`, and
`int8` for cosine distances and `float16` seems reasonable for
practically any kind of embeddings and any modern piece of hardware, so
we can change that part as well 🤗
2024-03-25 16:36:26 -07:00
Mikelarg
dac2e0165a
community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516)
- **Description:** Added integration with
[GigaChat](https://developers.sber.ru/portal/products/gigachat)
embeddings. Also added support for extra fields in GigaChat LLM and
fixed docs.
2024-03-25 16:08:37 -07:00
Martin Kolb
e5bdb26f76
community[patch]: More flexible handling for entity names in vector store "HANA Cloud" (#19523)
- **Description:** Added support for lower-case and mixed-case names
The names for tables and columns previouly had to be UPPER_CASE.
With this enhancement, also lower_case and MixedCase are supported,


  - **Issue:** N/A
  - **Dependencies:** no new dependecies added
  - **Twitter handle:** @sapopensource
2024-03-25 15:52:45 -07:00
billytrend-cohere
63343b4987
cohere[patch]: add cohere as a partner package (#19049)
Description: adds support for langchain_cohere

---------

Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-25 20:23:47 +00:00
ccurme
82de8fd6c9
add kwargs (#19519)
`HanaDB.add_texts` is missing **kwargs.
2024-03-25 11:56:01 -04:00
Nikhil Kumar
3d3b46a782
docs: Update docs for HuggingFacePipeline (#19306)
Updated `HuggingFacePipeline` docs to be in sync with list of supported
tasks, including translation.

- [x] **PR title**: "community: Update docs for `HuggingFacePipeline`"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**:
- **Description:** Update docs for `HuggingFacePipeline`, was earlier
missing `translation` as a valid task
    - **Issue:** N/A
    - **Dependencies:** N/A
    - **Twitter handle:** None


- [x] **Add tests and docs**:


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2024-03-25 00:29:21 -07:00
Igor Muniz Soares
743f888580
community[minor]: Dappier chat model integration (#19370)
**Description:** 

This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.


**Dependencies:** 
    No extra dependencies are needed.
2024-03-25 07:29:05 +00:00
Hugoberry
96dc180883
community[minor]: Add DuckDB as a vectorstore (#18916)
DuckDB has a cosine similarity function along list and array data types,
which can be used as a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list or
array column types. This PR surfaces this functionality to langchain.
    - **Dependencies:** duckdb 0.10.0
    - **Twitter handle:** @igocrite

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-25 07:02:35 +00:00
preak95
6ea3e57a63
community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (#19270)
**Description:** Update s3_file.py to use arguments **mode** and
**post_processors** from the base class **UnstructuredBaseLoader** to
include more metadata about the files from the S3 bucket such as
*'page_number', 'languages'* etc.

**Issue:** NA
**Dependencies:** None
**Twitter handle:** preak95

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-25 06:56:55 +00:00
fengjial
3b52ee05d1
community[patch]: fix bugs in baiduvectordb as vectorstore (#19380)
fix small bugs in vectorstore/baiduvectordb
2024-03-22 17:03:59 -07:00
aditya thomas
515aab3312
community[patch]: invoke callback prior to yielding token (openai) (#19389)
**Description:** Invoke callback prior to yielding token for BaseOpenAI
& OpenAIChat
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
2024-03-22 16:45:55 -07:00
aditya thomas
49e932cd24
community[patch]: invoke callback prior to yielding token (fireworks) (#19388)
**Description:** Invoke callback prior to yielding token for Fireworks
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
2024-03-22 16:44:06 -07:00
Tarun Jain
ef6d3d66d6
community[patch]: docarray requires hnsw installation (#19416)
I have a small dataset, and I tried to use docarray:
``DocArrayHnswSearch ``. But when I execute, it returns:

```bash
    raise ImportError(
ImportError: Could not import docarray python package. Please install it with `pip install "langchain[docarray]"`.
```

Instead of docarray it needs to be 

```bash
docarray[hnswlib]
```

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-22 22:39:07 +00:00
German Swan
d4dc98a9f9
community[patch]: RecursiveUrlLoader: add base_url option (#19421)
RecursiveUrlLoader does not currently provide an option to set
`base_url` other than the `url`, though it uses a function with such an
option.
For example, this causes it unable to parse the
`https://python.langchain.com/docs`, as it returns the 404 page, and
`https://python.langchain.com/docs/get_started/introduction` has no
child routes to parse.
`base_url` allows setting the `https://python.langchain.com/docs` to
filter by, while the starting URL is anything inside, that contains
relevant links to continue crawling.
I understand that for this case, the docusaurus loader could be used,
but it's a common issue with many websites.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-22 15:34:31 -07:00
aditya thomas
4856a87261
community[patch]: invoke callback prior to yielding token (llama.cpp) (#19392)
**Description:** Invoke callback prior to yielding token for llama.cpp
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
2024-03-22 16:17:56 -04:00
billytrend-cohere
f6bcd42421
community[patch]: Replace positional argument with text=text for cohere>=5 compatibility (#19407)
- **Description:** Replace positional argument with text=text for
cohere>=5 compatibility
2024-03-21 10:42:51 -07:00
Yudhajit Sinha
7d216ad1e1
community[patch]: Invoke callback prior to yielding token (titan_takeoff_pro) (#18624)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/titan_takeoff_pro.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:58:18 -07:00
Yudhajit Sinha
455a74486b
community[patch]: Invoke callback prior to yielding token (sparkllm) (#18625)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/sparkllm.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:57:53 -07:00
Yudhajit Sinha
5ac1860484
community[patch]: Invoke callback prior to yielding token (replicate) (#18626)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/replicate.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:57:27 -07:00
Yudhajit Sinha
9525e392de
community[patch]: Invoke callback prior to yielding token (pai_eas_endpoint) (#18627)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/pai_eas_endpoint.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:56:58 -07:00
Yudhajit Sinha
140f06e59a
community[patch]: Invoke callback prior to yielding token (openai) (#18628)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/openai.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:56:30 -07:00
Yudhajit Sinha
280a914920
community[patch]: Invoke callback prior to yielding token (ollama) (#18629)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_ &
_astream_ methods in llms/ollama.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:56:09 -07:00
Christophe Bornet
00614f332a
community[minor]: Add InMemoryVectorStore (#19326)
This is a basic VectorStore implementation using an in-memory dict to
store the documents.
It doesn't need any extra/optional dependency as it uses numpy which is
already a dependency of langchain.
This is useful for quick testing, demos, examples.
Also it allows to write vendor-neutral tutorials, guides, etc...
2024-03-20 10:21:07 -04:00
Nithish Raghunandanan
7ad0a3f2a7
community: add Couchbase Vector Store (#18994)
- **Description:** Added support for Couchbase Vector Search to
LangChain.
- **Dependencies:** couchbase>=4.1.12
- **Twitter handle:** @nithishr

---------

Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
2024-03-19 12:39:51 -07:00
Christophe Bornet
30e4a35d7a
community: Use langchain-astradb for AstraDB caches (#18419)
- [x] Needs https://github.com/langchain-ai/langchain-datastax/pull/4
- [x] Needs a new release of langchain-astradb
2024-03-19 14:04:36 -04:00
Vittorio Rigamonti
9b2f9ee952
community: VectorStore Infinispan, adding autoconfiguration (#18967)
**Description**:
this PR enable VectorStore autoconfiguration for Infinispan: if
metadatas are only of basic types, protobuf
config will be automatically generated for the user.
2024-03-18 21:33:45 -07:00
gonvee
b82644078e
community: Add keep_alive parameter to control how long the model w… (#19005)
Add `keep_alive` parameter to control how long the model will stay
loaded into memory with Ollama。

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-19 04:29:01 +00:00
Harrison Chase
efcdf54edd
Josha91 fix docstring (#19249)
Co-authored-by: Josha van Houdt <josha.van.houdt@sap.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-18 21:19:56 -07:00
Taqi Jaffri
044bc22acc
Community: Add mistral oss model support to azureml endpoints, plus configurable timeout (#19123)
- **Description:** There was no formatter for mistral models for Azure
ML endpoints. Adding that, plus a configurable timeout (it was hard
coded before)
- **Dependencies:** none
- **Twitter handle:** @tjaffri @docugami
2024-03-18 21:10:42 -07:00
Hamza Muhammad Farooqi
24a0a4472a
Add docstrings for Clickhouse class methods (#19195)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
2024-03-19 04:03:12 +00:00
Rohit Gupta
785f8ab174
[langchain_community] milvus vectorstores upsert: add **kwargs to make it use for other argument also (#19193)
add **kwargs in add_documents for upsert, to make it use for other
argument also.
Lets use this, it was unused as of now.

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>
2024-03-18 21:01:12 -07:00
Guangdong Liu
c3310c5e7f
community: Fix Milvus got multiple values for keyword argument 'timeout' (#19232)
- **Description:** Fix Milvus got multiple values for keyword argument
'timeout'
- **Issue:**  fix #18580
- @baskaryan @eyurtsev PTAL
2024-03-18 20:44:25 -07:00
Leonid Ganeline
7de1d9acfd
community: llms imports fixes (#18943)
Classes are missed in  __all__  and in different places of __init__.py
- BaichuanLLM 
- ChatDatabricks
- ChatMlflow
- Llamafile
- Mlflow
- Together
Added classes to __all__. I also sorted __all__ list.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-18 20:24:40 +00:00
Kenzie Mihardja
21f75991d4
deprecate community docugami loader (#19230)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: deprecate DocugamiLoader"

- [x] **PR message**: Deprecate the langchain_community and use the
docugami_langchain DocugamiLoader

---------

Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>
2024-03-18 12:56:47 -07:00
Pengfei Jiang
514fe80778
community[patch]: add stop parameter support to volcengine maas (#19052)
- **Description:** add stop parameter to volcengine maas model
- **Dependencies:** no

---------

Co-authored-by: 江鹏飞 <jiangpengfei.jiangpf@bytedance.com>
2024-03-17 01:58:50 +00:00
htaoruan
bcc771e37c
docs: ChatTongyi example error (#19013) 2024-03-17 01:55:56 +00:00
primate88
5aa68936e0
community: Fix import path for StreamingStdOutCallbackHandler example (#19170)
- Description:
- Updated the import path for `StreamingStdOutCallbackHandler` in the
streaming response example within `huggingface_endpoint.py`. This change
corrects the import statement to reflect the actual location of
`StreamingStdOutCallbackHandler` in
`langchain_core.callbacks.streaming_stdout`.
- Issue:
  - None
- Dependencies:
  - No additional dependencies are required for this change.
- Twitter handle:
  - None

## Note:
I have tested this change locally and confirmed that the
`StreamingStdOutCallbackHandler` works as expected with the updated
import path. This PR does not require the addition of new tests since it
is a correction to documentation/examples rather than functional code.
2024-03-17 00:50:37 +00:00
Nikhil Kumar
635b3372bd
community[minor]: Add support for translation in HuggingFacePipeline (#19190)
- [x] **Support for translation**: "community: Add support for
translation in `HuggingFacePipeline`"


- [x] **Add support for translation in `HuggingFacePipeline`**:
- **Description:** Add support for translation in `HuggingFacePipeline`,
which earlier used to support only text summarization and generation.
    - **Issue:** N/A
    - **Dependencies:** N/A
    - **Twitter handle:** None
2024-03-17 00:48:13 +00:00
k.muto
8d2c34e655
community: Fix all page numbers were the same for _BaseGoogleVertexAISearchRetriever (#19175)
- Description:
- This pull request is to fix a bug where page numbers were not set
correctly. In the current code, all chunks share the same metadata
object doc_metadata, so the page number is set with the same value for
all documents. To fix this, I changed to using separate metadata objects
for each chunk.
- Issue:
  - None
- Dependencies:
  - No additional dependencies are required for this change.
- Twitter handle:
  - @eycjur

- Test
- Even if it's not a bug, there are cases where everything ends up with
the same number of pages, so it's very difficult for me to write
integration tests.
2024-03-16 22:28:56 +00:00
Cailin Wang
7cd87d2f6a
community: Add partition parameter to DashVector (#19023)
**Description**: DashVector Add partition parameter
**Twitter handle**: @CailinWang_

---------

Co-authored-by: root <root@Bluedot-AI>
2024-03-16 15:20:30 -07:00
Rodrigo Nogueira
e64cf1aba4
community: Add model argument for maritalk models and better error handling (#19187) 2024-03-16 15:18:56 -07:00
Sergey Kozlov
1a55e950aa
community[patch]: support fastembed v1 and v2 (#19125)
**Description:**
#18040 forces `fastembed>2.0`, and this causes dependency conflicts with
the new `unstructured` package (different `onnxruntime`). There may be
other dependency conflicts.. The only way to use
`langchain-community>=0.0.28` is rollback to `unstructured 0.10.X`. But
new `unstructured` contains many fixes.

This PR allows to use both `fastembed` `v1` and `v2`.

How to reproduce:

`pyproject.toml`:
```toml
[tool.poetry]
name = "depstest"
version = "0.0.0"
description = "test"
authors = ["<dev@example.org>"]

[tool.poetry.dependencies]
python = ">=3.10,<3.12"
langchain-community = "^0.0.28"
fastembed = "^0.2.0"
unstructured = {extras = ["pdf"], version = "^0.12"}
```

```bash
$ poetry lock
```

Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
2024-03-15 18:33:51 -07:00
高远
ef9813dae6
docs: add vikingdb docstrings(#19016)
Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>
2024-03-15 16:29:29 -07:00
wulixuan
0e0030f494
community[patch]: fix yuan2 chat model errors while invoke. (#19015)
1. fix yuan2 chat model errors while invoke.
2. update related tests.
3. fix some deprecationWarning.
2024-03-15 16:28:36 -07:00
Shuai Liu
c244e1a50b
community[patch]: Fixed bug in merging generation_info during chunk concatenation in Tongyi and ChatTongyi (#19014)
- **Description:** 

In #16218 , during the `GenerationChunk` and `ChatGenerationChunk`
concatenation, the `generation_info` merging changed from simple keys &
values replacement to using the util method
[`merge_dicts`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py):


![image](https://github.com/langchain-ai/langchain/assets/2098020/10f315bf-7fe0-43a7-a0ce-6a3834b99a15)

The `merge_dicts` method could not handle merging values of `int` or
some other types, and would raise a
[`TypeError`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py#L55).

This PR fixes this issue in the **Tongyi and ChatTongyi Model** by
adopting the `generation_info` of the last chunk
and discarding the `generation_info` of the intermediate chunks,
ensuring that `stream` and `astream` function correctly.

- **Issue:**  
    - Related issues or PRs about Tongyi & ChatTongyi: #16605, #17105 
    - Other models or cases: #18441, #17376
- **Dependencies:** No new dependencies
2024-03-15 16:27:53 -07:00
Christophe Bornet
f2a7dda4bd
community[patch]: Use langchain-astradb for AstraDB doc loader (#19071)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:57:25 +00:00
Holt Skinner
cee03630d9
community[patch]: Add Blended Search Support to GoogleVertexAISearchRetriever (#19082)
https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:39:31 +00:00
case-k
ebc4a64f9e
docs: fix databricks document url (#19096)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:25:11 +00:00
Guangdong Liu
cced3eb9bc
community[patch]: Fix sparkllm embeddings api bug. (#19122)
- **Description:** Fix sparkllm embeddings api bug.
@baskaryan PTAL
2024-03-15 15:08:49 -07:00
kaijietti
c20aeef79a
community[patch]: implement qdrant _aembed_query and use it in other async funcs (#19155)
`amax_marginal_relevance_search ` and `asimilarity_search_with_score `
should use an async version of `_embed_query `.
2024-03-15 21:20:12 +00:00
Barun Amalkumar Halder
34d6f0557d
community[patch] : publishes duration as milliseconds to Fiddler (#19166)
**Description:** Many LLM steps complete in sub-second duration, which
can lead to non-collection of duration field for Fiddler. This PR
updates duration from seconds to milliseconds.
**Issue:** [INTERNAL] FDL-17568
**Dependencies:** NA
**Twitter handle:** behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
2024-03-15 14:04:56 -07:00
Barun Amalkumar Halder
b551d49cf5
community[patch] : adds feedback and status for Fiddler callback handler events (#19157)
**Description:** This PR adds updates the fiddler events schema to also
pass user feedback, and llm status to fiddler
   **Tickets:** [INTERNAL] FDL-17559 
   **Dependencies:**  NA
   **Twitter handle:** behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
2024-03-15 12:03:49 -07:00
Juan Felipe Arias
f5b9aedc48
community[patch]: add args_schema to sql_database tools for langGraph integration (#18595)
- **Description:** This modification adds pydantic input definition for
sql_database tools. This helps for function calling capability in
LangGraph. Since actions nodes will usually check for the args_schema
attribute on tools, This update should make these tools compatible with
it (only implemented on the InfoSQLDatabaseTool)
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** juanfe8881
2024-03-15 19:03:36 +00:00
fengjial
c922ea36cb
community[minor]: Add Baidu VectorDB as vector store (#17997)
Co-authored-by: fengjialin <fengjialin@MacBook-Pro.local>
2024-03-15 19:01:58 +00:00
Erick Friis
781aee0068
community, langchain, infra: revert store extended test deps outside of poetry (#19153)
Reverts langchain-ai/langchain#18995

Because it makes installing dependencies in python 3.11 extended testing
take 80 minutes
2024-03-15 17:10:47 +00:00
Erick Friis
9e569d85a4
community, langchain, infra: store extended test deps outside of poetry (#18995)
poetry can't reliably handle resolving the number of optional "extended
test" dependencies we have. If we instead just rely on pip to install
extended test deps in CI, this isn't an issue.
2024-03-15 05:55:30 +00:00
Erick Friis
7ce81eb6f4
voyageai[patch]: init package (#19098)
Co-authored-by: fodizoltan <zoltan@conway.expert>
Co-authored-by: Yujie Qian <thomasq0809@gmail.com>
Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>
2024-03-15 00:56:10 +00:00
billytrend-cohere
7253b816cc
community: Add support for cohere SDK v5 (keeps v4 backwards compatibility) (#19084)
- **Description:** Add support for cohere SDK v5 (keeps v4 backwards
compatibility)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-14 15:53:24 -07:00
Eugene Yurtsev
6cdca4355d
community[minor]: Revamp PGVector Filtering (#18992)
This PR makes the following updates in the pgvector database:

1. Use JSONB field for metadata instead of JSON
2. Update operator syntax to include required `$` prefix before the
operators (otherwise there will be name collisions with fields)
3. The change is non-breaking, old functionality is still the default,
but it will emit a deprecation warning
4. Previous functionality has bugs associated with comparisons due to
casting to text (so lexical ordering is used incorrectly for numeric
fields)
5. Adds an a GIN index on the JSONB field for more efficient querying
2024-03-14 16:56:00 -04:00
Anton Parkhomenko
ae73b9d839
community[patch]: Fix NotionDBLoader 400 Error by conditionally adding filter parameter (#19075)
- **Description:** This change fixes a bug where attempts to load data
from Notion using the NotionDBLoader resulted in a 400 Bad Request
error. The issue was traced to the unconditional addition of an empty
'filter' object in the request payload, which Notion's API does not
accept. The modification ensures that the 'filter' object is only
included in the payload when it is explicitly provided and not empty,
thus preventing the 400 error from occurring.
- **Issue:** Fixes
[#18009](https://github.com/langchain-ai/langchain/issues/18009)
- **Dependencies:** None
- **Twitter handle:** @gunnzolder

Co-authored-by: Anton Parkhomenko <anton@merge.rocks>
2024-03-14 13:56:57 +00:00
Leonid Ganeline
9c8523b529
community[patch]: flattening imports 3 (#18939)
@eyurtsev
2024-03-12 15:18:54 -07:00
Dobiichi-Origami
471f2ed40a
community[patch]: re-arrange the addtional_kwargs of returned qianfan structure to avoid _merge_dict issue (#18889)
fix issue: https://github.com/langchain-ai/langchain/issues/18441
PTAL, thanks
@baskaryan, @efriis, @eyurtsev, @hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-12 05:43:56 +00:00
Tymofii
0bec1f6877
commnity[patch]: refactor code for faiss vectorstore, update faiss vectorstore documentation (#18092)
**Description:** Refactor code of FAISS vectorcstore and update the
related documentation.
Details: 
 - replace `.format()` with f-strings for strings formatting;
- refactor definition of a filtering function to make code more readable
and more flexible;
- slightly improve efficiency of
`max_marginal_relevance_search_with_score_by_vector` method by removing
unnecessary looping over the same elements;
- slightly improve efficiency of `delete` method by using set data
structure for checking if the element was already deleted;

**Issue:** fix small inconsistency in the documentation (the old example
was incorrect and unappliable to faiss vectorstore)

**Dependencies:** basic langchain-community dependencies and `faiss`
(for CPU or for GPU)

**Twitter handle:** antonenkodev
2024-03-11 22:33:03 -07:00
Leonid Ganeline
11195cfa42
community[patch]: speed up import times in the community package (#18928)
This PR speeds up import times in the community package
2024-03-11 16:37:36 -04:00
Virat Singh
cafffe8a21
community: Add PolygonAggregates tool (#18882)
**Description:**
In this PR, I am adding a `PolygonAggregates` tool, which can be used to
get historical stock price data (called aggregates by Polygon) for a
given ticker.

Polygon
[docs](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to)
for this endpoint.

**Twitter**: 
[@virattt](https://twitter.com/virattt)
2024-03-11 11:58:10 -07:00
Mohammad Mohtashim
43db4cd20e
core[major]: On Tool End Observation Casting Fix (#18798)
This PR updates the on_tool_end handlers to return the raw output from the tool instead of casting it to a string. 

This is technically a breaking change, though it's impact is expected to be somewhat minimal. It will fix behavior in `astream_events` as well.

Fixes the following issue #18760 raised by @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-11 10:59:04 -04:00
Massimiliano Pronesti
8113d612bb
community[patch]: support modin document loader (#18866)
Langchain community document loaders support `pyspark`, `polars`, and
`pandas` dataframes but not `modin`'s. This PR addresses this point.
2024-03-10 18:40:04 -07:00
Pol Ruiz Farre
a7f63d8cb4
community[patch]: Fix BasePDFLoader suffix for s3 presigned urls (#18844)
BasePDFLoader doesn't parse the suffix of the file correctly when
parsing S3 presigned urls. This fix enables the proper detection and
parsing of S3 presigned URLs to prevent errors such as `OSError: [Errno
36] File name too long`.
No additional dependencies required.
2024-03-11 00:58:51 +00:00
Joshua Carroll
ddaf9de169
community: Fix bug with StreamlitChatMessageHistory (#18834)
- **Description:** Fix Streamlit bug which was introduced by
https://github.com/langchain-ai/langchain/pull/18250, update integration
test
- **Issue:** https://github.com/langchain-ai/langchain/issues/18684
- **Dependencies:** None
2024-03-09 13:42:22 -08:00
Tomaz Bratanic
a28be31a96
Switch to md5 for deduplication in neo4j integrations (#18846)
Deduplicate documents using MD5 of the page_content. Also allows for
custom deduplication with graph ingestion method by providing metadata
id attribute

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-03-09 13:28:55 -08:00
Leonid Ganeline
476d6dc596
community[patch]: Use getattr for toolkits imports (#18825)
This will preserve the namespace, without actually loading the underlying packages on init.
2024-03-08 20:54:28 -05:00
Luis Antonio Vieira Junior
67c880af74
community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489)
- **Description:** Adding an optional parameter `linearization_config`
to the `AmazonTextractPDFLoader` so the caller can define how the output
will be linearized, instead of forcing a predefined set of linearization
configs. It will still have a default configuration as this will be an
optional parameter.
- **Issue:** #17457
- **Dependencies:** The same ones that already exist for
`AmazonTextractPDFLoader`
- **Twitter handle:** [@lvieirajr19](https://twitter.com/lvieirajr19)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-08 17:25:22 -08:00
Anis ZAKARI
37e89ba5b1
community[patch]: Bedrock add support for mistral models (#18756)
*Description**: My previous
[PR](https://github.com/langchain-ai/langchain/pull/18521) was
mistakenly closed, so I am reopening this one. Context: AWS released two
Mistral models on Bedrock last Friday (March 1, 2024). This PR includes
some code adjustments to ensure their compatibility with the Bedrock
class.

---------

Co-authored-by: Anis ZAKARI <anis.zakari@hymaia.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-09 01:20:38 +00:00
Keith Chan
914af69b44
community[patch]: Update azuresearch vectorstore from_texts() method to include fields argument (#17661)
- **Description:** Update azuresearch vectorstore from_texts() method to
include fields argument, necessary for creating an Azure AI Search index
with custom fields.
- **Issue:** Currently index fields are fixed to default fields if Azure
Search index is created using from_texts() method
- **Dependencies:** None
- **Twitter handle:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-08 17:05:35 -08:00
al1p
46f0cea2b9
community[patch][: improved the suffix prompt to avoid loop (#17791)
Small improvement to the openapi prompt.
The agent was not finding the server base URL (looping through all
nodes). This small change narrows the search and enables finding the url
faster.

No dependency 

Twitter : @al1pra
2024-03-08 16:53:09 -08:00
Théo LEBRUN
cf94091cd0
community[patch]: Skip nested directories when using S3DirectoryLoader (#17829)
- **Description:** `S3DirectoryLoader` is failing if prefix is a folder
(ex: `my_folder/`) because `S3FileLoader` will try to load that folder
and will fail. This PR skip nested directories so prefix can be set to
folder instead of `my_folder/files_prefix`.
- **Issue:**
  - #11917
  - #6535
  - #4326
- **Dependencies:** none
- **Twitter handle:** @Falydoor


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2024-03-08 16:50:58 -08:00
Venkatesan
7a18b63dbf
community[patch]: Mongo index creation (#17748)
- [ ] Title: Mongodb: MongoDB connection performance improvement. 
- [ ] Message: 
- **Description:** I made collection index_creation as optional. Index
Creation is one time process.
- **Issue:** MongoDBChatMessageHistory class object is attempting to
create an index during connection, causing each request to take longer
than usual. This should be optional with a parameter.
    - **Dependencies:** N/A
    - **Branch to be checked:** origin/mongo_index_creation

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-08 16:43:17 -08:00
wt3639
5b5b37a999
community[patch]: Add embedding instruction to HuggingFaceBgeEmbeddings (#18017)
- **Description:** Add embedding instruction to
HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and
other models that need embedding instruction.

---------

Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-08 16:39:29 -08:00
Ishani Vyas
2b0cbd65ba
community[patch]: Add Passio Nutrition AI Food Search Tool to Community Package (#18278)
## Add Passio Nutrition AI Food Search Tool to Community Package

### Description
We propose adding a new tool to the `community` package, enabling
integration with Passio Nutrition AI for food search functionality. This
tool will provide a simple interface for retrieving nutrition facts
through the Passio Nutrition AI API, simplifying user access to
nutrition data based on food search queries.

### Implementation Details
- **Class Structure:** Implement `NutritionAI`, extending `BaseTool`. It
includes an `_run` method that accepts a query string and, optionally, a
`CallbackManagerForToolRun`.
- **API Integration:** Use `NutritionAIAPI` for the API wrapper,
encapsulating all interactions with the Passio Nutrition AI and
providing a clean API interface.
- **Error Handling:** Implement comprehensive error handling for API
request failures.

### Expected Outcome
- **User Benefits:** Enable easy querying of nutrition facts from Passio
Nutrition AI, enhancing the utility of the `langchain_community` package
for nutrition-related projects.
- **Functionality:** Provide a straightforward method for integrating
nutrition information retrieval into users' applications.

### Dependencies
- `langchain_core` for base tooling support
- `pydantic` for data validation and settings management
- Consider `requests` or another HTTP client library if not covered by
`NutritionAIAPI`.

### Tests and Documentation
- **Unit Tests:** Include tests that mock network interactions to ensure
tool reliability without external API dependency.
- **Documentation:** Create an example notebook in
`docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage,
setup, and example queries.

### Contribution Guidelines Compliance
- Adhere to the project's linting and formatting standards (`make
format`, `make lint`, `make test`).
- Ensure compliance with LangChain's contribution guidelines,
particularly around dependency management and package modifications.

### Additional Notes
- Aim for the tool to be a lightweight, focused addition, not
introducing significant new dependencies or complexity.
- Potential future enhancements could include caching for common queries
to improve performance.

### Twitter Handle
- Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai)
where we announce our products.


If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
2024-03-08 20:33:22 +00:00
Kushagra
b1f22bf76c
community[minor]: added a feature to filter documents in Mongoloader (#18253)
"community: added a feature to filter documents in Mongoloader"
- **Description:** added a feature to filter documents in Mongoloader
    - **Feature:** the feature #18251
    - **Dependencies:** No
    - **Twitter handle:** https://twitter.com/im_Kushagra
2024-03-08 12:06:35 -08:00
Christophe Bornet
e54a49b697
community[minor]: Add lazy_table_reflection param to SqlDatabase (#18742)
For some DBs with lots of tables, reflection of all the tables can take
very long. So this change will make the tables be reflected lazily when
get_table_info() is called and `lazy_table_reflection` is True.
2024-03-08 14:10:23 -05:00
Christophe Bornet
ead2a74806
community: Implement lazy_load() for JSONLoader (#18643)
Covered by `tests/unit_tests/document_loaders/test_json_loader.py`
2024-03-08 13:58:17 -05:00
Phat Vo
3ecb903d49
community[patch] : Tidy up and update Clarifai SDK functions (#18314)
Description :
* Tidy up, add missing docstring and fix unused params
* Enable using session token
2024-03-07 19:47:44 -08:00
Tomaz Bratanic
4bfe888717
comunity[patch]: Fix neo4j sanitizing values (#18750)
Fixing sanitization for when deeply nested lists appear
2024-03-07 19:21:52 -08:00
Yunmo Koo
fee6f983ef
community[minor]: Integration for Friendli LLM and ChatFriendli ChatModel. (#17913)
## Description
- Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM
and `ChatFriendli` chat model.
- Unit tests and integration tests corresponding to this change are
added.
- Documentations corresponding to this change are added.

## Dependencies
- Optional dependency
[`friendli-client`](https://pypi.org/project/friendli-client/) package
is added only for those who use `Frienldi` or `ChatFriendli` model.

## Twitter handle
- https://twitter.com/friendliai
2024-03-08 02:20:47 +00:00
Smit Parmar
aed46cd6f2
community[patch]: Added support for filter out AWS Kendra search by score confidence (#12920)
**Description:** It will add support for filter out kendra search by
score confidence which will make result more accurate.
    For example
   ```
retriever = AmazonKendraRetriever(
        index_id=kendra_index_id, top_k=5, region_name=region,
        score_confidence="HIGH"
    )
```
Result will not include the records which has score confidence "LOW" or "MEDIUM". 
Relevant docs 
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/query.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/retrieve.html

 **Issue:** the issue # it resolve #11801 
**twitter:** [@SmitCode](https://twitter.com/SmitCode)
2024-03-07 17:28:09 -08:00
Ian
390ef6abe3
community[minor]: Add Initial Support for TiDB Vector Store (#15796)
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.

Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search. 
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.

Simple use case exmaple

```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings

db = TiDBVectorStore.from_texts(
    embedding=embeddings,
    texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
    table_name="tidb_vector_langchain",
    connection_string=tidb_connection_url,
    distance_strategy="cosine",
)

query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
```
2024-03-07 17:18:20 -08:00
Bagatur
3b1eb1f828
community[patch]: chat hf typing fix (#18693) 2024-03-07 17:06:38 -08:00
Eugene Yurtsev
e188d4ecb0
Add dangerous parameter to requests tool (#18697)
The tools are already documented as dangerous. Not clear whether adding
an opt-in parameter is necessary or not
2024-03-07 15:10:56 -05:00
Erick Friis
1beb84b061
community[patch]: move pdf text tests to integration (#18746) 2024-03-07 10:34:22 -08:00
Christophe Bornet
6cd7607816
community[patch]: Implement lazy_load() for MHTMLLoader (#18648)
Covered by `tests/unit_tests/document_loaders/test_mhtml.py`
2024-03-07 11:50:18 -05:00
axiangcoding
9745b5894d
community[patch]: Chroma use uuid4 instead of uuid1 to generate random ids (#18723)
- **Description:** Chroma use uuid4 instead of uuid1 as random ids. Use
uuid1 may leak mac address, changing to uuid4 will not cause other
effects.
  - **Issue:** None
  - **Dependencies:** None
  - **Twitter handle:** None
2024-03-07 11:48:25 -05:00
Guangdong Liu
ced5e7bae7
community[patch]: Fix sparkllm authentication problem. (#18651)
- **Description:** fix sparkllm authentication problem.The current
timestamp is in RFC1123 format. The time deviation must be controlled
within 300s. I changed to re-obtain the url every time I ask a question.
https://www.xfyun.cn/doc/spark/general_url_authentication.html#_1-2-%E9%89%B4%E6%9D%83%E5%8F%82%E6%95%B0
2024-03-06 18:43:16 -08:00
Piyush Jain
2b234a4d96
Support for claude v3 models. (#18630)
Fixes #18513.

## Description
This PR attempts to fix the support for Anthropic Claude v3 models in
BedrockChat LLM. The changes here has updated the payload to use the
`messages` format instead of the formatted text prompt for all models;
`messages` API is backwards compatible with all models in Anthropic, so
this should not break the experience for any models.


## Notes
The PR in the current form does not support the v3 models for the
non-chat Bedrock LLM. This means, that with these changes, users won't
be able to able to use the v3 models with the Bedrock LLM. I can open a
separate PR to tackle this use-case, the intent here was to get this out
quickly, so users can start using and test the chat LLM. The Bedrock LLM
classes have also grown complex with a lot of conditions to support
various providers and models, and is ripe for a refactor to make future
changes more palatable. This refactor is likely to take longer, and
requires more thorough testing from the community. Credit to PRs
[18579](https://github.com/langchain-ai/langchain/pull/18579) and
[18548](https://github.com/langchain-ai/langchain/pull/18548) for some
of the code here.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-06 15:46:18 -08:00
Sam Khano
1b4dcf22f3
community[minor]: Add DocumentDBVectorSearch VectorStore (#17757)
**Description:**
- Added Amazon DocumentDB Vector Search integration (HNSW index)
- Added integration tests
- Updated AWS documentation with DocumentDB Vector Search instructions
- Added notebook for DocumentDB integration with example usage

---------

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-95-226.ec2.internal>
2024-03-06 15:11:34 -08:00
Vittorio Rigamonti
51f3902bc4
community[minor]: Adding support for Infinispan as VectorStore (#17861)
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.

Vector search is supported since release 15.x 

For more: [Infinispan Home](https://infinispan.org)

Integration tests are provided as well as a demo notebook
2024-03-06 15:11:02 -08:00
Max Jakob
cca0167917
elasticsearch[patch], community[patch]: update references, deprecate community classes (#18506)
Follow up on https://github.com/langchain-ai/langchain/pull/17467.

- Update all references to the Elasticsearch classes to use the partners
package.
- Deprecate community classes.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-06 15:09:12 -08:00
Djordje
12b4a4d860
community[patch]: Opensearch delete method added - indexing supported (#18522)
- **Description:** Added delete method for OpenSearchVectorSearch,
therefore indexing supported
    - **Issue:** No
    - **Dependencies:** No
    - **Twitter handle:** stkbmf
2024-03-06 15:08:47 -08:00
Christophe Bornet
db8db6faae
community: Implement lazy_load() for PlaywrightURLLoader (#18676)
Integration tests:
`tests/integration_tests/document_loaders/test_url_playwright.py`
2024-03-06 16:52:13 -05:00
Aaron Yi
c092db862e
community[patch]: make metadata and text optional as expected in DocArray (#18678)
ValidationError: 2 validation errors for DocArrayDoc
text
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
metadata
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
```
In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as
follows:

```python
class DocArrayDoc(BaseDoc):
    text: Optional[str]
    embedding: Optional[NdArray] = Field(**embeddings_params)
    metadata: Optional[dict]
```
2024-03-06 16:51:41 -05:00
Eugene Yurtsev
4c25b49229
community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696)
This is a PR that adds a dangerous load parameter to force users to opt in to use pickle.

This is a PR that's meant to raise user awareness that the pickling module is involved.
2024-03-06 16:43:01 -05:00
Eugene Yurtsev
0e52961562
community[patch]: Patch tdidf retriever (CVE-2024-2057) (#18695)
This is a patch for `CVE-2024-2057`:
https://www.cve.org/CVERecord?id=CVE-2024-2057

This affects users that: 

* Use the  `TFIDFRetriever`
* Attempt to de-serialize it from an untrusted source that contains a
malicious payload
2024-03-06 15:49:04 -05:00
Christophe Bornet
ea141511d8
core: Move document loader interfaces to core (#17723)
This is needed to be able to move document loaders to partner packages.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-06 13:59:00 -05:00
Christophe Bornet
5985454269
Merge pull request #18539
* Implement lazy_load() for GitLoader
2024-03-06 13:25:14 -05:00
Christophe Bornet
9a6f7e213b
Merge pull request #18423
* Implement lazy_load() for BSHTMLLoader
2024-03-06 13:25:01 -05:00
Christophe Bornet
b3a0c44838
Merge pull request #18673
* Implement lazy_load() for PDFMinerPDFasHTMLLoader and PyMuPDFLoader
2024-03-06 13:24:36 -05:00
Christophe Bornet
68fc0cf909
Merge pull request #18674
* Implement lazy_load() for TextLoader
2024-03-06 13:23:42 -05:00
Christophe Bornet
5b92f962f1
Merge pull request #18671
* Implement lazy_load() for MastodonTootsLoader
2024-03-06 13:23:14 -05:00
Christophe Bornet
15b1770326
Merge pull request #18421
* Implement lazy_load() for AssemblyAIAudioTranscriptLoader
2024-03-06 13:16:05 -05:00
Christophe Bornet
bb284eebe4
Merge pull request #18436
* Implement lazy_load() for ConfluenceLoader
2024-03-06 13:15:24 -05:00
Christophe Bornet
691480f491
Merge pull request #18647
* Implement lazy_load() for UnstructuredBaseLoader
2024-03-06 13:13:10 -05:00
Christophe Bornet
52ac67c5d8
Merge pull request #18654
* Implement lazy_load() for ObsidianLoader
2024-03-06 13:06:55 -05:00
Christophe Bornet
b9c0cf9025
Merge pull request #18656
* Implement lazy_load() for PsychicLoader
2024-03-06 13:05:04 -05:00
Christophe Bornet
aa7ac57b67
community: Implement lazy_load() for TrelloLoader (#18658)
Covered by `tests/unit_tests/document_loaders/test_trello.py`
2024-03-06 13:04:36 -05:00
Christophe Bornet
302985fea1
community: Implement lazy_load() for SlackDirectoryLoader (#18675)
Integration tests:
`tests/integration_tests/document_loaders/test_slack.py`
2024-03-06 13:04:13 -05:00
Christophe Bornet
ed36f9f604
community: Implement lazy_load() for WhatsAppChatLoader (#18677)
Integration test:
`tests/integration_tests/document_loaders/test_whatsapp_chat.py`
2024-03-06 13:03:46 -05:00
Christophe Bornet
f414f5cdb9
community[minor]: Implement lazy_load() for WikipediaLoader (#18680)
Integration test:
`tests/integration_tests/document_loaders/test_wikipedia.py`
2024-03-06 13:03:21 -05:00
Christophe Bornet
1100f8de7a
community[minor]: Implement lazy_load() for ArxivLoader (#18664)
Integration tests: `tests/integration_tests/utilities/test_arxiv.py` and
`tests/integration_tests/document_loaders/test_arxiv.py`
2024-03-06 09:16:49 -05:00
Christophe Bornet
2d96803ddd
community[minor]: Implement lazy_load() for OutlookMessageLoader (#18668)
Integration test:
`tests/integration_tests/document_loaders/test_email.py`
2024-03-06 09:15:57 -05:00
Christophe Bornet
ae167fb5b2
community[minor]: Implement lazy_load() for SitemapLoader (#18667)
Integration tests: `test_sitemap.py` and `test_docusaurus.py`
2024-03-06 09:15:35 -05:00
Christophe Bornet
623dfcc55c
community[minor]: Implement lazy_load() for FacebookChatLoader (#18669)
Integration test:
`tests/integration_tests/document_loaders/test_facebook_chat.py`
2024-03-06 09:15:00 -05:00
Christophe Bornet
20794bb889
community[minor]: Implement lazy_load() for GitbookLoader (#18670)
Integration test:
`tests/integration_tests/document_loaders/test_gitbook.py`
2024-03-06 09:14:36 -05:00
Liang Zhang
81985b31e6
community[patch]: Databricks SerDe uses cloudpickle instead of pickle (#18607)
- **Description:** Databricks SerDe uses cloudpickle instead of pickle
when serializing a user-defined function transform_input_fn since pickle
does not support functions defined in `__main__`, and cloudpickle
supports this.
- **Dependencies:** cloudpickle>=2.0.0

Added a unit test.
2024-03-05 18:04:45 -08:00
Christophe Bornet
7d6de96186
community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535)
Covered by `test_cube_semantic.py`
2024-03-05 17:32:31 -08:00
Christophe Bornet
a6b5d45e31
community[patch]: Implement lazy_load() for EverNoteLoader (#18538)
Covered by `test_evernote_loader.py`
2024-03-05 17:29:52 -08:00
Sunchao Wang
dc81dba6cf
community[patch]: Improve amadeus tool and doc (#18509)
Description:

This pull request addresses two key improvements to the langchain
repository:

**Fix for Crash in Flight Search Interface**:

Previously, the code would crash when encountering a failure scenario in
the flight ticket search interface. This PR resolves this issue by
implementing a fix to handle such scenarios gracefully. Now, the code
handles failures in the flight search interface without crashing,
ensuring smoother operation.

**Documentation Update for Amadeus Toolkit**:

Prior to this update, examples provided in the documentation for the
Amadeus Toolkit were unable to run correctly due to outdated
information. This PR includes an update to the documentation, ensuring
that all examples can now be executed successfully. With this update,
users can effectively utilize the Amadeus Toolkit with accurate and
functioning examples.
These changes aim to enhance the reliability and usability of the
langchain repository by addressing issues related to error handling and
ensuring that documentation remains up-to-date and actionable.

Issue: https://github.com/langchain-ai/langchain/issues/17375

Twitter Handle: SingletonYxx
2024-03-05 16:17:22 -08:00
Christophe Bornet
f77f7dc3ec
community[patch]: Fix VectorStoreQATool (#18529)
Fix #18460
2024-03-05 15:56:58 -08:00
Dounx
ad48f55357
community[minor]: add Yuque document loader (#17924)
This pull request support loading documents from Yuque with Langchain.

Yuque is a professional cloud-based knowledge base for team
collaboration in documentation.

Website: https://www.yuque.com
OpenAPI: https://www.yuque.com/yuque/developer/openapi
2024-03-05 15:54:07 -08:00
Kazuki Maeda
60c5d964a8
community[minor]: use jq schema for content_key in json_loader (#18003)
### Description
Changed the value specified for `content_key` in JSONLoader from a
single key to a value based on jq schema.
I created [similar
PR](https://github.com/langchain-ai/langchain/pull/11255) before, but it
has several conflicts because of the architectural change associated
stable version release, so I re-create this PR to fit new architecture.

### Why
For json data like the following, specify `.data[].attributes.message`
for page_content and `.data[].attributes.id` or
`.data[].attributes.attributes. tags`, etc., the `content_key` must also
parse the json structure.

<details>
<summary>sample json data</summary>

```json
{
  "data": [
    {
      "attributes": {
        "message": "message1",
        "tags": [
          "tag1"
        ]
      },
      "id": "1"
    },
    {
      "attributes": {
        "message": "message2",
        "tags": [
          "tag2"
        ]
      },
      "id": "2"
    }
  ]
}
```

</details>

<details>
<summary>sample code</summary>

```python
def metadata_func(record: dict, metadata: dict) -> dict:

    metadata["source"] = None
    metadata["id"] = record.get("id")
    metadata["tags"] = record["attributes"].get("tags")

    return metadata

sample_file = "sample1.json"
loader = JSONLoader(
    file_path=sample_file,
    jq_schema=".data[]",
    content_key=".attributes.message", ## content_key is parsable into jq schema
    is_content_key_jq_parsable=True, ## this is added parameter
    metadata_func=metadata_func
)

data = loader.load()
data
```

</details>

### Dependencies
none

### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)
2024-03-05 15:51:24 -08:00
Hech
6a08134661
community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106) 2024-03-05 15:47:29 -08:00
Yudhajit Sinha
4570b477b9
community[patch]: Invoke callback prior to yielding token (titan_takeoff) (#18560)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/titan_takeoff.
- Issue: #16913 
- Dependencies: None
2024-03-05 12:54:26 -08:00
Tomaz Bratanic
ea51cdaede
Remove neo4j bloom labels from graph schema (#18564)
Neo4j tools use particular node labels and relationship types to store
metadata, but are irrelevant for text2cypher or graph generation, so we
want to ignore them in the schema representation.
2024-03-05 12:54:05 -08:00
Jib
9da1e0cf34
mongodb[patch]: Migrate MongoDBChatMessageHistory (#18590)
## **Description** 
Migrate the `MongoDBChatMessageHistory` to the managed
`langchain-mongodb` partner-package
## **Dependencies**
None
## **Twitter handle**
@mongodb

## **tests and docs**
- [x] Migrate existing integration test
- [x ]~ Convert existing integration test to a unit test~ Creation is
out of scope for this ticket
- [x ] ~Considering delaying work until #17470 merges to leverage the
`MockCollection` object. ~
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-05 18:53:02 +00:00
Tomaz Bratanic
353248838d
Add precedence for input params over env variables in neo4j integration (#18581)
input parameters take precedence over env variables
2024-03-05 09:36:56 -08:00
Christophe Bornet
c8a171a154
community: Implement lazy_load() for GithubFileLoader (#18584) 2024-03-05 09:35:50 -08:00
Leonid Kuligin
04d134df17
marked MatchingEngine as deprecated (#18585)
Thank you for contributing to LangChain!

- [ ] **PR title**: "community: deprecate vectorstores.MatchingEngine"


- [ ] **PR message**: 
- **Description:** announced a deprecation since this integration has
been moved to langchain_google_vertexai
2024-03-05 09:34:53 -08:00
Erick Friis
343438e872
community[patch]: deprecate community fireworks (#18544) 2024-03-05 01:04:26 +00:00
Scott Nath
b051bba1a9
community: Add you.com tool, add async to retriever, add async testing, add You tool doc (#18032)
- **Description:** finishes adding the you.com functionality including:
    - add async functions to utility and retriever
    - add the You.com Tool
    - add async testing for utility, retriever, and tool
    - add a tool integration notebook page
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** @scottnath
2024-03-03 14:30:05 -08:00
William De Vena
275877980e
community[patch]: Invoke callback prior to yielding token (#18447)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
Description: Invoke callback prior to yielding token in _stream method
in llms/vertexai.
Issue: https://github.com/langchain-ai/langchain/issues/16913
Dependencies: None
2024-03-03 14:14:40 -08:00
William De Vena
67375e96e0
community[patch]: Invoke callback prior to yielding token (#18448)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream method
in llms/tongyi.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
2024-03-03 14:14:22 -08:00
William De Vena
2087cbae64
community[patch]: Invoke callback prior to yielding token (#18449)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream method
in chat_models/perplexity.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
2024-03-03 14:14:00 -08:00
William De Vena
eb04d0d3e2
community[patch]: Invoke callback prior to yielding token (#18452)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream and
_astream methods in llms/anthropic.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
2024-03-03 14:13:41 -08:00
William De Vena
371bec79bc
community[patch]: Invoke callback prior to yielding token (#18454)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream and
_astream methods in llms/baidu_qianfan_endpoint.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
2024-03-03 14:13:22 -08:00
Aayush Kataria
7c2f3f6f95
community[minor]: Adding Azure Cosmos Mongo vCore Vector DB Cache (#16856)
Description:

This pull request introduces several enhancements for Azure Cosmos
Vector DB, primarily focused on improving caching and search
capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a
summary of the changes:

- **AzureCosmosDBSemanticCache**: Added a new cache implementation
called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB
vCore Vector DB for efficient caching of semantic data. Added
comprehensive test cases for AzureCosmosDBSemanticCache to ensure its
correctness and robustness. These tests cover various scenarios and edge
cases to validate the cache's behavior.
- **HNSW Vector Search**: Added HNSW vector search functionality in the
CosmosDB Vector Search module. This enhancement enables more efficient
and accurate vector searches by utilizing the HNSW (Hierarchical
Navigable Small World) algorithm. Added corresponding test cases to
validate the HNSW vector search functionality in both
AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests
ensure the correctness and performance of the HNSW search algorithm.
- **LLM Caching Notebook** - The notebook now includes a comprehensive
example showcasing the usage of the AzureCosmosDBSemanticCache. This
example highlights how the cache can be employed to efficiently store
and retrieve semantic data. Additionally, the example provides default
values for all parameters used within the AzureCosmosDBSemanticCache,
ensuring clarity and ease of understanding for users who are new to the
cache implementation.
 
 @hwchase17,@baskaryan, @eyurtsev,
2024-03-03 14:04:15 -08:00
Sourav Pradhan
50abeb7ed9
community[patch]: fix Chroma add_images (#17964)
###  Description

Fixed a small bug in chroma.py add_images(), previously whenever we are
not passing metadata the documents is containing the base64 of the uris
passed, but when we are passing the metadata the documents is containing
normal string uris which should not be the case.

### Issue

In add_images() method when we are calling upsert() we have to use
"b64_texts" instead of normal string "uris".

### Twitter handle

https://twitter.com/whitepegasus01
2024-03-01 21:55:58 +00:00
Kate Silverstein
b7c71e2e07
community[minor]: llamafile embeddings support (#17976)
* **Description:** adds `LlamafileEmbeddings` class implementation for
generating embeddings using
[llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models.
Includes related unit tests and notebook showing example usage.
* **Issue:** N/A
* **Dependencies:** N/A
2024-03-01 13:49:18 -08:00
Tomaz Bratanic
f6bfb969ba
community[patch]: Add an option for indexed generic label when import neo4j graph documents (#18122)
Current implementation doesn't have an indexed property that would
optimize the import. I have added a `baseEntityLabel` parameter that
allows you to add a secondary node label, which has an indexed id
`property`. By default, the behaviour is identical to previous version.

Since multi-labeled nodes are terrible for text2cypher, I removed the
secondary label from schema representation object and string, which is
used in text2cypher.
2024-03-01 12:33:52 -08:00
Arun Sathiya
4adac20d7b
community[patch]: Make cohere_api_key a SecretStr (#12188)
This PR makes `cohere_api_key` in `llms/cohere` a SecretStr, so that the
API Key is not leaked when `Cohere.cohere_api_key` is represented as a
string.

---------

Signed-off-by: Arun <arun@arun.blog>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-01 20:27:53 +00:00
Petteri Johansson
6c1989d292
community[minor], langchain[minor], docs: Gremlin Graph Store and QA Chain (#17683)
- **Description:** 
New feature: Gremlin graph-store and QA chain (including docs).
Compatible with Azure CosmosDB.
  - **Dependencies:** 
  no changes
2024-03-01 12:21:14 -08:00
Ather Fawaz
a5ccf5d33c
community[minor]: Add support for Perplexity chat model(#17024)
- **Description:** This PR adds support for [Perplexity AI
APIs](https://blog.perplexity.ai/blog/introducing-pplx-api).
  - **Issues:** None
  - **Dependencies:** None
  - **Twitter handle:** [@atherfawaz](https://twitter.com/AtherFawaz)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-01 12:19:23 -08:00
Rodrigo Nogueira
3438d2cbcc
community[minor]: add maritalk chat (#17675)
**Description:** Adds the MariTalk chat that is based on a LLM specially
trained for Portuguese.

**Twitter handle:** @MaritacaAI
2024-03-01 12:18:23 -08:00
sarahberenji
08fa38d56d
community[patch]: the syntax error for Redis generated query (#17717)
To fix the reported error:
https://github.com/langchain-ai/langchain/discussions/17397
2024-03-01 12:18:10 -08:00
certified-dodo
43e3244573
community[patch]: Fix MongoDBAtlasVectorSearch max_marginal_relevance_search (#17971)
Description:
* `self._embedding_key` is accessed after deletion, breaking
`max_marginal_relevance_search` search
* Introduced in:
e135e5257c
* Updated but still persists in:
ce22e10c4b

Issue: https://github.com/langchain-ai/langchain/issues/17963

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-01 12:17:42 -08:00
Nikita Titov
9f2ab37162
community[patch]: don't try to parse json in case of errored response (#18317)
Related issue: #13896.

In case Ollama is behind a proxy, proxy error responses cannot be
viewed. You aren't even able to check response code.

For example, if your Ollama has basic access authentication and it's not
passed, `JSONDecodeError` will overwrite the truth response error.

<details>
<summary><b>Log now:</b></summary>

```
{
	"name": "JSONDecodeError",
	"message": "Expecting value: line 1 column 1 (char 0)",
	"stack": "---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    970 try:
--> 971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File /opt/miniforge3/envs/.gpt/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File /opt/miniforge3/envs/.gpt/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 \"\"\"Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 \"\"\"
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File /opt/miniforge3/envs/.gpt/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError(\"Expecting value\", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
Cell In[3], line 1
----> 1 print(translate_func().invoke('text'))

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config)
   2051 try:
   2052     for i, step in enumerate(self.steps):
-> 2053         input = step.invoke(
   2054             input,
   2055             # mark each step as a child run
   2056             patch_config(
   2057                 config, callbacks=run_manager.get_child(f\"seq:step:{i+1}\")
   2058             ),
   2059         )
   2060 # finish the root run
   2061 except BaseException as e:

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:165, in BaseChatModel.invoke(self, input, config, stop, **kwargs)
    154 def invoke(
    155     self,
    156     input: LanguageModelInput,
   (...)
    160     **kwargs: Any,
    161 ) -> BaseMessage:
    162     config = ensure_config(config)
    163     return cast(
    164         ChatGeneration,
--> 165         self.generate_prompt(
    166             [self._convert_input(input)],
    167             stop=stop,
    168             callbacks=config.get(\"callbacks\"),
    169             tags=config.get(\"tags\"),
    170             metadata=config.get(\"metadata\"),
    171             run_name=config.get(\"run_name\"),
    172             **kwargs,
    173         ).generations[0][0],
    174     ).message

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:543, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, **kwargs)
    535 def generate_prompt(
    536     self,
    537     prompts: List[PromptValue],
   (...)
    540     **kwargs: Any,
    541 ) -> LLMResult:
    542     prompt_messages = [p.to_messages() for p in prompts]
--> 543     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:407, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, **kwargs)
    405         if run_managers:
    406             run_managers[i].on_llm_error(e, response=LLMResult(generations=[]))
--> 407         raise e
    408 flattened_outputs = [
    409     LLMResult(generations=[res.generations], llm_output=res.llm_output)
    410     for res in results
    411 ]
    412 llm_output = self._combine_llm_outputs([res.llm_output for res in results])

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:397, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, **kwargs)
    394 for i, m in enumerate(messages):
    395     try:
    396         results.append(
--> 397             self._generate_with_cache(
    398                 m,
    399                 stop=stop,
    400                 run_manager=run_managers[i] if run_managers else None,
    401                 **kwargs,
    402             )
    403         )
    404     except BaseException as e:
    405         if run_managers:

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:576, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, **kwargs)
    572     raise ValueError(
    573         \"Asked to cache, but no cache found at `langchain.cache`.\"
    574     )
    575 if new_arg_supported:
--> 576     return self._generate(
    577         messages, stop=stop, run_manager=run_manager, **kwargs
    578     )
    579 else:
    580     return self._generate(messages, stop=stop, **kwargs)

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:250, in ChatOllama._generate(self, messages, stop, run_manager, **kwargs)
    226 def _generate(
    227     self,
    228     messages: List[BaseMessage],
   (...)
    231     **kwargs: Any,
    232 ) -> ChatResult:
    233     \"\"\"Call out to Ollama's generate endpoint.
    234 
    235     Args:
   (...)
    247             ])
    248     \"\"\"
--> 250     final_chunk = self._chat_stream_with_aggregation(
    251         messages,
    252         stop=stop,
    253         run_manager=run_manager,
    254         verbose=self.verbose,
    255         **kwargs,
    256     )
    257     chat_generation = ChatGeneration(
    258         message=AIMessage(content=final_chunk.text),
    259         generation_info=final_chunk.generation_info,
    260     )
    261     return ChatResult(generations=[chat_generation])

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:183, in ChatOllama._chat_stream_with_aggregation(self, messages, stop, run_manager, verbose, **kwargs)
    174 def _chat_stream_with_aggregation(
    175     self,
    176     messages: List[BaseMessage],
   (...)
    180     **kwargs: Any,
    181 ) -> ChatGenerationChunk:
    182     final_chunk: Optional[ChatGenerationChunk] = None
--> 183     for stream_resp in self._create_chat_stream(messages, stop, **kwargs):
    184         if stream_resp:
    185             chunk = _chat_stream_response_to_chat_generation_chunk(stream_resp)

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:156, in ChatOllama._create_chat_stream(self, messages, stop, **kwargs)
    147 def _create_chat_stream(
    148     self,
    149     messages: List[BaseMessage],
    150     stop: Optional[List[str]] = None,
    151     **kwargs: Any,
    152 ) -> Iterator[str]:
    153     payload = {
    154         \"messages\": self._convert_messages_to_ollama_messages(messages),
    155     }
--> 156     yield from self._create_stream(
    157         payload=payload, stop=stop, api_url=f\"{self.base_url}/api/chat/\", **kwargs
    158     )

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/llms/ollama.py:234, in _OllamaCommon._create_stream(self, api_url, payload, stop, **kwargs)
    228         raise OllamaEndpointNotFoundError(
    229             \"Ollama call failed with status code 404. \"
    230             \"Maybe your model is not found \"
    231             f\"and you should pull the model with `ollama pull {self.model}`.\"
    232         )
    233     else:
--> 234         optional_detail = response.json().get(\"error\")
    235         raise ValueError(
    236             f\"Ollama call failed with status code {response.status_code}.\"
    237             f\" Details: {optional_detail}\"
    238         )
    239 return response.iter_lines(decode_unicode=True)

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
    971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)"
}
```

</details>


<details>

<summary><b>Log after a fix:</b></summary>

```
{
	"name": "ValueError",
	"message": "Ollama call failed with status code 401. Details: <html>\r
<head><title>401 Authorization Required</title></head>\r
<body>\r
<center><h1>401 Authorization Required</h1></center>\r
<hr><center>nginx/1.18.0 (Ubuntu)</center>\r
</body>\r
</html>\r
",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 print(translate_func().invoke('text'))

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config)
   2051 try:
   2052     for i, step in enumerate(self.steps):
-> 2053         input = step.invoke(
   2054             input,
   2055             # mark each step as a child run
   2056             patch_config(
   2057                 config, callbacks=run_manager.get_child(f\"seq:step:{i+1}\")
   2058             ),
   2059         )
   2060 # finish the root run
   2061 except BaseException as e:

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:165, in BaseChatModel.invoke(self, input, config, stop, **kwargs)
    154 def invoke(
    155     self,
    156     input: LanguageModelInput,
   (...)
    160     **kwargs: Any,
    161 ) -> BaseMessage:
    162     config = ensure_config(config)
    163     return cast(
    164         ChatGeneration,
--> 165         self.generate_prompt(
    166             [self._convert_input(input)],
    167             stop=stop,
    168             callbacks=config.get(\"callbacks\"),
    169             tags=config.get(\"tags\"),
    170             metadata=config.get(\"metadata\"),
    171             run_name=config.get(\"run_name\"),
    172             **kwargs,
    173         ).generations[0][0],
    174     ).message

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:543, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, **kwargs)
    535 def generate_prompt(
    536     self,
    537     prompts: List[PromptValue],
   (...)
    540     **kwargs: Any,
    541 ) -> LLMResult:
    542     prompt_messages = [p.to_messages() for p in prompts]
--> 543     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:407, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, **kwargs)
    405         if run_managers:
    406             run_managers[i].on_llm_error(e, response=LLMResult(generations=[]))
--> 407         raise e
    408 flattened_outputs = [
    409     LLMResult(generations=[res.generations], llm_output=res.llm_output)
    410     for res in results
    411 ]
    412 llm_output = self._combine_llm_outputs([res.llm_output for res in results])

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:397, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, **kwargs)
    394 for i, m in enumerate(messages):
    395     try:
    396         results.append(
--> 397             self._generate_with_cache(
    398                 m,
    399                 stop=stop,
    400                 run_manager=run_managers[i] if run_managers else None,
    401                 **kwargs,
    402             )
    403         )
    404     except BaseException as e:
    405         if run_managers:

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:576, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, **kwargs)
    572     raise ValueError(
    573         \"Asked to cache, but no cache found at `langchain.cache`.\"
    574     )
    575 if new_arg_supported:
--> 576     return self._generate(
    577         messages, stop=stop, run_manager=run_manager, **kwargs
    578     )
    579 else:
    580     return self._generate(messages, stop=stop, **kwargs)

File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:250, in ChatOllama._generate(self, messages, stop, run_manager, **kwargs)
    226 def _generate(
    227     self,
    228     messages: List[BaseMessage],
   (...)
    231     **kwargs: Any,
    232 ) -> ChatResult:
    233     \"\"\"Call out to Ollama's generate endpoint.
    234 
    235     Args:
   (...)
    247             ])
    248     \"\"\"
--> 250     final_chunk = self._chat_stream_with_aggregation(
    251         messages,
    252         stop=stop,
    253         run_manager=run_manager,
    254         verbose=self.verbose,
    255         **kwargs,
    256     )
    257     chat_generation = ChatGeneration(
    258         message=AIMessage(content=final_chunk.text),
    259         generation_info=final_chunk.generation_info,
    260     )
    261     return ChatResult(generations=[chat_generation])

File /storage/gpt-project/Repos/repo_nikita/gpt_lib/langchain/ollama.py:328, in ChatOllamaCustom._chat_stream_with_aggregation(self, messages, stop, run_manager, verbose, **kwargs)
    319 def _chat_stream_with_aggregation(
    320     self,
    321     messages: List[BaseMessage],
   (...)
    325     **kwargs: Any,
    326 ) -> ChatGenerationChunk:
    327     final_chunk: Optional[ChatGenerationChunk] = None
--> 328     for stream_resp in self._create_chat_stream(messages, stop, **kwargs):
    329         if stream_resp:
    330             chunk = _chat_stream_response_to_chat_generation_chunk(stream_resp)

File /storage/gpt-project/Repos/repo_nikita/gpt_lib/langchain/ollama.py:301, in ChatOllamaCustom._create_chat_stream(self, messages, stop, **kwargs)
    292 def _create_chat_stream(
    293     self,
    294     messages: List[BaseMessage],
    295     stop: Optional[List[str]] = None,
    296     **kwargs: Any,
    297 ) -> Iterator[str]:
    298     payload = {
    299         \"messages\": self._convert_messages_to_ollama_messages(messages),
    300     }
--> 301     yield from self._create_stream(
    302         payload=payload, stop=stop, api_url=f\"{self.base_url}/api/chat\", **kwargs
    303     )

File /storage/gpt-project/Repos/repo_nikita/gpt_lib/langchain/ollama.py:134, in _OllamaCommonCustom._create_stream(self, api_url, payload, stop, **kwargs)
    132     else:
    133         optional_detail = response.text
--> 134         raise ValueError(
    135             f\"Ollama call failed with status code {response.status_code}.\"
    136             f\" Details: {optional_detail}\"
    137         )
    138 return response.iter_lines(decode_unicode=True)

ValueError: Ollama call failed with status code 401. Details: <html>\r
<head><title>401 Authorization Required</title></head>\r
<body>\r
<center><h1>401 Authorization Required</h1></center>\r
<hr><center>nginx/1.18.0 (Ubuntu)</center>\r
</body>\r
</html>\r
"
}
```

</details>

The same is true for timeout errors or when you simply mistyped in
`base_url` arg and get response from some other service, for instance.

Real Ollama errors are still clearly readable:

```
ValueError: Ollama call failed with status code 400. Details: {"error":"invalid options: unknown_option"}
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-01 12:17:29 -08:00
Yudhajit Sinha
e2b901c35b
community[patch]: chat message histrory mypy fix (#18250)
Description: Fixed type: ignore's for mypy for
chat_message_histories(streamlit)
Adresses #17048 

Planning to add more based on reviews
2024-03-01 12:17:18 -08:00
Hemslo Wang
58a2abf089
community[patch]: fix RecursiveUrlLoader metadata_extractor return type (#18193)
**Description:** Fix `metadata_extractor` type for `RecursiveUrlLoader`,
the default `_metadata_extractor` returns `dict` instead of `str`.
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A

Signed-off-by: Hemslo Wang <hemslo.wang@gmail.com>
2024-03-01 12:08:20 -08:00
Maxime Perrin
98380cff9b
community[patch]: removing "response_mode" parameter in llama_index retriever (#18180)
- **Description:** Removing this line 
```python
response = index.query(query, response_mode="no_text", **self.query_kwargs)
```
to 
```python
response = index.query(query, **self.query_kwargs)
```
Since llama index query does not support response_mode anymore : ``` |
TypeError: BaseQueryEngine.query() got an unexpected keyword argument
'response_mode'````
  - **Twitter handle:** @maximeperrin_

---------

Co-authored-by: Maxime Perrin <mperrin@doing.fr>
2024-03-01 12:05:09 -08:00
Christophe Bornet
177f51c7bd
community: Use default load() implementation in doc loaders (#18385)
Following https://github.com/langchain-ai/langchain/pull/18289
2024-03-01 14:46:52 -05:00
mwmajewsk
e192f6b6eb
community[patch]: fix, better error message in deeplake vectoriser (#18397)
If the document loader recieves Pathlib path instead of str, it reads
the file correctly, but the problem begins when the document is added to
Deeplake.
This problem arises from casting the path to str in the metadata.

```python
deeplake = True
fname = Path('./lorem_ipsum.txt')
loader = TextLoader(fname, encoding="utf-8")
docs = loader.load_and_split()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks= text_splitter.split_documents(docs)
if deeplake:
    db = DeepLake(dataset_path=ds_path, embedding=embeddings, token=activeloop_token)
    db.add_documents(chunks)
else:
    db = Chroma.from_documents(docs, embeddings)
```

So using this snippet of code the error message for deeplake looks like
this:

```
[part of error message omitted]

Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 53, in <module>
    db.add_documents(chunks)
  File "/home/mwm/repositories/sources/langchain/libs/core/langchain_core/vectorstores.py", line 139, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/deeplake.py", line 258, in add_texts
    return self.vectorstore.add(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 226, in add
    return self.dataset_handler.add(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/dataset_handlers/client_side_dataset_handler.py", line 139, in add
    dataset_utils.extend_or_ingest_dataset(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 544, in extend_or_ingest_dataset
    extend(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 505, in extend
    dataset.extend(batched_processed_tensors, progressbar=False)
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/dataset/dataset.py", line 3247, in extend
    raise SampleExtendError(str(e)) from e.__cause__
deeplake.util.exceptions.SampleExtendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback. If you wish to skip the samples that cause errors, please specify `ignore_errors=True`.
```

Which is does not explain the error well enough.
The same error for chroma looks like this 

```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 56, in <module>
    db = Chroma.from_documents(docs, embeddings)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 778, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 736, in from_texts
    chroma_collection.add_texts(
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 309, in add_texts
    raise ValueError(e.args[0] + "\n\n" + msg)
ValueError: Expected metadata value to be a str, int, float or bool, got lorem_ipsum.txt which is a <class 'pathlib.PosixPath'>

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
```

Which is way more user friendly, so I just added information about
possible mismatch of the type in the error message, the same way it is
covered in chroma
https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/chroma.py#L224
2024-03-01 11:21:21 -08:00
Daniel Chico
7d962278f6
community[patch]: type ignore fixes (#18395)
Related to #17048
2024-03-01 11:21:02 -08:00
Christophe Bornet
69be82c86d
community[patch]: Implement lazy_load() for CSVLoader (#18391)
Covered by `test_csv_loader.py`
2024-03-01 11:17:08 -08:00
Guangdong Liu
760a16ff32
community[patch]: Fix ChatModel for sparkllm Bug. (#18375)
**PR message**: ***Delete this entire checklist*** and replace with
    - **Description:** fix sparkllm paramer error
    - **Issue:**   close #18370
- **Dependencies:** change `IFLYTEK_SPARK_APP_URL` to
`IFLYTEK_SPARK_API_URL`
    - **Twitter handle:** No
2024-03-01 10:49:30 -08:00
Yujie Qian
cbb65741a7
community[patch]: Voyage AI updates default model and batch size (#17655)
- **Description:** update the default model and batch size in
VoyageEmbeddings
    - **Issue:** N/A
    - **Dependencies:** N/A
    - **Twitter handle:** N/A

---------

Co-authored-by: fodizoltan <zoltan@conway.expert>
2024-03-01 10:22:24 -08:00
Shengsheng Huang
ae471a7dcb
community[minor]: add BigDL-LLM integrations (#17953)
- **Description**:
[`bigdl-llm`](https://github.com/intel-analytics/BigDL) is a library for
running LLM on Intel XPU (from Laptop to GPU to Cloud) using
INT4/FP4/INT8/FP8 with very low latency (for any PyTorch model). This PR
adds bigdl-llm integrations to langchain.
- **Issue**: NA
- **Dependencies**: `bigdl-llm` library
- **Contribution maintainer**: @shane-huang 
 
Examples added:
- docs/docs/integrations/llms/bigdl.ipynb
2024-03-01 10:04:53 -08:00
Ethan Yang
f61cb8d407
community[minor]: Add openvino backend support (#11591)
- **Description:** add openvino backend support by HuggingFace Optimum
Intel,
  - **Dependencies:** “optimum[openvino]”,

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-01 10:04:24 -08:00
RadhikaBansal97
8bafd2df5e
community[patch]: Change github endpoint in GithubLoader (#17622)
Description- 
- Changed the GitHub endpoint as existing was not working and giving 404
not found error
- Also the existing function was failing if file_filter is not passed as
the tree api return all paths including directory as well, and when
get_file_content was iterating over these path, the function was failing
for directory as the api was returning list of files inside the
directory, so added a condition to ignore the paths if it a directory
- Fixes this issue -
https://github.com/langchain-ai/langchain/issues/17453

Co-authored-by: Radhika Bansal <Radhika.Bansal@veritas.com>
2024-03-01 09:36:31 -08:00
Anush
9d663f31fa
community[patch]: FastEmbed to latest (#18040)
## Description

Updates the `langchain_community.embeddings.fastembed` provider as per
the recent updates to [`FastEmbed`](https://github.com/qdrant/fastembed)
library.
2024-02-29 21:15:51 -08:00
Eugene Yurtsev
51b661cfe8
community[patch]: BaseLoader load method should just delegate to lazy_load (#18289)
load() should just reference lazy_load()
2024-02-29 21:45:28 -05:00