Commit Graph

209 Commits (7824291252c8f99a31c416f363d6a46d7e8f63e4)

Author SHA1 Message Date
Benito Geordie 57b226532d
community[minor]: Added integrations for ThirdAI's NeuralDB as a Retriever (#17334)
**Description:** Adds ThirdAI NeuralDB retriever integration. NeuralDB
is a CPU-friendly and fine-tunable text retrieval engine. We previously
added a vector store integration but we think that it will be easier for
our customers if they can also find us under under
langchain-community/retrievers.

---------

Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
4 months ago
Dhruv Chawla d6d559d50d
community[minor]: add UpTrainCallbackHandler (#19956)
- **Description:** 
This PR adds a callback handler for UpTrain. It performs evaluations in
the RAG pipeline to check the quality of retrieved documents, generated
queries and responses.

- **Dependencies:** 
    - The UpTrainCallbackHandler requires the uptrain package

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
4 months ago
Ravindu Somawansa 5acc7ba622
community[minor]: Add glue catalog loader (#20220)
Add Glue Catalog loader
4 months ago
Juan Carlos José Camacho 450c458f8f
community[minor]: Add Datahareld tool (#19680)
**Description:** Integrate [dataherald](https://www.dataherald.com)
tool, It is a natural language-to-SQL tool.
**Dependencies:** Install dataherald sdk to use it,
```
pip install dataherald
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
4 months ago
Egor Krasheninnikov c8391d4ff1
community[patch]: Fix YandexGPT embeddings (#19720)
Fix of YandexGPT embeddings. 

The current version uses a single `model_name` for queries and
documents, essentially making the `embed_documents` and `embed_query`
methods the same. Yandex has a different endpoint (`model_uri`) for
encoding documents, see
[this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The
bug may impact retrievers built with `YandexGPTEmbeddings` (for instance
FAISS database as retriever) since they use both `embed_documents` and
`embed_query`.

A simple snippet to test the behaviour:
```python
from langchain_community.embeddings.yandex import YandexGPTEmbeddings
embeddings = YandexGPTEmbeddings()
q_emb = embeddings.embed_query('hello world')
doc_emb = embeddings.embed_documents(['hello world', 'hello world'])
q_emb == doc_emb[0]
```
The response is `True` with the current version and `False` with the
changes I made.


Twitter: @egor_krash

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
4 months ago
ccurme 38faa74c23
community[patch]: update use of deprecated llm methods (#20393)
.predict and .predict_messages for BaseLanguageModel and BaseChatModel
4 months ago
Corey Zumar 3a068b26f3
community[patch]: Databricks - fix scope of dangerous deserialization error in Databricks LLM connector (#20368)
fix scope of dangerous deserialization error in Databricks LLM connector

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
4 months ago
Nicolas ad04585e30
community[minor]: Firecrawl.dev integration (#20364)
Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl
crawls and convert any website into LLM-ready data. It crawls all
accessible subpages and give you clean markdown for each.

    - **Description:** Adds FireCrawl data loader
    - **Dependencies:** firecrawl-py
    - **Twitter handle:** @mendableai 

ccing contributors: (@ericciarla @nickscamara)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
4 months ago
Alex Sherstinsky fad0962643
community: for Predibase -- enable both Predibase-hosted and HuggingFace-hosted fine-tuned adapter repositories (#20370) 4 months ago
Leonid Ganeline 4cb5f4c353
community[patch]: import flattening fix (#20110)
This PR should make it easier for linters to do type checking and for IDEs to jump to definition of code.

See #20050 as a template for this PR.
- As a byproduct: Added 3 missed `test_imports`.
- Added missed `SolarChat` in to __init___.py Added it into test_import
ut.
- Added `# type: ignore` to fix linting. It is not clear, why linting
errors appear after ^ changes.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
4 months ago
jeff kit ac42e96e4c
community[patch], langchain[minor]: Enhance Tencent Cloud VectorDB, langchain: make Tencent Cloud VectorDB self query retrieve compatible (#19651)
- make Tencent Cloud VectorDB support metadata filtering.
- implement delete function for Tencent Cloud VectorDB.
- support both Langchain Embedding model and Tencent Cloud VDB embedding
model.
- Tencent Cloud VectorDB support filter search keyword, compatible with
langchain filtering syntax.
- add Tencent Cloud VectorDB TranslationVisitor, now work with self
query retriever.
- more documentations.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
4 months ago
Guangdong Liu 97d91ec17c
community[patch]: standardize baichuan init args (#20209)
Related to https://github.com/langchain-ai/langchain/issues/20085

@baskaryan
4 months ago
Piyush Jain cd7abc495a
community[minor]: add neptune analytics graph (#20047)
Replacement for PR
[#19772](https://github.com/langchain-ai/langchain/pull/19772).

---------

Co-authored-by: Dave Bechberger <dbechbe@amazon.com>
Co-authored-by: bechbd <bechbd@users.noreply.github.com>
4 months ago
Shuqian ad9750403b
community[minor]: add bedrock anthropic callback for token usage counting (#19864)
**Description:** add bedrock anthropic callback for token usage
counting, consulted openai callback.

---------

Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com>
4 months ago
Prince Canuma 1f9f4d8742
community[minor]: Add support for MLX models (chat & llm) (#18152)
**Description:** This PR adds support for MLX models both chat (i.e.,
instruct) and llm (i.e., pretrained) types/
**Dependencies:** mlx, mlx_lm, transformers
**Twitter handle:** @Prince_Canuma

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
4 months ago
Leonid Ganeline 2f8dd1a161
community[patch]: `cross_encoders` flatten namespaces (#20183)
Issue `langchain_community.cross_encoders` didn't have flattening
namespace code in the __init__.py file.
Changes:
- added code to flattening namespaces (used #20050 as a template)
- added ut for a change
- added missed `test_imports` for `chat_loaders` and
`chat_message_histories` modules
4 months ago
Alex Sherstinsky 5f563e040a
community: extend Predibase integration to support fine-tuned LLM adapters (#19979)
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Langchain-Predibase integration was failing, because
it was not current with the Predibase SDK; in addition, Predibase
integration tests were instantiating the Langchain Community `Predibase`
class with one required argument (`model`) missing. This change updates
the Predibase SDK usage and fixes the integration tests.
    - **Twitter handle:** `@alexsherstinsky`


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
4 months ago
david02871 e1a24d09c5
community: Add PHP language parser to document_loaders (#19850)
**Description:**
Added a PHP language parser to document_loaders
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
4 months ago
Marlene 2f03bc397e
Community: Updating Azure Retriever and Docs to be Azure AI Search instead of Azure Cognitive Search (#19925)
Last year Microsoft [changed the
name](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search)
of Azure Cognitive Search to Azure AI Search. This PR updates the
Langchain Azure Retriever API and it's associated docs to reflect this
change. It may be confusing for users to see the name Cognitive here and
AI in the Microsoft documentation which is why this is needed. I've also
added a more detailed example to the Azure retriever doc page.

There are more places that need a similar update but I'm breaking it up
so the PRs are not too big 😄 Fixing my errors from the previous PR.

Twitter: @marlene_zw

Two new tests added to test backward compatibility in
`libs/community/tests/integration_tests/retrievers/test_azure_cognitive_search.py`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
4 months ago
Rahul Triptahi 820b713086
community[minor]: Add support for Pebblo cloud_api_key in PebbloSafeLoader (#19855)
**Description**:
_PebbloSafeLoader_: Add support for pebblo's cloud api-key in
PebbloSafeLoader

- This Pull request enables PebbloSafeLoader to accept pebblo's cloud
api-key and send the semantic classification data to pebblo cloud.

**Documentation**: Updated 
**Unit test**: Added
**Issue**: NA
**Dependencies**: - None
**Twitter handle**: @rahul_tripathi2

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
4 months ago
Eugene Yurtsev 520ff50adc
community[patch]: Improve import callbacks to make it IDE friendly (#20050)
* declares __all__ as a list of strings (instead of dynamically
computing it)
* import type definitions when TYPE_CHECKING is true
5 months ago
Leonid Ganeline 3aacd11846
community[minor]: added missed class to __all__ (#19888)
Added missed `UnstructuredCHMLoader` class to the
document_loader.\_\_init\_\_.py \_\_all\_\_
5 months ago
happy-go-lucky c6432abdbe
community[patch]: Implement delete method and all async methods in opensearch_vector_search (#17321)
- **Description:** In order to use index and aindex in
libs/langchain/langchain/indexes/_api.py, I implemented delete method
and all async methods in opensearch_vector_search
- **Dependencies:** No changes
5 months ago
Cheng, Penghui cc407e8a1b
community[minor]: weight only quantization with intel-extension-for-transformers. (#14504)
Support weight only quantization with intel-extension-for-transformers.
[Intel® Extension for
Transformers](https://github.com/intel/intel-extension-for-transformers)
is an innovative toolkit to accelerate Transformer-based models on Intel
platforms, in particular effective on 4th Intel Xeon Scalable processor
[Sapphire
Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)
(codenamed Sapphire Rapids). The toolkit provides the below key
features:

* Seamless user experience of model compressions on Transformer-based
models by extending [Hugging Face
transformers](https://github.com/huggingface/transformers) APIs and
leveraging [Intel® Neural
Compressor](https://github.com/intel/neural-compressor)
* Advanced software optimizations and unique compression-aware runtime.
* Optimized Transformer-based model packages.
*
[NeuralChat](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat),
a customizable chatbot framework to create your own chatbot within
minutes by leveraging a rich set of plugins and SOTA optimizations.
*
[Inference](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/graph)
of Large Language Model (LLM) in pure C/C++ with weight-only
quantization kernels.
This PR is an integration of weight only quantization feature with
intel-extension-for-transformers.

Unit test is in
lib/langchain/tests/integration_tests/llm/test_weight_only_quantization.py
The notebook is in
docs/docs/integrations/llms/weight_only_quantization.ipynb.
The document is in
docs/docs/integrations/providers/weight_only_quantization.mdx.

---------

Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Peter Vandenabeele e830a4e731
community[patch]: Add remove_comments option (default True): do not extract html comments (#13259)
- **Description:** add `remove_comments` option (default: True): do not
extract html _comments_,
  - **Issue:** None,
  - **Dependencies:** None,
  - **Tag maintainer:** @nfcampos ,
  - **Twitter handle:** peter_v

I ran `make format`, `make lint` and `make test`.

Discussion: I my use case, I prefer to not have the comments in the
extracted text:
* e.g. from a Google tag that is added in the html as comment
* e.g. content that the authors have temporarily hidden to make it non
visible to the regular reader

Removing the comments makes the extracted text more alike the intended
text to be seen by the reader.


**Choice to make:** do we prefer to make the default for this
`remove_comments` option to be True or False?
I have changed it to True in a second commit, since that is how I would
prefer to use it by default. Have the
cleaned text (without technical Google tags etc.) and also closer to the
actually visible and intended content.
I am not sure what is best aligned with the conventions of langchain in
general ...


INITIAL VERSION (new version above):
~**Choice to make:** do we prefer to make the default for this
`ignore_comments` option to be True or False?
I have set it to False now to be backwards compatible. On the other
hand, I would use it mostly with True.
I am not sure what is best aligned with the conventions of langchain in
general ...~

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Anıl Berk Altuner 4384fa8e49
community[minor]: Add Dria retriever (#17098)
[Dria](https://dria.co/) is a hub of public RAG models for developers to
both contribute and utilize a shared embedding lake. This PR adds a
retriever that can retrieve documents from Dria.
5 months ago
Chenhui Zhang a1f3e9f537
community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695)
Description: Update `ChatZhipuAI` to support the latest `glm-4` model.
Issue: N/A
Dependencies: httpx, httpx-sse, PyJWT

The previous `ChatZhipuAI` implementation requires the `zhipuai`
package, and cannot call the latest GLM model. This is because
- The old version `zhipuai==1.*` doesn't support the latest model.
- `zhipuai==2.*` requires `pydantic V2`, which is incompatible with
'langchain-community'.

This re-implementation invokes the GLM model by sending HTTP requests to
[open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx`
package, and uses the `httpx-sse` package to handle stream events.

---------

Co-authored-by: zR <2448370773@qq.com>
5 months ago
Kamal Zhang 368e35c3b1
community[patch]: introduce convert_to_secret() to bananadev llm (#14283)
- **Description:** Per #12165, this PR add to BananaLLM the function
convert_to_secret_str() during environment variable validation.
- **Issue:** #12165
- **Tag maintainer:** @eyurtsev
- **Twitter handle:** @treewatcha75751

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
M.Abdulrahman Alnaseer ba54f1577f
community[minor]: add support for llmsherpa (#19741)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: added support for llmsherpa library"

- [x] **Add tests and docs**: 
1. Integration test:
'docs/docs/integrations/document_loaders/test_llmsherpa.py'.
2. an example notebook:
`docs/docs/integrations/document_loaders/llmsherpa.ipynb`.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Hrvoje Milković b7344e3347
community[minor]: Infobip tool integration (#16805)
**Description:** Adding Tool that wraps Infobip API for sending sms or
emails and email validation.
**Dependencies:** None,
**Twitter handle:** @hmilkovic

Implementation:
```
libs/community/langchain_community/utilities/infobip.py
```

Integration tests:
```
libs/community/tests/integration_tests/utilities/test_infobip.py
```

Example notebook:
```
docs/docs/integrations/tools/infobip.ipynb
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
shahrin014 f51e6a35ba
community[patch]: OllamaEmbeddings - Pass headers to post request (#16880)
## Feature
- Set additional headers in constructor
- Headers will be sent in post request

This feature is useful if deploying Ollama on a cloud service such as
hugging face, which requires authentication tokens to be passed in the
request header.

## Tests
- Test if header is passed
- Test if header is not passed

Similar to https://github.com/langchain-ai/langchain/pull/15881

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Jan Chorowski b8b42ccbc5
community[minor]: Pathway vectorstore(#14859)
- **Description:** Integration with pathway.com data processing pipeline
acting as an always updated vectorstore
  - **Issue:** not applicable
- **Dependencies:** optional dependency on
[`pathway`](https://pypi.org/project/pathway/)
  - **Twitter handle:** pathway_com

The PR provides and integration with `pathway` to provide an easy to use
always updated vector store:

```python
import pathway as pw
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer

data_sources = []
data_sources.append(
    pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True))

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
vector_server = PathwayVectorServer(
    *data_sources,
    embedder=embeddings_model,
    splitter=text_splitter,
)
vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False)
client = PathwayVectorClient(
    host="127.0.0.1",
    port="8765",
)
query = "What is Pathway?"
docs = client.similarity_search(query)
```

The `PathwayVectorServer` builds a data processing pipeline which
continusly scans documents in a given source connector (google drive,
s3, ...) and builds a vector store. The `PathwayVectorClient` implements
LangChain's `VectorStore` interface and connects to the server to
retrieve documents.

---------

Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com>
Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local>
Co-authored-by: Berke <berkecanrizai1@gmail.com>
Co-authored-by: Adrian Kosowski <adrian@pathway.com>
Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home>
Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home>
Co-authored-by: Szymon Dudycz <szymond@pathway.com>
Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
高璟琦 ec7a59c96c
community[minor]: Add solar embedding (#19761)
Solar is a large language model developed by
[Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM.
You can visit the embedding service provided by Solar within this pr.

You may get **SOLAR_API_KEY** from
https://console.upstage.ai/services/embedding
You can refer to more details about accepted llm integration at
https://python.langchain.com/docs/integrations/llms/solar.
5 months ago
Tomaz Bratanic dec00d3050
community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758)
Makes it easier to flatten complex values to text, so you don't have to
use a lot of Cypher to do it.
5 months ago
Robby f7e8a382cc
community[minor]: add hugging face text-to-speech inference API (#18880)
Description: I implemented a tool to use Hugging Face text-to-speech
inference API.

Issue: n/a

Dependencies: n/a

Twitter handle: No Twitter, but do have
[LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol.

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
DasDingoCodes 73eb3f8fd9
community[minor]: Implement DirectoryLoader lazy_load function (#19537)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: Implement DirectoryLoader lazy_load
function"

- [x] **Description**: The `lazy_load` function of the `DirectoryLoader`
yields each document separately. If the given `loader_cls` of the
`DirectoryLoader` also implemented `lazy_load`, it will be used to yield
subdocuments of the file.

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access:
`libs/community/tests/unit_tests/document_loaders/test_directory_loader.py`
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory:
`docs/docs/integrations/document_loaders/directory.ipynb`


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Jialei f7c903e24a
community[minor]: add support for Moonshot llm and chat model (#17100) 5 months ago
Ethan Yang 7164015135
community[minor]: Add Openvino embedding support (#19632)
This PR is used to support both HF and BGE embeddings with openvino

---------

Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
5 months ago
kYLe 124ab79c23
community[minor]: Add Anyscale embedding support (#17605)
**Description:** Add embedding model support for Anyscale Endpoint
**Dependencies:** openai

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Paulo Nascimento 44a3484503
community[patch]: add NotebookLoader unit test (#17721)
Thank you for contributing to LangChain!

- **Description:** added unit tests for NotebookLoader. Linked PR:
https://github.com/langchain-ai/langchain/pull/17614
- **Issue:**
[#17614](https://github.com/langchain-ai/langchain/pull/17614)
    - **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Paulo Nascimento 4c3a67122f
community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771)
**Description:** Created a Langchain Tool for OpenAI DALLE Image
Generation.
**Issue:**
[#15901](https://github.com/langchain-ai/langchain/issues/15901)
**Dependencies:** n/a
**Twitter handle:** @paulodoestech

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Victor Adan afa2d85405
community[patch]: Added missing from_documents method to KNNRetriever. (#18411)
- Description: Added missing `from_documents` method to `KNNRetriever`,
providing the ability to supply metadata to LangChain `Document`s, and
to give it parity to the other retrievers, which do have
`from_documents`.
- Issue: None
- Dependencies: None
- Twitter handle: None

Co-authored-by: Victor Adan <vadan@netroadshow.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Christian Galo 1adaa3c662
community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488)
This is a follow up to #18371. These are the changes:
- New **Azure AI Services** toolkit and tools to replace those of
**Azure Cognitive Services**.
- Updated documentation for Microsoft platform.
- The image analysis tool has been rewritten to use the new package
`azure-ai-vision-imageanalysis`, doing a proper replacement of
`azure-ai-vision`.

These changes:
- Update outdated naming from "Azure Cognitive Services" to "Azure AI
Services".
- Update documentation to use non-deprecated methods to create and use
agents.
- Removes need to depend on yanked python package (`azure-ai-vision`)

There is one new dependency that is needed as a replacement to
`azure-ai-vision`:
- `azure-ai-vision-imageanalysis`. This is optional and declared within
a function.

There is a new `azure_ai_services.ipynb` notebook showing usage; Changes
have been linted and formatted.

I am leaving the actions of adding deprecation notices and future
removal of Azure Cognitive Services up to the LangChain team, as I am
not sure what the current practice around this is.

---

If this PR makes it, my handle is  @galo@mastodon.social

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
5 months ago
Shengsheng Huang ac1dd8ad94
community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518)
- **Description**: `bigdl-llm` library has been renamed to
[`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR
migrates the `bigdl-llm` integration to `ipex-llm` .
- **Issue**: N/A. The original PR of `bigdl-llm` is
https://github.com/langchain-ai/langchain/pull/17953
- **Dependencies**: `ipex-llm` library
- **Contribution maintainer**: @shane-huang

Updated doc:   docs/docs/integrations/llms/ipex_llm.ipynb
Updated test:
libs/community/tests/integration_tests/llms/test_ipex_llm.py
5 months ago
Chaunte W. Lacewell a31f692f4e
community[minor]: Add VDMS vectorstore (#19551)
- **Description:** Add support for Intel Lab's [Visual Data Management
System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store
- **Dependencies:** `vdms` library which requires protobuf = "4.24.2".
There is a conflict with dashvector in `langchain` package but conflict
is resolved in `community`.
- **Contribution maintainer:** [@cwlacewe](https://github.com/cwlacewe)
- **Added tests:**
libs/community/tests/integration_tests/vectorstores/test_vdms.py
- **Added docs:** docs/docs/integrations/vectorstores/vdms.ipynb
- **Added cookbook:** cookbook/multi_modal_RAG_vdms.ipynb

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
yongheng.liu 7e29b6061f
community[minor]: integrate China Mobile Ecloud vector search (#15298)
- **Description:** integrate China Mobile Ecloud vector search, 
  - **Dependencies:** elasticsearch==7.10.1

Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
yuwenzho 3a7d2cf443
community[minor]: Add ITREX optimized Embeddings (#18474)
Introduction
[Intel® Extension for
Transformers](https://github.com/intel/intel-extension-for-transformers)
is an innovative toolkit designed to accelerate GenAI/LLM everywhere
with the optimal performance of Transformer-based models on various
Intel platforms

Description

adding ITREX runtime embeddings using intel-extension-for-transformers.
added mdx documentation and example notebooks
added embedding import testing.

---------

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Fabrizio Ruocco f12cb0bea4
community[patch]: Microsoft Azure Document Intelligence updates (#16932)
- **Description:** Update Azure Document Intelligence implementation by
Microsoft team and RAG cookbook with Azure AI Search

---------

Co-authored-by: Lu Zhang (AI) <luzhan@microsoft.com>
Co-authored-by: Yateng Hong <yatengh@microsoft.com>
Co-authored-by: teethache <hongyateng2006@126.com>
Co-authored-by: Lu Zhang <44625949+luzhang06@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
xsai9101 160a8eb178
community[minor]: add oracle autonomous database doc loader integration (#19536)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Adding oracle autonomous database document loader
integration. This will allow users to connect to oracle autonomous
database through connection string or TNS configuration.
    https://www.oracle.com/autonomous-database/
    - **Issue:** None
    - **Dependencies:** oracledb python package 
    https://pypi.org/project/oracledb/
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
  Unit test and doc are added.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Yuki Watanabe cfecbda48b
community[minor]: Allow passing `allow_dangerous_deserialization` when loading LLM chain (#18894)
### Issue
Recently, the new `allow_dangerous_deserialization` flag was introduced
for preventing unsafe model deserialization that relies on pickle
without user's notice (#18696). Since then some LLMs like Databricks
requires passing in this flag with true to instantiate the model.

However, this breaks existing functionality to loading such LLMs within
a chain using `load_chain` method, because the underlying loader
function
[load_llm_from_config](f96dd57501/libs/langchain/langchain/chains/loading.py (L40))
 (and load_llm) ignores keyword arguments passed in. 

### Solution
This PR fixes this issue by propagating the
`allow_dangerous_deserialization` argument to the class loader iff the
LLM class has that field.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago