Thank you for contributing to LangChain!
- [x] **PR title**
- [x] **PR message**:
- **Description:** Deprecate persist method in Chroma no longer exists
in Chroma 0.4.x
- **Issue:** #20851
- **Dependencies:** None
- **Twitter handle:** AndresAlgaba1
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:**
The RecursiveUrlLoader loader offers a link_regex parameter that can
filter out URLs. However, this filtering capability is limited, and if
the internal links of the website change, unexpected resources may be
loaded. These resources, such as font files, can cause problems in
subsequent embedding processing.
>
https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf
We can add the Content-Type in the HTTP response headers to the document
metadata so developers can choose which resources to use. This allows
developers to make their own choices.
For example, the following may be a good choice for text knowledge.
- text/plain - simple text file
- text/html - HTML web page
- text/xml - XML format file
- text/json - JSON format data
- application/pdf - PDF file
- application/msword - Word document
and ignore the following
- text/css - CSS stylesheet
- text/javascript - JavaScript script
- application/octet-stream - binary data
- image/jpeg - JPEG image
- image/png - PNG image
- image/gif - GIF image
- image/svg+xml - SVG image
- audio/mpeg - MPEG audio files
- video/mp4 - MP4 video file
- application/font-woff - WOFF font file
- application/font-ttf - TTF font file
- application/zip - ZIP compressed file
- application/octet-stream - binary data
**Twitter handle:** @coolbeevip
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description :
- added functionalities - delete, index creation, using existing
connection object etc.
- updated usage
- Added LaceDB cloud OSS support
make lint_diff , make test checks done
Implemented the ability to enable full-text search within the
SingleStore vector store, offering users a versatile range of search
strategies. This enhancement allows users to seamlessly combine
full-text search with vector search, enabling the following search
strategies:
* Search solely by vector similarity.
* Conduct searches exclusively based on text similarity, utilizing
Lucene internally.
* Filter search results by text similarity score, with the option to
specify a threshold, followed by a search based on vector similarity.
* Filter results by vector similarity score before conducting a search
based on text similarity.
* Perform searches using a weighted sum of vector and text similarity
scores.
Additionally, integration tests have been added to comprehensively cover
all scenarios.
Updated notebook with examples.
CC: @baskaryan, @hwchase17
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
This PR adds support for advanced filtering to the integration of HANA
Vector Engine.
The newly supported filtering operators are: $eq, $ne, $gt, $gte, $lt,
$lte, $between, $in, $nin, $like, $and, $or
- **Issue:** N/A
- **Dependencies:** no new dependencies added
Added integration tests to:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`
Description of the new capabilities in notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`
**Description:** implemented GraphStore class for Apache Age graph db
**Dependencies:** depends on psycopg2
Unit and integration tests included. Formatting and linting have been
run.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Community: Unify Titan Takeoff Integrations and Adding Embedding
Support**
**Description:**
Titan Takeoff no longer reflects this either of the integrations in the
community folder. The two integrations (TitanTakeoffPro and
TitanTakeoff) where causing confusion with clients, so have moved code
into one place and created an alias for backwards compatibility. Added
Takeoff Client python package to do the bulk of the work with the
requests, this is because this package is actively updated with new
versions of Takeoff. So this integration will be far more robust and
will not degrade as badly over time.
**Issue:**
Fixes bugs in the old Titan integrations and unified the code with added
unit test converge to avoid future problems.
**Dependencies:**
Added optional dependency takeoff-client, all imports still work without
dependency including the Titan Takeoff classes but just will fail on
initialisation if not pip installed takeoff-client
**Twitter**
@MeryemArik9
Thanks all :)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This PR updates OctoAIEndpoint LLM to subclass BaseOpenAI as OctoAI is
an OpenAI-compatible service. The documentation and tests have also been
updated.
**Description:** Adds ThirdAI NeuralDB retriever integration. NeuralDB
is a CPU-friendly and fine-tunable text retrieval engine. We previously
added a vector store integration but we think that it will be easier for
our customers if they can also find us under under
langchain-community/retrievers.
---------
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
**Description**: Support filter by OR and AND for deprecated PGVector
version
**Issue**: #20445
**Dependencies**: N/A
**Twitter** handle: @martinferenaz
This PR should make it easier for linters to do type checking and for IDEs to jump to definition of code.
See #20050 as a template for this PR.
- As a byproduct: Added 3 missed `test_imports`.
- Added missed `SolarChat` in to __init___.py Added it into test_import
ut.
- Added `# type: ignore` to fix linting. It is not clear, why linting
errors appear after ^ changes.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Langchain-Predibase integration was failing, because
it was not current with the Predibase SDK; in addition, Predibase
integration tests were instantiating the Langchain Community `Predibase`
class with one required argument (`model`) missing. This change updates
the Predibase SDK usage and fixes the integration tests.
- **Twitter handle:** `@alexsherstinsky`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Last year Microsoft [changed the
name](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search)
of Azure Cognitive Search to Azure AI Search. This PR updates the
Langchain Azure Retriever API and it's associated docs to reflect this
change. It may be confusing for users to see the name Cognitive here and
AI in the Microsoft documentation which is why this is needed. I've also
added a more detailed example to the Azure retriever doc page.
There are more places that need a similar update but I'm breaking it up
so the PRs are not too big 😄 Fixing my errors from the previous PR.
Twitter: @marlene_zw
Two new tests added to test backward compatibility in
`libs/community/tests/integration_tests/retrievers/test_azure_cognitive_search.py`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Removes required usage of `requests` from `langchain-core`, all of which
has been deprecated.
- removes Tracer V1 implementations
- removes old `try_load_from_hub` github-based hub implementations
Removal done in a way where imports will still succeed, and usage will
fail with a `RuntimeError`.
**Description:** adds integration with [Layerup
Security](https://uselayerup.com). Docs can be found
[here](https://docs.uselayerup.com). Integrates directly with our Python
SDK.
**Dependencies:**
[LayerupSecurity](https://pypi.org/project/LayerupSecurity/)
**Note**: all methods for our product require a paid API key, so I only
included 1 test which checks for an invalid API key response. I have
tested extensively locally.
**Twitter handle**: [@layerup_](https://twitter.com/layerup_)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
[Dria](https://dria.co/) is a hub of public RAG models for developers to
both contribute and utilize a shared embedding lake. This PR adds a
retriever that can retrieve documents from Dria.
Description: Update `ChatZhipuAI` to support the latest `glm-4` model.
Issue: N/A
Dependencies: httpx, httpx-sse, PyJWT
The previous `ChatZhipuAI` implementation requires the `zhipuai`
package, and cannot call the latest GLM model. This is because
- The old version `zhipuai==1.*` doesn't support the latest model.
- `zhipuai==2.*` requires `pydantic V2`, which is incompatible with
'langchain-community'.
This re-implementation invokes the GLM model by sending HTTP requests to
[open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx`
package, and uses the `httpx-sse` package to handle stream events.
---------
Co-authored-by: zR <2448370773@qq.com>
- **Description:** Support reranking based on cross encoder models
available from HuggingFace.
- Added `CrossEncoder` schema
- Implemented `HuggingFaceCrossEncoder` and
`SagemakerEndpointCrossEncoder`
- Implemented `CrossEncoderReranker` that performs similar functionality
to `CohereRerank`
- Added `cross-encoder-reranker.ipynb` to demonstrate how to use it.
Please let me know if anything else needs to be done to make it visible
on the table-of-contents navigation bar on the left, or on the card list
on [retrievers documentation
page](https://python.langchain.com/docs/integrations/retrievers).
- **Issue:** N/A
- **Dependencies:** None other than the existing ones.
---------
Co-authored-by: Kenny Choe <kchoe@amazon.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Description
This implementation adds functionality from the AlphaVantage API,
renowned for its comprehensive financial data. The class encapsulates
various methods, each dedicated to fetching specific types of financial
information from the API.
### Implemented Functions
- **`search_symbols`**:
- Searches the AlphaVantage API for financial symbols using the provided
keywords.
- **`_get_market_news_sentiment`**:
- Retrieves market news sentiment for a specified stock symbol from the
AlphaVantage API.
- **`_get_time_series_daily`**:
- Fetches daily time series data for a specific symbol from the
AlphaVantage API.
- **`_get_quote_endpoint`**:
- Obtains the latest price and volume information for a given symbol
from the AlphaVantage API.
- **`_get_time_series_weekly`**:
- Gathers weekly time series data for a particular symbol from the
AlphaVantage API.
- **`_get_top_gainers_losers`**:
- Provides details on top gainers, losers, and most actively traded
tickers in the US market from the AlphaVantage API.
### Issue:
- #11994
### Dependencies:
- 'requests' library for HTTP requests. (import requests)
- 'pytest' library for testing. (import pytest)
---------
Co-authored-by: Adam Badar <94140103+adam-badar@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Langchain-Predibase integration was failing, because
it was not current with the Predibase SDK; in addition, Predibase
integration tests were instantiating the Langchain Community `Predibase`
class with one required argument (`model`) missing. This change updates
the Predibase SDK usage and fixes the integration tests.
- **Twitter handle:** `@alexsherstinsky`
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "community: added support for llmsherpa library"
- [x] **Add tests and docs**:
1. Integration test:
'docs/docs/integrations/document_loaders/test_llmsherpa.py'.
2. an example notebook:
`docs/docs/integrations/document_loaders/llmsherpa.ipynb`.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
1. Fix the BiliBiliLoader that can receive cookie parameters, it
requires 3 other parameters to run. The change is backward compatible.
2. Add test;
3. Add example in docs
- **Issue:** [#14213]
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- [x] **PR title**: "community: Support streaming in Azure ML and few
naming changes"
- [x] **PR message**:
- **Description:** Added support for streaming for azureml_endpoint.
Also, renamed and AzureMLEndpointApiType.realtime to
AzureMLEndpointApiType.dedicated. Also, added new classes
CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and
updated the classes LlamaChatContentFormatter and LlamaContentFormatter
to now show a deprecated warning message when instantiated.
---------
Co-authored-by: Sachin Paryani <saparan@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR allows to calculate token usage for prompts and completion
directly in the generation method of BedrockChat. The token usage
details are then returned together with the generations, so that other
downstream tasks can access them easily.
This allows to define a callback for tokens tracking and cost
calculation, similarly to what happens with OpenAI (see
[OpenAICallbackHandler](https://api.python.langchain.com/en/latest/_modules/langchain_community/callbacks/openai_info.html#OpenAICallbackHandler).
I plan on adding a BedrockCallbackHandler later.
Right now keeping track of tokens in the callback is already possible,
but it requires passing the llm, as done here:
https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html.
However, I find the approach of this PR cleaner.
Thanks for your reviews. FYI @baskaryan, @hwchase17
---------
Co-authored-by: taamedag <Davide.Menini@swisscom.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description**: `bigdl-llm` library has been renamed to
[`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR
migrates the `bigdl-llm` integration to `ipex-llm` .
- **Issue**: N/A. The original PR of `bigdl-llm` is
https://github.com/langchain-ai/langchain/pull/17953
- **Dependencies**: `ipex-llm` library
- **Contribution maintainer**: @shane-huang
Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb
Updated test:
libs/community/tests/integration_tests/llms/test_ipex_llm.py
- **Description:** Add support for Intel Lab's [Visual Data Management
System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store
- **Dependencies:** `vdms` library which requires protobuf = "4.24.2".
There is a conflict with dashvector in `langchain` package but conflict
is resolved in `community`.
- **Contribution maintainer:** [@cwlacewe](https://github.com/cwlacewe)
- **Added tests:**
libs/community/tests/integration_tests/vectorstores/test_vdms.py
- **Added docs:** docs/docs/integrations/vectorstores/vdms.ipynb
- **Added cookbook:** cookbook/multi_modal_RAG_vdms.ipynb
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Updates Meilisearch vectorstore for compatibility
with v1.6 and above. Adds embedders settings and embedder_name which are
now required.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **Add len() implementation to Chroma**: "package: community"
- [x] **PR message**:
- **Description:** add an implementation of the __len__() method for the
Chroma vectostore, for convenience.
- **Issue:** no exposed method to know the size of a Chroma vectorstore
- **Dependencies:** None
- **Twitter handle:** lowrank_adrian
- [x] **Add tests and docs**
- [x] **Lint and test**
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Fixing some issues for AzureCosmosDBSemanticCache
- Added the entry for "AzureCosmosDBSemanticCache" which was missing in
langchain/cache.py
- Added application name when creating the MongoClient for the
AzureCosmosDBVectorSearch, for tracking purposes.
@baskaryan, can you please review this PR, we need this to go in asap.
These are just small fixes which we found today in our testing.
### Prem SDK integration in LangChain
This PR adds the integration with [PremAI's](https://www.premai.io/)
prem-sdk with langchain. User can now access to deployed models
(llms/embeddings) and use it with langchain's ecosystem. This PR adds
the following:
### This PR adds the following:
- [x] Add chat support
- [X] Adding embedding support
- [X] writing integration tests
- [X] writing tests for chat
- [X] writing tests for embedding
- [X] writing unit tests
- [X] writing tests for chat
- [X] writing tests for embedding
- [X] Adding documentation
- [X] writing documentation for chat
- [X] writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff`
- [X] Final checks (spell check, lint, format and overall testing)
---------
Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This PR adds code to make sure that the correct base URL is being
created for the Azure Cognitive Search retriever. At the moment an
incorrect base URL is being generated. I think this is happening because
the original code was based on a depreciated API version. No
dependencies need to be added. I've also added more context to the test
doc strings.
I should also note that ACS is now Azure AI Search. I will open a
separate PR to make these changes as that would be a breaking change and
should potentially be discussed.
Twitter: @marlene_zw
- No new tests added, however the current ACS retriever tests are now
passing when I run them.
- Code was linted.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Added support for lower-case and mixed-case names
The names for tables and columns previouly had to be UPPER_CASE.
With this enhancement, also lower_case and MixedCase are supported,
- **Issue:** N/A
- **Dependencies:** no new dependecies added
- **Twitter handle:** @sapopensource
**Description:**
This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.
**Dependencies:**
No extra dependencies are needed.
DuckDB has a cosine similarity function along list and array data types,
which can be used as a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list or
array column types. This PR surfaces this functionality to langchain.
- **Dependencies:** duckdb 0.10.0
- **Twitter handle:** @igocrite
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description**:
this PR enable VectorStore autoconfiguration for Infinispan: if
metadatas are only of basic types, protobuf
config will be automatically generated for the user.
This PR makes the following updates in the pgvector database:
1. Use JSONB field for metadata instead of JSON
2. Update operator syntax to include required `$` prefix before the
operators (otherwise there will be name collisions with fields)
3. The change is non-breaking, old functionality is still the default,
but it will emit a deprecation warning
4. Previous functionality has bugs associated with comparisons due to
casting to text (so lexical ordering is used incorrectly for numeric
fields)
5. Adds an a GIN index on the JSONB field for more efficient querying
## Add Passio Nutrition AI Food Search Tool to Community Package
### Description
We propose adding a new tool to the `community` package, enabling
integration with Passio Nutrition AI for food search functionality. This
tool will provide a simple interface for retrieving nutrition facts
through the Passio Nutrition AI API, simplifying user access to
nutrition data based on food search queries.
### Implementation Details
- **Class Structure:** Implement `NutritionAI`, extending `BaseTool`. It
includes an `_run` method that accepts a query string and, optionally, a
`CallbackManagerForToolRun`.
- **API Integration:** Use `NutritionAIAPI` for the API wrapper,
encapsulating all interactions with the Passio Nutrition AI and
providing a clean API interface.
- **Error Handling:** Implement comprehensive error handling for API
request failures.
### Expected Outcome
- **User Benefits:** Enable easy querying of nutrition facts from Passio
Nutrition AI, enhancing the utility of the `langchain_community` package
for nutrition-related projects.
- **Functionality:** Provide a straightforward method for integrating
nutrition information retrieval into users' applications.
### Dependencies
- `langchain_core` for base tooling support
- `pydantic` for data validation and settings management
- Consider `requests` or another HTTP client library if not covered by
`NutritionAIAPI`.
### Tests and Documentation
- **Unit Tests:** Include tests that mock network interactions to ensure
tool reliability without external API dependency.
- **Documentation:** Create an example notebook in
`docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage,
setup, and example queries.
### Contribution Guidelines Compliance
- Adhere to the project's linting and formatting standards (`make
format`, `make lint`, `make test`).
- Ensure compliance with LangChain's contribution guidelines,
particularly around dependency management and package modifications.
### Additional Notes
- Aim for the tool to be a lightweight, focused addition, not
introducing significant new dependencies or complexity.
- Potential future enhancements could include caching for common queries
to improve performance.
### Twitter Handle
- Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai)
where we announce our products.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
## Description
- Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM
and `ChatFriendli` chat model.
- Unit tests and integration tests corresponding to this change are
added.
- Documentations corresponding to this change are added.
## Dependencies
- Optional dependency
[`friendli-client`](https://pypi.org/project/friendli-client/) package
is added only for those who use `Frienldi` or `ChatFriendli` model.
## Twitter handle
- https://twitter.com/friendliai
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.
Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search.
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.
Simple use case exmaple
```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings
db = TiDBVectorStore.from_texts(
embedding=embeddings,
texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
table_name="tidb_vector_langchain",
connection_string=tidb_connection_url,
distance_strategy="cosine",
)
query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
print("-" * 80)
print("Score: ", score)
print(doc.page_content)
print("-" * 80)
```
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.
Vector search is supported since release 15.x
For more: [Infinispan Home](https://infinispan.org)
Integration tests are provided as well as a demo notebook
This is a PR that adds a dangerous load parameter to force users to opt in to use pickle.
This is a PR that's meant to raise user awareness that the pickling module is involved.
Neo4j tools use particular node labels and relationship types to store
metadata, but are irrelevant for text2cypher or graph generation, so we
want to ignore them in the schema representation.
Description:
This pull request introduces several enhancements for Azure Cosmos
Vector DB, primarily focused on improving caching and search
capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a
summary of the changes:
- **AzureCosmosDBSemanticCache**: Added a new cache implementation
called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB
vCore Vector DB for efficient caching of semantic data. Added
comprehensive test cases for AzureCosmosDBSemanticCache to ensure its
correctness and robustness. These tests cover various scenarios and edge
cases to validate the cache's behavior.
- **HNSW Vector Search**: Added HNSW vector search functionality in the
CosmosDB Vector Search module. This enhancement enables more efficient
and accurate vector searches by utilizing the HNSW (Hierarchical
Navigable Small World) algorithm. Added corresponding test cases to
validate the HNSW vector search functionality in both
AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests
ensure the correctness and performance of the HNSW search algorithm.
- **LLM Caching Notebook** - The notebook now includes a comprehensive
example showcasing the usage of the AzureCosmosDBSemanticCache. This
example highlights how the cache can be employed to efficiently store
and retrieve semantic data. Additionally, the example provides default
values for all parameters used within the AzureCosmosDBSemanticCache,
ensuring clarity and ease of understanding for users who are new to the
cache implementation.
@hwchase17,@baskaryan, @eyurtsev,
Current implementation doesn't have an indexed property that would
optimize the import. I have added a `baseEntityLabel` parameter that
allows you to add a secondary node label, which has an indexed id
`property`. By default, the behaviour is identical to previous version.
Since multi-labeled nodes are terrible for text2cypher, I removed the
secondary label from schema representation object and string, which is
used in text2cypher.
This PR makes `cohere_api_key` in `llms/cohere` a SecretStr, so that the
API Key is not leaked when `Cohere.cohere_api_key` is represented as a
string.
---------
Signed-off-by: Arun <arun@arun.blog>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description**:
[`bigdl-llm`](https://github.com/intel-analytics/BigDL) is a library for
running LLM on Intel XPU (from Laptop to GPU to Cloud) using
INT4/FP4/INT8/FP8 with very low latency (for any PyTorch model). This PR
adds bigdl-llm integrations to langchain.
- **Issue**: NA
- **Dependencies**: `bigdl-llm` library
- **Contribution maintainer**: @shane-huang
Examples added:
- docs/docs/integrations/llms/bigdl.ipynb
- **Description:** The current embedchain implementation seems to handle
document metadata differently than done in the current implementation of
langchain and a KeyError is thrown. I would love for someone else to
test this...
---------
Co-authored-by: KKUGLER <kai.kugler@mercedes-benz.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Deshraj Yadav <deshraj@gatech.edu>
Sometimes, you want to use various parameters in the retrieval query of
Neo4j Vector to personalize/customize results. Before, when there were
only predefined chains, it didn't really make sense. Now that it's all
about custom chains and LCEL, it is worth adding since users can inject
any params they wish at query time. Isn't prone to SQL injection-type
attacks since we use parameters and not concatenating strings.
- **Description:** A generic document loader adapter for SQLAlchemy on
top of LangChain's `SQLDatabaseLoader`.
- **Needed by:** https://github.com/crate-workbench/langchain/pull/1
- **Depends on:** GH-16655
- **Addressed to:** @baskaryan, @cbornet, @eyurtsev
Hi from CrateDB again,
in the same spirit like GH-16243 and GH-16244, this patch breaks out
another commit from https://github.com/crate-workbench/langchain/pull/1,
in order to reduce the size of this patch before submitting it, and to
separate concerns.
To accompany the SQLAlchemy adapter implementation, the patch includes
integration tests for both SQLite and PostgreSQL. Let me know if
corresponding utility resources should be added at different spots.
With kind regards,
Andreas.
### Software Tests
```console
docker compose --file libs/community/tests/integration_tests/document_loaders/docker-compose/postgresql.yml up
```
```console
cd libs/community
pip install psycopg2-binary
pytest -vvv tests/integration_tests -k sqldatabase
```
```
14 passed
```
![image](https://github.com/langchain-ai/langchain/assets/453543/42be233c-eb37-4c76-a830-474276e01436)
---------
Co-authored-by: Andreas Motl <andreas.motl@crate.io>
- **Description:**
- Add DocumentManager class, which is a nosql record manager.
- In order to use index and aindex in
libs/langchain/langchain/indexes/_api.py, DocumentManager inherits
RecordManager.
- Also I added the MongoDB implementation of Document Manager too.
- **Dependencies:** pymongo, motor
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** Add DocumentManager class, which is a no sql record
manager. To use index method and aindex method in indexes._api.py,
Document Manager inherits RecordManager.Add the MongoDB implementation
of Document Manager.
- **Dependencies:** pymongo, motor
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:** Initial pull request for Kinetica LLM wrapper
**Issue:** N/A
**Dependencies:** No new dependencies for unit tests. Integration tests
require gpudb, typeguard, and faker
**Twitter handle:** @chad_juliano
Note: There is another pull request for Kinetica vectorstore. Ultimately
we would like to make a partner package but we are starting with a
community contribution.
In this pull request, we introduce the add_images method to the
SingleStoreDB vector store class, expanding its capabilities to handle
multi-modal embeddings seamlessly. This method facilitates the
incorporation of image data into the vector store by associating each
image's URI with corresponding document content, metadata, and either
pre-generated embeddings or embeddings computed using the embed_image
method of the provided embedding object.
the change includes integration tests, validating the behavior of the
add_images. Additionally, we provide a notebook showcasing the usage of
this new method.
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Hi, I'm from the LanceDB team.
Improves LanceDB integration by making it easier to use - now you aren't
required to create tables manually and pass them in the constructor,
although that is still backward compatible.
Bug fix - pandas was being used even though it's not a dependency for
LanceDB or langchain
PS - this issue was raised a few months ago but lost traction. It is a
feature improvement for our users kindly review this , Thanks !
- **Description:** This fixes an issue with working with RecordManager.
RecordManager was generating new hashes on documents because `add_texts`
was modifying the metadata directly. Additionally moved some tests to
unit tests since that was a more appropriate home.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** `@_morgan_adams_`
- **Description:** Updates to the Kuzu API had broken this
functionality. These updates resolve those issues and add a new test to
demonstrate the updates.
- **Issue:** #11874
- **Dependencies:** No new dependencies
- **Twitter handle:** @amirk08
Test results:
```
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_no_params PASSED [ 33%]
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params PASSED [ 66%]
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema PASSED [100%]
=================================================== slowest 5 durations ===================================================
0.53s call tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema
0.34s call tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_no_params
0.28s call tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params
0.03s teardown tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema
0.02s teardown tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params
==================================================== 3 passed in 1.27s ====================================================
```
1. integrate with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. update `langchain.llms`
3. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This pull request introduces support for various Approximate Nearest
Neighbor (ANN) vector index algorithms in the VectorStore class,
starting from version 8.5 of SingleStore DB. Leveraging this enhancement
enables users to harness the power of vector indexing, significantly
boosting search speed, particularly when handling large sets of vectors.
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>