Commit Graph

818 Commits (d4befd0cfb67a73766fe40be3906ead1a3b97ba1)

Author SHA1 Message Date
Dristy Srivastava 5f1d1666e3
community[patch]: Add support for pebblo server and client version (#20269)
**Description**:
_PebbloSafeLoader_: Add support for pebblo server and client version


**Documentation:** NA
**Unit test:** NA
**Issue:** NA
**Dependencies:**  None

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
am-kinetica b54b19ba1c
community[minor]: Implemented Kinetica Document Loader and added notebooks (#20002)
- [ ] **Kinetica Document Loader**: "community: a class to load
Documents from Kinetica"



- [ ] **Kinetica Document Loader**: 
- **Description:** implemented KineticaLoader in `kinetica_loader.py`
- **Dependencies:** install the Kinetica API using `pip install
gpudb==7.2.0.1 `
5 months ago
Shengsheng Huang fd1061e7bf
community[patch]: add more data types support to ipex-llm llm integration (#20833)
- **Description**:  
- **add support for more data types**: by default `IpexLLM` will load
the model in int4 format. This PR adds more data types support such as
`sym_in5`, `sym_int8`, etc. Data formats like NF3, NF4, FP4 and FP8 are
only supported on GPU and will be added in future PR.
    - Fix a small issue in saving/loading, update api docs
- **Dependencies**: `ipex-llm` library
- **Document**: In `docs/docs/integrations/llms/ipex_llm.ipynb`, added
instructions for saving/loading low-bit model.
- **Tests**: added new test cases to
`libs/community/tests/integration_tests/llms/test_ipex_llm.py`, added
config params.
- **Contribution maintainer**: @shane-huang
5 months ago
Rahul Triptahi dc921f0823
community[patch]: Add semantic info to metadata, classified by pebblo-server. (#20468)
Description: Add support for Semantic topics and entities.
Classification done by pebblo-server is not used to enhance metadata of
Documents loaded by document loaders.
Dependencies: None
Documentation: Updated.

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
5 months ago
Jingpan Xiong 1202017c56
community[minor]: Add relyt vector database (#20316)
Co-authored-by: kaka <kaka@zbyte-inc.cloud>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: jingsi <jingsi@leadincloud.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
davidefantiniIntel f386f71bb3
community: fix tqdm import (#20263)
Description: Fix tqdm import in QuantizedBiEncoderEmbeddings
5 months ago
Andres Algaba 05ae8ca7d4
community[patch]: deprecate persist method in Chroma (#20855)
Thank you for contributing to LangChain!

- [x] **PR title**

- [x] **PR message**:
- **Description:** Deprecate persist method in Chroma no longer exists
in Chroma 0.4.x
    - **Issue:** #20851 
    - **Dependencies:** None
    - **Twitter handle:** AndresAlgaba1

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Tomaz Bratanic 520972fd0f
community[patch]: Support passing graph object to Neo4j integrations (#20876)
For driver connection reusage, we introduce passing the graph object to
neo4j integrations
5 months ago
Lei Zhang 748a6ae609
community[patch]: add HTTP response headers Content-Type to metadata of RecursiveUrlLoader document (#20875)
**Description:** 
The RecursiveUrlLoader loader offers a link_regex parameter that can
filter out URLs. However, this filtering capability is limited, and if
the internal links of the website change, unexpected resources may be
loaded. These resources, such as font files, can cause problems in
subsequent embedding processing.

>
https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf

We can add the Content-Type in the HTTP response headers to the document
metadata so developers can choose which resources to use. This allows
developers to make their own choices.

For example, the following may be a good choice for text knowledge.

- text/plain - simple text file
- text/html - HTML web page
- text/xml - XML format file
- text/json - JSON format data
- application/pdf - PDF file
- application/msword - Word document

and ignore the following

- text/css - CSS stylesheet
- text/javascript - JavaScript script
- application/octet-stream - binary data
- image/jpeg - JPEG image
- image/png - PNG image
- image/gif - GIF image
- image/svg+xml - SVG image
- audio/mpeg - MPEG audio files
- video/mp4 - MP4 video file
- application/font-woff - WOFF font file
- application/font-ttf - TTF font file
- application/zip - ZIP compressed file
- application/octet-stream - binary data

**Twitter handle:** @coolbeevip

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Joan Fontanals baefbfb14e
community[mionr]: add Jina Reranker in retrievers module (#19406)
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Rerank API
- **Twitter handle:** https://twitter.com/JinaAI_


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Jason_Chen 53bb7dbd29
community[patch]: add BeautifulSoupTransformer remove_unwanted_classnames method (#20467)
Add the remove_unwanted_classnames method to the
BeautifulSoupTransformer class, which can filter more effectively.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur 5b83130855
core[minor], langchain[patch], community[patch]: mv StructuredQuery (#20849)
mv StructuredQuery to core
5 months ago
Mish Ushakov 6ccecf2363
community[minor]: added Browserbase loader (#20478) 5 months ago
ccurme 481d3855dc
patch: remove usage of llm, chat model __call__ (#20788)
- `llm(prompt)` -> `llm.invoke(prompt)`
- `llm(prompt=prompt` -> `llm.invoke(prompt)` (same with `messages=`)
- `llm(prompt, callbacks=callbacks)` -> `llm.invoke(prompt,
config={"callbacks": callbacks})`
- `llm(prompt, **kwargs)` -> `llm.invoke(prompt, **kwargs)`
5 months ago
Raghav Dixit 9b7fb381a4
community[patch]: LanceDB integration patch update (#20686)
Description : 

- added functionalities - delete, index creation, using existing
connection object etc.
- updated usage 
- Added LaceDB cloud OSS support

make lint_diff , make test checks done
5 months ago
volodymyr-memsql 493afe4d8d
community[patch]: add hybrid search to singlestoredb vectorstore (#20793)
Implemented the ability to enable full-text search within the
SingleStore vector store, offering users a versatile range of search
strategies. This enhancement allows users to seamlessly combine
full-text search with vector search, enabling the following search
strategies:

* Search solely by vector similarity.
* Conduct searches exclusively based on text similarity, utilizing
Lucene internally.
* Filter search results by text similarity score, with the option to
specify a threshold, followed by a search based on vector similarity.
* Filter results by vector similarity score before conducting a search
based on text similarity.
* Perform searches using a weighted sum of vector and text similarity
scores.

Additionally, integration tests have been added to comprehensively cover
all scenarios.
Updated notebook with examples.

CC: @baskaryan, @hwchase17

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Tomaz Bratanic 9efab3ed66
community[patch]: Add driver config param for neo4j graph (#20772)
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Leonid Ganeline 13751c3297
community: `tigergraph` fixes (#20034)
- added guard on the `pyTigerGraph` import
- added a missed example page in the `docs/integrations/graphs/`
- formatted the `docs/integrations/providers/` page to the consistent
format. Added links.
5 months ago
Martin Kolb 0186e4e633
community[patch]: Advanced filtering for HANA Cloud Vector Engine (#20821)
- **Description:**
This PR adds support for advanced filtering to the integration of HANA
Vector Engine.
The newly supported filtering operators are: $eq, $ne, $gt, $gte, $lt,
$lte, $between, $in, $nin, $like, $and, $or

  - **Issue:** N/A
  - **Dependencies:** no new dependencies added

Added integration tests to:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`

Description of the new capabilities in notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`
5 months ago
Alex Sherstinsky 12e5ec6de3
community: Support both Predibase SDK-v1 and SDK-v2 in Predibase-LangChain integration (#20859) 5 months ago
JeffKatzy 5ab3f9a995
community[patch]: standardize chat init args (#20844)
Thank you for contributing to LangChain!

community:perplexity[patch]: standardize init args

updated pplx_api_key and request_timeout so that aliased to api_key, and
timeout respectively. Added test that both continue to set the same
underlying attributes.

Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Massimiliano Pronesti 8d1167b32f
community[patch]: add support for similarity_score_threshold search in… (#20852)
See
https://github.com/langchain-ai/langchain/issues/20600#issuecomment-2075569338
for details.

@chrislrobert
5 months ago
Eugene Yurtsev 30e48c9878
core[patch],community[patch]: Move file chat history back to community (#20834)
Marking as patch since we haven't had releases in between. This just reverting part of a PR from yesterday.
5 months ago
Nestor Qin 9111d3a636
community[patch]: Fix message formatting for Anthropic models on Amazon Bedrock (#20801)
**Description:**
This PR fixes an issue in message formatting function for Anthropic
models on Amazon Bedrock.

Currently, LangChain BedrockChat model will crash if it uses Anthropic
models and the model return a message in the following type:
- `AIMessageChunk`

Moreover, when use BedrockChat with for building Agent, the following
message types will trigger the same issue too:
- `HumanMessageChunk`
- `FunctionMessage`

**Issue:**
https://github.com/langchain-ai/langchain/issues/18831

**Dependencies:**
No.

**Testing:**
Manually tested. The following code was failing before the patch and
works after.

```
@tool
def square_root(x: str):
    "Useful when you need to calculate the square root of a number"
    return math.sqrt(int(x))

llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={ "temperature": 0.0 },
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", FUNCTION_CALL_PROMPT),
        ("human", "Question: {user_input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

tools = [square_root]
tools_string = format_tool_to_anthropic_function(square_root)

agent = (
        RunnablePassthrough.assign(
            user_input=lambda x: x['user_input'],
            agent_scratchpad=lambda x: format_to_openai_function_messages(
                x["intermediate_steps"]
            )
        )
        | prompt
        | llm
        | AnthropicFunctionsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
output = agent_executor.invoke({
    "user_input": "What is the square root of 2?",
    "tools_string": tools_string,
})
```
List of messages returned from Bedrock:
```
<SystemMessage> content='You are a helpful assistant.'
<HumanMessage> content='Question: What is the square root of 2?'
<AIMessageChunk> content="Okay, let's calculate the square root of 2.<scratchpad>\nTo calculate the square root of a number, I can use the square_root tool:\n\n<function_calls>\n  <invoke>\n    <tool_name>square_root</tool_name>\n    <parameters>\n      <__arg1>2</__arg1>\n    </parameters>\n  </invoke>\n</function_calls>\n</scratchpad>\n\n<function_results>\n<search_result>\nThe square root of 2 is approximately 1.414213562373095\n</search_result>\n</function_results>\n\n<answer>\nThe square root of 2 is approximately 1.414213562373095\n</answer>" id='run-92363df7-eff6-4849-bbba-fa16a1b2988c'"
<FunctionMessage> content='1.4142135623730951' name='square_root'
```
5 months ago
Aliaksandr Kuzmik 5560cc448c
community[patch]: fix CometTracer bug (#20796)
Hi! My name is Alex, I'm an SDK engineer from
[Comet](https://www.comet.com/site/)

This PR updates the `CometTracer` class.

Fixed an issue when `CometTracer` failed while logging the data to Comet
because this data is not JSON-encodable.

The problem was in some of the `Run` attributes that could contain
non-default types inside, now these attributes are taken not from the
run instance, but from the `run.dict()` return value.
5 months ago
Eugene Yurtsev 645b1e142e
core[minor],langchain[patch],community[patch]: Move InMemory and File implementations of Chat History to core (#20752)
This PR moves the implementations for chat history to core. So it's
easier to determine which dependencies need to be broken / add
deprecation warnings
5 months ago
ccurme 7a922f3e48
core, openai: support custom token encoders (#20762) 5 months ago
Christophe Bornet 0ae5027d98
community[patch]: Remove usage of deprecated StoredBlobHistory in CassandraChatMessageHistory (#20666) 5 months ago
Eugene Yurtsev 38adbfdf34
community[patch],core[minor]: Move BaseToolKit to core.tools (#20669) 5 months ago
Mark Needham ce23f8293a
Community patch clickhouse make it possible to not specify index (#20460)
Vector indexes in ClickHouse are experimental at the moment and can
sometimes break/change behaviour. So this PR makes it possible to say
that you don't want to specify an index type.

Any queries against the embedding column will be brute force/linear
scan, but that gives reasonable performance for small-medium dataset
sizes.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
ccurme c010ec8b71
patch: deprecate (a)get_relevant_documents (#20477)
- `.get_relevant_documents(query)` -> `.invoke(query)`
- `.get_relevant_documents(query=query)` -> `.invoke(query)`
- `.get_relevant_documents(query, callbacks=callbacks)` ->
`.invoke(query, config={"callbacks": callbacks})`
- `.get_relevant_documents(query, **kwargs)` -> `.invoke(query,
**kwargs)`

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Matheus Henrique Raymundo bb69819267
community: Fix the stop sequence key name for Mistral in Bedrock (#20709)
Fixing the wrong stop sequence key name that causes an error on AWS
Bedrock.
You can check the MistralAI bedrock parameters
[here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral.html)
This change fixes this
[issue](https://github.com/langchain-ai/langchain/issues/20095)
5 months ago
Bagatur 1c7b3c75a7
community[patch], experimental[patch]: support tool-calling sql and p… (#20639)
d agents
5 months ago
shumway743 cb6e5e56c2
community[minor]: add graph store implementation for apache age (#20582)
**Description:** implemented GraphStore class for Apache Age graph db

**Dependencies:** depends on psycopg2

Unit and integration tests included. Formatting and linting have been
run.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Christophe Bornet c909ae0152
community[minor]: Add async methods to CassandraVectorStore (#20602)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Dmitry Tyumentsev f111efeb6e
community[patch]: YandexGPT API add ability to disable request logging (#20670)
Closes (#20622)

Added the ability to [disable logging of requests to
YandexGPT](https://yandex.cloud/en/docs/foundation-models/operations/yandexgpt/disable-logging).
5 months ago
Tomaz Bratanic 8c08cf4619
community: Add support for relationship indexes in neo4j vector (#20657)
Neo4j has added relationship vector indexes.
We can't populate them, but we can use existing indexes for retrieval
5 months ago
Charlie Holtz 1cbab0ebda
community: update Replicate to work with official models (#20633)
Description: you don't need to pass a version for Replicate official
models. That was broken on LangChain until now!

You can now run: 

```
llm = Replicate(
    model="meta/meta-llama-3-8b-instruct",
    model_kwargs={"temperature": 0.75, "max_length": 500, "top_p": 1},
)
prompt = """
User: Answer the following yes/no question by reasoning step by step. Can a dog drive a car?
Assistant:
"""
llm(prompt)
```

I've updated the replicate.ipynb to reflect that.

twitter: @charliebholtz

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Congyu dd5139e304
community[patch]: truncate zhipuai `temperature` and `top_p` parameters to [0.01, 0.99] (#20261)
ZhipuAI API only accepts `temperature` parameter between `(0, 1)` open
interval, and if `0` is passed, it responds with status code `400`.

However, 0 and 1 is often accepted by other APIs, for example, OpenAI
allows `[0, 2]` for temperature closed range.

This PR truncates temperature parameter passed to `[0.01, 0.99]` to
improve the compatibility between langchain's ecosystem's and ZhipuAI
(e.g., ragas `evaluate` often generates temperature 0, which results in
a lot of 400 invalid responses). The PR also truncates `top_p` parameter
since it has the same restriction.

Reference: [glm-4 doc](https://open.bigmodel.cn/dev/api#glm-4) (which
unfortunately is in Chinese though).

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Lance Martin d5c22b80a5
community[patch]: Fix Ollama for LLaMA3 (#20624)
We see verbose generations w/ LLaMA3 and Ollama - 

https://smith.langchain.com/public/88c4cd21-3d57-4229-96fe-53443398ca99/r

--- 

Fix here implies that when stop was being set to an empty list, the
stream had no conditions under which to stop, which could lead to
excessive or unintended output.

Test LLaMA2 - 

https://smith.langchain.com/public/57dfc64a-591b-46fa-a1cd-8783acaefea2/r

Test LLaMA3 - 

https://smith.langchain.com/public/76ff5f47-ac89-4772-a7d2-5caa907d3fd6/r

https://smith.langchain.com/public/a31d2fad-9094-4c93-949a-964b27630ccb/r

Test Mistral -

https://smith.langchain.com/public/a4fe7114-c308-4317-b9fd-6c86d31f1c5b/r

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
hulitaitai 7d0a008744
community[minor]: Add audio-parser "faster-whisper" in audio.py (#20012)
faster-whisper is a reimplementation of OpenAI's Whisper model using
CTranslate2, which is up to 4 times faster than enai/whisper for the
same accuracy while using less memory. The efficiency can be further
improved with 8-bit quantization on both CPU and GPU.

It can automatically detect the following 14 languages and transcribe
the text into their respective languages: en, zh, fr, de, ja, ko, ru,
es, th, it, pt, vi, ar, tr.

The gitbub repository for faster-whisper is :
    https://github.com/SYSTRAN/faster-whisper

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Guangdong Liu e3c2431c5b
comminuty[patch]:Fix Error in apache doris insert (#19989)
- **Issue:** #19886
5 months ago
Tomaz Bratanic 27370b679e
community[patch]: Ignore null and invalid embedding values for neo4j metadata filtering (#20558) 5 months ago
Massimiliano Pronesti 2542a09abc
community[patch]: AzureSearch incorrectly converted to retriever (#20601)
Closes #20600.

Please see the issue for more details.
5 months ago
Christophe Bornet 8f0b5687a3
community[minor]: Add hybrid search to Cassandra VectorStore (#20286)
Only supported by Astra DB at the moment.
**Twitter handle:** cbornet_
5 months ago
Christophe Bornet d2d01370bc
community[minor]: Add async methods to CassandraLoader (#20609)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
balloonio e786da7774
community[patch]: Invoke callback prior to yielding token fix [HuggingFaceTextGenInference] (#20426)
…gFaceTextGenInference)

- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for [HuggingFaceTextGenInference]


- [x] **PR message**: 
- **Description:** Invoke callback prior to yielding token in stream
method in [HuggingFaceTextGenInference]
    - **Issue:** https://github.com/langchain-ai/langchain/issues/16913
    - **Dependencies:** None
    - **Twitter handle:** @bolun_zhang

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
5 months ago
Ethan Yang 2d6d796040
community: Add save_model function for openvino reranker and embedding (#19896) 5 months ago
zR 9c1d7f2405
update zhipuai notebook (#20595)
fix timeout issue
fix zhipuai usecase notebookbook

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
5 months ago
ccurme c897264b9b
community: (milvus) check for num_shards (#20603)
@rgupta2508 I believe this change is necessary following
https://github.com/langchain-ai/langchain/pull/20318 because of how
Milvus handles defaults:


59bf5e811a/pymilvus/client/prepare.py (L82-L85)
```python
num_shards = kwargs[next(iter(same_key))]
if not isinstance(num_shards, int):
    msg = f"invalid num_shards type, got {type(num_shards)}, expected int"
    raise ParamError(message=msg)
req.shards_num = num_shards
```
this way lets Milvus control the default value (instead of maintaining a
separate default in Langchain).

Let me know if I've got this wrong or you feel it's unnecessary. Thanks.
5 months ago