Commit Graph

8312 Commits (a6cbb755a769fe5ba8184c26559300715c05e10f)
 

Author SHA1 Message Date
Clément Tamines a6cbb755a7
community[patch]: fix semantic answer bug in AzureSearch vector store (#18938)
- **Description:** The `semantic_hybrid_search_with_score_and_rerank`
method of `AzureSearch` contains a hardcoded field name "metadata" for
the document metadata in the Azure AI Search Index. Adding such a field
is optional when creating an Azure AI Search Index, as other snippets
from `AzureSearch` test for the existence of this field before trying to
access it. Furthermore, the metadata field name shouldn't be hardcoded
as "metadata" and use the `FIELDS_METADATA` variable that defines this
field name instead. In the current implementation, any index without a
metadata field named "metadata" will yield an error if a semantic answer
is returned by the search in
`semantic_hybrid_search_with_score_and_rerank`.

- **Issue:** https://github.com/langchain-ai/langchain/issues/18731

- **Prior fix to this bug:** This bug was fixed in this PR
https://github.com/langchain-ai/langchain/pull/15642 by adding a check
for the existence of the metadata field named `FIELDS_METADATA` and
retrieving a value for the key called "key" in that metadata if it
exists. If the field named `FIELDS_METADATA` was not present, an empty
string was returned. This fix was removed in this PR
https://github.com/langchain-ai/langchain/pull/15659 (see
ed1ffca911#).
@lz-chen: could you confirm this wasn't intentional? 

- **New fix to this bug:** I believe there was an oversight in the logic
of the fix from
[#1564](https://github.com/langchain-ai/langchain/pull/15642) which I
explain below.
The `semantic_hybrid_search_with_score_and_rerank` method creates a
dictionary `semantic_answers_dict` with semantic answers returned by the
search as follows.

5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)
The keys in this dictionary are the unique document ids in the index, if
I understand the [documentation of semantic
answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers)
in Azure AI Search correctly. When the method transforms a search result
into a `Document` object, an "answer" key is added to the document's
metadata. The value for this "answer" key should be the semantic answer
returned by the search from this document, if such an answer is
returned. The match between a `Document` object and the semantic answers
returned by the search should be done through the unique document id,
which is used as a key for the `semantic_answers_dict` dictionary. This
id is defined in the search result's field named `FIELDS_ID`. I added a
check to avoid any error in case no field named `FIELDS_ID` exists in a
search result (which shouldn't happen in theory).
A benefit of this approach is that this fix should work whether or not
the Azure AI Search Index contains a metadata field.

@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?

Thanks for the help!
6 months ago
miri-bar 55db737302
ai21[minor]: AI21 Labs Semantic Text Splitter support (#19510)
Description: Added support for AI21 Labs model - Segmentation, as a Text
Splitter
Dependencies: ai21, langchain-text-splitter
Twitter handle: https://github.com/AI21Labs

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
Anindyadeep b2a11ce686
community[minor]: Prem AI langchain integration (#19113)
### Prem SDK integration in LangChain

This PR adds the integration with [PremAI's](https://www.premai.io/)
prem-sdk with langchain. User can now access to deployed models
(llms/embeddings) and use it with langchain's ecosystem. This PR adds
the following:

### This PR adds the following:

- [x]  Add chat support
- [X]  Adding embedding support
- [X]  writing integration tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  writing unit tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  Adding documentation
    - [X]  writing documentation for chat
    - [X]  writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff` 
- [X]  Final checks (spell check, lint, format and overall testing)

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Alessandro D'Armiento 37eb3a4a9e
docs: Some import nits (#19130)
- **Description:** fixes some minor issues in the documentation

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
Souhail Hanfi cbec43afa9
community[patch]: avoid creating extension PGvector while using readOnly Databases (#19268)
- **Description:** PgVector class always runs "create extension" on init
and this statement crashes on ReadOnly databases (read only replicas).
but wierdly the next create collection etc work even in readOnly
databases
- **Dependencies:** no new dependencies
- **Twitter handle:** @VenOmaX666

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Dixing (Dex) Xu 903541f439
docs: update dependecy for autogpt/marathon.ipynb (#19491)
fixes the import error from notebook based on the
[documentation](https://api.python.langchain.com/en/latest/agents/langchain_experimental.agents.agent_toolkits.pandas.base.create_pandas_dataframe_agent.html)

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Mauricio Cruz fb9ce95184
cli[patch]: Fix Tuple typing problem when create new langchain app (#19141)
Thank you for contributing to LangChain!

When run command langchain app new my-app, i get this error:

File
"/home/mauricio/.local/lib/python3.8/site-packages/langchain_cli/utils/pyproject.py",
line 15, in <module>
pyproject_toml: Path, local_editable_dependencies: Iterable[tuple[str,
Path]]
TypeError: 'type' object is not subscriptable

This PR fix the error.
6 months ago
Anthony Shaw 6c9b0f96f3
docs: Add guidance for splitting Chinese, Japanese, and Thai (#19295)
The existing default list of separators for the `RecursiveTextSplitter`
assumes spaces are word boundaries. Some languages [don't use spaces
between
words](https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries)
(Chinese, Japanese, Thai, Burmese).

This PR extends the documentation to explain how to cater for those
languages by adding additional punctuation to the separators and
zero-width spaces which are used by some typesetters and will assist the
splitter to not split in words.

Ideally, **these separators could be a constant in the module** but for
now, defining them in the documentation is a start.
6 months ago
Erick Friis 441a8012b3
mistralai[patch]: release 0.1.0 (#19540) 6 months ago
Barun Amalkumar Halder 9246ec6b36
community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293)
**Description:**
- minor PR to speed up onboarding by not trying to add a dataset, if a
model is already present.
- replace batch publish API with streaming when single events are
published.

**Dependencies:** any dependencies required for this change
**Twitter handle:** behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
6 months ago
JSDu 6e090280fd
community[patch]: milvus will autoflush, manual flush is slowly (#19300)
reference:


https://milvus.io/docs/configure_quota_limits.md#quotaAndLimitsflushRateenabled

https://github.com/milvus-io/milvus/issues/31407

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
mackong e65dc4b95b
community[patch]: clean warning when delete by ids (#19301)
* Description: rearrange to avoid variable overwrite, which cause
warning always.
* Issue: N/A
* Dependencies: N/A
6 months ago
Ian d5415dbd68
docs: improve tidb integrations documents (#19321)
This PR aims to enhance the documentation for TiDB integration, driven
by feedback from our users. It provides detailed introductions to key
features, ensuring developers can fully leverage TiDB for AI application
development.
6 months ago
Stefano Mosconi 01fc69c191
community[patch]: expanding version in confluence loader (#19324)
**Description:**
Expanding version in all the Confluence API calls so to get when the
page was last modified/created in all cases.

**Issue:** #12812 
**Twitter handle:** zzste
6 months ago
Dmitry Tyumentsev 08b769d539
community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341)
Fixed inability to work with [yandexcloud
SDK](https://pypi.org/project/yandexcloud/) version higher 0.265.0
6 months ago
Marlene f1313339ac
community[patch]: Fixing incorrect base URLs for Azure Cognitive Search Retriever (#19352)
This PR adds code to make sure that the correct base URL is being
created for the Azure Cognitive Search retriever. At the moment an
incorrect base URL is being generated. I think this is happening because
the original code was based on a depreciated API version. No
dependencies need to be added. I've also added more context to the test
doc strings.

I should also note that ACS is now Azure AI Search. I will open a
separate PR to make these changes as that would be a breaking change and
should potentially be discussed.

Twitter: @marlene_zw



- No new tests added, however the current ACS retriever tests are now
passing when I run them.
- Code was linted.

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Tridib Roy Arjo d667b1ea8f
docs: Update async_chromium.ipynb (#19514)
In Jupyter, asyncio would throw an error before `.load()` unless
`nest_asyncio` is applied (Issue #8494 mentioned this)

+Minor typo fixes..
6 months ago
Bob Lin 5b6b1f9e1d
docs: Fix several sample code errors (#19382) 6 months ago
FinTech秋田 03ba1d4731
community[patch]: Add Support for GPU Index Types in Milvus 2.4 (#19468)
- **Description:** This commit introduces support for the newly
available GPU index types introduced in Milvus 2.4 within the LangChain
project's `milvus.py`. With the release of Milvus 2.4, a range of
GPU-accelerated index types have been added, offering enhanced search
capabilities and performance optimizations for vector search operations.
This update ensures LangChain users can fully utilize the new
performance benefits for vector search operations.
    - Reference: https://milvus.io/docs/gpu_index.md

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Hamid Ali c281ec8887
docs: Fix broken link in semantic-chunker.ipynb (#19464)
Corrected a broken link within the semantic-chunker.ipynb notebook,
ensuring that users can access the referenced resource.

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Ash Vardanian d01bad5169
core[patch]: Convert SimSIMD back to NumPy (#19473)
This patch fixes the #18022 issue, converting the SimSIMD internal
zero-copy outputs to NumPy.

I've also noticed, that oftentimes `dtype=np.float32` conversion is used
before passing to SimSIMD. Which numeric types do LangChain users
generally care about? We support `float64`, `float32`, `float16`, and
`int8` for cosine distances and `float16` seems reasonable for
practically any kind of embeddings and any modern piece of hardware, so
we can change that part as well 🤗
6 months ago
Ikko Eltociear Ashimine 980658cb47
docs: Update streaming.ipynb (#19500)
Fixed typo.

occuring -> occurring
6 months ago
Leonid Kuligin 91f4c80143
docs: fixed links (#19503)
- [ ] **PR title**: "docs: fixed broken links"


- [ ] **PR message**:
    - **Description:** fixed links in the documentation
6 months ago
Mikelarg dac2e0165a
community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516)
- **Description:** Added integration with
[GigaChat](https://developers.sber.ru/portal/products/gigachat)
embeddings. Also added support for extra fields in GigaChat LLM and
fixed docs.
6 months ago
Martin Kolb e5bdb26f76
community[patch]: More flexible handling for entity names in vector store "HANA Cloud" (#19523)
- **Description:** Added support for lower-case and mixed-case names
The names for tables and columns previouly had to be UPPER_CASE.
With this enhancement, also lower_case and MixedCase are supported,


  - **Issue:** N/A
  - **Dependencies:** no new dependecies added
  - **Twitter handle:** @sapopensource
6 months ago
Erica Clark a1ff21f90f
docs: Update local llms article to use invoke instead of deprecated __call__ (#19528)
- **Description:** Since the implicit `__call__` has been deprecated in
favor of `invoke`, the local_llms article also needed to be updated.
This article was my introduction to Lanchain, and as it was helpful in
getting me setup with running LLMs locally, it is nice to not have any
warnings when running the example code. With this change, the warnings
go away when running the example code.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** clarkerican
6 months ago
Orest Xherija 0b1e09029f
openai[patch]: increase max batch size for Azure OpenAI Embeddings API (#19532)
**Description:** Azure OpenAI has increased its maximum batch size from
16 to 2048 for the Embeddings API per this How-To
[page](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=console#best-practices)

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Eugene Yurtsev 56f4c5459b
core[patch]: fix xml output parser transform (#19530)
Previous PR passed _parser attribute which apparently is not meant to be
used by user code and causes non deterministic failures on CI when
testing the transform and a transform methods. Reverting this change
temporarily.
6 months ago
Erick Friis e6952b04d5
cohere[patch]: fix release (#19529) 6 months ago
aditya thomas aa68fd7e91
core[runnables]: docstring for class runnable, method with_listeners() (#19515)
**Description:** Docstring for method with_listerners() of class
Runnable
**Issue:** [Add in code documentation to core Runnable methods
#18804](https://github.com/langchain-ai/langchain/issues/18804)
**Dependencies:** None
6 months ago
billytrend-cohere 63343b4987
cohere[patch]: add cohere as a partner package (#19049)
Description: adds support for langchain_cohere

---------

Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
6 months ago
Eugene Yurtsev 727d5023ce
core[patch]: Use defusedxml in XMLOutputParser (#19526)
This mitigates a security concern for users still using older versions of libexpat that causes an attacker to compromise the availability of the system if an attacker manages to surface malicious payload to this XMLParser.
6 months ago
Zachary Wilkins e1a6341940
langchain: Passthrough batch_size on index()/aindex() calls (#19443)
**Description:** This change passes through `batch_size` to
`add_documents()`/`aadd_documents()` on calls to `index()` and
`aindex()` such that the documents are processed in the expected batch
size.
**Issue:** #19415
**Dependencies:** N/A
**Twitter handle:** N/A
6 months ago
ccurme 82de8fd6c9
add kwargs (#19519)
`HanaDB.add_texts` is missing **kwargs.
6 months ago
Nikhil Kumar 3d3b46a782
docs: Update docs for `HuggingFacePipeline` (#19306)
Updated `HuggingFacePipeline` docs to be in sync with list of supported
tasks, including translation.

- [x] **PR title**: "community: Update docs for `HuggingFacePipeline`"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**:
- **Description:** Update docs for `HuggingFacePipeline`, was earlier
missing `translation` as a valid task
    - **Issue:** N/A
    - **Dependencies:** N/A
    - **Twitter handle:** None


- [x] **Add tests and docs**:


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
6 months ago
Igor Muniz Soares 743f888580
community[minor]: Dappier chat model integration (#19370)
**Description:** 

This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.


**Dependencies:** 
    No extra dependencies are needed.
6 months ago
Jacob Lezberg 64e1df3d3a
infra: Update package version to apply CVE-related patch (#19490)
- **Description:** [CVE
2024-21503](https://www.cve.org/CVERecord?id=CVE-2024-21503) was
recently identified. The python linter "black" suffers from a potential
Regex-related denial of service attack. Updated version from the
vulnerable 24.2.0 to the patched 24.3.0.
- **Issue:** N/A
- **Dependencies:** The 'black' package in both `langchain` (top-level)
and `templates/python-lint`.

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Hugoberry 96dc180883
community[minor]: Add `DuckDB` as a vectorstore (#18916)
DuckDB has a cosine similarity function along list and array data types,
which can be used as a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list or
array column types. This PR surfaces this functionality to langchain.
    - **Dependencies:** duckdb 0.10.0
    - **Twitter handle:** @igocrite

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Ethan Yang fa6397d76a
docs: Add OpenVINO llms docs (#19489)
Add OpenVINOpipeline instructions in docs. OpenVINO users can find more
details in this page.
6 months ago
preak95 6ea3e57a63
community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (#19270)
**Description:** Update s3_file.py to use arguments **mode** and
**post_processors** from the base class **UnstructuredBaseLoader** to
include more metadata about the files from the S3 bucket such as
*'page_number', 'languages'* etc.

**Issue:** NA
**Dependencies:** None
**Twitter handle:** preak95

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Guangdong Liu 560e2182d8
docs: docstring Runnable `pipe` and `pick` methods (docs only) (#19395)
- **Issue:**  #18804
-  @eyurtsev @ccurme PTAL

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Christophe Bornet 63898dbda0
langchain[patch]: Use async memory in Chain when needed (#19429) 6 months ago
Lance Martin db7403d667
docs: Remove non-rendering images & output spamming from doc ntbks (#19475)
Looking at tokens / page of our docs, we see a few outliers:
<img width="761" alt="image"
src="https://github.com/langchain-ai/langchain/assets/122662504/677aa2d6-0a29-45e4-882a-db2bbf46d02b">

It is due to non-rendering images in one case, and output spamming. 

Clean these, along with other cases of excessing output spamming in
docs.

All get sucked into chat-langchain for retrieval.
6 months ago
Erick Friis b617085af0
mistralai[patch]: streaming tool calls (#19469) 6 months ago
aditya thomas b43a9d5808
docs: adding voyageai to the list of partner packages (#19376)
**Description:** Adding VoyageAI to the list of partners
**Issue:** A standalone langchain-voyageai package has been added
**Dependencies:** None
6 months ago
Zeeland 2549df00cd
docs: fix error bilibili url (#19375)
Thank you for contributing to LangChain!

bilibili-api-python use https://github.com/Nemo2011/bilibili-api repo.
Change to the correct address.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
6 months ago
aditya thomas 375ab7bf59
docs: update module imports for fireworks documentation (#19377)
**Description:** Update module imports for Fireworks documentation
**Issue:** Module imports not present or in incorrect location
**Dependencies:** None
6 months ago
aditya thomas 0cc0467267
docs: update import paths and move to lcel for llama.cpp examples (#19391)
**Description:** Update import paths and move to lcel for llama.cpp
examples
**Issue:** Update import paths to reflect package refactoring and move
chains to LCEL in examples
**Dependencies:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
fengjial 3b52ee05d1
community[patch]: fix bugs in baiduvectordb as vectorstore (#19380)
fix small bugs in vectorstore/baiduvectordb
6 months ago
Cailin Wang 5402aef32e
docs: Add `partition` parameter to DashVector (#19385)
**Description**: Add `partition` parameter to DashVector
dashvector.ipynb
**Related PR**: https://github.com/langchain-ai/langchain/pull/19023
**Twitter handle**: @CailinWang_

---------

Co-authored-by: root <root@Bluedot-AI>
6 months ago