Commit Graph

86 Commits (e5f5d9ff563b6f71bb3ebbccac729c10fb851b3a)

Author SHA1 Message Date
Ethan Yang 2d6d796040
community: Add save_model function for openvino reranker and embedding (#19896) 5 months ago
Erick Friis f09bd0b75b
upstage: init package (#20574)
Co-authored-by: Sean Cho <sean@upstage.ai>
Co-authored-by: JuHyung-Son <sonju0427@gmail.com>
5 months ago
pjb157 479be3cc91
community[minor]: Unify Titan Takeoff Integrations and Adding Embedding Support (#18775)
**Community: Unify Titan Takeoff Integrations and Adding Embedding
Support**

 **Description:** 
Titan Takeoff no longer reflects this either of the integrations in the
community folder. The two integrations (TitanTakeoffPro and
TitanTakeoff) where causing confusion with clients, so have moved code
into one place and created an alias for backwards compatibility. Added
Takeoff Client python package to do the bulk of the work with the
requests, this is because this package is actively updated with new
versions of Takeoff. So this integration will be far more robust and
will not degrade as badly over time.

**Issue:**
Fixes bugs in the old Titan integrations and unified the code with added
unit test converge to avoid future problems.

**Dependencies:**
Added optional dependency takeoff-client, all imports still work without
dependency including the Titan Takeoff classes but just will fail on
initialisation if not pip installed takeoff-client

**Twitter**
@MeryemArik9

Thanks all :)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Egor Krasheninnikov c8391d4ff1
community[patch]: Fix YandexGPT embeddings (#19720)
Fix of YandexGPT embeddings. 

The current version uses a single `model_name` for queries and
documents, essentially making the `embed_documents` and `embed_query`
methods the same. Yandex has a different endpoint (`model_uri`) for
encoding documents, see
[this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The
bug may impact retrievers built with `YandexGPTEmbeddings` (for instance
FAISS database as retriever) since they use both `embed_documents` and
`embed_query`.

A simple snippet to test the behaviour:
```python
from langchain_community.embeddings.yandex import YandexGPTEmbeddings
embeddings = YandexGPTEmbeddings()
q_emb = embeddings.embed_query('hello world')
doc_emb = embeddings.embed_documents(['hello world', 'hello world'])
q_emb == doc_emb[0]
```
The response is `True` with the current version and `False` with the
changes I made.


Twitter: @egor_krash

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Leonid Ganeline 7cf2d2759d
community[patch]: docstrings update (#20301)
Added missed docstrings. Format docstings to the consistent form.
5 months ago
Leonid Ganeline 4cb5f4c353
community[patch]: import flattening fix (#20110)
This PR should make it easier for linters to do type checking and for IDEs to jump to definition of code.

See #20050 as a template for this PR.
- As a byproduct: Added 3 missed `test_imports`.
- Added missed `SolarChat` in to __init___.py Added it into test_import
ut.
- Added `# type: ignore` to fix linting. It is not clear, why linting
errors appear after ^ changes.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Ethan Yang 48f84e253e
community[minor]: Add OpenVINO rerank model support (#19791)
@eaidova @AlexKoff88 Could you help to review, thanks

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
shahrin014 f51e6a35ba
community[patch]: OllamaEmbeddings - Pass headers to post request (#16880)
## Feature
- Set additional headers in constructor
- Headers will be sent in post request

This feature is useful if deploying Ollama on a cloud service such as
hugging face, which requires authentication tokens to be passed in the
request header.

## Tests
- Test if header is passed
- Test if header is not passed

Similar to https://github.com/langchain-ai/langchain/pull/15881

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
高璟琦 ec7a59c96c
community[minor]: Add solar embedding (#19761)
Solar is a large language model developed by
[Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM.
You can visit the embedding service provided by Solar within this pr.

You may get **SOLAR_API_KEY** from
https://console.upstage.ai/services/embedding
You can refer to more details about accepted llm integration at
https://python.langchain.com/docs/integrations/llms/solar.
6 months ago
Ethan Yang 7164015135
community[minor]: Add Openvino embedding support (#19632)
This PR is used to support both HF and BGE embeddings with openvino

---------

Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
6 months ago
T Cramer 540ebf35a9
community[patch]: Add explicit error message to Bedrock error output. (#17328)
- **Description:** Propagate Bedrock errors into Langchain explicitly.
Use-case: unset region error is hidden behind 'Could not load
credentials...' message
- **Issue:**
[17654](https://github.com/langchain-ai/langchain/issues/17654)
  - **Dependencies:** None

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
kYLe 124ab79c23
community[minor]: Add Anyscale embedding support (#17605)
**Description:** Add embedding model support for Anyscale Endpoint
**Dependencies:** openai

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
Lance Martin 12843f292f
community[patch]: llama cpp embeddings reset default n_batch (#17594)
When testing Nomic embeddings --
```
from langchain_community.embeddings import LlamaCppEmbeddings
embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf"
embd_lc = LlamaCppEmbeddings(model_path=embd_model_path)
embedding_lc = embd_lc.embed_query(query)
```

We were seeing this error for strings > a certain size -- 
```
File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count)
    824     s_sizes = []
    826 # add to batch
--> 827 self._batch.add_sequence(tokens, len(s_sizes), False)
    828 t_batch += n_tokens
    829 s_sizes.append(n_tokens)

File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all)
    540 self.batch.token[j] = batch[i]
    541 self.batch.pos[j] = i
--> 542 self.batch.seq_id[j][0] = seq_id
    543 self.batch.n_seq_id[j] = 1
    544 self.batch.logits[j] = logits_all

ValueError: NULL pointer access
```

The default `n_batch` of llama-cpp-python's Llama is `512` but we were
explicitly setting it to `8`.
 
These need to be set to equal for embedding models. 
* The embedding.cpp example has an assertion to make sure these are
always equal.
* Apparently this is not being done properly in llama-cpp-python.

With `n_batch` set to 8, if more than 8 tokens are passed the batch runs
out of space and it crashes.

This also explains why the CPU compute buffer size was small:

raw client with default `n_batch=512`
```
llama_new_context_with_model:        CPU input buffer size   =     3.51 MiB
llama_new_context_with_model:        CPU compute buffer size =    21.00 MiB
```
langchain with `n_batch=8`
```
llama_new_context_with_model:        CPU input buffer size   =     0.04 MiB
llama_new_context_with_model:        CPU compute buffer size =     0.33 MiB
```

We can work around this by passing `n_batch=512`, but this will not be
obvious to some users:
```
    embedding = LlamaCppEmbeddings(model_path=embd_model_path,
                                   n_batch=512)
```

From discussion w/ @cebtenzzre. Related:

https://github.com/abetlen/llama-cpp-python/issues/1189

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
hulitaitai dc2c9dd4d7
Update text2vec.py (#19657)
Add that URL of the embedding tool "text2vec".
Fix minor mistakes in the doc-string.
6 months ago
yuwenzho 3a7d2cf443
community[minor]: Add ITREX optimized Embeddings (#18474)
Introduction
[Intel® Extension for
Transformers](https://github.com/intel/intel-extension-for-transformers)
is an innovative toolkit designed to accelerate GenAI/LLM everywhere
with the optimal performance of Transformer-based models on various
Intel platforms

Description

adding ITREX runtime embeddings using intel-extension-for-transformers.
added mdx documentation and example notebooks
added embedding import testing.

---------

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
Tom Aarsen e0a1278d2b
docs: HFEmbeddings: Add more information to model_kwargs/encode_kwargs (#19594)
- **Description:** Be more explicit with the `model_kwargs` and
`encode_kwargs` for `HuggingFaceEmbeddings`.
    - **Issue:** -
    - **Dependencies:** -

I received some reports by my users that they didn't realise that you
could change the default `batch_size` with `HuggingFaceEmbeddings`,
which may be attributed to how the `model_kwargs` and `encode_kwargs`
don't give much information about what you can specify.

I've added some parameter names & links to the Sentence Transformers
documentation to help clear it up. Let me know if you'd rather have
Markdown/Sphinx-style hyperlinks rather than a "bare URL".

- Tom Aarsen
6 months ago
hulitaitai d7c14cb6f9
community[minor]: Add embeddings integration for text2vec (#19267)
Create a Class which allows to use the "text2vec" open source embedding
model.

It should install the model by running 'pip install -U text2vec'.
Example to call the model through LangChain:

from langchain_community.embeddings.text2vec import Text2vecEmbeddings

            embedding = Text2vecEmbeddings()
            bookend.embed_documents([
                "This is a CoSENT(Cosine Sentence) model.",
"It maps sentences to a 768 dimensional dense vector space.",
            ])
            bookend.embed_query(
                "It can be used for text matching or semantic search."
            )

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
6 months ago
Anindyadeep b2a11ce686
community[minor]: Prem AI langchain integration (#19113)
### Prem SDK integration in LangChain

This PR adds the integration with [PremAI's](https://www.premai.io/)
prem-sdk with langchain. User can now access to deployed models
(llms/embeddings) and use it with langchain's ecosystem. This PR adds
the following:

### This PR adds the following:

- [x]  Add chat support
- [X]  Adding embedding support
- [X]  writing integration tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  writing unit tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  Adding documentation
    - [X]  writing documentation for chat
    - [X]  writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff` 
- [X]  Final checks (spell check, lint, format and overall testing)

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Dmitry Tyumentsev 08b769d539
community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341)
Fixed inability to work with [yandexcloud
SDK](https://pypi.org/project/yandexcloud/) version higher 0.265.0
6 months ago
Mikelarg dac2e0165a
community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516)
- **Description:** Added integration with
[GigaChat](https://developers.sber.ru/portal/products/gigachat)
embeddings. Also added support for extra fields in GigaChat LLM and
fixed docs.
6 months ago
billytrend-cohere 63343b4987
cohere[patch]: add cohere as a partner package (#19049)
Description: adds support for langchain_cohere

---------

Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
6 months ago
Sergey Kozlov 1a55e950aa
community[patch]: support fastembed v1 and v2 (#19125)
**Description:**
#18040 forces `fastembed>2.0`, and this causes dependency conflicts with
the new `unstructured` package (different `onnxruntime`). There may be
other dependency conflicts.. The only way to use
`langchain-community>=0.0.28` is rollback to `unstructured 0.10.X`. But
new `unstructured` contains many fixes.

This PR allows to use both `fastembed` `v1` and `v2`.

How to reproduce:

`pyproject.toml`:
```toml
[tool.poetry]
name = "depstest"
version = "0.0.0"
description = "test"
authors = ["<dev@example.org>"]

[tool.poetry.dependencies]
python = ">=3.10,<3.12"
langchain-community = "^0.0.28"
fastembed = "^0.2.0"
unstructured = {extras = ["pdf"], version = "^0.12"}
```

```bash
$ poetry lock
```

Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
6 months ago
case-k ebc4a64f9e
docs: fix databricks document url (#19096)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Guangdong Liu cced3eb9bc
community[patch]: Fix sparkllm embeddings api bug. (#19122)
- **Description:** Fix sparkllm embeddings api bug.
@baskaryan PTAL
6 months ago
Erick Friis 7ce81eb6f4
voyageai[patch]: init package (#19098)
Co-authored-by: fodizoltan <zoltan@conway.expert>
Co-authored-by: Yujie Qian <thomasq0809@gmail.com>
Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>
6 months ago
billytrend-cohere 7253b816cc
community: Add support for cohere SDK v5 (keeps v4 backwards compatibility) (#19084)
- **Description:** Add support for cohere SDK v5 (keeps v4 backwards
compatibility)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
6 months ago
Leonid Ganeline 9c8523b529
community[patch]: flattening imports 3 (#18939)
@eyurtsev
6 months ago
wt3639 5b5b37a999
community[patch]: Add embedding instruction to HuggingFaceBgeEmbeddings (#18017)
- **Description:** Add embedding instruction to
HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and
other models that need embedding instruction.

---------

Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
Phat Vo 3ecb903d49
community[patch] : Tidy up and update Clarifai SDK functions (#18314)
Description :
* Tidy up, add missing docstring and fix unused params
* Enable using session token
6 months ago
Max Jakob cca0167917
elasticsearch[patch], community[patch]: update references, deprecate community classes (#18506)
Follow up on https://github.com/langchain-ai/langchain/pull/17467.

- Update all references to the Elasticsearch classes to use the partners
package.
- Deprecate community classes.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
Kate Silverstein b7c71e2e07
community[minor]: llamafile embeddings support (#17976)
* **Description:** adds `LlamafileEmbeddings` class implementation for
generating embeddings using
[llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models.
Includes related unit tests and notebook showing example usage.
* **Issue:** N/A
* **Dependencies:** N/A
6 months ago
Yujie Qian cbb65741a7
community[patch]: Voyage AI updates default model and batch size (#17655)
- **Description:** update the default model and batch size in
VoyageEmbeddings
    - **Issue:** N/A
    - **Dependencies:** N/A
    - **Twitter handle:** N/A

---------

Co-authored-by: fodizoltan <zoltan@conway.expert>
7 months ago
Anush 9d663f31fa
community[patch]: FastEmbed to latest (#18040)
## Description

Updates the `langchain_community.embeddings.fastembed` provider as per
the recent updates to [`FastEmbed`](https://github.com/qdrant/fastembed)
library.
7 months ago
Erick Friis eefb49680f
multiple[patch]: fix deprecation versions (#18349) 7 months ago
Dan Stambler 69344a0661
community: Add Laser Embedding Integration (#18111)
- **Description:** Added Integration with Meta AI's LASER
Language-Agnostic SEntence Representations embedding library, which
supports multilingual embedding for any of the languages listed here:
https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200,
including several low resource languages
- **Dependencies:** laser_encoders
7 months ago
Michael Feil 242981b8f0
community[minor]: infinity embedding local option (#17671)
**drop-in-replacement for sentence-transformers
inference.**

https://github.com/langchain-ai/langchain/discussions/17670

tldr from the discussion above -> around a 4x-22x speedup over using
SentenceTransformers / huggingface embeddings. For more info:
https://github.com/michaelfeil/infinity (pure-python dependency)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Guangdong Liu 3ba1cb8650
community[minor]: Add SparkLLM Text Embedding Model and SparkLLM introduction (#17573) 7 months ago
Bagatur a058c8812d
community[patch]: add VoyageEmbeddings truncation (#17638) 7 months ago
Moshe Berchansky 20a56fe0a2
community[minor]: Add QuantizedEmbedders (#17391)
**Description:** 
* adding Quantized embedders using optimum-intel and
intel-extension-for-pytorch.
* added mdx documentation and example notebooks 
* added embedding import testing.

**Dependencies:** 
optimum = {extras = ["neural-compressor"], version = "^1.14.0", optional
= true}
intel_extension_for_pytorch = {version = "^2.2.0", optional = true}

Dependencies have been added to pyproject.toml for the community lib.  

**Twitter handle:** @peter_izsak

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
nvpranak 91bcc9c5c9
community[minor]: Nemo embeddings(#16206)
This PR is adding support for NVIDIA NeMo embeddings issue #16095.

---------

Co-authored-by: Praveen Nakshatrala <pnakshatrala@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Abhishek Jain 37e1275f9e
community[patch]: Fixed the 'aembed' method of 'CohereEmbeddings'. (#16497)
**Description:**
- The existing code was trying to find a `.embeddings` property on the
`Coroutine` returned by calling `cohere.async_client.embed`.
- Instead, the `.embeddings` property is present on the value returned
by the `Coroutine`.
- Also, it seems that the original cohere client expects a value of
`max_retries` to not be `None`. Hence, setting the default value of
`max_retries` to `3`.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Michael Feil e1cfd0f3e7
community[patch]: infinity embeddings update incorrect default url (#16759)
The default url has always been incorrect (7797 instead 7997). Here is a
update to the correct url.
7 months ago
Erick Friis 3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
7 months ago
Leonid Ganeline 932c52c333
community[patch]: docstrings (#16810)
- added missed docstrings
- formated docstrings to the consistent form
7 months ago
Bassem Yacoube 4e3ed7f043
community[patch]: octoai embeddings bug fix (#17216)
fixes a bug in octoa_embeddings provider
7 months ago
Tyler Titsworth 304f3f5fc1
community[patch]: Add Progress bar to HuggingFaceEmbeddings (#16758)
- **Description:** Adds a function parameter to HuggingFaceEmbeddings
called `show_progress` that enables a `tqdm` progress bar if enabled.
Does not function if `multi_process = True`.
  - **Issue:** n/a
  - **Dependencies:** n/a
7 months ago
Harrison Chase 4eda647fdd
infra: add -p to mkdir in lint steps (#17013)
Previously, if this did not find a mypy cache then it wouldnt run

this makes it always run

adding mypy ignore comments with existing uncaught issues to unblock other prs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
7 months ago
Marina Pliusnina a1ce7ab672
adding parameter for changing the language in SpacyEmbeddings (#15743)
Description: Added the parameter for a possibility to change a language
model in SpacyEmbeddings. The default value is still the same:
"en_core_web_sm", so it shouldn't affect a code which previously did not
specify this parameter, but it is not hard-coded anymore and easy to
change in case you want to use it with other languages or models.

Issue: At Barcelona Supercomputing Center in Aina project
(https://github.com/projecte-aina), a project for Catalan Language
Models and Resources, we would like to use Langchain for one of our
current projects and we would like to comment that Langchain, while
being a very powerful and useful open-source tool, is pretty much
focused on English language. We would like to contribute to make it a
bit more adaptable for using with other languages.

Dependencies: This change requires the Spacy library and a language
model, specified in the model parameter.

Tag maintainer: @dev2049

Twitter handle: @projecte_aina

---------

Co-authored-by: Marina Pliusnina <marina.pliusnina@bsc.es>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
Bassem Yacoube 85e93e05ed
community[minor]: Update OctoAI LLM, Embedding and documentation (#16710)
This PR includes updates for OctoAI integrations:
- The LLM class was updated to fix a bug that occurs with multiple
sequential calls
- The Embedding class was updated to support the new GTE-Large endpoint
released on OctoAI lately
- The documentation jupyter notebook was updated to reflect using the
new LLM sdk
Thank you!
8 months ago
yin1991 a936472512
docs: Update documentation to use 'model_id' rather than 'model_name' to match actual API (#16615)
- **Description:** Replace 'model_name' with 'model_id' for accuracy 
- **Issue:**
[link-to-issue](https://github.com/langchain-ai/langchain/issues/16577)
  - **Dependencies:** 
  - **Twitter handle:**
8 months ago