Commit Graph

553 Commits (a96a6e0f2cc93b85272107c840375f362cb269cd)

Author SHA1 Message Date
Anush 9d663f31fa
community[patch]: FastEmbed to latest (#18040)
## Description

Updates the `langchain_community.embeddings.fastembed` provider as per
the recent updates to [`FastEmbed`](https://github.com/qdrant/fastembed)
library.
4 months ago
Eugene Yurtsev 51b661cfe8
community[patch]: BaseLoader load method should just delegate to lazy_load (#18289)
load() should just reference lazy_load()
4 months ago
Bagatur 5efb5c099f
text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346) 4 months ago
Erick Friis eefb49680f
multiple[patch]: fix deprecation versions (#18349) 4 months ago
Jib 72bfc1d3db
mongodb[minor]: MongoDB Partner Package -- Porting MongoDBAtlasVectorSearch (#17652)
This PR migrates the existing MongoDBAtlasVectorSearch abstraction from
the `langchain_community` section to the partners package section of the
codebase.
- [x] Run the partner package script as advised in the partner-packages
documentation.
- [x] Add Unit Tests
- [x] Migrate Integration Tests
- [x] Refactor `MongoDBAtlasVectorStore` (autogenerated) to
`MongoDBAtlasVectorSearch`
- [x] ~Remove~ deprecate the old `langchain_community` VectorStore
references.

## Additional Callouts
- Implemented the `delete` method
- Included any missing async function implementations
  - `amax_marginal_relevance_search_by_vector`
  - `adelete` 
- Added new Unit Tests that test for functionality of
`MongoDBVectorSearch` methods
- Removed [`del
res[self._embedding_key]`](e0c81e1cb0/libs/community/langchain_community/vectorstores/mongodb_atlas.py (L218))
in `_similarity_search_with_score` function as it would make the
`maximal_marginal_relevance` function fail otherwise. The `Document`
needs to store the embedding key in metadata to work.

Checklist:

- [x] PR title: Please title your PR "package: description", where
"package" is whichever of langchain, community, core, experimental, etc.
is being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"
- [x] PR message
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. Existing tests supplied in docs/docs do not change. Updated
docstrings for new functions like `delete`
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory. (This already exists)

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Co-authored-by: Erick Friis <erick@langchain.dev>
4 months ago
Kai Kugler df234fb171
community[patch]: Fixing embedchain document mapping (#18255)
- **Description:** The current embedchain implementation seems to handle
document metadata differently than done in the current implementation of
langchain and a KeyError is thrown. I would love for someone else to
test this...

---------

Co-authored-by: KKUGLER <kai.kugler@mercedes-benz.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Deshraj Yadav <deshraj@gatech.edu>
4 months ago
Tomaz Bratanic 5999c4a240
Add support for parameters in neo4j retrieval query (#18310)
Sometimes, you want to use various parameters in the retrieval query of
Neo4j Vector to personalize/customize results. Before, when there were
only predefined chains, it didn't really make sense. Now that it's all
about custom chains and LCEL, it is worth adding since users can inject
any params they wish at query time. Isn't prone to SQL injection-type
attacks since we use parameters and not concatenating strings.
4 months ago
Virat Singh cd926ac3dd
community: Add PolygonFinancials Tool (#18324)
**Description:**
In this PR, I am adding a `PolygonFinancials` tool, which can be used to
get financials data for a given ticker. The financials data is the
fundamental data that is found in income statements, balance sheets, and
cash flow statements of public US companies.

**Twitter**: 
[@virattt](https://twitter.com/virattt)
4 months ago
Christophe Bornet 8a81fcd5d3
community: Fix deprecation version of AstraDB VectorStore (#17991) 4 months ago
mackong 2c42f3a955
ollama[patch]: delete suffix slash to avoid redirect (#18260)
- **Description:** see
[ollama](https://github.com/ollama/ollama/blob/main/server/routes.go#L949)'s
route definitions
- **Issue:** N/A
- **Dependencies:** N/A
4 months ago
William De Vena 6b58943917
community[patch]: Invoke callback prior to yielding token (#18288)
## PR title
community[patch]: Invoke callback prior to yielding

PR message
Description: Invoke on_llm_new_token callback prior to yielding token in
_stream and _astream methods.
Issue: https://github.com/langchain-ai/langchain/issues/16913
Dependencies: None
Twitter handle: None
4 months ago
Eugene Yurtsev cd52433ba0
community[minor]: Add `SQLDatabaseLoader` document loader (#18281)
- **Description:** A generic document loader adapter for SQLAlchemy on
top of LangChain's `SQLDatabaseLoader`.
  - **Needed by:** https://github.com/crate-workbench/langchain/pull/1
  - **Depends on:** GH-16655
  - **Addressed to:** @baskaryan, @cbornet, @eyurtsev

Hi from CrateDB again,

in the same spirit like GH-16243 and GH-16244, this patch breaks out
another commit from https://github.com/crate-workbench/langchain/pull/1,
in order to reduce the size of this patch before submitting it, and to
separate concerns.

To accompany the SQLAlchemy adapter implementation, the patch includes
integration tests for both SQLite and PostgreSQL. Let me know if
corresponding utility resources should be added at different spots.

With kind regards,
Andreas.


### Software Tests

```console
docker compose --file libs/community/tests/integration_tests/document_loaders/docker-compose/postgresql.yml up
```

```console
cd libs/community
pip install psycopg2-binary
pytest -vvv tests/integration_tests -k sqldatabase
```

```
14 passed
```



![image](https://github.com/langchain-ai/langchain/assets/453543/42be233c-eb37-4c76-a830-474276e01436)

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
4 months ago
David Ruan af35e2525a
community[minor]: add hugging_face_model document loader (#17323)
- **Description:** add hugging_face_model document loader,
  - **Issue:** NA,
  - **Dependencies:** NA,

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
4 months ago
Sanjaypranav V M b9a495e56e
community[patch]: added latin-1 decoder to gmail search tool (#18116)
some mails from flipkart , amazon are encoded with other plain text
format so to handle UnicodeDecode error , added exception and latin
decoder

Thank you for contributing to LangChain!

@hwchase17
4 months ago
Ashley Xu e3211c2b3d
community[patch]: BigQueryVectorSearch JSON type unsupported for metadatas (#18234) 4 months ago
Ayo Ayibiowu ac1d7d9de8
community[feat]: Adds LLMLingua as a document compressor (#17711)
**Description**: This PR adds support for using the [LLMLingua project
](https://github.com/microsoft/LLMLingua) especially the LongLLMLingua
(Enhancing Large Language Model Inference via Prompt Compression) as a
document compressor / transformer.

The LLMLingua project is an interesting project that can greatly improve
RAG system by compressing prompts and contexts while keeping their
semantic relevance.

**Issue**: https://github.com/microsoft/LLMLingua/issues/31
**Dependencies**: [llmlingua](https://pypi.org/project/llmlingua/)

@baskaryan

---------

Co-authored-by: Ayodeji Ayibiowu <ayodeji.ayibiowu@getinge.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
4 months ago
Jaskirat Singh ce682f5a09
community: vectorstores.kdbai - Added support for when no docs are present (#18103)
- **Description:** By default it expects a list but that's not the case
in corner scenarios when there is no document ingested(use case:
Bootstrap application).
\
Hence added as check, if the instance is panda Dataframe instead of list
then it will procced with return immediately.

- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:**  jaskiratsingh1

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
4 months ago
am-kinetica 9b8f6455b1
Langchain vectorstore integration with Kinetica (#18102)
- **Description:** New vectorstore integration with the Kinetica
database
  - **Issue:** 
- **Dependencies:** the Kinetica Python API `pip install
gpudb==7.2.0.1`,
  - **Tag maintainer:** @baskaryan, @hwchase17 
  - **Twitter handle:**

---------

Co-authored-by: Chad Juliano <cjuliano@kinetica.com>
4 months ago
GoodBai 3589a135ef
community: make `SET allow_experimental_[engine]_index` configurabe in vectorstores.clickhouse (#18107)
## Description & Issue
While following the official doc to use clickhouse as a vectorstore, I
found only the default `annoy` index is properly supported. But I want
to try another engine `usearch` for `annoy` is not properly supported on
ARM platforms.
Here is the settings I prefer:

``` python
settings = ClickhouseSettings(
    table="wiki_Ethereum",
    index_type="usearch",  # annoy by default
    index_param=[],
)
```
The above settings do not work for the command `set
allow_experimental_annoy_index=1` is hard-coded.
This PR will make sure the experimental feature follow the `index_type`
which is also consistent with Clickhouse's naming conventions.
4 months ago
Dan Stambler 69344a0661
community: Add Laser Embedding Integration (#18111)
- **Description:** Added Integration with Meta AI's LASER
Language-Agnostic SEntence Representations embedding library, which
supports multilingual embedding for any of the languages listed here:
https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200,
including several low resource languages
- **Dependencies:** laser_encoders
4 months ago
Christophe Bornet a2d5fa7649
community[patch]: Fix GenericRequestsWrapper _aget_resp_content must be async (#18065)
There are existing tests in
`libs/community/tests/unit_tests/tools/requests/test_tool.py`
4 months ago
Neli Hateva a01e8473f8
community[patch]: Fix GraphSparqlQAChain so that it works with Ontotext GraphDB (#15009)
- **Description:** Introduce a new parameter `graph_kwargs` to
`RdfGraph` - parameters used to initialize the `rdflib.Graph` if
`query_endpoint` is set. Also, do not set
`rdflib.graph.DATASET_DEFAULT_GRAPH_ID` as default value for the
`rdflib.Graph` `identifier` if `query_endpoint` is set.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
4 months ago
kYLe 17ecf6e119
community[patch]: Remove model limitation on Anyscale LLM (#17662)
**Description:** Llama Guard is deprecated from Anyscale public
endpoint.
**Issue:** Change the default model. and remove the limitation of only
use Llama Guard with Anyscale LLMs
Anyscale LLM can also works with all other Chat model hosted on
Anyscale.
Also added `async_client` for Anyscale LLM
4 months ago
Barun Amalkumar Halder cc69976860
community[minor] : adds callback handler for Fiddler AI (#17708)
**Description:**  Callback handler to integrate fiddler with langchain. 
This PR adds the following -

1. `FiddlerCallbackHandler` implementation into langchain/community
2. Example notebook `fiddler.ipynb` for usage documentation

[Internal Tracker : FDL-14305]

**Issue:** 
NA

**Dependencies:** 
- Installation of langchain-community is unaffected.
- Usage of FiddlerCallbackHandler requires installation of latest
fiddler-client (2.5+)

**Twitter handle:** @fiddlerlabs @behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
4 months ago
Christophe Bornet b8b5ce0c8c
astradb: Add AstraDBChatMessageHistory to langchain-astradb package (#17732)
Co-authored-by: Bagatur <baskaryan@gmail.com>
4 months ago
Maxime Perrin c06a8732aa
community[patch]: fix llama index imports and fields access (#17870)
- **Description:** Fixing outdated imports after v0.10 llama index
update and updating metadata and source text access
  - **Issue:** #17860
  - **Twitter handle:** @maximeperrin_

---------

Co-authored-by: Maxime Perrin <mperrin@doing.fr>
4 months ago
2jimoo 7fc903464a
community: Add document manager and mongo document manager (#17320)
- **Description:** 
    - Add DocumentManager class, which is a nosql record manager. 
- In order to use index and aindex in
libs/langchain/langchain/indexes/_api.py, DocumentManager inherits
RecordManager.
    - Also I added the MongoDB implementation of Document Manager too.
  - **Dependencies:** pymongo, motor
  
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
- **Description:** Add DocumentManager class, which is a no sql record
manager. To use index method and aindex method in indexes._api.py,
Document Manager inherits RecordManager.Add the MongoDB implementation
of Document Manager.
  - **Dependencies:** pymongo, motor

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
4 months ago
kYLe 56b955fc31
community[minor]: Add async_client for Anyscale Chat model (#18050)
Add `async_client` for Anyscale Chat_model
4 months ago
Erick Friis 29e0445490
community[patch]: BaseLLM typing in init (#18029) 4 months ago
Nicolò Boschi 4c132b4cc6
community: fix openai streaming throws 'AIMessageChunk' object has no attribute 'text' (#18006)
After upgrading langchain-community to 0.0.22, it's not possible to use
openai from the community package with streaming=True
```
  File "/home/runner/work/ragstack-ai/ragstack-ai/ragstack-e2e-tests/.tox/langchain/lib/python3.11/site-packages/langchain_community/chat_models/openai.py", line 434, in _generate
    return generate_from_stream(stream_iter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/ragstack-ai/ragstack-ai/ragstack-e2e-tests/.tox/langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 65, in generate_from_stream
    for chunk in stream:
  File "/home/runner/work/ragstack-ai/ragstack-ai/ragstack-e2e-tests/.tox/langchain/lib/python3.11/site-packages/langchain_community/chat_models/openai.py", line 418, in _stream
    run_manager.on_llm_new_token(chunk.text, chunk=cg_chunk)
                                 ^^^^^^^^^^
AttributeError: 'AIMessageChunk' object has no attribute 'text'
```

Fix regression of https://github.com/langchain-ai/langchain/pull/17907 
**Twitter handle:** @nicoloboschi
4 months ago
Guangdong Liu 4197efd67a
community: Fix SparkLLM error (#18015)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"

- **Description:** fix SparkLLM  error
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
4 months ago
Leo Diegues b15fccbb99
community[patch]: Skip `OpenAIWhisperParser` extremely small audio chunks to avoid api error (#11450)
**Description**
This PR addresses a rare issue in `OpenAIWhisperParser` that causes it
to crash when processing an audio file with a duration very close to the
class's chunk size threshold of 20 minutes.

**Issue**
#11449

**Dependencies**
None

**Tag maintainer**
@agola11 @eyurtsev 

**Twitter handle**
leonardodiegues

---------

Co-authored-by: Leonardo Diegues <leonardo.diegues@grupofolha.com.br>
Co-authored-by: Bagatur <baskaryan@gmail.com>
4 months ago
mackong 9678797625
community[patch]: callback before yield for _stream/_astream (#17907)
- Description: callback on_llm_new_token before yield chunk for
_stream/_astream for some chat models, make all chat models in a
consistent behaviour.
- Issue: N/A
- Dependencies: N/A
4 months ago
Chad Juliano 50ba3c68bb
community[minor]: add Kinetica LLM wrapper (#17879)
**Description:** Initial pull request for Kinetica LLM wrapper
**Issue:** N/A
**Dependencies:** No new dependencies for unit tests. Integration tests
require gpudb, typeguard, and faker
**Twitter handle:** @chad_juliano

Note: There is another pull request for Kinetica vectorstore. Ultimately
we would like to make a partner package but we are starting with a
community contribution.
4 months ago
Bagatur 9b0b0032c2
community[patch]: fix lint (#17984) 4 months ago
David Loving d068e8ea54
community[patch]: compatibility with SQLAlchemy 1.4.x (#17954)
**Description:**
Change type hint on `QuerySQLDataBaseTool` to be compatible with
SQLAlchemy v1.4.x.

**Issue:**
Users locked to `SQLAlchemy < 2.x` are unable to import
`QuerySQLDataBaseTool`.

closes https://github.com/langchain-ai/langchain/issues/17819

**Dependencies:**
None
4 months ago
kartikTAI 9cf6661dc5
community: use NeuralDB object to initialize NeuralDBVectorStore (#17272)
**Description:** This PR adds an `__init__` method to the
NeuralDBVectorStore class, which takes in a NeuralDB object to
instantiate the state of NeuralDBVectorStore.
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A
4 months ago
Brad Erickson ecd72d26cf
community: Bugfix - correct Ollama API path to avoid HTTP 307 (#17895)
Sets the correct /api/generate path, without ending /, to reduce HTTP
requests.

Reference:

https://github.com/ollama/ollama/blob/efe040f8/docs/api.md#generate-request-streaming

Before:

    DEBUG: Starting new HTTP connection (1): localhost:11434
    DEBUG: http://localhost:11434 "POST /api/generate/ HTTP/1.1" 307 0
    DEBUG: http://localhost:11434 "POST /api/generate HTTP/1.1" 200 None

After:

    DEBUG: Starting new HTTP connection (1): localhost:11434
    DEBUG: http://localhost:11434 "POST /api/generate HTTP/1.1" 200 None
4 months ago
Erick Friis a53370a060
pinecone[patch], docs: PineconeVectorStore, release 0.0.3 (#17896) 4 months ago
Hasan 7248e98b9e
community[patch]: Return PK in similarity search Document (#17561)
Issue: #17390

Co-authored-by: hasan <hasan@m2sys.com>
4 months ago
Raunak 1ec8199c8e
community[patch]: Added more functions in NetworkxEntityGraph class (#17624)
- **Description:** 
1. Added add_node(), remove_node(), has_node(), remove_edge(),
has_edge() and get_neighbors() functions in
       NetworkxEntityGraph class.

2. Added the above functions in graph_networkx_qa.ipynb documentation.
4 months ago
Christophe Bornet 3d91be94b1
community[patch]: Add missing async_astra_db_client param to AstraDBChatMessageHistory (#17742) 4 months ago
Xudong Sun c524bf31f5
docs: add helpful comments to sparkllm.py (#17774)
Adding helpful comments to sparkllm.py, help users to use ChatSparkLLM
more effectively
4 months ago
Ian 3019a594b7
community[minor]: Add tidb loader support (#17788)
This pull request support loading data from TiDB database with
Langchain.

A simple usage:
```
from  langchain_community.document_loaders import TiDBLoader

CONNECTION_STRING = "mysql+pymysql://root@127.0.0.1:4000/test"

QUERY = "select id, name, description from items;"
loader = TiDBLoader(
    connection_string=CONNECTION_STRING,
    query=QUERY,
    page_content_columns=["name", "description"],
    metadata_columns=["id"],
)
documents = loader.load()
print(documents)
```
4 months ago
Christophe Bornet 815ec74298
docs: Add docstring to AstraDBStore (#17793) 4 months ago
ehude 9e54c227f1
community[patch]: Bug Neo4j VectorStore when having multiple indexes the sort is not working and the store that returned is random (#17396)
Bug fix: when having multiple indexes the sort is not working and the
store that returned is random.
The following small fix resolves the issue.
4 months ago
Michael Feil 242981b8f0
community[minor]: infinity embedding local option (#17671)
**drop-in-replacement for sentence-transformers
inference.**

https://github.com/langchain-ai/langchain/discussions/17670

tldr from the discussion above -> around a 4x-22x speedup over using
SentenceTransformers / huggingface embeddings. For more info:
https://github.com/michaelfeil/infinity (pure-python dependency)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
4 months ago
Leonid Ganeline ed0b7c3b72
docs: added `community` modules descriptions (#17827)
API Reference: Several `community` modules (like
[adapter](https://api.python.langchain.com/en/latest/community_api_reference.html#module-langchain_community.adapters)
module) are missing descriptions. It happens when langchain was split to
the core, langchain and community packages.
- Copied module descriptions from other packages
- Fixed several descriptions to the consistent format.
4 months ago
Christophe Bornet 5019951a5d
docs: AstraDB VectorStore docstring (#17834) 4 months ago
Rohit Gupta 3acd0c74fc
community[patch]: added SCANN index in default search params (#17889)
This will enable users to add data in same collection for index type
SCANN for milvus
4 months ago
Karim Assi afc1ba0329
community[patch]: add possibility to search by vector in OpenSearchVectorSearch (#17878)
- **Description:** implements the missing `similarity_search_by_vector`
function for `OpenSearchVectorSearch`
- **Issue:** N/A
- **Dependencies:** N/A
4 months ago
Nathan Voxland (Activeloop) 9ece134d45
docs: Improved deeplake.py init documentation (#17549)
**Description:** 
Updated documentation for DeepLake init method.

Especially the exec_option docs needed improvement, but did a general
cleanup while I was looking at it.

**Issue:** n/a
**Dependencies:** None

---------

Co-authored-by: Nathan Voxland <nathan@voxland.net>
4 months ago
Zachary Toliver 29ee0496b6
community[patch]: Allow override of 'fetch_schema_from_transport' in the GraphQL tool (#17649)
- **Description:** In order to override the bool value of
"fetch_schema_from_transport" in the GraphQLAPIWrapper, a
"fetch_schema_from_transport" value needed to be added to the
"_EXTRA_OPTIONAL_TOOLS" dictionary in load_tools in the "graphql" key.
The parameter "fetch_schema_from_transport" must also be passed in to
the GraphQLAPIWrapper to allow reading of the value when creating the
client. Passing as an optional parameter is probably best to avoid
breaking changes. This change is necessary to support GraphQL instances
that do not support fetching schema, such as TigerGraph. More info here:
[TigerGraph GraphQL Schema
Docs](https://docs.tigergraph.com/graphql/current/schema)
  - **Threads handle:** @zacharytoliver

---------

Co-authored-by: Zachary Toliver <zt10191991@hotmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
4 months ago
mackong 31891092d8
community[patch]: add missing chunk parameter for _stream/_astream (#17807)
- Description: Add missing chunk parameter for _stream/_astream for some
chat models, make all chat models in a consistent behaviour.
- Issue: N/A
- Dependencies: N/A
4 months ago
volodymyr-memsql 0a9a519a39
community[patch]: Added add_images method to SingleStoreDB vector store (#17871)
In this pull request, we introduce the add_images method to the
SingleStoreDB vector store class, expanding its capabilities to handle
multi-modal embeddings seamlessly. This method facilitates the
incorporation of image data into the vector store by associating each
image's URI with corresponding document content, metadata, and either
pre-generated embeddings or embeddings computed using the embed_image
method of the provided embedding object.

the change includes integration tests, validating the behavior of the
add_images. Additionally, we provide a notebook showcasing the usage of
this new method.

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
4 months ago
Shashank 8381f859b4
community[patch]: Graceful handling of redis errors in RedisCache and AsyncRedisCache (#17171)
- **Description:**
The existing `RedisCache` implementation lacks proper handling for redis
client failures, such as `ConnectionRefusedError`, leading to subsequent
failures in pipeline components like LLM calls. This pull request aims
to improve error handling for redis client issues, ensuring a more
robust and graceful handling of such errors.

  - **Issue:**  Fixes #16866
  - **Dependencies:** No new dependency
  - **Twitter handle:** N/A

Co-authored-by: snsten <>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
4 months ago
Christophe Bornet e6311d953d
community[patch]: Add AstraDBLoader docstring (#17873) 4 months ago
nbyrneKX c1bb5fd498
community[patch]: typo in doc-string for kdbai vectorstore (#17811)
community[patch]: typo in doc-string for kdbai vectorstore (#17811)
4 months ago
Christophe Bornet bebe401b1a
astradb[patch]: Add AstraDBStore to langchain-astradb package (#17789)
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Guangdong Liu 47b1b7092d
community[minor]: Add SparkLLM to community (#17702) 5 months ago
Guangdong Liu 3ba1cb8650
community[minor]: Add SparkLLM Text Embedding Model and SparkLLM introduction (#17573) 5 months ago
Virat Singh 92e52e89ca
community: Add PolygonTickerNews Tool (#17808)
Description:
In this PR, I am adding a PolygonTickerNews Tool, which can be used to
get the latest news for a given ticker / stock.

Twitter handle: [@virattt](https://twitter.com/virattt)
5 months ago
Christophe Bornet b13e52b6ac
community[patch]: Fix AstraDBCache docstrings (#17802) 5 months ago
Karim Lalani ea61302f71
community[patch]: bug fix - add empty metadata when metadata not provided (#17669)
Code fix to include empty medata dictionary to aadd_texts if metadata is
not provided.
5 months ago
CogniJT 919ebcc596
community[minor]: CogniSwitch Agent Toolkit for LangChain (#17312)
**Description**: CogniSwitch focusses on making GenAI usage more
reliable. It abstracts out the complexity & decision making required for
tuning processing, storage & retrieval. Using simple APIs documents /
URLs can be processed into a Knowledge Graph that can then be used to
answer questions.

**Dependencies**: No dependencies. Just network calls & API key required
**Tag maintainer**: @hwchase17
**Twitter handle**: https://github.com/CogniSwitch
**Documentation**: Please check
`docs/docs/integrations/toolkits/cogniswitch.ipynb`
**Tests**: The usual tool & toolkits tests using `test_imports.py`

PR has passed linting and testing before this submission.

---------

Co-authored-by: Saicharan Sridhara <145636106+saiCogniswitch@users.noreply.github.com>
5 months ago
Christophe Bornet 6275d8b1bf
docs: Fix AstraDBChatMessageHistory docstrings (#17740) 5 months ago
Aymeric Roucher 0d294760e7
Community: Fuse HuggingFace Endpoint-related classes into one (#17254)
## Description
Fuse HuggingFace Endpoint-related classes into one:
-
[HuggingFaceHub](5ceaf784f3/libs/community/langchain_community/llms/huggingface_hub.py)
-
[HuggingFaceTextGenInference](5ceaf784f3/libs/community/langchain_community/llms/huggingface_text_gen_inference.py)
- and
[HuggingFaceEndpoint](5ceaf784f3/libs/community/langchain_community/llms/huggingface_endpoint.py)

Are fused into
- HuggingFaceEndpoint

## Issue
The deduplication of classes was creating a lack of clarity, and
additional effort to develop classes leads to issues like [this
hack](5ceaf784f3/libs/community/langchain_community/llms/huggingface_endpoint.py (L159)).

## Dependancies

None, this removes dependancies.

## Twitter handle

If you want to post about this: @AymericRoucher

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Raghav Dixit 6c18f73ca5
community[patch]: LanceDB integration improvements/fixes (#16173)
Hi, I'm from the LanceDB team.

Improves LanceDB integration by making it easier to use - now you aren't
required to create tables manually and pass them in the constructor,
although that is still backward compatible.

Bug fix - pandas was being used even though it's not a dependency for
LanceDB or langchain

PS - this issue was raised a few months ago but lost traction. It is a
feature improvement for our users kindly review this , Thanks !
5 months ago
Christophe Bornet e92e96193f
community[minor]: Add async methods to the AstraDB BaseStore (#16872)
---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Mohammad Mohtashim 43dc5d3416
community[patch]: OpenLLM Client Fixes + Added Timeout Parameter (#17478)
- OpenLLM was using outdated method to get the final text output from
openllm client invocation which was raising the error. Therefore
corrected that.
- OpenLLM `_identifying_params` was getting the openllm's client
configuration using outdated attributes which was raising error.
- Updated the docstring for OpenLLM.
- Added timeout parameter to be passed to underlying openllm client.
5 months ago
Guangdong Liu 73edf17b4e
community[minor]: Add Apache Doris as vector store (#17527)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur a058c8812d
community[patch]: add VoyageEmbeddings truncation (#17638) 5 months ago
Mohammad Mohtashim 8d4547ae97
[Langchain_community]: Corrected the imports to make them compatible with Sqlachemy <2.0 (#17653)
- Small Change in Imports in sql_database module to make it work with
Sqlachemy <2.0
 - This was identified in the following issue: #17616
5 months ago
Christophe Bornet 19ebc7418e
community: Use _AstraDBCollectionEnvironment in AstraDB VectorStore (community) (#17635)
Another PR will be done for the langchain-astradb package.

Note: for future PRs, devs will be done in the partner package only. This one is just to align with the rest of the components in the community package and it fixes a bunch of issues.
5 months ago
Nejc Habjan b4fa847a90
community[minor]: add exclude parameter to DirectoryLoader (#17316)
- **Description:** adds an `exclude` parameter to the DirectoryLoader
class, based on similar behavior in GenericLoader
- **Issue:** discussed in
https://github.com/langchain-ai/langchain/discussions/9059 and I think
in some other issues that I cannot find at the moment 🙇
  - **Dependencies:** None
  - **Twitter handle:** don't have one sorry! Just https://github/nejch

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Krista Pratico bf8e3c6dd1
community[patch]: add fixes for AzureSearch after update to stable azure-search-documents library (#17599)
- **Description:** Addresses the bugs described in linked issue where an
import was erroneously removed and the rename of a keyword argument was
missed when migrating from beta --> stable of the azure-search-documents
package
- **Issue:** https://github.com/langchain-ai/langchain/issues/17598
- **Dependencies:** N/A
- **Twitter handle:** N/A
5 months ago
morgana 9d7ca7df6e
community[patch]: update copy of metadata in rockset vectorstore integration (#17612)
- **Description:** This fixes an issue with working with RecordManager.
RecordManager was generating new hashes on documents because `add_texts`
was modifying the metadata directly. Additionally moved some tests to
unit tests since that was a more appropriate home.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** `@_morgan_adams_`
5 months ago
Stefano Lottini 5240ecab99
astradb: bootstrapping Astra DB as Partner Package (#16875)
**Description:** This PR introduces a new "Astra DB" Partner Package.

So far only the vector store class is _duplicated_ there, all others
following once this is validated and established.

Along with the move to separate package, incidentally, the class name
will change `AstraDB` => `AstraDBVectorStore`.

The strategy has been to duplicate the module (with prospected removal
from community at LangChain 0.2). Until then, the code will be kept in
sync with minimal, known differences (there is a makefile target to
automate drift control. Out of convenience with this check, the
community package has a class `AstraDBVectorStore` aliased to `AstraDB`
at the end of the module).

With this PR several bugfixes and improvement come to the vector store,
as well as a reshuffling of the doc pages/notebooks (Astra and
Cassandra) to align with the move to a separate package.

**Dependencies:** A brand new pyproject.toml in the new package, no
changes otherwise.

**Twitter handle:** `@rsprrs`

---------

Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Moshe Berchansky 20a56fe0a2
community[minor]: Add QuantizedEmbedders (#17391)
**Description:** 
* adding Quantized embedders using optimum-intel and
intel-extension-for-pytorch.
* added mdx documentation and example notebooks 
* added embedding import testing.

**Dependencies:** 
optimum = {extras = ["neural-compressor"], version = "^1.14.0", optional
= true}
intel_extension_for_pytorch = {version = "^2.2.0", optional = true}

Dependencies have been added to pyproject.toml for the community lib.  

**Twitter handle:** @peter_izsak

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Amir Karbasi bccc9241ea
community[patch]: Resolve KuzuQAChain API Changes (#16885)
- **Description:** Updates to the Kuzu API had broken this
functionality. These updates resolve those issues and add a new test to
demonstrate the updates.
- **Issue:** #11874
- **Dependencies:** No new dependencies
- **Twitter handle:** @amirk08


Test results:
```
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_no_params PASSED                                   [ 33%]
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params PASSED                                      [ 66%]
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema PASSED                                    [100%]

=================================================== slowest 5 durations =================================================== 
0.53s call     tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema
0.34s call     tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_no_params
0.28s call     tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params
0.03s teardown tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema
0.02s teardown tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params
==================================================== 3 passed in 1.27s ==================================================== 
```
5 months ago
Rafail Giavrimis a84a3add25
Community[patch]: Adjusted import to be compatible with SQLAlchemy<2 (#17520)
- **Description:** Adjusts an import to directly import `Result` from
`sqlalchemy.engine`.
- **Issue:** #17519 
- **Dependencies:** N/A
- **Twitter handle:** @grafail
5 months ago
Zachary Toliver 6746adf363
community[patch]: pass bool value for fetch_schema_from_transport in GraphQLAPIWrapper (#17552)
- **Description:** Allow a bool value to be passed to
fetch_schema_from_transport since not all GraphQL instances support this
feature, such as TigerGraph.
- **Threads:** @zacharytoliver

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Christophe Bornet 789cd5198d
community[patch]: Use astrapy built-in pagination prefetch in AstraDBLoader (#17569) 5 months ago
Christophe Bornet 387cacb881
community[minor]: Add async methods to AstraDBChatMessageHistory (#17572) 5 months ago
Christophe Bornet ff1f985a2a
community: Fix some mypy types in cassandra doc loader (#17570)
Thank you!
5 months ago
Christophe Bornet ca2d4078f3
community: Add async methods to AstraDBCache (#17415)
Adds async methods to AstraDBCache
5 months ago
Jan Cap 7ae3ce60d2
community[patch]: Fix pwd import that is not available on windows (#17532)
- **Description:** Resolving problem in
`langchain_community\document_loaders\pebblo.py` with `import pwd`.
`pwd` is not available on windows. import moved to try catch block
  - **Issue:** #17514
5 months ago
nvpranak 91bcc9c5c9
community[minor]: Nemo embeddings(#16206)
This PR is adding support for NVIDIA NeMo embeddings issue #16095.

---------

Co-authored-by: Praveen Nakshatrala <pnakshatrala@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Mateusz Szewczyk 916332ef5b
ibm: added partners package `langchain_ibm`, added llm (#16512)
- **Description:** Added `langchain_ibm` as an langchain partners
package of IBM [watsonx.ai](https://www.ibm.com/products/watsonx-ai) LLM
provider (`WatsonxLLM`)
- **Dependencies:**
[ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/),
  - **Tag maintainer:** : 
---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Shawn f6d3a3546f
community[patch]: document_loaders: modified athena key logic to handle s3 uris without a prefix (#17526)
https://github.com/langchain-ai/langchain/issues/17525

### Example Code

```python
from langchain_community.document_loaders.athena import AthenaLoader

database_name = "database"
s3_output_path = "s3://bucket-no-prefix"
query="""SELECT 
  CAST(extract(hour FROM current_timestamp) AS INTEGER) AS current_hour,
  CAST(extract(minute FROM current_timestamp) AS INTEGER) AS current_minute,
  CAST(extract(second FROM current_timestamp) AS INTEGER) AS current_second;
"""
profile_name = "AdministratorAccess"

loader = AthenaLoader(
    query=query,
    database=database_name,
    s3_output_uri=s3_output_path,
    profile_name=profile_name,
)

documents = loader.load()
print(documents)
```



### Error Message and Stack Trace (if applicable)

NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject
operation: The specified key does not exist

### Description

Athena Loader errors when result s3 bucket uri has no prefix. The Loader
instance call results in a "NoSuchKey: An error occurred (NoSuchKey)
when calling the GetObject operation: The specified key does not exist."
error.

If s3_output_path contains a prefix like:

```python
s3_output_path = "s3://bucket-with-prefix/prefix"
```

Execution works without an error.

## Suggested solution

Modify:

```python
key = "/".join(tokens[1:]) + "/" + query_execution_id + ".csv"
```

to

```python
key = "/".join(tokens[1:]) + ("/" if tokens[1:] else "") + query_execution_id + ".csv"
```


9e8a3fc4ff/libs/community/langchain_community/document_loaders/athena.py (L128)


### System Info


System Information
------------------
> OS:  Darwin
> OS Version: Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:30 PDT
2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T8103
> Python Version:  3.9.9 (main, Jan  9 2023, 11:42:03) 
[Clang 14.0.0 (clang-1400.0.29.102)]

Package Information
-------------------
> langchain_core: 0.1.23
> langchain: 0.1.7
> langchain_community: 0.0.20
> langsmith: 0.0.87
> langchain_openai: 0.0.6
> langchainhub: 0.1.14

Packages not installed (Not Necessarily a Problem)
--------------------------------------------------
The following packages were not found:

> langgraph
> langserve

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
wulixuan c776cfc599
community[minor]: integrate with model Yuan2.0 (#15411)
1. integrate with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. update `langchain.llms`
3. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Philippe PRADOS d07db457fc
community[patch]: Fix SQLAlchemyMd5Cache race condition (#16279)
If the SQLAlchemyMd5Cache is shared among multiple processes, it is
possible to encounter a race condition during the cache update.

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Alex Peplowski 70c296ae96
community[patch]: Expose Anthropic Retry Logic (#17069)
**Description:**

Expose Anthropic's retry logic, so that `max_retries` can be configured
via langchain. Anthropic's retry logic is implemented in their Python
SDK here:
https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#retries

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Lyndsey 8562a1e7d4
community[patch]: support query filters for NotionDBLoader (#17217)
- **Description:** Support filtering databases in the use case where
devs do not want to query ALL entries within a DB,
- **Issue:** N/A,
- **Dependencies:** N/A,
- **Twitter handle:** I don't have Twitter but feel free to tag my
Github!

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
volodymyr-memsql e36bc379f2
community[patch]: Add vector index support to SingleStoreDB VectorStore (#17308)
This pull request introduces support for various Approximate Nearest
Neighbor (ANN) vector index algorithms in the VectorStore class,
starting from version 8.5 of SingleStore DB. Leveraging this enhancement
enables users to harness the power of vector indexing, significantly
boosting search speed, particularly when handling large sets of vectors.

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Kate Silverstein 0bc4a9b3fc
community[minor]: Adds Llamafile as an LLM (#17431)
* **Description:** Adds a simple LLM implementation for interacting with
[llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models.
* **Dependencies:** N/A
* **Issue:** N/A

**Detail**
[llamafile](https://github.com/Mozilla-Ocho/llamafile) lets you run LLMs
locally from a single file on most computers without installing any
dependencies.

To use the llamafile LLM implementation, the user needs to:

1. Download a llamafile e.g.
https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile?download=true
2. Make the file executable.
3. Run the llamafile in 'server mode'. (All llamafiles come packaged
with a lightweight server; by default, the server listens at
`http://localhost:8080`.)


```bash
wget https://url/of/model.llamafile
chmod +x model.llamafile
./model.llamafile --server --nobrowser
```

Now, the user can invoke the LLM via the LangChain client:

```python
from langchain_community.llms.llamafile import Llamafile

llm = Llamafile()

llm.invoke("Tell me a joke.")
```
5 months ago
Rakib Hosen 5ce1827d31
community[patch]: fix import in language parser (#17538)
- **Description:** Resolving import error in language_parser.py during
"from langchain.langchain.text_splitter import Language - **Issue:** the
issue #17536
- **Dependencies:** NO
- **Twitter handle:** @iRakibHosen

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Raunak 685d62b032
community[patch]: Added functions in NetworkxEntityGraph class (#17535)
- **Description:** 
1. Added _clear_edges()_ and _get_number_of_nodes()_ functions in
NetworkxEntityGraph class.
2. Added the above two function in graph_networkx_qa.ipynb
documentation.
5 months ago
Qihui Xie 5738143d4b
add mongodb_store (#13801)
# Add MongoDB storage
  - **Description:** 
  Add MongoDB Storage as an option for large doc store. 

Example usage: 
```Python
# Instantiate the MongodbStore with a MongoDB connection
from langchain.storage import MongodbStore

mongo_conn_str = "mongodb://localhost:27017/"
mongodb_store = MongodbStore(mongo_conn_str, db_name="test-db",
                                collection_name="test-collection")

# Set values for keys
doc1 = Document(page_content='test1')
doc2 = Document(page_content='test2')
mongodb_store.mset([("key1", doc1), ("key2", doc2)])

# Get values for keys
values = mongodb_store.mget(["key1", "key2"])
# [doc1, doc2]

# Iterate over keys
for key in mongodb_store.yield_keys():
    print(key)

# Delete keys
mongodb_store.mdelete(["key1", "key2"])
 ```

  - **Dependencies:**
  Use `mongomock` for integration test.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Nat Noordanus 8a3b74fe1f
community[patch]: Fix pydantic ForwardRef error in BedrockBase (#17416)
- **Description:** Fixes a type annotation issue in the definition of
BedrockBase. This issue was that the annotation for the `config`
attribute includes a ForwardRef to `botocore.client.Config` which is
only imported when `TYPE_CHECKING`. This can cause pydantic to raise an
error like `pydantic.errors.ConfigError: field "config" not yet prepared
so type is still a ForwardRef, ...`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@__nat_n__`
5 months ago
Ashley Xu f746a73e26
Add the BQ job usage tracking from LangChain (#17123)
- **Description:**
Add the BQ job usage tracking from LangChain

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Max Jakob ab3d944667
community[patch]: ElasticsearchStore: preserve user headers (#16830)
Users can provide an Elasticsearch connection with custom headers. This
PR makes sure these headers are preserved when adding the langchain user
agent header.
5 months ago
wulixuan 5d06797905
community[minor]: integrate chat models with Yuan2.0 (#16575)
1. integrate chat models with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)
 
Yuan2.0 is a new generation Fundamental Large Language Model developed
by IEIT System. We have published all three models, Yuan 2.0-102B, Yuan
2.0-51B, and Yuan 2.0-2B.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Ian Gregory e5472b5eb8
Framework for supporting more languages in LanguageParser (#13318)
## Description

I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.

Initial context is in the issue we opened (#11229).

This pull request adds:

- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
  - C
  - C++
  - C#
  - Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
  - Kotlin
  - Lua
  - Perl
  - Ruby
  - Rust
  - Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)

Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.

## Issues

- Closes #11229
- Closes #10996
- Closes #8405

## Dependencies

`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.

## Documentation

We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.

## Maintainer

- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)

Thanks!!

## Git commits

We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
5 months ago
Abhishek Jain 37e1275f9e
community[patch]: Fixed the 'aembed' method of 'CohereEmbeddings'. (#16497)
**Description:**
- The existing code was trying to find a `.embeddings` property on the
`Coroutine` returned by calling `cohere.async_client.embed`.
- Instead, the `.embeddings` property is present on the value returned
by the `Coroutine`.
- Also, it seems that the original cohere client expects a value of
`max_retries` to not be `None`. Hence, setting the default value of
`max_retries` to `3`.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Sridhar Ramaswamy 9f1cbbc6ed
community[minor]: Add pebblo safe document loader (#16862)
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
  - **Dependencies:** none
  - **Twitter handle:** srics

@hwchase17
5 months ago
mhavey 1bbb64d956
community[minor], langchian[minor]: Add Neptune Rdf graph and chain (#16650)
**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.

**Issue**: This is a new feature
 
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Michael Feil e1cfd0f3e7
community[patch]: infinity embeddings update incorrect default url (#16759)
The default url has always been incorrect (7797 instead 7997). Here is a
update to the correct url.
5 months ago
Theo / Taeyoon Kang 1987f905ed
core[patch]: Support .yml extension for YAML (#16783)
- **Description:**

[AS-IS] When dealing with a yaml file, the extension must be .yaml.  

[TO-BE] In the absence of extension length constraints in the OS, the
extension of the YAML file is yaml, but control over the yml extension
must still be made.

It's as if it's an error because it's a .jpg extension in jpeg support.

  - **Issue:** - 

  - **Dependencies:**
no dependencies required for this change,
5 months ago
Kapil Sachdeva cd00a87db7
community[patch] - in FAISS vector store, support passing custom DocStore implementation when using from_xxx methods (#16801)
- **Description:** The from__xx methods of FAISS class have hardcoded
InMemoryStore implementation and thereby not let users pass a custom
DocStore implementation,
  - **Issue:** no referenced issue,
  - **Dependencies:** none,
  - **Twitter handle:** ksachdeva
5 months ago
Chris f9f5626ca4
community[patch]: Fix github search issues and PRs PaginatedList has no len() error (#16806)
**Description:** 
Bugfix: Langchain_community's GitHub Api wrapper throws a TypeError when
searching for issues and/or PRs (the `search_issues_and_prs` method).
This is because PyGithub's PageinatedList type does not support the
len() method. See https://github.com/PyGithub/PyGithub/issues/1476

![image](https://github.com/langchain-ai/langchain/assets/8849021/57390b11-ed41-4f48-ba50-f3028610789c)
  **Dependencies:** None 
  **Twitter handle**: @ChrisKeoghNZ
  
I haven't registered an issue as it would take me longer to fill the
template out than to make the fix, but I'm happy to if that's deemed
essential.

I've added a simple integration test to cover this as there were no
existing unit tests and it was going to be tricky to set them up.

Co-authored-by: Chris Keogh <chris.keogh@xero.com>
5 months ago
morgana 722aae4fd1
community: add delete method to rocksetdb vectorstore to support recordmanager (#17030)
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
5 months ago
yin1991 c454dc36fc
community[proxy]: Enhancement/add proxy support playwrighturlloader 16751 (#16822)
- **Description:** Enhancement/add proxy support playwrighturlloader
16751
- **Issue:** [Enhancement: Add Proxy Support to PlaywrightURLLoader
Class](https://github.com/langchain-ai/langchain/issues/16751)
  - **Dependencies:** 
  - **Twitter handle:** @ootR77013489

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Lingzhen Chen 30af711c34
community[patch]: update AzureSearch class to work with azure-search-documents=11.4.0 (#15659)
- **Description:** Updates
`libs/community/langchain_community/vectorstores/azuresearch.py` to
support the stable version `azure-search-documents=11.4.0`
- **Issue:** https://github.com/langchain-ai/langchain/issues/14534,
https://github.com/langchain-ai/langchain/issues/15039,
https://github.com/langchain-ai/langchain/issues/15355
  - **Dependencies:** azure-search-documents>=11.4.0

---------

Co-authored-by: Clément Tamines <Skar0@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Robby e135dc70c3
community[patch]: Invoke callback prior to yielding token (#17348)
**Description:** Invoke callback prior to yielding token in stream
method for Ollama.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
5 months ago
Christophe Bornet ab025507bc
community[patch]: Add async methods to VectorStoreQATool (#16949) 5 months ago
Christophe Bornet fb7552bfcf
Add async methods to InMemoryCache (#17425)
Add async methods to InMemoryCache
5 months ago
yin1991 37ef6ac113
community[patch]: Add Pagination to GitHubIssuesLoader for Efficient GitHub Issues Retrieval (#16934)
- **Description:** Add Pagination to GitHubIssuesLoader for Efficient
GitHub Issues Retrieval
- **Issue:** [the issue # it fixes if
applicable,](https://github.com/langchain-ai/langchain/issues/16864)

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Robby ece4b43a81
community[patch]: doc loaders mypy fixes (#17368)
**Description:** Fixed `type: ignore`'s for mypy for some
document_loaders.
**Issue:** [Remove "type: ignore" comments #17048
](https://github.com/langchain-ai/langchain/issues/17048)

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Robby 0653aa469a
community[patch]: Invoke callback prior to yielding token (#17346)
**Description:** Invoke callback prior to yielding token in stream
method for watsonx.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
5 months ago
Bagatur f7e453971d
community[patch]: remove print (#17435) 5 months ago
Spencer Kelly 54fa78c887
community[patch]: fixed vector similarity filtering (#16967)
**Description:** changed filtering so that failed filter doesn't add
document to results. Currently filtering is entirely broken and all
documents are returned whether or not they pass the filter.

fixes issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
5 months ago
Abhijeeth Padarthi 584b647b96
community[minor]: AWS Athena Document Loader (#15625)
- **Description:** Adds the document loader for [AWS
Athena](https://aws.amazon.com/athena/), a serverless and interactive
analytics service.
  - **Dependencies:** Added boto3 as a dependency
5 months ago
david-tempelmann 93da18b667
community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829)
- **Description:** This PR adds support for `search_types="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`,
  - **Issue:** 
  - **Dependencies:**
  - **Twitter handle:**

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Massimiliano Pronesti 3894b4d9a5
community: add gpt-4-turbo and gpt-4-0125 costs (#17349)
Ref: https://openai.com/pricing
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
5 months ago
Erick Friis 3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
5 months ago
Jael Gu c07c0da01a
community[patch]: Fix Milvus add texts when ids=None (#17021)
- **Description:** Fix Milvus add texts when ids=None (auto_id=True)

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Quang Hoa 54c1fb3f25
community[patch]: Make some functions work with Milvus (#10695)
**Description**
Make some functions work with Milvus:
1. get_ids: Get primary keys by field in the metadata
2. delete: Delete one or more entities by ids
3. upsert: Update/Insert one or more entities

**Issue**
None
**Dependencies**
None
**Tag maintainer:**
@hwchase17 
**Twitter handle:**
None

---------

Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
kYLe c9999557bf
community[patch]: Modify LLMs/Anyscale work with OpenAI API v1 (#14206)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
- **Description:** 
1. Modify LLMs/Anyscale to work with OAI v1
2. Get rid of openai_ prefixed variables in Chat_model/ChatAnyscale
3. Modify `anyscale_api_base` to `anyscale_base_url` to follow OAI name
convention (reverted)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Leonid Ganeline 932c52c333
community[patch]: docstrings (#16810)
- added missed docstrings
- formated docstrings to the consistent form
5 months ago
Kononov Pavel 15bc201967
langchain_community: Fix typo bug (#17324)
Problem from #17095

This error wasn't in the v1.4.0
5 months ago
Bagatur 02ef9164b5
langchain[patch]: expose cohere rerank score, add parent doc param (#16887) 5 months ago
Alex de5e96b5f9
community[patch]: updated openai prices in mapping (#17009)
- **Description:** there are january prices update for chatgpt
[blog](https://openai.com/blog/new-embedding-models-and-api-updates),
also there are updates on their website on page
[pricing](https://openai.com/pricing)
- **Issue:** N/A

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Armin Stepanyan 641efcf41c
community: add runtime kwargs to HuggingFacePipeline (#17005)
This PR enables changing the behaviour of huggingface pipeline between
different calls. For example, before this PR there's no way of changing
maximum generation length between different invocations of the chain.
This is desirable in cases, such as when we want to scale the maximum
output size depending on a dynamic prompt size.

Usage example:

```python
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
hf = HuggingFacePipeline(pipeline=pipe)

hf("Say foo:", pipeline_kwargs={"max_new_tokens": 42})
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Scott Nath a32798abd7
community: Add you.com utility, update you retriever integration docs (#17014)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

- **Description: changes to you.com files** 
    - general cleanup
- adds community/utilities/you.py, moving bulk of code from retriever ->
utility
    - removes `snippet` as endpoint
    - adds `news` as endpoint
    - adds more tests

<s>**Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`</s>

- **Issue:** the issue # it fixes if applicable,
- [For New Contributors: Update Integration
Documentation](https://github.com/langchain-ai/langchain/issues/15664#issuecomment-1920099868)
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Liang Zhang 7306600e2f
community[patch]: Support SerDe transform functions in Databricks LLM (#16752)
**Description:** Databricks LLM does not support SerDe the
transform_input_fn and transform_output_fn. After saving and loading,
the LLM will be broken. This PR serialize these functions into a hex
string using pickle, and saving the hex string in the yaml file. Using
pickle to serialize a function can be flaky, but this is a simple
workaround that unblocks many use cases. If more sophisticated SerDe is
needed, we can improve it later.

Test:
Added a simple unit test.
I did manual test on Databricks and it works well.
The saved yaml looks like:
```
llm:
      _type: databricks
      cluster_driver_port: null
      cluster_id: null
      databricks_uri: databricks
      endpoint_name: databricks-mixtral-8x7b-instruct
      extra_params: {}
      host: e2-dogfood.staging.cloud.databricks.com
      max_tokens: null
      model_kwargs: null
      n: 1
      stop: null
      task: null
      temperature: 0.0
      transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
      transform_output_fn: null
```

@baskaryan

```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

def transform_input(**request):
  request["messages"] = [
    {
      "role": "user",
      "content": request["prompt"]
    }
  ]
  del request["prompt"]
  return request

llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)

persist_dir = "faiss_databricks_embedding"

# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)

def load_retriever(persist_directory):
    embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()

retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)
with mlflow.start_run() as run:
    logged_model = mlflow.langchain.log_model(
        retrievalQA,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir=persist_dir,
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))

```
5 months ago
cjpark-data ce22e10c4b
community[patch]: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) (#17178)
- **Description:**
Embedding field name was hard-coded named "embedding".
So I suggest that change `res["embedding"]` into
`res[self._embedding_key]`.
  - **Issue:** #17177,
- **Twitter handle:**
[@bagcheoljun17](https://twitter.com/bagcheoljun17)
5 months ago
Neli Hateva 9bb5157a3d
langchain[patch], community[patch]: Fixes in the Ontotext GraphDB Graph and QA Chain (#17239)
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
5 months ago
ByeongUk Choi b88329e9a5
community[patch]: Implement Unique ID Enforcement in FAISS (#17244)
**Description:**
Implemented unique ID validation in the FAISS component to ensure all
document IDs are distinct. This update resolves issues related to
non-unique IDs, such as inconsistent behavior during deletion processes.
5 months ago
Bassem Yacoube 4e3ed7f043
community[patch]: octoai embeddings bug fix (#17216)
fixes a bug in octoa_embeddings provider
5 months ago
Eugene Yurtsev 780e84ae79
community[minor]: SQLDatabase Add fetch mode `cursor`, query parameters, query by selectable, expose execution options, and documentation (#17191)
- **Description:** Improve `SQLDatabase` adapter component to promote
code re-use, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16246#pullrequestreview-1846590962).
  - **Needed by:** GH-16246
  - **Addressed to:** @baskaryan, @cbornet 

## Details
- Add `cursor` fetch mode
- Accept SQL query parameters
- Accept both `str` and SQLAlchemy selectables as query expression
- Expose `execution_options`
- Documentation page (notebook) about `SQLDatabase` [^1]
See [About
SQLDatabase](https://github.com/langchain-ai/langchain/blob/c1c7b763/docs/docs/integrations/tools/sql_database.ipynb).

[^1]: Apparently there hasn't been any yet?

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
5 months ago
Tomaz Bratanic 7e4b676d53
community[patch]: Better error propagation for neo4jgraph (#17190)
There are other errors that could happen when refreshing the schema, so
we want to propagate specific errors for more clarity
5 months ago
Dmitry Kankalovich f92738a6f6
langchain[minor], community[minor], core[minor]: Async Cache support and AsyncRedisCache (#15817)
* This PR adds async methods to the LLM cache. 
* Adds an implementation using Redis called AsyncRedisCache.
* Adds a docker compose file at the /docker to help spin up docker
* Updates redis tests to use a context manager so flushing always happens by default
5 months ago
Bagatur af74301ab9
core[patch], community[patch]: link extraction continue on failure (#17200) 5 months ago
Frank ef082c77b1
community[minor]: add github file loader to load any github file content b… (#15305)
### Description
support load any github file content based on file extension.  

Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.

### Twitter handle
my twitter: @shufanhaotop

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Ryan Kraus f027696b5f
community: Added new Utility runnables for NVIDIA Riva. (#15966)
**Please tag this issue with `nvidia_genai`**

- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A

All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable

The following Python function is an example of creating a chain that
makes use of both of these Runnables:

```python
def create(
    config: Configuration,
    audio_encoding: RivaAudioEncoding,
    sample_rate: int,
    audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
    """Create a new instance of the chain."""
    _LOGGER.info("Instantiating the chain.")

    # create the riva asr client
    riva_asr = RivaASR(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        encoding=audio_encoding,
        audio_channel_count=audio_channels,
        sample_rate_hertz=sample_rate,
        profanity_filter=config.riva_asr.profanity_filter,
        enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
        language_code=config.riva_asr.language_code,
    )

    # create the prompt template
    prompt = PromptTemplate.from_template("{user_input}")

    # model = ChatOpenAI()
    model = ChatNVIDIA(model="mixtral_8x7b")  # type: ignore

    # create the riva tts client
    riva_tts = RivaTTS(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        output_directory=config.riva_tts.output_directory,
        language_code=config.riva_tts.language_code,
        voice_name=config.riva_tts.voice_name,
    )

    # construct and return the chain
    return {"user_input": riva_asr} | prompt | model | riva_tts  # type: ignore
```

The following code is an example of creating a new audio stream for
Riva:

```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
    await input_stream.aput(chunk)
input_stream.close()
```

The following code is an example of how to execute the chain with
RivaASR and RivaTTS

```python
output_stream = asyncio.Queue()
while not input_stream.complete:
    async for chunk in chain.astream(input_stream):
        output_stream.put(chunk)    
```

Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.

---------

Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
François Paupier 929f071513
community[patch]: Fix error in `LlamaCpp` community LLM with Configurable Fields, 'grammar' custom type not available (#16995)
- **Description:** Ensure the `LlamaGrammar` custom type is always
available when instantiating a `LlamaCpp` LLM
  - **Issue:** #16994 
  - **Dependencies:** None
  - **Twitter handle:** @fpaupier

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Alex Boury 334b6ebdf3
community[minor]: Breebs docs retriever (#16578)
- **Description:** Implementation of breeb retriever with integration
tests ->
libs/community/tests/integration_tests/retrievers/test_breebs.py and
documentation (notebook) ->
docs/docs/integrations/retrievers/breebs.ipynb.
  - **Dependencies:** None
5 months ago
Serena Ruan 9b279ac127
community[patch]: MLflow callback update (#16687)
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Mohammad Mohtashim 3c4b24b69a
community[patch]: Fix the _call of HuggingFaceHub (#16891)
Fixed the following identified issue: #16849

@baskaryan
5 months ago