Commit Graph

209 Commits (7824291252c8f99a31c416f363d6a46d7e8f63e4)

Author SHA1 Message Date
Christophe Bornet 8595c3ab59
community[minor]: Add InMemoryVectorStore to module level imports (#19576) 6 months ago
Anindyadeep b2a11ce686
community[minor]: Prem AI langchain integration (#19113)
### Prem SDK integration in LangChain

This PR adds the integration with [PremAI's](https://www.premai.io/)
prem-sdk with langchain. User can now access to deployed models
(llms/embeddings) and use it with langchain's ecosystem. This PR adds
the following:

### This PR adds the following:

- [x]  Add chat support
- [X]  Adding embedding support
- [X]  writing integration tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  writing unit tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  Adding documentation
    - [X]  writing documentation for chat
    - [X]  writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff` 
- [X]  Final checks (spell check, lint, format and overall testing)

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Dmitry Tyumentsev 08b769d539
community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341)
Fixed inability to work with [yandexcloud
SDK](https://pypi.org/project/yandexcloud/) version higher 0.265.0
6 months ago
Mikelarg dac2e0165a
community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516)
- **Description:** Added integration with
[GigaChat](https://developers.sber.ru/portal/products/gigachat)
embeddings. Also added support for extra fields in GigaChat LLM and
fixed docs.
6 months ago
Igor Muniz Soares 743f888580
community[minor]: Dappier chat model integration (#19370)
**Description:** 

This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.


**Dependencies:** 
    No extra dependencies are needed.
6 months ago
Hugoberry 96dc180883
community[minor]: Add `DuckDB` as a vectorstore (#18916)
DuckDB has a cosine similarity function along list and array data types,
which can be used as a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list or
array column types. This PR surfaces this functionality to langchain.
    - **Dependencies:** duckdb 0.10.0
    - **Twitter handle:** @igocrite

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
6 months ago
Christophe Bornet 00614f332a
community[minor]: Add InMemoryVectorStore (#19326)
This is a basic VectorStore implementation using an in-memory dict to
store the documents.
It doesn't need any extra/optional dependency as it uses numpy which is
already a dependency of langchain.
This is useful for quick testing, demos, examples.
Also it allows to write vendor-neutral tutorials, guides, etc...
6 months ago
Nithish Raghunandanan 7ad0a3f2a7
community: add Couchbase Vector Store (#18994)
- **Description:** Added support for Couchbase Vector Search to
LangChain.
- **Dependencies:** couchbase>=4.1.12
- **Twitter handle:** @nithishr

---------

Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
6 months ago
gonvee b82644078e
community: Add `keep_alive` parameter to control how long the model w… (#19005)
Add `keep_alive` parameter to control how long the model will stay
loaded into memory with Ollama。

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
Leonid Ganeline 7de1d9acfd
community: `llms` imports fixes (#18943)
Classes are missed in  __all__  and in different places of __init__.py
- BaichuanLLM 
- ChatDatabricks
- ChatMlflow
- Llamafile
- Mlflow
- Together
Added classes to __all__. I also sorted __all__ list.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
6 months ago
fengjial c922ea36cb
community[minor]: Add Baidu VectorDB as vector store (#17997)
Co-authored-by: fengjialin <fengjialin@MacBook-Pro.local>
6 months ago
Leonid Ganeline 9c8523b529
community[patch]: flattening imports 3 (#18939)
@eyurtsev
6 months ago
Virat Singh cafffe8a21
community: Add PolygonAggregates tool (#18882)
**Description:**
In this PR, I am adding a `PolygonAggregates` tool, which can be used to
get historical stock price data (called aggregates by Polygon) for a
given ticker.

Polygon
[docs](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to)
for this endpoint.

**Twitter**: 
[@virattt](https://twitter.com/virattt)
6 months ago
Ishani Vyas 2b0cbd65ba
community[patch]: Add Passio Nutrition AI Food Search Tool to Community Package (#18278)
## Add Passio Nutrition AI Food Search Tool to Community Package

### Description
We propose adding a new tool to the `community` package, enabling
integration with Passio Nutrition AI for food search functionality. This
tool will provide a simple interface for retrieving nutrition facts
through the Passio Nutrition AI API, simplifying user access to
nutrition data based on food search queries.

### Implementation Details
- **Class Structure:** Implement `NutritionAI`, extending `BaseTool`. It
includes an `_run` method that accepts a query string and, optionally, a
`CallbackManagerForToolRun`.
- **API Integration:** Use `NutritionAIAPI` for the API wrapper,
encapsulating all interactions with the Passio Nutrition AI and
providing a clean API interface.
- **Error Handling:** Implement comprehensive error handling for API
request failures.

### Expected Outcome
- **User Benefits:** Enable easy querying of nutrition facts from Passio
Nutrition AI, enhancing the utility of the `langchain_community` package
for nutrition-related projects.
- **Functionality:** Provide a straightforward method for integrating
nutrition information retrieval into users' applications.

### Dependencies
- `langchain_core` for base tooling support
- `pydantic` for data validation and settings management
- Consider `requests` or another HTTP client library if not covered by
`NutritionAIAPI`.

### Tests and Documentation
- **Unit Tests:** Include tests that mock network interactions to ensure
tool reliability without external API dependency.
- **Documentation:** Create an example notebook in
`docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage,
setup, and example queries.

### Contribution Guidelines Compliance
- Adhere to the project's linting and formatting standards (`make
format`, `make lint`, `make test`).
- Ensure compliance with LangChain's contribution guidelines,
particularly around dependency management and package modifications.

### Additional Notes
- Aim for the tool to be a lightweight, focused addition, not
introducing significant new dependencies or complexity.
- Potential future enhancements could include caching for common queries
to improve performance.

### Twitter Handle
- Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai)
where we announce our products.


If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
6 months ago
Christophe Bornet e54a49b697
community[minor]: Add lazy_table_reflection param to SqlDatabase (#18742)
For some DBs with lots of tables, reflection of all the tables can take
very long. So this change will make the tables be reflected lazily when
get_table_info() is called and `lazy_table_reflection` is True.
6 months ago
Tomaz Bratanic 4bfe888717
comunity[patch]: Fix neo4j sanitizing values (#18750)
Fixing sanitization for when deeply nested lists appear
6 months ago
Yunmo Koo fee6f983ef
community[minor]: Integration for `Friendli` LLM and `ChatFriendli` ChatModel. (#17913)
## Description
- Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM
and `ChatFriendli` chat model.
- Unit tests and integration tests corresponding to this change are
added.
- Documentations corresponding to this change are added.

## Dependencies
- Optional dependency
[`friendli-client`](https://pypi.org/project/friendli-client/) package
is added only for those who use `Frienldi` or `ChatFriendli` model.

## Twitter handle
- https://twitter.com/friendliai
6 months ago
Ian 390ef6abe3
community[minor]: Add Initial Support for TiDB Vector Store (#15796)
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.

Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search. 
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.

Simple use case exmaple

```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings

db = TiDBVectorStore.from_texts(
    embedding=embeddings,
    texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
    table_name="tidb_vector_langchain",
    connection_string=tidb_connection_url,
    distance_strategy="cosine",
)

query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
```
6 months ago
Eugene Yurtsev e188d4ecb0
Add dangerous parameter to requests tool (#18697)
The tools are already documented as dangerous. Not clear whether adding
an opt-in parameter is necessary or not
6 months ago
Erick Friis 1beb84b061
community[patch]: move pdf text tests to integration (#18746) 6 months ago
Christophe Bornet 4a7d73b39d
community: If load() has been overridden, use it in default lazy_load() (#18690) 6 months ago
Sam Khano 1b4dcf22f3
community[minor]: Add DocumentDBVectorSearch VectorStore (#17757)
**Description:**
- Added Amazon DocumentDB Vector Search integration (HNSW index)
- Added integration tests
- Updated AWS documentation with DocumentDB Vector Search instructions
- Added notebook for DocumentDB integration with example usage

---------

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-95-226.ec2.internal>
6 months ago
Vittorio Rigamonti 51f3902bc4
community[minor]: Adding support for Infinispan as VectorStore (#17861)
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.

Vector search is supported since release 15.x 

For more: [Infinispan Home](https://infinispan.org)

Integration tests are provided as well as a demo notebook
6 months ago
Djordje 12b4a4d860
community[patch]: Opensearch delete method added - indexing supported (#18522)
- **Description:** Added delete method for OpenSearchVectorSearch,
therefore indexing supported
    - **Issue:** No
    - **Dependencies:** No
    - **Twitter handle:** stkbmf
6 months ago
Eugene Yurtsev 4c25b49229
community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696)
This is a PR that adds a dangerous load parameter to force users to opt in to use pickle.

This is a PR that's meant to raise user awareness that the pickling module is involved.
6 months ago
Eugene Yurtsev 0e52961562
community[patch]: Patch tdidf retriever (CVE-2024-2057) (#18695)
This is a patch for `CVE-2024-2057`:
https://www.cve.org/CVERecord?id=CVE-2024-2057

This affects users that: 

* Use the  `TFIDFRetriever`
* Attempt to de-serialize it from an untrusted source that contains a
malicious payload
6 months ago
Christophe Bornet 15b1770326
Merge pull request #18421
* Implement lazy_load() for AssemblyAIAudioTranscriptLoader
6 months ago
Christophe Bornet bb284eebe4
Merge pull request #18436
* Implement lazy_load() for ConfluenceLoader
6 months ago
Liang Zhang 81985b31e6
community[patch]: Databricks SerDe uses cloudpickle instead of pickle (#18607)
- **Description:** Databricks SerDe uses cloudpickle instead of pickle
when serializing a user-defined function transform_input_fn since pickle
does not support functions defined in `__main__`, and cloudpickle
supports this.
- **Dependencies:** cloudpickle>=2.0.0

Added a unit test.
6 months ago
Dounx ad48f55357
community[minor]: add Yuque document loader (#17924)
This pull request support loading documents from Yuque with Langchain.

Yuque is a professional cloud-based knowledge base for team
collaboration in documentation.

Website: https://www.yuque.com
OpenAPI: https://www.yuque.com/yuque/developer/openapi
6 months ago
Kazuki Maeda 60c5d964a8
community[minor]: use jq schema for content_key in json_loader (#18003)
### Description
Changed the value specified for `content_key` in JSONLoader from a
single key to a value based on jq schema.
I created [similar
PR](https://github.com/langchain-ai/langchain/pull/11255) before, but it
has several conflicts because of the architectural change associated
stable version release, so I re-create this PR to fit new architecture.

### Why
For json data like the following, specify `.data[].attributes.message`
for page_content and `.data[].attributes.id` or
`.data[].attributes.attributes. tags`, etc., the `content_key` must also
parse the json structure.

<details>
<summary>sample json data</summary>

```json
{
  "data": [
    {
      "attributes": {
        "message": "message1",
        "tags": [
          "tag1"
        ]
      },
      "id": "1"
    },
    {
      "attributes": {
        "message": "message2",
        "tags": [
          "tag2"
        ]
      },
      "id": "2"
    }
  ]
}
```

</details>

<details>
<summary>sample code</summary>

```python
def metadata_func(record: dict, metadata: dict) -> dict:

    metadata["source"] = None
    metadata["id"] = record.get("id")
    metadata["tags"] = record["attributes"].get("tags")

    return metadata

sample_file = "sample1.json"
loader = JSONLoader(
    file_path=sample_file,
    jq_schema=".data[]",
    content_key=".attributes.message", ## content_key is parsable into jq schema
    is_content_key_jq_parsable=True, ## this is added parameter
    metadata_func=metadata_func
)

data = loader.load()
data
```

</details>

### Dependencies
none

### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)
6 months ago
Erick Friis e1924b3e93
core[patch]: deprecate hwchase17/langchain-hub, address path traversal (#18600)
Deprecates the old langchain-hub repository. Does *not* deprecate the
new https://smith.langchain.com/hub

@PinkDraconian has correctly raised that in the event someone is loading
unsanitized user input into the `try_load_from_hub` function, they have
the ability to load files from other locations in github than the
hwchase17/langchain-hub repository.

This PR adds some more path checking to that function and deprecates the
functionality in favor of the hub built into LangSmith.
6 months ago
Scott Nath b051bba1a9
community: Add you.com tool, add async to retriever, add async testing, add You tool doc (#18032)
- **Description:** finishes adding the you.com functionality including:
    - add async functions to utility and retriever
    - add the You.com Tool
    - add async testing for utility, retriever, and tool
    - add a tool integration notebook page
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** @scottnath
6 months ago
Kate Silverstein b7c71e2e07
community[minor]: llamafile embeddings support (#17976)
* **Description:** adds `LlamafileEmbeddings` class implementation for
generating embeddings using
[llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models.
Includes related unit tests and notebook showing example usage.
* **Issue:** N/A
* **Dependencies:** N/A
6 months ago
Petteri Johansson 6c1989d292
community[minor], langchain[minor], docs: Gremlin Graph Store and QA Chain (#17683)
- **Description:** 
New feature: Gremlin graph-store and QA chain (including docs).
Compatible with Azure CosmosDB.
  - **Dependencies:** 
  no changes
6 months ago
Ather Fawaz a5ccf5d33c
community[minor]: Add support for Perplexity chat model(#17024)
- **Description:** This PR adds support for [Perplexity AI
APIs](https://blog.perplexity.ai/blog/introducing-pplx-api).
  - **Issues:** None
  - **Dependencies:** None
  - **Twitter handle:** [@atherfawaz](https://twitter.com/AtherFawaz)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
6 months ago
Rodrigo Nogueira 3438d2cbcc
community[minor]: add maritalk chat (#17675)
**Description:** Adds the MariTalk chat that is based on a LLM specially
trained for Portuguese.

**Twitter handle:** @MaritacaAI
6 months ago
RadhikaBansal97 8bafd2df5e
community[patch]: Change github endpoint in GithubLoader (#17622)
Description- 
- Changed the GitHub endpoint as existing was not working and giving 404
not found error
- Also the existing function was failing if file_filter is not passed as
the tree api return all paths including directory as well, and when
get_file_content was iterating over these path, the function was failing
for directory as the api was returning list of files inside the
directory, so added a condition to ignore the paths if it a directory
- Fixes this issue -
https://github.com/langchain-ai/langchain/issues/17453

Co-authored-by: Radhika Bansal <Radhika.Bansal@veritas.com>
6 months ago
Eugene Yurtsev 51b661cfe8
community[patch]: BaseLoader load method should just delegate to lazy_load (#18289)
load() should just reference lazy_load()
7 months ago
Erick Friis 040271f33a
community[patch]: remove llmlingua extended tests (#18344) 7 months ago
Virat Singh cd926ac3dd
community: Add PolygonFinancials Tool (#18324)
**Description:**
In this PR, I am adding a `PolygonFinancials` tool, which can be used to
get financials data for a given ticker. The financials data is the
fundamental data that is found in income statements, balance sheets, and
cash flow statements of public US companies.

**Twitter**: 
[@virattt](https://twitter.com/virattt)
7 months ago
Eugene Yurtsev cd52433ba0
community[minor]: Add `SQLDatabaseLoader` document loader (#18281)
- **Description:** A generic document loader adapter for SQLAlchemy on
top of LangChain's `SQLDatabaseLoader`.
  - **Needed by:** https://github.com/crate-workbench/langchain/pull/1
  - **Depends on:** GH-16655
  - **Addressed to:** @baskaryan, @cbornet, @eyurtsev

Hi from CrateDB again,

in the same spirit like GH-16243 and GH-16244, this patch breaks out
another commit from https://github.com/crate-workbench/langchain/pull/1,
in order to reduce the size of this patch before submitting it, and to
separate concerns.

To accompany the SQLAlchemy adapter implementation, the patch includes
integration tests for both SQLite and PostgreSQL. Let me know if
corresponding utility resources should be added at different spots.

With kind regards,
Andreas.


### Software Tests

```console
docker compose --file libs/community/tests/integration_tests/document_loaders/docker-compose/postgresql.yml up
```

```console
cd libs/community
pip install psycopg2-binary
pytest -vvv tests/integration_tests -k sqldatabase
```

```
14 passed
```



![image](https://github.com/langchain-ai/langchain/assets/453543/42be233c-eb37-4c76-a830-474276e01436)

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
7 months ago
David Ruan af35e2525a
community[minor]: add hugging_face_model document loader (#17323)
- **Description:** add hugging_face_model document loader,
  - **Issue:** NA,
  - **Dependencies:** NA,

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
7 months ago
Ayo Ayibiowu ac1d7d9de8
community[feat]: Adds LLMLingua as a document compressor (#17711)
**Description**: This PR adds support for using the [LLMLingua project
](https://github.com/microsoft/LLMLingua) especially the LongLLMLingua
(Enhancing Large Language Model Inference via Prompt Compression) as a
document compressor / transformer.

The LLMLingua project is an interesting project that can greatly improve
RAG system by compressing prompts and contexts while keeping their
semantic relevance.

**Issue**: https://github.com/microsoft/LLMLingua/issues/31
**Dependencies**: [llmlingua](https://pypi.org/project/llmlingua/)

@baskaryan

---------

Co-authored-by: Ayodeji Ayibiowu <ayodeji.ayibiowu@getinge.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
7 months ago
am-kinetica 9b8f6455b1
Langchain vectorstore integration with Kinetica (#18102)
- **Description:** New vectorstore integration with the Kinetica
database
  - **Issue:** 
- **Dependencies:** the Kinetica Python API `pip install
gpudb==7.2.0.1`,
  - **Tag maintainer:** @baskaryan, @hwchase17 
  - **Twitter handle:**

---------

Co-authored-by: Chad Juliano <cjuliano@kinetica.com>
7 months ago
Dan Stambler 69344a0661
community: Add Laser Embedding Integration (#18111)
- **Description:** Added Integration with Meta AI's LASER
Language-Agnostic SEntence Representations embedding library, which
supports multilingual embedding for any of the languages listed here:
https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200,
including several low resource languages
- **Dependencies:** laser_encoders
7 months ago
Barun Amalkumar Halder cc69976860
community[minor] : adds callback handler for Fiddler AI (#17708)
**Description:**  Callback handler to integrate fiddler with langchain. 
This PR adds the following -

1. `FiddlerCallbackHandler` implementation into langchain/community
2. Example notebook `fiddler.ipynb` for usage documentation

[Internal Tracker : FDL-14305]

**Issue:** 
NA

**Dependencies:** 
- Installation of langchain-community is unaffected.
- Usage of FiddlerCallbackHandler requires installation of latest
fiddler-client (2.5+)

**Twitter handle:** @fiddlerlabs @behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
7 months ago
Chad Juliano 50ba3c68bb
community[minor]: add Kinetica LLM wrapper (#17879)
**Description:** Initial pull request for Kinetica LLM wrapper
**Issue:** N/A
**Dependencies:** No new dependencies for unit tests. Integration tests
require gpudb, typeguard, and faker
**Twitter handle:** @chad_juliano

Note: There is another pull request for Kinetica vectorstore. Ultimately
we would like to make a partner package but we are starting with a
community contribution.
7 months ago
Brad Erickson ecd72d26cf
community: Bugfix - correct Ollama API path to avoid HTTP 307 (#17895)
Sets the correct /api/generate path, without ending /, to reduce HTTP
requests.

Reference:

https://github.com/ollama/ollama/blob/efe040f8/docs/api.md#generate-request-streaming

Before:

    DEBUG: Starting new HTTP connection (1): localhost:11434
    DEBUG: http://localhost:11434 "POST /api/generate/ HTTP/1.1" 307 0
    DEBUG: http://localhost:11434 "POST /api/generate HTTP/1.1" 200 None

After:

    DEBUG: Starting new HTTP connection (1): localhost:11434
    DEBUG: http://localhost:11434 "POST /api/generate HTTP/1.1" 200 None
7 months ago
Ian 3019a594b7
community[minor]: Add tidb loader support (#17788)
This pull request support loading data from TiDB database with
Langchain.

A simple usage:
```
from  langchain_community.document_loaders import TiDBLoader

CONNECTION_STRING = "mysql+pymysql://root@127.0.0.1:4000/test"

QUERY = "select id, name, description from items;"
loader = TiDBLoader(
    connection_string=CONNECTION_STRING,
    query=QUERY,
    page_content_columns=["name", "description"],
    metadata_columns=["id"],
)
documents = loader.load()
print(documents)
```
7 months ago
Michael Feil 242981b8f0
community[minor]: infinity embedding local option (#17671)
**drop-in-replacement for sentence-transformers
inference.**

https://github.com/langchain-ai/langchain/discussions/17670

tldr from the discussion above -> around a 4x-22x speedup over using
SentenceTransformers / huggingface embeddings. For more info:
https://github.com/michaelfeil/infinity (pure-python dependency)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Zachary Toliver 29ee0496b6
community[patch]: Allow override of 'fetch_schema_from_transport' in the GraphQL tool (#17649)
- **Description:** In order to override the bool value of
"fetch_schema_from_transport" in the GraphQLAPIWrapper, a
"fetch_schema_from_transport" value needed to be added to the
"_EXTRA_OPTIONAL_TOOLS" dictionary in load_tools in the "graphql" key.
The parameter "fetch_schema_from_transport" must also be passed in to
the GraphQLAPIWrapper to allow reading of the value when creating the
client. Passing as an optional parameter is probably best to avoid
breaking changes. This change is necessary to support GraphQL instances
that do not support fetching schema, such as TigerGraph. More info here:
[TigerGraph GraphQL Schema
Docs](https://docs.tigergraph.com/graphql/current/schema)
  - **Threads handle:** @zacharytoliver

---------

Co-authored-by: Zachary Toliver <zt10191991@hotmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Guangdong Liu 47b1b7092d
community[minor]: Add SparkLLM to community (#17702) 7 months ago
Guangdong Liu 3ba1cb8650
community[minor]: Add SparkLLM Text Embedding Model and SparkLLM introduction (#17573) 7 months ago
Virat Singh 92e52e89ca
community: Add PolygonTickerNews Tool (#17808)
Description:
In this PR, I am adding a PolygonTickerNews Tool, which can be used to
get the latest news for a given ticker / stock.

Twitter handle: [@virattt](https://twitter.com/virattt)
7 months ago
CogniJT 919ebcc596
community[minor]: CogniSwitch Agent Toolkit for LangChain (#17312)
**Description**: CogniSwitch focusses on making GenAI usage more
reliable. It abstracts out the complexity & decision making required for
tuning processing, storage & retrieval. Using simple APIs documents /
URLs can be processed into a Knowledge Graph that can then be used to
answer questions.

**Dependencies**: No dependencies. Just network calls & API key required
**Tag maintainer**: @hwchase17
**Twitter handle**: https://github.com/CogniSwitch
**Documentation**: Please check
`docs/docs/integrations/toolkits/cogniswitch.ipynb`
**Tests**: The usual tool & toolkits tests using `test_imports.py`

PR has passed linting and testing before this submission.

---------

Co-authored-by: Saicharan Sridhara <145636106+saiCogniswitch@users.noreply.github.com>
7 months ago
Guangdong Liu 73edf17b4e
community[minor]: Add Apache Doris as vector store (#17527)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Nejc Habjan b4fa847a90
community[minor]: add exclude parameter to DirectoryLoader (#17316)
- **Description:** adds an `exclude` parameter to the DirectoryLoader
class, based on similar behavior in GenericLoader
- **Issue:** discussed in
https://github.com/langchain-ai/langchain/discussions/9059 and I think
in some other issues that I cannot find at the moment 🙇
  - **Dependencies:** None
  - **Twitter handle:** don't have one sorry! Just https://github/nejch

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
7 months ago
Bagatur 8f14234afb
infra: ignore flakey lua test (#17618) 7 months ago
Moshe Berchansky 20a56fe0a2
community[minor]: Add QuantizedEmbedders (#17391)
**Description:** 
* adding Quantized embedders using optimum-intel and
intel-extension-for-pytorch.
* added mdx documentation and example notebooks 
* added embedding import testing.

**Dependencies:** 
optimum = {extras = ["neural-compressor"], version = "^1.14.0", optional
= true}
intel_extension_for_pytorch = {version = "^2.2.0", optional = true}

Dependencies have been added to pyproject.toml for the community lib.  

**Twitter handle:** @peter_izsak

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
nvpranak 91bcc9c5c9
community[minor]: Nemo embeddings(#16206)
This PR is adding support for NVIDIA NeMo embeddings issue #16095.

---------

Co-authored-by: Praveen Nakshatrala <pnakshatrala@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
wulixuan c776cfc599
community[minor]: integrate with model Yuan2.0 (#15411)
1. integrate with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. update `langchain.llms`
3. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Kate Silverstein 0bc4a9b3fc
community[minor]: Adds Llamafile as an LLM (#17431)
* **Description:** Adds a simple LLM implementation for interacting with
[llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models.
* **Dependencies:** N/A
* **Issue:** N/A

**Detail**
[llamafile](https://github.com/Mozilla-Ocho/llamafile) lets you run LLMs
locally from a single file on most computers without installing any
dependencies.

To use the llamafile LLM implementation, the user needs to:

1. Download a llamafile e.g.
https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile?download=true
2. Make the file executable.
3. Run the llamafile in 'server mode'. (All llamafiles come packaged
with a lightweight server; by default, the server listens at
`http://localhost:8080`.)


```bash
wget https://url/of/model.llamafile
chmod +x model.llamafile
./model.llamafile --server --nobrowser
```

Now, the user can invoke the LLM via the LangChain client:

```python
from langchain_community.llms.llamafile import Llamafile

llm = Llamafile()

llm.invoke("Tell me a joke.")
```
7 months ago
Qihui Xie 5738143d4b
add mongodb_store (#13801)
# Add MongoDB storage
  - **Description:** 
  Add MongoDB Storage as an option for large doc store. 

Example usage: 
```Python
# Instantiate the MongodbStore with a MongoDB connection
from langchain.storage import MongodbStore

mongo_conn_str = "mongodb://localhost:27017/"
mongodb_store = MongodbStore(mongo_conn_str, db_name="test-db",
                                collection_name="test-collection")

# Set values for keys
doc1 = Document(page_content='test1')
doc2 = Document(page_content='test2')
mongodb_store.mset([("key1", doc1), ("key2", doc2)])

# Get values for keys
values = mongodb_store.mget(["key1", "key2"])
# [doc1, doc2]

# Iterate over keys
for key in mongodb_store.yield_keys():
    print(key)

# Delete keys
mongodb_store.mdelete(["key1", "key2"])
 ```

  - **Dependencies:**
  Use `mongomock` for integration test.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
7 months ago
wulixuan 5d06797905
community[minor]: integrate chat models with Yuan2.0 (#16575)
1. integrate chat models with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)
 
Yuan2.0 is a new generation Fundamental Large Language Model developed
by IEIT System. We have published all three models, Yuan 2.0-102B, Yuan
2.0-51B, and Yuan 2.0-2B.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Ian Gregory e5472b5eb8
Framework for supporting more languages in LanguageParser (#13318)
## Description

I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.

Initial context is in the issue we opened (#11229).

This pull request adds:

- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
  - C
  - C++
  - C#
  - Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
  - Kotlin
  - Lua
  - Perl
  - Ruby
  - Rust
  - Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)

Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.

## Issues

- Closes #11229
- Closes #10996
- Closes #8405

## Dependencies

`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.

## Documentation

We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.

## Maintainer

- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)

Thanks!!

## Git commits

We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
7 months ago
Sridhar Ramaswamy 9f1cbbc6ed
community[minor]: Add pebblo safe document loader (#16862)
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
  - **Dependencies:** none
  - **Twitter handle:** srics

@hwchase17
7 months ago
mhavey 1bbb64d956
community[minor], langchian[minor]: Add Neptune Rdf graph and chain (#16650)
**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.

**Issue**: This is a new feature
 
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Andreas Motl 1fdd9bd980
community/SQLDatabase: Generalize and trim software tests (#16659)
- **Description:** Improve test cases for `SQLDatabase` adapter
component, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16655#pullrequestreview-1846749474).
  - **Depends on:** GH-16655
  - **Addressed to:** @baskaryan, @cbornet, @eyurtsev

_Remark: This PR is stacked upon GH-16655, so that one will need to go
in first._

Edit: Thank you for bringing in GH-17191, @eyurtsev. This is a little
aftermath, improving/streamlining the corresponding test cases.
7 months ago
morgana 722aae4fd1
community: add delete method to rocksetdb vectorstore to support recordmanager (#17030)
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
7 months ago
Spencer Kelly 54fa78c887
community[patch]: fixed vector similarity filtering (#16967)
**Description:** changed filtering so that failed filter doesn't add
document to results. Currently filtering is entirely broken and all
documents are returned whether or not they pass the filter.

fixes issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
7 months ago
Abhijeeth Padarthi 584b647b96
community[minor]: AWS Athena Document Loader (#15625)
- **Description:** Adds the document loader for [AWS
Athena](https://aws.amazon.com/athena/), a serverless and interactive
analytics service.
  - **Dependencies:** Added boto3 as a dependency
7 months ago
david-tempelmann 93da18b667
community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829)
- **Description:** This PR adds support for `search_types="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`,
  - **Issue:** 
  - **Dependencies:**
  - **Twitter handle:**

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Erick Friis 3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
7 months ago
kYLe c9999557bf
community[patch]: Modify LLMs/Anyscale work with OpenAI API v1 (#14206)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
- **Description:** 
1. Modify LLMs/Anyscale to work with OAI v1
2. Get rid of openai_ prefixed variables in Chat_model/ChatAnyscale
3. Modify `anyscale_api_base` to `anyscale_base_url` to follow OAI name
convention (reverted)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Bagatur 65e97c9b53
infra: mv SQLDatabase tests to community (#17276) 7 months ago
Scott Nath a32798abd7
community: Add you.com utility, update you retriever integration docs (#17014)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

- **Description: changes to you.com files** 
    - general cleanup
- adds community/utilities/you.py, moving bulk of code from retriever ->
utility
    - removes `snippet` as endpoint
    - adds `news` as endpoint
    - adds more tests

<s>**Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`</s>

- **Issue:** the issue # it fixes if applicable,
- [For New Contributors: Update Integration
Documentation](https://github.com/langchain-ai/langchain/issues/15664#issuecomment-1920099868)
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Liang Zhang 7306600e2f
community[patch]: Support SerDe transform functions in Databricks LLM (#16752)
**Description:** Databricks LLM does not support SerDe the
transform_input_fn and transform_output_fn. After saving and loading,
the LLM will be broken. This PR serialize these functions into a hex
string using pickle, and saving the hex string in the yaml file. Using
pickle to serialize a function can be flaky, but this is a simple
workaround that unblocks many use cases. If more sophisticated SerDe is
needed, we can improve it later.

Test:
Added a simple unit test.
I did manual test on Databricks and it works well.
The saved yaml looks like:
```
llm:
      _type: databricks
      cluster_driver_port: null
      cluster_id: null
      databricks_uri: databricks
      endpoint_name: databricks-mixtral-8x7b-instruct
      extra_params: {}
      host: e2-dogfood.staging.cloud.databricks.com
      max_tokens: null
      model_kwargs: null
      n: 1
      stop: null
      task: null
      temperature: 0.0
      transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
      transform_output_fn: null
```

@baskaryan

```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

def transform_input(**request):
  request["messages"] = [
    {
      "role": "user",
      "content": request["prompt"]
    }
  ]
  del request["prompt"]
  return request

llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)

persist_dir = "faiss_databricks_embedding"

# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)

def load_retriever(persist_directory):
    embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()

retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)
with mlflow.start_run() as run:
    logged_model = mlflow.langchain.log_model(
        retrievalQA,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir=persist_dir,
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))

```
7 months ago
ByeongUk Choi b88329e9a5
community[patch]: Implement Unique ID Enforcement in FAISS (#17244)
**Description:**
Implemented unique ID validation in the FAISS component to ensure all
document IDs are distinct. This update resolves issues related to
non-unique IDs, such as inconsistent behavior during deletion processes.
7 months ago
Luiz Ferreira 34d2daffb3
community[patch]: Fix chat openai unit test (#17124)
- **Description:** 
Actually the test named `test_openai_apredict` isn't testing the
apredict method from ChatOpenAI.
  - **Twitter handle:**
  https://twitter.com/OAlmofadas
7 months ago
Frank ef082c77b1
community[minor]: add github file loader to load any github file content b… (#15305)
### Description
support load any github file content based on file extension.  

Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.

### Twitter handle
my twitter: @shufanhaotop

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Ryan Kraus f027696b5f
community: Added new Utility runnables for NVIDIA Riva. (#15966)
**Please tag this issue with `nvidia_genai`**

- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A

All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable

The following Python function is an example of creating a chain that
makes use of both of these Runnables:

```python
def create(
    config: Configuration,
    audio_encoding: RivaAudioEncoding,
    sample_rate: int,
    audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
    """Create a new instance of the chain."""
    _LOGGER.info("Instantiating the chain.")

    # create the riva asr client
    riva_asr = RivaASR(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        encoding=audio_encoding,
        audio_channel_count=audio_channels,
        sample_rate_hertz=sample_rate,
        profanity_filter=config.riva_asr.profanity_filter,
        enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
        language_code=config.riva_asr.language_code,
    )

    # create the prompt template
    prompt = PromptTemplate.from_template("{user_input}")

    # model = ChatOpenAI()
    model = ChatNVIDIA(model="mixtral_8x7b")  # type: ignore

    # create the riva tts client
    riva_tts = RivaTTS(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        output_directory=config.riva_tts.output_directory,
        language_code=config.riva_tts.language_code,
        voice_name=config.riva_tts.voice_name,
    )

    # construct and return the chain
    return {"user_input": riva_asr} | prompt | model | riva_tts  # type: ignore
```

The following code is an example of creating a new audio stream for
Riva:

```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
    await input_stream.aput(chunk)
input_stream.close()
```

The following code is an example of how to execute the chain with
RivaASR and RivaTTS

```python
output_stream = asyncio.Queue()
while not input_stream.complete:
    async for chunk in chain.astream(input_stream):
        output_stream.put(chunk)    
```

Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.

---------

Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Alex Boury 334b6ebdf3
community[minor]: Breebs docs retriever (#16578)
- **Description:** Implementation of breeb retriever with integration
tests ->
libs/community/tests/integration_tests/retrievers/test_breebs.py and
documentation (notebook) ->
docs/docs/integrations/retrievers/breebs.ipynb.
  - **Dependencies:** None
7 months ago
Harrison Chase 4eda647fdd
infra: add -p to mkdir in lint steps (#17013)
Previously, if this did not find a mypy cache then it wouldnt run

this makes it always run

adding mypy ignore comments with existing uncaught issues to unblock other prs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
7 months ago
Erick Friis b1a847366c
community: revert SQL Stores (#16912)
This reverts commit cfc225ecb3.


https://github.com/langchain-ai/langchain/pull/15909#issuecomment-1922418097

These will have existed in langchain-community 0.0.16 and 0.0.17.
7 months ago
Christophe Bornet af8c5c185b
langchain[minor],community[minor]: Add async methods in BaseLoader (#16634)
Adds:
* methods `aload()` and `alazy_load()` to interface `BaseLoader`
* implementation for class `MergedDataLoader `
* support for class `BaseLoader` in async function `aindex()` with unit
tests

Note: this is compatible with existing `aload()` methods that some
loaders already had.

**Twitter handle:** @cbornet_

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
7 months ago
thiswillbeyourgithub 1d082359ee
community: add support for callable filters in FAISS (#16190)
- **Description:**
Filtering in a FAISS vectorstores is very inflexible and doesn't allow
that many use case. I think supporting callable like this enables a lot:
regular expressions, condition on multiple keys etc. **Note** I had to
manually alter a test. I don't understand if it was falty to begin with
or if there is something funky going on.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None

Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
8 months ago
Volodymyr Machula 32c5be8b73
community[minor]: Connery Tool and Toolkit (#14506)
## Summary

This PR implements the "Connery Action Tool" and "Connery Toolkit".
Using them, you can integrate Connery actions into your LangChain agents
and chains.

Connery is an open-source plugin infrastructure for AI.

With Connery, you can easily create a custom plugin with a set of
actions and seamlessly integrate them into your LangChain agents and
chains. Connery will handle the rest: runtime, authorization, secret
management, access management, audit logs, and other vital features.
Additionally, Connery and our community offer a wide range of
ready-to-use open-source plugins for your convenience.

Learn more about Connery:

- GitHub: https://github.com/connery-io/connery-platform
- Documentation: https://docs.connery.io
- Twitter: https://twitter.com/connery_io

## TODOs

- [x] API wrapper
   - [x] Integration tests
- [x] Connery Action Tool
   - [x] Docs
   - [x] Example
   - [x] Integration tests
- [x] Connery Toolkit
  - [x] Docs
  - [x] Example
- [x] Formatting (`make format`)
- [x] Linting (`make lint`)
- [x] Testing (`make test`)
8 months ago
Neli Hateva c95facc293
langchain[minor], community[minor]: Implement Ontotext GraphDB QA Chain (#16019)
- **Description:** Implement Ontotext GraphDB QA Chain
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
8 months ago
Jael Gu a1aa3a657c
community[patch]: Milvus supports add & delete texts by ids (#16256)
# Description

To support [langchain
indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
as requested by users, vectorstore Milvus needs to support:
- document addition by id (`add_documents` method with `ids` argument)
- delete by id (`delete` method with `ids` argument)

Example usage:

```python
from langchain.indexes import SQLRecordManager, index
from langchain.schema import Document
from langchain_community.vectorstores import Milvus
from langchain_openai import OpenAIEmbeddings

collection_name = "test_index"
embedding = OpenAIEmbeddings()
vectorstore = Milvus(embedding_function=embedding, collection_name=collection_name)

namespace = f"milvus/{collection_name}"
record_manager = SQLRecordManager(
    namespace, db_url="sqlite:///record_manager_cache.sql"
)
record_manager.create_schema()

doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})
doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})

index(
    [doc1, doc1, doc2],
    record_manager,
    vectorstore,
    cleanup="incremental",  # None, "incremental", or "full"
    source_id_key="source",
)
```

# Fix issues

Fix https://github.com/milvus-io/milvus/issues/30112

---------

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Benito Geordie f3fdc5c5da
community: Added integrations for ThirdAI's NeuralDB with Retriever and VectorStore frameworks (#15280)
**Description:** Adds ThirdAI NeuralDB retriever and vectorstore
integration. NeuralDB is a CPU-friendly and fine-tunable text retrieval
engine.
8 months ago
Christophe Bornet 4915c3cd86
[Fix] Fix Cassandra Document loader default page content mapper (#16273)
We can't use `json.dumps` by default as many types returned by the
cassandra driver are not serializable. It's safer to use `str` and let
users define their own custom `page_content_mapper` if needed.
8 months ago
Micah Parker 6543e585a5
community[patch]: Added support for Ollama's num_predict option in ChatOllama (#16633)
Just a simple default addition to the options payload for a ollama
generate call to support a max_new_tokens parameter.

Should fix issue: https://github.com/langchain-ai/langchain/issues/14715
8 months ago
baichuan-assistant 70ff54eace
community[minor]: Add Baichuan Text Embedding Model and Baichuan Inc introduction (#16568)
- **Description:** Adding Baichuan Text Embedding Model and Baichuan Inc
introduction.

Baichuan Text Embedding ranks #1 in C-MTEB leaderboard:
https://huggingface.co/spaces/mteb/leaderboard

Co-authored-by: BaiChuanHelper <wintergyc@WinterGYCs-MacBook-Pro.local>
8 months ago
Ghani e30c6662df
Langchain-community : EdenAI chat integration. (#16377)
- **Description:** This PR adds [EdenAI](https://edenai.co/) for the
chat model (already available in LLM & Embeddings). It supports all
[ChatModel] functionality: generate, async generate, stream, astream and
batch. A detailed notebook was added.

  - **Dependencies**: No dependencies are added as we call a rest API.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
8 months ago
Bagatur 5df8ab574e
infra: move indexing documentation test (#16595) 8 months ago
Brian Burgin 148347e858
community[minor]: Add LiteLLM Router Integration (#15588)
community:

  - **Description:**
- Add new ChatLiteLLMRouter class that allows a client to use a LiteLLM
Router as a LangChain chat model.
- Note: The existing ChatLiteLLM integration did not cover the LiteLLM
Router class.
    - Add tests and Jupyter notebook.
  - **Issue:** None
  - **Dependencies:** Relies on existing ChatLiteLLM integration
  - **Twitter handle:** @bburgin_0

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Rave Harpaz c4e9c9ca29
community[minor]: Add OCI Generative AI integration (#16548)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
- **Description:** Adding Oracle Cloud Infrastructure Generative AI
integration. Oracle Cloud Infrastructure (OCI) Generative AI is a fully
managed service that provides a set of state-of-the-art, customizable
large language models (LLMs) that cover a wide range of use cases, and
which is available through a single API. Using the OCI Generative AI
service you can access ready-to-use pretrained models, or create and
host your own fine-tuned custom models based on your own data on
dedicated AI clusters.
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
  - **Issue:** None,
  - **Dependencies:** OCI Python SDK,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
Passed

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

we provide unit tests. However, we cannot provide integration tests due
to Oracle policies that prohibit public sharing of api keys.
 
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Martin Kolb 04651f0248
community[minor]: VectorStore integration for SAP HANA Cloud Vector Engine (#16514)
- **Description:**
This PR adds a VectorStore integration for SAP HANA Cloud Vector Engine,
which is an upcoming feature in the SAP HANA Cloud database
(https://blogs.sap.com/2023/11/02/sap-hana-clouds-vector-engine-announcement/).

  - **Issue:** N/A
- **Dependencies:** [SAP HANA Python
Client](https://pypi.org/project/hdbcli/)
  - **Twitter handle:** @sapopensource

Implementation of the integration:
`libs/community/langchain_community/vectorstores/hanavector.py`

Unit tests:
`libs/community/tests/unit_tests/vectorstores/test_hanavector.py`

Integration tests:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`

Example notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`

Access credentials for execution of the integration tests can be
provided to the maintainers.

---------

Co-authored-by: sascha <sascha.stoll@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Xudong Sun 019b6ebe8d
community[minor]: Add iFlyTek Spark LLM chat model support (#13389)
- **Description:** This PR enables LangChain to access the iFlyTek's
Spark LLM via the chat_models wrapper.
  - **Dependencies:** websocket-client ^1.6.1
  - **Tag maintainer:** @baskaryan 

### SparkLLM chat model usage

Get SparkLLM's app_id, api_key and api_secret from [iFlyTek SparkLLM API
Console](https://console.xfyun.cn/services/bm3) (for more info, see
[iFlyTek SparkLLM Intro](https://xinghuo.xfyun.cn/sparkapi) ), then set
environment variables `IFLYTEK_SPARK_APP_ID`, `IFLYTEK_SPARK_API_KEY`
and `IFLYTEK_SPARK_API_SECRET` or pass parameters when using it like the
demo below:

```python3
from langchain.chat_models.sparkllm import ChatSparkLLM

client = ChatSparkLLM(
    spark_app_id="<app_id>",
    spark_api_key="<api_key>",
    spark_api_secret="<api_secret>"
)
```
8 months ago