Commit Graph

277 Commits

Author SHA1 Message Date
Unai Garay Maestre
fdbfa6b2c8
Adds progress bar to VertexAIEmbeddings (#14542)
- **Description:** Adds progress bar to VertexAIEmbeddings 
- **Issue:** related issue
https://github.com/langchain-ai/langchain/issues/13637

Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>

---------

Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
2024-01-24 11:16:16 -07:00
Jeremi Joslin
9e95699277
community[patch]: Fix error message when litellm is not installed (#16316)
The error message was mentioning the wrong package. I updated it to the
correct one.
2024-01-23 21:42:29 -08:00
bachr
b3ed98dec0
community[patch]: avoid KeyError when language not in LANGUAGE_SEGMENTERS (#15212)
**Description:**

Handle unsupported languages in same way as when none is provided 
 
**Issue:**

The following line will throw a KeyError if the language is not
supported.
```python
self.Segmenter = LANGUAGE_SEGMENTERS[language]
```
E.g. when using `Language.CPP` we would get `KeyError: <Language.CPP:
'cpp'>`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-23 21:09:43 -08:00
chyroc
61da2ff24c
community[patch]: use SecretStr for yandex model secrets (#15463) 2024-01-23 20:08:53 -08:00
Alessio Serra
d628a80a5d
community[patch]: added 'conversational' as a valid task for hugginface endopoint models (#15761)
- **Description:** added the conversational task to hugginFace endpoint
in order to use models designed for chatbot programming.
  - **Dependencies:** None

---------

Co-authored-by: Alessio Serra (ext.) <alessio.serra@partner.bmw.de>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-23 20:04:15 -08:00
Karim Lalani
4c7755778d
community[patch]: SurrealDB fix for asyncio (#16092)
Code fix for asyncio
2024-01-23 19:46:19 -08:00
Raunak
476bf8b763
community[patch]: Load list of files using UnstructuredFileLoader (#16216)
- **Description:** Updated `_get_elements()` function of
`UnstructuredFileLoader `class to check if the argument self.file_path
is a file or list of files. If it is a list of files then it iterates
over the list of file paths, calls the partition function for each one,
and appends the results to the elements list. If self.file_path is not a
list, it calls the partition function as before.
  
  - **Issue:** Fixed #15607,
  - **Dependencies:** NA
  - **Twitter handle:** NA

Co-authored-by: H161961 <Raunak.Raunak@Honeywell.com>
2024-01-23 19:37:37 -08:00
Xudong Sun
019b6ebe8d
community[minor]: Add iFlyTek Spark LLM chat model support (#13389)
- **Description:** This PR enables LangChain to access the iFlyTek's
Spark LLM via the chat_models wrapper.
  - **Dependencies:** websocket-client ^1.6.1
  - **Tag maintainer:** @baskaryan 

### SparkLLM chat model usage

Get SparkLLM's app_id, api_key and api_secret from [iFlyTek SparkLLM API
Console](https://console.xfyun.cn/services/bm3) (for more info, see
[iFlyTek SparkLLM Intro](https://xinghuo.xfyun.cn/sparkapi) ), then set
environment variables `IFLYTEK_SPARK_APP_ID`, `IFLYTEK_SPARK_API_KEY`
and `IFLYTEK_SPARK_API_SECRET` or pass parameters when using it like the
demo below:

```python3
from langchain.chat_models.sparkllm import ChatSparkLLM

client = ChatSparkLLM(
    spark_app_id="<app_id>",
    spark_api_key="<api_key>",
    spark_api_secret="<api_secret>"
)
```
2024-01-23 19:23:46 -08:00
Serena Ruan
5c6e123757
community[patch]: Fix MlflowCallback with none artifacts_dir (#16487) 2024-01-23 19:09:02 -08:00
bu2kx
ff3163297b
community[minor]: Add KDBAI vector store (#12797)
Addition of KDBAI vector store (https://kdb.ai).

Dependencies: `kdbai_client` v0.1.2 Python package.

Sample notebook: `docs/docs/integrations/vectorstores/kdbai.ipynb`

Tag maintainer: @bu2kx
Twitter handle: @kxsystems
2024-01-23 18:37:01 -08:00
Shivani Modi
4e160540ff
community[minor]: Adding Konko Completion endpoint (#15570)
This PR introduces update to Konko Integration with LangChain.

1. **New Endpoint Addition**: Integration of a new endpoint to utilize
completion models hosted on Konko.

2. **Chat Model Updates for Backward Compatibility**: We have updated
the chat models to ensure backward compatibility with previous OpenAI
versions.

4. **Updated Documentation**: Comprehensive documentation has been
updated to reflect these new changes, providing clear guidance on
utilizing the new features and ensuring seamless integration.

Thank you to the LangChain team for their exceptional work and for
considering this PR. Please let me know if any additional information is
needed.

---------

Co-authored-by: Shivani Modi <shivanimodi@Shivanis-MacBook-Pro.local>
Co-authored-by: Shivani Modi <shivanimodi@Shivanis-MBP.lan>
2024-01-23 18:22:32 -08:00
Noah Stapp
e135e5257c
community[patch]: Include scores in MongoDB Atlas QA chain results (#14666)
Adds the ability to return similarity scores when using
`RetrievalQA.from_chain_type` with `MongoDBAtlasVectorSearch`. Requires
that `return_source_documents=True` is set.

Example use:

```
vector_search = MongoDBAtlasVectorSearch.from_documents(...)

qa = RetrievalQA.from_chain_type(
	llm=OpenAI(), 
	chain_type="stuff", 
	retriever=vector_search.as_retriever(search_kwargs={"additional": ["similarity_score"]}),
	return_source_documents=True
)

...

docs = qa({"query": "..."})

docs["source_documents"][0].metadata["score"] # score will be here
```

I've tested this feature locally, using a MongoDB Atlas Cluster with a
vector search index.
2024-01-23 18:18:28 -08:00
Serena Ruan
90f5a1c40e
community[minor]: Improve mlflow callback (#15691)
- **Description:** Allow passing run_id to MLflowCallbackHandler to
resume a run instead of creating a new run. Support recording retriever
relevant metrics. Refactor the code to fix some bugs.
---------

Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
2024-01-23 18:16:51 -08:00
Facundo Santiago
92e6a641fd
feat: adding paygo api support for Azure ML / Azure AI Studio (#14560)
- **Description:** Introducing support for LLMs and Chat models running
in Azure AI studio and Azure ML using the new deployment mode
pay-as-you-go (model as a service).
- **Issue:** NA
- **Dependencies:** None.
- **Tag maintainer:** @prakharg-msft @gdyre 
- **Twitter handle:** @santiagofacundo

Examples added:
*
[docs/docs/integrations/llms/azure_ml.ipynb](https://github.com/santiagxf/langchain/blob/santiagxf/azureml-endpoints-paygo-community/docs/docs/integrations/chat/azureml_endpoint.ipynb)
*
[docs/docs/integrations/chat/azureml_chat_endpoint.ipynb](https://github.com/santiagxf/langchain/blob/santiagxf/azureml-endpoints-paygo-community/docs/docs/integrations/chat/azureml_chat_endpoint.ipynb)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-23 17:08:51 -08:00
Davide Menini
9ce177580a
community: normalize bedrock embeddings (#15103)
In this PR I added a post-processing function to normalize the
embeddings. This happens only if the new `normalize` flag is `True`.

---------

Co-authored-by: taamedag <Davide.Menini@swisscom.com>
2024-01-23 17:05:24 -08:00
baichuan-assistant
20fcd49348
community: Fix Baichuan Chat. (#15207)
- **Description:** Baichuan Chat (with both Baichuan-Turbo and
Baichuan-Turbo-192K models) has updated their APIs. There are breaking
changes. For example, BAICHUAN_SECRET_KEY is removed in the latest API
but is still required in Langchain. Baichuan's Langchain integration
needs to be updated to the latest version.
  - **Issue:** #15206
  - **Dependencies:** None,
  - **Twitter handle:** None

@hwchase17.

Co-authored-by: BaiChuanHelper <wintergyc@WinterGYCs-MacBook-Pro.local>
2024-01-23 17:01:57 -08:00
gcheron
cfc225ecb3
community: SQLStrStore/SQLDocStore provide an easy SQL alternative to InMemoryStore to persist data remotely in a SQL storage (#15909)
**Description:**

- Implement `SQLStrStore` and `SQLDocStore` classes that inherits from
`BaseStore` to allow to persist data remotely on a SQL server.
- SQL is widely used and sometimes we do not want to install a caching
solution like Redis.
- Multiple issues/comments complain that there is no easy remote and
persistent solution that are not in memory (users want to replace
InMemoryStore), e.g.,
https://github.com/langchain-ai/langchain/issues/14267,
https://github.com/langchain-ai/langchain/issues/15633,
https://github.com/langchain-ai/langchain/issues/14643,
https://stackoverflow.com/questions/77385587/persist-parentdocumentretriever-of-langchain
- This is particularly painful when wanting to use
`ParentDocumentRetriever `
- This implementation is particularly useful when:
     * it's expensive to construct an InMemoryDocstore/dict
     * you want to retrieve documents from remote sources
     * you just want to reuse existing objects
- This implementation integrates well with PGVector, indeed, when using
PGVector, you already have a SQL instance running. `SQLDocStore` is a
convenient way of using this instance to store documents associated to
vectors. An integration example with ParentDocumentRetriever and
PGVector is provided in docs/docs/integrations/stores/sql.ipynb or
[here](https://github.com/gcheron/langchain/blob/sql-store/docs/docs/integrations/stores/sql.ipynb).
- It persists `str` and `Document` objects but can be easily extended.

 **Issue:**

Provide an easy SQL alternative to `InMemoryStore`.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-23 16:50:48 -08:00
Massimiliano Pronesti
e529939c54
feat(llms): support more tasks in HuggingFaceHub LLM and remove deprecated dep (#14406)
- **Description:** this PR upgrades the `HuggingFaceHub` LLM:
   * support more tasks (`translation` and `conversational`)
   * replaced the deprecated `InferenceApi` with `InferenceClient`
* adjusted the overall logic to use the "recommended" model for each
task when no model is provided, and vice-versa.
- **Tag mainter(s)**: @baskaryan @hwchase17
2024-01-23 16:48:56 -08:00
Bagatur
54149292f8
community[patch]: Release 0.0.15 (#16474) 2024-01-23 11:50:10 -08:00
Tomaz Bratanic
d0a8082188
Fix neo4j sanitize (#16439)
Fix the sanitization bug and add an integration test
2024-01-23 10:56:28 -05:00
Florian MOREL
4b7969efc5
community[minor]: New documents loader for visio files (with extension .vsdx) (#16171)
**Description** : New documents loader for visio files (with extension
.vsdx)

A [visio file](https://fr.wikipedia.org/wiki/Microsoft_Visio) (with
extension .vsdx) is associated with Microsoft Visio, a diagram creation
software. It stores information about the structure, layout, and
graphical elements of a diagram. This format facilitates the creation
and sharing of visualizations in areas such as business, engineering,
and computer science.

A Visio file can contain multiple pages. Some of them may serve as the
background for others, and this can occur across multiple layers. This
loader extracts the textual content from each page and its associated
pages, enabling the extraction of all visible text from each page,
similar to what an OCR algorithm would do.

**Dependencies** : xmltodict package
2024-01-22 22:07:03 -08:00
Boris Feld
404abf139a
community: Add CometLLM tracing context var (#15765)
I also added LANGCHAIN_COMET_TRACING to enable the CometLLM tracing
integration similar to other tracing integrations. This is easier for
end-users to enable it rather than importing the callback and pass it
manually.

(This is the same content as
https://github.com/langchain-ai/langchain/pull/14650 but rebased and
squashed as something seems to confuse Github Action).
2024-01-22 15:17:16 -08:00
DL
b9e7f6f38a
community[minor]: Bedrock async methods (#12477)
Description: Added support for asynchronous streaming in the Bedrock
class and corresponding tests.

Primarily:
  async def aprepare_output_stream
    async def _aprepare_input_and_invoke_stream
    async def _astream
    async def _acall

I've ensured that the code adheres to the project's linting and
formatting standards by running make format, make lint, and make test.

Issue: #12054, #11589

Dependencies: None

Tag maintainer: @baskaryan 

Twitter handle: @dominic_lovric

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-01-22 14:44:49 -08:00
Frank995
5694728816
community[patch]: Implement vector length definition at init time in PGVector for indexing (#16133)
Replace this entire comment with:
- **Description:** allow user to define tVector length in PGVector when
creating the embedding store, this allows for later indexing
  - **Issue:** #16132
  - **Dependencies:** None
2024-01-22 14:32:44 -08:00
parkererickson-tg
b26a22f307
community[minor]: add TigerGraph support (#16280)
**Description:** Add support for querying TigerGraph databases through
the InquiryAI service.
**Issue**: N/A
**Dependencies:** N/A
**Twitter handle:** @TigerGraphDB
2024-01-22 14:07:44 -08:00
Alireza Kashani
d1b4ead87c
community[patch]: Update grobid.py (#16298)
there is a case where "coords" does not exist in the "sentence"
therefore, the "split(";")" will lead to error.

we can fix that by adding "if sentence.get("coords") is not None:" 

the resulting empty "sbboxes" from this scenario will raise error at
"sbboxes[0]["page"]" because sbboxes are empty.

the PDF from https://pubmed.ncbi.nlm.nih.gov/23970373/ can replicate
those errors.
2024-01-22 14:03:58 -08:00
s-g-1
fbe592a5ce
community[patch]: fix typo in pgvecto_rs debug msg (#16318)
fixes typo in pip install message for the pgvecto_rs community vector
store
no issues found mentioning this
no dependents changed
2024-01-22 14:01:33 -08:00
Ian
b9f5104e6c
communty[minor]: Store Message History to TiDB Database (#16304)
This pull request integrates the TiDB database into LangChain for
storing message history, marking one of several steps towards a
comprehensive integration of TiDB with LangChain.


A simple usage
```python
from datetime import datetime
from langchain_community.chat_message_histories import TiDBChatMessageHistory

history = TiDBChatMessageHistory(
    connection_string="mysql+pymysql://<host>:<PASSWORD>@<host>:4000/<db>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true",
    session_id="code_gen",
    earliest_time=datetime.utcnow(),  # Optional to set earliest_time to load messages after this time point.
)

history.add_user_message("hi! How's feature going?")
history.add_ai_message("It's almot done")
```
2024-01-22 13:56:56 -08:00
Erick Friis
cfe95ab085
multiple: update langsmith dep (#16407) 2024-01-22 14:23:11 -07:00
Eli Lucherini
6b2a57161a
community[patch]: allow additional kwargs in MlflowEmbeddings for compatibility with Cohere API (#15242)
- **Description:** add support for kwargs in`MlflowEmbeddings`
`embed_document()` and `embed_query()` so that all the arguments
required by Cohere API (and others?) can be passed down to the server.
  - **Issue:** #15234 
- **Dependencies:** MLflow with MLflow Deployments (`pip install
mlflow[genai]`)

**Tests**
Now this code [adapted from the
docs](https://python.langchain.com/docs/integrations/providers/mlflow#embeddings-example)
for the Cohere API works locally.

```python
"""
Setup
-----
export COHERE_API_KEY=...
mlflow deployments start-server --config-path examples/deployments/cohere/config.yaml

Run
---
python /path/to/this/file.py
"""
embeddings = MlflowCohereEmbeddings(target_uri="http://127.0.0.1:5000", endpoint="embeddings")
print(embeddings.embed_query("hello")[:3])
print(embeddings.embed_documents(["hello", "world"])[0][:3])
```

Output
```
[0.060455322, 0.028793335, -0.025848389]
[0.031707764, 0.021057129, -0.009361267]
```
2024-01-22 11:38:11 -08:00
Guillem Orellana Trullols
aad2aa7188
community[patch]: BedrockChat -> Support Titan express as chat model (#15408)
Titan Express model was not supported as a chat model because LangChain
messages were not "translated" to a text prompt.

Co-authored-by: Guillem Orellana Trullols <guillem.orellana_trullols@siemens.com>
2024-01-22 11:37:23 -08:00
Katarina Supe
01c2f27ffa
community[patch]: Update Memgraph support (#16360)
- **Description:** I removed two queries to the database and left just
one whose results were formatted afterward into other type of schema
(avoided two calls to DB)
  - **Issue:** /
  - **Dependencies:** /
  - **Twitter handle:** @supe_katarina
2024-01-22 11:33:28 -08:00
Max Jakob
8569b8f680
community[patch]: ElasticsearchStore enable max inner product (#16393)
Enable max inner product for approximate retrieval strategy. For exact
strategy we lack the necessary `maxInnerProduct` function in the
Painless scripting language, this is why we do not add it there.

Similarity docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Joe McElroy <joseph.mcelroy@elastic.co>
2024-01-22 11:26:18 -08:00
Iskren Ivov Chernev
fc196cab12
community[minor]: DeepInfra support for chat models (#16380)
Add deepinfra chat models support.

This is https://github.com/langchain-ai/langchain/pull/14234 re-opened
from my branch (so maintainers can edit).
2024-01-22 11:22:17 -08:00
Bagatur
85e8423312
community[patch]: Update bing results tool name (#16395)
Make BingSearchResults tool name OpenAI functions compatible (can't have
spaces).

Fixes #16368
2024-01-22 11:11:03 -08:00
Max Jakob
de209af533
community[patch]: ElasticsearchStore: add relevance function selector (#16378)
Implement similarity function selector for ElasticsearchStore. The
scores coming back from Elasticsearch are already similarities (not
distances) and they are already normalized (see
[docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params)).
Hence we leave the scores untouched and just forward them.

This fixes #11539.

However, in hybrid mode (when keyword search and vector search are
involved) Elasticsearch currently returns no scores. This PR adds an
error message around this fact. We need to think a bit more to come up
with a solution for this case.

This PR also corrects a small error in the Elasticsearch integration
test.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-01-22 11:52:20 -07:00
Tom Jorquera
1445ac95e8
community[patch]: Enable streaming for GPT4all (#16392)
`streaming` param was never passed to model
2024-01-22 09:54:18 -08:00
Bagatur
8779013847
community[patch]: Release 0.0.14 (#16384) 2024-01-22 08:50:19 -08:00
Bagatur
1dc6c1ce06
core[patch], community[patch], langchain[patch], docs: Update SQL chains/agents/docs (#16168)
Revamp SQL use cases docs. In the process update SQL chains and agents.
2024-01-22 08:19:08 -08:00
Luke
5396604ef4
community: Handling missing key in Google Trends API response. (#15864)
- **Description:** Handing response where _interest_over_time_ is
missing.
  - **Issue:** #15859
  - **Dependencies:** None
2024-01-21 18:11:45 -08:00
Virat Singh
c2a614eddc
community: Add PolygonLastQuote Tool and Toolkit (#15990)
**Description:** 
In this PR, I am adding a `PolygonLastQuote` Tool, which can be used to
get the latest price quote for a given ticker / stock.

Additionally, I've added a Polygon Toolkit, which we can use to
encapsulate future tools that we build for Polygon.

**Twitter handle:** [@virattt](https://twitter.com/virattt)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-21 15:08:55 -08:00
Ofer Mendelevitch
ffae98d371
template: Update Vectara templates (#15363)
fixed multi-query template for Vectara
added self-query template for Vectara

Also added prompt_name parameter to summarization

CC @efriis 
 **Twitter handle:** @ofermend
2024-01-19 17:32:33 -08:00
Carey
021b0484a8
community[patch]: add skipped test for inner product normalization (#14989)
---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-01-18 23:03:15 -08:00
Christophe Bornet
3ccbe11363
community[minor]: Add Cassandra document loader (#16215)
- **Description:** document loader for Apache Cassandra
  - **Twitter handle:** cbornet_
2024-01-18 18:49:02 -08:00
mikeFore4
9d32af72ce
community[patch]: huggingface hub character removal bug fix (#16233)
- **Description:** Some text-generation models on huggingface repeat the
prompt in their generated response, but not all do! The tests use "gpt2"
which DOES repeat the prompt and as such, the HuggingFaceHub class is
hardcoded to remove the first few characters of the response (to match
the len(prompt)). However, if you are using a model (such as the very
popular "meta-llama/Llama-2-7b-chat-hf") that DOES NOT repeat the prompt
in it's generated text, then the beginning of the generated text will be
cut off. This code change fixes that bug by first checking whether the
prompt is repeated in the generated response and removing it
conditionally.
  - **Issue:** #16232 
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
2024-01-18 18:44:10 -08:00
Andreas Motl
3613d8a2ad
community[patch]: Use SQLAlchemy's bulk_save_objects method to improve insert performance (#16244)
- **Description:** Improve [pgvector vector store
adapter](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py)
to save embeddings in batches, to improve its performance.
  - **Issue:** NA
  - **Dependencies:** NA
  - **References:** https://github.com/crate-workbench/langchain/pull/1


Hi again from the CrateDB team,

following up on GH-16243, this is another minor patch to the pgvector
vector store adapter. Inserting embeddings in batches, using
[SQLAlchemy's
`bulk_save_objects`](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.bulk_save_objects)
method, can deliver substantial performance gains.

With kind regards,
Andreas.

NB: As I am seeing just now that this method is a legacy feature of SA
2.0, it will need to be reworked on a future iteration. However, it is
not deprecated yet, and I haven't been able to come up with a different
implementation, yet.
2024-01-18 18:35:39 -08:00
Christophe Bornet
3502a407d9
infra: Use dotenv in langchain-community's integration tests (#16137)
* Removed some env vars not used in langchain package IT
* Added Astra DB env vars in langchain package, used for cache tests
* Added conftest.py to load env vars in langchain_community IT
* Added .env.example in  langchain_community IT
2024-01-17 18:18:26 -08:00
Tomaz Bratanic
1e80113ac9
community[patch]: Add neo4j timeout and value sanitization option (#16138)
The timeout function comes in handy when you want to kill longrunning
queries.
The value sanitization removes all lists that are larger than 128
elements. The idea here is to remove embedding properties from results.
2024-01-17 13:22:19 -08:00
Krishna Shedbalkar
f238217cea
community[patch]: Basic Logging and Human input to ShellTool (#15932)
- **Description:** As Shell tool is very versatile, while integrating it
into applications as openai functions, developers have no clue about
what command is being executed using the ShellTool. All one can see is:

![image](https://github.com/langchain-ai/langchain/assets/60742358/540e274a-debc-4564-9027-046b91424df3)

Summarising my feature request:
1. There's no visibility about what command was executed.
2. There's no mechanism to prevent a command to be executed using
ShellTool, like a y/n human input which can be accepted from user to
proceed with executing the command.,
  - **Issue:** the issue #15931 it fixes if applicable,
  - **Dependencies:** There isn't any dependancy,
  - **Twitter handle:** @krishnashed
2024-01-17 12:57:51 -08:00
Christophe Bornet
fb940d11df
community[patch]: Use newer MetadataVectorCassandraTable in Cassandra vector store (#15987)
as VectorTable is deprecated

Tested manually with `test_cassandra.py` vector store integration test.
2024-01-17 10:37:07 -08:00