Commit Graph

305 Commits

Author SHA1 Message Date
DL
b9e7f6f38a
community[minor]: Bedrock async methods (#12477)
Description: Added support for asynchronous streaming in the Bedrock
class and corresponding tests.

Primarily:
  async def aprepare_output_stream
    async def _aprepare_input_and_invoke_stream
    async def _astream
    async def _acall

I've ensured that the code adheres to the project's linting and
formatting standards by running make format, make lint, and make test.

Issue: #12054, #11589

Dependencies: None

Tag maintainer: @baskaryan 

Twitter handle: @dominic_lovric

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-01-22 14:44:49 -08:00
Frank995
5694728816
community[patch]: Implement vector length definition at init time in PGVector for indexing (#16133)
- **Description:** allow the user to define the vector length in PGVector
when creating the embedding store; this allows for later indexing
  - **Issue:** #16132
  - **Dependencies:** None
2024-01-22 14:32:44 -08:00
parkererickson-tg
b26a22f307
community[minor]: add TigerGraph support (#16280)
**Description:** Add support for querying TigerGraph databases through
the InquiryAI service.
**Issue**: N/A
**Dependencies:** N/A
**Twitter handle:** @TigerGraphDB
2024-01-22 14:07:44 -08:00
Alireza Kashani
d1b4ead87c
community[patch]: Update grobid.py (#16298)
there is a case where "coords" does not exist in the "sentence"
therefore, the "split(";")" will lead to error.

we can fix that by adding "if sentence.get("coords") is not None:" 

the resulting empty "sbboxes" from this scenario will raise error at
"sbboxes[0]["page"]" because sbboxes are empty.

the PDF from https://pubmed.ncbi.nlm.nih.gov/23970373/ can replicate
those errors.
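
The guard described above can be sketched as follows. This is a minimal illustration, not the actual `grobid.py` code: the helper names `sentence_boxes` and `first_page`, and the assumption that `coords` is a `;`-separated string of comma-separated box fields with the page first, are hypothetical.

```python
from typing import Dict, List, Optional


def sentence_boxes(sentence: Dict[str, str]) -> List[List[str]]:
    """Return the boxes of a sentence, or [] when "coords" is absent."""
    coords: Optional[str] = sentence.get("coords")
    if coords is None:
        # Without this check, calling .split(";") on None raises an error.
        return []
    return [box.split(",") for box in coords.split(";")]


def first_page(sentence: Dict[str, str]) -> Optional[str]:
    """Return the page field of the first box, or None if there are no boxes."""
    boxes = sentence_boxes(sentence)
    # An empty list must be checked before indexing boxes[0].
    return boxes[0][0] if boxes else None
```

Both failure modes from the commit message are covered: a missing `coords` key and the empty box list that results from it.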
2024-01-22 14:03:58 -08:00
s-g-1
fbe592a5ce
community[patch]: fix typo in pgvecto_rs debug msg (#16318)
fixes typo in pip install message for the pgvecto_rs community vector
store
no issues found mentioning this
no dependents changed
2024-01-22 14:01:33 -08:00
Ian
b9f5104e6c
community[minor]: Store Message History to TiDB Database (#16304)
This pull request integrates the TiDB database into LangChain for
storing message history, marking one of several steps towards a
comprehensive integration of TiDB with LangChain.


A simple usage
```python
from datetime import datetime
from langchain_community.chat_message_histories import TiDBChatMessageHistory

history = TiDBChatMessageHistory(
    connection_string="mysql+pymysql://<user>:<PASSWORD>@<host>:4000/<db>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true",
    session_id="code_gen",
    earliest_time=datetime.utcnow(),  # Optional to set earliest_time to load messages after this time point.
)

history.add_user_message("hi! How's feature going?")
history.add_ai_message("It's almost done")
```
2024-01-22 13:56:56 -08:00
Erick Friis
cfe95ab085
multiple: update langsmith dep (#16407) 2024-01-22 14:23:11 -07:00
Eli Lucherini
6b2a57161a
community[patch]: allow additional kwargs in MlflowEmbeddings for compatibility with Cohere API (#15242)
- **Description:** add support for kwargs in `MlflowEmbeddings`
`embed_documents()` and `embed_query()` so that all the arguments
required by Cohere API (and others?) can be passed down to the server.
  - **Issue:** #15234 
- **Dependencies:** MLflow with MLflow Deployments (`pip install
mlflow[genai]`)

**Tests**
Now this code [adapted from the
docs](https://python.langchain.com/docs/integrations/providers/mlflow#embeddings-example)
for the Cohere API works locally.

```python
"""
Setup
-----
export COHERE_API_KEY=...
mlflow deployments start-server --config-path examples/deployments/cohere/config.yaml

Run
---
python /path/to/this/file.py
"""
from langchain_community.embeddings.mlflow import MlflowCohereEmbeddings

embeddings = MlflowCohereEmbeddings(target_uri="http://127.0.0.1:5000", endpoint="embeddings")
print(embeddings.embed_query("hello")[:3])
print(embeddings.embed_documents(["hello", "world"])[0][:3])
```

Output
```
[0.060455322, 0.028793335, -0.025848389]
[0.031707764, 0.021057129, -0.009361267]
```
2024-01-22 11:38:11 -08:00
Guillem Orellana Trullols
aad2aa7188
community[patch]: BedrockChat -> Support Titan express as chat model (#15408)
Titan Express model was not supported as a chat model because LangChain
messages were not "translated" to a text prompt.

Co-authored-by: Guillem Orellana Trullols <guillem.orellana_trullols@siemens.com>
2024-01-22 11:37:23 -08:00
Katarina Supe
01c2f27ffa
community[patch]: Update Memgraph support (#16360)
- **Description:** I removed two queries to the database and left just
one whose results were formatted afterward into other type of schema
(avoided two calls to DB)
  - **Issue:** /
  - **Dependencies:** /
  - **Twitter handle:** @supe_katarina
2024-01-22 11:33:28 -08:00
Max Jakob
8569b8f680
community[patch]: ElasticsearchStore enable max inner product (#16393)
Enable max inner product for approximate retrieval strategy. For exact
strategy we lack the necessary `maxInnerProduct` function in the
Painless scripting language, this is why we do not add it there.

Similarity docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Joe McElroy <joseph.mcelroy@elastic.co>
2024-01-22 11:26:18 -08:00
Iskren Ivov Chernev
fc196cab12
community[minor]: DeepInfra support for chat models (#16380)
Add deepinfra chat models support.

This is https://github.com/langchain-ai/langchain/pull/14234 re-opened
from my branch (so maintainers can edit).
2024-01-22 11:22:17 -08:00
Bagatur
85e8423312
community[patch]: Update bing results tool name (#16395)
Make BingSearchResults tool name OpenAI functions compatible (can't have
spaces).

Fixes #16368
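
As a hedged illustration of the constraint mentioned above, the sketch below checks a tool name against the character set that OpenAI documents for function names (letters, digits, underscores and hyphens, at most 64 characters). The helper names and the underscore normalisation are assumptions made for this example and are not taken from the PR.

```python
import re

# Letters, digits, underscores and hyphens; between 1 and 64 characters.
_FUNCTION_NAME = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def is_function_compatible(name: str) -> bool:
    """Return True if the whole name matches the allowed pattern."""
    return _FUNCTION_NAME.fullmatch(name) is not None


def to_function_name(name: str) -> str:
    """Hypothetical normalisation: collapse whitespace runs into underscores."""
    return re.sub(r"\s+", "_", name.strip())
```

A name containing spaces fails the check, while its underscore-normalised form passes.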
2024-01-22 11:11:03 -08:00
Max Jakob
de209af533
community[patch]: ElasticsearchStore: add relevance function selector (#16378)
Implement similarity function selector for ElasticsearchStore. The
scores coming back from Elasticsearch are already similarities (not
distances) and they are already normalized (see
[docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params)).
Hence we leave the scores untouched and just forward them.

This fixes #11539.

However, in hybrid mode (when keyword search and vector search are
involved) Elasticsearch currently returns no scores. This PR adds an
error message around this fact. We need to think a bit more to come up
with a solution for this case.

This PR also corrects a small error in the Elasticsearch integration
test.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-01-22 11:52:20 -07:00
Tom Jorquera
1445ac95e8
community[patch]: Enable streaming for GPT4all (#16392)
`streaming` param was never passed to model
2024-01-22 09:54:18 -08:00
Bagatur
8779013847
community[patch]: Release 0.0.14 (#16384) 2024-01-22 08:50:19 -08:00
Bagatur
1dc6c1ce06
core[patch], community[patch], langchain[patch], docs: Update SQL chains/agents/docs (#16168)
Revamp SQL use cases docs. In the process update SQL chains and agents.
2024-01-22 08:19:08 -08:00
Luke
5396604ef4
community: Handling missing key in Google Trends API response. (#15864)
- **Description:** Handle responses where _interest_over_time_ is
missing.
  - **Issue:** #15859
  - **Dependencies:** None
2024-01-21 18:11:45 -08:00
Virat Singh
c2a614eddc
community: Add PolygonLastQuote Tool and Toolkit (#15990)
**Description:** 
In this PR, I am adding a `PolygonLastQuote` Tool, which can be used to
get the latest price quote for a given ticker / stock.

Additionally, I've added a Polygon Toolkit, which we can use to
encapsulate future tools that we build for Polygon.

**Twitter handle:** [@virattt](https://twitter.com/virattt)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-21 15:08:55 -08:00
Ofer Mendelevitch
ffae98d371
template: Update Vectara templates (#15363)
fixed multi-query template for Vectara
added self-query template for Vectara

Also added prompt_name parameter to summarization

CC @efriis 
 **Twitter handle:** @ofermend
2024-01-19 17:32:33 -08:00
Carey
021b0484a8
community[patch]: add skipped test for inner product normalization (#14989)
---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-01-18 23:03:15 -08:00
Christophe Bornet
3ccbe11363
community[minor]: Add Cassandra document loader (#16215)
- **Description:** document loader for Apache Cassandra
  - **Twitter handle:** cbornet_
2024-01-18 18:49:02 -08:00
mikeFore4
9d32af72ce
community[patch]: huggingface hub character removal bug fix (#16233)
- **Description:** Some text-generation models on huggingface repeat the
prompt in their generated response, but not all do! The tests use "gpt2"
which DOES repeat the prompt and as such, the HuggingFaceHub class is
hardcoded to remove the first few characters of the response (to match
the len(prompt)). However, if you are using a model (such as the very
popular "meta-llama/Llama-2-7b-chat-hf") that DOES NOT repeat the prompt
in its generated text, then the beginning of the generated text will be
cut off. This code change fixes that bug by first checking whether the
prompt is repeated in the generated response and removing it
conditionally.
  - **Issue:** #16232 
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
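
The conditional removal described above can be sketched like this. It is a minimal illustration under the assumption that an echoed prompt always appears as an exact prefix of the response; the function name `strip_echoed_prompt` is hypothetical and is not the code in the `HuggingFaceHub` class.

```python
def strip_echoed_prompt(prompt: str, generated: str) -> str:
    """Remove the prompt from the start of the response only if it is echoed."""
    if generated.startswith(prompt):
        return generated[len(prompt):]
    # The model did not repeat the prompt, so nothing is cut off.
    return generated
```

Unconditionally slicing `generated[len(prompt):]` would truncate the second kind of response, which is the bug the commit describes.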
2024-01-18 18:44:10 -08:00
Andreas Motl
3613d8a2ad
community[patch]: Use SQLAlchemy's bulk_save_objects method to improve insert performance (#16244)
- **Description:** Improve [pgvector vector store
adapter](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py)
to save embeddings in batches, to improve its performance.
  - **Issue:** NA
  - **Dependencies:** NA
  - **References:** https://github.com/crate-workbench/langchain/pull/1


Hi again from the CrateDB team,

following up on GH-16243, this is another minor patch to the pgvector
vector store adapter. Inserting embeddings in batches, using
[SQLAlchemy's
`bulk_save_objects`](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.bulk_save_objects)
method, can deliver substantial performance gains.

With kind regards,
Andreas.

NB: As I am seeing just now that this method is a legacy feature of SA
2.0, it will need to be reworked on a future iteration. However, it is
not deprecated yet, and I haven't been able to come up with a different
implementation, yet.
2024-01-18 18:35:39 -08:00
Christophe Bornet
3502a407d9
infra: Use dotenv in langchain-community's integration tests (#16137)
* Removed some env vars not used in langchain package IT
* Added Astra DB env vars in langchain package, used for cache tests
* Added conftest.py to load env vars in langchain_community IT
* Added .env.example in  langchain_community IT
2024-01-17 18:18:26 -08:00
Tomaz Bratanic
1e80113ac9
community[patch]: Add neo4j timeout and value sanitization option (#16138)
The timeout function comes in handy when you want to kill long-running
queries.
The value sanitization removes all lists that are larger than 128
elements. The idea here is to remove embedding properties from results.
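
A rough sketch of the value sanitization idea follows. Only the 128-element threshold comes from the commit message; the function names, the recursive handling of nested dictionaries, and the sentinel approach are assumptions for illustration and may differ from the actual implementation.

```python
from typing import Any, Dict

LIST_LIMIT = 128  # threshold taken from the commit message

_DROP = object()  # sentinel marking a value that should be removed


def _clean(value: Any) -> Any:
    if isinstance(value, dict):
        cleaned = {key: _clean(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item is not _DROP}
    if isinstance(value, list):
        if len(value) > LIST_LIMIT:
            # Oversized lists are assumed to be embeddings and are dropped.
            return _DROP
        cleaned_items = [_clean(item) for item in value]
        return [item for item in cleaned_items if item is not _DROP]
    return value


def sanitize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without lists longer than LIST_LIMIT."""
    return _clean(record)
```

Short lists such as tags survive, while a 1536-element embedding property is removed from the result.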
2024-01-17 13:22:19 -08:00
Krishna Shedbalkar
f238217cea
community[patch]: Basic Logging and Human input to ShellTool (#15932)
- **Description:** As Shell tool is very versatile, while integrating it
into applications as openai functions, developers have no clue about
what command is being executed using the ShellTool. All one can see is:

![image](https://github.com/langchain-ai/langchain/assets/60742358/540e274a-debc-4564-9027-046b91424df3)

Summarising my feature request:
1. There's no visibility about what command was executed.
2. There's no mechanism to prevent a command from being executed using
ShellTool, such as a y/n human input that the user must accept before the
command is executed.
  - **Issue:** #15931
  - **Dependencies:** None
  - **Twitter handle:** @krishnashed
2024-01-17 12:57:51 -08:00
Christophe Bornet
fb940d11df
community[patch]: Use newer MetadataVectorCassandraTable in Cassandra vector store (#15987)
as VectorTable is deprecated

Tested manually with `test_cassandra.py` vector store integration test.
2024-01-17 10:37:07 -08:00
Mohammad Mohtashim
1fa056c324
community[patch]: Don't set search path for unknown SQL dialects (#16047)
- **Description:** Made a small fix for the `SQLDatabase` highlighted in
an issue. The issue pertains to switching schema for different SQL
engines. 
  - **Issue:** #16023
@baskaryan
2024-01-17 10:31:11 -08:00
Leonid Ganeline
c5f6b828ad
langchain[patch], community[minor]: move output_parsers.ernie_functions (#16057)
`output_parsers.ernie_functions` moved into `community`
2024-01-17 10:06:18 -08:00
Fei Wang
d0e101e4e0
community[patch]: fix ollama astream (#16070)
Update ollama.py
2024-01-17 09:42:41 -08:00
BeatrixCohere
b0c3e3db2b
community[patch]: Handle when documents are not provided in the Cohere response (#16144)
- **Description:** This handles the cohere response when documents
aren't included in the response
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
2024-01-17 09:11:00 -08:00
Felix Krones
d91126fc64
community[patch]: missing unpack operator for or_clause in pgvector document filter (#16148)
- Fix for #16146 
- Add the unpack operator to the "or" and "and" filters of the pgvector
retriever.
2024-01-17 09:10:43 -08:00
William FH
e5cf1e2414
Community[patch]use secret str in Tavily and HuggingFaceInferenceEmbeddings (#16109)
So the api keys don't show up in repr's 

Still need to do tests
2024-01-17 00:30:07 -08:00
William FH
f3601b0aaf
Community[Patch] Remove docs from bm25 repr (#16110)
Resolves: https://github.com/langchain-ai/langsmith-sdk/issues/356
2024-01-17 00:00:55 -08:00
Erick Friis
52114bdfac
community[patch]: release 0.0.13 (#16087) 2024-01-16 06:25:28 -08:00
James Briggs
ca288d8f2c
community[patch]: add vector param to index query for pinecone vec store (#16054) 2024-01-16 06:12:19 -08:00
Antonio Morales
476fb328ee
community[patch]: implement adelete from VectorStore in Qdrant (#16005)
**Description:**
Implement `adelete` function from `VectorStore` in `Qdrant` to support
other asynchronous flows such as async indexing (`aindex`) which
requires `adelete` to be implemented. Since `Qdrant` can be passed an
async qdrant client, this can be supported easily.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-15 19:57:09 -08:00
高远
061e63eef2
community[minor]: add vikingdb vecstore (#15155)
---------

Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>
2024-01-15 12:34:01 -08:00
andrijdavid
d196646811
community[patch]: Refactor OpenAIWhisperParserLocal (#15150)
This PR addresses an issue in OpenAIWhisperParserLocal where requesting
CUDA without availability leads to an AttributeError #15143

Changes:

- Refactored Logic for CUDA Availability: The initialization now
includes a check for CUDA availability. If CUDA is not available, the
code falls back to using the CPU. This ensures seamless operation
without manual intervention.
- Parameterizing Batch Size and Chunk Size: The batch_size and
chunk_size are now configurable parameters, offering greater flexibility
and optimization options based on the specific requirements of the use
case.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-15 12:29:14 -08:00
Zhichao HAN
5cf06db3b3
community[minor]: add JsonRequestsWrapper tool (#15374)
**Description:** This new feature enhances the flexibility of pipeline
integration, particularly when working with RESTful APIs.
``JsonRequestsWrapper`` allows for the decoding of JSON output, instead
of the only option for text output.

---------

Co-authored-by: Zhichao HAN <hanzhichao2000@hotmail.com>
2024-01-15 12:27:19 -08:00
chyroc
d334efc848
community[patch]: fix top_p type hint (#15452)
fix: https://github.com/langchain-ai/langchain/issues/15341

@efriis
2024-01-15 11:59:39 -08:00
Mateusz Szewczyk
251afda549
community[patch]: fix stop (stop_sequences) param on WatsonxLLM (#15541)
- **Description:** Fix to IBM
[watsonx.ai](https://www.ibm.com/products/watsonx-ai) LLM provider (stop
(`stop_sequences`) param on watsonxLLM)
- **Dependencies:**
[ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/),
2024-01-15 11:44:57 -08:00
Funkeke
7220124368
community[patch]: fix tongyi completion and params error (#15544)
fix tongyi completion json parse error and prompt's params error

---------

Co-authored-by: fangkeke <3339698829@qq.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-15 11:43:13 -08:00
盐粒 Yanli
ddf4e7c633
community[minor]: Update pgvecto_rs to use its high level sdk (#15574)
- **Description:** Update pgvecto_rs to use its high level sdk, 
  - **Issue:** fix #15173
2024-01-15 11:41:59 -08:00
YHW
ce21392a21
community: add a flag that determines whether to load the milvus collection (#15693)
fix https://github.com/langchain-ai/langchain/issues/15694

---------

Co-authored-by: hyungwookyang <hyungwookyang@worksmobile.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-15 11:25:23 -08:00
Mohammad Mohtashim
9e779ca846
community[patch]: Fixing the SlackGetChannel Tool Input Error (#15725)
Fixed the issue mentioned in #15698 for SlackGetChannel Tool.

@baskaryan.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-15 11:23:55 -08:00
axiangcoding
daa9ccae52
community[patch]: deprecate ErnieBotChat and ErnieEmbeddings classes (#15862)
- **Description:** add deprecated warning for ErnieBotChat and
ErnieEmbeddings.
- These two classes **lack maintenance** and do not use the SDK provided
by Qianfan, which makes it hard to implement key features such as
streaming.
- The alternatives `langchain_community.chat_models.QianfanChatEndpoint`
and `langchain_community.embeddings.QianfanEmbeddingsEndpoint` can
completely replace these two classes; only the configuration items need
to change.
  - **Issue:** None,
  - **Dependencies:** None,
  - **Twitter handle:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-15 11:14:44 -08:00
JaguarDB
b11fd3bedc
community[patch]: jaguar vector store fix integer-element error when joining metadata values (#15939)
- **Description:** some document loaders add integer-type metadata
values, which cause an error when the values are joined
  - **Issue:** 15937
  - **Dependencies:** none

---------

Co-authored-by: JY <jyjy@jaguardb>
2024-01-15 11:13:45 -08:00
Neo Zhao
21e0df937f
community[patch]: fix a bug that mistakenly handle zip iterator in FAISS.from_embeddings (#16020)
**Description**: `zip` returns an iterator that can only be consumed once,
so the previous code caused `embeddings` to be an empty list.

**Issue**: I could not find a related issue.

**Dependencies**: this PR does not introduce or affect dependencies.
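
The single-pass behaviour of `zip` can be reproduced with a few lines of plain Python. The variable names below are illustrative, and materialising the pairs with `list` is one possible remedy, not necessarily the change made in the PR.

```python
texts = ["a", "b"]
vectors = [[0.1, 0.2], [0.3, 0.4]]

# A zip object is an iterator: the first comprehension exhausts it.
pairs = zip(texts, vectors)
first_pass = [text for text, _ in pairs]
second_pass = [vector for _, vector in pairs]  # nothing left to iterate

# Materialising the pairs once allows any number of passes.
pairs_list = list(zip(texts, vectors))
texts_out = [text for text, _ in pairs_list]
vectors_out = [vector for _, vector in pairs_list]
```

Here `second_pass` ends up empty, which mirrors the empty `embeddings` list described above.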

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-15 11:13:14 -08:00
Leonid Ganeline
fb676d8a9b
community[minor], langchain[minor]: refactor output_parsers Rail (#15852)
Moved Rail parser to `community` package.
2024-01-15 10:54:49 -08:00
Massimiliano Pronesti
e80aab2275
docs(community): update Amadeus toolkit to langchain v0.1 (#15976)
- **Description:** docs update following the changes introduced in
#15879

2024-01-15 10:50:47 -08:00
Ashley Xu
ce7723c1e5
community[minor]: add additional support for BigQueryVectorSearch (#15904)
BigQuery vector search lets you use GoogleSQL to do semantic search,
using vector indexes for fast but approximate results, or using brute
force for exact results.

This PR:
1. Add `metadata[_job_ib]` in Document returned by any similarity search
2. Add `explore_job_stats` to let users explore job statistics and
improve debuggability
3. Set the minimum row limit for running create vector index.
2024-01-15 10:45:15 -08:00
Mohammed Naqi
8799b028a6
community[minor]: Adding asynchronous function implementation for Doctran (#15941)
## Description 
In this update, I addressed the missing implementation for
atransform_document, which is the asynchronous counterpart of
transform_document in Doctran.

### Usage Example:
```py
# Instantiate DoctranPropertyExtractor with specified properties
property_extractor = DoctranPropertyExtractor(properties=properties)

# Asynchronously extract properties from a list of documents
extracted_document = await property_extractor.atransform_documents(
    documents, properties=properties
)

# Display metadata of the first extracted document
print(json.dumps(extracted_document[0].metadata, indent=2))

```

## Issue
- Pull request #14525 has caused a break in the aforementioned code.
Instead of removing an asynchronous implementation of a function,
consider implementing a synchronous version alongside it.
2024-01-15 10:39:25 -08:00
Raunak
c0773ab329
community[patch]: Fixed 'coroutine' object is not subscriptable error (#15986)
- **Description:** Added parentheses in the return statement of the
aembed_query() function to fix the 'coroutine' object is not
subscriptable error.
  - **Dependencies:** NA
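
The error arises because subscription binds more tightly than `await`. The sketch below reproduces it with a stand-in coroutine; `aembed_documents` here is a hypothetical placeholder, not the real embeddings class.

```python
import asyncio
from typing import List


async def aembed_documents(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for an async embedding call."""
    await asyncio.sleep(0)
    return [[float(len(text))] for text in texts]


async def aembed_query_broken(text: str) -> List[float]:
    # Parsed as `await (aembed_documents([text])[0])`: the coroutine object
    # itself is subscripted, which raises TypeError.
    return await aembed_documents([text])[0]


async def aembed_query(text: str) -> List[float]:
    # Parentheses make the await happen first, then the result is indexed.
    return (await aembed_documents([text]))[0]
```

Running the broken variant also leaves a coroutine that was never awaited, so Python may emit a `RuntimeWarning` in addition to the `TypeError`.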

Co-authored-by: H161961 <Raunak.Raunak@Honeywell.com>
2024-01-15 10:34:10 -08:00
Karim Lalani
14244bd7e5
community[minor]: Added document loader for SurrealDB (#15995)
Added a simple document loader to work with SurrealDB.
2024-01-15 10:32:42 -08:00
Karim Lalani
768e5e33bc
community[minor]: Fix to match SurrealDB 0.3.2 SDK (#15996)
New version of SurrealDB python sdk was causing the integration to
break.
This fix addresses that change.
2024-01-15 10:31:59 -08:00
shahrin014
86321a949f
community: Ollama - Parameter structure to follow official documentation (#16035)
## Feature
- Follow parameter structure as per official documentation 
- top level parameters (e.g. model, system, template) will be passed as
top level parameters
  - other parameters will be sent in options unless options is provided

![image](https://github.com/langchain-ai/langchain/assets/17451563/d14715d9-9701-4ee3-b44b-89fffea62389)

## Tests
- Test if top level parameters handled properly
- Test if parameters that are not top level parameters are handled as
options
- Test if options is provided, it will be passed as is
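
One way to picture the parameter structure described above is the sketch below. The set of top-level keys contains only the examples named in the commit message, and `build_payload` is a hypothetical helper, not the integration's actual code.

```python
from typing import Any, Dict, Optional

# Assumed top-level request fields; the commit names these as examples.
TOP_LEVEL_KEYS = {"model", "system", "template"}


def build_payload(
    prompt: str,
    params: Dict[str, Any],
    options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Split parameters into top-level fields and an "options" mapping."""
    payload: Dict[str, Any] = {"prompt": prompt}
    derived_options: Dict[str, Any] = {}
    for key, value in params.items():
        if key in TOP_LEVEL_KEYS:
            payload[key] = value
        else:
            derived_options[key] = value
    # Explicitly provided options are passed through as is.
    payload["options"] = options if options is not None else derived_options
    return payload
```

The three test cases listed in the commit map onto the three branches: top-level keys, derived options, and explicitly provided options.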
2024-01-15 10:17:58 -08:00
Nir Kopler
0fa06732b7
community: add new gpt-3.5-turbo-1106 finetuned for cost calculation (#16039)
**Description:** Added the new gpt-3.5-turbo-1106 for **finetuned** cost
calculation,
**Issue:** no issue found open

According to OpenAI's pricing information, the price is the same as for
the older model (0613)
2024-01-15 08:36:54 -08:00
Virat Singh
eb6e385dc5
community: Add PolygonAPIWrapper and get_last_quote endpoint (#15971)
- **Description:** Added a `PolygonAPIWrapper` and an initial
`get_last_quote` endpoint, which allows us to get the last price quote
for a given `ticker`. Once merged, I can add a Polygon tool in `tools/`
for agents to use.
- **Twitter handle:** [@virattt](https://twitter.com/virattt)

The Polygon.io Stocks API provides REST endpoints that let you query the
latest market data from all US stock exchanges.
2024-01-12 17:52:09 -08:00
Erick Friis
74bac7bda1
community[patch]: core min 0.1.9 (#15974) 2024-01-12 15:32:06 -08:00
Erick Friis
845e407e08
community[patch]: release 0.0.12 (#15973) 2024-01-12 15:27:05 -08:00
Jonathan Algar
a74f3a4979
Batch update of alt text and title attributes for images in md/mdx files across repo (#15357)
**Description:** Batch update of alt text and title attributes for
images in `md` & `mdx` files across the repo using
[alttexter](https://github.com/jonathanalgar/alttexter)/[alttexter-ghclient](https://github.com/jonathanalgar/alttexter-ghclient)
(built using LangChain/LangSmith).

**Limitation:** cannot update `ipynb` files because of [this
issue](https://github.com/langchain-ai/langchain/pull/15357#issuecomment-1885037250).
Can revisit when Docusaurus is bumped to v3.

I checked all the generated alt texts and titles and didn't find any
technical inaccuracies. That's not to say they're _perfect_, but a lot
better than what's there currently.


[Deployed](https://langchain-819yf1tbk-langchain.vercel.app/docs/modules/model_io/)
image example:


![chrome_yZQ7BF2GTj](https://github.com/langchain-ai/langchain/assets/93204286/43a9a4d4-70fd-41c4-8978-b6240ff63ffa)

You can see LangSmith traces for all the calls out to the LLM in the PRs
merged into this one:

* https://github.com/jonathanalgar/langchain/pull/6
* https://github.com/jonathanalgar/langchain/pull/4
* https://github.com/jonathanalgar/langchain/pull/3

I didn't add the following files to the PR as the images already have OK
alt texts:

*
27dca2d92f/docs/docs/integrations/providers/argilla.mdx (L3)
*
27dca2d92f/docs/docs/integrations/providers/apify.mdx (L11)

---------

Co-authored-by: github-actions <github-actions@github.com>
2024-01-12 14:37:48 -08:00
Varik Matevosyan
efe6cfafe2
community: Added Lantern as VectorStore (#12951)
Support [Lantern](https://github.com/lanterndata/lantern) as a new
VectorStore type.

- Added Lantern as VectorStore.
It will support 3 distance functions `l2 squared`, `cosine` and
`hamming` and will use `HNSW` index.
- Added tests
- Added example notebook
2024-01-12 12:00:16 -08:00
Edwin Wenink
9fb09c1c30
community: fix the "page" mode in the AzureAIDocumentIntelligenceParser (bug) (#15958)
**Description**: the "page" mode in the
AzureAIDocumentIntelligenceParser is not accessible due to a wrong
membership test. The mode argument can only be a string (also see the
assertion in the `__init__`: `assert self.mode in ["single", "page",
"object", "markdown"]`), so the check `elif self.mode == ["page"]:`
always fails.
As a result, effectively the "object" mode is used when selecting the
"page" mode, which may lead to errors.

The docstring of the `AzureAIDocumentIntelligenceLoader` also omitted
the `mode` parameter altogether, so I added it.

**Issue**: I could not find a related issue (this class is only 3 weeks
old anyways)

**Dependencies**: this PR does not introduce or affect dependencies.

The current demo notebook and examples are not affected because they all
use the default markdown mode.
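
The failed check described in this commit is easy to reproduce in isolation. In the sketch below the branch order and the function names are assumptions; only the comparison `mode == ["page"]` and the fall-through to "object" come from the commit message.

```python
def dispatch_buggy(mode: str) -> str:
    """Mimic the faulty check: a string never equals a list."""
    if mode == "markdown":
        return "markdown"
    elif mode == ["page"]:  # always False for a str argument
        return "page"
    else:
        return "object"


def dispatch_fixed(mode: str) -> str:
    """Compare against the string itself so the "page" branch is reachable."""
    if mode == "markdown":
        return "markdown"
    elif mode == "page":
        return "page"
    else:
        return "object"
```

With the buggy comparison, requesting "page" silently ends up in the "object" branch.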
2024-01-12 11:01:28 -08:00
Mahdi Setayesh
eb76f9c9fe
community: Fixing a performance issue with AzureSearch to perform batch embedding (#15594)
- **Description:** The Azure Cognitive Search vector store embeds
documents slowly because it does not use the batch embedding
functionality. This PR provides a fix that improves the performance of
the Azure Search class when adding documents to the vector search.
  - **Issue:** #11313
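
The batching idea behind this commit can be sketched with a stand-in embedder that counts calls. `CountingEmbedder` and both helper functions are hypothetical; the sketch only assumes an embeddings object with `embed_query` and `embed_documents` methods and does not reproduce the Azure Search code.

```python
from typing import List


class CountingEmbedder:
    """Hypothetical embedder that records how many calls it receives."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_query(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text))]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        return [[float(len(text))] for text in texts]


def embed_one_by_one(embedder: CountingEmbedder, texts: List[str]) -> List[List[float]]:
    # One call per text: slow when each call is a network round trip.
    return [embedder.embed_query(text) for text in texts]


def embed_in_batch(embedder: CountingEmbedder, texts: List[str]) -> List[List[float]]:
    # A single call for the whole list.
    return embedder.embed_documents(texts)
```

For three texts the first helper makes three calls and the second makes one, with identical results in this toy setup.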
2024-01-12 10:58:55 -08:00
ChengZi
d5808f786c
community: Support milvus partition key. (#15740)
- **Description:** Milvus's partition key is an important feature. It
can support multi-tenancy. We hope to introduce this feature.
https://milvus.io/docs/partition_key.md
  - **Issue:** No
  - **Dependencies:** No
  - **Twitter handle:** No

---------

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-12 09:15:03 -08:00
ohbeep
9b3962fc25
community: Add support of "http" URI for Milvus (#12710) (#15683)
- **Description:** Add support of HTTP URI for Milvus
  - **Issue:** #12710 
  - **Dependencies:** N/A,
2024-01-11 21:55:35 -08:00
Raunak
e26e1f8b37
community: Added functions to make async calls to HuggingFaceHub's embedding endpoint in HuggingFaceHubEmbeddings class (#15737)
**Description:**
Added the aembed_documents() and aembed_query() async functions to the
HuggingFaceHubEmbeddings class in the
langchain_community\embeddings\huggingface_hub.py file. They make async
calls to HuggingFaceHub's embedding endpoint and generate embeddings
asynchronously.

Test Cases: Added test_huggingfacehub_embedding_async_documents() and
test_huggingfacehub_embedding_async_query()
functions in test_huggingface_hub.py file to test the two async
functions created in HuggingFaceHubEmbeddings class.

Documentation: Updated huggingfacehub.ipynb with steps to install
huggingface_hub package and use
HuggingFaceHubEmbeddings.

**Dependencies:** None,
**Twitter handle:** I do not have a Twitter account

---------

Co-authored-by: H161961 <Raunak.Raunak@Honeywell.com>
2024-01-11 21:52:55 -08:00
Christophe Bornet
81d1ba05dc
Add a BaseStore backed by AstraDB (#15812)
- **Description:** this change adds a `BaseStore` backed by AstraDB
  - **Twitter handle:** cbornet_
2024-01-11 21:41:24 -08:00
manishsahni2000
74d9fc2f9e
PR community:Removing knn beta content in mongodb atlas vectorstore (#15865)
2024-01-11 21:40:54 -08:00
shahrin014
bdd90ae2ee
community: Ollama - Pass headers to post request (#15881)
## Feature
- Set additional headers in constructor
- Headers will be sent in post request

This feature is useful if deploying Ollama on a cloud service such as
hugging face, which requires authentication tokens to be passed in the
request header.

## Tests
- Test if header is passed
- Test if header is not passed
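
A minimal sketch of the idea, not the actual `Ollama` class code: caller-supplied headers are merged into the POST request that is eventually sent. The helper name `build_post_request`, the placeholder URL, and the token value are assumptions made for illustration; the request is only constructed here, never sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_post_request(
    url: str,
    payload: Dict[str, Any],
    headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with optional extra headers."""
    merged = {"Content-Type": "application/json"}
    # Extra headers, e.g. an auth token required by a hosting service.
    merged.update(headers or {})
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=merged, method="POST")


# Placeholder endpoint and token, for illustration only.
request = build_post_request(
    "http://localhost:11434/api/generate",
    {"prompt": "hi"},
    headers={"Authorization": "Bearer my-token"},
)
print(request.has_header("Authorization"))
```

This mirrors the two test cases listed above: one request built with a header and one built without.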
2024-01-11 21:40:35 -08:00
Xin Liu
5efec068c9
feat: Implement stream interface (#15875)
Major changes:

- Rename `wasm_chat.py` to `llama_edge.py`
- Rename the `WasmChatService` class to `ChatService`
- Implement the `stream` interface for `ChatService`
- Add `test_chat_wasm_service_streaming` in the integration test
- Update `llama_edge.ipynb`

---------

Signed-off-by: Xin Liu <sam@secondstate.io>
2024-01-11 21:32:48 -08:00
Massimiliano Pronesti
ec4dab0449
feat(community): make Amadeus toolkit LLM-agnostic (#15879)
- **Description:** `AmadeusToolkit` and `AmadeusClosestAirport`
contained a hardcoded call to `ChatOpenAI`. This PR makes it
LLM-independent, while guaranteeing backward compatibility.
  - **Issue:** #15847 
  - **Dependencies:** None
   
@baskaryan 

2024-01-11 21:32:03 -08:00
Yacine
782dd44be9
<langchain_community.vectorstores>:<Fix pinecone.py __init__ docstring instruction> (#15922)
- **Description:** The pinecone docstring instructs to pass the
embedding query text causing the warning below. It should be the
embeddings object.
warning message: UserWarning: Passing in `embedding` as a Callable is
deprecated. Please pass in an Embeddings object instead.
  - **Issue:** NA
  - **Dependencies:** None


@baskaryan
2024-01-11 21:26:33 -08:00
Erick Friis
623f87c888
community[patch]: pinecone bug (#15905) 2024-01-11 11:44:07 -08:00
axiangcoding
d5aa277b94
community: add collection_properties parameter to Milvus (#15788)
- **Description:** add collection_properties parameter to Milvus. See
[pymilvus set_properties()
description](https://milvus.io/api-reference/pymilvus/v2.3.x/Collection/set_properties().md)
  - **Issue:** None
  - **Dependencies:** None
  - **Twitter handle:** None
2024-01-10 20:29:01 -08:00
mogith-pn
9e1ed17bfb
Community : Modified doc strings and example notebook for Clarifai (#15816)
Community : Modified doc strings and example notebook for Clarifai

Description:
1. Modified doc strings inside clarifai vectorstore class and
embeddings.
2. Modified notebook examples.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-01-10 19:33:10 -08:00
Erick Friis
38523d7c57
together[minor]: add llm (#15853) 2024-01-10 17:55:34 -08:00
Erick Friis
ee708739c3
community[patch]: pinecone v3 support (#15849)
Info in slack

---------

Co-authored-by: Roie Schwaber-Cohen <roie.cohen@gmail.com>
2024-01-10 14:54:50 -08:00
Erick Friis
85a4594ed7
community[patch]: more deprecations (#15782) 2024-01-09 20:36:16 -08:00
NuODaniel
70b6315b23
community[patch]: fix qianfan chat stream calling caused exception (#13800)
- **Description:** 
`QianfanChatEndpoint` extends `BaseChatModel`, whose default stream
implementation may concatenate the MessageChunk objects with `__add__`.
When `stream()` is called, a ValueError for a duplicated key is raised.
  - **Issues:** 
     * #13546  
     * #13548
     * merge two single test file related to qianfan.
  - **Dependencies:** no
  - **Tag maintainer:**

---------

Co-authored-by: root <liujun45@baidu.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-09 15:29:25 -08:00
Bagatur
ee5bd986de
community[patch]: update oai deprecation message (#15681)
addresses #15674
2024-01-09 14:36:58 -05:00
Erick Friis
4ed3d17c47
community[patch]: release 0.0.11 (#15760) 2024-01-09 09:44:26 -08:00
Ian
32ec56194b
community: fix myscale delete function bug (#15675)
Currently, the SQL used to delete vector docs from MyScale is as follows:
```sql
DELETE FROM collection WHERE id = '1' AND id = '2' AND id = '3'
```

The expected statement is:

```sql
DELETE FROM collection WHERE id IN ('1', '2', '3')
```
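
A condition such as `id = '1' AND id = '2'` can never match a row, which is why `IN` is needed. A minimal Python sketch of building the corrected statement follows; the helper name `build_delete_sql` is hypothetical and is not the MyScale vector store's actual code. It assumes trusted ids that contain no quote characters.

```python
from typing import List


def build_delete_sql(table: str, ids: List[str]) -> str:
    """Build a DELETE statement that matches any of the given ids.

    Hypothetical helper for illustration. It assumes trusted ids without
    quote characters; real code should escape or parameterize values.
    """
    if not ids:
        raise ValueError("ids must not be empty")
    quoted = ", ".join(f"'{doc_id}'" for doc_id in ids)
    return f"DELETE FROM {table} WHERE id IN ({quoted})"


print(build_delete_sql("collection", ["1", "2", "3"]))
# prints: DELETE FROM collection WHERE id IN ('1', '2', '3')
```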
2024-01-08 12:26:29 -08:00
Christophe Bornet
a466f79ac9
Fix AstraDB logical operator filtering (#15699)
This change fixes the AstraDB logical operator filtering (`$and`,
`$or`).
The `metadata` prefix must not be added if the key is `$and` or `$or`.
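
The rule can be sketched as follows. This is an illustration under assumptions, not the vector store's actual code: the function name `prefix_metadata_filter`, the `metadata.` dotted prefix, and the recursion into sub-filters are all hypothetical choices.

```python
from typing import Any, Dict

# Logical operators that must be passed through without a prefix.
LOGICAL_OPERATORS = {"$and", "$or"}


def prefix_metadata_filter(filter_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix plain keys with 'metadata.', leaving $and/$or keys untouched."""
    prefixed: Dict[str, Any] = {}
    for key, value in filter_dict.items():
        if key in LOGICAL_OPERATORS:
            # The value is assumed to be a list of sub-filters; recurse into each.
            prefixed[key] = [prefix_metadata_filter(sub) for sub in value]
        else:
            prefixed[f"metadata.{key}"] = value
    return prefixed


print(prefix_metadata_filter({"$or": [{"a": 1}, {"b": 2}]}))
# prints: {'$or': [{'metadata.a': 1}, {'metadata.b': 2}]}
```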
2024-01-08 12:23:46 -08:00
Christophe Bornet
1f5f6381ec
Add doc for AstraDB document loader (#15703)
See preview :
https://langchain-git-fork-cbornet-astra-loader-doc-langchain.vercel.app/docs/integrations/document_loaders/astradb
2024-01-08 12:21:46 -08:00
Erick Friis
94911ae503
community[patch]: Support different Pinecone initializations depending on the version (#15717)
Co-authored-by: DosticJelena <jelenadostic2@gmail.com>
2024-01-08 11:33:36 -08:00
Bagatur
4c47f39fcb
community[patch]: Release 0.0.10 (#15678) 2024-01-08 00:24:45 -05:00
Nuno Campos
7ce4cd0709
Do not issue beta or deprecation warnings on internal calls (#15641) 2024-01-07 20:54:45 -08:00
Earlee
98c6c9603e
community: fix: should flush after inserting data on milvus (#15568)
Inserted data does not take effect immediately, so we should flush after
inserting data into Milvus.
2024-01-07 09:33:47 -08:00
chyroc
a17a3638b5
Docs: fix excel document loader typo (#15470) 2024-01-07 09:33:35 -08:00
chyroc
9ae901c5e6
Feat: add CHM file loader (#15519)
fix https://github.com/langchain-ai/langchain/issues/15469
2024-01-07 09:28:52 -08:00
Nan LI
0b393315ce
community: Correct Input API Key Name in Notebook and Enhance Readability of Comments for ZhipuAI Chat Model (#15529)
- **Description:** This update rectifies an error in the notebook by
changing the input variable from `zhipu_api_key` to `api_key`. It also
includes revisions to comments to improve program readability.
- **Issue:** The input variable in the notebook example should be
`api_key` instead of `zhipu_api_key`.
- **Dependencies:** No additional dependencies are required for this
change.

To ensure quality and standards, we have performed extensive linting and
testing. Commands such as make format, make lint, and make test have
been run from the root of the modified package to ensure compliance with
LangChain's coding standards.
2024-01-07 09:27:47 -08:00
kursathalat
9ea28ee464
fix: Fix DEFAULT_API_KEY for ArgillaCallbackHandler (#15534)
- ArgillaCallbackHandler does not properly set the default values while
initializing. This PR corrects the line.
- Issue: #15531 
- Dependencies: Argilla

- Also corrected some dead links.
2024-01-07 09:26:51 -08:00
Chad Norvell
d1bfb70bc4
community: Allow deleting by ID and collection in pgvector (#15627)
- **Description:** The `delete_collection` method deletes an entire
collection regardless of custom ID. The `delete` method deletes
everything with the provided custom IDs regardless of collection. It can
be useful to restrict deletion to both the collection and a set of
custom IDs. This change adds support for that by allowing you to
optionally specify that `delete` should be restricted to the collection
defined on the `PGVector` instance.
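
The behavior described above can be sketched with the standard-library `sqlite3` module. This is not the `PGVector` implementation: the table name `embedding`, the column names, and the function `delete_embeddings` are hypothetical and only show how an optional collection restriction narrows a delete by custom ID.

```python
import sqlite3
from typing import List, Optional


def delete_embeddings(
    conn: sqlite3.Connection,
    ids: List[str],
    collection_id: Optional[str] = None,
) -> int:
    """Delete rows by custom ID, optionally restricted to one collection.

    Returns the number of deleted rows.
    """
    if not ids:
        return 0
    placeholders = ", ".join("?" for _ in ids)
    sql = f"DELETE FROM embedding WHERE custom_id IN ({placeholders})"
    params = list(ids)
    if collection_id is not None:
        # Restrict the delete to the given collection only.
        sql += " AND collection_id = ?"
        params.append(collection_id)
    cursor = conn.execute(sql, params)
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embedding (collection_id TEXT, custom_id TEXT)")
conn.executemany(
    "INSERT INTO embedding VALUES (?, ?)",
    [("a", "1"), ("b", "1"), ("a", "2")],
)
# Only the row in collection "a" with custom ID "1" is removed.
print(delete_embeddings(conn, ["1"], collection_id="a"))
```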
2024-01-07 08:33:21 -08:00
Chad Norvell
f6226d464e
community: Include PDF ID in MathPix metadata (#15629)
- **Description:** Includes the PDF ID in the MathPix document metadata.
This is useful in case you need to re-request a processed PDF from the
MathPix API later.
2024-01-07 08:31:53 -08:00
Chad Norvell
d2a686b165
community: Provide more actionable errors in the MathPix PDF loader (#15630)
- **Description:** The `error_info['id']` can be cross-referenced with
the MathPix API documentation to get very specific information about why
an error occurred.
2024-01-07 08:31:09 -08:00
Kai
5d05df4bce
community: Fixed bug of "system message check" in chat_models/tongyi. (#15631)
- **Description:** This PR fixes a bug in the "system message check" in
langchain_community/chat_models/tongyi.py
- **Issue:** With the current logic, if there is no system message in
the chat messages, the error "System message can only be the first
message." is wrongly raised.
  - **Dependencies:** No.
  - **Twitter handle:** I don't have a Twitter account.
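
A minimal sketch of the intended check, assuming messages are reduced to a list of role strings (the function name `validate_system_message_position` is hypothetical and this is not the code in `tongyi.py`): the error is raised only when a system message appears somewhere other than the first position, so a conversation with no system message passes.

```python
from typing import List


def validate_system_message_position(roles: List[str]) -> None:
    """Raise if a system message appears anywhere except the first position."""
    for index, role in enumerate(roles):
        if role == "system" and index != 0:
            raise ValueError("System message can only be the first message.")


# No system message at all: valid, no error.
validate_system_message_position(["user", "assistant", "user"])
# System message first: valid, no error.
validate_system_message_position(["system", "user"])
```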
2024-01-07 08:30:18 -08:00
Raunak
64f5968a81
community: Replaced hardcoded "metadata" with FIELDS_METADATA variable in semantic_hybrid_search_with_score_and_rerank (#15642)
- **Description:** This PR is to fix a bug in
semantic_hybrid_search_with_score_and_rerank() function in
langchain_community/vectorstores/azuresearch.py. The hardcoded
"metadata" name is replaced with FIELDS_METADATA variable with an if
block to check if the metadata column exists or not.
- **Issue:** Fixed #15581
- **Dependencies:** No
- **Twitter handle:** None

Co-authored-by: H161961 <Raunak.Raunak@Honeywell.com>
2024-01-06 17:04:59 -08:00