Commit Graph

4467 Commits (5ba1899cd72dd57ee760d63137e2c92731c7f7a4)

Author SHA1 Message Date
Enzo Poggio 8f019e91d7
community[patch]: Use Custom Logger Instead of Root Logger in get_user_agent Function (#22691)
## Description
This PR addresses a logging inconsistency in the `get_user_agent`
function. Previously, the function was using the root logger to log a
warning message when the "USER_AGENT" environment variable was not set.
This bypassed the custom logger `log` that was created at the start of
the module, leading to potential inconsistencies in logging behavior.

Changes:
- Replaced `logging.warning` with `log.warning` in the `get_user_agent`
function to ensure that the custom logger is used.

This change ensures that all logging in the `get_user_agent` function
respects the configurations of the custom logger, leading to more
consistent and predictable logging behavior.

## Dependencies

None

## Issue 

None

## Tests and docs

☝🏻 see description


## `make format`, `make lint` & `cd libs/community; make test`

```shell
> make format 
poetry run ruff format docs templates cookbook
1417 files left unchanged
poetry run ruff check --select I --fix docs templates cookbook
All checks passed!
```

```shell
> make lint
poetry run ruff check docs templates cookbook
All checks passed!
poetry run ruff format docs templates cookbook --diff
1417 files already formatted
poetry run ruff check --select I docs templates cookbook
All checks passed!
git grep 'from langchain import' docs/docs templates cookbook | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
```

~cd libs/community; make test~ too much dependencies for integration ...

```shell
>  poetry run pytest tests/unit_tests   
....
==== 884 passed, 466 skipped, 4447 warnings in 15.93s ====
```

I choose you randomly : @ccurme
3 months ago
Philippe PRADOS 9aabb446c5
community[minor]: Add SQL storage implementation (#22207)
Hello @eyurtsev

- package: langchain-comminity
- **Description**: Add SQL implementation for docstore. A new
implementation, in line with my other PR ([async
PGVector](https://github.com/langchain-ai/langchain-postgres/pull/32),
[SQLChatMessageMemory](https://github.com/langchain-ai/langchain/pull/22065))
- Twitter handler: pprados

---------

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Piotr Mardziel <piotrm@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
3 months ago
Nithish Raghunandanan f2f0e0e13d
couchbase: Add the initial version of Couchbase partner package (#22087)
Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
3 months ago
Cahid Arda Öz 6c07eb0c12
community[minor]: Add UpstashRatelimitHandler (#21885)
Adding `UpstashRatelimitHandler` callback for rate limiting based on
number of chain invocations or LLM token usage.

For more details, see [upstash/ratelimit-py
repository](https://github.com/upstash/ratelimit-py) or the notebook
guide included in this PR.

Twitter handle: @cahidarda

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
3 months ago
Erick Friis 9e03864d64
core: add error message for non-structured llm to StructuredPrompt (#22684)
previously was the blank `NotImplementedError` from
`BaseLanguageModel.with_structured_output`
3 months ago
ccurme f32d57f6f0
anthropic: refactor streaming to use events api; add streaming usage metadata (#22628)
- Refactor streaming to use raw events;
- Add `stream_usage` class attribute and kwarg to stream methods that,
if True, will include separate chunks in the stream containing usage
metadata.

There are two ways to implement streaming with anthropic's python sdk.
They have slight differences in how they surface usage metadata.
1. [Use helper
functions](https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#streaming-helpers).
This is what we are doing now.
```python
count = 1
with client.messages.stream(**params) as stream:
    for text in stream.text_stream:
        snapshot = stream.current_message_snapshot
        print(f"{count}: {snapshot.usage} -- {text}")
        count = count + 1

final_snapshot = stream.get_final_message()
print(f"{count}: {final_snapshot.usage}")
```
```
1: Usage(input_tokens=8, output_tokens=1) -- Hello
2: Usage(input_tokens=8, output_tokens=1) -- !
3: Usage(input_tokens=8, output_tokens=1) --  How
4: Usage(input_tokens=8, output_tokens=1) --  can
5: Usage(input_tokens=8, output_tokens=1) --  I
6: Usage(input_tokens=8, output_tokens=1) --  assist
7: Usage(input_tokens=8, output_tokens=1) --  you
8: Usage(input_tokens=8, output_tokens=1) --  today
9: Usage(input_tokens=8, output_tokens=1) -- ?
10: Usage(input_tokens=8, output_tokens=12)
```
To do this correctly, we need to emit a new chunk at the end of the
stream containing the usage metadata.

2. [Handle raw
events](https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#streaming-responses)
```python
stream = client.messages.create(**params, stream=True)
count = 1
for event in stream:
    print(f"{count}: {event}")
    count = count + 1
```
```
1: RawMessageStartEvent(message=Message(id='msg_01Vdyov2kADZTXqSKkfNJXcS', content=[], model='claude-3-haiku-20240307', role='assistant', stop_reason=None, stop_sequence=None, type='message', usage=Usage(input_tokens=8, output_tokens=1)), type='message_start')
2: RawContentBlockStartEvent(content_block=TextBlock(text='', type='text'), index=0, type='content_block_start')
3: RawContentBlockDeltaEvent(delta=TextDelta(text='Hello', type='text_delta'), index=0, type='content_block_delta')
4: RawContentBlockDeltaEvent(delta=TextDelta(text='!', type='text_delta'), index=0, type='content_block_delta')
5: RawContentBlockDeltaEvent(delta=TextDelta(text=' How', type='text_delta'), index=0, type='content_block_delta')
6: RawContentBlockDeltaEvent(delta=TextDelta(text=' can', type='text_delta'), index=0, type='content_block_delta')
7: RawContentBlockDeltaEvent(delta=TextDelta(text=' I', type='text_delta'), index=0, type='content_block_delta')
8: RawContentBlockDeltaEvent(delta=TextDelta(text=' assist', type='text_delta'), index=0, type='content_block_delta')
9: RawContentBlockDeltaEvent(delta=TextDelta(text=' you', type='text_delta'), index=0, type='content_block_delta')
10: RawContentBlockDeltaEvent(delta=TextDelta(text=' today', type='text_delta'), index=0, type='content_block_delta')
11: RawContentBlockDeltaEvent(delta=TextDelta(text='?', type='text_delta'), index=0, type='content_block_delta')
12: RawContentBlockStopEvent(index=0, type='content_block_stop')
13: RawMessageDeltaEvent(delta=Delta(stop_reason='end_turn', stop_sequence=None), type='message_delta', usage=MessageDeltaUsage(output_tokens=12))
14: RawMessageStopEvent(type='message_stop')
```

Here we implement the second option, in part because it should make
things easier when implementing streaming tool calls in the near future.

This would add two new chunks to the stream-- one at the beginning and
one at the end-- with blank content and containing usage metadata. We
add kwargs to the stream methods and a class attribute allowing for this
behavior to be toggled. I enabled it by default. If we merge this we can
add the same kwargs / attribute to OpenAI.

Usage:
```python
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(
    model="claude-3-haiku-20240307",
    temperature=0
)

full = None
for chunk in model.stream("hi"):
    full = chunk if full is None else full + chunk
    print(chunk)

print(f"\nFull: {full}")
```
```
content='' id='run-8a20843f-25c7-4025-ad72-9add395899e3' usage_metadata={'input_tokens': 8, 'output_tokens': 0, 'total_tokens': 8}
content='Hello' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content='!' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' How' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' can' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' I' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' assist' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' you' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' today' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content='?' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content='' id='run-8a20843f-25c7-4025-ad72-9add395899e3' usage_metadata={'input_tokens': 0, 'output_tokens': 12, 'total_tokens': 12}

Full: content='Hello! How can I assist you today?' id='run-8a20843f-25c7-4025-ad72-9add395899e3' usage_metadata={'input_tokens': 8, 'output_tokens': 12, 'total_tokens': 20}
```
3 months ago
Bagatur 235d91940d
community[patch]: Release 0.2.4 (#22643) 3 months ago
William FH be79ce9336
[Core] Unified Enable/Disable Tracing (#22576) 3 months ago
Bagatur fe2e5a3b74
langchain[patch]: Release 0.2.3 (#22644) 3 months ago
Erick Friis a24a9c6427
multiple: get rid of pyproject extras (#22581)
They cause `poetry lock` to take a ton of time, and `uv pip install` can
resolve the constraints from these toml files in trivial time
(addressing problem with #19153)

This allows us to properly upgrade lockfile dependencies moving forward,
which revealed some issues that were either fixed or type-ignored (see
file comments)
3 months ago
Bagatur 4367e89c9a
core[patch]: Release 0.2.5 (#22642) 3 months ago
Eugene Yurtsev 28f744c1f5
core[patch]: Correctly order parent ids in astream events (from root to immediate parent), add defensive check for cycles (#22637)
This PR makes two changes:

1. Fixes the order of parent IDs to be from root to immediate parent
2. Adds a simple defensive check for cycles
3 months ago
Eugene Yurtsev 035a9c9609
core[minor]: Add parent_ids to astream_events API (#22563)
Include a list of parent ids for each event in astream events.
3 months ago
Nicolas Nkiere 51005e2776
core[minor]: Add an async root listener and with_alisteners method (#22151)
- [x] **Adding AsyncRootListener**: "langchain_core: Adding
AsyncRootListener"

- **Description:** Adding an AsyncBaseTracer, AsyncRootListener and
`with_alistener` function. This is to enable binding async root listener
to runnables. This currently only supported for sync listeners.
- **Issue:** None
- **Dependencies:** None

- [x] **Add tests and docs**: Added units tests and example snippet code
within the function description of `with_alistener`


- [x] **Lint and test**: Run make format_diff, make lint_diff and make
test
3 months ago
seyf97 2904c50cd5
openai[patch]: correct grammar in exception message in embeddings/base.py (#22629)
Correct the grammar error for missing transformers package ValueError
3 months ago
Anush 80560419b0
qdrant[patch]: Make path optional in from_existing_collection() (#21875)
## Description

The `path` param is used to specify the local persistence directory,
which isn't required if using Qdrant server.

This is a breaking but necessary change.
3 months ago
ccurme b57aa89f34
multiple: implement ls_params (#22621)
implement ls_params for ai21, fireworks, groq.
3 months ago
Xiangrui Meng f26ab93df8
community: support Databricks Unity Catalog functions as LangChain tools (#22555)
This PR adds support for using Databricks Unity Catalog functions as
LangChain tools, which runs inside a Databricks SQL warehouse.

* An example notebook is provided.
3 months ago
ccurme c1ef731503
anthropic: update attribute name and alias (#22625)
update name to `stop_sequences` and alias to `stop` (instead of the
other way around), since `stop_sequences` is the name used by anthropic.
3 months ago
lucasiscovici 05bf98b2f9
community[patch]: pgvector replace nin_ by not_in (#22619)
- [ ] **community**: "pgvector: replace nin_ by not_in"

- [ ] **PR message**: nin_ do not exist in sqlalchemy orm, it's not_in
3 months ago
ccurme 3999761201
multiple: add `stop` attribute (#22573) 3 months ago
ccurme e08879147b
Revert "anthropic: stream token usage" (#22624)
Reverts langchain-ai/langchain#20180
3 months ago
Bagatur 0d495f3f63
anthropic: stream token usage (#20180)
open to other ideas
<img width="1181" alt="Screenshot 2024-04-08 at 5 34 08 PM"
src="https://github.com/langchain-ai/langchain/assets/22008038/03eb11c4-5eb5-43e3-9109-a13f76098fa4">

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
3 months ago
Satyam Kumar 17b486a37b
openai, azure: update model_name in ChatResult to use name from API response (#22569)
The response.get("model", self.model_name) checks if the model key
exists in the response dictionary. If it does, it uses that value;
otherwise, it uses self.model_name.

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Christophe Bornet 12ddb4fc6f
core[patch]: Use explicit classes for InMemoryByteStore and InMemoryStore (#22608)
The current implementation doesn't work well with type checking.
Instead replace with class definition that correctly works with type
checking.
3 months ago
andyjessen cfed68e06f
docs: Fix description (#22611)
This commit fixes the description of the hair_color field.
3 months ago
ccurme 1925bde32e
together: bump langchain-core (#22616)
langchain-together depends on langchain-openai ^0.1.8
langchain-openai 0.1.8 has langchain-core >= 0.2.2

Here we bump langchain-core to 0.2.2, just to pass minimum dependency
version tests.
3 months ago
ccurme 35f4aa927b
together[patch]: Release 0.1.3 (#22615) 3 months ago
andyjessen 8b40428f58
docs: Fix typo (#22603)
This commit changes minor typo in the field description.
3 months ago
Isaac Francisco ba3e219d83
community[patch]: recursive url loader fix and unit tests (#22521)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Jeffrey Mak 5fc5ed463c
community[patch]:Support filter for AzureAISearchRetriever (#22303)
**Description**: 
The AzureAISearchRetriever does not support the "$filter" argument
offered in the AISearch API:
https://learn.microsoft.com/en-us/rest/api/searchservice/documents/search-get?view=rest-searchservice-2023-11-01&tabs=HTTP
The $filter allows filtering of indexes based on values in metadata.

**Issue**: 
https://github.com/langchain-ai/langchain/issues/19885

**Dependencies**: 
No

**Twitter handle**: 
@Jeffreym9M
 

- [ ] **Add tests and docs**: Not relevant


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
3 months ago
Isaac Francisco 148088a588
docs: duckduckgosearch options listed (#22568)
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
X-HAN 62f13f95e4
community[minor]: add DashScope Rerank (#22403)
**Description:** this PR adds DashScope Rerank capability to Langchain,
you can find DashScope Rerank API from
[here](https://help.aliyun.com/document_detail/2780058.html?spm=a2c4g.2780059.0.0.6d995024FlrJ12)
&
[here](https://help.aliyun.com/document_detail/2780059.html?spm=a2c4g.2780058.0.0.63f75024cr11N9).
[DashScope](https://dashscope.aliyun.com/) is the generative AI service
from Alibaba Cloud (Aliyun). You can create DashScope API key from
[here](https://bailian.console.aliyun.com/?apiKey=1#/api-key).

**Dependencies:** DashScopeRerank depends on `dashscope` python package.

**Twitter handle:** my twitter/x account is https://x.com/LastMonopoly
and I'd like a mention, thanks you!


**Tests and docs**
  1. integration test: `test_dashscope_rerank.py`
  2. example notebook: `dashscope_rerank.ipynb`

**Lint and test**: I have run `make format`, `make lint` and `make test`
from the root of the package I've modified.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
Ethan Yang 29064848f9
[Community]add option to delete the prompt from HF output (#22225)
This will help to solve pattern mismatching issue when parsing the
output in Agent.

https://github.com/langchain-ai/langchain/issues/21912
3 months ago
Bagatur 584a1e30ac
community[patch]: AzureSearch async functions (#22075) 3 months ago
Bagatur 1a911018bc
langchain[minor]: add universal init_model (#22039)
decisions to discuss
- only chat models
- model_provider isn't based on any existing values like llm-type,
package names, class names
- implemented as function not as a wrapper ChatModel
- function name (init_model)
- in langchain as opposed to community or core
- marked beta
3 months ago
ccurme af129974a3
community: update how OpenAIAssistantV2Runnable creates threads with tool_resources (#22549)
https://github.com/langchain-ai/langchain/issues/22503
3 months ago
Bagatur 51a0d4574e
community[patch]: Release 0.2.3 (#22562) 3 months ago
Bagatur b2daba37c7
nomic[patch]: Release 0.1.2 (#22561) 3 months ago
Zach Nussbaum 14f3014cce
embeddings: nomic embed vision (#22482)
Thank you for contributing to LangChain!

**Description:** Adds Langchain support for Nomic Embed Vision
**Twitter handle:** nomic_ai,zach_nussbaum


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
leila-messallem 3280a5b49b
community[patch]: improve test setup to accurately test filtering of labels in neo4j (#22531)
**Description:** This PR addresses an issue with an existing test that
was not effectively testing the intended functionality. The previous
test setup did not adequately validate the filtering of the labels in
neo4j, because the nodes and relationship in the test data did not have
any properties set. Without properties these labels would not have been
returned, regardless of the filtering.

---------

Co-authored-by: Oskar Hane <oh@oskarhane.com>
3 months ago
Mohammad Mohtashim 7fcef2556c
[Experimental]: Async agenerate method ollama functions (#21682)
- **Description:** :
Added Async method for Generate for OllamaFunctions which was missing
and was raising errors for the users.
   
- **Issue:** 
#21422
3 months ago
Stefano Lottini 328d0c99f2
community[minor]: Add support for metadata indexing policy in Cassandra vector store (#22548)
This PR adds a constructor `metadata_indexing` parameter to the
Cassandra vector store to allow optional fine-tuning of which fields of
the metadata are to be indexed.

This is a feature supported by the underlying CassIO library. Indexing
mode of "all", "none" or deny- and allow-list based choices are
available.

The rationale is, in some cases it's advisable to programmatically
exclude some portions of the metadata from the index if one knows in
advance they won't ever be used at search-time. this keeps the index
more lightweight and performant and avoids limitations on the length of
_indexed_ strings.

I added a integration test of the feature. I also added the possibility
of running the integration test with Cassandra on an arbitrary IP
address (e.g. Dockerized), via
`CASSANDRA_CONTACT_POINTS=10.1.1.5,10.1.1.6 poetry run pytest [...]` or
similar.

While I was at it, I added a line to the `.gitignore` since the mypy
_test_ cache was not ignored yet.

My X (Twitter) handle: @rsprrs.
3 months ago
Emilien Chauvet c3d4126eb1
community[minor]: add user agent for web scraping loaders (#22480)
**Description:** This PR adds a `USER_AGENT` env variable that is to be
used for web scraping. It creates a util to get that user agent and uses
it in the classes used for scraping in [this piece of
doc](https://python.langchain.com/v0.1/docs/use_cases/web_scraping/).
Identifying your scraper is considered a good politeness practice, this
PR aims at easing it.
**Issue:** `None`
**Dependencies:** `None`
**Twitter handle:** `None`
3 months ago
Philippe PRADOS 8250c177de
community[minor]: Add native async support to SQLChatMessageHistory (#22065)
# package community: Fix SQLChatMessageHistory

## Description
Here is a rewrite of `SQLChatMessageHistory` to properly implement the
asynchronous approach. The code circumvents [issue
22021](https://github.com/langchain-ai/langchain/issues/22021) by
accepting a synchronous call to `def add_messages()` in an asynchronous
scenario. This bypasses the bug.

For the same reasons as in [PR
22](https://github.com/langchain-ai/langchain-postgres/pull/32) of
`langchain-postgres`, we use a lazy strategy for table creation. Indeed,
the promise of the constructor cannot be fulfilled without this. It is
not possible to invoke a synchronous call in a constructor. We
compensate for this by waiting for the next asynchronous method call to
create the table.

The goal of the `PostgresChatMessageHistory` class (in
`langchain-postgres`) is, among other things, to be able to recycle
database connections. The implementation of the class is problematic, as
we have demonstrated in [issue
22021](https://github.com/langchain-ai/langchain/issues/22021).

Our new implementation of `SQLChatMessageHistory` achieves this by using
a singleton of type (`Async`)`Engine` for the database connection. The
connection pool is managed by this singleton, and the code is then
reentrant.

We also accept the type `str` (optionally complemented by `async_mode`.
I know you don't like this much, but it's the only way to allow an
asynchronous connection string).

In order to unify the different classes handling database connections,
we have renamed `connection_string` to `connection`, and `Session` to
`session_maker`.

Now, a single transaction is used to add a list of messages. Thus, a
crash during this write operation will not leave the database in an
unstable state with a partially added message list. This makes the code
resilient.

We believe that the `PostgresChatMessageHistory` class is no longer
necessary and can be replaced by:
```
PostgresChatMessageHistory = SQLChatMessageHistory
```
This also fixes the bug.


## Issue
- [issue 22021](https://github.com/langchain-ai/langchain/issues/22021)
  - Bug in _exit_history()
  - Bugs in PostgresChatMessageHistory and sync usage
  - Bugs in PostgresChatMessageHistory and async usage
- [issue
36](https://github.com/langchain-ai/langchain-postgres/issues/36)
 ## Twitter handle:
pprados

## Tests
- libs/community/tests/unit_tests/chat_message_histories/test_sql.py
(add async test)

@baskaryan, @eyurtsev or @hwchase17 can you check this PR ?
And, I've been waiting a long time for validation from other PRs. Can
you take a look?
- [PR 32](https://github.com/langchain-ai/langchain-postgres/pull/32)
- [PR 15575](https://github.com/langchain-ai/langchain/pull/15575)
- [PR 13200](https://github.com/langchain-ai/langchain/pull/13200)

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
3 months ago
Vincent Min 59bef31997
community[minor]: Improve InMemoryVectorStore with ability to persist to disk and filter on metadata. (#22186)
- **Description:** The InMemoryVectorStore is a nice and simple vector
store implementation for quick development and debugging. The current
implementation is quite limited in its functionalities. This PR extends
the functionalities by adding utility function to persist the vector
store to a json file and to load it from a json file. We choose the json
file format because it allows inspection of the database contents in a
text editor, which is great for debugging. Furthermore, it adds a
`filter` keyword that can be used to filter out documents on their
`page_content` or `metadata`.
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** @Vincent_Min
3 months ago
Christophe Bornet c34ad8c163
core[patch]: Improve VectorStore API doc (#22547) 3 months ago
maang-h 89128b7a49
community[patch]: add detailed paragraph and example for BaichuanTextEmbeddings (#22031)
- **Description:** add detailed paragraph and example for
BaichuanTextEmbeddings
   - **Issue:** the issue #21983
3 months ago
Anthony Bernabeu 4e676a63b8
community[minor]: Added filter search for LanceDB (#22461)
- [ ] **community**: "vectorstore: added filtering support for LanceDB
vector store"

- [ ] **This PR adds filtering capabilities to LanceDB**:
- **Description:** In LanceDB filtering can be applied when searching
for data into the vectorstore. It is using the SQL language as mentioned
in the LanceDB documentation.
    - **Issue:** #18235 
    - **Dependencies:** No

- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
3 months ago
Erick Friis 4050d6ea2b
huggingface: remove text-generation dep (#22543) 3 months ago