Commit Graph

4289 Commits

Author SHA1 Message Date
SaschaStoll
709664a079
community[patch]: Performant filter columns option for Hanavector (#21971)
**Description:** Backwards compatible extension of the initialisation
interface of HanaDB to allow the user to specify
specific_metadata_columns that are used for metadata storage of selected
keys which yields increased filter performance. Any not-mentioned
metadata remains in the general metadata column as part of a JSON
string. Furthermore switched to executemany for batch inserts into
HanaDB.

**Issue:** N/A

**Dependencies:** no new dependencies added

**Twitter handle:** @sapopensource

---------

Co-authored-by: Martin Kolb <martin.kolb@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-22 13:21:21 -07:00
Bagatur
16b55b0704
langchain[patch]: remove dataclasses-json dep (#22042)
vestigial dep afaict
2024-05-22 13:20:57 -07:00
Christos Boulmpasakos
c3bcfad66d
text-splitters[patch]: Extend TextSplitter:keep_separator functionality (#21130)
**Description:** Added extra functionality to `CharacterTextSplitter`,
`TextSplitter` classes.
The user can select whether to append the separator to the previous
chunk with `keep_separator='end' ` or else prepend to the next chunk.
Previous functionality prepended by default to next chunk.
  
**Issue:** Fixes #20908

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-05-22 13:17:45 -07:00
Eric Zhang
e7e41eaabe
langchain: add RankLLM Reranker (#21171)
Integrate RankLLM reranker (https://github.com/castorini/rank_llm) into
LangChain

An example notebook is given in
`docs/docs/integrations/retrievers/rankllm-reranker.ipynb`

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-05-22 20:12:55 +00:00
maang-h
fc93bed8c4
community: Fix CSVLoader columns is None (#20701)
- **Bug code**: In
langchain_community/document_loaders/csv_loader.py:100

- **Description**: currently, when 'CSVLoader' reads the column as None
in the 'csv' file, it will report an error because the 'CSVLoader' does
not verify whether the column is of str type and does not consider how
to handle the corresponding 'row_data' when the column is' None 'in the
csv. This pr provides a solution.

- **Issue:**  Fix #20699 

- **thinking:**

1. Refer to the processing method for
'langchain_community/document_loaders/csv_loader.py:100' when **'v'**
equals'None', and apply the same method to '**k**'.
(Reference`csv.DictReader` ,**'k'** will only be None when `
len(columns) < len(number_row_data)` is established)
2. **‘k’** equals None only holds when it is the last column, and its
corresponding **'v'** type is a list. Therefore, I referred to the data
format in 'Document' and used ',' to concatenated the elements in the
list.(But I'm not sure if you accept this form, if you have any other
ideas, communicate)

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-05-22 12:57:46 -07:00
Nithin James Padayatti
403142eaba
langchain: added revision_example prompt template (#20916)
**Description:** Added revision_example prompt template to include the
revision request and revision examples in the revision chain.
    **Issue:** Not Applicable
    **Dependencies:** Not Applicable
    **Twitter handle:**  @nithinjp09
2024-05-22 19:57:32 +00:00
Sihan Chen
1f81277b9b
community[minor]: allow enabling proxy in aiohttp session in AsyncHTML (#19499)
Allow enabling proxy in aiohttp session async html
2024-05-22 18:25:06 +00:00
Eugene Yurtsev
36813d2f00
community[patch]: Fix remaining __inits__ in community (#22037)
Fixes the __init__ files in community to use __all__ which is statically
defined.
2024-05-22 17:42:17 +00:00
Eugene Yurtsev
58360a1e53
community[patch]: Add unit test to verify that init is correctly defined (#22030)
Fix some __init__ files and add a unit test
2024-05-22 17:19:00 +00:00
Erick Friis
ef53ccf54b
robocorp: release 0.0.8 (#22034) 2024-05-22 16:41:41 +00:00
Matthew Hoffman
4f2e3bd7fd
community[patch]: fix public interface for embeddings module (#21650)
## Description

The existing public interface for `langchain_community.emeddings` is
broken. In this file, `__all__` is statically defined, but is
subsequently overwritten with a dynamic expression, which type checkers
like pyright do not support. pyright actually gives the following
diagnostic on the line I am requesting we remove:


[reportUnsupportedDunderAll](https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportUnsupportedDunderAll):

```
Operation on "__all__" is not supported, so exported symbol list may be incorrect
```

Currently, I get the following errors when attempting to use publicablly
exported classes in `langchain_community.emeddings`:

```python
import langchain_community.embeddings

langchain_community.embeddings.HuggingFaceEmbeddings(...)  #  error: "HuggingFaceEmbeddings" is not exported from module "langchain_community.embeddings" (reportPrivateImportUsage)
```

This is solved easily by removing the dynamic expression.
2024-05-22 11:42:15 -04:00
Eugene Yurtsev
8d82160a8a
community[patch]: Clean up logic in import checking unit test (#22026)
Clean up unit test
2024-05-22 15:30:10 +00:00
Tomaz Bratanic
d8a1f1114d
community[patch]: Handle exceptions where node props aren't consistent in neo4j schema (#22027) 2024-05-22 11:21:56 -04:00
WeichenXu
b0ef5e778a
community[patch]: Fix ChatDatabricsk in case that streaming response doesn't have role field in delta chunk (#21897)
Thank you for contributing to LangChain!

- [X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


**Description:**
Fix ChatDatabricsk in case that streaming response doesn't have role
field in delta chunk


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2024-05-22 08:12:53 -07:00
Eugene Yurtsev
aed64daabb
community[patch]: Add unit test to catch bad __all__ definitions (#21996)
This will catch all dynamic __all__ definitions.
2024-05-22 09:32:13 -04:00
Bagatur
3b0437c05b
core[patch]: Release 0.2.1 (#22003) 2024-05-22 00:05:04 +00:00
Kefan You
24b5c27bb1
community[patch]: raise_for_status logic missing in async _fetch of WebBaseLoader (#21948)
## 'raise_for_status' parameter of WebBaseLoader works in sync load but
not in async load.
In webBaseLoader:  

Sync load is calling `_scrape` and has `raise_for_status` properly
handled.
```
    def _scrape(
        self,
        url: str,
        parser: Union[str, None] = None,
        bs_kwargs: Optional[dict] = None,
    ) -> Any:
        from bs4 import BeautifulSoup

        if parser is None:
            if url.endswith(".xml"):
                parser = "xml"
            else:
                parser = self.default_parser

        self._check_parser(parser)

        html_doc = self.session.get(url, **self.requests_kwargs)
        if self.raise_for_status:
            html_doc.raise_for_status()

        if self.encoding is not None:
            html_doc.encoding = self.encoding
        elif self.autoset_encoding:
            html_doc.encoding = html_doc.apparent_encoding
        return BeautifulSoup(html_doc.text, parser, **(bs_kwargs or {}))
```
Async load is calling `_fetch` but missing `raise_for_status` logic.
```
    async def _fetch(
        self, url: str, retries: int = 3, cooldown: int = 2, backoff: float = 1.5
    ) -> str:
        async with aiohttp.ClientSession() as session:
            for i in range(retries):
                try:
                    async with session.get(
                        url,
                        headers=self.session.headers,
                        ssl=None if self.session.verify else False,
                        cookies=self.session.cookies.get_dict(),
                    ) as response:
                        return await response.text()
```

Co-authored-by: kefan.you <darkfss@sina.com>
2024-05-21 23:51:03 +00:00
Surya Rath
eb096675a8
OpenAI Assistants v2 api support for OpenAIAssistantRunnable (#21484)
**Title**: "langchain: OpenAI Assistants v2 api support"

***Descriptions*** 
- [x] "attachments" support added along with backward compatibility of
"file_ids"
- [x]  "tool_resources" support added while creating new assistant

- [ ] "tool_choice" parameter support
- [ ]  Streaming support


- **Dependencies:** OpenAI v2 API (openai>=1.23.0)
- **Twitter handle:** @skanta_rath

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-05-21 15:32:29 -07:00
Eugene Yurtsev
7a5d042bd2
langchain[patch]: Add unit test to detect changes to community imports (#21998)
Add unit tests for community imports
2024-05-21 17:45:26 -04:00
Eugene Yurtsev
90f4d8842f
langchain[patch]: Turn on all deprecations for 0.2 (#21999)
- Turn on all 0.2 import deprecations.
- Update error messag with URL to upgrade instructions.
2024-05-21 17:33:43 -04:00
Asaf Joseph Gardin
a042e804b4
ai21: AI21 Jamba docs (#21978)
- Updated docs to have an example to use Jamba instead of J2

---------

Co-authored-by: Asaf Gardin <asafg@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-05-21 19:27:46 +00:00
Pengcheng Liu
4cf523949a
community[patch]: Update model client to support vision model in Tong… (#21474)
- **Description:** Tongyi uses different client for chat model and
vision model. This PR chooses proper client based on model name to
support both chat model and vision model. Reference [tongyi
document](https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api?spm=a2c4g.11186623.0.0.27404c9a7upm11)
for details.

```
from langchain_core.messages import HumanMessage
from langchain_community.chat_models import ChatTongyi

llm = ChatTongyi(model_name='qwen-vl-max')
image_message = {
    "image": "https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png"
}
text_message = {
    "text": "summarize this picture",
}
message = HumanMessage(content=[text_message, image_message])
llm.invoke([message])
```

- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
2024-05-21 11:58:27 -07:00
Sevin F. Varoglu
1bc0ea5496
community[patch]: update OctoAIEmbeddings to subclass OpenAIEmbeddings (#21805) 2024-05-21 11:29:41 -07:00
Eugene Yurtsev
ded53297e0
core[patch]: Add unit test for RunnableGenerator for eventstream v2 (#21990)
No unit tests with runnable generator
2024-05-21 14:29:15 -04:00
Nuno Campos
fb6108c8f5
core[patch]: In astream_events(version=v2) tap output of root run (#21977)
- if tap_output_iter/aiter is called multiple times for the same run
issue events only once
- if chat model run is tapped don't issue duplicate on_llm_new_token
events
- if first chunk arrives after run has ended do not emit it as a stream
event

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-05-21 14:03:57 -04:00
Bagatur
72d4a8eeed
community[patch]: AzureSearch dont overwrite default async (#21989) 2024-05-21 11:01:28 -07:00
ccurme
a983465694
docs: set default anthropic model (#21988)
`ChatAnthropic()` raises ValidationError.
2024-05-21 11:01:18 -07:00
ccurme
4be5537837
Revert "anthropic: set default model" (#21987)
Reverts langchain-ai/langchain#21986
2024-05-21 17:28:32 +00:00
ccurme
35439cf3bd
anthropic: set default model (#21986)
Various docs reference `ChatAnthropic()`, but this currently raises
ValidationError.
2024-05-21 17:24:31 +00:00
ccurme
0923136851
langchain: default to Runnable in MultiQueryRetriever (#21770)
- `llm_chain` becomes `Union[LLMChain, Runnable]`
- `.from_llm` creates a runnable

tested by verifying that docs/how_to/MultiQueryRetriever.ipynb runs
unchanged with sync/async invoke (and that it runs if we specifically
instantiate with LLMChain).
2024-05-21 17:01:05 +00:00
Yulong Wang
8e1aeb8ad5
community[patch]: Fix typo in arxiv tool's doc (#21970)
Fix typo in arxiv tool's doc
2024-05-21 13:44:59 +00:00
Robert Caulk
54adcd9e82
community[minor]: add AskNews retriever and AskNews tool (#21581)
We add a tool and retriever for the [AskNews](https://asknews.app)
platform with example notebooks.

The retriever can be invoked with:

```py
from langchain_community.retrievers import AskNewsRetriever

retriever = AskNewsRetriever(k=3)

retriever.invoke("impact of fed policy on the tech sector")
```

To retrieve 3 documents in then news related to fed policy impacts on
the tech sector. The included notebook also includes deeper details
about controlling filters such as category and time, as well as
including the retriever in a chain.

The tool is quite interesting, as it allows the agent to decide how to
obtain the news by forming a query and deciding how far back in time to
look for the news:

```py
from langchain_community.tools.asknews import AskNewsSearch
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI

tool = AskNewsSearch()

instructions = """You are an assistant."""
base_prompt = hub.pull("langchain-ai/openai-functions-template")
prompt = base_prompt.partial(instructions=instructions)
llm = ChatOpenAI(temperature=0)
asknews_tool = AskNewsSearch()
tools = [asknews_tool]
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
)

agent_executor.invoke({"input": "How is the tech sector being affected by fed policy?"})
```

---------

Co-authored-by: Emre <e@emre.pm>
2024-05-20 18:23:06 -07:00
Jesse S
fc79b372cb
community[minor]: add aerospike vectorstore integration (#21735)
Please let me know if you see any possible areas of improvement. I would
very much appreciate your constructive criticism if time allows.

**Description:**
- Added a aerospike vector store integration that utilizes
[Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/)
add-on.
- Added both unit tests and integration tests
- Added a docker compose file for spinning up a test environment
- Added a notebook

 **Dependencies:** any dependencies required for this change
- aerospike-vector-search

 **Twitter handle:** 
- No twitter, you can use my GitHub handle or LinkedIn if you'd like

Thanks!

---------

Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-21 01:01:47 +00:00
Prince Canuma
3587c60396
community[patch]: Fix MLX LLM Stream (#20575)
Closes #20561

This PR fixes MLX LLM stream `AttributeError`. 

Recently, `mlx-lm` changed the token decoding logic, which affected the
LC+MLX integration.

Additionally, I made minor fixes such as: docs example broken link and
enforcing pipeline arguments (max_tokens, temp and etc) for invoke.
   
- **Issue:** #20561
    
- **Twitter handle:** @Prince_Canuma
2024-05-20 17:17:08 -07:00
Rahul Triptahi
96bd0b0844
community[patch]: Remove redundant pebblo cloud api call (#21589)
Description: removed redundant pebblo cloud api call. Changed classified
`doc` key to `ai_apps_data`.
Documentation: N/A
Unit tests: N/A
2024-05-20 17:15:16 -07:00
Param Singh
d07885f8b7
community[patch]: standardized sparkllm init args (#21633)
Related to #20085 
@baskaryan 

Thank you for contributing to LangChain!

community:sparkllm[patch]: standardized init args

updated `spark_api_key` so that aliased to `api_key`. Added integration
test for `sparkllm` to test that it continues to set the same underlying
attribute.

updated temperature with Pydantic Field, added to the integration test.

Ran `make format`,`make test`, `make lint`, `make spell_check`
2024-05-20 17:11:36 -07:00
Dhruv Chawla
d4359d3de6
community[patch]: Update UpTrain Callback Handler to support the new UpTrain evaluation schema (#21656)
UpTrain has a new dashboard now that makes it easier to view projects
and evaluations. Using this requires specifying both project_name and
evaluation_name when performing evaluations. I have updated the code to
support it.
2024-05-20 17:06:00 -07:00
Alex Riina
c0e3c3a350
openai[patch], community[patch]: add pricing and max context window for GPT-4o (#21673)
# Add pricing and max context window for GPT-4o
- community: add cost per 1k tokens and max context window
- partners: add max context window

**Description:** adds static information about GPT-4o based on
https://openai.com/api/pricing/ and
https://platform.openai.com/docs/models/gpt-4o so that GPT-4o reporting
is accurate.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-20 23:47:43 +00:00
缨缨
bd39b2ccdf
community: enable SupabaseVectorStore to support extended table fields (#21762)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: enable SupabaseVectorStore to support
extended table fields"

- [x] **PR message**: 
- Added extension fields to the function _add_vectors so that users can
add other custom fields when insert a record into the database. eg:
    

![image](https://github.com/langchain-ai/langchain/assets/10885578/e1d5ca20-936e-4cab-ba69-8fdd23b8ce8f)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-20 16:32:26 -07:00
Leonid Ganeline
e98a4fd19a
ai21[patch]: configuration fix (#21790)
added "repository" and "Source Code" parameters (these parameters are
missed only in this partner package configuration).
2024-05-20 15:49:38 -07:00
Trayan Azarov
f54cbf8ff5
chroma[patch]: Chroma - remove reference to collection upon delete_collection (#21817)
**Description**:

- Reference to `Collection` object is set to `None` when deleting a
collection `delete_collection()`
- Added utility method `reset_collection()` to allow recreating the
collection
- Moved collection creation out of `__init__` into
`__ensure_collection()` to be reused by object init and
`reset_collection()`
- `_collection` is now a property to avoid breaking changes

**Issues**: 

- chroma-core/chroma#2213

**Twitter**: @t_azarov
2024-05-20 15:42:36 -07:00
Jens
b0b302ec6b
community[patch]: fixed aleph alpha default emedding request (#21826)
- **Description:** In the aleph alpha client the paramater `normalize`
is *not* optional. Setting this to `None` gives an error.
- **Dependencies:** None

Co-authored-by: Jens Lücke <jens.luecke@tngtech.com>
Co-authored-by: Jens <jens.luecke@hu-berlin.de>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-05-20 22:39:43 +00:00
Jorge Piedrahita Ortiz
e6207ad4f3
community[patch]: Sambanova integration api update (#21848)
- **Description:**:
        SambaStudio generic endpoint compatibility added
        Improved error description, and handling
        streaming examples added
2024-05-20 15:29:59 -07:00
Michael Reed
7a5e1bcf99
core[patch]: Fix NPE in function_calling._get_python_function_required_args (#21863)
Example error message:
line 206, in _get_python_function_required_args
    if is_function_type and required[0] == "self":
                            ~~~~~~~~^^^
IndexError: list index out of range

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-20 22:06:27 +00:00
Liuww
332ffed393
community[patch]: Adopting the lighter-weight xinference_client (#21900)
While integrating the xinference_embedding, we observed that the
downloaded dependency package is quite substantial in size. With a focus
on resource optimization and efficiency, if the project requirements are
limited to its vector processing capabilities, we recommend migrating to
the xinference_client package. This package is more streamlined,
significantly reducing the storage space requirements of the project and
maintaining a feature focus, making it particularly suitable for
scenarios that demand lightweight integration. Such an approach not only
boosts deployment efficiency but also enhances the application's
maintainability, rendering it an optimal choice for our current context.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-20 22:05:09 +00:00
Tomaz Bratanic
a43515ca65
experimental[patch]: Pass enum only to openai in llm graph transformer (#21860)
Some models like Groq return bad request if you pass in `enum` parameter
in tool definition
2024-05-20 15:02:48 -07:00
Jiří Spilka
6499897c87
community[patch]: update apify integration to attribute API activity to langchain (#21909)
**Description:** Add `Origin/langchain` to Apify's client's user-agent
to attribute API activity to LangChain (at Apify, we aim to monitor our
integrations to evaluate whether we should invest more in the LangChain
integration regarding functionality and content)

**Issue:** None
**Dependencies:** None
**Twitter handle:** None
2024-05-20 14:49:23 -07:00
Jared Van Bortel
25d1c1c9bb
nomic: implement local embeddings with the inference_mode parameter (#21934)
## Description

This PR implements local and dynamic mode in the Nomic Embed integration
using the inference_mode and device parameters. They work as documented
[here](https://docs.nomic.ai/reference/python-api/embeddings#local-inference).

<!-- If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, hwchase17. -->

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-05-20 14:17:07 -07:00
ccurme
0e72ed39a0
infra: fix CI on text-splitters (#21935) 2024-05-20 14:03:42 -07:00
ccurme
4470d3b4a0
partners: bump core in packages implementing ls_params (#21868)
These packages all import `LangSmithParams` which was released in
langchain-core==0.2.0.

N.B. we will need to release `openai` and then bump `langchain-openai`
in `together` and `upstage`.
2024-05-20 11:51:43 -07:00