Commit Graph

10144 Commits (d2c7379f1cbcd40ce22d8685d4173889ed38ce5f)
 

Author SHA1 Message Date
Anders Swanson aacc6198b9
community: OCI GenAI embedding batch size (#22986)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: OCI GenAI embedding batch size"



- [x] **PR message**:
    - **Issue:** #22985 


- [ ] **Add tests and docs**: N/A


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: Anders Swanson <anders.swanson@oracle.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
3 months ago
Bagatur 8235bae48e
core[patch]: Release 0.2.8 (#23012) 3 months ago
Bagatur 5ee6e22983
infra: test all dependents on any change (#22994) 3 months ago
Nuno Campos bd4b68cd54
core: run_in_executor: Wrap StopIteration in RuntimeError (#22997)
- StopIteration can't be set on an asyncio.Future it raises a TypeError
and leaves the Future pending forever so we need to convert it to a
RuntimeError
3 months ago
Bagatur d96f67b06f
standard-tests[patch]: Update chat model standard tests (#22378)
- Refactor standard test classes to make them easier to configure
- Update openai to support stop_sequences init param
- Update groq to support stop_sequences init param
- Update fireworks to support max_retries init param
- Update ChatModel.bind_tools to type tool_choice
- Update groq to handle tool_choice="any". **this may be controversial**

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
3 months ago
Bob Lin 14f0cdad58
docs: Add some 3rd party tutorials (#22931)
Langchain is very popular among developers in China, but there are still
no good Chinese books or documents, so I want to add my own Chinese
resources on langchain topics, hoping to give Chinese readers a better
experience using langchain. This is not a translation of the official
langchain documentation, but my understanding.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
3 months ago
Jacob Lee 893299c3c9
docs[patch]: Reorder streaming guide, add tags (#22993)
CC @hinthornw
3 months ago
Oguz Vuruskaner dd25d08c06
community[minor]: add tool calling for DeepInfraChat (#22745)
DeepInfra now supports tool calling for supported models.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Bagatur 158701ab3c
docs: update universal init title (#22990) 3 months ago
Lance Martin a54deba6bc
Add RAG to conceptual guide (#22790)
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
3 months ago
maang-h c6b7db6587
community: Add Baichuan Embeddings batch size (#22942)
- **Support batch size** 
Baichuan updates the document, indicating that up to 16 documents can be
imported at a time

- **Standardized model init arg names**
    - baichuan_api_key -> api_key
    - model_name  -> model
3 months ago
ccurme 722c8f50ea
openai[patch]: add stream_usage parameter (#22854)
Here we add `stream_usage` to ChatOpenAI as:

1. a boolean attribute
2. a kwarg to _stream and _astream.

Question: should the `stream_usage` attribute be `bool`, or `bool |
None`?

Currently I've kept it `bool` and defaulted to False. It was implemented
on
[ChatAnthropic](e832bbb486/libs/partners/anthropic/langchain_anthropic/chat_models.py (L535))
as a bool. However, to maintain support for users who access the
behavior via OpenAI's `stream_options` param, this ends up being
possible:
```python
llm = ChatOpenAI(model_kwargs={"stream_options": {"include_usage": True}})
assert not llm.stream_usage
```
(and this model will stream token usage).

Some options for this:
- it's ok
- make the `stream_usage` attribute bool or None
- make an \_\_init\_\_ for ChatOpenAI, set a `._stream_usage` attribute
and read `.stream_usage` from a property

Open to other ideas as well.
3 months ago
Shubham Pandey 56ac94e014
community[minor]: add `ChatSnowflakeCortex` chat model (#21490)
**Description:** This PR adds a chat model integration for [Snowflake
Cortex](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions),
which gives an instant access to industry-leading large language models
(LLMs) trained by researchers at companies like Mistral, Reka, Meta, and
Google, including [Snowflake
Arctic](https://www.snowflake.com/en/data-cloud/arctic/), an open
enterprise-grade model developed by Snowflake.

**Dependencies:** Snowflake's
[snowpark](https://pypi.org/project/snowflake-snowpark-python/) library
is required for using this integration.

**Twitter handle:** [@gethouseware](https://twitter.com/gethouseware)

- [x] **Add tests and docs**:
1. integration tests:
`libs/community/tests/integration_tests/chat_models/test_snowflake.py`
2. unit tests:
`libs/community/tests/unit_tests/chat_models/test_snowflake.py`
  3. example notebook: `docs/docs/integrations/chat/snowflake.ipynb`


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
3 months ago
Lance Martin ea96133890
docs: Update llamacpp ntbk (#22907)
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Bagatur e2304ebcdb
standard-tests[patch]: Release 0.1.1 (#22984) 3 months ago
Hakan Özdemir c437b1aab7
[Partner]: Add metadata to stream response (#22716)
Adds `response_metadata` to stream responses from OpenAI. This is
returned with `invoke` normally, but wasn't implemented for `stream`.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
3 months ago
Baskar Gopinath 42a379c75c
docs: Standardise formatting (#22948)
Standardised formatting 


![image](https://github.com/langchain-ai/langchain/assets/73015364/ea3b5c5c-e7a6-4bb7-8c6b-e7d8cbbbf761)
3 months ago
Ikko Eltociear Ashimine 3e7bb7690c
docs: update databricks.ipynb (#22949)
arbitary -> arbitrary
3 months ago
Baskar Gopinath 19356b6445
Update sql_qa.ipynb (#22966)
fixes #22798 
fixes #22963
3 months ago
Bagatur 9ff249a38d
standard-tests[patch]: don't require str chunk contents (#22965) 3 months ago
Daniel Glogowski 892bd4c29b
docs: nim model name update (#22943)
NIM Model name change in a notebook and mdx file.

Thanks!
3 months ago
Christopher Tee ada03dd273
community(you): Better support for You.com News API (#22622)
## Description
While `YouRetriever` supports both You.com's Search and News APIs, news
is supported as an afterthought.
More specifically, not all of the News API parameters are exposed for
the user, only those that happen to overlap with the Search API.

This PR:
- improves support for both APIs, exposing the remaining News API
parameters while retaining backward compatibility
- refactor some REST parameter generation logic
- updates the docstring of `YouSearchAPIWrapper`
- add input validation and warnings to ensure parameters are properly
set by user
- 🚨 Breaking: Limit the news results to `k` items

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
3 months ago
ccurme e09c6bb58b
infra: update integration test workflow (#22945) 3 months ago
Tomaz Bratanic 1c661fd849
Improve llm graph transformer docstring (#22939) 3 months ago
maang-h 7a0af56177
docs: update ZhipuAI ChatModel docstring (#22934)
- **Description:** Update ZhipuAI ChatModel rich docstring
- **Issue:** the issue #22296
3 months ago
Appletree24 6838804116
docs:Fix mispelling in streaming doc (#22936)
Description: Fix mispelling
Issue: None
Dependencies: None
Twitter handle: None

Co-authored-by: qcloud <ubuntu@localhost.localdomain>
3 months ago
Bitmonkey 570d45b2a1
Update ollama.py with optional raw setting. (#21486)
Ollama has a raw option now. 

https://github.com/ollama/ollama/blob/main/docs/api.md

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
3 months ago
caiyueliang 9944ad7f5f
community: 'Solve the issue where the _search function in ElasticsearchStore supports passing a query_vector parameter, but the parameter does not take effect. (#21532)
**Issue:**
When using the similarity_search_with_score function in
ElasticsearchStore, I expected to pass in the query_vector that I have
already obtained. I noticed that the _search function does support the
query_vector parameter, but it seems to be ineffective. I am attempting
to resolve this issue.

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
3 months ago
Erick Friis 764f1958dd
docs: add ollama json mode (#22926)
fixes #22910
3 months ago
Erick Friis c374c98389
experimental: release 0.0.61 (#22924) 3 months ago
BuxianChen af65cac609
cli[minor]: remove redefined DEFAULT_GIT_REF (#21471)
remove redefined DEFAULT_GIT_REF

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
3 months ago
Erick Friis 79a64207f5
community: release 0.2.5 (#22923) 3 months ago
Jiejun Tan c8c67dde6f
text-splitters[patch]: Fix HTMLSectionSplitter (#22812)
Update former pull request:
https://github.com/langchain-ai/langchain/pull/22654.

Modified `langchain_text_splitters.HTMLSectionSplitter`, where in the
latest version `dict` data structure is used to store sections from a
html document, in function `split_html_by_headers`. The header/section
element names serve as dict keys. This can be a problem when duplicate
header/section element names are present in a single html document.
Latter ones can replace former ones with the same name. Therefore some
contents can be miss after html text splitting is conducted.

Using a list to store sections can hopefully solve the problem. A Unit
test considering duplicate header names has been added.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Erick Friis fbeeb6da75
langchain: release 0.2.5 (#22922) 3 months ago
Erick Friis 551640a030
templates: remove lockfiles (#22920)
poetry will default to latest versions without
3 months ago
Baskar Gopinath c4f2bc9540
docs: Fix wrongly referenced class name in confluence.py (#22879)
Fixes #22542

Changed ConfluenceReader to ConfluenceLoader
3 months ago
ccurme 32966a08a9
infra: remove nvidia from monorepo scheduled tests (#22915)
Scheduled tests run in
https://github.com/langchain-ai/langchain-nvidia/tree/main
3 months ago
Erick Friis 9ef15691d6
core: release 0.2.7 (#22917) 3 months ago
Nuno Campos 338180f383
core: in astream_events v2 always await task even if already finished (#22916)
- this ensures exceptions propagate to the caller
3 months ago
Istvan/Nebulinq 513e491ce9
experimental: LLMGraphTransformer - added relationship properties. (#21856)
- **Description:** 
The generated relationships in the graph had no properties, but the
Relationship class was properly defined with properties. This made it
very difficult to transform conditional sentences into a graph. Adding
properties to relationships can solve this issue elegantly.
The changes expand on the existing LLMGraphTransformer implementation
but add the possibility to define allowed relationship properties like
this: LLMGraphTransformer(llm=llm, relationship_properties=["Condition",
"Time"],)
- **Issue:** 
    no issue found
 - **Dependencies:**
    n/a
- **Twitter handle:** 
    @IstvanSpace


-Quick Test
=================================================================
from dotenv import load_dotenv
import os
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import
LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document

load_dotenv()
os.environ["NEO4J_URI"] = os.getenv("NEO4J_URI")
os.environ["NEO4J_USERNAME"] = os.getenv("NEO4J_USERNAME")
os.environ["NEO4J_PASSWORD"] = os.getenv("NEO4J_PASSWORD")
graph = Neo4jGraph()
llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
llm_transformer = LLMGraphTransformer(llm=llm)
#text = "Harry potter likes pies, but only if it rains outside"
text = "Jack has a dog named Max. Jack only walks Max if it is sunny
outside."
documents = [Document(page_content=text)]
llm_transformer_props = LLMGraphTransformer(
    llm=llm,
    relationship_properties=["Condition"],
)
graph_documents_props =
llm_transformer_props.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents_props[0].nodes}")
print(f"Relationships:{graph_documents_props[0].relationships}")
graph.add_graph_documents(graph_documents_props)

---------

Co-authored-by: Istvan Lorincz <istvan.lorincz@pm.me>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
ccurme 694ae87748
docs: add groq to chatmodeltabs (#22913) 3 months ago
Eugene Yurtsev c816d03699
dcos: Add admonition to PythonREPL tool (#22909)
Add admonition to the documentation to make sure users are aware that
the tool allows execution of code on the host machine using a python
interpreter (by design).
3 months ago
kiarina 8171efd07a
core[patch]: Fix FunctionCallbackHandler._on_tool_end (#22908)
If the global `debug` flag is enabled, the agent will get the following
error in `FunctionCallbackHandler._on_tool_end` at runtime.

```
Error in ConsoleCallbackHandler.on_tool_end callback: AttributeError("'list' object has no attribute 'strip'")
```

By calling str() before strip(), the error was avoided.
This error can be seen at
[debugging.ipynb](https://github.com/langchain-ai/langchain/blob/master/docs/docs/how_to/debugging.ipynb).

- Issue: NA
- Dependencies: NA
- Twitter handle: https://x.com/kiarina37
3 months ago
Philippe PRADOS b61de9728e
community[minor]: Fix long_context_reorder.py async (#22839)
Implement `async def atransform_documents( self, documents:
Sequence[Document], **kwargs: Any ) -> Sequence[Document]` for
`LongContextReorder`
3 months ago
Eugene Yurtsev c72bcda4f2
community[major], experimental[patch]: Remove Python REPL from community (#22904)
Remove the REPL from community, and suggest an alternative import from
langchain_experimental.

Fix for this issue:
https://github.com/langchain-ai/langchain/issues/14345

This is not a bug in the code or an actual security risk. The python
REPL itself is behaving as expected.

The PR is done to appease blanket security policies that are just
looking for the presence of exec in the code.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
3 months ago
Eugene Yurtsev 9a877c7adb
community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903)
This PR restricts the depth to which the sitemap can be parsed.

Fix for: CVE-2024-2965
3 months ago
Eugene Yurtsev 4a77a3ab19
core[patch]: fix validation of @deprecated decorator (#22513)
This PR moves the validation of the decorator to a better place to avoid
creating bugs while deprecating code.

Prevent issues like this from arising:
https://github.com/langchain-ai/langchain/issues/22510

we should replace with a linter at some point that just does static
analysis
3 months ago
Jacob Lee 181a61982f
anthropic[minor]: Adds streaming tool call support for Anthropic (#22687)
Preserves string content chunks for non tool call requests for
convenience.

One thing - Anthropic events look like this:

```
RawContentBlockStartEvent(content_block=TextBlock(text='', type='text'), index=0, type='content_block_start')
RawContentBlockDeltaEvent(delta=TextDelta(text='<thinking>\nThe', type='text_delta'), index=0, type='content_block_delta')
RawContentBlockDeltaEvent(delta=TextDelta(text=' provide', type='text_delta'), index=0, type='content_block_delta')
...
RawContentBlockStartEvent(content_block=ToolUseBlock(id='toolu_01GJ6x2ddcMG3psDNNe4eDqb', input={}, name='get_weather', type='tool_use'), index=1, type='content_block_start')
RawContentBlockDeltaEvent(delta=InputJsonDelta(partial_json='', type='input_json_delta'), index=1, type='content_block_delta')
```

Note that `delta` has a `type` field. With this implementation, I'm
dropping it because `merge_list` behavior will concatenate strings.

We currently have `index` as a special field when merging lists, would
it be worth adding `type` too?

If so, what do we set as a context block chunk? `text` vs.
`text_delta`/`tool_use` vs `input_json_delta`?

CC @ccurme @efriis @baskaryan
3 months ago
ccurme f40b2c6f9d
fireworks[patch]: add usage_metadata to (a)invoke and (a)stream (#22906) 3 months ago
Mohammad Mohtashim d1b7a934aa
[Community]: HuggingFaceCrossEncoder `score` accounting for <not-relevant score,relevant score> pairs. (#22578)
- **Description:** Some of the Cross-Encoder models provide scores in
pairs, i.e., <not-relevant score (higher means the document is less
relevant to the query), relevant score (higher means the document is
more relevant to the query)>. However, the `HuggingFaceCrossEncoder`
`score` method does not currently take into account the pair situation.
This PR addresses this issue by modifying the method to consider only
the relevant score if score is being provided in pair. The reason for
focusing on the relevant score is that the compressors select the top-n
documents based on relevance.
    - **Issue:** #22556 
- Please also refer to this
[comment](https://github.com/UKPLab/sentence-transformers/issues/568#issuecomment-729153075)
3 months ago