**Description:**
Added support for a Pandas DataFrame OutputParser with format
instructions, along with unit tests and a demo notebook. Namely, we've
added the ability to request data from a DataFrame, have the LLM parse
the request, and then use that request to retrieve a well-formatted
response.
Within LangChain, it seamlessly integrates with language models like
OpenAI's `text-davinci-003`, facilitating streamlined interaction using
the format instructions (just like the other output parsers).
This parser structures its requests as
`<operation/column/row>[<optional_array_params>]`. The instructions
detail permissible operations, valid columns, and array formats,
ensuring clarity and adherence to the required format.
For example:
- When the LLM receives the input: "Retrieve the mean of `num_legs` from
rows 1 to 3."
- The provided format instructions guide the LLM to structure the
request as: "mean:num_legs[1..3]".
The parser processes this formatted request, leveraging the LLM's
understanding to extract the mean of `num_legs` from rows 1 to 3 within
the Pandas DataFrame.
This integration allows users to communicate requests naturally, with
the LLM transforming these instructions into structured commands
understood by the `PandasDataFrameOutputParser`. The format instructions
act as a bridge between natural language queries and precise DataFrame
operations, optimizing communication and data retrieval.
**Issue:**
- https://github.com/langchain-ai/langchain/issues/11532
**Dependencies:**
No additional dependencies :)
**Tag maintainer:**
@baskaryan
**Twitter handle:**
No need. :)
---------
Co-authored-by: Wasee Alam <waseealam@protonmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:**
When using Vald, only insecure grpc connection was supported, so secure
connection is now supported.
In addition, grpc metadata can be added to Vald requests to enable
authentication with a token.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Response_if_no_docs_found is not implemented in
ConversationalRetrievalChain for async code paths. Implemented it and
added test cases
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
# Description
This PR implements Self-Query Retriever for MongoDB Atlas vector store.
I've implemented the comparators and operators that are supported by
MongoDB Atlas vector store according to the section titled "Atlas Vector
Search Pre-Filter" from
https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/.
Namely:
```
allowed_comparators = [
Comparator.EQ,
Comparator.NE,
Comparator.GT,
Comparator.GTE,
Comparator.LT,
Comparator.LTE,
Comparator.IN,
Comparator.NIN,
]
"""Subset of allowed logical operators."""
allowed_operators = [
Operator.AND,
Operator.OR
]
```
Translations from comparators/operators to MongoDB Atlas filter
operators(you can find the syntax in the "Atlas Vector Search
Pre-Filter" section from the previous link) are done using the following
dictionary:
```
map_dict = {
Operator.AND: "$and",
Operator.OR: "$or",
Comparator.EQ: "$eq",
Comparator.NE: "$ne",
Comparator.GTE: "$gte",
Comparator.LTE: "$lte",
Comparator.LT: "$lt",
Comparator.GT: "$gt",
Comparator.IN: "$in",
Comparator.NIN: "$nin",
}
```
In visit_structured_query() the filters are passed as "pre_filter" and
not "filter" as in the MongoDB link above since langchain's
implementation of MongoDB atlas vector
store(libs\langchain\langchain\vectorstores\mongodb_atlas.py) in
_similarity_search_with_score() sets the "filter" key to have the value
of the "pre_filter" argument.
```
params["filter"] = pre_filter
```
Test cases and documentation have also been added.
# Issue
#11616
# Dependencies
No new dependencies have been added.
# Documentation
I have created the notebook mongodb_atlas_self_query.ipynb outlining the
steps to get the self-query mechanism working.
I worked closely with [@Farhan-Faisal](https://github.com/Farhan-Faisal)
on this PR.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
# Description
We implemented a simple tool for accessing the Merriam-Webster
Collegiate Dictionary API
(https://dictionaryapi.com/products/api-collegiate-dictionary).
Here's a simple usage example:
```py
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType
llm = OpenAI()
tools = load_tools(["serpapi", "merriam-webster"], llm=llm) # Serp API gives our agent access to Google
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is the english word for the german word Himbeere? Define that word.")
```
Sample output:
```
> Entering new AgentExecutor chain...
I need to find the english word for Himbeere and then get the definition of that word.
Action: Search
Action Input: "English word for Himbeere"
Observation: {'type': 'translation_result'}
Thought: Now I have the english word, I can look up the definition.
Action: MerriamWebster
Action Input: raspberry
Observation: Definitions of 'raspberry':
1. rasp-ber-ry, noun: any of various usually black or red edible berries that are aggregate fruits consisting of numerous small drupes on a fleshy receptacle and that are usually rounder and smaller than the closely related blackberries
2. rasp-ber-ry, noun: a perennial plant (genus Rubus) of the rose family that bears raspberries
3. rasp-ber-ry, noun: a sound of contempt made by protruding the tongue between the lips and expelling air forcibly to produce a vibration; broadly : an expression of disapproval or contempt
4. black raspberry, noun: a raspberry (Rubus occidentalis) of eastern North America that has a purplish-black fruit and is the source of several cultivated varieties —called also blackcap
Thought: I now know the final answer.
Final Answer: Raspberry is an english word for Himbeere and it is defined as any of various usually black or red edible berries that are aggregate fruits consisting of numerous small drupes on a fleshy receptacle and that are usually rounder and smaller than the closely related blackberries.
> Finished chain.
```
# Issue
This closes#12039.
# Dependencies
We added no extra dependencies.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Lara <63805048+larkgz@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Update the document for drop box loader + made the
messages more verbose when loading pdf file since people were getting
confused
- **Issue:** #13952
- **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17,
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Added a tool called RedditSearchRun and an
accompanying API wrapper, which searches Reddit for posts with support
for time filtering, post sorting, query string and subreddit filtering.
- **Issue:** #13891
- **Dependencies:** `praw` module is used to search Reddit
- **Tag maintainer:** @baskaryan , and any of the other maintainers if
needed
- **Twitter handle:** None.
Hello,
This is our first PR and we hope that our changes will be helpful to the
community. We have run `make format`, `make lint` and `make test`
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.
Our PR integrates the `praw` package which is already used by
RedditPostsLoader in LangChain. Nonetheless, we have added integration
tests and edited unit tests to test our changes. An example notebook is
also provided. These changes were put together by me, @Anika2000,
@CharlesXu123, and @Jeremy-Cheng-stack
Thank you in advance to the maintainers for their time.
---------
Co-authored-by: What-Is-A-Username <49571870+What-Is-A-Username@users.noreply.github.com>
Co-authored-by: Anika2000 <anika.sultana@mail.utoronto.ca>
Co-authored-by: Jeremy Cheng <81793294+Jeremy-Cheng-stack@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Added some of the more endpoints supported by serpapi
that are not suported on langchain at the moment, like google trends,
google finance, google jobs, and google lens
- **Issue:** [Add support for many of the querying endpoints with
serpapi #11811](https://github.com/langchain-ai/langchain/issues/11811)
---------
Co-authored-by: zushenglu <58179949+zushenglu@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Ian Xu <ian.xu@mail.utoronto.ca>
Co-authored-by: zushenglu <zushenglu1809@gmail.com>
Co-authored-by: KevinT928 <96837880+KevinT928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Volc Engine MaaS serves as an enterprise-grade,
large-model service platform designed for developers. You can visit its
homepage at https://www.volcengine.com/docs/82379/1099455 for details.
This change will facilitate developers to integrate quickly with the
platform.
- **Issue:** None
- **Dependencies:** volcengine
- **Tag maintainer:** @baskaryan
- **Twitter handle:** @he1v3tica
---------
Co-authored-by: lvzhong <lvzhong@bytedance.com>
- **Description:** use post field validation for `CohereRerank`
- **Issue:** #12899 and #13058
- **Dependencies:**
- **Tag maintainer:** @baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Update 5 pdf document loaders in
`langchain.document_loaders.pdf`, to store a url in the metadata
(instead of a temporary, local file path) if the user provides a web
path to a pdf: `PyPDFium2Loader`, `PDFMinerLoader`,
`PDFMinerPDFasHTMLLoader`, `PyMuPDFLoader`, and `PDFPlumberLoader` were
updated.
- The updates follow the approach used to update `PyPDFLoader` for the
same behavior in #12092
- The `PyMuPDFLoader` changes required additional work in updating
`langchain.document_loaders.parsers.pdf.PyMuPDFParser` to be able to
process either an `io.BufferedReader` (from local pdf) or `io.BytesIO`
(from online pdf)
- The `PDFMinerPDFasHTMLLoader` change used a simpler approach since the
metadata is assigned by the loader and not the parser
- **Issue:** Fixes#7034
- **Dependencies:** None
```python
# PyPDFium2Loader example:
# old behavior
>>> from langchain.document_loaders import PyPDFium2Loader
>>> loader = PyPDFium2Loader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': '/var/folders/7z/d5dt407n673drh1f5cm8spj40000gn/T/tmpm5oqa92f/tmp.pdf', 'page': 0}
# new behavior
>>> from langchain.document_loaders import PyPDFium2Loader
>>> loader = PyPDFium2Loader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}
```
- **Description:** Updated to remove deprecated parameter penalty_alpha,
and use string variation of prompt rather than json object for better
flexibility. - **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** N/A
- **Tag maintainer:** @eyurtsev
- **Twitter handle:** @symbldotai
---------
Co-authored-by: toshishjawale <toshish@symbl.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Instead of using JSON-like syntax to describe node and relationship
properties we changed to a shorter and more concise schema description
Old:
```
Node properties are the following:
[{'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Movie'}, {'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Actor'}]
Relationship properties are the following:
[]
The relationships are the following:
['(:Actor)-[:ACTED_IN]->(:Movie)']
```
New:
```
Node properties are the following:
Movie {name: STRING},Actor {name: STRING}
Relationship properties are the following:
The relationships are the following:
(:Actor)-[:ACTED_IN]->(:Movie)
```
Implements
[#12115](https://github.com/langchain-ai/langchain/issues/12115)
Who can review?
@baskaryan , @eyurtsev , @hwchase17
Integrated Stack Exchange API into Langchain, enabling access to diverse
communities within the platform. This addition enhances Langchain's
capabilities by allowing users to query Stack Exchange for specialized
information and engage in discussions. The integration provides seamless
interaction with Stack Exchange content, offering content from varied
knowledge repositories.
A notebook example and test cases were included to demonstrate the
functionality and reliability of this integration.
- Add StackExchange as a tool.
- Add unit test for the StackExchange wrapper and tool.
- Add documentation for the StackExchange wrapper and tool.
If you have time, could you please review the code and provide any
feedback as necessary! My team is welcome to any suggestions.
---------
Co-authored-by: Yuval Kamani <yuvalkamani@gmail.com>
Co-authored-by: Aryan Thakur <aryanthakur@Aryans-MacBook-Pro.local>
Co-authored-by: Manas1818 <79381912+manas1818@users.noreply.github.com>
Co-authored-by: aryan-thakur <61063777+aryan-thakur@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** The class allows to only select between a few
predefined prompts from the paper. That is not ideal, since other use
cases might need a custom prompt. The changes made allow for this. To be
able to monitor those, I also added functionality to supply a custom
run_manager.
- **Issue:** no issue, but a new feature,
- **Dependencies:** none,
- **Tag maintainer:** @hwchase17,
- **Twitter handle:** @yvesloy
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Support providing whatever extra parameters you want
to the Mathpix PDF loader API request.
- **Issue:** #12773
- **Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Adds a tqdm progress bar to GooglePalmEmbeddings when
embedding a list.
- **Issue:** #13637
- **Dependencies:** TQDM as a main dependency (instead of extra)
Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
---------
Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This PR is fixing an attributeError: object endpoint has no attribute
"_public_match_client" when using gcp matching engine with private VPC
network.
@baskaryan, @eyurtsev, @hwchase17.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** As of OpenAI's Python package 1.0, the existing
DallEAPIWrapper does not work correctly, so the example in the LangChain
Documentation link below does not work either.
https://python.langchain.com/docs/integrations/tools/dalle_image_generator
Also, since OpenAI only supports DALL-E version 2 or version 3, I
modified the DallEAPIWrapper to support it.
- **Issue:** #13825
- **Twitter handle:** ggeutzzang
- **Description:** According to the document
https://cloud.baidu.com/doc/WENXINWORKSHOP/s/6lp69is2a, add ERNIE-Bot-8K
model support for ErnieBotChat.
- **Dependencies:** Before using the ERNIE-Bot-8K, you should have the
model's access authority.
Replace this entire comment with:
- **Description:** updates `create_llm_result` function within
`openai.py` to consider latest `params`,
- **Issue:** #8928
- **Dependencies:** -,
- **Tag maintainer:** -
- **Twitter handle:** [burkomr](https://twitter.com/burkomr)
<!-- If no one reviews your PR within a few days, please @-mention one
of @baskaryan, @eyurtsev, @hwchase17. -->
---------
Co-authored-by: Burak Ömür <burakomur@retorio.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Replace this entire comment with:
- **Description:** VertexAI models are now GA, moved away from using
preview ones from the SDK
- **Issue:** #13606
---------
Co-authored-by: Nuno Campos <nuno@boringbits.io>
**Description:**
Repair Wikipedia document loader `load_max_docs` and improve test
coverage.
**Issue:**
The Wikipedia document loader was not respecting the `load_max_docs`
paramater (not reported) and would always return a maximum of 10
documents. This is because the API wrapper (in `utilities/wikipedia.py`)
wasn't passing `top_k_results` to the underlying [Wikipedia
library](https://wikipedia.readthedocs.io/en/latest/code.html#module-wikipedia).
By default this library returns 10 results.
The default number of results for the document loader has been reduced
from 100 to 25. This is because loading 100 results takes a very long
time and is an inconvenient default. It should possibly be 10.
In addition, the documentation for the loader reported that there was a
hard limit (300) on the number of documents returned. In actuality 300
is the maximum Wikipedia query character length set by the API wrapper.
Tests have been added for the document loader (previously missing) and
to test the correct numbers of documents are being returned by each
class, both by default, and when overridden. Also repaired is the
`assert_docs` test which has been updated to correctly test for the
default metadata (which includes `source` in recent releases).
**Dependencies:**
nil
**Tag maintainer:**
@leo-gan
**Twitter handle:**
@queenvictoria
### **Description:**
Previously `python_repl` was a built-in tool, but now it has been moved
to `langchain_experimental`.
When I use `load_tools` I get an error:
```python
In [1]: from langchain.agents import load_tools
In [2]: load_tools(["python_repl"])
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[2], line 1
----> 1 load_tools(["python_repl"])
File ~/workspace/langchain/libs/langchain/langchain/agents/load_tools.py:530, in load_tools(tool_names, llm, callbacks, **kwargs)
528 tool_names.extend(requests_method_tools)
529 elif name in _BASE_TOOLS:
--> 530 tools.append(_BASE_TOOLS[name]())
531 elif name in _LLM_TOOLS:
532 if llm is None:
File ~/workspace/langchain/libs/langchain/langchain/agents/load_tools.py:84, in _get_python_repl()
83 def _get_python_repl() -> BaseTool:
---> 84 raise ImportError(
85 "This tool has been moved to langchain experiment. "
86 "This tool has access to a python REPL. "
87 "For best practices make sure to sandbox this tool. "
88 "Read https://github.com/langchain-ai/langchain/blob/master/SECURITY.md "
89 "To keep using this code as is, install langchain experimental and "
90 "update relevant imports replacing 'langchain' with 'langchain_experimental'"
91 )
ImportError: This tool has been moved to langchain experiment. This tool has access to a python REPL. For best practices make sure to sandbox this tool. Read https://github.com/langchain-ai/langchain/blob/master/SECURITY.md To keep using this code as is, install langchain experimental and update relevant imports replacing 'langchain' with 'langchain_experimental'
```
In this case, it will be very confusing. I think it is no longer a
built-in tool now, so it can be removed from `_BASE_TOOLS`
### **Issue:**
https://github.com/langchain-ai/langchain/issues/13858,
https://github.com/langchain-ai/langchain/issues/13859,
https://github.com/langchain-ai/langchain/issues/13856
### **Twitter handle:**
[lin_bob57617](https://twitter.com/lin_bob57617)
The `integrations/vectorstores/matchingengine.ipynb` example has the
"Google Vertex AI Vector Search" title. This place this Title in the
wrong order in the ToC (it is sorted by the file name).
- Renamed `integrations/vectorstores/matchingengine.ipynb` into
`integrations/vectorstores/google_vertex_ai_vector_search.ipynb`.
- Updated a correspondent comment in docstring
- Rerouted old URL to a new URL
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
It was :
`from langchain.schema.prompts import BasePromptTemplate`
but because of the breaking change in the ns, it is now
`from langchain.schema.prompt_template import BasePromptTemplate`
This bug prevents building the API Reference for the langchain_experimental
There are the following main changes in this PR:
1. Rewrite of the DocugamiLoader to not do any XML parsing of the DGML
format internally, and instead use the `dgml-utils` library we are
separately working on. This is a very lightweight dependency.
2. Added MMR search type as an option to multi-vector retriever, similar
to other retrievers. MMR is especially useful when using Docugami for
RAG since we deal with large sets of documents within which a few might
be duplicates and straight similarity based search doesn't give great
results in many cases.
We are @docugami on twitter, and I am @tjaffri
---------
Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
- **Description:** Adds a retriever implementation for [Knowledge Bases
for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/), a
new service announced at AWS re:Invent, shortly before this PR was
opened. This depends on the `bedrock-agent-runtime` service, which will
be included in a future version of `boto3` and of `botocore`. We will
open a follow-up PR documenting the minimum required versions of `boto3`
and `botocore` after that information is available.
- **Issue:** N/A
- **Dependencies:** `boto3>=1.33.2, botocore>=1.33.2`
- **Tag maintainer:** @baskaryan
- **Twitter handles:** `@pjain7` `@dead_letter_q`
This PR includes a documentation notebook under
`docs/docs/integrations/retrievers`, which I (@dlqqq) have verified
independently.
EDIT: `bedrock-agent-runtime` service is now included in
`boto3>=1.33.2`:
5cf793f493
---------
Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Addressing incorrect order being sent to callbacks / tracers, due to the
nature of threading
---------
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Add arg to omit streamed_output list, in cases where final_output is
enough this saves bandwidth
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
This PR rearranges the docstring for the `AstraDB` vector store class so
as to have all useful information in the _class_ docstring for ease of
reading.
(incidentally, due to an oversight, the docstring that was in the
constructor ended up buried below some lines of code, thereby
disappearing altogether from accessibility. Apologies.)
- **Description:** Updates to `AnthropicFunctions` to be compatible with
the OpenAI `function_call` functionality.
- **Issue:** The functionality to indicate `auto`, `none` and a forced
function_call was not completely implemented in the existing code.
- **Dependencies:** None
- **Tag maintainer:** @baskaryan , and any of the other maintainers if
needed.
- **Twitter handle:** None
I have specifically tested this functionality via AWS Bedrock with the
Claude-2 and Claude-Instant models.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** We are adding functionality to extract message
content from the `attributedBody` field of the database, in case the
content is not in the `text` field.
- **Issue:** Closes#13326 and #10680
- **Dependencies:** None.
- **Tag maintainer:** @eyurtsev, @hwchase17
---------
Co-authored-by: onotate <johnp.pham@mail.utoronto.ca>
- **Description:** Previously `MarkdownHeaderTextSplitter` did not
consider tilde-fenced code blocks
(https://spec.commonmark.org/0.30/#fenced-code-blocks). This PR fixes
that.
````md
# Bug caused by previous implementation:
~~~py
foo()
# This is a comment that would be considered header
bar()
~~~
````
- **Tag maintainer:** @baskaryan
Several bug fixes:
- emails: instead of `bcc` the `cc` is used.
- errors in the truncation descriptions
- no truncation of the `message_search`
Several updates:
- generalized UTC format
- truncation limit can be changed now in _call()
Fixes#13407.
This workaround consists in letting the RunnableLambda create its
self.afunc from its self.func when self.afunc is not provided; the
change has no dependency.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
```
---- chunk 1
{'actions': [AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})])],
'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]}
---- chunk 2
{'messages': [FunctionMessage(content="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”", name='Search')],
'steps': [AgentStep(action=AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]), observation="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”")]}
---- chunk 3
{'actions': [AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Vittoria Ceretti age"\n}'}})])],
'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Vittoria Ceretti age"\n}'}})]}
---- chunk 4
{'messages': [FunctionMessage(content='25 years', name='Search')],
'steps': [AgentStep(action=AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Vittoria Ceretti age"\n}'}})]), observation='25 years')]}
---- chunk 5
{'actions': [AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n "__arg1": "25^0.43"\n}'}})])],
'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n "__arg1": "25^0.43"\n}'}})]}
---- chunk 6
{'messages': [FunctionMessage(content='Answer: 3.991298452658078', name='Calculator')],
'steps': [AgentStep(action=AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n "__arg1": "25^0.43"\n}'}})]), observation='Answer: 3.991298452658078')]}
---- chunk 7
{'messages': [AIMessage(content="Leonardo DiCaprio's current girlfriend is the Italian model Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 power is approximately 3.99.")],
'output': "Leonardo DiCaprio's current girlfriend is the Italian model "
'Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 '
'power is approximately 3.99.'}
---- final
{'actions': [AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]),
AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Vittoria Ceretti age"\n}'}})]),
AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n "__arg1": "25^0.43"\n}'}})])],
'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}}),
FunctionMessage(content="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”", name='Search'),
AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Vittoria Ceretti age"\n}'}}),
FunctionMessage(content='25 years', name='Search'),
AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n "__arg1": "25^0.43"\n}'}}),
FunctionMessage(content='Answer: 3.991298452658078', name='Calculator'),
AIMessage(content="Leonardo DiCaprio's current girlfriend is the Italian model Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 power is approximately 3.99.")],
'output': "Leonardo DiCaprio's current girlfriend is the Italian model "
'Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 '
'power is approximately 3.99.',
'steps': [AgentStep(action=AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]), observation="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”"),
AgentStep(action=AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n "__arg1": "Vittoria Ceretti age"\n}'}})]), observation='25 years'),
AgentStep(action=AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n "__arg1": "25^0.43"\n}'}})]), observation='Answer: 3.991298452658078')]}
```
- **Description:** Existing model used for Prompt Injection is quite
outdated but we fine-tuned and open-source a new model based on the same
model deberta-v3-base from Microsoft -
[laiyer/deberta-v3-base-prompt-injection](https://huggingface.co/laiyer/deberta-v3-base-prompt-injection).
It supports more up-to-date injections and less prone to
false-positives.
- **Dependencies:** No
- **Tag maintainer:** -
- **Twitter handle:** @alex_yaremchuk
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** The experimental package needs to be compatible with
the usage of importing agents
For example, if i use `from langchain.agents import
create_pandas_dataframe_agent`, running the program will prompt the
following information:
```
Traceback (most recent call last):
File "/Users/dongwm/test/main.py", line 1, in <module>
from langchain.agents import create_pandas_dataframe_agent
File "/Users/dongwm/test/venv/lib/python3.11/site-packages/langchain/agents/__init__.py", line 87, in __getattr__
raise ImportError(
ImportError: create_pandas_dataframe_agent has been moved to langchain experimental. See https://github.com/langchain-ai/langchain/discussions/11680 for more information.
Please update your import statement from: `langchain.agents.create_pandas_dataframe_agent` to `langchain_experimental.agents.create_pandas_dataframe_agent`.
```
But when I changed to `from langchain_experimental.agents import
create_pandas_dataframe_agent`, it was actually wrong:
```python
Traceback (most recent call last):
File "/Users/dongwm/test/main.py", line 2, in <module>
from langchain_experimental.agents import create_pandas_dataframe_agent
ImportError: cannot import name 'create_pandas_dataframe_agent' from 'langchain_experimental.agents' (/Users/dongwm/test/venv/lib/python3.11/site-packages/langchain_experimental/agents/__init__.py)
```
I should use `from langchain_experimental.agents.agent_toolkits import
create_pandas_dataframe_agent`. In order to solve the problem and make
it compatible, I added additional import code to the
langchain_experimental package. Now it can be like this Used `from
langchain_experimental.agents import create_pandas_dataframe_agent`
- **Twitter handle:** [lin_bob57617](https://twitter.com/lin_bob57617)
- **Description:** Adds a tqdm progress bar to OllamaEmbeddings when
embedding a list.
- **Issue:** Related to #13637, but extended to Ollama.
- **Dependencies:** `tqdm` made a necessary dependency.
Thanks to @ugm2 for helping identify a common problem. Embeddings take a
very long time to finish on local machines, and require a progress bar
to help identify if one should even attempt the workload.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** Added a line to pass the tenant parameter to
add_data_object
- **Issue:** An extra line added from the fix for #9956
- **Dependencies:** n/a
- **Tag maintainer:** @baskaryan
Tested locally, works as expected with the line change.
---------
Co-authored-by: Simon Dai <simon6752@gmail.com>
Description: Some Elastic indexes do not return a 'metadata' field in
'_source'. However, prior to this PR, the code assumed there always is a
'metadata' field. This PR adds support for cases where the field is
missing by adding it manually.
Issue: #13869
**Description:**
This PR adds Databricks Vector Search as a new vector store in
LangChain.
- [x] Add `DatabricksVectorSearch` in `langchain/vectorstores/`
- [x] Unit tests
- [x] Add
[`databricks-vectorsearch`](https://pypi.org/project/databricks-vectorsearch/)
as a new optional dependency
We ran the following checks:
- `make format` passed ✅
- `make lint` failed but the failures were caused by other files
+ Files touched by this PR passed the linter ✅
- `make test` passed ✅
- `make coverage` failed but the failures were caused by other files.
Tests added by or related to this PR all passed
+ langchain/vectorstores/databricks_vector_search.py test coverage 94% ✅
- `make spell_check` passed ✅
The example notebook and updates to the [provider's documentation
page](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/providers/databricks.md)
will be added later in a separate PR.
**Dependencies:**
Optional dependency:
[`databricks-vectorsearch`](https://pypi.org/project/databricks-vectorsearch/)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This pull request addresses an issue found in the example code within
the docstring of `libs/core/langchain_core/runnables/passthrough.py`
The original code snippet caused a `NameError` due to the missing import
of `RunnableLambda`. The error was as follows:
```
12 return "completion"
13
---> 14 chain = RunnableLambda(fake_llm) | {
15 'original': RunnablePassthrough(), # Original LLM output
16 'parsed': lambda text: text[::-1] # Parsing logic
NameError: name 'RunnableLambda' is not defined
```
To resolve this, I have modified the example code to include the
necessary import statement for `RunnableLambda`. Additionally, I have
adjusted the indentation in the code snippet to ensure consistency and
readability.
The modified code now successfully defines and utilizes
`RunnableLambda`, ensuring that users referencing the docstring will
have a functional and clear example to follow.
There are no related GitHub issues for this particular change.
Modified Code:
```python
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.runnables import RunnableLambda
runnable = RunnableParallel(
origin=RunnablePassthrough(),
modified=lambda x: x+1
)
runnable.invoke(1) # {'origin': 1, 'modified': 2}
def fake_llm(prompt: str) -> str: # Fake LLM for the example
return "completion"
chain = RunnableLambda(fake_llm) | {
'original': RunnablePassthrough(), # Original LLM output
'parsed': lambda text: text[::-1] # Parsing logic
}
chain.invoke('hello') # {'original': 'completion', 'parsed': 'noitelpmoc'}
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Added a retriever for the Outline API to ask
questions on knowledge base
- **Issue:** resolves#11814
- **Dependencies:** None
- **Tag maintainer:** @baskaryan
- **Description:** Simple change, I just added title metadata to
GoogleDriveLoader for optional File Loaders
- **Dependencies:** no dependencies
- **Tag maintainer:** @hwchase17
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR provides idiomatic implementations for the exact-match and the
semantic LLM caches using Astra DB as backend through the database's
HTTP JSON API. These caches require the `astrapy` library as dependency.
Comes with integration tests and example usage in the `llm_cache.ipynb`
in the docs.
@baskaryan this is the Astra DB counterpart for the Cassandra classes
you merged some time ago, tagging you for your familiarity with the
topic. Thank you!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR adds a chat message history component that uses Astra DB for
persistence through the JSON API.
The `astrapy` package is required for this class to work.
I have added tests and a small notebook, and updated the relevant
references in the other docs pages.
(@rlancemartin this is the counterpart of the Cassandra equivalent class
you so helpfully reviewed back at the end of June)
Thank you!
- **Description:** This commit fixed the problem that Redis vector store
will change the value of a metadata from 0 to empty when saving the
document, which should be an un-intended behavior.
- **Issue:** N/A
- **Dependencies:** N/A
**Description:** Currently, if we pass in a ToolMessage back to the
chain, it crashes with error
`Got unsupported message type: `
This fixes it.
Tested locally
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** BaseStringMessagePromptTemplate.from_template was
passing the value of partial_variables into cls(...) via **kwargs,
rather than passing it to PromptTemplate.from_template. Which resulted
in those *partial_variables being* lost and becoming required
*input_variables*.
Co-authored-by: Josep Pon Farreny <josep.pon-farreny@siemens.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Fix some circular deps:
- move PromptValue into top level module bc both PromptTemplates and
OutputParsers import
- move tracer context vars to `tracers.context` and import them in
functions in `callbacks.manager`
- add core import tests
- **Description:** We need to update the Dockerfile for templates to
also copy your README.md. This is because poetry requires that a readme
exists if it is specified in the pyproject.toml
Changes:
- remove langchain_core/schema since no clear distinction b/n schema and
non-schema modules
- make every module that doesn't end in -y plural
- where easy have 1-2 classes per file
- no more than one level of nesting in directories
- only import from top level core modules in langchain
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** fix a bug that prevented as_retriever() in Vectara to
use the desired input arguments
- **Issue:** as_retriever did not pass the arguments properly
- **Tag maintainer:** @baskaryan
- **Twitter handle:** @ofermend
I encountered this during summarization with VertexAI. I was receiving
an INVALID_ARGUMENT error, as it was trying to send a list of about
17000 single characters.
The [count_tokens
method](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/language_models/_language_models.py#L658)
made available by Google takes in a list of prompts. It does not fail
for small texts, but it does for longer documents because the argument
list will be exceeding Googles allowed limit. Enforcing the list type
makes it work successfully.
This change will cast the input text to count to a list of that single
text so that the input format is always correct.
[Twitter](https://www.x.com/stijn_tratsaert)
- **Description:** ERNIE-Bot-Chat-4 Large Language Model adds the
ability of `Function Calling` by passing parameters through the
`functions` parameter in the request. To simplify function calling for
ERNIE-Bot-Chat-4, the `create_ernie_fn_chain()` function has been added.
The definition and usage of the `create_ernie_fn_chain()` function is
similar to that of the `create_openai_fn_chain()` function.
Examples as the follows:
```
import json
from langchain.chains.ernie_functions import (
create_ernie_fn_chain,
)
from langchain.chat_models import ErnieBotChat
from langchain.prompts import ChatPromptTemplate
def get_current_news(location: str) -> str:
"""Get the current news based on the location.'
Args:
location (str): The location to query.
Returs:
str: Current news based on the location.
"""
news_info = {
"location": location,
"news": [
"I have a Book.",
"It's a nice day, today."
]
}
return json.dumps(news_info)
def get_current_weather(location: str, unit: str="celsius") -> str:
"""Get the current weather in a given location
Args:
location (str): location of the weather.
unit (str): unit of the tempuature.
Returns:
str: weather in the given location.
"""
weather_info = {
"location": location,
"temperature": "27",
"unit": unit,
"forecast": ["sunny", "windy"],
}
return json.dumps(weather_info)
llm = ErnieBotChat(model_name="ERNIE-Bot-4")
prompt = ChatPromptTemplate.from_messages(
[
("human", "{query}"),
]
)
chain = create_ernie_fn_chain([get_current_weather, get_current_news], llm, prompt, verbose=True)
res = chain.run("北京今天的新闻是什么?")
print(res)
```
The running results of the above program are shown below:
```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 北京今天的新闻是什么?
> Finished chain.
{'name': 'get_current_news', 'thoughts': '用户想要知道北京今天的新闻。我可以使用get_current_news工具来获取这些信息。', 'arguments': {'location': '北京'}}
```
- **Description:** during search with DeepLake some people are facing
backwards compatibility issues, this PR fixes it by making search
accessible for the older datasets
---------
Co-authored-by: adolkhan <adilkhan.sarsen@alumni.nu.edu.kz>
- **Description:**
- Fixes a `key_prefix` bug where passing it in on
`Redis.from_existing(...)` did not work properly. Updates doc strings
accordingly.
- Updates Redis filter classes logic with best practices on typing,
string formatting, and handling "empty" filters.
- Fixes a bug that would prevent multiple tag filters from being applied
together in some scenarios.
- Added a whole new filter unit testing module. Also updated code
formatting for a number of modules that were failing the `make`
commands.
- **Issue:** N/A
- **Dependencies:** N/A
- **Tag maintainer:** @baskaryan
- **Twitter handle:** @tchutch94
In the `FORMAT_INSTRUCTIONS` template, 4 curly braces (escaping) are
used to get single curly brace after formatting:
```
"{{{ ... }}}}" -> format_instructions.format() -> "{{ ... }}" -> template.format() -> "{ ... }".
```
Tool's `args_schema` string contains single braces `{ ... }`, and is
also transformed to `{{{{ ... }}}}` form. But this is not really correct
since there is only one `format()` call:
```
"{{{{ ... }}}}" -> template.format() -> "{{ ... }}".
```
As a result we get double curly braces in the prompt:
````
Respond to the human as helpfully and accurately as possible. You have access to the following tools:
foo: Test tool FOO, args: {{'tool_input': {{'type': 'string'}}}} # <--- !!!
...
Provide only ONE action per $JSON_BLOB, as shown:
```
{
"action": $TOOL_NAME,
"action_input": $INPUT
}
```
````
This PR fixes curly braces escaping in the `args_schema` to have single
braces in the final prompt:
````
Respond to the human as helpfully and accurately as possible. You have access to the following tools:
foo: Test tool FOO, args: {'tool_input': {'type': 'string'}} # <--- !!!
...
Provide only ONE action per $JSON_BLOB, as shown:
```
{
"action": $TOOL_NAME,
"action_input": $INPUT
}
```
````
---------
Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
Hi 👋 We are working with Llama2 on Bedrock, and would like to add it to
Langchain. We saw a [pull
request](https://github.com/langchain-ai/langchain/pull/13322) to add it
to the `llm.Bedrock` class, but since it concerns a chat model, we would
like to add it to `BedrockChat` as well.
- **Description:** Add support for Llama2 to `BedrockChat` in
`chat_models`
- **Issue:** the issue # it fixes (if applicable)
[#13316](https://github.com/langchain-ai/langchain/issues/13316)
- **Dependencies:** any dependencies required for this change `None`
- **Tag maintainer:** /
- **Twitter handle:** `@SimonBockaert @WouterDurnez`
---------
Co-authored-by: wouter.durnez <wouter.durnez@showpad.com>
Co-authored-by: Simon Bockaert <simon.bockaert@showpad.com>
- **Description:** This change adds an agent to the Azure Cognitive
Services toolkit for identifying healthcare entities
- **Dependencies:** azure-ai-textanalytics (Optional)
---------
Co-authored-by: James Beck <James.Beck@sa.gov.au>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Hi!
This short PR aims at:
* Fixing `OpenAIEmbeddings`' check on `chunk_size` when used with Azure
OpenAI (thus with openai < 1.0). Azure OpenAI embeddings support at most
16 chunks per batch, I believe we are supposed to take the min between
the passed value/default value and 16, not the max - which, I suppose,
was introduced by accident while refactoring the previous version of
this check from this other PR of mine: #10707
* Porting this fix to the newest class (`AzureOpenAIEmbeddings`) for
openai >= 1.0
This fixes#13539 (closed but the issue persists).
@baskaryan @hwchase17
- **Description:** There are several mistakes in the sample code in the
doc-string of `DashVector` class, and this pull request aims to correct
them.
The correction code has been tested against latest version (at the time
of creation of this pull request) of: `langchain==0.0.336`
`dashvector==1.0.6` .
- **Issue:** No issue is created for this.
- **Dependencies:** No dependency is required for this change,
<!-- - **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below), -->
- **Twitter handle:** `zeyanglin`
<!-- Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** AstraDB is going to deprecate the `$similarity`
projection property in favor of the ´includeSimilarity´ option flag. I
moved all the queries to the new format.
- **Tag maintainer:** @hemidactylus
- **Twitter handle:** nicoloboschi
Added a `search_kwargs` field to BingSearchAPIWrapper in
`bing_search.py,` enabling users to include extra keyword arguments in
Bing search queries. This update, like specifying language preferences,
adds more customization to searches. The `search_kwargs` seamlessly
merge with standard parameters in `_bing_search_results` method.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Fix Astra integration tests that are failing. The
`delete` always return True as the deletion is successful if no errors
are thrown. I aligned the test to verify this behaviour
- **Tag maintainer:** @hemidactylus
- **Twitter handle:** nicoloboschi
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
The issue was accuring because of `openai` update in Completions. its
not accepting `api_key` and 'api_base' args.
The fix is we check for the openai version and if ats v1 then remove
these keys from args before passing them to `Compilation.create(...)`
when sending from `VLLMOpenAI`
Fixed: #13507
@eyu
@efriis
@hwchase17
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** In this pull request, we address an issue related to
assigning a schema to the SQLDatabase class when utilizing an Oracle
database. The current implementation encounters a bug where, upon
attempting to execute a query, the alter session parse is not
appropriately defined for Oracle, leading to an error,
- **Issue:** #7928,
- **Dependencies:** No dependencies,
- **Tag maintainer:** @baskaryan,
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** This change allows for the `MWDumpLoader` to load all
namespaces including custom by default instead of only loading the
[default
namespaces](https://www.mediawiki.org/wiki/Help:Namespaces#Localisation).
- **Tag maintainer:** @hwchase17
**Description:**
This commit adds embedchain retriever along with tests and docs.
Embedchain is a RAG framework to create data pipelines.
**Twitter handle:**
- [Taranjeet's twitter](https://twitter.com/taranjeetio) and
[Embedchain's twitter](https://twitter.com/embedchain)
**Reviewer**
@hwchase17
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
Enhance the functionality of YoutubeLoader to enable the translation of
available transcripts by refining the existing logic.
**Issue:**
Encountering a problem with YoutubeLoader (#13523) where the translation
feature is not functioning as expected.
Tag maintainers/contributors who might be interested:
@eyurtsev
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
BUG: langchain.agents.openai_assistant has a reference as
`from langchain_experimental.openai_assistant.base import
OpenAIAssistantRunnable`
should be
`from langchain.agents.openai_assistant.base import
OpenAIAssistantRunnable`
This prevents building of the API Reference docs
## Update 2023-09-08
This PR now supports further models in addition to Lllama-2 chat models.
See [this comment](#issuecomment-1668988543) for further details. The
title of this PR has been updated accordingly.
## Original PR description
This PR adds a generic `Llama2Chat` model, a wrapper for LLMs able to
serve Llama-2 chat models (like `LlamaCPP`,
`HuggingFaceTextGenInference`, ...). It implements `BaseChatModel`,
converts a list of chat messages into the [required Llama-2 chat prompt
format](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) and
forwards the formatted prompt as `str` to the wrapped `LLM`. Usage
example:
```python
# uses a locally hosted Llama2 chat model
llm = HuggingFaceTextGenInference(
inference_server_url="http://127.0.0.1:8080/",
max_new_tokens=512,
top_k=50,
temperature=0.1,
repetition_penalty=1.03,
)
# Wrap llm to support Llama2 chat prompt format.
# Resulting model is a chat model
model = Llama2Chat(llm=llm)
messages = [
SystemMessage(content="You are a helpful assistant."),
MessagesPlaceholder(variable_name="chat_history"),
HumanMessagePromptTemplate.from_template("{text}"),
]
prompt = ChatPromptTemplate.from_messages(messages)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = LLMChain(llm=model, prompt=prompt, memory=memory)
# use chat model in a conversation
# ...
```
Also part of this PR are tests and a demo notebook.
- Tag maintainer: @hwchase17
- Twitter handle: `@mrt1nz`
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Added a method `fetch_valid_documents` to
`WebResearchRetriever` class that will test the connection for every url
in `new_urls` and remove those that raise a `ConnectionError`.
- **Issue:** [Previous
PR](https://github.com/langchain-ai/langchain/pull/13353),
- **Dependencies:** None,
- **Tag maintainer:** @efriis
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
## Description
This PR adds an option to allow unsigned requests to the Neptune
database when using the `NeptuneGraph` class.
```python
graph = NeptuneGraph(
host='<my-cluster>',
port=8182,
sign=False
)
```
Also, added is an option in the `NeptuneOpenCypherQAChain` to provide
additional domain instructions to the graph query generation prompt.
This will be injected in the prompt as-is, so you should include any
provider specific tags, for example `<instructions>` or `<INSTR>`.
```python
chain = NeptuneOpenCypherQAChain.from_llm(
llm=llm,
graph=graph,
extra_instructions="""
Follow these instructions to build the query:
1. Countries contain airports, not the other way around
2. Use the airport code for identifying airports
"""
)
```
**Description**
MongoDB drivers are used in various flavors and languages. Making sure
we exercise our due diligence in identifying the "origin" of the library
calls makes it best to understand how our Atlas servers get accessed.
**Description/Issue:**
When OpenAI calls a function with no args, the args are `""` rather than
`"{}"`. Then `json.loads("")` blows up. This PR handles it correctly.
**Dependencies:** None
This PR brings a few minor improvements to the docs, namely class/method
docstrings and the demo notebook.
- A note on how to control concurrency levels to tune performance in
bulk inserts, both in the class docstring and the demo notebook;
- Slightly increased concurrency defaults after careful experimentation
(still on the conservative side even for clients running on
less-than-typical network/hardware specs)
- renamed the DB token variable to the standardized
`ASTRA_DB_APPLICATION_TOKEN` name (used elsewhere, e.g. in the Astra DB
docs)
- added a note and a reference (add_text docstring, demo notebook) on
allowed metadata field names.
Thank you!
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
fix#13356
Add supports following properties for metadata to NotionDBLoader.
- `checkbox`
- `email`
- `number`
- `select`
There are no relevant tests for this code to be updated.
- **Description:** Adds `limit_to_domains` param to the APIChain based
tools (open_meteo, TMDB, podcast_docs, and news_api)
- **Issue:** I didn't open an issue, but after upgrading to 0.0.328
using these tools would throw an error.
- **Dependencies:** N/A
- **Tag maintainer:** @baskaryan
**Note**: I included the trailing / simply because the docs here did
fc886cc303/docs/docs/use_cases/apis.ipynb (L246)
, but I checked the code and it is using `urlparse`. SoI followed the
docs since it comes down to stylee.
Hi,
this PR adds support for OpenAI API v1 for Azure OpenAI completion API.
@baskaryan @hwchase17
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Bumps [pyarrow](https://github.com/apache/arrow) from 13.0.0 to 14.0.1.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ba53748361"><code>ba53748</code></a>
MINOR: [Release] Update versions for 14.0.1</li>
<li><a
href="529f3768fa"><code>529f376</code></a>
MINOR: [Release] Update .deb/.rpm changelogs for 14.0.1</li>
<li><a
href="b84bbcac64"><code>b84bbca</code></a>
MINOR: [Release] Update CHANGELOG.md for 14.0.1</li>
<li><a
href="f141709763"><code>f141709</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/38607">GH-38607</a>:
[Python] Disable PyExtensionType autoload (<a
href="https://redirect.github.com/apache/arrow/issues/38608">#38608</a>)</li>
<li><a
href="5a37e74198"><code>5a37e74</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/38431">GH-38431</a>:
[Python][CI] Update fs.type_name checks for s3fs tests (<a
href="https://redirect.github.com/apache/arrow/issues/38455">#38455</a>)</li>
<li><a
href="2dcee3f82c"><code>2dcee3f</code></a>
MINOR: [Release] Update versions for 14.0.0</li>
<li><a
href="297428cbf2"><code>297428c</code></a>
MINOR: [Release] Update .deb/.rpm changelogs for 14.0.0</li>
<li><a
href="3e9734f883"><code>3e9734f</code></a>
MINOR: [Release] Update CHANGELOG.md for 14.0.0</li>
<li><a
href="9f90995c8c"><code>9f90995</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/38332">GH-38332</a>:
[CI][Release] Resolve symlinks in RAT lint (<a
href="https://redirect.github.com/apache/arrow/issues/38337">#38337</a>)</li>
<li><a
href="bd61239a32"><code>bd61239</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/35531">GH-35531</a>:
[Python] C Data Interface PyCapsule Protocol (<a
href="https://redirect.github.com/apache/arrow/issues/37797">#37797</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/apache/arrow/compare/go/v13.0.0...go/v14.0.1">compare
view</a></li>
</ul>
</details>
<br />
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pyarrow&package-manager=pip&previous-version=13.0.0&new-version=14.0.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/langchain-ai/langchain/network/alerts).
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
Hey @rlancemartin, @eyurtsev ,
I did some minimal changes to the `ElasticVectorSearch` client so that
it plays better with existing ES indices.
Main changes are as follows:
1. You can pass the dense vector field name into `_default_script_query`
2. You can pass a custom script query implementation and the respective
parameters to `similarity_search_with_score`
3. You can pass functions for building page content and metadata for the
resulting `Document`
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
4. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @dev2049
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @dev2049
- Memory: @hwchase17
- Agents / Tools / Toolkits: @vowelparrot
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
hi!
This is pretty straight-forward: The sdist package does not contain the
license file (which is needed by e.g. conda) because the package is
built from the subdir and can't see the license.
I _copied_ the license but since I'm unfamiliar with the projects
direction, I'm not sure that's correct.
thanks!
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Fixes: #8207
Description:
Pinecone returns scores (not distances) with cosine similarity. The
values according to the docs are [-1, 1], although I could never
reproduce negative values.
This PR ensures that the score returned from Pinecone is preserved,
rather than inverted, so the most relevant documents can be filtered (eg
when using similarity thresholds)
I'll leave this as a draft PR as I couldn't run the tests (my pinecone
account might not be enough - some errors were being thrown around
namespaces) so hopefully someone who _can_ will pick this up.
Maintainers:
@rlancemartin, @eyurtsev
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Fixed a serialization issue in the add_texts method
of the Matching Engine Vector Store caused by a typo, leading to an
attempt to serialize the json module itself.
- **Issue:** #12154
- **Dependencies:** ./.
- **Tag maintainer:**
Due to the possibility of external inputs including UUIDs, there may be
additional values in **kwargs, while Weaviate's `__init__` method does
not support passing extra **kwarg parameters.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
When calling max_marginal_relevance_search from PGVector the filter
param is not carried over to max_marginal_relevance_search_by_vector
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Uses `endpoint_url` if provided with a boto3 session.
When running dynamodb locally, credentials are required even if invalid.
With this change, it will be possible to pass a boto3 session with
credentials and specify an endpoint_url
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Replace this entire comment with:
- **Description:** Add MyScaleWithoutJSON which allows user to wrap
columns into Document's Metadata
- **Tag maintainer:** @baskaryan
**Description**
Bumps the Momento dependency to the latest version and refactors the
usage of `SearchHit` in the Momento Vector Index (MVI) vector store
integration. This change is a one liner where we use the preferred
attribute `score` to read the query-document similarity instead of
`distance`. The latest versions of Momento clients will use this
attribute going forward.
**Dependencies**
Updated the Momento dependency to latest version.
**Tests**
💚 I re-ran the existing MVI integration tests
(`tests/integration_tests/vectorstores/test_momento_vector_index.py`)
and they pass.
**Review**
cc @baskaryan @eyurtsev
**Description**
the ollama api now supports passing system prompt and template directly
instead of modifying the model file , but the ollama integration in
langchain did not have this change updated . The update just adds these
two parameters to it ( there are 2 more parameters that are pending to
be updated, I was not sure about their utility wrt to langchain )
Refer :
8713ac23a8
**Issue** : None Applicable
**Dependencies** : None Changed
**Twitter handle** : https://twitter.com/violetto96
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
`_get_kwarg_value` function is useless, one can rely on python builtin
functionalities to do the exact same thing.
- **Description:** Removed `_get_kwarg_value`. Helps with code
readability.
- **Issue:** the issue # it fixes (if applicable),
- **Twitter handle:** @Guillem_96
Improve CSV reader which can't call .strip() on NoneType if there are
less cells in the row compared to the header
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:**
I have a CSV file as followed
```
headerA,headerB,headerC
v1A,v1B,v1C,
v2A,v2B
v3A,v3B,v3C
```
In this case, row 2 is missing a value, which results in reading a None
type. The strip() method can not be called on None, hence raising. In
this PR I am making the change to only call strip if the value if not
None.
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Added a Docusaurus Loader
Issue: #6353
I had to implement this for working with the Ionic documentation, and
wanted to open this up as a draft to get some guidance on building this
out further. I wasn't sure if having it be a light extension of the
SitemapLoader was in the spirit of a proper feature for the library --
but I'm grateful for the opportunities Langchain has given me and I'd
love to build this out properly for the sake of the community.
Any feedback welcome!
- **Description:** `AzureMLChatOnlineEndpoint` object from
langchain/chat_models/azureml_endpoint.py safe to print
without having any secrets included in raw format in the string
representation.
- **Issue:** #12165,
- **Tag maintainer:** @eyurtsev
---------
Co-authored-by: Faysal Bougamale <faysal.bougamale@horiba.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Adding documentation to the runnable.
- Documentation is not organized in the best way for the runnable; i.e.,
in
terms of LCEL vs. other standard methods, will follow up with more
edits.
**Description:** Removing the single quote wrapper around the table
names in the SQL agent toolkit.py file as it misleads the LLM into
querying against tables with single quotes around their names.
**Issue:** #7457
**Dependencies:** None
**Tag maintainer:** @hwchase17
**Twitter handle:** None
- Implement config_specs to include session_id
- Remove Runnable method and update notebook
- Add more details to notebook, eg. show input schema and config schema
before and after adding message history
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
This commit fixes the issue that langchain.llms OpenAI completion
stopped working since the V1 openai client update.
Replace this entire comment with:
- **Description:** This PR fixes the issue [AttributeError: module
'openai' has no attribute
'Completion'](https://github.com/langchain-ai/langchain/issues/12967)
similar to
8e0cb2eb84
and https://github.com/langchain-ai/langchain/pull/12969,
- **Issue:** https://github.com/langchain-ai/langchain/issues/12967,
- **Dependencies:** `openai` v1.x.x client,
- **Tag maintainer:** @baskaryan,
- **Twitter handle:** @dosuken123
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This adds the response message as a document to the rag retriever so
users can choose to use this. Also drops document limit.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
We need to centralize the API we use to get the project name for our
tracers. This PR makes it so we always get this from a shared function
in the langsmith sdk.
## Dependencies
Upgraded langsmith from 0.52 to 0.62 to include the new API
`get_tracer_project`
- **Description:**
Recently Chroma rolled out a breaking change on the way we handle
embedding functions, in order to support multi-modal collections.
This broke the way LangChain's `Chroma` objects get created, because we
were passing the EF down into the Chroma collection:
https://docs.trychroma.com/migration#migration-to-0416---november-7-2023
However, internally, we are never actually using embeddings on the
chroma collection - LangChain's `Chroma` object calls it instead. Thus
we just don't pass an `embedding_function` to Chroma itself, which fixes
the issue.
- **Description:** The issue was not listing the proper import error for
amazon textract loader.
- **Issue:** Time wasted trying to figure out what to install...
(langchain docs don't list the dependency either)
- **Dependencies:** N/A
- **Tag maintainer:** @sbusso
- **Twitter handle:** @h9ste
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
# Astra DB Vector store integration
- **Description:** This PR adds a `VectorStore` implementation for
DataStax Astra DB using its HTTP API
- **Issue:** (no related issue)
- **Dependencies:** A new required dependency is `astrapy` (`>=0.5.3`)
which was added to pyptoject.toml, optional, as per guidelines
- **Tag maintainer:** I recently mentioned to @baskaryan this
integration was coming
- **Twitter handle:** `@rsprrs` if you want to mention me
This PR introduces the `AstraDB` vector store class, extensive
integration test coverage, a reworking of the documentation which
conflates Cassandra and Astra DB on a single "provider" page and a new,
completely reworked vector-store example notebook (common to the
Cassandra store, since parts of the flow is shared by the two APIs). I
also took care in ensuring docs (and redirects therein) are behaving
correctly.
All style, linting, typechecks and tests pass as far as the `AstraDB`
integration is concerned.
I could build the documentation and check it all right (but ran into
trouble with the `api_docs_build` makefile target which I could not
verify: `Error: Unable to import module
'plan_and_execute.agent_executor' with error: No module named
'langchain_experimental'` was the first of many similar errors)
Thank you for a review!
Stefano
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Cohere released the new embedding API (Embed v3:
https://txt.cohere.com/introducing-embed-v3/) that treats document and
query embeddings differently. This PR updated the `CohereEmbeddings` to
use them appropriately. It also works with the old models.
Description: This PR masks API key secrets for the Nebula model from
Symbl.ai
Issue: #12165
Maintainer: @eyurtsev
---------
Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>
* ChatAnyscale was missing coercion to SecretStr for anyscale api key
* The model inherits from ChatOpenAI so it should not force the openai
api key to be secret str until openai model has the same changes
https://github.com/langchain-ai/langchain/issues/12841
Qdrant was incorrectly calculating the cosine similarity and returning
`0.0` for the best match, instead of `1.0`. Internally Qdrant returns a
cosine score from `-1.0` (worst match) to `1.0` (best match), and the
current formula reflects it.
Possibility to pass on_artifacts to a conversation. It can be then
achieved by adding this way:
```python
result = agent.run(
input=message.text,
metadata={
"on_artifact": CALLBACK_FUNCTION
},
)
```
Calls uvicorn directly from cli:
Reload works if you define app by import string instead of object.
(was doing subprocess in order to get reloading)
Version bump to 0.0.14
Remove the need for [serve] for simplicity.
Readmes are updated in #12847 to avoid cluttering this PR
Previously we treated trace_on_chain_group as a command to always start
tracing. This is unintuitive (makes the function do 2 things), and makes
it harder to toggle tracing
When you use a MultiQuery it might be useful to use the original query
as well as the newly generated ones to maximise the changes to retriever
the correct document. I haven't created an issue, it seems a very small
and easy thing.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Correct number of elements in config list in
`batch()` and `abatch()` of `BaseLLM` in case `max_concurrency` is not
None.
- **Issue:** #12643
- **Twitter handle:** @akionux
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Zep now has the ability to search over chat history summaries. This PR
adds support for doing so. More here: https://blog.getzep.com/zep-v0-17/
@baskaryan @eyurtsev
…s present
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
### Enabling `device_map` in HuggingFacePipeline
For multi-gpu settings with large models, the
[accelerate](https://huggingface.co/docs/accelerate/usage_guides/big_modeling#using--accelerate)
library provides the `device_map` parameter to automatically distribute
the model across GPUs / disk.
The [Transformers
pipeline](3520e37e86/src/transformers/pipelines/__init__.py (L543))
enables users to specify `device` (or) `device_map`, and handles cases
(with warnings) when both are specified.
However, Langchain's HuggingFacePipeline only supports specifying
`device` when calling transformers which limits large models and
multi-gpu use-cases.
Additionally, the [default
value](8bd3ce59cd/libs/langchain/langchain/llms/huggingface_pipeline.py (L72))
of `device` is initialized to `-1` , which is incompatible with the
transformers pipeline when `device_map` is specified.
This PR addresses the addition of `device_map` as a parameter , and
solves the incompatibility of `device = -1` when `device_map` is also
specified.
An additional test has been added for this feature.
Additionally, some existing tests no longer work since
1. `max_new_tokens` has to be specified under `pipeline_kwargs` and not
`model_kwargs`
2. The GPT2 tokenizer raises a `ValueError: Pipeline with tokenizer
without pad_token cannot do batching`, since the `tokenizer.pad_token`
is `None` ([related
issue](https://github.com/huggingface/transformers/issues/19853) on the
transformers repo).
This PR handles fixing these tests as well.
Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>
[The python
spec](https://docs.python.org/3/reference/datamodel.html#object.__getattr__)
requires that `__getattr__` throw `AttributeError` for missing
attributes but there are several places throwing `ImportError` in the
current code base. This causes a specific problem with `hasattr` since
it calls `__getattr__` then looks only for `AttributeError` exceptions.
At present, calling `hasattr` on any of these modules will raise an
unexpected exception that most code will not handle as `hasattr`
throwing exceptions is not expected.
In our case this is triggered by an exception tracker (Airbrake) that
attempts to collect the version of all installed modules with code that
looks like: `if hasattr(mod, "__version__"):`. With `HEAD` this is
causing our exception tracker to fail on all exceptions.
I only changed instances of unknown attributes raising `ImportError` and
left instances of known attributes raising `ImportError`. It feels a
little weird but doesn't seem to break anything.
- **Description:** Use all Google search results data in SerpApi.com
wrapper instead of the first one only
- **Tag maintainer:** @hwchase17
_P.S. `libs/langchain/tests/integration_tests/utilities/test_serpapi.py`
are not executed during the `make test`._
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
It was passing in message instead of generation
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
* Restrict the chain to specific domains by default
* This is a breaking change, but it will fail loudly upon object
instantiation -- so there should be no silent errors for users
* Resolves CVE-2023-32786
* This is an opt-in feature, so users should be aware of risks if using
jinja2.
* Regardless we'll add sandboxing by default to jinja2 templates -- this
sandboxing is a best effort basis.
* Best strategy is still to make sure that jinja2 templates are only
loaded from trusted sources.
**Description:** Update `langchain.document_loaders.pdf.PyPDFLoader` to
store url in metadata (instead of a temporary file path) if user
provides a web path to a pdf
- **Issue:** Related to #7034; the reporter on that issue submitted a PR
updating `PyMuPDFParser` for this behavior, but it has unresolved merge
issues as of 20 Oct 2023 #7077
- In addition to `PyPDFLoader` and `PyMuPDFParser`, these other classes
in `langchain.document_loaders.pdf` exhibit similar behavior and could
benefit from an update: `PyPDFium2Loader`, `PDFMinerLoader`,
`PDFMinerPDFasHTMLLoader`, `PDFPlumberLoader` (I'm happy to contribute
to some/all of that, including assisting with `PyMuPDFParser`, if my
work is agreeable)
- The root cause is that the underlying pdf parser classes, e.g.
`langchain.document_loaders.parsers.pdf.PyPDFParser`, never receive
information about the url; the parsers receive a
`langchain.document_loaders.blob_loaders.blob`, which contains the pdf
contents and local file path, but not the url
- This update passes the web path directly to the parser since it's
minimally invasive and doesn't require further changes to maintain
existing behavior for local files... bigger picture, I'd consider
extending `blob` so that extra information like this can be
communicated, but that has much bigger implications on the codebase
which I think warrants maintainer input
- **Dependencies:** None
```python
# old behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': '/var/folders/w2/zx77z1cs01s1thx5dhshkd58h3jtrv/T/tmpfgrorsi5/tmp.pdf', 'page': 0}
# new behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}
```
- **Description:** #12273 's suggestion PR
Like other PDFLoader, loading pdf per each page and giving page
metadata.
- **Issue:** #12273
- **Twitter handle:** @blue0_0hope
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This will allow you create the schema beforehand. The check was failing
and preventing importing into existing classes.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** implement [quip](https://quip.com) loader
- **Issue:** https://github.com/langchain-ai/langchain/issues/10352
- **Dependencies:** No
- pass make format, make lint, make test
---------
Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
latest release broken, this fixes it
---------
Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Prior to this PR, `ruff` was used only for linting and not for
formatting, despite the names of the commands. This PR makes it be used
for both linting code and autoformatting it.
This input key was missed in the last update PR:
https://github.com/langchain-ai/langchain/pull/7391
The input/output formats are intended to be like this:
```
{"inputs": [<prompt>]}
{"outputs": [<output_text>]}
```
## Description
This PR adds support for
[lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to
LangChain.
![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp)
The library is similar to jsonformer / RELLM which are supported in
Langchain, but has several advantages such as
- Batching and Beam search support
- More complete JSON Schema support
- LLM has control over whitespace, improving quality
- Better runtime performance due to only calling the LLM's generate()
function once per generate() call.
The integration is loosely based on the jsonformer integration in terms
of project structure.
## Dependencies
No compile-time dependency was added, but if `lm-format-enforcer` is not
installed, a runtime error will occur if it is trying to be used.
## Tests
Due to the integration modifying the internal parameters of the
underlying huggingface transformer LLM, it is not possible to test
without building a real LM, which requires internet access. So, similar
to the jsonformer and RELLM integrations, the testing is via the
notebook.
## Twitter Handle
[@noamgat](https://twitter.com/noamgat)
Looking forward to hearing feedback!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Best to review one commit at a time, since two of the commits are 100%
autogenerated changes from running `ruff format`:
- Install and use `ruff format` instead of black for code formatting.
- Output of `ruff format .` in the `langchain` package.
- Use `ruff format` in experimental package.
- Format changes in experimental package by `ruff format`.
- Manual formatting fixes to make `ruff .` pass.
I always take 20-30 seconds to re-discover where the
`convert_to_openai_function` wrapper lives in our codebase. Chat
langchain [has no
clue](https://smith.langchain.com/public/3989d687-18c7-4108-958e-96e88803da86/r)
what to do either. There's the older `create_openai_fn_chain` , but we
haven't been recommending it in LCEL. The example we show in the
[cookbook](https://python.langchain.com/docs/expression_language/how_to/binding#attaching-openai-functions)
is really verbose.
General function calling should be as simple as possible to do, so this
seems a bit more ergonomic to me (feel free to disagree). Another option
would be to directly coerce directly in the class's init (or when
calling invoke), if provided. I'm not 100% set against that. That
approach may be too easy but not simple. This PR feels like a decent
compromise between simple and easy.
```
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field
class Category(str, Enum):
"""The category of the issue."""
bug = "bug"
nit = "nit"
improvement = "improvement"
other = "other"
class IssueClassification(BaseModel):
"""Classify an issue."""
category: Category
other_description: Optional[str] = Field(
description="If classified as 'other', the suggested other category"
)
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI().bind_functions([IssueClassification])
llm.invoke("This PR adds a convenience wrapper to the bind argument")
# AIMessage(content='', additional_kwargs={'function_call': {'name': 'IssueClassification', 'arguments': '{\n "category": "improvement"\n}'}})
```
- Prefer lambda type annotations over inferred dict schema
- For sequences that start with RunnableAssign infer seq input type as
"input type of 2nd item in sequence - output type of runnable assign"
Replace this entire comment with:
-Add MultiOn close function and update key value and add async
functionality
- solved the key value TabId not found.. (updated to use latest key
value)
@hwchase17
- **Description:** This pull request removes secrets present in raw
format,
- **Issue:** Fireworks api key was exposed when printing out the
langchain object
[#12165](https://github.com/langchain-ai/langchain/issues/12165)
- **Maintainer:** @eyurtsev
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Textract PDF Loader generating linearized output,
meaning it will replicate the structure of the source document as close
as possible based on the features passed into the call (e. g. LAYOUT,
FORMS, TABLES). With LAYOUT reading order for multi-column documents or
identification of lists and figures is supported and with TABLES it will
generate the table structure as well. FORMS will indicate "key: value"
with columms.
- **Issue:** the issue fixes#12068
- **Dependencies:** amazon-textract-textractor is added, which provides
the linearization
- **Tag maintainer:** @3coins
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
can get the correct token count instead of using gpt-2 model
**Description:**
Implement get_num_tokens within VertexLLM to use google's count_tokens
function.
(https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count).
So we don't need to download gpt-2 model from huggingface, also when we
do the mapreduce chain we can get correct token count.
**Tag maintainer:**
@lkuligin
**Twitter handle:**
My twitter: @abehsu1992626
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Following this tutoral about using OpenAI Embeddings with FAISS
https://python.langchain.com/docs/integrations/vectorstores/faiss
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
from langchain.document_loaders import TextLoader
loader = TextLoader("../../../extras/modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
```
This works fine
```python
db = FAISS.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
```
But the async version is not
```python
db = await FAISS.afrom_documents(docs, embeddings) # NotImplementedError
query = "What did the president say about Ketanji Brown Jackson"
docs = await db.asimilarity_search(query) # this will use await asyncio.get_event_loop().run_in_executor under the hood and will not call OpenAIEmbeddings.aembed_query but call OpenAIEmbeddings.embed_query
```
So this PR add async/await supports for FAISS
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- Description: adding support to Activeloop's DeepMemory feature that
boosts recall up to 25%. Added Jupyter notebook showcasing the feature
and also made index params explicit.
- Twitter handle: will really appreciate if we could announce this on
twitter.
---------
Co-authored-by: adolkhan <adilkhan.sarsen@alumni.nu.edu.kz>
Hey, we're looking to invest more in adding cohere integrations to
langchain so would love to get more of an idea for how it's used.
Hopefully this pr is acceptable. This week I'm also going to be looking
into adding our new [retrieval augmented generation
product](https://txt.cohere.com/chat-with-rag/) to langchain.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
## **Description:**
When building our own readthedocs.io scraper, we noticed a couple
interesting things:
1. Text lines with a lot of nested <span> tags would give unclean text
with a bunch of newlines. For example, for [Langchain's
documentation](https://api.python.langchain.com/en/latest/document_loaders/langchain.document_loaders.readthedocs.ReadTheDocsLoader.html#langchain.document_loaders.readthedocs.ReadTheDocsLoader),
a single line is represented in a complicated nested HTML structure, and
the naive `soup.get_text()` call currently being made will create a
newline for each nested HTML element. Therefore, the document loader
would give a messy, newline-separated blob of text. This would be true
in a lot of cases.
<img width="945" alt="Screenshot 2023-10-26 at 6 15 39 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/eca85d1f-d2bf-4487-a18a-e1e732fadf19">
<img width="1031" alt="Screenshot 2023-10-26 at 6 16 00 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/035938a0-9892-4f6a-83cd-0d7b409b00a3">
Additionally, content from iframes, code from scripts, css from styles,
etc. will be gotten if it's a subclass of the selector (which happens
more often than you'd think). For example, [this
page](https://pydeck.gl/gallery/contour_layer.html#) will scrape 1.5
million characters of content that looks like this:
<img width="1372" alt="Screenshot 2023-10-26 at 6 32 55 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/dbd89e39-9478-4a18-9e84-f0eb91954eac">
Therefore, I wrote a recursive _get_clean_text(soup) class function that
1. skips all irrelevant elements, and 2. only adds newlines when
necessary.
2. Index pages (like [this
one](https://api.python.langchain.com/en/latest/api_reference.html))
would be loaded, chunked, and eventually embedded. This is really bad
not just because the user will be embedding irrelevant information - but
because index pages are very likely to show up in retrieved content,
making retrieval less effective (in our tests). Therefore, I added a
bool parameter `exclude_index_pages` defaulted to False (which is the
current behavior — although I'd petition to default this to True) that
will skip all pages where links take up 50%+ of the page. Through manual
testing, this seems to be the best threshold.
## Other Information:
- **Issue:** n/a
- **Dependencies:** n/a
- **Tag maintainer:** n/a
- **Twitter handle:** @andrewthezhou
---------
Co-authored-by: Andrew Zhou <andrew@heykona.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
* Add unit tests for document_transformers/beautiful_soup_transformer.py
* Basic functionality is tested (extract tags, remove tags, drop lines)
* add a FIXME comment about the order of tags that is not preserved
(and a passing test, but with the expected tags now out-of-order)
- **Issue:** None
- **Dependencies:** None
- **Tag maintainer:** @rlancemartin
- **Twitter handle:** `peter_v`
Please make sure your PR is passing linting and testing before
submitting.
=> OK: I ran `make format`, `make test` (passing after install of
beautifulsoup4) and `make lint`.
- **Description:** Added masking of the API Key for AI21 LLM when
printed and improved the docstring for AI21 LLM.
- Updated the AI21 LLM to utilize SecretStr from pydantic to securely
manage API key.
- Made improvements in the docstring of AI21 LLM. It now mentions that
the API key can also be passed as a named parameter to the constructor.
- Added unit tests.
- **Issue:** #12165
- **Tag maintainer:** @eyurtsev
---------
Co-authored-by: Anirudh Gautam <anirudh@Anirudhs-Mac-mini.local>
Currently this gives a bug:
```
from langchain.schema.runnable import RunnableLambda
bound = RunnableLambda(lambda x: x).with_config({"callbacks": []})
# ConfigError: field "callbacks" not yet prepared so type is still a ForwardRef, you might need to call RunnableConfig.update_forward_refs().
```
Rather than deal with cyclic imports and extra load time, etc., I think
it makes sense to just have a separate Callbacks definition here that is
a relaxed typehint.
1. Allow run evaluators to return {"results": [list of evaluation
results]} in the evaluator callback.
2. Allows run evaluators to pick the target run ID to provide feedback
to
(1) means you could do something like a function call that populates a
full rubric in one go (not sure how reliable that is in general though)
rather than splitting off into separate LLM calls - cheaper and less
code to write
(2) means you can provide feedback to runs on subsequent calls.
Immediate use case is if you wanted to add an evaluator to a chat bot
and assign to assign to previous conversation turns
have a corresponding one in the SDK
In the GoogleSerperResults class, the name field is defined as
'google_serrper_results_json'. This looks like a typo, and perhaps
should be 'google_serper_results_json'.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Add Redis langserve template! Eventually will add semantic caching to
this too. But I was struggling to get that to work for some reason with
the LCEL implementation here.
- **Description:** Introduces the Redis LangServe template. A simple RAG
based app built on top of Redis that allows you to chat with company's
public financial data (Edgar 10k filings)
- **Issue:** None
- **Dependencies:** The template contains the poetry project
requirements to run this template
- **Tag maintainer:** @baskaryan @Spartee
- **Twitter handle:** @tchutch94
**Note**: this requires the commit here that deletes the
`_aget_relevant_documents()` method from the Redis retriever class that
wasn't implemented. That was breaking the langserve app.
---------
Co-authored-by: Sam Partee <sam.partee@redis.com>
-**Description** Adds returning the reranking score when using semantic
search
-**Issue:* #12317
---------
Co-authored-by: Adam Law <adamlaw@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Improve handling of empty queries in timescale-vector.
For timescale-vector it is more efficient to get a None embedding when
the embedding has no semantic meaning. It allows timescale-vector to
perform more optimizations. Thus, when the query is empty, use a None
embedding.
Also pass down constructor arguments to the timescale vector client.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This code path is hit in the following case:
- Start in langchain code and manually provide a tracer
- Handoff to the traceable
- Hand back to langchain code.
Which happens for evaluating `@traceable` functions unfortunately
- **Description: To handle the hybrid search with RRF(Reciprocal Rank
Fusion) in the Elasticsearch, rrf argument was added for adjusting
'rank_constant' and 'window_size' to combine multiple result sets with
different relevance indicators into a single result set. (ref:
https://www.elastic.co/kr/blog/whats-new-elastic-enterprise-search-8-9-0),
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** No dependencies changed,
- **Tag maintainer:** @baskaryan,
Nice to meet you,
I'm a newbie for contributions and it's my first PR.
I only changed the langchain/vectorstores/elasticsearch.py file.
I did make format&lint
I got this message,
```shell
make lint_diff
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "langchain/vectorstores/elasticsearch.py" = "" ] || poetry run black langchain/vectorstores/elasticsearch.py --check
All done! ✨🍰✨
1 file would be left unchanged.
[ "langchain/vectorstores/elasticsearch.py" = "" ] || poetry run mypy langchain/vectorstores/elasticsearch.py
langchain/__init__.py: error: Source file found twice under different module names: "mvp.nlp.langchain.libs.langchain.langchain" and "langchain"
Found 1 error in 1 file (errors prevented further checking)
make: *** [lint_diff] Error 2
```
Thank you
---------
Co-authored-by: 황중원 <jwhwang@amorepacific.com>
My postgres out of connections after continuous PGVector usage, and the
reason because it constantly creates new connections, so adding a
reusable pre established connection seems like solves an issue
---------
Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
See discussion here:
https://github.com/langchain-ai/langchain/discussions/11680
The code is available for usage from langchain_experimental. The reason
for the deprecation is that the agents are relying on a Python REPL. The
code can only be run safely with appropriate sandboxing.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
The changes introduced in #12267 and #12190 broke the cost computation
of the `completion` tokens for fine-tuned models because of the early
return. This PR aims at fixing this.
@baskaryan.
**Description:**
Revise `libs/langchain/langchain/document_loaders/async_html.py` to
store the HTML Title and Page Language in the `metadata` of
`AsyncHtmlLoader`.
Compare predicted json to reference. First canonicalize (sort keys, rm
whitespace separators), then return normalized string edit distance.
Not a silver bullet but maybe an easy way to capture structure
differences in a less flakey way
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Will run all CI because of _test change, but future PRs against CLI will
only trigger the new CLI one
Has a bunch of file changes related to formatting/linting.
No mypy yet - coming soon
**Description**
This small change will make chunk_size a configurable parameter for
loading documents into a Supabase database.
**Issue**
https://github.com/langchain-ai/langchain/issues/11422
**Dependencies**
No chanages
**Twitter**
@ j1philli
**Reminder**
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
---------
Co-authored-by: Greg Richardson <greg.nmr@gmail.com>
Description
* Add _generate and _agenerate to support Fireworks batching.
* Add stop words test cases
* Opt out retry mechanism
Issue - Not applicable
Dependencies - None
Tag maintainer - @baskaryan
- **Description:** refactors the redis vector field schema to properly
handle default values, includes a new unit test suite.
- **Issue:** N/A
- **Dependencies:** nothing new.
- **Tag maintainer:** @baskaryan @Spartee
- **Twitter handle:** this is a tiny fix/improvement :)
This issue was causing some clients/cuatomers issues when building a
vector index on Redis on smaller db instances (due to fault default
values in index configuration). It would raise an error like:
```redis.exceptions.ResponseError: Vector index initial capacity 20000 exceeded server limit (852 with the given parameters)```
This PR will address this moving forward.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
This PR replaces the previous `Intent` check with the new `Prompt
Safety` check. The logic and steps to enable chain moderation via the
Amazon Comprehend service, allowing you to detect and redact PII, Toxic,
and Prompt Safety information in the LLM prompt or answer remains
unchanged.
This implementation updates the code and configuration types with
respect to `Prompt Safety`.
### Usage sample
```python
from langchain_experimental.comprehend_moderation import (BaseModerationConfig,
ModerationPromptSafetyConfig,
ModerationPiiConfig,
ModerationToxicityConfig
)
pii_config = ModerationPiiConfig(
labels=["SSN"],
redact=True,
mask_character="X"
)
toxicity_config = ModerationToxicityConfig(
threshold=0.5
)
prompt_safety_config = ModerationPromptSafetyConfig(
threshold=0.5
)
moderation_config = BaseModerationConfig(
filters=[pii_config, toxicity_config, prompt_safety_config]
)
comp_moderation_with_config = AmazonComprehendModerationChain(
moderation_config=moderation_config, #specify the configuration
client=comprehend_client, #optionally pass the Boto3 Client
verbose=True
)
template = """Question: {question}
Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])
responses = [
"Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like 323-22-9980. John Doe's phone number is (999)253-9876.",
"Final Answer: This is a really shitty way of constructing a birdhouse. This is fucking insane to think that any birds would actually create their motherfucking nests here."
]
llm = FakeListLLM(responses=responses)
llm_chain = LLMChain(prompt=prompt, llm=llm)
chain = (
prompt
| comp_moderation_with_config
| {llm_chain.input_keys[0]: lambda x: x['output'] }
| llm_chain
| { "input": lambda x: x['text'] }
| comp_moderation_with_config
)
try:
response = chain.invoke({"question": "A sample SSN number looks like this 123-456-7890. Can you give me some more samples?"})
except Exception as e:
print(str(e))
else:
print(response['output'])
```
### Output
```python
> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...
> Finished chain.
> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...
> Finished chain.
Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like XXXXXXXXXXXX John Doe's phone number is (999)253-9876.
```
---------
Co-authored-by: Jha <nikjha@amazon.com>
Co-authored-by: Anjan Biswas <anjanavb@amazon.com>
Co-authored-by: Anjan Biswas <84933469+anjanvb@users.noreply.github.com>
**Description:**
This PR adds support for the [Pro version of Titan Takeoff
Server](https://docs.titanml.co/docs/category/pro-features). Users of
the Pro version will have to import the TitanTakeoffPro model, which is
different from TitanTakeoff.
**Issue:**
Also minor fixes to docs for Titan Takeoff (Community version)
**Dependencies:**
No additional dependencies
**Twitter handle:** @becoming_blake
@baskaryan @hwchase17
- **Description:**
This PR adds `allowd_operators` property to `QdrantTranslator` to fix
the `TypeError: can only join an iterable` bug. This property is
required in `get_query_constructor_prompt` in
`query_constructor\base.py`:
```
allowed_operators=" | ".join(allowed_operators),
```
- **Issue:**
#12061
---------
Co-authored-by: XIE Qihui <qihui.xie@bopufund.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
If user function is wrapped as a traceable function, this will help hand
off the trace between the two.
Also update handling fields to reflect optional values
- **Description**: Fix for the SPARQL QA chain: fixed SPARQL queries for
retrieving information about relations in the graph to create a textual
description of the schema for the language model. This should resolve
#8907
- **Issue**: #8907
- **Dependencies**: None
- **Tag maintainer**: @baskaryan, @hwchase17
**Description:** When llms output leading or trailing whitespace for xml
(when using XMLOutputParser) the parser would raise a `ValueError: Could
not parse output: ...`. However, leading or trailing whitespace are
"ignorable" in the sense of XML standard.
**Issue:** I did not find an issue related.
**Dependencies:** None
**Tag maintainer:**
**Twitter handle:** donatoaz
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
Done, updated unit test and ran `make docker_test`.
- **Description:** Response parser for arcee retriever,
- **Issue:** follow-up pr on #11578 and
[discussion](https://github.com/arcee-ai/arcee-python/issues/15#issuecomment-1759874053),
- **Dependencies:** NA
This pr implements a parser for the response from ArceeRetreiver to
convert to langchain `Document`. This closes the loop of generation and
retrieval for Arcee DALMs in langchain.
The reference for the response parser is
[api-docs:retrieve](https://api.arcee.ai/docs#/v2/retrieve_model)
Attaching screenshot of working implementation:
<img width="1984" alt="Screenshot 2023-10-25 at 7 42 34 PM"
src="https://github.com/langchain-ai/langchain/assets/65639964/026987b9-34b2-4e4b-b87d-69fcd0c6641a">
\*api key deleted
---
Successful tests, lints, etc.
```shell
Re-run pytest with --snapshot-update to delete unused snapshots.
==================================================================================================================== slowest 5 durations =====================================================================================================================
1.56s call tests/unit_tests/schema/runnable/test_runnable.py::test_retrying
0.63s call tests/unit_tests/schema/runnable/test_runnable.py::test_map_astream
0.33s call tests/unit_tests/schema/runnable/test_runnable.py::test_map_stream_iterator_input
0.30s call tests/unit_tests/schema/runnable/test_runnable.py::test_map_astream_iterator_input
0.20s call tests/unit_tests/indexes/test_indexing.py::test_cleanup_with_different_batchsize
======================================================================================================= 1265 passed, 270 skipped, 32 warnings in 6.55s =======================================================================================================
[ "." = "" ] || poetry run black .
All done! ✨🍰✨
1871 files left unchanged.
[ "." = "" ] || poetry run ruff --select I --fix .
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run black . --check
All done! ✨🍰✨
1871 files would be left unchanged.
[ "." = "" ] || poetry run mypy .
Success: no issues found in 1868 source files
poetry run codespell --toml pyproject.toml
poetry run codespell --toml pyproject.toml -w
```
Co-authored-by: Shubham Kushwaha <shwu@Shubhams-MacBook-Pro.local>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
**Description:**
Documents further usage of RetrievalQAWithSourcesChain in an existing
test. I'd not found much documented usage of RetrievalQAWithSourcesChain
and how to get the sources out. This additional code will hopefully be
useful to other potential users of this retriever.
**Issue:** No raised issue
**Dependencies:** No new dependencies needed to run the test (it already
needs `open-ai`, `faiss-cpu` and `unstructured`).
Note - `make lint` showed 8 linting errors in unrelated files
---------
Co-authored-by: richarda23 <richard.c.adams@infinityworks.com>
If I go traceable -> runnable when the project is manually specified,
the runnable wont be logged. This makes sure the session/project is
threaded through appropriately.
This PR adds a data [E2B's](https://e2b.dev/) analysis/code interpreter
sandbox as a tool
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Jakub Novak <jakub@e2b.dev>
* Add a type literal for the generation and sub-classes for serialization purposes.
* Fix the root validator of ChatGeneration to return ValueError instead of KeyError or Attribute error if intialized improperly.
* This change is done for langserve to make sure that llm related callbacks can be serialized/deserialized properly.
Fix Description:
For Redis Vector integration in add_texts method, there were two issues
that lead to this bug.
1. Vector index is not being created leading to no such_index error
2. `doc:index` prefix was also missing for Redis Keys.
resolves#11197
Maintainer: @baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
Add cost calculation for fine tuned models (new and legacy), this is
required after OpenAI added new models for fine tuning and separated the
costs of I/O for fine tuned models.
Also I updated the relevant unit tests
see https://platform.openai.com/docs/guides/fine-tuning for more
information.
issue: https://github.com/langchain-ai/langchain/issues/11715
- **Issue:** 11715
- **Twitter handle:** @nirkopler
- replace `requests` package with `langchain.requests`
- add `_acall` support
- add `_stream` and `_astream`
- freshen up the documentation a bit
- update vendor doc
Allows for passing arguments into the LLM chains used by the
GraphCypherQAChain. This is to address a request by a user to include
memory in the Cypher creating chain. Will keep the prompt variables
as-is to be backward compatible. But, would be a good idea to deprecate
them and use the **kwargs variables. Added a test case.
In general, I think it would be good for any chain to automatically pass
in a readonlymemory(of its input) to its subchains whilist allowing for
an override. But, this would be a different change.
- **Description:**
Add missing apostrophe in `user's` in stuff_prompt's system_template.
The first sentence in the system template went from:
> Use the following pieces of context to answer the users question.
to
> Use the following pieces of context to answer the user's question.
- **Issue:**
- **Dependencies:** none
- **Tag maintainer:** @baskaryan
- **Twitter handle:** ojohnnyo
- This is used internally to gather aggregate usage metrics for the
LangChain integrations
- Note: This cannot be added to some of the Vertex AI integrations at
this time because the SDK doesn't allow overriding the
[`ClientInfo`](https://googleapis.dev/python/google-api-core/latest/client_info.html#module-google.api_core.client_info)
- Added to:
- BigQuery
- Google Cloud Storage
- Document AI
- Vertex AI Model Garden
- Document AI Warehouse
- Vertex AI Search
- Vertex AI Matching Engine (Cloud Storage Client)
@baskaryan, @eyurtsev, @hwchase17
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** In the max_marginal_relevance_search function of the
ElasticsearchStore vector store, the name of the field corresponding to
the vector embedding of the document is hard coded in the delete
statement that drops the field from the document metadata. This results
in an exception if the vector embedding field is customized. This PR
changes the hard-coded "vector" into the vector_query_field variable.
- **Issue:** None
- **Dependencies:** None
- **Tag maintainer:** @hwchase17
Co-authored-by: Shilong Dai <sdai@viperfish.net>
**Description: Allow to inject boto3 client for Cross account access
type of scenarios in using SagemakerEndpointEmbeddings and also updated
the documentation for same in the sample notebook**
**Issue:SagemakerEndpointEmbeddings cross account capability #10634
#10184**
Dependencies: None
Tag maintainer:
Twitter handle:lethargicoder
Co-authored-by: Vikram(VS) <vssht@amazon.com>
- **Description:** sqlalchemy create_engine() does not take into account
connect_args which are mandatory for managed PGSQL instances on cloud
providers (ssl_context for example).
Also re-enabled create_vector_extension at post_init for using pgvector
class seamlessly
- **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17.
---------
Co-authored-by: Sami Bargaoui <bargaoui.sam@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
If non-pickleable objects (like locks) get passed to the tracing
callback, they'll fail in the deepcopy. Fallback to a shallow copy in
these instances .
We don't use any of the new functionality at the moment. Just making
sure we don't fall back on versions and fail to benefit from new
patches. This is an easy upgrade and it's always harder to upgrade
across multiple major versions at once.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Adding Tavily Search API as a tool. I will be the maintainer and
assaf_elovic is the twitter handler.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Current ChatTongyi is not compatible with DashScope API, which will
cause error when passing api key to chat model directly.
- **Description:** Update tongyi.py to be compatible with DashScope API.
Specifically, update parameter name "dashscope_api_key" to "api_key".
- **Issue:** None.
- **Dependencies:** Nothing new, Tongyi would require DashScope as
before.
- **Description:** Implementing the Google Scholar Tool as requested in
PR #11505. The tool will be using the [serpapi python
package](https://serpapi.com/integrations/python#search-google-scholar).
The main idea of the tool will be to return the results from a Google
Scholar search given a query as an input to the tool.
- **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17
- Fixes error:
```
ValueError: "GoogleVertexAISearchRetriever" object has no field "_serving_config"
```
Introduced in #11736
@baskaryan, @eyurtsev, @hwchase17 if you could review and merge quickly,
that would be appreciated :)
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** The return info in the documentation for
similarity_search_by_vector and similarity_search_with_relevance_scores
is wrong
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
This reverts commit a46eef64a7.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** Provide a way to use different text for embedding.
- For example, if you are ingesting stack-overflow Q&As for RAG, you
would want to embed the questions and return the answer(s) for the hits.
With this change, the consumer of langchain can implement that easily.
- I noticed the similar function is added on faiss.py with #1912 which
was for performance reason, but I see the same function can be used to
achieve what I thought. So instead of changing Document class to have
embedding_content, I mimicked the implementation of faiss.py.
- The test should provide some guidance on how to use it. It would be
more intuitive if I just pass texts and embedding_texts as separate
arguments, but I chose to use `zip`-ed object for the consistency with
faiss.py implementation.
- I plan to make similar pull request for OpenSearch.
- **Issue:** N/A
- **Dependencies:** None other than the existing ones.
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Adding Pydantic v2 support for OpenAPI Specs
- **Issue:**
- OpenAPI spec support was disabled because `openapi-schema-pydantic`
doesn't support Pydantic v2:
#9205
- Caused errors in `get_openapi_chain`
- This may be the cause of #9520.
- **Tag maintainer:** @eyurtsev
- **Twitter handle:** kreneskyp
The root cause was that `openapi-schema-pydantic` hasn't been updated in
some time but
[openapi-pydantic](https://github.com/mike-oakley/openapi-pydantic)
forked and updated the project.
Updated the elasticsearch self query retriever to use the match clause
for LIKE operator instead of the non-analyzed fuzzy search clause.
Other small updates include:
- fixing the stack inference integration test where the index's default
pipeline didn't use the inference pipeline created
- adding a user-agent to the old implementation to track usage
- improved the documentation for ElasticsearchStore filters
### Description:
To provide an eas llm service access methods in this pull request by
impletementing `PaiEasEndpoint` and `PaiEasChatEndpoint` classes in
`langchain.llms` and `langchain.chat_models` modules. Base on this pr,
langchain users can build up a chain to call remote eas llm service and
get the llm inference results.
### About EAS Service
EAS is a Alicloud product on Alibaba Cloud Machine Learning Platform for
AI which is short for AliCloud PAI. EAS provides model inference
deployment services for the users. We build up a llm inference services
on EAS with a general llm docker images. Therefore, end users can
quickly setup their llm remote instances to load majority of the
hugginface llm models, and serve as a backend for most of the llm apps.
### Dependencies
This pr does't involve any new dependencies.
---------
Co-authored-by: 子洪 <gaoyihong.gyh@alibaba-inc.com>
Description: Supported RetryOutputParser & RetryWithErrorOutputParser
max_retries
- max_retries: Maximum number of retries to parser.
Issue: None
Dependencies: None
Tag maintainer: @baskaryan
Twitter handle:
We now require uses to have the pip package `llmonitor` installed. It
allows us to have cleaner code and avoid duplicates between our library
and our code in Langchain.
FAISS does not implement embeddings method and use embed_query to
embedding texts which is wrong for some embedding models.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
feat: Raise KeyError when 'prompt' key is missing in JSON response
This commit updates the error handling in the code to raise a KeyError
when the 'prompt' key is not found in the JSON response. This change
makes the code more explicit about the nature of the error, helping to
improve clarity and debugging.
@baskaryan, @eyurtsev.
I may be missing something but it seems like we inappropriately overrode
the 'stream()' method, losing callbacks in the process. I don't think
(?) it gave us anything in this case to customize it here?
See new trace:
https://smith.langchain.com/public/fbb82825-3a16-446b-8207-35622358db3b/r
and confirmed it streams.
Also fixes the stopwords issues from #12000
- **Description:** According to the document
https://cloud.baidu.com/doc/WENXINWORKSHOP/s/clntwmv7t, add ERNIE-Bot-4
model support for ErnieBotChat.
- **Dependencies:** Before using the ERNIE-Bot-4, you should have the
model's access authority.
By default replace input_variables with the correct value
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
.dict() is a Pydantic method that cannot raise exceptions, as it is used
eg. in `__eq__`
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Type hinting `*args` as `List[Any]` means that each positional argument
should be a list. Type hinting `**kwargs` as `Dict[str, Any]` means that
each keyword argument should be a dict of strings.
This is almost never what we actually wanted, and doesn't seem to be
what we want in any of the cases I'm replacing here.
- Update Zep Memory and Retriever docstrings
- Zep Memory Retriever: Add support for native MMR
- Add MMR example to existing ZepRetriever Notebook
@baskaryan
Example
```
from langchain.schema.runnable import RunnableLambda
from langsmith import traceable
chain = RunnableLambda(lambda x: x)
@traceable(run_type = "chain")
def my_traceable(a):
chain.invoke(a)
my_traceable(5)
```
Would have a nested result.
This would NOT work for interleaving chains and traceables. E.g., things
like thiswould still not work well
```
from langchain.schema.runnable import RunnableLambda
from langsmith import traceable
@traceable()
def other_traceable(a):
return a
def foo(x):
return other_traceable(x)
chain = RunnableLambda(foo)
@traceable(run_type = "chain")
def my_traceable(a):
chain.invoke(a)
my_traceable(5)
```
Minor lint dependency version upgrade to pick up latest functionality.
Ruff's new v0.1 version comes with lots of nice features, like
fix-safety guarantees and a preview mode for not-yet-stable features:
https://astral.sh/blog/ruff-v0.1.0
- **Description:** Chroma >= 0.4.10 added support for batch sizes
validation of add/upsert. This batch size is dependent on the SQLite
limits of the target system and varies. In this change, for
Chroma>=0.4.10 batch splitting was added as the aforementioned
validation is starting to surface in the Chroma community (users using
LC)
- **Issue:** N/A
- **Dependencies:** N/A
- **Tag maintainer:** @eyurtsev
- **Twitter handle:** t_azarov
Description: A large language models developed by Baichuan Intelligent
Technology,https://www.baichuan-ai.com/home
Issue: None
Dependencies: None
Tag maintainer:
Twitter handle:
This adds security notices to toolkits init, and to several toolkits.
We'll need to continue documenting the rest of the toolkits.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
the updated value was:
` Criteria.MISOGYNY: "Is the submission misogynistic? If so, respond Y."
`
The " If so, respond Y." should not be here. This sub-string is not
presented in any other criteria and should not be presented here.
I also added a synonym to "misogynistic" as it done in many other
criteria.
**Description**
- Added the `SingleStoreDBChatMessageHistory` class that inherits
`BaseChatMessageHistory` and allows to use of a SingleStoreDB database
as a storage for chat message history.
- Added integration test to check that everything works (requires
`singlestoredb` to be installed)
- Added notebook with usage example
- Removed custom retriever for SingleStoreDB vector store (as it is
useless)
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
## Description
| Tool | Original Tool Name |
|-----------------------------|---------------------------|
| open-meteo-api | Open Meteo API |
| news-api | News API |
| tmdb-api | TMDB API |
| podcast-api | Podcast API |
| golden_query | Golden Query |
| dall-e-image-generator | Dall-E Image Generator |
| twilio | Text Message |
| searx_search_results | Searx Search Results |
| dataforseo | DataForSeo Results JSON |
When using these tools through `load_tools`, I encountered the following
validation error:
```console
openai.error.InvalidRequestError: 'TMDB API' does not match '^[a-zA-Z0-9_-]{1,64}$' - 'functions.0.name'
```
In order to avoid this error, I replaced spaces with hyphens in the tool
names:
| Tool | Corrected Tool Name |
|-----------------------------|---------------------------|
| open-meteo-api | Open-Meteo-API |
| news-api | News-API |
| tmdb-api | TMDB-API |
| podcast-api | Podcast-API |
| golden_query | Golden-Query |
| dall-e-image-generator | Dall-E-Image-Generator |
| twilio | Text-Message |
| searx_search_results | Searx-Search-Results |
| dataforseo | DataForSeo-Results-JSON |
This correction resolved the validation error.
Additionally, a unit test,
`tests/unit_tests/schema/runnable/test_runnable.py::test_stream_log_retriever`,
was failing at random. Upon further investigation, I confirmed that the
failure was not related to the above-mentioned changes. The `stream_log`
variable was generating the order of logs in two ways at random The
reason for this behavior is unclear, but in the assertion, I included
both possible orders to account for this variability.
Hello Folks,
Alibaba Cloud OpenSearch has released a new version of the vector
storage engine, which has significantly improved performance compared to
the previous version. At the same time, the sdk has also undergone
changes, requiring adjustments alibaba opensearch vector store code to
adapt.
This PR includes:
Adapt to the latest version of Alibaba Cloud OpenSearch API.
More comprehensive unit testing.
Improve documentation.
I have read your contributing guidelines. And I have passed the tests
below
- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
---------
Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
**Description:**
While working on the Docusaurus site loader #9138, I noticed some
outdated docs and tests for the Sitemap Loader.
**Issue:**
This is tangentially related to #6691 in reference to doc links. I plan
on digging in to a few of these issue when I find time next.
- **Description:** added examples to Vertex chat models as optional
class attributes, so that a model with examples can be used inside a
chain
- **Twitter handle:** lkuligin
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Related to #10800
- Errors in the Docstring of GradientLLM / Gradient.ai LLM
- Renamed the `model_id` to `model` and adapting this in all tests.
Reason to so is to be in Sync with `GradientEmbeddings` and other LLM's.
- inmproving tests so they check the headers in the sent request.
- making the aiosession a private attribute in the docs, as in the
future `pip install gradientai` will be replacing aiosession.
- adding a example how to fine-tune on the Prompt Template as suggested
in #10800
- Description: Adds the ChatEverlyAI class with llama-2 7b on [EverlyAI
Hosted
Endpoints](https://everlyai.xyz/)
- It inherits from ChatOpenAI and requires openai (probably unnecessary
but it made for a quick and easy implementation)
---------
Co-authored-by: everly-studio <127131037+everly-studio@users.noreply.github.com>
- **Description:**
- If the Elasticsearch field used for Langchain > Document.page_content
is missing because the specific document is
somehow malformed fail gracefully.
- **Tag maintainer:**
- @joemcelroy
Reverts langchain-ai/langchain#11714
This has linting and formatting issues, plus it's added to chat models
folder but doesn't subclass Chat Model base class
Motivation and Context
At present, the Baichuan Large Language Model is relatively popular and
efficient in performance. Due to widespread market recognition, this
model has been added to enhance the scalability of Langchain's ability
to access the big language model, so as to facilitate application access
and usage for interested users.
System Info
langchain: 0.0.295
python:3.8.3
IDE:vs code
Description
Add the following files:
1. Add baichuan_baichuaninc_endpoint.py in the
libs/langchain/langchain/chat_models
2. Modify the __init__.py file,which is located in the
libs/langchain/langchain/chat_models/__init__.py:
a. Add "from langchain.chat_models.baichuan_baichuaninc_endpoint import
BaichuanChatEndpoint"
b. Add "BaichuanChatEndpoint" In the file's __ All__ method
Your contribution
I am willing to help implement this feature and submit a PR, but I would
appreciate guidance from the maintainers or community to ensure the
changes are made correctly and in line with the project's standards and
practices.
- **Description:** Add `TrainableLLM` for those LLM support fine-tuning
- **Tag maintainer:** @hwchase17
This PR add training methods to `GradientLLM`
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Hi there
This PR is aim to implement chat model for Alibaba Tongyi LLM model. It
contains work below:
1.Implement ChatTongyi chat model in langchain.chat_models.tongyi. Note
this is different with tongyi llm model to another PR
https://github.com/langchain-ai/langchain/pull/10878.
For detail it implements _generate() and _stream() function in
ChatTongyi.
2. Add some examples in chat/tongyi.ipynb.
3. Add integration test in chat_models/test_tongyi.py
Note async completion for the Text API is not yet supported.
Dependencies: dashscope. It will be installed manually cause it is not
need by everyone.
**Description**
This PR adds the `ElasticsearchChatMessageHistory` implementation that
stores chat message history in the configured
[Elasticsearch](https://www.elastic.co/elasticsearch/) deployment.
```python
from langchain.memory.chat_message_histories import ElasticsearchChatMessageHistory
history = ElasticsearchChatMessageHistory(
es_url="https://my-elasticsearch-deployment-url:9200", index="chat-history-index", session_id="123"
)
history.add_ai_message("This is me, the AI")
history.add_user_message("This is me, the human")
```
**Dependencies**
- [elasticsearch client](https://elasticsearch-py.readthedocs.io/)
required
Co-authored-by: Bagatur <baskaryan@gmail.com>
Instead of accessing `langchain.debug`, `langchain.verbose`, or
`langchain.llm_cache`, please use the new getter/setter functions in
`langchain.globals`:
- `langchain.globals.set_debug()` and `langchain.globals.get_debug()`
- `langchain.globals.set_verbose()` and
`langchain.globals.get_verbose()`
- `langchain.globals.set_llm_cache()` and
`langchain.globals.get_llm_cache()`
Using the old globals directly will now raise a warning.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:**
Add a document loader for the RSpace Electronic Lab Notebook
(www.researchspace.com), so that scientific documents and research notes
can be easily pulled into Langchain pipelines.
**Issue**
This is an new contribution, rather than an issue fix.
**Dependencies:**
There are no new required dependencies.
In order to use the loader, clients will need to install rspace_client
SDK using `pip install rspace_client`
---------
Co-authored-by: richarda23 <richard.c.adams@infinityworks.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Update Indexing API docs to specify vectorstores that
are compatible with the Indexing API. I add a unit test to remind
developers to update the documentation whenever they add or change a
vectorstore in a way that affects compatibility. For the unit test I
repurposed existing code from
[here](https://github.com/langchain-ai/langchain/blob/v0.0.311/libs/langchain/langchain/indexes/_api.py#L245-L257).
This is my first PR to an open source project. This is a trivially
simple PR whose main purpose is to make me more comfortable submitting
Langchain PRs. If this PR goes through I plan to submit PRs with more
substantive changes in the near future.
**Issue:** Resolves
[10482](https://github.com/langchain-ai/langchain/discussions/10482).
**Dependencies:** No new dependencies.
**Twitter handle:** None.
Allows MMR functionality only for the case where we have access to the
embedding function. Also allows for users to request for fields from
elasticsearch store. These are added to the document metadata.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: Introducing an ability to load a transcription document of
audio file using [Yandex
SpeechKit](https://cloud.yandex.com/en-ru/services/speechkit)
Issue: None
Dependencies: yandex-speechkit
Tag maintainer: @rlancemartin, @eyurtsev
**Description**
This PR implements the usage of the correct tokenizer in Bedrock LLMs,
if using anthropic models.
**Issue:** #11560
**Dependencies:** optional dependency on `anthropic` python library.
**Twitter handle:** jtolgyesi
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Modify Anyscale integration to work with [Anyscale
Endpoint](https://docs.endpoints.anyscale.com/)
and it supports invoke, async invoke, stream and async invoke features
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Should delegate to parse_result, not to aparse, as parse_result is a
method that some output parsers override
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
**Description:** Avoid huggingfacepipeline to truncate the response if
user setup return_full_text as False within huggingface pipeline.
**Dependencies:** : None
**Tag maintainer:** Maybe @sam-h-bean ?
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** implements a retriever on top of DocAI Warehouse (to
interact with existing enterprise documents)
https://cloud.google.com/document-ai-warehouse?hl=en
- **Issue:** new functionality
@baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
No relevant documents may be found for a given question. In some use
cases, we could directly respond with a fixed message instead of doing
an LLM call with an empty context. This PR exposes this as an option:
response_if_no_docs_found.
---------
Co-authored-by: Sudharsan Rangarajan <sudranga@nile-global.com>
Replace this entire comment with:
- **Description:** In this modified version of the function, if the
metadatas parameter is not None, the function includes the corresponding
metadata in the JSON object for each text. This allows the metadata to
be stored alongside the text's embedding in the vector store.
-
- **Issue:** #10924
- **Dependencies:** None
- **Tag maintainer:** @hwchase17
@agola11
- **Twitter handle:** @MelliJoaco
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** fixed a bug in pal-chain when it reports Python
code validation errors. When node.func does not have any ids, the
original code tried to print node.func.id in raising ValueError.
- **Issue:** n/a,
- **Dependencies:** no dependencies,
- **Tag maintainer:** @hazzel-cn, @eyurtsev
- **Twitter handle:** @lazyswamp
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I am merely making some minor adjustments to the function documentation.
I hope to provide a small assistance to LangChain.
- **Description:** Change the docs of JSONAgentOutputParser. It will be
`JSON` better,
- **Issue:** no,
- **Dependencies:** no,
- **Tag maintainer:** @hwchase17,
- **Twitter handle:** Not worth mentioning.
**Description:** This PR adds support for ChatOpenAI models in the
Infino callback handler. In particular, this PR implements
`on_chat_model_start` callback, so that ChatOpenAI models are supported.
With this change, Infino callback handler can be used to track latency,
errors, and prompt tokens for ChatOpenAI models too (in addition to the
support for OpenAI and other non-chat models it has today). The existing
example notebook is updated to show how to use this integration as well.
cc/ @naman-modi @savannahar68
**Issue:** https://github.com/langchain-ai/langchain/issues/11607
**Dependencies:** None
**Tag maintainer:** @hwchase17
**Twitter handle:** [@vkakade](https://twitter.com/vkakade)
This PR adds support for the Azure Cosmos DB MongoDB vCore Vector Store
https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search
Summary:
- **Description:** added vector store integration for Azure Cosmos DB
MongoDB vCore Vector Store,
- **Issue:** the issue # it fixes#11627,
- **Dependencies:** pymongo dependency,
- **Tag maintainer:** @hwchase17,
- **Twitter handle:** @izzyacademy
---------
Co-authored-by: Israel Ekpo <israel.ekpo@gmail.com>
Co-authored-by: Israel Ekpo <44282278+izzyacademy@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
* Should use non chunked messages for Invoke/Batch
* After this PR, stream output type is not represented, do we want to
use the union?
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Adds standard `type` field for all messages that will be
serialized/validated by pydantic.
* The presence of `type` makes it easier for developers consuming
schemas to write client code to serialize/deserialize.
* In LangServe `type` will be used for both validation and will appear
in the generated openapi specs
Preventing error caused by attempting to move the model that was already
loaded on the GPU using the Accelerate module to the same or another
device. It is not possible to load model with Accelerate/PEFT to CPU for
now
Addresses:
[#10985](https://github.com/langchain-ai/langchain/issues/10985)
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** This is an update to OctoAI LLM provider that adds
support for llama2 endpoints hosted on OctoAI and updates MPT-7b url
with the current one.
@baskaryan
Thanks!
---------
Co-authored-by: ML Wiz <bassemgeorgi@gmail.com>
**Description:** I noticed the metadata returned by the url_selenium
loader was missing several values included by the web_base loader. (The
former returned `{source: ...}`, the latter returned `{source: ...,
title: ..., description: ..., language: ...}`.) This change fixes it so
both loaders return all 4 key value pairs.
Files have been properly formatted and all tests are passing. Note,
however, that I am not much of a python expert, so that whole "Adding
the imports inside the code so that tests pass" thing seems weird to me.
Please LMK if I did anything wrong.
- **Description:** Assigning the custom_llm_provider to the default
params function so that it will be passed to the litellm
- **Issue:** Even though the custom_llm_provider argument is being
defined it's not being assigned anywhere in the code and hence its not
being passed to litellm, therefore any litellm call which uses the
custom_llm_provider as required parameter is being failed. This
parameter is mainly used by litellm when we are doing inference via
Custom API server.
https://docs.litellm.ai/docs/providers/custom_openai_proxy
- **Dependencies:** No dependencies are required
@krrishdholakia , @baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** This PR introduces a new LLM and Retriever API to
https://arcee.ai for the python client
- **Issue:** implements the integrations as requested in #11578 ,
- **Dependencies:** no dependencies are required,
- **Tag maintainer:** @hwchase17
- **Twitter handle:** shwooobham
**✅ `make format`, `make lint` and `make test` runs locally.**
```shell
=========== 1245 passed, 277 skipped, 20 warnings in 16.26s ===========
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run black . --check
All done! ✨🍰✨
1818 files would be left unchanged.
[ "." = "" ] || poetry run mypy .
Success: no issues found in 1815 source files
[ "." = "" ] || poetry run black .
All done! ✨🍰✨
1818 files left unchanged.
[ "." = "" ] || poetry run ruff --select I --fix .
poetry run codespell --toml pyproject.toml
poetry run codespell --toml pyproject.toml -w
```
**Contributions**
1. Arcee (langchain/llms), ArceeRetriever (langchain/retrievers),
ArceeWrapper (langchain/utilities)
2. docs for Arcee (llms/arcee.py) and
ArceeRetriever(retrievers/arcee.py)
3.
cc: @jacobsolawetz @ben-epstein
---------
Co-authored-by: Shubham <shubham@sORo.local>
jinja2 templates are not sandboxed and are at risk for arbitrary code
execution. To mitigate this risk:
- We no longer support loading jinja2-formatted prompt template files.
- `PromptTemplate` with jinja2 may still be constructed manually, but
the class carries a security warning reminding the user to not pass
untrusted input into it.
Resolves#4394.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
**Description:** CohereRerank is missing `cohere_api_key` as a field and
since extras are forbidden, it is not possible to pass-in the key. The
only way is to use an env variable named `COHERE_API_KEY`.
For example, if trying to create a compressor like this:
```python
cohere_api_key = "......Cohere api key......"
compressor = CohereRerank(cohere_api_key=cohere_api_key)
```
you will get the following error:
```
File "/langchain/.venv/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CohereRerank
cohere_api_key
extra fields not permitted (type=value_error.extra)
```
- **Description:** Fixes minor typo for the
query_sql_database_tool_description in the db toolkit
- **Issue:** N/A
- **Dependencies:** N/A
- **Tag maintainer:** @nfcampos
- **Twitter handle:** N/A
LangChain relies on NumPy to compute cosine distances, which becomes a
bottleneck with the growing dimensionality and number of embeddings. To
avoid this bottleneck, in our libraries at
[Unum](https://github.com/unum-cloud), we have created a specialized
package - [SimSIMD](https://github.com/ashvardanian/simsimd), that knows
how to use newer hardware capabilities. Compared to SciPy and NumPy, it
reaches 3x-200x performance for various data types. Since publication,
several LangChain users have asked me if I can integrate it into
LangChain to accelerate their workflows, so here I am 🤗
## Benchmarking
To conduct benchmarks locally, run this in your Jupyter:
```py
import numpy as np
import scipy as sp
import simsimd as simd
import timeit as tt
def cosine_similarity_np(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
X_norm = np.linalg.norm(X, axis=1)
Y_norm = np.linalg.norm(Y, axis=1)
with np.errstate(divide="ignore", invalid="ignore"):
similarity = np.dot(X, Y.T) / np.outer(X_norm, Y_norm)
similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
return similarity
def cosine_similarity_sp(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
return 1 - sp.spatial.distance.cdist(X, Y, metric='cosine')
def cosine_similarity_simd(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
return 1 - simd.cdist(X, Y, metric='cosine')
X = np.random.randn(1, 1536).astype(np.float32)
Y = np.random.randn(1, 1536).astype(np.float32)
repeat = 1000
print("NumPy: {:,.0f} ops/s, SciPy: {:,.0f} ops/s, SimSIMD: {:,.0f} ops/s".format(
repeat / tt.timeit(lambda: cosine_similarity_np(X, Y), number=repeat),
repeat / tt.timeit(lambda: cosine_similarity_sp(X, Y), number=repeat),
repeat / tt.timeit(lambda: cosine_similarity_simd(X, Y), number=repeat),
))
```
## Results
I ran this on an M2 Pro Macbook for various data types and different
number of rows in `X` and reformatted the results as a table for
readability:
| Data Type | NumPy | SciPy | SimSIMD |
| :--- | ---: | ---: | ---: |
| `f32, 1` | 59,114 ops/s | 80,330 ops/s | 475,351 ops/s |
| `f16, 1` | 32,880 ops/s | 82,420 ops/s | 650,177 ops/s |
| `i8, 1` | 47,916 ops/s | 115,084 ops/s | 866,958 ops/s |
| `f32, 10` | 40,135 ops/s | 24,305 ops/s | 185,373 ops/s |
| `f16, 10` | 7,041 ops/s | 17,596 ops/s | 192,058 ops/s |
| `f16, 10` | 21,989 ops/s | 25,064 ops/s | 619,131 ops/s |
| `f32, 100` | 3,536 ops/s | 3,094 ops/s | 24,206 ops/s |
| `f16, 100` | 900 ops/s | 2,014 ops/s | 23,364 ops/s |
| `i8, 100` | 5,510 ops/s | 3,214 ops/s | 143,922 ops/s |
It's important to note that SimSIMD will underperform if both matrices
are huge.
That, however, seems to be an uncommon usage pattern for LangChain
users.
You can find a much more detailed performance report for different
hardware models here:
- [Apple M2
Pro](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-1-performance-on-apple-m2-pro).
- [4th Gen Intel Xeon
Platinum](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-2-performance-on-4th-gen-intel-xeon-platinum-8480).
- [AWS Graviton
3](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-3-performance-on-aws-graviton-3).
## Additional Notes
1. Previous version used `X = np.array(X)`, to repackage lists of lists.
It's an anti-pattern, as it will use double-precision floating-point
numbers, which are slow on both CPUs and GPUs. I have replaced it with
`X = np.array(X, dtype=np.float32)`, but a more selective approach
should be discussed.
2. In numerical computations, it's recommended to explicitly define
tolerance levels, which were previously avoided in
`np.allclose(expected, actual)` calls. For now, I've set absolute
tolerance to distance computation errors as 0.01: `np.allclose(expected,
actual, atol=1e-2)`.
---
- **Dependencies:** adds `simsimd` dependency
- **Tag maintainer:** @hwchase17
- **Twitter handle:** @ashvardanian
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
#### Description
This PR adds the option to specify additional metadata columns in the
CSVLoader beyond just `Source`.
The current CSV loader includes all columns in `page_content` and if we
want to have columns specified for `page_content` and `metadata` we have
to do something like the below.:
```
csv = pd.read_csv(
"path_to_csv"
).to_dict("records")
documents = [
Document(
page_content=doc["content"],
metadata={
"last_modified_by": doc["last_modified_by"],
"point_of_contact": doc["point_of_contact"],
}
) for doc in csv
]
```
#### Usage
Example Usage:
```
csv_test = CSVLoader(
file_path="path_to_csv",
metadata_columns=["last_modified_by", "point_of_contact"]
)
```
Example CSV:
```
content, last_modified_by, point_of_contact
"hello world", "Person A", "Person B"
```
Example Result:
```
Document {
page_content: "hello world"
metadata: {
row: '0',
source: 'path_to_csv',
last_modified_by: 'Person A',
point_of_contact: 'Person B',
}
```
---------
Co-authored-by: Ben Chello <bchello@dropbox.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Fixes the comments in the ConvoOutputParser. Because
the \\\\ is escaping a single \\, they render something like:
`"action_input": string \ The input to the action` in the prompt.
Changing this to \\\\\\\\ lets it escape two slashes so that it renders
a proper comment: `"action_input": string \\ The input to the action`
- **Issue:** N/A
- **Dependencies:**
- **Tag maintainer:** @hwchase17
- **Twitter handle:**
**Description**:
- Added Momento Vector Index (MVI) as a vector store provider. This
includes an implementation with docstrings, integration tests, a
notebook, and documentation on the docs pages.
- Updated the Momento dependency in pyproject.toml and the lock file to
enable access to MVI.
- Refactored the Momento cache and chat history session store to prefer
using "MOMENTO_API_KEY" over "MOMENTO_AUTH_TOKEN" for consistency with
MVI. This change is backwards compatible with the previous "auth_token"
variable usage. Updated the code and tests accordingly.
**Dependencies**:
- Updated Momento dependency in pyproject.toml.
**Testing**:
- Run the integration tests with a Momento API key. Get one at the
[Momento Console](https://console.gomomento.com) for free. MVI is
available in AWS us-west-2 with a superuser key.
- `MOMENTO_API_KEY=<your key> poetry run pytest
tests/integration_tests/vectorstores/test_momento_vector_index.py`
**Tag maintainer:**
@eyurtsev
**Twitter handle**:
Please mention @momentohq for this addition to langchain. With the
integration of Momento Vector Index, Momento caching, and session store,
Momento provides serverless support for the core langchain data needs.
Also mention @mlonml for the integration.
Wraps every callback handler method in error handlers to avoid breaking
users' programs when an error occurs inside the handler.
Thanks @valdo99 for the suggestion 🙂
[The `duckduckgo-search` v3.9.2 was removed from
PyPi](https://pypi.org/project/duckduckgo-search/#history). That breaks
the build.
- **Description:** refreshes the Poetry dependency to v3.9.3
- **Tag maintainer:** @baskaryan
- **Twitter handle:** @ashvardanian
updating query constructor and self query retriever to
- make it easier to pass in examples
- validate attributes used in query
- remove invalid parts of query
- make it easier to get + edit prompt
- make query constructor a runnable
- make self query retriever use as runnable
- keep alias for RunnableMap
- update docs to use RunnableParallel and RunnablePassthrough.assign
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** Add support for a SQLRecordManager in async
environments. It includes the creation of `RecorManagerAsync` abstract
class.
- **Issue:** None
- **Dependencies:** Optional `aiosqlite`.
- **Tag maintainer:** @nfcampos
- **Twitter handle:** @jvelezmagic
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Use `.copy()` to fix the bug that the first `llm_inputs` element is
overwritten by the second `llm_inputs` element in `intermediate_steps`.
***Problem description:***
In [line 127](
c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L127C17-L127C17)),
the `llm_inputs` of the sql generation step is appended as the first
element of `intermediate_steps`:
```
intermediate_steps.append(llm_inputs) # input: sql generation
```
However, `llm_inputs` is a mutable dict, it is updated in [line
179](https://github.com/langchain-ai/langchain/blob/master/libs/experimental/langchain_experimental/sql/base.py#L179)
for the final answer step:
```
llm_inputs["input"] = input_text
```
Then, the updated `llm_inputs` is appended as another element of
`intermediate_steps` in [line
180](c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L180)):
```
intermediate_steps.append(llm_inputs) # input: final answer
```
As a result, the final `intermediate_steps` returned in [line
189](c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L189C43-L189C43))
actually contains two same `llm_inputs` elements, i.e., the `llm_inputs`
for the sql generation step overwritten by the one for final answer step
by mistake. Users are not able to get the actual `llm_inputs` for the
sql generation step from `intermediate_steps`
Simply calling `.copy()` when appending `llm_inputs` to
`intermediate_steps` can solve this problem.
### Description
This pull request involves modifications to the extraction method for
abstracts/summaries within the PubMed utility. A condition has been
added to verify the presence of unlabeled abstracts. Now an abstract
will be extracted even if it does not have a subtitle. In addition, the
extraction of the abstract was extended to books.
### Issue
The PubMed utility occasionally returns an empty result when extracting
abstracts from articles, despite the presence of an abstract for the
paper on PubMed. This issue arises due to the varying structure of
articles; some articles follow a "subtitle/label: text" format, while
others do not include subtitles in their abstracts. An example of the
latter case can be found at:
[https://pubmed.ncbi.nlm.nih.gov/37666905/](url)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Description
SelfQueryRetriever is missing async support, so I am adding it.
I also removed deprecated predict_and_parse method usage here, and added
some tests.
### Issue
N/A
### Tag maintainer
Not yet
### Twitter handle
N/A
**Description**
It is for #10423 that it will be a useful feature if we can extract
images from pdf and recognize text on them. I have implemented it with
`PyPDFLoader`, `PyPDFium2Loader`, `PyPDFDirectoryLoader`,
`PyMuPDFLoader`, `PDFMinerLoader`, and `PDFPlumberLoader`.
[RapidOCR](https://github.com/RapidAI/RapidOCR.git) is used to recognize
text on extracted images. It is time-consuming for ocr so a boolen
parameter `extract_images` is set to control whether to extract and
recognize. I have tested the time usage for each parser on my own laptop
thinkbook 14+ with AMD R7-6800H by unit test and the result is:
| extract_images | PyPDFParser | PDFMinerParser | PyMuPDFParser |
PyPDFium2Parser | PDFPlumberParser |
| ------------- | ------------- | ------------- | ------------- |
------------- | ------------- |
| False | 0.27s | 0.39s | 0.06s | 0.08s | 1.01s |
| True | 17.01s | 20.67s | 20.32s | 19,75s | 20.55s |
**Issue**
#10423
**Dependencies**
rapidocr_onnxruntime in
[RapidOCR](https://github.com/RapidAI/RapidOCR/tree/main)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: The previous version of the MarkdownHeaderTextSplitter
did not take into account the possibility of '#' appearing within code
blocks, which caused segmentation anomalies in these situations. This PR
has fixed this issue.
- Issue:
- Dependencies: No
- Tag maintainer:
- Twitter handle:
cc @baskaryan @eyurtsev @rlancemartin
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: This PR adds a new chain `rl_chain.PickBest` for learned
prompt variable injection, detailed description and usage can be found
in the example notebook added. It essentially adds a
[VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit) layer
before the llm call in order to learn or personalize prompt variable
selections.
Most of the code is to make the API simple and provide lots of defaults
and data wrangling that is needed to use Vowpal Wabbit, so that the user
of the chain doesn't have to worry about it.
- Dependencies:
[vowpal-wabbit-next](https://pypi.org/project/vowpal-wabbit-next/),
- sentence-transformers (already a dep)
- numpy (already a dep)
- tagging @ataymano who contributed to this chain
- Tag maintainer: @baskaryan
- Twitter handle: @olgavrou
Added example notebook and unit tests
Replace this entire comment with:
- **Description:** minor update to constructor to allow for
specification of "source"
- **Tag maintainer:** @baskaryan
- **Twitter handle:** @ofermend
# Description
Attempts to fix RedisCache for ChatGenerations using `loads` and `dumps`
used in SQLAlchemy cache by @hwchase17 . this is better than pickle
dump, because this won't execute any arbitrary code during
de-serialisation.
# Issues
#7722 & #8666
# Dependencies
None, but removes the warning introduced in #8041 by @baskaryan
Handle: @jaikanthjay46
- Description: Updated output parser for mrkl to remove any
hallucination actions after the final answer; this was encountered when
using Anthropic claude v2 for planning; reopening PR with updated unit
tests
- Issue: #10278
- Dependencies: N/A
- Twitter handle: @johnreynolds
Description: this PR changes the `ArcGISLoader` to set
`return_all_records` to `False` when `result_record_count` is provided
as a keyword argument. Previously, `return_all_records` was `True` by
default and this made the API ignore `result_record_count`.
Issue: `ArcGISLoader` would ignore `result_record_count` unless user
also passed `return_all_records=False`.
- **Description:** Fix the `PyMuPDFLoader` to accept `loader_kwargs`
from the document loader's `loader_kwargs` option. This provides more
flexibility in formatting the output from documents.
- **Issue:** The `loader_kwargs` is not passed into the `load` method
from the document loader, which limits configuration options.
- **Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I have restructured the code to ensure uniform handling of ImportError.
In place of previously used ValueError, I've adopted the standard
practice of raising ImportError with explanatory messages. This
modification enhances code readability and clarifies that any problems
stem from module importation.
### Description
Add instance anonymization - if `John Doe` will appear twice in the
text, it will be treated as the same entity.
The difference between `PresidioAnonymizer` and
`PresidioReversibleAnonymizer` is that only the second one has a
built-in memory, so it will remember anonymization mapping for multiple
texts:
```
>>> anonymizer = PresidioAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Brett Russell. Hi Brett Russell!'
```
```
>>> anonymizer = PresidioReversibleAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
```
### Twitter handle
@deepsense_ai / @MaksOpp
### Tag maintainer
@baskaryan @hwchase17 @hinthornw
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
PyPDF does not chunk at the character level to my understanding.
Description: PyPDF does not chunk at the character level, but instead
breaks up content by page. Fixup comment
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: There are cases when the output from the LLM comes fine
(i.e. function_call["arguments"] is a valid JSON object), but it does
not contain the key "actions". So I split the validation in 2 steps:
loading arguments as JSON and then checking for "actions" in it.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: Google Cloud Enterprise Search was renamed to Vertex AI
Search
-
https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-search-and-conversation-is-now-generally-available
- This PR updates the documentation and Retriever class to use the new
terminology.
- Changed retriever class from `GoogleCloudEnterpriseSearchRetriever` to
`GoogleVertexAISearchRetriever`
- Updated documentation to specify that `extractive_segments` requires
the new [Enterprise
edition](https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features#enterprise-features)
to be enabled.
- Fixed spelling errors in documentation.
- Change parameter for Retriever from `search_engine_id` to
`data_store_id`
- When this retriever was originally implemented, there was no
distinction between a data store and search engine, but now these have
been split.
- Fixed an issue blocking some users where the api_endpoint can't be set
### Description
When using Weaviate Self-Retrievers, certain common filter comparators
generated by user queries were unimplemented, resulting in errors. This
PR implements some of them. All linting and format commands have been
run and tests passed.
### Issue
#10474
### Dependencies
timestamp module
---------
Co-authored-by: Patrick Randell <prandell@deloitte.com.au>
**Description:** Previously if the access to Azure Cognitive Search was
not done via an API key, the default credential was called which doesn't
allow to use an interactive login. I simply added the option to use
"INTERACTIVE" as a key name, and this will launch a login window upon
initialization of the AzureSearch object.
I was hoping this would pick up numpy 1.26, which is required to support
the new Python 3.12 release, but it didn't. It seems that some
transitive dependency requirement on numpy is preventing that, and the
highest we can currently go is 1.24.x.
But to find this out required a 15min `poetry lock`, so I figured we
might as well upgrade the dependencies we can and hopefully make the
next dependency upgrade a bit smaller.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Consolidating to a single README for now, will be easier to maintain we
can differentiate between poetry and pip later. Does not seem critical.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
First version of CLI command to create a new langchain project template
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
## Description
Currently SQLAlchemy >=1.4.0 is a hard requirement. We are unable to run
`from langchain.vectorstores import FAISS` with SQLAlchemy <1.4.0 due to
top-level imports, even if we aren't even using parts of the library
that use SQLAlchemy. See Testing section for repro. Let's make it so
that langchain is still compatible with SQLAlchemy <1.4.0, especially if
we aren't using parts of langchain that require it.
The main conflict is that SQLAlchemy removed `declarative_base` from
`sqlalchemy.ext.declarative` in 1.4.0 and moved it to `sqlalchemy.orm`.
We can fix this by try-catching the import. This is the same fix as
applied in https://github.com/langchain-ai/langchain/pull/883.
(I see that there seems to be some refactoring going on about isolating
dependencies, e.g.
c87e9fb2ce,
so if this issue will be eventually fixed by isolating imports in
langchain.vectorstores that also works).
## Issue
I can't find a matching issue.
## Dependencies
No additional dependencies
## Maintainer
@hwchase17 since you reviewed
https://github.com/langchain-ai/langchain/pull/883
## Testing
I didn't add a test, but I manually tested this.
1. Current failure:
```
langchain==0.0.305
sqlalchemy==1.3.24
```
``` python
python -i
>>> from langchain.vectorstores import FAISS
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/pay/src/zoolander/vendor3/lib/python3.8/site-packages/langchain/vectorstores/__init__.py", line 58, in <module>
from langchain.vectorstores.pgembedding import PGEmbedding
File "/pay/src/zoolander/vendor3/lib/python3.8/site-packages/langchain/vectorstores/pgembedding.py", line 10, in <module>
from sqlalchemy.orm import Session, declarative_base, relationship
ImportError: cannot import name 'declarative_base' from 'sqlalchemy.orm' (/pay/src/zoolander/vendor3/lib/python3.8/site-packages/sqlalchemy/orm/__init__.py)
```
2. This fix:
```
langchain==<this PR>
sqlalchemy==1.3.24
```
``` python
python -i
>>> from langchain.vectorstores import FAISS
<succeeds>
```
- Make logs a dictionary keyed by run name (and counter for repeats)
- Ensure no output shows up in lc_serializable format
- Fix up repr for RunLog and RunLogPatch
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- default MessagesPlaceholder one to list of messages
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Removes human prompt prefix before system message for anthropic models
Bedrock anthropic api enforces that Human and Assistant messages must be
interleaved (cannot have same type twice in a row). We currently treat
System Messages as human messages when converting messages -> string
prompt. Our validation when using Bedrock/BedrockChat raises an error
when this happens. For ChatAnthropic we don't validate this so no error
is raised, but perhaps the behavior is still suboptimal
**Description:**
Added support for Cohere command model via Bedrock.
With this change it is now possible to use the `cohere.command-text-v14`
model via Bedrock API.
About Streaming: Cohere model outputs 2 additional chunks at the end of
the text being generated via streaming: a chunk containing the text
`<EOS_TOKEN>`, and a chunk indicating the end of the stream. In this
implementation I chose to ignore both chunks. An alternative solution
could be to replace `<EOS_TOKEN>` with `\n`
Tests: manually tested that the new model work with both
`llm.generate()` and `llm.stream()`.
Tested with `temperature`, `p` and `stop` parameters.
**Issue:** #11181
**Dependencies:** No new dependencies
**Tag maintainer:** @baskaryan
**Twitter handle:** mangelino
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Description: Similar in concept to the `MarkdownHeaderTextSplitter`, the
`HTMLHeaderTextSplitter` is a "structure-aware" chunker that splits text
at the element level and adds metadata for each header "relevant" to any
given chunk. It can return chunks element by element or combine elements
with the same metadata, with the objectives of (a) keeping related text
grouped (more or less) semantically and (b) preserving context-rich
information encoded in document structures. It can be used with other
text splitters as part of a chunking pipeline.
Dependency: lxml python package
Maintainer: @hwchase17
Twitter handle: @MartinZirulnik
---------
Co-authored-by: PresidioVantage <github@presidiovantage.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
I've refactored the code to ensure that ImportError is consistently
handled. Instead of using ValueError as before, I've now followed the
standard practice of raising ImportError along with clear and
informative error messages. This change enhances the code's clarity and
explicitly signifies that any problems are associated with module
imports.
Add device to GPT4All
- **Description:** GPT4All now supports GPU. This commit adds the option
to enable it.
- **Issue:** It closes
https://github.com/langchain-ai/langchain/issues/10486
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** Adds Kotlin language to `TextSplitter`
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
For external libraries that depend on `type_to_cls_dict`, adds a
workaround to continue using the old format.
Recommend people use `get_type_to_cls_dict()` instead and only resolve
the imports when they're used.
- **Description:** use term keyword according to the official python doc
glossary, see https://docs.python.org/3/glossary.html
- **Issue:** not applicable
- **Dependencies:** not applicable
- **Tag maintainer:** @hwchase17
- **Twitter handle:** vreyespue
The previous API of the `_execute()` function had a few rough edges that
this PR addresses:
- The `fetch` argument was type-hinted as being able to take any string,
but any string other than `"all"` or `"one"` would `raise ValueError`.
The new type hints explicitly declare that only those values are
supported.
- The return type was type-hinted as `Sequence` but using `fetch =
"one"` would actually return a single result item. This was incorrectly
suppressed using `# type: ignore`. We now always return a list.
- Using `fetch = "one"` would return a single item if data was found, or
an empty *list* if no data was found. This was confusing, and we now
always return a list to simplify.
- The return type was `Sequence[Any]` which was a bit difficult to use
since it wasn't clear what one could do with the returned rows. I'm
making the new type `Dict[str, Any]` that corresponds to the column
names and their values in the query.
I've updated the use of this method elsewhere in the file to match the
new behavior.
continuation of PR #8550
@hwchase17 please see and merge. And also close the PR #8550.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>