This PR updates the `TF-IDF.ipynb` documentation to reflect the new
import path for TFIDFRetriever in the langchain-community package. The
previous path, `from langchain.retrievers import TFIDFRetriever`, has
been updated to `from langchain_community.retrievers import
TFIDFRetriever` to align with the latest changes in the langchain
library.
according to https://youtu.be/rZus0JtRqXE?si=aFo1JTDnu5kSEiEN&t=678 by
@efriis
- **Description:** Seems the requirements for tool names have changed
and spaces are no longer allowed. Changed the tool name from Google
Search to google_search in the notebook
- **Issue:** n/a
- **Dependencies:** none
- **Twitter handle:** @mesirii
**Description**
Make some functions work with Milvus:
1. get_ids: Get primary keys by field in the metadata
2. delete: Delete one or more entities by ids
3. upsert: Update/Insert one or more entities
**Issue**
None
**Dependencies**
None
**Tag maintainer:**
@hwchase17
**Twitter handle:**
None
---------
Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
## Summary
This PR upgrades LangChain's Ruff configuration in preparation for
Ruff's v0.2.0 release. (The changes are compatible with Ruff v0.1.5,
which LangChain uses today.) Specifically, we're now warning when
linter-only options are specified under `[tool.ruff]` instead of
`[tool.ruff.lint]`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Issue:** Issue with model argument support (been there for a while
actually):
- Non-specially-handled arguments like temperature don't work when
passed through constructor.
- Such arguments DO work quite well with `bind`, but also do not abide
by field requirements.
- Since initial push, server-side error messages have gotten better and
v0.0.2 raises better exceptions. So maybe it's better to let server-side
handle such issues?
- **Description:**
- Removed ChatNVIDIA's argument fields in favor of
`model_kwargs`/`model_kws` arguments which aggregates constructor kwargs
(from constructor pathway) and merges them with call kwargs (bind
pathway).
- Shuffled a few functions from `_NVIDIAClient` to `ChatNVIDIA` to
streamline construction for future integrations.
- Minor/Optional: Old services didn't have stop support, so client-side
stopping was implemented. Now do both.
- **Any Breaking Changes:** Minor breaking changes if you strongly rely
on chat_model.temperature, etc. This is captured by
chat_model.model_kwargs.
PR passes tests and example notebooks and example testing. Still gonna
chat with some people, so leaving as draft for now.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
The Integrations `Toolkits` menu was named as [`Agents and
toolkits`](https://python.langchain.com/docs/integrations/toolkits).
This name has a historical reason that is not correct anymore. Now this
menu is all about community `Toolkits`. There is a separate menu for
[Agents](https://python.langchain.com/docs/modules/agents/). Also Agents
are officially not part of Integrations (Community package) but part of
LangChain package.
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description: changes to you.com files**
- general cleanup
- adds community/utilities/you.py, moving bulk of code from retriever ->
utility
- removes `snippet` as endpoint
- adds `news` as endpoint
- adds more tests
<s>**Description: update community MAKE file**
- adds `integration_tests`
- adds `coverage`</s>
- **Issue:** the issue # it fixes if applicable,
- [For New Contributors: Update Integration
Documentation](https://github.com/langchain-ai/langchain/issues/15664#issuecomment-1920099868)
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** This adds a recursive json splitter class to the
existing text_splitters as well as unit tests
- **Issue:** splitting text from structured data can cause issues if you
have a large nested json object and you split it as regular text you may
end up losing the structure of the json. To mitigate against this you
can split the nested json into large chunks and overlap them, but this
causes unnecessary text processing and there will still be times where
the nested json is so big that the chunks get separated from the parent
keys.
As an example you wouldn't want the following to be split in half:
```shell
{'val0': 'DFWeNdWhapbR',
'val1': {'val10': 'QdJo',
'val11': 'FWSDVFHClW',
'val12': 'bkVnXMMlTiQh',
'val13': 'tdDMKRrOY',
'val14': 'zybPALvL',
'val15': 'JMzGMNH',
'val16': {'val160': 'qLuLKusFw',
'val161': 'DGuotLh',
'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
'val163': 'YlHHDrN',
'val164': 'CtzsxlGBZKf',
'val165': 'bXzhcrWLmBFp',
'val166': 'zZAqC',
'val167': 'ZtyWno',
'val168': 'nQQZRsLnaBhb',
'val169': 'gSpMbJwA'},
'val17': 'JhgiyF',
'val18': 'aJaqjUSFFrI',
'val19': 'glqNSvoyxdg'}}
```
Any llm processing the second chunk of text may not have the context of
val1, and val16 reducing accuracy. Embeddings will also lack this
context and this makes retrieval less accurate.
Instead you want it to be split into chunks that retain the json
structure.
```shell
{'val0': 'DFWeNdWhapbR',
'val1': {'val10': 'QdJo',
'val11': 'FWSDVFHClW',
'val12': 'bkVnXMMlTiQh',
'val13': 'tdDMKRrOY',
'val14': 'zybPALvL',
'val15': 'JMzGMNH',
'val16': {'val160': 'qLuLKusFw',
'val161': 'DGuotLh',
'val162': 'KztlcSBropT',
'val163': 'YlHHDrN',
'val164': 'CtzsxlGBZKf'}}}
```
and
```shell
{'val1':{'val16':{
'val165': 'bXzhcrWLmBFp',
'val166': 'zZAqC',
'val167': 'ZtyWno',
'val168': 'nQQZRsLnaBhb',
'val169': 'gSpMbJwA'},
'val17': 'JhgiyF',
'val18': 'aJaqjUSFFrI',
'val19': 'glqNSvoyxdg'}}
```
This recursive json text splitter does this. Values that contain a list
can be converted to dict first by using split(... convert_lists=True)
otherwise long lists will not be split and you may end up with chunks
larger than the max chunk.
In my testing large json objects could be split into small chunks with
✅ Increased question answering accuracy
✅ The ability to split into smaller chunks meant retrieval queries can
use fewer tokens
- **Dependencies:** json import added to text_splitter.py, and random
added to the unit test
- **Twitter handle:** @joelsprunger
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:**: Fix 422 error in example with LangServe client code
httpx.HTTPStatusError: Client error '422 Unprocessable Entity' for url
'http://localhost:8000/agent/invoke'
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @OntotextGraphDB
Ran
```python
import glob
import re
def update_prompt(x):
return re.sub(
r"(?P<start>\b)PromptTemplate\(template=(?P<template>.*), input_variables=(?:.*)\)",
"\g<start>PromptTemplate.from_template(\g<template>)",
x
)
for fn in glob.glob("docs/**/*", recursive=True):
try:
content = open(fn).readlines()
except:
continue
content = [update_prompt(l) for l in content]
with open(fn, "w") as f:
f.write("".join(content))
```
Replace this entire comment with:
- **Description:** Added missing link for Quickstart in Model IO
documentation,
- **Issue:** N/A,
- **Dependencies:** N/A,
- **Twitter handle:** N/A
<!--
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Several notebooks have Title != file name. That results in corrupted
sorting in Navbar (ToC).
- Fixed titles and file names.
- Changed text formats to the consistent form
- Redirected renamed files in the `Vercel.json`
This PR is opinionated.
- Moved `Embedding models` item to place after `LLMs` and `Chat model`,
so all items with models are together.
- Renamed `Text embedding models` to `Embedding models`. Now, it is
shorter and easier to read. `Text` is obvious from context. The same as
the `Text LLMs` vs. `LLMs` (we also have multi-modal LLMs).
The `Partner libs` menu is not sorted. Now it is long enough, and items
should be sorted to simplify a package search.
- Sorted items in the `Partner libs` menu
### Description
support load any github file content based on file extension.
Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.
### Twitter handle
my twitter: @shufanhaotop
---------
Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Link to the Brave Website added to the
`brave-search.ipynb` notebook.
This notebook is shown in the docs as an example for the brave tool.
**Issue:** There was to reference on where / how to get an api key
**Dependencies:** none
**Twitter handle:** not for this one :)
- **Description:** docs: update StreamlitCallbackHandler example.
- **Issue:** None
- **Dependencies:** None
I have updated the example for StreamlitCallbackHandler in the
documentation bellow.
https://python.langchain.com/docs/integrations/callbacks/streamlit
Previously, the example used `initialize_agent`, which has been
deprecated, so I've updated it to use `create_react_agent` instead. Many
langchain users are likely searching examples of combining
`create_react_agent` or `openai_tools_agent_chain` with
StreamlitCallbackHandler. I'm sure this update will be really helpful
for them!
Unfortunately, writing unit tests for this example is difficult, so I
have not written any tests. I have run this code in a standalone Python
script file and ensured it runs correctly.
- **Description:** "load HTML **form** web URLs" should be "load HTML
**from** web URLs"? 🤔
- **Issue:** Typo
- **Dependencies:** Nope
- **Twitter handle:** n0vad3v
- **Description:** Adds an additional class variable to `BedrockBase`
called `provider` that allows sending a model provider such as amazon,
cohere, ai21, etc.
Up until now, the model provider is extracted from the `model_id` using
the first part before the `.`, such as `amazon` for
`amazon.titan-text-express-v1` (see [supported list of Bedrock model IDs
here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)).
But for custom Bedrock models where the ARN of the provisioned
throughput must be supplied, the `model_id` is like
`arn:aws:bedrock:...` so the `model_id` cannot be extracted from this. A
model `provider` is required by the LangChain Bedrock class to perform
model-based processing. To allow the same processing to be performed for
custom-models of a specific base model type, passing this `provider`
argument can help solve the issues.
The alternative considered here was the use of
`provider.arn:aws:bedrock:...` which then requires ARN to be extracted
and passed separately when invoking the model. The proposed solution
here is simpler and also does not cause issues for current models
already using the Bedrock class.
- **Issue:** N/A
- **Dependencies:** N/A
---------
Co-authored-by: Piyush Jain <piyushjain@duck.com>
- **Description:** Several meta/usability updates, including User-Agent.
- **Issue:**
- User-Agent metadata for tracking connector engagement. @milesial
please check and advise.
- Better error messages. Tries harder to find a request ID. @milesial
requested.
- Client-side image resizing for multimodal models. Hope to upgrade to
Assets API solution in around a month.
- `client.payload_fn` allows you to modify payload before network
request. Use-case shown in doc notebook for kosmos_2.
- `client.last_inputs` put back in to allow for advanced
support/debugging.
- **Dependencies:**
- Attempts to pull in PIL for image resizing. If not installed, prints
out "please install" message, warns it might fail, and then tries
without resizing. We are waiting on a more permanent solution.
For LC viz: @hinthornw
For NV viz: @fciannella @milesial @vinaybagade
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** Updating one line code sample for Ollama with new
**langchain_community** package
- **Issue:**
- **Dependencies:** none
- **Twitter handle:** @picsoung