Commit Graph

2968 Commits

Author SHA1 Message Date
joelsprunger
3984f6604f
langchain: adds recursive json splitter (#17144)
- **Description:** This adds a recursive json splitter class to the
existing text_splitters as well as unit tests
- **Issue:** splitting text from structured data can cause issues if you
have a large nested json object and you split it as regular text you may
end up losing the structure of the json. To mitigate against this you
can split the nested json into large chunks and overlap them, but this
causes unnecessary text processing and there will still be times where
the nested json is so big that the chunks get separated from the parent
keys.

As an example you wouldn't want the following to be split in half:
```shell
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf',
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
Any llm processing the second chunk of text may not have the context of
val1, and val16 reducing accuracy. Embeddings will also lack this
context and this makes retrieval less accurate.

Instead you want it to be split into chunks that retain the json
structure.
```shell
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf'}}}
```
and
```shell
{'val1':{'val16':{
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
This recursive json text splitter does this. Values that contain a list
can be converted to dict first by using split(... convert_lists=True)
otherwise long lists will not be split and you may end up with chunks
larger than the max chunk.

In my testing large json objects could be split into small chunks with 
   Increased question answering accuracy
 The ability to split into smaller chunks meant retrieval queries can
use fewer tokens


- **Dependencies:** json import added to text_splitter.py, and random
added to the unit test
  - **Twitter handle:** @joelsprunger

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-08 13:45:34 -08:00
Schalkje
f0ada1a396
docs: Update quickstart.mdx - Fix 422 error in example with LangServe client code (#17163)
**Description:**: Fix 422 error in example with LangServe client code

httpx.HTTPStatusError: Client error '422 Unprocessable Entity' for url
'http://localhost:8000/agent/invoke'
2024-02-08 13:35:39 -08:00
Kartheek Yakkala
3a22157d92
docs: Added LCEL for alibabacloud and anyscale (#17252)
---------

Co-authored-by: KARTHEEK YAKKALA <kartheekyakkala@KARTHEEKs-Air.lan>
Co-authored-by: KARTHEEK YAKKALA <kartheekyakkala.se@gmail.com>
2024-02-08 13:18:09 -08:00
Neli Hateva
9bb5157a3d
langchain[patch], community[patch]: Fixes in the Ontotext GraphDB Graph and QA Chain (#17239)
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
2024-02-08 12:05:43 -08:00
Jorge Campo
88609565a3
docs: Fix typo in github.ipynb (#17259)
'agiven' -> 'a given'
2024-02-08 12:03:00 -08:00
Bagatur
00a09e1b71
docs: use PromptTemplate.from_template (#17218)
Ran
```python
import glob
import re

def update_prompt(x):
    return re.sub(
        r"(?P<start>\b)PromptTemplate\(template=(?P<template>.*), input_variables=(?:.*)\)",
        "\g<start>PromptTemplate.from_template(\g<template>)",
        x
    )


for fn in glob.glob("docs/**/*", recursive=True):
    try:
        content = open(fn).readlines()
    except:
        continue
    content = [update_prompt(l) for l in content]
    with open(fn, "w") as f:
        f.write("".join(content))
```
2024-02-07 19:52:42 -08:00
sana-google
7f55c95790
docs: add missing link to Quickstart (#17085)
Replace this entire comment with:
- **Description:** Added missing link for Quickstart in Model IO
documentation,
  - **Issue:** N/A,
  - **Dependencies:** N/A,
  - **Twitter handle:** N/A

<!--
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-07 22:26:10 -05:00
Eugene Yurtsev
780e84ae79
community[minor]: SQLDatabase Add fetch mode cursor, query parameters, query by selectable, expose execution options, and documentation (#17191)
- **Description:** Improve `SQLDatabase` adapter component to promote
code re-use, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16246#pullrequestreview-1846590962).
  - **Needed by:** GH-16246
  - **Addressed to:** @baskaryan, @cbornet 

## Details
- Add `cursor` fetch mode
- Accept SQL query parameters
- Accept both `str` and SQLAlchemy selectables as query expression
- Expose `execution_options`
- Documentation page (notebook) about `SQLDatabase` [^1]
See [About
SQLDatabase](https://github.com/langchain-ai/langchain/blob/c1c7b763/docs/docs/integrations/tools/sql_database.ipynb).

[^1]: Apparently there hasn't been any yet?

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
2024-02-07 22:23:43 -05:00
Leonid Ganeline
d903fa313e
docs: titles fix (#17206)
Several notebooks have Title != file name. That results in corrupted
sorting in Navbar (ToC).
- Fixed titles and file names.
- Changed text formats to the consistent form
- Redirected renamed files in the `Vercel.json`
2024-02-07 22:09:34 -05:00
Bagatur
aeb6b38901
docs: cleanup fleet integration (#17214)
Causing search issues
2024-02-07 17:18:48 -08:00
Leonid Ganeline
5ceaf784f3
docs Integraions/Components menu reordered (#17151)
This PR is opinionated.
- Moved `Embedding models` item to place after `LLMs` and `Chat model`,
so all items with models are together.
- Renamed `Text embedding models` to `Embedding models`. Now, it is
shorter and easier to read. `Text` is obvious from context. The same as
the `Text LLMs` vs. `LLMs` (we also have multi-modal LLMs).
2024-02-06 20:33:41 -08:00
Leonid Ganeline
0af0fc5d25
docs integraions/providers nav fix (#17148)
Issue: `Provides` page is presented as the index page (on the
`Providers` item) and as the `Providers/Providers` item. The latter
should not be in the menu. See the picture.

![image](https://github.com/langchain-ai/langchain/assets/2256422/6894023f-f13a-4f0d-8fe2-ed5b0ae2bdd2)
This PR fixes this.
2024-02-06 20:33:14 -08:00
Leonid Ganeline
bf55279d39
docs: tutorials update (#17132)
Added the course and the one-pager links
2024-02-06 20:30:30 -08:00
Erick Friis
d397721a34
docs: format (#17143) 2024-02-06 16:32:53 -08:00
Arno Schutijzer
863f96b2e0
docs: fix typo in ollama notebook (#17127)
- **Description:** typo fix in ollama notebook
2024-02-06 16:54:40 -05:00
Leonid Ganeline
42c812a549
API References sorted Partner libs menu (#17130)
The `Partner libs` menu is not sorted. Now it is long enough, and items
should be sorted to simplify a package search.
- Sorted items in the `Partner libs` menu
2024-02-06 16:49:23 -05:00
Junyoung Park
1ed73f1992
community[minor]: Add SelfQueryRetriever support to PGVector (#16991)
- **Description:** Add SelfQueryRetriever support to PGVector
  - **Issue:** -
  - **Dependencies:** -
  - **Twitter handle:** -

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-06 10:50:50 -08:00
Frank
ef082c77b1
community[minor]: add github file loader to load any github file content b… (#15305)
### Description
support load any github file content based on file extension.  

Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.

### Twitter handle
my twitter: @shufanhaotop

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-06 09:42:33 -08:00
老阿張
ac662b3698
docs: Fix typo in amadeus.ipynb (#16916)
Description: "enviornment should be  environment"? 🤔
Issue: Typo
Dependencies: Nope
Twitter handle: laoazhang
2024-02-06 09:42:05 -08:00
Jan de Boer
2d8015554c
docs: Link to Brave Website added (#16958)
**Description:** Link to the Brave Website added to the
`brave-search.ipynb` notebook.
This notebook is shown in the docs as an example for the brave tool.

**Issue:** There was to reference on where / how to get an api key
 
**Dependencies:** none
 
**Twitter handle:** not for this one :)
2024-02-05 18:29:16 -08:00
os1ma
fd88e0f800
docs: update StreamlitCallbackHandler example (#16970)
- **Description:** docs: update StreamlitCallbackHandler example.
  - **Issue:** None
  - **Dependencies:** None

I have updated the example for StreamlitCallbackHandler in the
documentation bellow.
https://python.langchain.com/docs/integrations/callbacks/streamlit

Previously, the example used `initialize_agent`, which has been
deprecated, so I've updated it to use `create_react_agent` instead. Many
langchain users are likely searching examples of combining
`create_react_agent` or `openai_tools_agent_chain` with
StreamlitCallbackHandler. I'm sure this update will be really helpful
for them!

Unfortunately, writing unit tests for this example is difficult, so I
have not written any tests. I have run this code in a standalone Python
script file and ensured it runs correctly.
2024-02-05 18:20:59 -08:00
Marc Mahe
f08a9139d2
docs: update mistral docs for version 0.1+ (#17011)
**Description:**
Updated integration page for mistralai.
2024-02-05 18:03:12 -08:00
Ikko Eltociear Ashimine
5f5f5acbc5
docs: fix typo in dspy.ipynb (#16996)
langugage -> language
2024-02-05 17:31:06 -08:00
Eugene Yurtsev
609ea019b2
docs: Update streaming documentation (#17066)
Updating streaming documentation following fix of JSON parser for
streaming json.
2024-02-05 17:24:46 -08:00
Bagatur
d8f41d0521
docs: add youtube link (#17065) 2024-02-05 16:12:56 -08:00
Harrison Chase
83fbf0e11a
docs: add structured tools howto to agents (#15772)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 15:53:01 -08:00
Alex Boury
334b6ebdf3
community[minor]: Breebs docs retriever (#16578)
- **Description:** Implementation of breeb retriever with integration
tests ->
libs/community/tests/integration_tests/retrievers/test_breebs.py and
documentation (notebook) ->
docs/docs/integrations/retrievers/breebs.ipynb.
  - **Dependencies:** None
2024-02-05 15:51:08 -08:00
Nova Kwok
eb7b05885f
docs: Fix typo in quickstart.ipynb (#16859)
- **Description:** "load HTML **form** web URLs" should be "load HTML
**from** web URLs"? 🤔
  - **Issue:** Typo
  - **Dependencies:** Nope
  - **Twitter handle:** n0vad3v
2024-02-05 15:50:11 -08:00
Shorthills AI
cf0b29b6d2
docs: fixing a minor grammatical mistake (#16931) 2024-02-05 15:49:47 -08:00
Shivani Modi
fcb875629d
docs: Updating documentation for Konko provider (#16953)
- **Description:** A small update to the Konko provider documentation.

---------

Co-authored-by: Shivani Modi <shivanimodi@Shivanis-MacBook-Pro.local>
2024-02-05 15:49:13 -08:00
Benjamin Muskalla
973ba0d84b
docs: Fix Copilot name (#16956)
The official name is "GitHub Copilot"
2024-02-05 15:48:47 -08:00
IMRAN KHAN
4b17699818
docs: add 2 more tutorials to the list in youtube.mdx (#16998)
- **Description:** add 2 more tutorials to the list in youtube.mdx, 
  - **Twitter handle:** EhThing
2024-02-05 15:48:34 -08:00
Supreet Takkar
ae33979813
community[patch]: Allow adding ARNs as model_id to support Amazon Bedrock custom models (#16800)
- **Description:** Adds an additional class variable to `BedrockBase`
called `provider` that allows sending a model provider such as amazon,
cohere, ai21, etc.
Up until now, the model provider is extracted from the `model_id` using
the first part before the `.`, such as `amazon` for
`amazon.titan-text-express-v1` (see [supported list of Bedrock model IDs
here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)).
But for custom Bedrock models where the ARN of the provisioned
throughput must be supplied, the `model_id` is like
`arn:aws:bedrock:...` so the `model_id` cannot be extracted from this. A
model `provider` is required by the LangChain Bedrock class to perform
model-based processing. To allow the same processing to be performed for
custom-models of a specific base model type, passing this `provider`
argument can help solve the issues.
The alternative considered here was the use of
`provider.arn:aws:bedrock:...` which then requires ARN to be extracted
and passed separately when invoking the model. The proposed solution
here is simpler and also does not cause issues for current models
already using the Bedrock class.
  - **Issue:** N/A
  - **Dependencies:** N/A

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-02-05 14:28:03 -08:00
Vadim Kudlay
75b6fa1134
nvidia-ai-endpoints[patch]: Support User-Agent metadata and minor fixes. (#16942)
- **Description:** Several meta/usability updates, including User-Agent.
  - **Issue:** 
- User-Agent metadata for tracking connector engagement. @milesial
please check and advise.
- Better error messages. Tries harder to find a request ID. @milesial
requested.
- Client-side image resizing for multimodal models. Hope to upgrade to
Assets API solution in around a month.
- `client.payload_fn` allows you to modify payload before network
request. Use-case shown in doc notebook for kosmos_2.
- `client.last_inputs` put back in to allow for advanced
support/debugging.
  - **Dependencies:** 
- Attempts to pull in PIL for image resizing. If not installed, prints
out "please install" message, warns it might fail, and then tries
without resizing. We are waiting on a more permanent solution.

For LC viz: @hinthornw 
For NV viz: @fciannella @milesial @vinaybagade

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-05 12:24:53 -08:00
Erick Friis
6ffd5b15bc
pinecone: init pkg (#16556)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-02-05 11:55:01 -08:00
Erick Friis
db6af21395
docs: exa contents (#16555) 2024-02-05 11:15:06 -08:00
Nicolas Grenié
54fcd476bb
docs: Update ollama examples with new community libraries (#17007)
- **Description:** Updating one line code sample for Ollama with new
**langchain_community** package
  - **Issue:**
  - **Dependencies:** none
  - **Twitter handle:**  @picsoung
2024-02-04 15:13:29 -08:00
Erick Friis
afdd636999
docs: partner packages (#16960) 2024-02-02 15:12:21 -08:00
Ashley Xu
66adb95284
docs: BigQuery Vector Search went public review and updated docs (#16896)
Update the docs for BigQuery Vector Search
2024-02-02 10:26:44 -08:00
Massimiliano Pronesti
71f9ea33b6
docs: add quantization to vllm and update API (#16950)
- **Description:** Update vLLM docs to include instructions on how to
use quantized models, as well as to replace the deprecated methods.
2024-02-02 10:24:49 -08:00
Radhakrishnan
3b0fa9079d
docs: Updated integration doc for aleph alpha (#16844)
Description: Updated doc for llm/aleph_alpha with new functions: invoke.
Changed structure of the document to match the required one.
Issue: https://github.com/langchain-ai/langchain/issues/15664
Dependencies: None
Twitter handle: None

---------

Co-authored-by: Radhakrishnan Iyer <radhakrishnan.iyer@ibm.com>
2024-02-02 09:28:06 -08:00
Erick Friis
6fc2835255
docs: fix broken links (#16855) 2024-02-01 17:29:38 -08:00
Erick Friis
b1a847366c
community: revert SQL Stores (#16912)
This reverts commit cfc225ecb3.


https://github.com/langchain-ai/langchain/pull/15909#issuecomment-1922418097

These will have existed in langchain-community 0.0.16 and 0.0.17.
2024-02-01 16:37:40 -08:00
akira wu
f7c709b40e
doc: fix typo in message_history.ipynb (#16877)
- **Description:** just fixed a small typo in the documentation in the
`expression_language/how_to/message_history` session
[here](https://python.langchain.com/docs/expression_language/how_to/message_history)
2024-02-01 13:30:29 -08:00
Shorthills AI
0bca0f4c24
Docs: Fixed grammatical mistake (#16858)
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: Sanskar Tanwar <142409040+SanskarTanwarShorthillsAI@users.noreply.github.com>
Co-authored-by: UpneetShorthillsAI <144228282+UpneetShorthillsAI@users.noreply.github.com>
Co-authored-by: HarshGuptaShorthillsAI <144897987+HarshGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: AdityaKalraShorthillsAI <143726711+AdityaKalraShorthillsAI@users.noreply.github.com>
Co-authored-by: SakshiShorthillsAI <144228183+SakshiShorthillsAI@users.noreply.github.com>
Co-authored-by: AashiGuptaShorthillsAI <144897730+AashiGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: ShamshadAhmedShorthillsAI <144897733+ShamshadAhmedShorthillsAI@users.noreply.github.com>
Co-authored-by: ManpreetShorthillsAI <142380984+ManpreetShorthillsAI@users.noreply.github.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
Co-authored-by: BajrangBishnoiShorthillsAi <148060486+BajrangBishnoiShorthillsAi@users.noreply.github.com>
2024-02-01 11:28:15 -08:00
Harel Gal
93366861c7
docs: Indicated Guardrails for Amazon Bedrock preview status (#16769)
Added notification about limited preview status of Guardrails for Amazon
Bedrock feature to code example.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-02-01 10:41:48 -08:00
Erick Friis
17e886388b
nomic: init pkg (#16853)
Co-authored-by: Lance Martin <lance@langchain.dev>
2024-01-31 16:46:35 -08:00
Bagatur
b0347f3e2b
docs: add csv use case (#16756) 2024-01-30 09:39:46 -08:00
Jacob Lee
c6724a39f4
Fix rephrase step in chatbot use case (#16763) 2024-01-29 23:25:25 -08:00
Bob Lin
546b757303
community: Add ChatGLM3 (#15265)
Add [ChatGLM3](https://github.com/THUDM/ChatGLM3) and updated
[chatglm.ipynb](https://python.langchain.com/docs/integrations/llms/chatglm)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-29 20:30:52 -08:00