langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
joelsprunger	3984f6604f	langchain: adds recursive json splitter (#17144 )
- Description: This adds a recursive JSON splitter class to the existing text_splitters, as well as unit tests.
- Issue: Splitting text from structured data can cause issues. If you have a large nested JSON object and split it as regular text, you may lose the structure of the JSON. To mitigate this you can split the nested JSON into large chunks and overlap them, but this causes unnecessary text processing, and there will still be times when the nested JSON is so big that the chunks get separated from their parent keys. As an example, you wouldn't want the following to be split in half:
```shell
{'val0': 'DFWeNdWhapbR', 'val1': {'val10': 'QdJo', 'val11': 'FWSDVFHClW', 'val12': 'bkVnXMMlTiQh', 'val13': 'tdDMKRrOY', 'val14': 'zybPALvL', 'val15': 'JMzGMNH', 'val16': {'val160': 'qLuLKusFw', 'val161': 'DGuotLh', 'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
'val163': 'YlHHDrN', 'val164': 'CtzsxlGBZKf', 'val165': 'bXzhcrWLmBFp', 'val166': 'zZAqC', 'val167': 'ZtyWno', 'val168': 'nQQZRsLnaBhb', 'val169': 'gSpMbJwA'}, 'val17': 'JhgiyF', 'val18': 'aJaqjUSFFrI', 'val19': 'glqNSvoyxdg'}}
```
Any LLM processing the second chunk of text may not have the context of val1 and val16, reducing accuracy. Embeddings will also lack this context, which makes retrieval less accurate. Instead you want it to be split into chunks that retain the JSON structure:
```shell
{'val0': 'DFWeNdWhapbR', 'val1': {'val10': 'QdJo', 'val11': 'FWSDVFHClW', 'val12': 'bkVnXMMlTiQh', 'val13': 'tdDMKRrOY', 'val14': 'zybPALvL', 'val15': 'JMzGMNH', 'val16': {'val160': 'qLuLKusFw', 'val161': 'DGuotLh', 'val162': 'KztlcSBropT', 'val163': 'YlHHDrN', 'val164': 'CtzsxlGBZKf'}}}
```
and
```shell
{'val1':{'val16':{ 'val165': 'bXzhcrWLmBFp', 'val166': 'zZAqC', 'val167': 'ZtyWno', 'val168': 'nQQZRsLnaBhb', 'val169': 'gSpMbJwA'}, 'val17': 'JhgiyF', 'val18': 'aJaqjUSFFrI', 'val19': 'glqNSvoyxdg'}}
```
This recursive JSON text splitter does this. Values that contain a list can be converted to a dict first by using split(... convert_lists=True); otherwise long lists will not be split and you may end up with chunks larger than the max chunk size. In my testing, large JSON objects could be split into small chunks with:
✅ Increased question answering accuracy
✅ The ability to split into smaller chunks meant retrieval queries can use fewer tokens
- Dependencies: json import added to text_splitter.py, and random added to the unit test
- Twitter handle: @joelsprunger
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-02-08 13:45:34 -08:00
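The idea described in the recursive json splitter commit above (split a nested object into chunks that each keep their chain of parent keys) can be sketched roughly as follows. This is a minimal illustration, not the code added in the PR: `split_json`, `nest`, `merge`, and `max_chunk_size` are hypothetical names, size is measured as the length of the serialized JSON, and lists are treated as unsplittable leaf values.

```python
import json
from typing import Any


def nest(path: list[str], value: Any) -> dict:
    """Wrap a value in its chain of parent keys: (['a', 'b'], 1) -> {'a': {'b': 1}}."""
    for key in reversed(path):
        value = {key: value}
    return value


def merge(target: dict, source: dict) -> None:
    """Recursively merge `source` into `target` in place."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value


def split_json(data: dict, max_chunk_size: int = 200) -> list[dict]:
    """Split a nested dict into chunks whose leaves keep their parent keys.

    Hypothetical sketch: the size check is conservative (it adds the size of
    the wrapped leaf to the size of the current chunk), so a chunk only
    exceeds max_chunk_size when a single leaf is too large on its own.
    """
    chunks: list[dict] = [{}]

    def walk(node: dict, path: list[str]) -> None:
        for key, value in node.items():
            if isinstance(value, dict) and value:
                # Non-empty dict: descend, remembering the parent keys.
                walk(value, path + [key])
                continue
            leaf = nest(path + [key], value)
            too_big = len(json.dumps(chunks[-1])) + len(json.dumps(leaf)) > max_chunk_size
            if chunks[-1] and too_big:
                chunks.append({})  # start a new chunk
            merge(chunks[-1], leaf)

    walk(data, [])
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    example = {"val0": "x", "val1": {"val16": {f"val16{i}": "y" * 10 for i in range(10)}}}
    for chunk in split_json(example, max_chunk_size=120):
        print(json.dumps(chunk))
```

Because every leaf is re-wrapped in its full key path before it is merged into a chunk, a chunk that starts in the middle of `val16` still begins with `{"val1": {"val16": ...}}`.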
Schalkje	f0ada1a396	docs: Update quickstart.mdx - Fix 422 error in example with LangServe client code (#17163 ) Description: Fix 422 error in example with LangServe client code: httpx.HTTPStatusError: Client error '422 Unprocessable Entity' for url 'http://localhost:8000/agent/invoke'	2024-02-08 13:35:39 -08:00
Kartheek Yakkala	3a22157d92	docs: Added LCEL for alibabacloud and anyscale (#17252 ) --------- Co-authored-by: KARTHEEK YAKKALA <kartheekyakkala@KARTHEEKs-Air.lan> Co-authored-by: KARTHEEK YAKKALA <kartheekyakkala.se@gmail.com>	2024-02-08 13:18:09 -08:00
Neli Hateva	9bb5157a3d	langchain[patch], community[patch]: Fixes in the Ontotext GraphDB Graph and QA Chain (#17239 ) - Description: Fixes in the Ontotext GraphDB Graph and QA Chain related to the error handling in case of invalid SPARQL queries, for which `prepareQuery` doesn't throw an exception, but the server returns 400 and the query is indeed invalid - Issue: N/A - Dependencies: N/A - Twitter handle: @OntotextGraphDB	2024-02-08 12:05:43 -08:00
Jorge Campo	88609565a3	docs: Fix typo in github.ipynb (#17259 ) 'agiven' -> 'a given'	2024-02-08 12:03:00 -08:00
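Bagatur	00a09e1b71	docs: use PromptTemplate.from_template (#17218 ) Ran
```python
import glob
import re


def update_prompt(x):
    # Rewrite PromptTemplate(template=..., input_variables=...) as
    # PromptTemplate.from_template(...), keeping the template argument.
    return re.sub(
        r"(?P<start>\b)PromptTemplate\(template=(?P<template>.*), input_variables=(?:.*)\)",
        r"\g<start>PromptTemplate.from_template(\g<template>)",
        x,
    )


# Walk everything under docs/ and rewrite each readable text file in place.
for fn in glob.glob("docs/**/*", recursive=True):
    try:
        with open(fn) as f:
            content = f.readlines()
    except (OSError, UnicodeDecodeError):
        # Skip directories and files that cannot be read as text.
        continue
    content = [update_prompt(line) for line in content]
    with open(fn, "w") as f:
        f.write("".join(content))
```
	2024-02-07 19:52:42 -08:00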
sana-google	7f55c95790	docs: add missing link to Quickstart (#17085 ) - Description: Added missing link for Quickstart in Model IO documentation, - Issue: N/A, - Dependencies: N/A, - Twitter handle: N/A Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-02-07 22:26:10 -05:00
Eugene Yurtsev	780e84ae79	community[minor]: SQLDatabase Add fetch mode `cursor`, query parameters, query by selectable, expose execution options, and documentation (#17191 ) - Description: Improve `SQLDatabase` adapter component to promote code re-use, see [suggestion](https://github.com/langchain-ai/langchain/pull/16246#pullrequestreview-1846590962). - Needed by: GH-16246 - Addressed to: @baskaryan, @cbornet ## Details - Add `cursor` fetch mode - Accept SQL query parameters - Accept both `str` and SQLAlchemy selectables as query expression - Expose `execution_options` - Documentation page (notebook) about `SQLDatabase` [^1] See [About SQLDatabase](https://github.com/langchain-ai/langchain/blob/c1c7b763/docs/docs/integrations/tools/sql_database.ipynb). [^1]: Apparently there hasn't been any yet? --------- Co-authored-by: Andreas Motl <andreas.motl@crate.io>	2024-02-07 22:23:43 -05:00
Leonid Ganeline	d903fa313e	docs: titles fix (#17206 ) Several notebooks have Title != file name. That results in corrupted sorting in Navbar (ToC). - Fixed titles and file names. - Changed text formats to the consistent form - Redirected renamed files in the `Vercel.json`	2024-02-07 22:09:34 -05:00
Bagatur	aeb6b38901	docs: cleanup fleet integration (#17214 ) Causing search issues	2024-02-07 17:18:48 -08:00
Leonid Ganeline	5ceaf784f3	docs `Integrations/Components` menu reordered (#17151 ) This PR is opinionated. - Moved `Embedding models` item to place after `LLMs` and `Chat model`, so all items with models are together. - Renamed `Text embedding models` to `Embedding models`. Now, it is shorter and easier to read. `Text` is obvious from context. The same as the `Text LLMs` vs. `LLMs` (we also have multi-modal LLMs).	2024-02-06 20:33:41 -08:00
Leonid Ganeline	0af0fc5d25	docs `integrations/providers` nav fix (#17148 ) Issue: `Providers` page is presented as the index page (on the `Providers` item) and as the `Providers/Providers` item. The latter should not be in the menu. See the picture. ![image](https://github.com/langchain-ai/langchain/assets/2256422/6894023f-f13a-4f0d-8fe2-ed5b0ae2bdd2) This PR fixes this.	2024-02-06 20:33:14 -08:00
Leonid Ganeline	bf55279d39	docs: tutorials update (#17132 ) Added the course and the one-pager links	2024-02-06 20:30:30 -08:00
Erick Friis	d397721a34	docs: format (#17143 )	2024-02-06 16:32:53 -08:00
Arno Schutijzer	863f96b2e0	docs: fix typo in ollama notebook (#17127 ) - Description: typo fix in ollama notebook	2024-02-06 16:54:40 -05:00
Leonid Ganeline	42c812a549	API References sorted `Partner libs` menu (#17130 ) The `Partner libs` menu is not sorted. Now it is long enough, and items should be sorted to simplify a package search. - Sorted items in the `Partner libs` menu	2024-02-06 16:49:23 -05:00
Junyoung Park	1ed73f1992	community[minor]: Add SelfQueryRetriever support to PGVector (#16991 ) - Description: Add SelfQueryRetriever support to PGVector - Issue: - - Dependencies: - - Twitter handle: - --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-02-06 10:50:50 -08:00
Frank	ef082c77b1	community[minor]: add github file loader to load any github file content b… (#15305 ) ### Description Support loading any GitHub file content based on file extension. Why not use the [git loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)? The git loader clones the whole repo even if you are only interested in some of its files, which is too heavy. This GithubFileLoader downloads only the files you are interested in. ### Twitter handle my twitter: @shufanhaotop --------- Co-authored-by: Hao Fan <h_fan@apple.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-02-06 09:42:33 -08:00
老阿張	ac662b3698	docs: Fix typo in amadeus.ipynb (#16916 ) Description: "enviornment should be environment"? 🤔 Issue: Typo Dependencies: Nope Twitter handle: laoazhang	2024-02-06 09:42:05 -08:00
Jan de Boer	2d8015554c	docs: Link to Brave Website added (#16958 ) Description: Link to the Brave Website added to the `brave-search.ipynb` notebook. This notebook is shown in the docs as an example for the brave tool. Issue: There was no reference on where / how to get an api key Dependencies: none Twitter handle: not for this one :)	2024-02-05 18:29:16 -08:00
os1ma	fd88e0f800	docs: update StreamlitCallbackHandler example (#16970 ) - Description: docs: update StreamlitCallbackHandler example. - Issue: None - Dependencies: None I have updated the example for StreamlitCallbackHandler in the documentation below. https://python.langchain.com/docs/integrations/callbacks/streamlit Previously, the example used `initialize_agent`, which has been deprecated, so I've updated it to use `create_react_agent` instead. Many langchain users are likely searching for examples of combining `create_react_agent` or `openai_tools_agent_chain` with StreamlitCallbackHandler. I'm sure this update will be really helpful for them! Unfortunately, writing unit tests for this example is difficult, so I have not written any tests. I have run this code in a standalone Python script file and ensured it runs correctly.	2024-02-05 18:20:59 -08:00
Marc Mahe	f08a9139d2	docs: update mistral docs for version 0.1+ (#17011 ) Description: Updated integration page for mistralai.	2024-02-05 18:03:12 -08:00
Ikko Eltociear Ashimine	5f5f5acbc5	docs: fix typo in dspy.ipynb (#16996 ) langugage -> language	2024-02-05 17:31:06 -08:00
Eugene Yurtsev	609ea019b2	docs: Update streaming documentation (#17066 ) Updating streaming documentation following fix of JSON parser for streaming json.	2024-02-05 17:24:46 -08:00
Bagatur	d8f41d0521	docs: add youtube link (#17065 )	2024-02-05 16:12:56 -08:00
Harrison Chase	83fbf0e11a	docs: add structured tools howto to agents (#15772 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-02-05 15:53:01 -08:00
Alex Boury	334b6ebdf3	community[minor]: Breebs docs retriever (#16578 ) - Description: Implementation of breeb retriever with integration tests -> libs/community/tests/integration_tests/retrievers/test_breebs.py and documentation (notebook) -> docs/docs/integrations/retrievers/breebs.ipynb. - Dependencies: None	2024-02-05 15:51:08 -08:00
Nova Kwok	eb7b05885f	docs: Fix typo in quickstart.ipynb (#16859 ) - Description: "load HTML form web URLs" should be "load HTML from web URLs"? 🤔 - Issue: Typo - Dependencies: Nope - Twitter handle: n0vad3v	2024-02-05 15:50:11 -08:00
Shorthills AI	cf0b29b6d2	docs: fixing a minor grammatical mistake (#16931 )	2024-02-05 15:49:47 -08:00
Shivani Modi	fcb875629d	docs: Updating documentation for Konko provider (#16953 ) - Description: A small update to the Konko provider documentation. --------- Co-authored-by: Shivani Modi <shivanimodi@Shivanis-MacBook-Pro.local>	2024-02-05 15:49:13 -08:00
Benjamin Muskalla	973ba0d84b	docs: Fix Copilot name (#16956 ) The official name is "GitHub Copilot"	2024-02-05 15:48:47 -08:00
IMRAN KHAN	4b17699818	docs: add 2 more tutorials to the list in youtube.mdx (#16998 ) - Description: add 2 more tutorials to the list in youtube.mdx, - Twitter handle: EhThing	2024-02-05 15:48:34 -08:00
Supreet Takkar	ae33979813	community[patch]: Allow adding ARNs as model_id to support Amazon Bedrock custom models (#16800 ) - Description: Adds an additional class variable to `BedrockBase` called `provider` that allows sending a model provider such as amazon, cohere, ai21, etc. Up until now, the model provider has been extracted from the `model_id` using the first part before the `.`, such as `amazon` for `amazon.titan-text-express-v1` (see [supported list of Bedrock model IDs here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)). But for custom Bedrock models where the ARN of the provisioned throughput must be supplied, the `model_id` is like `arn:aws:bedrock:...`, so the provider cannot be extracted from it. A model `provider` is required by the LangChain Bedrock class to perform model-based processing. To allow the same processing to be performed for custom models of a specific base model type, passing this `provider` argument can help solve the issue. The alternative considered here was the use of `provider.arn:aws:bedrock:...`, which then requires the ARN to be extracted and passed separately when invoking the model. The proposed solution here is simpler and also does not cause issues for current models already using the Bedrock class. - Issue: N/A - Dependencies: N/A --------- Co-authored-by: Piyush Jain <piyushjain@duck.com>	2024-02-05 14:28:03 -08:00
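The provider lookup described in the Bedrock commit above can be illustrated with a small standalone sketch. It is not the `BedrockBase` code: `resolve_provider` is a hypothetical helper, the ARN in the usage example is made up, and raising an error for an ARN without an explicit provider is an assumption of this sketch rather than documented behavior.

```python
from typing import Optional


def resolve_provider(model_id: str, provider: Optional[str] = None) -> str:
    """Return the model provider used for provider-specific processing.

    Hypothetical helper: an explicit `provider` wins; otherwise the provider
    is taken from the part of `model_id` before the first '.'.
    """
    if provider:
        return provider
    if model_id.startswith("arn:"):
        # An ARN has no '<provider>.' prefix, so nothing can be inferred.
        raise ValueError(
            "model_id is an ARN; pass `provider` explicitly (e.g. 'amazon', 'cohere')."
        )
    return model_id.split(".")[0]


if __name__ == "__main__":
    # Regular model ID: the provider is the prefix before the first '.'.
    print(resolve_provider("amazon.titan-text-express-v1"))

    # Custom model addressed by a (made-up) ARN: the provider is given explicitly.
    custom_arn = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/example"
    print(resolve_provider(custom_arn, provider="cohere"))
```

Keeping the provider as a separate argument leaves the `model_id` untouched, so it can be passed to the service as-is, which is the simplification the commit message argues for over a `provider.arn:aws:bedrock:...` prefix.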
Vadim Kudlay	75b6fa1134	nvidia-ai-endpoints[patch]: Support User-Agent metadata and minor fixes. (#16942 ) - Description: Several meta/usability updates, including User-Agent. - Issue: - User-Agent metadata for tracking connector engagement. @milesial please check and advise. - Better error messages. Tries harder to find a request ID. @milesial requested. - Client-side image resizing for multimodal models. Hope to upgrade to Assets API solution in around a month. - `client.payload_fn` allows you to modify payload before network request. Use-case shown in doc notebook for kosmos_2. - `client.last_inputs` put back in to allow for advanced support/debugging. - Dependencies: - Attempts to pull in PIL for image resizing. If not installed, prints out "please install" message, warns it might fail, and then tries without resizing. We are waiting on a more permanent solution. For LC viz: @hinthornw For NV viz: @fciannella @milesial @vinaybagade --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-02-05 12:24:53 -08:00
Erick Friis	6ffd5b15bc	pinecone: init pkg (#16556 )	2024-02-05 11:55:01 -08:00
Erick Friis	db6af21395	docs: exa contents (#16555 )	2024-02-05 11:15:06 -08:00
Nicolas Grenié	54fcd476bb	docs: Update ollama examples with new community libraries (#17007 ) - Description: Updating one line code sample for Ollama with new langchain_community package - Issue: - Dependencies: none - Twitter handle: @picsoung	2024-02-04 15:13:29 -08:00
Erick Friis	afdd636999	docs: partner packages (#16960 )	2024-02-02 15:12:21 -08:00
Ashley Xu	66adb95284	docs: BigQuery Vector Search went public review and updated docs (#16896 ) Update the docs for BigQuery Vector Search	2024-02-02 10:26:44 -08:00
Massimiliano Pronesti	71f9ea33b6	docs: add quantization to vllm and update API (#16950 ) - Description: Update vLLM docs to include instructions on how to use quantized models, as well as to replace the deprecated methods.	2024-02-02 10:24:49 -08:00
Radhakrishnan	3b0fa9079d	docs: Updated integration doc for aleph alpha (#16844 ) Description: Updated doc for llm/aleph_alpha with new functions: invoke. Changed structure of the document to match the required one. Issue: https://github.com/langchain-ai/langchain/issues/15664 Dependencies: None Twitter handle: None --------- Co-authored-by: Radhakrishnan Iyer <radhakrishnan.iyer@ibm.com>	2024-02-02 09:28:06 -08:00
Erick Friis	6fc2835255	docs: fix broken links (#16855 )	2024-02-01 17:29:38 -08:00
Erick Friis	b1a847366c	community: revert SQL Stores (#16912 ) This reverts commit `cfc225ecb3`. https://github.com/langchain-ai/langchain/pull/15909#issuecomment-1922418097 These will have existed in langchain-community 0.0.16 and 0.0.17.	2024-02-01 16:37:40 -08:00
akira wu	f7c709b40e	doc: fix typo in message_history.ipynb (#16877 ) - Description: just fixed a small typo in the documentation in the `expression_language/how_to/message_history` section [here](https://python.langchain.com/docs/expression_language/how_to/message_history)	2024-02-01 13:30:29 -08:00
Shorthills AI	0bca0f4c24	Docs: Fixed grammatical mistake (#16858 ) Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com> Co-authored-by: Sanskar Tanwar <142409040+SanskarTanwarShorthillsAI@users.noreply.github.com> Co-authored-by: UpneetShorthillsAI <144228282+UpneetShorthillsAI@users.noreply.github.com> Co-authored-by: HarshGuptaShorthillsAI <144897987+HarshGuptaShorthillsAI@users.noreply.github.com> Co-authored-by: AdityaKalraShorthillsAI <143726711+AdityaKalraShorthillsAI@users.noreply.github.com> Co-authored-by: SakshiShorthillsAI <144228183+SakshiShorthillsAI@users.noreply.github.com> Co-authored-by: AashiGuptaShorthillsAI <144897730+AashiGuptaShorthillsAI@users.noreply.github.com> Co-authored-by: ShamshadAhmedShorthillsAI <144897733+ShamshadAhmedShorthillsAI@users.noreply.github.com> Co-authored-by: ManpreetShorthillsAI <142380984+ManpreetShorthillsAI@users.noreply.github.com> Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com> Co-authored-by: BajrangBishnoiShorthillsAi <148060486+BajrangBishnoiShorthillsAi@users.noreply.github.com>	2024-02-01 11:28:15 -08:00
Harel Gal	93366861c7	docs: Indicated Guardrails for Amazon Bedrock preview status (#16769 ) Added notification about limited preview status of Guardrails for Amazon Bedrock feature to code example. --------- Co-authored-by: Piyush Jain <piyushjain@duck.com>	2024-02-01 10:41:48 -08:00
Erick Friis	17e886388b	nomic: init pkg (#16853 ) Co-authored-by: Lance Martin <lance@langchain.dev>	2024-01-31 16:46:35 -08:00
Bagatur	b0347f3e2b	docs: add csv use case (#16756 )	2024-01-30 09:39:46 -08:00
Jacob Lee	c6724a39f4	Fix rephrase step in chatbot use case (#16763 )	2024-01-29 23:25:25 -08:00
Bob Lin	546b757303	community: Add ChatGLM3 (#15265 ) Add [ChatGLM3](https://github.com/THUDM/ChatGLM3) and updated [chatglm.ipynb](https://python.langchain.com/docs/integrations/llms/chatglm) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-01-29 20:30:52 -08:00

2968 Commits