langchain

Commit Graph

Author	SHA1	Message	Date
Brigit Murtaugh	ccd916babe	Update dev container (#6189 ) Fixes https://github.com/hwchase17/langchain/issues/6172 As described in https://github.com/hwchase17/langchain/issues/6172, I'd love to help update the dev container in this project. Summary of changes: - Dev container now builds (the current container in this repo won't build for me) - Dockerfile updates - Update image to our [currently-maintained Python image](https://github.com/devcontainers/images/tree/main/src/python/.devcontainer) (`mcr.microsoft.com/devcontainers/python`) rather than the deprecated image from vscode-dev-containers - Move Dockerfile to root of repo - in order for `COPY` to work properly, it needs the files (in this case, `pyproject.toml` and `poetry.toml`) in the same directory - devcontainer.json updates - Removed `customizations` and `remoteUser` since they should be covered by the updated image in the Dockerfile - Update comments - Update docker-compose.yaml to properly point to updated Dockerfile - Add a .gitattributes to avoid line ending conversions, which can result in hundreds of pending changes ([info](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files)) - Add a README in the .devcontainer folder and info on the dev container in the contributing.md Outstanding questions: - Is it expected for `poetry install` to take some time? It takes about 30 minutes for this dev container to finish building in a Codespace, but a user should only have to experience this once. Through some online investigation, this doesn't seem unusual - Versions of poetry newer than 1.3.2 failed every time - based on some of the guidance in contributing.md and other online resources, it seemed changing poetry versions might be a good solution. 1.3.2 is from Jan 2023 --------- Co-authored-by: bamurtaugh <brmurtau@microsoft.com> Co-authored-by: Samruddhi Khandale <samruddhikhandale@github.com>	12 months ago
Davis Chase	03b5891cf7	more redirect (#6314 )	12 months ago
Davis Chase	eaee492dbc	basic redirect (#6309 )	12 months ago
Davis Chase	d2243757a3	update readme (#6304 )	12 months ago
Davis Chase	2f47e5c766	update api link (#6303 )	12 months ago
Davis Chase	d558bcfad8	rm ignore_vercel (#6302 )	12 months ago
Davis Chase	87e502c6bc	Doc refactor (#6300 ) Co-authored-by: jacoblee93 <jacoblee93@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	12 months ago
Harrison Chase	94c82a189d	bump to 202 (#6262 )	12 months ago
hp0404	b01cf0dd54	ArxivAPIWrapper - doc_content_chars_max (#6063 ) This PR refactors the ArxivAPIWrapper class making `doc_content_chars_max` parameter optional. Additionally, tests have been added to ensure the functionality of the doc_content_chars_max parameter. Fixes #6027 (issue)	12 months ago
Daniel King	a9b97aa6f4	Update output format of MosaicML endpoint to be more flexible (#6060 ) There will likely be another change or two coming over the next couple weeks as we stabilize the API, but putting this one in now which just makes the integration a bit more flexible with the response output format. ``` (langchain) danielking@MML-1B940F4333E2 langchain % pytest tests/integration_tests/llms/test_mosaicml.py tests/integration_tests/embeddings/test_mosaicml.py =================================================================================== test session starts =================================================================================== platform darwin -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0 rootdir: /Users/danielking/github/langchain configfile: pyproject.toml plugins: asyncio-0.20.3, mock-3.10.0, dotenv-0.5.2, cov-4.0.0, anyio-3.6.2 asyncio: mode=strict collected 12 items tests/integration_tests/llms/test_mosaicml.py ...... [ 50%] tests/integration_tests/embeddings/test_mosaicml.py ...... [100%] =================================================================================== slowest 5 durations =================================================================================== 4.76s call tests/integration_tests/llms/test_mosaicml.py::test_retry_logic 4.74s call tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_llm_call 4.13s call tests/integration_tests/llms/test_mosaicml.py::test_instruct_prompt 0.91s call tests/integration_tests/llms/test_mosaicml.py::test_short_retry_does_not_loop 0.66s call tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_extra_kwargs =================================================================================== 12 passed in 19.70s =================================================================================== ``` #### Who can review? @hwchase17 @dev2049	12 months ago
JaysonAlbert	50d9c7d5a4	Fix: change the chatgpt plugin retriever metadata format (#5920 ) the current implement put the doc itself as the metadata, but the document chatgpt plugin retriever returned already has a `metadata` field, it's better to use that instead. the original code will throw the following exception when using `RetrievalQAWithSourcesChain`, becuse it can not find the field `metadata`: ```python Exception has occurred: ValueError (note: full exception trace is shown but execution is paused at: _run_module_as_main) Document prompt requires documents to have metadata variables: ['source']. Received document with missing metadata: ['source']. File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 27, in format_document raise ValueError( File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in <listcomp> doc_strings = [format_document(doc, self.document_prompt) for doc in docs] File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in _get_inputs doc_strings = [format_document(doc, self.document_prompt) for doc in docs] File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 85, in combine_docs inputs = self._get_inputs(docs, **kwargs) File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 84, in _call output, extra_return_dict = self.combine_docs( File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/base.py", line 140, in __call__ raise e ``` Additionally, the `metadata` filed in the `chatgpt plugin retriever` have these fileds by default: ```json { "source": "file", //email, file or chat "source_id": "filename.docx", // the filename "url": "", ... } ``` so, we should set `source_id` to `source` in the langchain metadata. ```python metadata = d.pop("metadata", d) if(metadata.get("source_id")): metadata["source"] = metadata.pop("source_id") ``` #### Who can review? @dev2049 <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: wangjie <wangjie@htffund.com>	12 months ago
Harrison Chase	e67b26eee9	Harrison/openai functions (#6261 ) Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>	12 months ago
Harrison Chase	6aafb46807	Harrison/openai functions (#6223 ) Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>	12 months ago
Zander Chase	bc9b8c8239	Improve Error Message for failed callback (#6247 ) Include the handler class name in the warning	12 months ago
Alon Roth	0013256e81	Support chat history persistence in AutoGPT (#5716 ) Short Description Added a new argument to AutoGPT class which allows to persist the chat history to a file. Changes 1. Removed the `self.full_message_history: List[BaseMessage] = []` 2. Replaced it with `chat_history_memory` which can take any subclasses of `BaseChatMessageHistory` --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	12 months ago
Martin Antos	1913320cbe	Feature/add acreom loader (#5780 ) adding new loader for [acreom](https://acreom.com) vaults. It's based on the Obsidian loader with some additional text processing for acreom specific markdown elements. @eyurtsev please take a look! --------- Co-authored-by: rlm <pexpresss31@gmail.com>	12 months ago
Zander Chase	ae76e473e1	Add Tags for LLMs (#6229 ) - [x] Add tracing tags to LLMs + Chat Models (both inheritable and local) - [x] Add tags for the run_on_dataset helper function(s)	12 months ago
Harrison Chase	8e1a7a8646	bump version to 201 (#6233 )	12 months ago
Harrison Chase	e82687ddf4	Harrison/use functions agent (#6185 ) Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>	12 months ago
Ryo Kanazawa	7d2b946d0b	Fix typo `pandocs` to `pandoc` (#6203 ) Fixes https://github.com/hwchase17/langchain/issues/6204 ### Context An typo issue with `pandoc`. #### Who can review? @hwchase17	12 months ago
Kyle Roth	c7db9febb0	count tokens for new OpenAI model versions (#6195 ) Trying to call `ChatOpenAI.get_num_tokens_from_messages` returns the following error for the newly announced models `gpt-3.5-turbo-0613` and `gpt-4-0613`: ``` NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model gpt-3.5-turbo-0613.See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens. ``` This adds support for counting tokens for those models, by counting tokens the same way they're counted for the previous versions of `gpt-3.5-turbo` and `gpt-4`. #### reviewers - @hwchase17 - @agola11	12 months ago
xu0o0	7ad13cdbdb	feat: add content_format param to ConfluenceLoader.load() (#5922 ) Confluence API supports difference format of page content. The storage format is the raw XML representation for storage. The view format is the HTML representation for viewing with macros rendered as though it is viewed by users. Add the `content_format` parameter to `ConfluenceLoader.load()` to specify the content format, this is set to `ContentFormat.STORAGE` by default. #### Who can review? Tag maintainers/contributors who might be interested: @eyurtsev --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	12 months ago
0xJordan	c5a46e7435	feat: Add support for the Solidity language (#6054 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> ## Add Solidity programming language support for code splitter. Twitter: @0xjord4n_ <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: @hwchase17 <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 -->	12 months ago
Nuno Campos	17c4ec4812	Add docs for tags (#6155 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 -->	12 months ago
thiswillbeyourgithub	4a649e3b14	typo: 'following following' to 'following' (#6163 ) Co-authored-by: thiswillbeyourgithub <github@32mail.33mail.com>	12 months ago
Maciej Bryński	8a44c879c6	Update readthedocs_documentation.ipynb (#6148 ) Minor fix in documentation. Change URL in wget call to proper one.	12 months ago
Zander Chase	e0e3ef1c57	Update Name (#6136 )	12 months ago
Zander Chase	4555ad5d1f	Add Run Collector Callback (#6133 ) Add a callback handler that can collect nested run objects. Useful for evaluation.	12 months ago
Harrison Chase	6ac120f299	bump ver to 200 (#6130 )	12 months ago
Harrison Chase	e41f0b341c	add functions agent (#6113 )	12 months ago
Zander Chase	b3b155d488	Return session name in runner response (#6112 ) Makes it easier to then run evals w/o thinking about specifying a session	12 months ago
Harrison Chase	e74733ab9e	support streaming for functions (#6115 )	12 months ago
Nuno Campos	11ab0be11a	Add support for tags (#5898 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 -->	12 months ago
Harrison Chase	1281fdf0f2	Harrison/notebook functions (#6103 )	12 months ago
Harrison Chase	34ebb29726	bump version to 199 (#6102 )	12 months ago
Wenchen Li	f9edf76e7c	Implement `max_marginal_relevance_search` in `VectorStore` of Pinecone (#6056 ) This adds implementation of MMR search in pinecone; and I have two semi-related observations about this vector store class: - Maybe we should also have a `similarity_search_by_vector_returning_embeddings` like in supabase, but it's not in the base `VectorStore` class so I didn't implement - Talking about the base class, there's `similarity_search_with_relevance_scores`, but in pinecone it is called `similarity_search_with_score`; maybe we should consider renaming it to align with other `VectorStore` base and sub classes (or add that as an alias for backward compatibility) #### Who can review? Tag maintainers/contributors who might be interested: - VectorStores / Retrievers / Memory - @dev2049	12 months ago
Harrison Chase	970b2f9d38	convert tools to openai (#6100 )	12 months ago
Harrison Chase	292accde2b	support functions (#6099 )	12 months ago
Lance Martin	ee3d0513ad	Add tests and update notebook for MarkdownHeaderTextSplitter (#6069 ) Add test and update notebook for `MarkdownHeaderTextSplitter`.	12 months ago
Keshav Kumar	8fdf88b8e3	Fix for ModuleNotFoundError while running langchain-server. Issue #5833 (#6077 ) This PR fixes the error `ModuleNotFoundError: No module named 'langchain.cli'` Fixes https://github.com/hwchase17/langchain/issues/5833 (issue)	12 months ago
Zander Chase	0c52275bdb	Use Run object from SDK (#6067 ) Update the Run object in the tracer to extend that in the SDK to include the parameters necessary for tracking/tracing	12 months ago
Harrison Chase	cde1e8739a	turn off repr (#6078 )	12 months ago
Nuno Campos	a9b3b2e327	Enable serialization for anthropic (#6049 )	12 months ago
Harrison Chase	6ac5d80286	propogate kwargs fully (#6076 )	12 months ago
Harrison Chase	ec1a2adf9c	improve tools (#6062 )	12 months ago
Julius Lipp	5b6bbf4ab2	Add embaas document extraction api endpoints (#6048 ) # Introduces embaas document extraction api endpoints In this PR, we add support for embaas document extraction endpoints to Text Embedding Models (with LLMs, in different PRs coming). We currently offer the MTEB leaderboard top performers, will continue to add top embedding models and soon add support for customers to deploy thier own models. Additional Documentation + Infomation can be found [here](https://embaas.io). While developing this integration, I closely followed the patterns established by other langchain integrations. Nonetheless, if there are any aspects that require adjustments or if there's a better way to present a new integration, let me know! :) Additionally, I fixed some docs in the embeddings integration. Related PR: #5976 #### Who can review? DataLoaders - @eyurtsev	12 months ago
Zander Chase	2f0088039d	Log tracer errors (#6066 ) Example (would log several times if not for the helper fn. Would emit no logs due to mulithreading previously) ![image](https://github.com/hwchase17/langchain/assets/130414180/070d25ae-1f06-4487-9617-0a6f66f3f01e)	12 months ago
Lance Martin	b023f0c0f2	Text splitter for Markdown files by header (#5860 ) This creates a new kind of text splitter for markdown files. The user can supply a set of headers that they want to split the file on. We define a new text splitter class, `MarkdownHeaderTextSplitter`, that does a few things: (1) For each line, it determines the associated set of user-specified headers (2) It groups lines with common headers into splits See notebook for example usage and test cases.	12 months ago
Jens Madsen	2c91f0d750	chore: spedd up integration test by using smaller model (#6044 ) Adds a new parameter `relative_chunk_overlap` for the `SentenceTransformersTokenTextSplitter` constructor. The parameter sets the chunk overlap using a relative factor, e.g. for a model where the token limit is 100, a `relative_chunk_overlap=0.5` implies that `chunk_overlap=50` Tag maintainers/contributors who might be interested: @hwchase17, @dev2049	12 months ago
Harrison Chase	5922742d56	comment out	12 months ago

... 2 3 4 5 6 ...

2664 Commits (multi_strategy_parser) All Branches Search

2664 Commits (multi_strategy_parser)

All Branches