langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
Joe Reuter	09aa1eac03	Airbyte loaders: Fix last_state getter (#9314 ) This PR fixes the Airbyte loaders when doing incremental syncs. The notebooks are calling out to access `loader.last_state` to get the current state of incremental syncs, but this didn't work due to a refactoring of how the loaders are structured internally in the original PR. This PR fixes the issue by adding a `last_state` property that forwards the state correctly from the CDK adapter. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-16 15:56:33 -07:00
Eugene Yurtsev	0f9f213833	Pydantic Compatibility (#9327 ) Pydantic Compatibility Guidelines for migration plan + debugging	2023-08-16 15:55:53 -07:00
Chandler May	15f1af8ed6	Fix variable case in code snippet in docs (#9311 ) - Description: Fix a minor variable naming inconsistency in a code snippet in the docs - Issue: N/A - Dependencies: none - Tag maintainer: N/A - Twitter handle: N/A	2023-08-16 13:34:46 -07:00
Jakub Kuciński	8bebc9206f	Add improved sources splitting in BaseQAWithSourcesChain (#8716 ) ## Type: Improvement --- ## Description: Running QAWithSourcesChain sometimes raises ValueError as mentioned in issue #7184: ``` ValueError: too many values to unpack (expected 2) Traceback: response = qa({"question": pregunta}, return_only_outputs=True) File "C:\Anaconda3\envs\iagen_3_10\lib\site-packages\langchain\chains\base.py", line 166, in __call__ raise e File "C:\Anaconda3\envs\iagen_3_10\lib\site-packages\langchain\chains\base.py", line 160, in __call__ self._call(inputs, run_manager=run_manager) File "C:\Anaconda3\envs\iagen_3_10\lib\site-packages\langchain\chains\qa_with_sources\base.py", line 132, in _call answer, sources = re.split(r"SOURCES:\s", answer) ``` This is due to LLM model generating subsequent question, answer and sources, that is complement in a similar form as below: ``` <final_answer> SOURCES: <sources> QUESTION: <new_or_repeated_question> FINAL ANSWER: <new_or_repeated_final_answer> SOURCES: <new_or_repeated_sources> ``` It leads the following line ``` re.split(r"SOURCES:\s", answer) ``` to return more than 2 elements and result in ValueError. The simple fix is to split also with "QUESTION:\s" and take the first two elements: ``` answer, sources = re.split(r"SOURCES:\s\|QUESTION:\s", answer)[:2] ``` Sometimes LLM might also generate some other texts, like alternative answers in a form: ``` <final_answer_1> SOURCES: <sources> <final_answer_2> SOURCES: <sources> <final_answer_3> SOURCES: <sources> ``` In such cases it is the best to split previously obtained sources with new line: ``` sources = re.split(r"\n", sources.lstrip())[0] ``` --- ## Issue: Resolves #7184 --- ## Maintainer: @baskaryan	2023-08-16 13:30:15 -07:00
Bagatur	a3c79b1909	Add tiktoken integration dep (#9332 )	2023-08-16 12:09:22 -07:00
Michael Bianco	23928a3311	docs: remove multiple code blocks from comma-separated docs (#9323 )	2023-08-16 11:51:58 -07:00
Bagatur	ba5fbaba70	bump 266 (#9296 )	2023-08-16 01:13:19 -07:00
Navanit Dubey	3e6cea46e2	Guide import readable json (#9291 )	2023-08-16 00:49:01 -07:00
axiangcoding	63601551b1	fix(llms): improve the ernie chat model (#9289 ) - Description: improve the ernie chat model. - fix missing kwargs to payload - new test cases - add some debug level log - improve description - Issue: None - Dependencies: None - Tag maintainer: @baskaryan	2023-08-16 00:48:42 -07:00
Daniel Chalef	1d55141c50	zep/new ZepVectorStore (#9159 ) - new ZepVectorStore class - ZepVectorStore unit tests - ZepVectorStore demo notebook - update zep-python to ~1.0.2 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-16 00:23:07 -07:00
William FH	2519580994	Add Schema Evals (#9228 ) Simple eval checks for whether a generation is valid json and whether it matches an expected dict	2023-08-15 17:17:32 -07:00
Kenny	74a64cfbab	expose output key to create_openai_fn_chain (#9155 ) I quick change to allow the output key of create_openai_fn_chain to optionally be changed. @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-15 17:01:32 -07:00
Bagatur	b9ca5cc5ea	update guide import (#9279 )	2023-08-15 17:01:06 -07:00
Bagatur	afba2be3dc	update openai functions docs (#9278 )	2023-08-15 17:00:56 -07:00
Bagatur	9abf60acb6	Bagatur/vectara regression (#9276 ) Co-authored-by: Ofer Mendelevitch <ofer@vectara.com> Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>	2023-08-15 16:19:46 -07:00
Xiaoyu Xee	b30f449dae	Add dashvector vectorstore (#9163 ) ## Description Add `Dashvector` vectorstore for langchain - [dashvector quick start](https://help.aliyun.com/document_detail/2510223.html) - [dashvector package description](https://pypi.org/project/dashvector/) ## How to use ```python from langchain.vectorstores.dashvector import DashVector dashvector = DashVector.from_documents(docs, embeddings) ``` --------- Co-authored-by: smallrain.xuxy <smallrain.xuxy@alibaba-inc.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-15 16:19:30 -07:00
Bagatur	bfbb97b74c	Bagatur/deeplake docs fixes (#9275 ) Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz>	2023-08-15 15:56:36 -07:00
Kunj-2206	1b3942ba74	Added BittensorLLM (#9250 ) Description: Adding NIBittensorLLM via Validator Endpoint to langchain llms Tag maintainer: @Kunj-2206 Maintainer responsibilities: Models / Prompts: @hwchase17, @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-15 15:40:52 -07:00
Toshish Jawale	852722ea45	Improvements in Nebula LLM (#9226 ) - Description: Added improvements in Nebula LLM to perform auto-retry; more generation parameters supported. Conversation is no longer required to be passed in the LLM object. Examples are updated. - Issue: N/A - Dependencies: N/A - Tag maintainer: @baskaryan - Twitter handle: symbldotai --------- Co-authored-by: toshishjawale <toshish@symbl.ai>	2023-08-15 15:33:07 -07:00
Bagatur	358562769a	Bagatur/refac faiss (#9076 ) Code cleanup and bug fix in deletion	2023-08-15 15:19:00 -07:00
Bagatur	3eccd72382	pin pydantic (#9274 ) don't want default to be v2 yet	2023-08-15 15:02:28 -07:00
Erick Friis	76d09b4ed0	hub push/pull (#9225 ) Description: Adds push/pull functions to interact with the hub Issue: n/a Dependencies: `langchainhub` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-15 14:11:43 -07:00
Bagatur	1aae77f26f	fix context nb (#9267 )	2023-08-15 12:53:37 -07:00
Alex Gamble	cf17c58b47	Update documentation for the Context integration with new URL and features (#9259 ) Update documentation and URLs for the Langchain Context integration. We've moved from getcontext.ai to context.ai \o/ Thanks in advance for the review!	2023-08-15 11:38:34 -07:00
Eugene Yurtsev	a091b4bf4c	Update testing workflow to test with both pydantic versions (#9206 ) * PR updates test.yml to test with both pydantic versions * Code should be refactored to make it easier to do testing in matrix format w/ packages * Added steps to assert that pydantic version in the environment is as expected	2023-08-15 13:21:11 -04:00
Bagatur	e0162baa3b	add oai sched tests (#9257 )	2023-08-15 09:40:33 -07:00
Joseph McElroy	5e9687a196	Elasticsearch self-query retriever (#9248 ) Now with ElasticsearchStore VectorStore merged, i've added support for the self-query retriever. I've added a notebook also to demonstrate capability. I've also added unit tests. Credit @elastic and @phoey1 on twitter.	2023-08-15 10:53:43 -04:00
Anthony Mahanna	0a04e63811	docs: Update ArangoDB Links (#9251 ) ready for review - mdx link update - colab link update	2023-08-15 07:43:47 -07:00
Eugene Yurtsev	0470198fb5	Remove packages for pydantic compatibility (#9217 ) # Poetry updates This PR updates LangChains poetry file to remove any dependencies that aren't pydantic v2 compatible yet. All packages remain usable under pydantic v1, and can be installed separately. ## Bumping the following packages: * langsmith ## Removing the following packages not used in extended unit-tests: * zep-python, anthropic, jina, spacy, steamship, betabageldb not used at all: * octoai-sdk Cleaning up extras w/ for removed packages. ## Snapshots updated Some snapshots had to be updated due to a change in the data model in langsmith. RunType used to be Union of Enum and string and was changed to be string only.	2023-08-15 10:41:25 -04:00
Bagatur	e986afa13a	bump 265 (#9253 )	2023-08-15 07:21:32 -07:00
Hech	4b505060bd	fix: max_marginal_relevance_search and docs in Dingo (#9244 )	2023-08-15 01:06:06 -07:00
axiangcoding	664ff28cba	feat(llms): support ernie chat (#9114 ) Description: support ernie (文心一言) chat model Related issue: #7990 Dependencies: None Tag maintainer: @baskaryan	2023-08-15 01:05:46 -07:00
Bharat Ramanathan	08a8363fc6	feat(integration): Add support to serialize protobufs in WandbTracer (#8914 ) This PR adds serialization support for protocol bufferes in `WandbTracer`. This allows code generation chains to be visualized. Additionally, it also fixes a minor bug where the settings are not honored when a run is initialized before using the `WandbTracer` @agola11 --------- Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-15 01:05:12 -07:00
fanyou-wbd	5e43768f61	docs: update LlamaCpp max_tokens args (#9238 ) This PR updates documentations only, `max_length` should be `max_tokens` according to latest LlamaCpp API doc: https://api.python.langchain.com/en/latest/llms/langchain.llms.llamacpp.LlamaCpp.html	2023-08-15 00:50:20 -07:00
Bagatur	a8aa1aba1c	nit (#9243 )	2023-08-15 00:49:12 -07:00
Bagatur	68d8f73698	consolidate redirects (#9242 )	2023-08-15 00:48:23 -07:00
Joshua Sundance Bailey	ef0664728e	ArcGISLoader update (#9240 ) Small bug fixes and added metadata based on user feedback. This PR is from the author of https://github.com/langchain-ai/langchain/pull/8873 .	2023-08-14 23:44:29 -07:00
Joseph McElroy	eac4ddb4bb	Elasticsearch Store Improvements (#8636 ) Todo: - [x] Connection options (cloud, localhost url, es_connection) support - [x] Logging support - [x] Customisable field support - [x] Distance Similarity support - [x] Metadata support - [x] Metadata Filter support - [x] Retrieval Strategies - [x] Approx - [x] Approx with Hybrid - [x] Exact - [x] Custom - [x] ELSER (excluding hybrid as we are working on RRF support) - [x] integration tests - [x] Documentation 👋 this is a contribution to improve Elasticsearch integration with Langchain. Its based loosely on the changes that are in master but with some notable changes: ## Package name & design improvements The import name is now `ElasticsearchStore`, to aid discoverability of the VectorStore. ```py ## Before from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch, ElasticKnnSearch ## Now from langchain.vectorstores.elasticsearch import ElasticsearchStore ``` ## Retrieval Strategy support Before we had a number of classes, depending on the strategy you wanted. `ElasticKnnSearch` for approx, `ElasticVectorSearch` for exact / brute force. With `ElasticsearchStore` we have retrieval strategies: ### Approx Example Default strategy for the vast majority of developers who use Elasticsearch will be inferring the embeddings from outside of Elasticsearch. Uses KNN functionality of _search. ```py texts = ["foo", "bar", "baz"] docsearch = ElasticsearchStore.from_texts( texts, FakeEmbeddings(), es_url="http://localhost:9200", index_name="sample-index" ) output = docsearch.similarity_search("foo", k=1) ``` ### Approx, with hybrid Developers who want to search, using both the embedding and the text bm25 match. Its simple to enable. ```py texts = ["foo", "bar", "baz"] docsearch = ElasticsearchStore.from_texts( texts, FakeEmbeddings(), es_url="http://localhost:9200", index_name="sample-index", strategy=ElasticsearchStore.ApproxRetrievalStrategy(hybrid=True) ) output = docsearch.similarity_search("foo", k=1) ``` ### Approx, with `query_model_id` Developers who want to infer within Elasticsearch, using the model loaded in the ml node. This relies on the developer to setup the pipeline and index if they wish to embed the text in Elasticsearch. Example of this in the test. ```py texts = ["foo", "bar", "baz"] docsearch = ElasticsearchStore.from_texts( texts, FakeEmbeddings(), es_url="http://localhost:9200", index_name="sample-index", strategy=ElasticsearchStore.ApproxRetrievalStrategy( query_model_id="sentence-transformers__all-minilm-l6-v2" ), ) output = docsearch.similarity_search("foo", k=1) ``` ### I want to provide my own custom Elasticsearch Query You might want to have more control over the query, to perform multi-phase retrieval such as LTR, linearly boosting on document parameters like recently updated or geo-distance. You can do this with `custom_query_fn` ```py def my_custom_query(query_body: dict, query: str) -> dict: return {"query": {"match": {"text": {"query": "bar"}}}} texts = ["foo", "bar", "baz"] docsearch = ElasticsearchStore.from_texts( texts, FakeEmbeddings(), **elasticsearch_connection, index_name=index_name ) docsearch.similarity_search("foo", k=1, custom_query=my_custom_query) ``` ### Exact Example Developers who have a small dataset in Elasticsearch, dont want the cost of indexing the dims vs tradeoff on cost at query time. Uses script_score. ```py texts = ["foo", "bar", "baz"] docsearch = ElasticsearchStore.from_texts( texts, FakeEmbeddings(), es_url="http://localhost:9200", index_name="sample-index", strategy=ElasticsearchStore.ExactRetrievalStrategy(), ) output = docsearch.similarity_search("foo", k=1) ``` ### ELSER Example Elastic provides its own sparse vector model called ELSER. With these changes, its really easy to use. The vector store creates a pipeline and index thats setup for ELSER. All the developer needs to do is configure, ingest and query via langchain tooling. ```py texts = ["foo", "bar", "baz"] docsearch = ElasticsearchStore.from_texts( texts, FakeEmbeddings(), es_url="http://localhost:9200", index_name="sample-index", strategy=ElasticsearchStore.SparseVectorStrategy(), ) output = docsearch.similarity_search("foo", k=1) ``` ## Architecture In future, we can introduce new strategies and allow us to not break bwc as we evolve the index / query strategy. ## Credit On release, could you credit @elastic and @phoey1 please? Thank you! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 23:42:35 -07:00
Harrison Chase	71d5b7c9bf	Harrison/fallbacks (#9233 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 18:27:38 -07:00
Lance Martin	41279a3ae1	Move self-check use case to "more" section (#9137 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 18:27:28 -07:00
Lance Martin	22858d99b5	Move code-writing use case to "more" section (#9134 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 18:27:19 -07:00
Bagatur	249d7d06a2	adapter doc nit (#9234 )	2023-08-14 18:26:37 -07:00
Divyansh Garg	9529483c2a	Improve MultiOn client toolkit prompts (#9222 ) - Updated prompts for the MultiOn toolkit for better functionality - Non-blocking but good to have it merged to improve the overall performance for the toolkit @hinthornw @hwchase17 --------- Co-authored-by: Naman Garg <ngarg3@binghamton.edu>	2023-08-14 17:39:51 -07:00
Lance Martin	969e1683de	Move graph use case to "more" section (#8997 ) Clean `use_cases` by moving the `GraphDB` to `integrations`. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 17:20:38 -07:00
William FH	c478fc208e	Default On Retry (#9230 ) Base callbacks don't have a default on retry event Fix #8542 --------- Co-authored-by: landonsilla <landon.silla@stepstone.com>	2023-08-14 16:45:17 -07:00
Lance Martin	d0a0d560ad	Minor formatting on Web Research Use Case (#9221 )	2023-08-14 16:29:36 -07:00
Leonid Ganeline	93dd499997	docstrings: `document_loaders` consistency 3 (#9216 ) Updated docstrings into the consistent format (probably, the last update for the `document_loaders`.	2023-08-14 16:28:39 -07:00
Kshitij Wadhwa	a69cb95850	track langchain usage for Rockset (#9229 ) Add ability to track langchain usage for Rockset. Rockset's new python client allows setting this. To prevent old clients from failing, it ignore if setting throws exception (we can't track old versions) Tested locally with old and new Rockset python client cc @baskaryan	2023-08-14 16:27:34 -07:00
Leonid Ganeline	7810ea5812	docstrings: `chat_models` consistency (#9227 ) Updated docstrings into the consistent format.	2023-08-14 16:15:56 -07:00
William FH	b0896210c7	Return feedback with failed response if there's an error (#9223 ) In Evals	2023-08-14 15:59:16 -07:00

... 10 11 12 13 14 ...

4387 Commits