langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-04 06:00:26 +00:00

Author	SHA1	Message	Date
Eugene Yurtsev	d19fd0cfae	LogEntry/LogStream use str instead of uuid for id (#11080 ) Cast the UUID to a string	2023-09-26 20:38:51 +01:00
Bagatur	d85339b9f2	extract sublinks exclude by abs path (#11079 )	2023-09-26 12:26:27 -07:00
Bagatur	7ee8b2d1bf	exclude dirs in async recursive loading (#11077 )	2023-09-26 09:59:04 -07:00
Leonid Ganeline	21199cc7b4	📖 docs: fixed `integrations/document loaders` toc (#9281 ) Fixed navbar: - renamed several files, so ToC is sorted correctly - made ToC items consistent: formatted several Titles - added several links - reformatted several docs to a consistent format - renamed several files (removed `_example` suffix) - added renamed files to the `docs/docs_skeleton/vercel.json`	2023-09-26 09:47:37 -07:00
Bagatur	0ea384d575	fix multiple chains lcel how to (#11074 )	2023-09-26 08:39:02 -07:00
Bagatur	12fb393a43	bump 302 (#11070 )	2023-09-26 08:13:01 -07:00
Bagatur	097ecef06b	refactor web base loader (#11057 )	2023-09-26 08:11:31 -07:00
Bagatur	487611521d	fix root import (#11072 )	2023-09-26 08:11:16 -07:00
Bagatur	a2f7246f0e	skip excluded sublinks before recursion (#11036 )	2023-09-26 02:24:54 -07:00
William FH	9c5eca92e4	Update notebook deps (#11053 )	2023-09-25 22:41:29 -07:00
William FH	448426a6ac	Add collab link (#11052 )	2023-09-25 22:35:25 -07:00
William FH	4aec587979	Update LangSmith Walkthrough (#11043 )	2023-09-25 22:32:56 -07:00
Harrison Chase	bea78b3271	make warnings more modular (#11047 )	2023-09-25 20:46:43 -07:00
Harrison Chase	c87e9fb2ce	conditional imports (#11017 )	2023-09-25 15:46:32 -07:00
Tomaz Bratanic	0625ab7a9e	Filtering graph schema for Cypher generation (#10577 ) Sometimes you don't want the LLM to be aware of the whole graph schema, and want it to ignore parts of the graph when it is constructing Cypher statements.	2023-09-25 14:14:15 -07:00
Palau	89ef440c14	Kay retriever (#10657 ) - Description: Adding retrievers for [kay.ai](https://kay.ai) and SEC filings powered by Kay and Cybersyn. Kay provides context as a service: it's an API built for RAG. - Issue: N/A - Dependencies: Just added a dep to the [kay](https://pypi.org/project/kay/) package - Tag maintainer: @baskaryan @hwchase17 Discussed in slack - Twitter handle: [@vishalrohra_](https://twitter.com/vishalrohra_) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-25 13:10:13 -07:00
Harrison Chase	5f13668fa0	Harrison/move vectorstore base (#11030 )	2023-09-25 12:44:23 -07:00
Bagatur	3eb79580c2	fix langsmith link in docs (#11027 )	2023-09-25 12:05:08 -07:00
Jacob Lee	6d072e97c8	Adds GA to docs (#11022 ) CC @baskaryan	2023-09-25 11:54:32 -07:00
Eugene Yurtsev	af5390d416	Add a batch size for cleanup (#10948 ) Add pagination to indexing cleanup to deal with large numbers of documents that need to be deleted.	2023-09-25 14:52:32 -04:00
Eugene Yurtsev	09486ed188	Update Serializable to use classmethods (#10956 )	2023-09-25 18:39:30 +01:00
Taqi Jaffri	b7290f01d8	Batching for hf_pipeline (#10795 ) The huggingface pipeline in langchain (used for locally hosted models) does not support batching. If you send in a batch of prompts, it just processes them serially using the base implementation of _generate: https://github.com/docugami/langchain/blob/master/libs/langchain/langchain/llms/base.py#L1004C2-L1004C29 This PR adds support for batching in this pipeline, so that GPUs can be fully saturated. I updated the accompanying notebook to show GPU batch inference. --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>	2023-09-25 18:23:11 +01:00
Bagatur	aa6e6db8c7	bump 301 (#11018 )	2023-09-25 08:50:47 -07:00
Nuno Campos	956ee981c0	Fix issue where requests wrapper passes auth kwarg twice (#11010 ) Closes #8842	2023-09-25 15:45:04 +01:00
Scotty	88a02076af	fix ChatMessageChunk concat error (#10174 ) - Description: fix `ChatMessageChunk` concat error - Issue: #10173 - Dependencies: None - Tag maintainer: @baskaryan, @eyurtsev, @rlancemartin - Twitter handle: None --------- Co-authored-by: wangshuai.scotty <wangshuai.scotty@bytedance.com> Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-25 11:17:11 +01:00
Massimiliano Pronesti	4322b246aa	docs: add vLLM chat notebook (#10993 ) This PR aims at showcasing how to use vLLM's OpenAI-compatible chat API. ### Context LangChain already supports vLLM and its OpenAI-compatible `Completion` API. However, the `ChatCompletion` API was not aligned with OpenAI and for this reason I've waited for this [PR](https://github.com/vllm-project/vllm/pull/852) to be merged before adding this notebook to langchain.	2023-09-24 18:23:19 -07:00
Naveen Tatikonda	b0f21e2b50	[OpenSearch] Pass ids using from_texts and indexname in add_texts and search (#10969 ) ### Description This PR makes the following changes to OpenSearch: 1. Pass optional ids with `from_texts` 2. Pass an optional index name with `add_texts` and `search` instead of using the same index name that was used during `from_texts` ### Issue https://github.com/langchain-ai/langchain/issues/10967 ### Maintainers @rlancemartin, @eyurtsev, @navneet1v Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-09-23 16:12:51 -07:00
deanchanter	f945426874	Resolve GHI 10674 (#10977 )	2023-09-23 16:11:52 -07:00
Anar	ff732e10f8	LLMRails Embedding (#10959 ) LLMRails Embedding Integration This PR provides integration with LLMRails. Implemented here are: langchain/embeddings/llm_rails.py docs/extras/integrations/text_embedding/llm_rails.ipynb Hi @hwchase17 after adding our vectorstore integration to langchain with confirmation of you and @baskaryan, now we want to add our embedding integration --------- Co-authored-by: Anar Aliyev <aaliyev@mgmt.cloudnet.services> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-23 16:11:02 -07:00
Michael Feil	94e31647bd	Support for Gradient.ai embedding (#10968 ) Adds support for gradient.ai's embedding model. This will remain a Draft, as the code will likely be refactored with the `pip install gradientai` python sdk.	2023-09-23 16:10:23 -07:00
Bagatur	5fd13c22ad	redirect mrkl (#10979 )	2023-09-23 16:09:13 -07:00
C.J. Jameson	05d5fcfdf8	fix make-coverage local invocation #10941 (#10974 ) Fix the invocation of `make coverage` in `libs/langchain` Fixes #10941	2023-09-23 16:03:53 -07:00
Bagatur	040d436b3f	Add vertex scheduled test (#10958 )	2023-09-23 15:51:59 -07:00
Piyush Jain	8602a32b7e	Fixes error with providers that don't have model_id (#10966 ) ## Description Fixes error with using the chain for providers that don't have `model_id` field. ![image](https://github.com/langchain-ai/langchain/assets/289369/a86074cf-6c99-4390-a135-b3af7a4f0827)	2023-09-23 15:34:28 -07:00
Nuno Campos	7b13292e35	Remove python eval from vector sql db chain (#10937 )	2023-09-23 08:51:03 -07:00
Richard Wang	b809c243af	Fix bug in `index` api (#10614 ) - Description: a fix for `index`. - Issue: Not applicable. - Dependencies: None - Tag maintainer: - Twitter handle: richarddwang

# Problem

Replication code

```python
import tempfile
from pathlib import Path
from pprint import pprint

from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import SQLRecordManager, index
from langchain.schema import Document
from langchain.vectorstores import Qdrant

from langchain_setup.qdrant import pprint_qdrant_documents, create_inmemory_empty_qdrant

# Documents
metadata1 = {"source": "fullhell.alchemist"}
doc1_1 = Document(page_content="1-1 I have a dog~", metadata=metadata1)
doc1_2 = Document(page_content="1-2 I have a daugter~", metadata=metadata1)
doc1_3 = Document(page_content="1-3 Ahh! O..Oniichan", metadata=metadata1)
doc2 = Document(page_content="2 Lancer died again.", metadata={"source": "fate.docx"})

# Create empty vectorstore
collection_name = "secret_of_D_disk"
vectorstore: Qdrant = create_inmemory_empty_qdrant()

# Create record manager
record_manager = SQLRecordManager(
    namespace=f"qdrant/{collection_name}",
    db_url=f"sqlite:///{Path(tempfile.gettempdir())/collection_name}.sql",
)
record_manager.create_schema()  # required

# Note that doc1_2 is passed twice
sync_result = index(
    [doc1_1, doc1_2, doc1_2, doc2],
    record_manager,
    vectorstore,
    cleanup="full",
    source_id_key="source",
)
print(sync_result, end="\n\n")
pprint_qdrant_documents(vectorstore)
```

<details>
<summary>Code of helper functions `pprint_qdrant_documents` and `create_inmemory_empty_qdrant`</summary>

```python
from pprint import pformat

from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Qdrant


def create_inmemory_empty_qdrant(**from_texts_kwargs):
    # Qdrant requires the vector size, which is only known after applying the embedder
    vectorstore = Qdrant.from_texts(
        ["dummy"], location=":memory:", embedding=OpenAIEmbeddings(), **from_texts_kwargs
    )
    # Remove the dummy document so the collection starts empty
    dummy_document_id = vectorstore.client.scroll(vectorstore.collection_name)[0][0].id
    vectorstore.delete([dummy_document_id])
    return vectorstore


def pprint_qdrant_documents(vectorstore, limit: int = 100, **scroll_kwargs):
    document_ids, documents = [], []
    for record in vectorstore.client.scroll(
        vectorstore.collection_name, limit=limit, **scroll_kwargs
    )[0]:
        document_ids.append(record.id)
        documents.append(
            Document(
                page_content=record.payload["page_content"],
                metadata=record.payload["metadata"] or {},
            )
        )
    pprint_documents(documents, document_ids=document_ids)


def pprint_document(document: Document = None, document_id=None, return_string=False):
    displayed_text = ""
    if document_id:
        displayed_text += f"Document {document_id}:\n\n"
    displayed_text += f"{document.page_content}\n\n"
    metadata_text = pformat(document.metadata, indent=1)
    if "\n" in metadata_text:
        displayed_text += f"Metadata:\n{metadata_text}"
    else:
        displayed_text += f"Metadata:{metadata_text}"
    if return_string:
        return displayed_text
    else:
        print(displayed_text)


def pprint_documents(documents, document_ids=None):
    if not document_ids:
        document_ids = [i + 1 for i in range(len(documents))]
    displayed_texts = []
    for document_id, document in zip(document_ids, documents):
        displayed_text = pprint_document(
            document_id=document_id, document=document, return_string=True
        )
        displayed_texts.append(displayed_text)
    print(f"\n{'-' * 100}\n".join(displayed_texts))
```

</details>

You will get

```
{'num_added': 3, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Document 1b19816e-b802-53c0-ad60-5ff9d9b9b911:

1-2 I have a daugter~

Metadata:{'source': 'fullhell.alchemist'}
----------------------------------------------------------------------------------------------------
Document 3362f9bc-991a-5dd5-b465-c564786ce19c:

1-1 I have a dog~

Metadata:{'source': 'fullhell.alchemist'}
----------------------------------------------------------------------------------------------------
Document a4d50169-2fda-5339-a196-249b5f54a0de:

1-2 I have a daugter~

Metadata:{'source': 'fullhell.alchemist'}
```

This is not correct. The vectorstore should now contain doc1_1, doc1_2, and doc2, not doc1_1, doc1_2, and doc1_2.

# Reason

In `index`, the original code is

```python
uids = []
docs_to_index = []
for doc, hashed_doc, doc_exists in zip(doc_batch, hashed_docs, exists_batch):
    if doc_exists:
        # Must be updated to refresh timestamp.
        record_manager.update([hashed_doc.uid], time_at_least=index_start_dt)
        num_skipped += 1
        continue
    uids.append(hashed_doc.uid)
    docs_to_index.append(doc)
```

In the aforementioned example, `len(doc_batch) == 4`, but `len(hashed_docs) == len(exists_batch) == 3`. This is because deduplicating the input documents [doc1_1, doc1_2, doc1_2, doc2] yields [doc1_1, doc1_2, doc2]. So `index` inserts doc1_1, doc1_2, doc1_2 with the uids of doc1_1, doc1_2, doc2.

--------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-22 22:41:07 -04:00
Joshua Sundance Bailey	d67b120a41	Make anthropic_api_key a secret str (#10724 ) This PR makes `ChatAnthropic.anthropic_api_key` a `pydantic.SecretStr` to avoid inadvertently exposing API keys when the `ChatAnthropic` object is represented as a str.	2023-09-22 22:06:20 -04:00
Bagatur	1b65779905	fix integration tests (#10952 )	2023-09-22 12:04:38 -07:00
Bagatur	6f781902ae	vercel fix (#10951 )	2023-09-22 11:31:52 -07:00
Bagatur	f0408c347f	llm feat table revision (#10947 )	2023-09-22 10:29:12 -07:00
Harrison Chase	9062e36722	Harrison/agents structured (#10911 )	2023-09-22 10:21:23 -07:00
C.J. Jameson	b4d2663beb	CONTRIBUTING.md Quick Start: focus on langchain core; clarify docs and experimental are separate (#10906 ) follow up to https://github.com/langchain-ai/langchain/pull/7959 , explaining better to focus just on langchain core no dependencies twitter @cjcjameson	2023-09-22 10:17:08 -07:00
Michael Landis	f30b4697d4	fix: broken link in libs/langchain README (#10920 ) Description Fixes broken link to `CONTRIBUTING.md` in `libs/langchain/README.md`. Because `libs/langchain/README.md` was copied from the top level README, and because the README contains a link to `.github/CONTRIBUTING.md`, the copied README's link relative path must be updated. This commit fixes that link.	2023-09-22 10:14:19 -07:00
Bagatur	3cb460d5d8	bump 300 (#10940 )	2023-09-22 09:44:47 -07:00
Bagatur	281a332784	table fix (#10944 )	2023-09-22 09:37:03 -07:00
Bagatur	5336d87c15	update feat table (#10939 )	2023-09-22 09:16:40 -07:00
Nuno Campos	3d5e92e3ef	Accept run name arg for non-chain runs (#10935 )	2023-09-22 08:41:25 -07:00
Nuno Campos	aac2d4dcef	In MergerRetriever async call all retrievers in parallel (#10938 )	2023-09-22 08:40:16 -07:00
German Martin	66d5a7e7cf	Add async support to multi-query retriever. (#10873 ) Added async support to the MultiQueryRetriever class. --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-22 08:33:20 -07:00
Greg Richardson	4eee789dd3	Docs: Using SupabaseVectorStore with existing documents (#10907 ) ## Description Adds additional docs on how to use `SupabaseVectorStore` with existing data in your DB (vs inserting new documents each time).	2023-09-22 08:18:56 -07:00
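
The misalignment described in commit b809c243af (zipping a batch that still contains duplicates against its deduplicated hashes) can be reproduced without any vector store. The following is a minimal standard-library sketch, not the code from the PR: `HashedDoc`, `hash_doc`, `dedupe`, `pair_buggy`, and `pair_fixed` are hypothetical names, and plain strings stand in for documents. It shows one way to avoid the problem, which is to keep each uid together with the content it was derived from.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HashedDoc:
    """Hypothetical stand-in: a document's text paired with a content-derived uid."""
    uid: str
    text: str


def hash_doc(text: str) -> HashedDoc:
    # Derive a short deterministic uid from the content.
    return HashedDoc(uid=hashlib.sha1(text.encode("utf-8")).hexdigest()[:8], text=text)


def dedupe(hashed):
    # Keep the first occurrence of each uid, preserving input order.
    seen = set()
    unique = []
    for item in hashed:
        if item.uid not in seen:
            seen.add(item.uid)
            unique.append(item)
    return unique


def pair_buggy(batch):
    # Zips the raw batch (duplicates included) with the deduplicated hashes,
    # so every position after the first duplicate is shifted.
    hashed = dedupe([hash_doc(text) for text in batch])
    return [(h.uid, text) for text, h in zip(batch, hashed)]


def pair_fixed(batch):
    # Reads uid and content from the same deduplicated object.
    hashed = dedupe([hash_doc(text) for text in batch])
    return [(h.uid, h.text) for h in hashed]


batch = ["doc1_1", "doc1_2", "doc1_2", "doc2"]
buggy = pair_buggy(batch)
fixed = pair_fixed(batch)

print([text for _, text in buggy])
print([text for _, text in fixed])
```

In the buggy pairing the third entry carries the uid of `doc2` but the content of `doc1_2`, which matches the symptom reported in the commit message.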


4700 Commits