langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Bagatur	040d436b3f	Add vertex scheduled test (#10958 )	2023-09-23 15:51:59 -07:00
Piyush Jain	8602a32b7e	Fixes error with providers that don't have model_id (#10966 ) ## Description Fixes error with using the chain for providers that don't have `model_id` field. ![image](https://github.com/langchain-ai/langchain/assets/289369/a86074cf-6c99-4390-a135-b3af7a4f0827)	2023-09-23 15:34:28 -07:00
Nuno Campos	7b13292e35	Remove python eval from vector sql db chain (#10937 )	2023-09-23 08:51:03 -07:00
Richard Wang	b809c243af	Fix bug in `index` api (#10614 )

- Description: a fix for `index`.
- Issue: Not applicable.
- Dependencies: None
- Tag maintainer:
- Twitter handle: richarddwang

# Problem

Replication code

```python
import tempfile
from pathlib import Path
from pprint import pprint

from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import SQLRecordManager, index
from langchain.schema import Document
from langchain.vectorstores import Qdrant
from langchain_setup.qdrant import pprint_qdrant_documents, create_inmemory_empty_qdrant

# Documents
metadata1 = {"source": "fullhell.alchemist"}
doc1_1 = Document(page_content="1-1 I have a dog~", metadata=metadata1)
doc1_2 = Document(page_content="1-2 I have a daugter~", metadata=metadata1)
doc1_3 = Document(page_content="1-3 Ahh! O..Oniichan", metadata=metadata1)
doc2 = Document(page_content="2 Lancer died again.", metadata={"source": "fate.docx"})

# Create empty vectorstore
collection_name = "secret_of_D_disk"
vectorstore: Qdrant = create_inmemory_empty_qdrant()

# Create record manager
record_manager = SQLRecordManager(
    namespace=f"qdrant/{collection_name}",
    db_url=f"sqlite:///{Path(tempfile.gettempdir())/collection_name}.sql",
)
record_manager.create_schema()  # required

# Note that doc1_2 is passed twice on purpose.
sync_result = index(
    [doc1_1, doc1_2, doc1_2, doc2],
    record_manager,
    vectorstore,
    cleanup="full",
    source_id_key="source",
)
print(sync_result, end="\n\n")
pprint_qdrant_documents(vectorstore)
```

<details>
<summary>Code of helper functions `pprint_qdrant_documents` and `create_inmemory_empty_qdrant`</summary>

```python
from pprint import pformat


def create_inmemory_empty_qdrant(**from_texts_kwargs):
    # Qdrant requires the vector size, which is only known after applying the embedder
    vectorstore = Qdrant.from_texts(
        ["dummy"], location=":memory:", embedding=OpenAIEmbeddings(), **from_texts_kwargs
    )
    dummy_document_id = vectorstore.client.scroll(vectorstore.collection_name)[0][0].id
    vectorstore.delete([dummy_document_id])
    return vectorstore


def pprint_qdrant_documents(vectorstore, limit: int = 100, **scroll_kwargs):
    document_ids, documents = [], []
    for record in vectorstore.client.scroll(
        vectorstore.collection_name, limit=limit, **scroll_kwargs
    )[0]:
        document_ids.append(record.id)
        documents.append(
            Document(
                page_content=record.payload["page_content"],
                metadata=record.payload["metadata"] or {},
            )
        )
    pprint_documents(documents, document_ids=document_ids)


def pprint_document(document: Document = None, document_id=None, return_string=False):
    displayed_text = ""
    if document_id:
        displayed_text += f"Document {document_id}:\n\n"
    displayed_text += f"{document.page_content}\n\n"
    metadata_text = pformat(document.metadata, indent=1)
    if "\n" in metadata_text:
        displayed_text += f"Metadata:\n{metadata_text}"
    else:
        displayed_text += f"Metadata:{metadata_text}"
    if return_string:
        return displayed_text
    else:
        print(displayed_text)


def pprint_documents(documents, document_ids=None):
    if not document_ids:
        document_ids = [i + 1 for i in range(len(documents))]
    displayed_texts = []
    for document_id, document in zip(document_ids, documents):
        displayed_text = pprint_document(
            document_id=document_id, document=document, return_string=True
        )
        displayed_texts.append(displayed_text)
    print(f"\n{'-' * 100}\n".join(displayed_texts))
```

</details>

You will get

```
{'num_added': 3, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Document 1b19816e-b802-53c0-ad60-5ff9d9b9b911:

1-2 I have a daugter~

Metadata:{'source': 'fullhell.alchemist'}
----------------------------------------------------------------------------------------------------
Document 3362f9bc-991a-5dd5-b465-c564786ce19c:

1-1 I have a dog~

Metadata:{'source': 'fullhell.alchemist'}
----------------------------------------------------------------------------------------------------
Document a4d50169-2fda-5339-a196-249b5f54a0de:

1-2 I have a daugter~

Metadata:{'source': 'fullhell.alchemist'}
```

This is not correct. We should be able to expect that the vector store now includes doc1_1, doc1_2, and doc2, not doc1_1, doc1_2, and doc1_2.

# Reason

In `index`, the original code is

```python
uids = []
docs_to_index = []
for doc, hashed_doc, doc_exists in zip(doc_batch, hashed_docs, exists_batch):
    if doc_exists:
        # Must be updated to refresh timestamp.
        record_manager.update([hashed_doc.uid], time_at_least=index_start_dt)
        num_skipped += 1
        continue
    uids.append(hashed_doc.uid)
    docs_to_index.append(doc)
```

In the aforementioned example, `len(doc_batch) == 4`, but `len(hashed_docs) == len(exists_batch) == 3`. This is because deduplicating the input documents [doc1_1, doc1_2, doc1_2, doc2] yields [doc1_1, doc1_2, doc2]. So `index` inserts doc1_1, doc1_2, doc1_2 with the uids of doc1_1, doc1_2, doc2.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-22 22:41:07 -04:00
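The misalignment described in #10614 can be reproduced without LangChain. The sketch below is only an illustration: plain strings stand in for documents, uppercase strings stand in for their hashed uids, and `dedupe` is a hypothetical helper. It shows one way to keep the two sequences aligned, not necessarily the change that was merged.

```python
# Hypothetical stand-ins: strings play the role of documents, and
# their uppercase forms play the role of hashed uids.
doc_batch = ["doc1_1", "doc1_2", "doc1_2", "doc2"]


def dedupe(items):
    """Drop repeated items while keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# The hashes are computed from the deduplicated batch (3 entries).
hashed_docs = [doc.upper() for doc in dedupe(doc_batch)]

# Misaligned: zip pairs by position and stops at the shorter input,
# so the second "doc1_2" is paired with the uid that belongs to "doc2".
buggy_pairs = list(zip(doc_batch, hashed_docs))

# Aligned: iterate over the same deduplicated documents the hashes came from.
fixed_pairs = list(zip(dedupe(doc_batch), hashed_docs))

print(buggy_pairs)
print(fixed_pairs)
```

Because `zip` silently truncates to the shortest input, no error is raised; the only symptom is the wrong document stored under a uid.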
Joshua Sundance Bailey	d67b120a41	Make anthropic_api_key a secret str (#10724 ) This PR makes `ChatAnthropic.anthropic_api_key` a `pydantic.SecretStr` to avoid inadvertently exposing API keys when the `ChatAnthropic` object is represented as a str.	2023-09-22 22:06:20 -04:00
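To illustrate why a secret string type helps here, the following sketch uses `MaskedStr`, a hypothetical stdlib-only stand-in written for this example. It imitates the masking idea behind `pydantic.SecretStr` and is not the pydantic class or the `ChatAnthropic` implementation.

```python
class MaskedStr:
    """Hypothetical stand-in: hides its value whenever it is printed."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The raw value is only returned when asked for explicitly.
        return self._value

    def __str__(self) -> str:
        return "**********"

    def __repr__(self) -> str:
        return "MaskedStr('**********')"


key = MaskedStr("sk-not-a-real-key")

# Formatting the object into a string (for example in a log line)
# shows the mask rather than the key itself.
shown = f"client(api_key={key})"
print(shown)
```

Code that needs the real key has to call `get_secret_value()`, which makes accidental exposure through `str()` or `repr()` much less likely.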
Bagatur	1b65779905	fix integration tests (#10952 )	2023-09-22 12:04:38 -07:00
Harrison Chase	9062e36722	Harrison/agents structured (#10911 )	2023-09-22 10:21:23 -07:00
C.J. Jameson	b4d2663beb	CONTRIBUTING.md Quick Start: focus on langchain core; clarify docs and experimental are separate (#10906 ) follow up to https://github.com/langchain-ai/langchain/pull/7959 , explaining better to focus just on langchain core no dependencies twitter @cjcjameson	2023-09-22 10:17:08 -07:00
Michael Landis	f30b4697d4	fix: broken link in libs/langchain README (#10920 ) Description: Fixes broken link to `CONTRIBUTING.md` in `libs/langchain/README.md`. Because `libs/langchain/README.md` was copied from the top level README, and because the README contains a link to `.github/CONTRIBUTING.md`, the relative path of the copied README's link must be updated. This commit fixes that link.	2023-09-22 10:14:19 -07:00
Bagatur	3cb460d5d8	bump 300 (#10940 )	2023-09-22 09:44:47 -07:00
Nuno Campos	3d5e92e3ef	Accept run name arg for non-chain runs (#10935 )	2023-09-22 08:41:25 -07:00
Nuno Campos	aac2d4dcef	In MergerRetriever async call all retrievers in parallel (#10938 )	2023-09-22 08:40:16 -07:00
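Calling several retrievers in parallel from async code is commonly done with `asyncio.gather`. The sketch below assumes that approach; `fake_retriever` and `retrieve_all` are hypothetical names for this example, and the real `MergerRetriever` code may differ.

```python
import asyncio
from typing import List


async def fake_retriever(name: str, delay: float, query: str) -> List[str]:
    """Hypothetical retriever: waits, then returns one labelled result."""
    await asyncio.sleep(delay)
    return [f"{name}: result for {query!r}"]


async def retrieve_all(query: str) -> List[List[str]]:
    # gather schedules all three coroutines at once, so the total wait is
    # roughly the longest single delay rather than the sum of the delays.
    return await asyncio.gather(
        fake_retriever("a", 0.05, query),
        fake_retriever("b", 0.05, query),
        fake_retriever("c", 0.05, query),
    )


results = asyncio.run(retrieve_all("llamas"))
print(results)
```

`asyncio.gather` returns results in the order the awaitables were passed, so the merged output keeps a stable retriever order even when completion order varies.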
German Martin	66d5a7e7cf	Add async support to multi-query retriever. (#10873 ) Added async support to the MultiQueryRetriever class. --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-22 08:33:20 -07:00
Leonid Kuligin	9d4b710a48	small fixes to Vertex (#10934 ) Fixed tests, updated the required version of the SDK and a few minor changes after the recent improvement (https://github.com/langchain-ai/langchain/pull/10910)	2023-09-22 08:18:09 -07:00
wo0d	4e58b78102	Fix chat_history message order (#10869 ) Not all databases use id as the default order, so add it explicitly. SQLite uses rowid as the default order in a select statement: [https://www.sqlite.org/lang_createtable.html#rowid](https://www.sqlite.org/lang_createtable.html#rowid), but some other databases, such as PostgreSQL, do not behave like this. Since this class supports multiple database engines, we should specify an order.	2023-09-22 11:15:59 -04:00
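A minimal sketch of the idea, using the standard `sqlite3` module: the table and column names (`message_store`, `session_id`, `message`) are assumptions for this example, not necessarily the schema used by the chat history class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_store ("
    "id INTEGER PRIMARY KEY, session_id TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO message_store (session_id, message) VALUES (?, ?)",
    [("s1", "hello"), ("s1", "hi there"), ("s1", "how are you?")],
)

# Without ORDER BY, SQL leaves the row order up to the engine.
# Ordering by id makes the history order explicit and portable.
rows = conn.execute(
    "SELECT message FROM message_store WHERE session_id = ? ORDER BY id ASC",
    ("s1",),
).fetchall()
history = [row[0] for row in rows]
print(history)
conn.close()
```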
Roman Shaptala	3d40de75c5	Fix default refine prompt template bug (#10928 ) Description: Default refine template does not actually use the refine template defined above, it uses a string with the variable name. @baskaryan, @eyurtsev, @hwchase17	2023-09-22 11:04:28 -04:00
Bagatur	cab55e9bc1	add vertex prod features (#10910 ) - chat vertex async - vertex stream - vertex full generation info - vertex use server-side stopping - model garden async - update docs for all the above in follow up will add [] chat vertex full generation info [] chat vertex retries [] scheduled tests	2023-09-22 01:44:09 -07:00
Bagatur	dccc20b402	add model feat table (#10921 )	2023-09-22 01:10:27 -07:00
William FH	ee8653f62c	Wfh/allow nonparallel (#10914 )	2023-09-21 20:21:01 -07:00
Leonid Kuligin	95e1d1fae6	fix in the docstring (#10902 ) Description: A fix in the documentation on how to use `GoogleSearchAPIWrapper`.	2023-09-21 14:30:32 -07:00
Bagatur	af41bc84e6	bump 299 (#10904 )	2023-09-21 12:56:52 -07:00
Bagatur	9a858a9107	Bagatur/arxiv kwargs (#10903 ) support all arXiv api wrapper kwargs in loader	2023-09-21 12:49:56 -07:00
niklas	e5f420d2bc	Fix typo in URL document loader example (#10585 ) - Description: Fix typo in URL document loader example - Issue: N/A - Dependencies: N/A - Tag maintainer: not urgent	2023-09-21 11:35:27 -07:00
Nuno Campos	ea26c12b23	Fix Runnable.transform() for false-y inputs (#10893 ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-21 11:27:09 -07:00
Nuno Campos	fcb5aba9f0	Add `Runnable.astream_log()` (#10374 ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-21 10:19:55 -07:00
Harrison Chase	a1ade48e8f	update agent docs (#10894 )	2023-09-21 09:09:33 -07:00
Bagatur	d37ce48e60	sep base url and loaded url in sub link extraction (#10895 )	2023-09-21 08:47:41 -07:00
Bagatur	24cb5cd379	bump 298 (#10892 )	2023-09-21 08:26:11 -07:00
Bagatur	c1f9cc0bc5	recursive loader add status check (#10891 )	2023-09-21 08:25:43 -07:00
Matvey Arye	6e02c45ca4	Add integration for Timescale Vector(Postgres) (#10650 ) Description: This commit adds a vector store for the Postgres-based vector database (`TimescaleVector`). Timescale Vector (https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`: - Enhances `pgvector` with faster and more accurate similarity search on 1B+ vectors via a DiskANN-inspired indexing algorithm. - Enables fast time-based vector search via automatic time-based partitioning and indexing. - Provides a familiar SQL interface for querying vector embeddings and relational data. Timescale Vector scales with you from POC to production: - Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database. - Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security. - Enables a worry-free experience with enterprise-grade security and compliance. Timescale Vector is available on Timescale, the cloud PostgreSQL platform. (There is no self-hosted version at this time.) LangChain users get a 90-day free trial for Timescale Vector. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Avthar Sewrathan <avthar@timescale.com>	2023-09-21 07:33:37 -07:00
Michael Feil	55570e54e1	gradient.ai LLM integration (#10800 ) - Description: This PR implements a new LLM API to https://gradient.ai - Issue: Feature request for LLM #10745 - Dependencies: No additional dependencies are introduced. - Tag maintainer: I am opening this PR for visibility, once ready for review I'll tag. - ```make format && make lint && make test``` is running. - added an `integration` and a `mock unit` test. Co-authored-by: michaelfeil <me@michaelfeil.eu> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-21 07:29:16 -07:00
Bagatur	5097007407	cleanup recursive url session (#10863 )	2023-09-21 07:22:13 -07:00
Harrison Chase	777b33b873	fix experimental imports (#10875 )	2023-09-20 23:44:17 -07:00
Harrison Chase	808caca607	beef up agent docs (#10866 )	2023-09-20 23:09:58 -07:00
Sharath Rajasekar	96023f94d9	Add Javelin integration (#10275 ) We are introducing the py integration to Javelin AI Gateway www.getjavelin.io. Javelin is an enterprise-scale fast llm router & gateway. Could you please review and let us know if there is anything missing. Javelin AI Gateway wraps Embedding, Chat and Completion LLMs. Uses javelin_sdk under the covers (pip install javelin_sdk). Author: Sharath Rajasekar, Twitter: @sharathr, @javelinai Thanks!!	2023-09-20 16:36:39 -07:00
Bagatur	957956ba6d	bump 297 (#10861 )	2023-09-20 14:45:49 -07:00
Harrison Chase	1bc3244db9	fix loading of sql chain (#10860 ) Closing #6889	2023-09-20 14:37:49 -07:00
Bagatur	b05a74b106	fix recursive loader (#10856 )	2023-09-20 13:55:47 -07:00
Bagatur	de0a02f507	fix extract sublink bug (#10855 )	2023-09-20 13:30:42 -07:00
Harrison Chase	7dec2d399b	format intermediate steps (#10794 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2023-09-20 13:02:55 -07:00
Harrison Chase	386ef1e654	add agent output parsers (#10790 )	2023-09-20 12:10:09 -07:00
Mukit Momin	67c5950df3	Amazon Bedrock Support Streaming (#10393 ) ### Description - Add support for streaming with `Bedrock` LLM and `BedrockChat` Chat Model. - Bedrock as of now supports streaming for the `anthropic.claude-` and `amazon.titan-` models only, hence support for those have been built. - Also increased the default `max_token_to_sample` for Bedrock `anthropic` model provider to `256` from `50` to keep in line with the `Anthropic` defaults. - Added examples for streaming responses to the bedrock example notebooks. _NOTE:_: This PR fixes the issues mentioned in #9897 and makes that PR redundant.	2023-09-20 11:55:38 -07:00
Bagatur	0749a642f5	Stream refac and vertex streaming (#10470 ) --------- Co-authored-by: Terry Cruz Melo <tcruz@vozy.co> Co-authored-by: Terry Cruz Melo <33166112+TerryCM@users.noreply.github.com>	2023-09-20 11:49:16 -07:00
William FH	f421af8b80	Criteria Parser Improvements (#10824 )	2023-09-20 11:18:33 -07:00
Bagatur	46aa90062b	bump exp 19 (#10851 )	2023-09-20 10:17:52 -07:00
Bagatur	775f3edffd	bump 296 (#10842 )	2023-09-20 08:31:14 -07:00
Bagatur	96a9c27116	fix recursive loader (#10752 ) maintain same base url throughout recursion, yield initial page, fixing recursion depth tracking	2023-09-20 08:16:54 -07:00
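The three behaviours named in this commit (a fixed base URL, yielding the initial page, and depth tracking) can be sketched over an in-memory link graph. Everything here is hypothetical: `SITE`, `crawl`, and the URLs are made up for illustration, and the real loader fetches pages over HTTP.

```python
from typing import Dict, Iterator, List, Optional, Set

# Hypothetical in-memory "site": URL -> links found on that page.
SITE: Dict[str, List[str]] = {
    "https://example.com/docs/": [
        "https://example.com/docs/a",
        "https://other.example/x",
    ],
    "https://example.com/docs/a": [
        "https://example.com/docs/a/b",
        "https://example.com/blog",
    ],
    "https://example.com/docs/a/b": ["https://example.com/docs/a/b/c"],
    "https://example.com/docs/a/b/c": [],
}


def crawl(
    url: str,
    base_url: str,
    max_depth: int,
    depth: int = 0,
    visited: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield pages reachable from url, never leaving base_url."""
    visited = set() if visited is None else visited
    if depth > max_depth or url in visited:
        return
    visited.add(url)
    yield url  # the starting page itself is yielded, not only its children
    for link in SITE.get(url, []):
        # base_url is passed down unchanged, so the filter does not
        # narrow as the recursion descends into sub-pages.
        if link.startswith(base_url):
            yield from crawl(link, base_url, max_depth, depth + 1, visited)


start = "https://example.com/docs/"
pages = list(crawl(start, base_url=start, max_depth=2))
print(pages)
```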
Nuno Campos	276125a33b	Use shallow copy on runnable locals (#10825 ) - deep copy prevents storing complex objects in locals	2023-09-20 08:13:06 -07:00
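The difference can be seen with the standard `copy` module. In this sketch a `threading.Lock` stands in for a "complex object", and `locals_` is a hypothetical dictionary, not the actual runnable state.

```python
import copy
import threading

# Hypothetical locals holding an object that cannot be deep-copied.
locals_ = {"lock": threading.Lock(), "count": 1}

# A shallow copy creates a new dict but reuses the same value objects.
shallow = copy.copy(locals_)
same_lock = shallow["lock"] is locals_["lock"]

# A deep copy tries to duplicate every value and fails on the lock.
try:
    copy.deepcopy(locals_)
    deep_copy_failed = False
except TypeError:
    deep_copy_failed = True

print(same_lock, deep_copy_failed)
```

The trade-off is that shallow copies share mutable values, so changes made to a shared object are visible through both dictionaries.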
DanielZzz	ebe08412ad	fix: chat_models Qianfan not compatible with SystemMessage (#10642 ) - Description: QianfanEndpoint bug with SystemMessages. When a `SystemMessage` is passed in the messages to `chat_models.QianfanEndpoint`, a `TypeError` is raised. - Issue: #10643 - Dependencies: - Tag maintainer: @baskaryan - Twitter handle: no	2023-09-19 22:35:51 -07:00
Massimiliano Pronesti	f0198354d9	fix(embeddings): number of texts in Azure OpenAIEmbeddings batch (#10707 ) This PR addresses the limitation of Azure OpenAI embeddings, which can handle at most 16 texts in a batch. This can be solved by setting `chunk_size=16`. However, I'd love to have this automated, so the user is not forced to figure out where the issue comes from and how to solve it. Closes #4575. @baskaryan --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-19 21:50:39 -07:00
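The effect of a batch limit of 16 can be sketched with a small chunking helper. `batched` is a hypothetical function for this example and makes no API calls; it only shows how the input would be split before each request.

```python
from typing import Iterator, List


def batched(texts: List[str], chunk_size: int = 16) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size texts."""
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


texts = [f"text {i}" for i in range(40)]

# 40 texts with a limit of 16 per request give three batches.
batch_sizes = [len(batch) for batch in batched(texts)]
print(batch_sizes)
```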


980 Commits