langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
Adilkhan Sarsen	7bb843477f	Removed kwargs from add_texts (#7595 ) Removing **kwargs argument from add_texts method in DeepLake vectorstore as it confuses users and doesn't fail when user is typing incorrect parameters. Also added small test to ensure the change is applies correctly. Guys could pls take a look: @rlancemartin, @eyurtsev, this is a small PR. Thx so much!	2023-07-19 09:23:49 -07:00
Bagatur	4d8b48bdb3	bump 236 (#7938 )	2023-07-19 07:51:40 -07:00
Harutaka Kawamura	f6839a8682	Add integration for MLflow AI Gateway (#7113 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @baskaryan - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @baskaryan - Memory: @hwchase17 - Agents / Tools / Toolkits: @hinthornw - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> - Adds integration for MLflow AI Gateway (this will be shipped in MLflow 2.5 this week). Manual testing: ```sh # Move to mlflow repo cd /path/to/mlflow # install langchain pip install git+https://github.com/harupy/langchain.git@gateway-integration # launch gateway service mlflow gateway start --config-path examples/gateway/openai/config.yaml # Then, run the examples in this PR ```	2023-07-19 07:40:55 -07:00
David Preti	6792a3557d	Update openai.py compatibility with azure 2023-07-01-preview (#7937 ) Fixed missing "content" field in azure. Added a check for "content" in _dict (missing for azure api=2023-07-01-preview) @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-19 07:31:18 -07:00
王斌(Bin Wang)	b65102bdb2	fix: pgvector search_type of similarity_score_threshold not working (#7771 ) - Description: VectorStoreRetriever->similarity_score_threshold with search_type of "similarity_score_threshold" not working with the following two minor issues, - Issue: 1. In line 237 of `vectorstores/base.py`, "score_threshold" is passed to `_similarity_search_with_relevance_scores` as in the kwargs, while score_threshold is not a valid argument of this method. As a fix, before calling `_similarity_search_with_relevance_scores`, score_threshold is popped from kwargs. 2. In line 596 to 607 of `vectorstores/pgvector.py`, it's checking the distance_strategy against the string in Enum. However, self.distance_strategy will get the property of distance_strategy from line 316, where the callable function is passed. To solve this issue, self.distance_strategy is changed to self._distance_strategy to avoid calling the property method., - Dependencies: No, - Tag maintainer: @rlancemartin, @eyurtsev, - Twitter handle: No --------- Co-authored-by: Bin Wang <bin@arcanum.ai>	2023-07-19 07:20:52 -07:00
William FH	9d7e57f5c0	Docs Nit (#7918 )	2023-07-18 21:47:28 -07:00
Wilson Leao Neto	8bb33f2296	Exposes Kendra result item DocumentAttributes in the document metadata (#7781 ) - Description: exposes the ResultItem DocumentAttributes as document metadata with key 'document_attributes' and refactors AmazonKendraRetriever by providing a ResultItem base class in order to avoid duplicate code; - Tag maintainer: @3coins @hupe1980 @dev2049 @baskaryan - Twitter handle: wilsonleao ### Why? Some use cases depend on specific document attributes returned by the retriever in order to improve the quality of the overall completion and adjust what will be displayed to the user. For the sake of consistency, we need to expose the DocumentAttributes as document metadata so we are sure that we are using the values returned by the kendra request issued by langchain. I would appreciate your review @3coins @hupe1980 @dev2049. Thank you in advance! ### References - [Amazon Kendra DocumentAttribute](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DocumentAttribute.html) - [Amazon Kendra DocumentAttributeValue](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DocumentAttributeValue.html) --------- Co-authored-by: Piyush Jain <piyushjain@duck.com>	2023-07-18 18:46:38 -07:00
Wilson Leao Neto	efa67ed0ef	fix #7782 : check title and excerpt separately for page_content (#7783 ) - Description: check title and excerpt separately for page_content so that if title is empty but excerpt is present, the page_content will only contain the excerpt - Issue: #7782 - Tag maintainer: @3coins @baskaryan - Twitter handle: wilsonleao	2023-07-18 18:46:23 -07:00
Leonid Ganeline	d92926cbc2	docstrings `chains` (#7892 ) Added/updated docstrings.	2023-07-18 18:25:42 -07:00
Leonid Ganeline	4a810756f8	docstrings `chains` (#7892 ) Added/updated docstrings. @baskaryan	2023-07-18 18:25:27 -07:00
Jarek Kazmierczak	f2ef3ff54a	Google Cloud Enterprise Search retriever (#7857 ) Added a retriever that encapsulated Google Cloud Enterprise Search. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 18:24:08 -07:00
Alonso Silva Allende	1152f4d48b	Allow chat models that do not return token usage (#7907 ) - Description: It allows to use chat models that do not return token usage - Issue: [#7900](https://github.com/hwchase17/langchain/issues/7900) - Dependencies: None - Tag maintainer: @agola11 @hwchase17 - Twitter handle: @alonsosilva --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2023-07-18 18:12:09 -07:00
Zizhong Zhang	bdf0c2267f	docs(custom_chain) fix typo (#7898 ) Fix typo in the document of custom_chain	2023-07-18 18:03:19 -07:00
Jeff Huber	2139d0197e	upgrade chroma to 0.4.0 (#7749 ) This should land Monday the 17th Chroma is upgrading from `0.3.29` to `0.4.0`. `0.4.0` is easier to build, more durable, faster, smaller, and more extensible. This comes with a few changes: 1. A simplified and improved client setup. Instead of having to remember weird settings, users can just do `EphemeralClient`, `PersistentClient` or `HttpClient` (the underlying direct `Client` implementation is also still accessible) 2. We migrated data stores away from `duckdb` and `clickhouse`. This changes the api for the `PersistentClient` that used to reference `chroma_db_impl="duckdb+parquet"`. Now we simply set `is_persistent=true`. `is_persistent` is set for you to `true` if you use `PersistentClient`. 3. Because we migrated away from `duckdb` and `clickhouse` - this also means that users need to migrate their data into the new layout and schema. Chroma is committed to providing extension notification and tooling around any schema and data migrations (for example - this PR!). After upgrading to `0.4.0` - if users try to access their data that was stored in the previous regime, the system will throw an `Exception` and instruct them how to use the migration assistant to migrate their data. The migration assitant is a pip installable CLI: `pip install chroma_migrate`. And is runnable by calling `chroma_migrate` -- TODO ADD here is a short video demonstrating how it works. Please reference the readme at [chroma-core/chroma-migrate](https://github.com/chroma-core/chroma-migrate) to see a full write-up of our philosophy on migrations as well as more details about this particular migration. Please direct any users facing issues upgrading to our Discord channel called [#get-help](https://discord.com/channels/1073293645303795742/1129200523111841883). We have also created a [email listserv](https://airtable.com/shrHaErIs1j9F97BE) to notify developers directly in the future about breaking changes. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 17:20:54 -07:00
Gergely Papp	10246375a5	Gpapp/chromadb (#7891 ) - Description: version check to make sure chromadb >=0.4.0 does not throw an error, and uses the default sqlite persistence engine when the directory is set, - Issue: the issue #7887 For attention of - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 17:03:42 -07:00
Lance Martin	41c841ec85	Add Llama-v2 to Llama.cpp notebook (#7913 )	2023-07-18 15:13:27 -07:00
Bagatur	b9639f6067	fix docs (#7911 )	2023-07-18 14:25:45 -07:00
Jeff Huber	dc8b790214	Improve vector store onboarding exp (#6698 ) This PR - fixes the `similarity_search_by_vector` example, makes the code run and adds the example to mirror `similarity_search` - reverts back to chroma from faiss to remove sharp edges / create a happy path for new developers. (1) real metadata filtering, (2) expected functionality like `update`, `delete`, etc to serve beyond the most trivial use cases @hwchase17 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 13:48:42 -07:00
Bagatur	25a2bdfb70	add pr template instructions (#7904 )	2023-07-18 13:22:28 -07:00
Hanit	0d23c0c82a	Allowing additional params for OpenAIEmbeddings. (#7752 ) (#7654) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 12:14:51 -07:00
Lance Martin	862268175e	Add llama-v2 to docs (#7893 )	2023-07-18 12:09:09 -07:00
TRY-ER	21d1c988a9	Try er/redis index retrieval retry00 (#7773 ) Replace this comment with: - Description: Modified the code to return the document id from the redis document search as metadata. - Issue: the issue # it fixes retrieval of id as metadata as string - Tag maintainer: @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 10:49:50 -07:00
shibuiwilliam	177baef3a1	Add test for svm retriever (#7768 ) # What - This is to add unit test for svm retriever. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 09:57:24 -07:00
Filip Michalsky	69b9db2b5e	Notebook update: sales agent with tools (#7753 ) - Description: This is an update to a previously published notebook. Sales Agent now has access to tools, and this notebook shows how to use a Product Knowledge base to reduce hallucinations and act as a better sales person! - Issue: N/A - Dependencies: `chromadb openai tiktoken` - Tag maintainer: @baskaryan @hinthornw - Twitter handle: @FilipMichalsky	2023-07-18 09:53:12 -07:00
shibuiwilliam	f29a5d4bcc	add test for knn retriever (#7769 ) # What - This is to add test for knn retriever. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 09:52:11 -07:00
Orgil	75d3f1e5e6	remove unused import in voice assistant doc (#7757 ) Description: Removed unused import in voice_assistant doc. Tag maintainer: @baskaryan	2023-07-18 09:51:28 -07:00
maciej-skorupka	c6d1d6d7fc	feat: moving azure OpenAI API version to the latest 2023-05-15 (#7764 ) Moving to the latest non-preview Azure OpenAI API version=2023-05-15. The previous 2023-03-15-preview doesn't have support, SLA etc. For instance, OpenAI SDK has moved to this version https://github.com/openai/openai-python/releases/tag/v0.27.7 @baskaryan	2023-07-18 09:50:15 -07:00
satorioh	259a409998	docs(zilliz): connection_args add token description for serverless cl… (#7810 ) Description: Currently, Zilliz only support dedicated clusters using a pair of username and password for connection. Regarding serverless clusters, they can connect to them by using API keys( [ see official note detail](https://docs.zilliz.com/docs/manage-cluster-credentials)), so I add API key(token) description in Zilliz docs to make it more obvious and convenient for this group of users to better utilize Zilliz. No changes done to code. --------- Co-authored-by: Robin.Wang <3Jg$94sbQ@q1> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 09:31:39 -07:00
shibuiwilliam	235264a246	Add/test faiss (#7809 ) # What - Add missing test cases to faiss vectore stores	2023-07-18 08:30:35 -07:00
maciej-skorupka	5de7815310	docs: added comment from azure llm to azure chat about GPT-4 (#7884 ) Azure GPT-4 models can't be accessed via LLM model. It's easy to miss that and a lot of discussions about that are on the Internet. Therefore I added a comment in Azure LLM docs that mentions that and points to Azure Chat OpenAI docs. @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-18 08:05:41 -07:00
Leonid Ganeline	4a05b7f772	docstrings `prompts` (#7844 ) Added missed docstrings in `prompts` @baskaryan	2023-07-18 07:58:22 -07:00
Bill Zhang	dda11d2a05	WeaviateHybridSearchRetriever option to enable scores. (#7861 ) Description: This PR adds the option to retrieve scores and explanations in the WeaviateHybridSearchRetriever. This feature improves the usability of the retriever by allowing users to understand the scoring logic behind the search results and further refine their search queries. Issue: This PR is a solution to the issue #7855 Dependencies: This PR does not introduce any new dependencies. Tag maintainer: @rlancemartin, @eyurtsev I have included a unit test for the added feature, ensuring that it retrieves scores and explanations correctly. I have also included an example notebook demonstrating its use.	2023-07-18 07:57:17 -07:00
Leonid Ganeline	527210972e	docstrings `output_parsers` (#7859 ) Added/updated the docstrings from `output_parsers` @baskaryan	2023-07-18 07:51:44 -07:00
Jonathan Pedoeem	c460c29a64	Adding Docs for `PromptLayerCallbackHandler` (#7860 ) Here I am adding documentation for the `PromptLayerCallbackHandler`. When we created the initial PR for the callback handler the docs were causing issues, so we merged without the docs.	2023-07-18 07:51:16 -07:00
ljeagle	3902b85657	Add metadata and page_content filters of documents in AwaDB (#7862 ) 1. Add the metadata filter of documents. 2. Add the text page_content filter of documents 3. fix the bug of similarity_search_with_score Improvement and fix bug of AwaDB Fix the conflict https://github.com/hwchase17/langchain/pull/7840 @rlancemartin @eyurtsev Thanks! --------- Co-authored-by: vincent <awadb.vincent@gmail.com>	2023-07-18 07:50:17 -07:00
German Martin	f1eaa9b626	Lost in the middle: We have been ordering documents the WRONG way. (for long context) (#7520 ) Motivation, it seems that when dealing with a long context and "big" number of relevant documents we must avoid using out of the box score ordering from vector stores. See: https://arxiv.org/pdf/2306.01150.pdf So, I added an additional parameter that allows you to reorder the retrieved documents so we can work around this performance degradation. The relevance respect the original search score but accommodates the lest relevant document in the middle of the context. Extract from the paper (one image speaks 1000 tokens): ![image](https://github.com/hwchase17/langchain/assets/1821407/fafe4843-6e18-4fa6-9416-50cc1d32e811) This seems to be common to all diff arquitectures. SO I think we need a good generic way to implement this reordering and run some test in our already running retrievers. It could be that my approach is not the best one from the architecture point of view, happy to have a discussion about that. For me this was the best place to introduce the change and start retesting diff implementations. @rlancemartin, @eyurtsev --------- Co-authored-by: Lance Martin <lance@langchain.dev>	2023-07-18 07:45:15 -07:00
Bagatur	6a32f93669	add ls link (#7847 )	2023-07-18 07:39:26 -07:00
Leonid Ganeline	17956ff08e	docstrings `agents` (#7866 ) Added/Updated docstrings for `agents` @baskaryan	2023-07-18 02:23:24 -07:00
William FH	c6f2d27789	Docs Nits (#7874 ) Add links to reference docs	2023-07-18 01:50:14 -07:00
William FH	3179ee3a56	Evals docs (#7460 ) Still don't have good "how to's", and the guides / examples section could be further pruned and improved, but this PR adds a couple examples for each of the common evaluator interfaces. - [x] Example docs for each implemented evaluator - [x] "how to make a custom evalutor" notebook for each low level APIs (comparison, string, agent) - [x] Move docs to modules area - [x] Link to reference docs for more information - [X] Still need to finish the evaluation index page - ~[ ] Don't have good data generation section~ - ~[ ] Don't have good how to section for other common scenarios / FAQs like regression testing, testing over similar inputs to measure sensitivity, etc.~	2023-07-18 01:00:01 -07:00
William FH	d87564951e	LS0010 (#7871 ) Bump langsmith version. Has some additional UX improvements	2023-07-18 00:28:37 -07:00
William FH	e294ba475a	Some mitigations for RCE in PAL chain (#7870 ) Some docstring / small nits to #6003 --------- Co-authored-by: BoazWasserman <49598618+boazwasserman@users.noreply.github.com> Co-authored-by: HippoTerrific <49598618+HippoTerrific@users.noreply.github.com> Co-authored-by: Or Raz <orraz1994@gmail.com>	2023-07-17 22:58:47 -07:00
Nicolas	46330da2e7	docs: Mendable: Fixes pretty sources not working (#7863 ) This new version fixes the"Verified Sources" display that got broken. Instead of displaying the full URL, it shows the title of the page the source is from.	2023-07-17 18:23:46 -07:00
Leonid Ganeline	f5ae8f1980	docstrings `tools` (#7848 ) Added docstrings in `tools`. @baskaryan	2023-07-17 17:50:19 -07:00
Leonid Ganeline	74b701f42b	docstrings `retrievers` (#7858 ) Added/updated docstrings `retrievers` @baskaryan	2023-07-17 17:47:17 -07:00
Jasper	5b4d53e8ef	Add text_content kwarg to BrowserlessLoader (#7856 ) Added keyword argument to toggle between getting the text content of a site versus its HTML when using the `BrowserlessLoader`	2023-07-17 17:02:19 -07:00
William FH	2aa3cf4e5f	update notebook (#7852 )	2023-07-17 14:46:42 -07:00
Matt Robinson	3c489be773	feat: optional post-processing for Unstructured loaders (#7850 ) ### Summary Adds a post-processing method for Unstructured loaders that allows users to optionally modify or clean extracted elements. ### Testing ```python from langchain.document_loaders import UnstructuredFileLoader from unstructured.cleaners.core import clean_extra_whitespace loader = UnstructuredFileLoader( "./example_data/layout-parser-paper.pdf", mode="elements", post_processors=[clean_extra_whitespace], ) docs = loader.load() docs[:5] ``` ### Reviewrs - @rlancemartin - @eyurtsev - @hwchase17	2023-07-17 12:13:05 -07:00
Bagatur	2a315dbee9	fix nb (#7843 )	2023-07-17 09:39:11 -07:00
Bagatur	3f1302a4ab	bump 235 (#7836 )	2023-07-17 09:37:20 -07:00

1 2 3 4 5 ...

3249 Commits