langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
Conrad Fernandez	6eff0fa2ca	Added documentation for add_texts function for Pinecone integration (#7134 ) - Description: added some documentation to the Pinecone vector store docs page. - Issue: #7126 - Dependencies: None - Tag maintainer: @baskaryan I can add more documentation on the Pinecone integration functions as I am going to go in great depth into this area. Just wanted to check with the maintainers is if this is all good.	2023-07-05 13:11:37 -04:00
felixocker	db98c44f8f	Support for SPARQL (#7165 ) # [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for [LangChain](https://github.com/hwchase17/langchain) ## Description LangChain support for knowledge graphs relying on W3C standards using RDFlib: SPARQL/ RDF(S)/ OWL with special focus on RDF \ * Works with local files, files from the web, and SPARQL endpoints * Supports both SELECT and UPDATE queries * Includes both a Jupyter notebook with an example and integration tests ## Contribution compared to related PRs and discussions * [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) - uses SPARQL, but specifically for wikibase querying * [Cypher qa](https://github.com/hwchase17/langchain/pull/5078) - graph DB question answering for Neo4J via Cypher * [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries something similar, but does not cover UPDATE queries and supports only RDF * Discussions on [w3c mailing list](mailto:semantic-web@w3.org) related to the combination of LLMs (specifically ChatGPT) and knowledge graphs ## Dependencies * [RDFlib](https://github.com/RDFLib/rdflib) ## Tag maintainer Graph database related to memory -> @hwchase17	2023-07-05 13:00:16 -04:00
Prakul Agarwal	38f853dfa3	Fixed typos in MongoDB Atlas Vector Search documentation (#7174 ) Fix for typos in MongoDB Atlas Vector Search documentation <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @baskaryan - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @baskaryan - Memory: @hwchase17 - Agents / Tools / Toolkits: @hinthornw - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md -->	2023-07-05 12:48:00 -04:00
Raouf Chebri	6fc24743b7	Add pg_hnsw vectorstore integration (#6893 ) Hi @rlancemartin, @eyurtsev! - Description: Adding HNSW extension support for Postgres. Similar to pgvector vectorstore, with 3 differences 1. it uses HNSW extension for exact and ANN searches, 2. Vectors are of type array of real 3. Only supports L2 - Dependencies: [HNSW](https://github.com/knizhnik/hnsw) extension for Postgres - Example: ```python db = HNSWVectoreStore.from_documents( embedding=embeddings, documents=docs, collection_name=collection_name, connection_string=connection_string ) query = "What did the president say about Ketanji Brown Jackson" docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query) ``` The example notebook is in the PR too.	2023-07-05 08:10:10 -07:00
Simon Cheung	81eebc4070	Add HugeGraphQAChain to support gremlin generating chain (#7132 ) [Apache HugeGraph](https://github.com/apache/incubator-hugegraph) is a convenient, efficient, and adaptable graph database, compatible with the Apache TinkerPop3 framework and the Gremlin query language. In this PR, the HugeGraph and HugeGraphQAChain provide the same functionality as the existing integration with Neo4j and enables query generation and question answering over HugeGraph database. The difference is that the graph query language supported by HugeGraph is not cypher but another very popular graph query language [Gremlin](https://tinkerpop.apache.org/gremlin.html). A notebook example and a simple test case have also been added. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-04 10:21:21 -06:00
Saverio Proto	5585607654	Improve Bing Search example (#7128 ) # Description Improve Bing Search example:	2023-07-04 09:58:03 -06:00
Lance Martin	265c285057	Fix GPT4All bug w/ "n_ctx" param (#7093 ) Running `GPT4All` per the [docs](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/gpt4all), I see: ``` $ from langchain.llms import GPT4All $ model = GPT4All(model=local_path) $ model("The capital of France is ", max_tokens=10) TypeError: generate() got an unexpected keyword argument 'n_ctx' ``` It appears `n_ctx` is [no longer a supported param](https://docs.gpt4all.io/gpt4all_python.html#gpt4all.gpt4all.GPT4All.generate) in the GPT4All API from https://github.com/nomic-ai/gpt4all/pull/1090. It now uses `max_tokens`, so I set this. And I also set other defaults used in GPT4All client [here](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-bindings/python/gpt4all/gpt4all.py). Confirm it now works: ``` $ from langchain.llms import GPT4All $ model = GPT4All(model=local_path) $ model("The capital of France is ", max_tokens=10) < Model logging > "....Paris." ``` --------- Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>	2023-07-04 08:53:52 -07:00
Stefano Lottini	6631fd5168	Align cassio versions between examples for Cassandra integration (#7099 ) Just reducing confusion by requiring cassio>=0.0.7 consistently across examples.	2023-07-04 04:21:48 -06:00
Lance Martin	9ca4c54428	Minor updates to notebook for MultiQueryRetriever (#7102 ) * Add an easier-to-run example. * Add logging per https://github.com/hwchase17/langchain/pull/6891. * Updated params per https://github.com/hwchase17/langchain/pull/5962. --------- Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local> Co-authored-by: Lance Martin <lance@langchain.dev>	2023-07-03 17:32:50 -07:00
genewoo	e49abd1277	Add Metal support to llama.cpp doc (#7092 ) - Description: Add Metal support to llama.cpp doc - Issue: #7091 - Dependencies: N/A - Twitter handle: gene_wu	2023-07-03 13:35:39 -06:00
rjarun8	e2d61ab85a	Add SpacyEmbeddings class (#6967 ) - Description: Added a new SpacyEmbeddings class for generating embeddings using the Spacy library. - Issue: Sentencebert/Bert/Spacy/Doc2vec embedding support #6952 - Dependencies: This change requires the Spacy library and the 'en_core_web_sm' Spacy model. - Tag maintainer: @dev2049 - Twitter handle: N/A This change includes a new SpacyEmbeddings class, but does not include a test or an example notebook. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-03 09:38:31 -06:00
adam91holt	80e86b602e	Remove duplicate mongodb integration doc (#7006 )	2023-07-03 02:23:33 -06:00
Johnny Lim	a081e419a0	Fix sample in FAISS section (#7050 ) This PR fixes a sample in the FAISS section in the reference docs.	2023-07-03 02:18:32 -06:00
Leonid Ganeline	200be43da6	added `Brave Search` document_loader (#6989 ) - Added `Brave Search` document loader. - Refactored BraveSearch wrapper - Added a Jupyter Notebook example - Added `Ecosystem/Integrations` BraveSearch page Please review: - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev	2023-07-02 19:01:24 -07:00
Sergey Kozlov	6d15854cda	Add JSON Lines support to JSONLoader (#6913 ) Description: The JSON Lines format is used by some services such as OpenAI and HuggingFace. It's also a convenient alternative to CSV. This PR adds JSON Lines support to `JSONLoader` and also updates related tests. Tag maintainer: @rlancemartin, @eyurtsev. PS I was not able to build docs locally so didn't update related section.	2023-07-02 12:32:41 -07:00
Ofer Mendelevitch	153b56d19b	Vectara upd2 (#6506 ) Update to Vectara integration - By user request added "add_files" to take advantage of Vectara capabilities to process files on the backend, without the need for separate loading of documents and chunking in the chain. - Updated vectara.ipynb example notebook to be broader and added testing of add_file() @hwchase17 - project lead --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-07-02 12:15:50 -07:00
Leonid Ganeline	77ae8084a0	docstrings `document_loaders` 1 (#6847 ) - Updated docstrings in `document_loaders` - several code fixes. - added `docs/extras/ecosystem/integrations/airtable.md` @rlancemartin, @eyurtsev	2023-07-02 12:13:04 -07:00
Stefano Lottini	8d2281a8ca	Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017 ) Retrying with the same improvements as in #6772, this time trying not to mess up with branches. @rlancemartin doing a fresh new PR from a branch with a new name. This should do. Thank you for your help! --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com> Co-authored-by: rlm <pexpresss31@gmail.com>	2023-07-01 11:09:52 -07:00
Matt Robinson	0498dad562	feat: enable `UnstructuredEmailLoader` to process attachments (#6977 ) ### Summary Updates `UnstructuredEmailLoader` so that it can process attachments in addition to the e-mail content. The loader will process attachments if the `process_attachments` kwarg is passed when the loader is instantiated. ### Testing ```python file_path = "fake-email-attachment.eml" loader = UnstructuredEmailLoader( file_path, mode="elements", process_attachments=True ) docs = loader.load() docs[-1] ``` ### Reviewers - @rlancemartin - @eyurtsev - @hwchase17	2023-07-01 06:09:26 -07:00
Zander Chase	b0859c9b18	Add New Retriever Interface with Callbacks (#5962 ) Handle the new retriever events in a way that (I think) is entirely backwards compatible? Needs more testing for some of the chain changes and all. This creates an entire new run type, however. We could also just treat this as an event within a chain run presumably (same with memory) Adds a subclass initializer that upgrades old retriever implementations to the new schema, along with tests to ensure they work. First commit doesn't upgrade any of our retriever implementations (to show that we can pass the tests along with additional ones testing the upgrade logic). Second commit upgrades the known universe of retrievers in langchain. - [X] Add callback handling methods for retriever start/end/error (open to renaming to 'retrieval' if you want that) - [X] Update BaseRetriever schema to support callbacks - [X] Tests for upgrading old "v1" retrievers for backwards compatibility - [X] Update existing retriever implementations to implement the new interface - [X] Update calls within chains to .{a]get_relevant_documents to pass the child callback manager - [X] Update the notebooks/docs to reflect the new interface - [X] Test notebooks thoroughly Not handled: - Memory pass throughs: retrieval memory doesn't have a parent callback manager passed through the method --------- Co-authored-by: Nuno Campos <nuno@boringbits.io> Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	2023-06-30 14:44:03 -07:00
William FH	a5b206caf3	Remove Promptlayer Notebook (#6996 ) It's breaking our docs build	2023-06-30 14:30:24 -07:00
Daniel Chalef	b26cca8008	Zep Authentication (#6728 ) ## Description: Add Zep API Key argument to ZepChatMessageHistory and ZepRetriever - correct docs site links - add zep api_key auth to constructors ZepChatMessageHistory: @hwchase17, ZepRetriever: @rlancemartin, @eyurtsev	2023-06-30 14:24:26 -07:00
William FH	64039b9f11	Promptlayer Callback (#6975 ) Co-authored-by: Saleh Hindi <saleh.hindi.one@gmail.com> Co-authored-by: jped <jonathanped@gmail.com>	2023-06-30 08:32:42 -07:00
Davis Chase	f780678910	Add back in clickhouse mongo vecstore notebooks (#6949 )	2023-06-29 19:21:47 -07:00
Kacper Łukawski	140ba682f1	Support named vectors in Qdrant (#6871 ) # Description This PR makes it possible to use named vectors from Qdrant in Langchain. That was requested multiple times, as people want to reuse externally created collections in Langchain. It doesn't change anything for the existing applications. The changes were covered with some integration tests and included in the docs. ## Example ```python Qdrant.from_documents( docs, embeddings, location=":memory:", collection_name="my_documents", vector_name="custom_vector", ) ``` ### Issue: #2594 Tagging @rlancemartin & @eyurtsev. I'd appreciate your review.	2023-06-29 15:14:22 -07:00
corranmac	20c6ade2fc	Grobid parser for Scientific Articles from PDF (#6729 ) ### Scientific Article PDF Parsing via Grobid `Description:` This change adds the GrobidParser class, which uses the Grobid library to parse scientific articles into a universal XML format containing the article title, references, sections, section text etc. The GrobidParser uses a local Grobid server to return PDFs document as XML and parses the XML to optionally produce documents of individual sentences or of whole paragraphs. Metadata includes the text, paragraph number, pdf relative bboxes, pages (text may overlap over two pages), section title (Introduction, Methodology etc), section_number (i.e 1.1, 2.3), the title of the paper and finally the file path. Grobid parsing is useful beyond standard pdf parsing as it accurately outputs sections and paragraphs within them. This allows for post-fitering of results for specific sections i.e. limiting results to the methodology section or results. While sections are split via headings, ideally they could be classified specifically into introduction, methodology, results, discussion, conclusion. I'm currently experimenting with chatgpt-3.5 for this function, which could later be implemented as a textsplitter. `Dependencies:` For use, the grobid repo must be cloned and Java must be installed, for colab this is: ``` !apt-get install -y openjdk-11-jdk -q !update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java !git clone https://github.com/kermitt2/grobid.git os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64" os.chdir('grobid') !./gradlew clean install ``` Once installed the server is ran on localhost:8070 via ``` get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &') ``` @rlancemartin, @eyurtsev Twitter Handle: @Corranmac Grobid Demo Notebook is [here](https://colab.research.google.com/drive/1X-St_mQRmmm8YWtct_tcJNtoktbdGBmd?usp=sharing). --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-06-29 14:29:29 -07:00
Harrison Chase	0ba175e13f	move octo notebook (#6901 )	2023-06-29 12:20:55 -07:00
Stefano Lottini	75fb9d2fdc	Cassandra support for chat history using CassIO library (#6771 ) ### Overview This PR aims at building on #4378, expanding the capabilities and building on top of the `cassIO` library to interface with the database (as opposed to using the core drivers directly). Usage of `cassIO` (a library abstracting Cassandra access for ML/GenAI-specific purposes) is already established since #6426 was merged, so no new dependencies are introduced. In the same spirit, we try to uniform the interface for using Cassandra instances throughout LangChain: all our appreciation of the work by @jj701 notwithstanding, who paved the way for this incremental work (thank you!), we identified a few reasons for changing the way a `CassandraChatMessageHistory` is instantiated. Advocating a syntax change is something we don't take lighthearted way, so we add some explanations about this below. Additionally, this PR expands on integration testing, enables use of Cassandra's native Time-to-Live (TTL) features and improves the phrasing around the notebook example and the short "integrations" documentation paragraph. We would kindly request @hwchase to review (since this is an elaboration and proposed improvement of #4378 who had the same reviewer). ### About the __init__ breaking changes There are [many](https://docs.datastax.com/en/developer/python-driver/3.28/api/cassandra/cluster/) options when creating the `Cluster` object, and new ones might be added at any time. Choosing some of them and exposing them as `__init__` parameters `CassandraChatMessageHistory` will prove to be insufficient for at least some users. On the other hand, working through `kwargs` or adding a long, long list of arguments to `__init__` is not a desirable option either. For this reason, (as done in #6426), we propose that whoever instantiates the Chat Message History class provide a Cassandra `Session` object, ready to use. This also enables easier injection of mocks and usage of Cassandra-compatible connections (such as those to the cloud database DataStax Astra DB, obtained with a different set of init parameters than `contact_points` and `port`). We feel that a breaking change might still be acceptable since LangChain is at `0.*`. However, while maintaining that the approach we propose will be more flexible in the future, room could be made for a "compatibility layer" that respects the current init method. Honestly, we would to that only if there are strong reasons for it, as that would entail an additional maintenance burden. ### Other changes We propose to remove the keyspace creation from the class code for two reasons: first, production Cassandra instances often employ RBAC so that the database user reading/writing from tables does not necessarily (and generally shouldn't) have permission to create keyspaces, and second that programmatic keyspace creation is not a best practice (it should be done more or less manually, with extra care about schema mismatched among nodes, etc). Removing this (usually unnecessary) operation from the `__init__` path would also improve initialization performance (shorter time). We suggest, likewise, to remove the `__del__` method (which would close the database connection), for the following reason: it is the recommended best practice to create a single Cassandra `Session` object throughout an application (it is a resource-heavy object capable to handle concurrency internally), so in case Cassandra is used in other ways by the app there is the risk of truncating the connection for all usages when the history instance is destroyed. Moreover, the `Session` object, in typical applications, is best left to garbage-collect itself automatically. As mentioned above, we defer the actual database I/O to the `cassIO` library, which is designed to encode practices optimized for LLM applications (among other) without the need to expose LangChain developers to the internals of CQL (Cassandra Query Language). CassIO is already employed by the LangChain's Vector Store support for Cassandra. We added a few more connection options in the companion notebook example (most notably, Astra DB) to encourage usage by anyone who cannot run their own Cassandra cluster. We surface the `ttl_seconds` option for automatic handling of an expiration time to chat history messages, a likely useful feature given that very old messages generally may lose their importance. We elaborated a bit more on the integration testing (Time-to-live, separation of "session ids", ...). ### Remarks from linter & co. We reinstated `cassio` as a dependency both in the "optional" group and in the "integration testing" group of `pyproject.toml`. This might not be the right thing do to, in which case the author of this PR offer his apologies (lack of confidence with Poetry - happy to be pointed in the right direction, though!). During linter tests, we were hit by some errors which appear unrelated to the code in the PR. We left them here and report on them here for awareness: ``` langchain/vectorstores/mongodb_atlas.py:137: error: Argument 1 to "insert_many" of "Collection" has incompatible type "List[Dict[str, Sequence[object]]]"; expected "Iterable[Union[MongoDBDocumentType, RawBSONDocument]]" [arg-type] langchain/vectorstores/mongodb_atlas.py:186: error: Argument 1 to "aggregate" of "Collection" has incompatible type "List[object]"; expected "Sequence[Mapping[str, Any]]" [arg-type] langchain/vectorstores/qdrant.py:16: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:19: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:20: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:22: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:23: error: Name "grpc" is not defined [name-defined] ``` In the same spirit, we observe that to even get `import langchain` run, it seems that a `pip install bs4` is missing from the minimal package installation path. Thank you!	2023-06-29 10:50:34 -07:00
Shashank Deshpande	99cfe192da	added example notebook - use custom functions with openai agent (#6865 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @dev2049 - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @dev2049 - Memory: @hwchase17 - Agents / Tools / Toolkits: @vowelparrot - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md -->	2023-06-28 22:07:33 -07:00
Robert Lewis	c9c8d2599e	Update Zapier Jupyter notebook to include brief OAuth example (#6892 ) Description: Adds a brief example of using an OAuth access token with the Zapier wrapper. Also links to the Zapier documentation to learn more about OAuth flows. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-28 18:06:22 -07:00
Yaohui Wang	9d1bd18596	feat (documents): add LarkSuite document loader (#6420 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> ### Summary This PR adds a LarkSuite (FeiShu) document loader. > [LarkSuite](https://www.larksuite.com/) is an enterprise collaboration platform developed by ByteDance. ### Tests - an integration test case is added - an example notebook showing usage is added. [Notebook preview](https://github.com/yaohui-wyh/langchain/blob/master/docs/extras/modules/data_connection/document_loaders/integrations/larksuite.ipynb) <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> ### Who can review? - PTAL @eyurtsev @hwchase17 <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>	2023-06-27 23:08:05 -07:00
Jingsong Gao	a435a436c1	feat(document_loaders): add tencent cos directory and file loader (#6401 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> - add tencent cos directory and file support for document-loader #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? @eyurtsev	2023-06-27 23:07:20 -07:00
Lance Martin	3f9900a864	Create MultiQueryRetriever (#6833 ) Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on "distance". But, retrieval may produce difference results with subtle changes in query wording or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious. The `MultiQueryRetriever` automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the `MultiQueryRetriever` might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-27 22:59:40 -07:00
Tim Asp	3ca1a387c2	Web Loader: Add proxy support (#6792 ) Proxies are helpful, especially when you start querying against more anti-bot websites. [Proxy services](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker/making-requests) (of which there are many) and `requests` make it easy to rotate IPs to prevent banning by just passing along a simple dict to `requests`. CC @rlancemartin, @eyurtsev	2023-06-27 22:27:49 -07:00
Matt Robinson	dd2a151543	Docs/unstructured api key (#6781 ) ### Summary The Unstructured API will soon begin requiring API keys. This PR updates the Unstructured integrations docs with instructions on how to generate Unstructured API keys. ### Reviewers @rlancemartin @eyurtsev @hwchase17	2023-06-27 16:54:15 -07:00
Matt Robinson	b24472eae3	feat: Add `UnstructuredOrgModeLoader` (#6842 ) ### Summary Adds `UnstructuredOrgModeLoader` for processing [Org-mode](https://en.wikipedia.org/wiki/Org-mode) documents. ### Testing ```python from langchain.document_loaders import UnstructuredOrgModeLoader loader = UnstructuredOrgModeLoader( file_path="example_data/README.org", mode="elements" ) docs = loader.load() print(docs[0]) ``` ### Reviewers - @rlancemartin - @eyurtsev - @hwchase17	2023-06-27 16:34:17 -07:00
Cristóbal Carnero Liñán	e494b0a09f	feat (documents): add a source code loader based on AST manipulation (#6486 ) #### Summary A new approach to loading source code is implemented: Each top-level function and class in the code is loaded into separate documents. Then, an additional document is created with the top-level code, but without the already loaded functions and classes. This could improve the accuracy of QA chains over source code. For instance, having this script: ``` class MyClass: def __init__(self, name): self.name = name def greet(self): print(f"Hello, {self.name}!") def main(): name = input("Enter your name: ") obj = MyClass(name) obj.greet() if __name__ == '__main__': main() ``` The loader will create three documents with this content: First document: ``` class MyClass: def __init__(self, name): self.name = name def greet(self): print(f"Hello, {self.name}!") ``` Second document: ``` def main(): name = input("Enter your name: ") obj = MyClass(name) obj.greet() ``` Third document: ``` # Code for: class MyClass: # Code for: def main(): if __name__ == '__main__': main() ``` A threshold parameter is added to control whether small scripts are split in this way or not. At this moment, only Python and JavaScript are supported. The appropriate parser is determined by examining the file extension. #### Tests This PR adds: - Unit tests - Integration tests #### Dependencies Only one dependency was added as optional (needed for the JavaScript parser). #### Documentation A notebook is added showing how the loader can be used. #### Who can review? @eyurtsev @hwchase17 --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-06-27 15:58:47 -07:00
Robert Lewis	da462d9dd4	Zapier update oauth support (#6780 ) Description: Update documentation to 1) point to updated documentation links at Zapier.com (we've revamped our help docs and paths), and 2) To provide clarity how to use the wrapper with an access token for OAuth support Demo: Initializing the Zapier Wrapper with an OAuth Access Token `ZapierNLAWrapper(zapier_nla_oauth_access_token="<redacted>")` Using LangChain to resolve the current weather in Vancouver BC leveraging Zapier NLA to lookup weather by coords. ``` > Entering new chain... I need to use a tool to get the current weather. Action: The Weather: Get Current Weather Action Input: Get the current weather for Vancouver BC Observation: {"coord__lon": -123.1207, "coord__lat": 49.2827, "weather": [{"id": 802, "main": "Clouds", "description": "scattered clouds", "icon": "03d", "icon_url": "http://openweathermap.org/img/wn/03d@2x.png"}], "weather[]icon_url": ["http://openweathermap.org/img/wn/03d@2x.png"], "weather[]icon": ["03d"], "weather[]id": [802], "weather[]description": ["scattered clouds"], "weather[]main": ["Clouds"], "base": "stations", "main__temp": 71.69, "main__feels_like": 71.56, "main__temp_min": 67.64, "main__temp_max": 76.39, "main__pressure": 1015, "main__humidity": 64, "visibility": 10000, "wind__speed": 3, "wind__deg": 155, "wind__gust": 11.01, "clouds__all": 41, "dt": 1687806607, "sys__type": 2, "sys__id": 2011597, "sys__country": "CA", "sys__sunrise": 1687781297, "sys__sunset": 1687839730, "timezone": -25200, "id": 6173331, "name": "Vancouver", "cod": 200, "summary": "scattered clouds", "_zap_search_was_found_status": true} Thought: I now know the current weather in Vancouver BC. Final Answer: The current weather in Vancouver BC is scattered clouds with a temperature of 71.69 and wind speed of 3 ```	2023-06-27 11:46:32 -07:00
Joshua Carroll	24e4ae95ba	Initial Streamlit callback integration doc (md) (#6788 ) Description: Add a documentation page for the Streamlit Callback Handler integration (#6315) Notes: - Implemented as a markdown file instead of a notebook since example code runs in a Streamlit app (happy to discuss / consider alternatives now or later) - Contains an embedded Streamlit app -> https://mrkl-minimal.streamlit.app/ Currently this app is hosted out of a Streamlit repo but we're working to migrate the code to a LangChain owned repo ![streamlit_docs](https://github.com/hwchase17/langchain/assets/116604821/0b7a6239-361f-470c-8539-f22c40098d1a) cc @dev2049 @tconkling	2023-06-27 11:43:49 -07:00
WaseemH	7ac9b22886	`RecusiveUrlLoader` to `RecursiveUrlLoader` (#6787 )	2023-06-26 23:12:14 -07:00
Leonid Ganeline	49c864fa18	docs: vectorstore upgrades 2 (#6796 ) updated vectorstores/ notebooks; added new integrations into ecosystem/integrations/ @dev2049 @rlancemartin, @eyurtsev	2023-06-26 22:55:04 -07:00
Chris Pappalardo	70f7c2bb2e	align chroma vectorstore get with chromadb to enable where filtering (#6686 ) allows for where filtering on collection via get - Description: aligns langchain chroma vectorstore get with underlying [chromadb collection get](https://github.com/chroma-core/chroma/blob/main/chromadb/api/models/Collection.py#L103) allowing for where filtering, etc. - Issue: NA - Dependencies: none - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @pappanaka	2023-06-26 10:51:20 -07:00
Santiago Delgado	d84a3bcf7a	Office365 Tool (#6306 ) #### Background With the development of [structured tools](https://blog.langchain.dev/structured-tools/), the LangChain team expanded the platform's functionality to meet the needs of new applications. The GMail tool, empowered by structured tools, now supports multiple arguments and powerful search capabilities, demonstrating LangChain's ability to interact with dynamic data sources like email servers. #### Challenge The current GMail tool only supports GMail, while users often utilize other email services like Outlook in Office365. Additionally, the proposed calendar tool in PR https://github.com/hwchase17/langchain/pull/652 only works with Google Calendar, not Outlook. #### Changes This PR implements an Office365 integration for LangChain, enabling seamless email and calendar functionality with a single authentication process. #### Future Work With the core Office365 integration complete, future work could include integrating other Office365 tools such as Tasks and Address Book. #### Who can review? @hwchase17 or @vowelparrot can review this PR #### Appendix @janscas, I utilized your [O365](https://github.com/O365/python-o365) library extensively. Given the rising popularity of LangChain and similar AI frameworks, the convergence of libraries like O365 and tools like this one is likely. So, I wanted to keep you updated on our progress. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-26 02:59:09 -07:00
Pau Ramon Revilla	87802c86d9	Added a MHTML document loader (#6311 ) MHTML is a very interesting format since it's used both for emails but also for archived webpages. Some scraping projects want to store pages in disk to process them later, mhtml is perfect for that use case. This is heavily inspired from the beautifulsoup html loader, but extracting the html part from the mhtml file. --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-06-25 13:12:08 -07:00
Matt Robinson	be68f6f8ce	feat: Add `UnstructuredRSTLoader` (#6594 ) ### Summary Adds an `UnstructuredRSTLoader` for loading [reStructuredText](https://en.wikipedia.org/wiki/ReStructuredText) file. ### Testing ```python from langchain.document_loaders import UnstructuredRSTLoader loader = UnstructuredRSTLoader( file_path="example_data/README.rst", mode="elements" ) docs = loader.load() print(docs[0]) ``` ### Reviewers - @hwchase17 - @rlancemartin - @eyurtsev	2023-06-25 12:41:57 -07:00
Baichuan Sun	9fbe346860	Amazon API Gateway hosted LLM (#6673 ) This PR adds a new LLM class for the Amazon API Gateway hosted LLM. The PR also includes example notebooks for using the LLM class in an Agent chain. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-23 21:27:25 -07:00
Davis Chase	f1e1ac2a01	chroma nb close img tag (#6669 )	2023-06-23 15:41:54 -07:00
Davis Chase	5e5b30b74f	openapi -> openai nit (#6667 )	2023-06-23 15:09:02 -07:00
Jeff Huber	2acf109c4b	update chroma notebook (#6664 ) @rlancemartin I updated the notebook for Chroma to hopefully be a lot easier for users.	2023-06-23 15:03:06 -07:00
Piyush Jain	b1de927f1b	Kendra retriever api (#6616 ) ## Description Replaces [Kendra Retriever](https://github.com/hwchase17/langchain/blob/master/langchain/retrievers/aws_kendra_index_retriever.py) with an updated version that uses the new [retriever API](https://docs.aws.amazon.com/kendra/latest/dg/searching-retrieve.html) which is better suited for retrieval augmented generation (RAG) systems. Note: This change requires the latest version (1.26.159) of boto3 to work. `pip install -U boto3` to upgrade the boto3 version. cc @hupe1980 cc @dev2049	2023-06-23 14:59:35 -07:00
Ikko Eltociear Ashimine	73da193a4b	Fix typo in myscale_self_query.ipynb (#6601 )	2023-06-23 14:57:12 -07:00
Lance Martin	c2b25c17c5	Recursive URL loader (#6455 ) We may want to process load all URLs under a root directory. For example, let's look at the [LangChain JS documentation](https://js.langchain.com/docs/). This has many interesting child pages that we may want to read in bulk. Of course, the `WebBaseLoader` can load a list of pages. But, the challenge is traversing the tree of child pages and actually assembling that list! We do this using the `RecusiveUrlLoader`. This also gives us the flexibility to exclude some children (e.g., the `api` directory with > 800 child pages).	2023-06-23 13:09:00 -07:00
Lance Martin	393f469eb3	Create merge loader that combines documents from a set of loaders (#6659 ) Simple utility loader that combines documents from a set of specified loaders.	2023-06-23 13:02:48 -07:00
Davis Chase	6988039975	openapi_openai docstring (#6661 )	2023-06-23 11:38:33 -07:00
Davis Chase	e013459b18	Openapi to openai (#6658 )	2023-06-23 11:00:34 -07:00
Lance Martin	6e69bfbb28	Loader for OpenCityData and minor cleanups to Pandas, Airtable loaders (#6301 ) Many cities have open data portals for events like crime, traffic, etc. Socrata provides an API for many, including SF (e.g., see [here](https://dev.socrata.com/foundry/data.sfgov.org/tmnf-yvry)). This is a new data loader for city data that uses Socrata API.	2023-06-22 22:20:42 -07:00
Christoph Kahl	9d42621fa4	added redis method to delete entries by keys (#6222 ) In addition to my last pr (return keys of added entries), we also need a method to delete the entries by keys. @dev2049	2023-06-22 13:26:47 -07:00
Harrison Chase	a9108c1809	add mongo (HOLD) (#6437 ) do not merge in	2023-06-22 11:08:12 -07:00
Lance Martin	30f7288082	MD header text splitter returns Documents (#6571 ) Return `Documents` from MD header text splitter to simplify UX. Updates the test as well as example notebooks.	2023-06-22 09:25:38 -07:00
minhajul-clarifai	6e57306a13	Clarifai integration (#5954 ) # Changes This PR adds [Clarifai](https://www.clarifai.com/) integration to Langchain. Clarifai is an end-to-end AI Platform. Clarifai offers user the ability to use many types of LLM (OpenAI, cohere, ect and other open source models). As well, a clarifai app can be treated as a vector database to upload and retrieve data. The integrations includes: - Clarifai LLM integration: Clarifai supports many types of language model that users can utilize for their application - Clarifai VectorDB: A Clarifai application can hold data and embeddings. You can run semantic search with the embeddings #### Before submitting - [x] Added integration test for LLM - [x] Added integration test for VectorDB - [x] Added notebook for LLM - [x] Added notebook for VectorDB Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-22 08:00:15 -07:00
Jeroen Van Goey	7f6f5c2a6a	Add missing word in comment (#6587 ) Changed ``` # Do this so we can exactly what's going on under the hood ``` to ``` # Do this so we can see exactly what's going on under the hood ```	2023-06-22 07:54:28 -07:00
Davis Chase	d50de2728f	Add AzureML endpoint LLM wrapper (#6580 ) ### Description We have added a new LLM integration `azureml_endpoint` that allows users to leverage models from the AzureML platform. Microsoft recently announced the release of [Azure Foundation Models](https://learn.microsoft.com/en-us/azure/machine-learning/concept-foundation-models?view=azureml-api-2) which users can find in the AzureML Model Catalog. The Model Catalog contains a variety of open source and Hugging Face models that users can deploy on AzureML. The `azureml_endpoint` allows LangChain users to use the deployed Azure Foundation Models. ### Dependencies No added dependencies were required for the change. ### Tests Integration tests were added in `tests/integration_tests/llms/test_azureml_endpoint.py`. ### Notebook A Jupyter notebook demonstrating how to use `azureml_endpoint` was added to `docs/modules/llms/integrations/azureml_endpoint_example.ipynb`. ### Twitters [Prakhar Gupta](https://twitter.com/prakhar_in) [Matthew DeGuzman](https://twitter.com/matthew_d13) --------- Co-authored-by: Matthew DeGuzman <91019033+matthewdeguzman@users.noreply.github.com> Co-authored-by: prakharg-msft <75808410+prakharg-msft@users.noreply.github.com>	2023-06-22 01:46:01 -07:00
Davis Chase	4fabd02d25	Add OpenLLM wrapper(#6578 ) LLM wrapper for models served with OpenLLM --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: Chaoyu <paranoyang@gmail.com>	2023-06-22 01:18:14 -07:00
Muhammad Vaid	ae81b96b60	Detailed using the Twilio tool to send messages with 3rd party apps incl. WhatsApp (#6562 ) Everything needed to support sending messages over WhatsApp Business Platform (GA), Facebook Messenger (Public Beta) and Google Business Messages (Private Beta) was present. Just added some details on leveraging it.	2023-06-21 19:26:50 -07:00
Andrey E. Vedishchev	a2a0715bd4	Minor Grammar Fixes in Docs and Comments (#6536 ) Just some grammar fixes: I found "retriver" instead of "retriever" in several comments across the documentation and in the comments. I fixed it. Co-authored-by: andrey.vedishchev <andrey.vedishchev@rgigroup.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-21 09:53:31 -07:00
dirtysalt	57cc3d1d3d	[Feature][VectorStore] Support StarRocks as vector db (#6119 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> Here are some examples to use StarRocks as vectordb ``` from langchain.vectorstores import StarRocks from langchain.vectorstores.starrocks import StarRocksSettings embeddings = OpenAIEmbeddings() # conifgure starrocks settings settings = StarRocksSettings() settings.port = 41003 settings.host = '127.0.0.1' settings.username = 'root' settings.password = '' settings.database = 'zya' # to fill new embeddings docsearch = StarRocks.from_documents(split_docs, embeddings, config = settings) # or to use already-built embeddings in database. docsearch = StarRocks(embeddings, settings) ``` #### Who can review? Tag maintainers/contributors who might be interested: @dev2049 <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-21 09:02:33 -07:00
Harrison Chase	ace442b992	bump to ver 208 (#6540 )	2023-06-21 07:32:36 -07:00
Harrison Chase	53c1f120a8	Harrison/multi tool (#6518 )	2023-06-21 07:19:52 -07:00
Naman Modi	37a89918e0	Infino integration for simplified logs, metrics & search across LLM data & token usage (#6218 ) ### Integration of Infino with LangChain for Enhanced Observability This PR aims to integrate [Infino](https://github.com/infinohq/infino), an open source observability platform written in rust for storing metrics and logs at scale, with LangChain, providing users with a streamlined and efficient method of tracking and recording LangChain experiments. By incorporating Infino into LangChain, users will be able to gain valuable insights and easily analyze the behavior of their language models. #### Please refer to the following files related to integration: - `InfinoCallbackHandler`: A [callback handler](https://github.com/naman-modi/langchain/blob/feature/infino-integration/langchain/callbacks/infino_callback.py) specifically designed for storing chain responses within Infino. - Example `infino.ipynb` file: A comprehensive notebook named [infino.ipynb](https://github.com/naman-modi/langchain/blob/feature/infino-integration/docs/extras/modules/callbacks/integrations/infino.ipynb) has been included to guide users on effectively leveraging Infino for tracking LangChain requests. - [Integration Doc](https://github.com/naman-modi/langchain/blob/feature/infino-integration/docs/extras/ecosystem/integrations/infino.mdx) for Infino integration. By integrating Infino, LangChain users will gain access to powerful visualization and debugging capabilities. Infino enables easy tracking of inputs, outputs, token usage, execution time of LLMs. This comprehensive observability ensures a deeper understanding of individual executions and facilitates effective debugging. Co-authors: @vinaykakade @savannahar68 --------- Co-authored-by: Vinay Kakade <vinaykakade@gmail.com>	2023-06-21 01:38:20 -07:00
Anubhav Bindlish	94c7899257	Integrate Rockset as Vectorstore (#6216 ) This PR adds Rockset as a vectorstore for langchain. [Rockset](https://rockset.com/blog/introducing-vector-search-on-rockset/) is a real time OLAP database which provides a fast and efficient vector search functionality. Further since it is entirely schemaless, it can store metadata in separate columns thereby allowing fast metadata filters during vector similarity search (as opposed to storing the entire metadata in a single JSON column). It currently supports three distance functions: `COSINE_SIMILARITY`, `EUCLIDEAN_DISTANCE`, and `DOT_PRODUCT`. This PR adds `rockset` client as an optional dependency. We would love a twitter shoutout, our handle is https://twitter.com/RocksetCloud --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-21 01:22:27 -07:00
ElReyZero	ab7ecc9c30	Feat: Add a prompt template parameter to qa with structure chains (#6495 ) This pull request introduces a new feature to the LangChain QA Retrieval Chains with Structures. The change involves adding a prompt template as an optional parameter for the RetrievalQA chains that utilize the recently implemented OpenAI Functions. The main purpose of this enhancement is to provide users with the ability to input a more customizable prompt to the chain. By introducing a prompt template as an optional parameter, users can tailor the prompt to their specific needs and context, thereby improving the flexibility and effectiveness of the RetrievalQA chains. ## Changes Made - Created a new optional parameter, "prompt", for the RetrievalQA with structure chains. - Added an example to the RetrievalQA with sources notebook. My twitter handle is @El_Rey_Zero --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-21 00:23:36 -07:00
Hassan Ouda	456ca3d587	Be able to use Codey models on Vertex AI (#6354 ) Added the functionality to leverage 3 new Codey models from Vertex AI: - code-bison - Code generation using the existing LLM integration - code-gecko - Code completion using the existing LLM integration - codechat-bison - Code chat using the existing chat_model integration --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-20 23:11:54 -07:00
囧囧	0fce8ef178	Add KuzuQAChain (#6454 ) This PR adds `KuzuGraph` and `KuzuQAChain` for interacting with [Kùzu database](https://github.com/kuzudb/kuzu). Kùzu is an in-process property graph database management system (GDBMS) built for query speed and scalability. The `KuzuGraph` and `KuzuQAChain` provide the same functionality as the existing integration with NebulaGraph and Neo4j and enables query generation and question answering over Kùzu database. A notebook example and a simple test case have also been added. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-20 22:07:00 -07:00
TheOnlyWayUp	bb437646fc	typo(llamacpp.ipynb): 'condiser' -> 'consider' (#6474 )	2023-06-20 18:48:25 -07:00
Davis Chase	3298bf4f00	docs/fix links (#6498 )	2023-06-20 14:06:50 -07:00
Lance Martin	ae6196507d	Update notebook for MD header splitter and create new cookbook (#6399 ) Move MD header text splitter example to its own cookbook.	2023-06-20 13:53:41 -07:00
Stefano Lottini	22af93d851	Vector store support for Cassandra (#6426 ) This addresses #6291 adding support for using Cassandra (and compatible databases, such as DataStax Astra DB) as a [Vector Store](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor(ANN)+Vector+Search+via+Storage-Attached+Indexes). A new class `Cassandra` is introduced, which complies with the contract and interface for a vector store, along with the corresponding integration test, a sample notebook and modified dependency toml. Dependencies: the implementation relies on the library `cassio`, which simplifies interacting with Cassandra for ML- and LLM-oriented workloads. CassIO, in turn, uses the `cassandra-driver` low-lever drivers to communicate with the database. The former is added as optional dependency (+ in `extended_testing`), the latter was already in the project. Integration testing relies on a locally-running instance of Cassandra. [Here](https://cassio.org/more_info/#use-a-local-vector-capable-cassandra) a detailed description can be found on how to compile and run it (at the time of writing the feature has not made it yet to a release). During development of the integration tests, I added a new "fake embedding" class for what I consider a more controlled way of testing the MMR search method. Likewise, I had to amend what looked like a glitch in the behaviour of `ConsistentFakeEmbeddings` whereby an `embed_query` call would have bypassed storage of the requested text in the class cache for use in later repeated invocations. @dev2049 might be the right person to tag here for a review. Thank you! --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-06-20 10:46:20 -07:00
zhaoshengbo	ab44c24333	Add Alibaba Cloud OpenSearch as a new vector store (#6154 ) Hello Folks, Thanks for creating and maintaining this great project. I'm excited to submit this PR to add Alibaba Cloud OpenSearch as a new vector store. OpenSearch is a one-stop platform to develop intelligent search services. OpenSearch was built based on the large-scale distributed search engine developed by Alibaba. OpenSearch serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises. OpenSearch provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results. This PR includes: A AlibabaCloudOpenSearch class that can connect to the Alibaba Cloud OpenSearch instance. add embedings and metadata into a opensearch datasource. querying by squared euclidean and metadata. integration tests. ipython notebook and docs. I have read your contributing guidelines. And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test --------- Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>	2023-06-20 10:07:40 -07:00
Davis Chase	b7ad4c4c30	fix openai qa chain (#6487 )	2023-06-20 10:01:13 -07:00
Harrison Chase	9eec7c3206	Harrison/unstructured page number (#6464 ) Co-authored-by: Reza Sanaie <reza@sanaie.ca>	2023-06-19 22:31:43 -07:00
volodymyr-memsql	d2e9b621ab	Update SinglStoreDB vectorstore (#6423 ) 1. Introduced new distance strategies support: DOT_PRODUCT and EUCLIDEAN_DISTANCE for enhanced flexibility. 2. Implemented a feature to filter results based on metadata fields. 3. Incorporated connection attributes specifying "langchain python sdk" usage for enhanced traceability and debugging. 4. Expanded the suite of integration tests for improved code reliability. 5. Updated the existing notebook with the usage example @dev2049 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-19 22:08:58 -07:00
Harrison Chase	02c0a1e77e	Harrison/functions in retrieval (#6463 )	2023-06-19 22:07:58 -07:00
kYLe	3a58c4c3a0	Fixed a link typo /-/route -> /-/routes. and change endpoint format (#6186 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes a link typo from `/-/route` to `/-/routes`. and change endpoint format from `f"{self.anyscale_service_url}/{self.anyscale_service_route}"` to `f"{self.anyscale_service_url}{self.anyscale_service_route}"` Also adding documentation about the format of the endpoint #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-19 22:05:54 -07:00
Leonid Ganeline	03b16ed2b1	docs `retrievers` fixes (#6299 ) Fixed several inconsistencies: - file names and notebook titles should be similar otherwise ToC on the [retrievers page](https://python.langchain.com/en/latest/modules/indexes/retrievers.html) and on the left ToC tab are different. For example, now, `Self-querying with Chroma` is not correctly alphabetically sorted because its file named `chroma_self_query.ipynb` - `Stringing compressors and document transformers...` demoted from `#` to `##`. Otherwise, it appears in Toc. - several formatting problems #### Who can review? @hwchase17 @dev2049 Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-19 22:04:35 -07:00
Dhruvil Shah	9494623869	Update web_base.ipynb (#6430 ) Minor new line character in the markdown. Also, this option is not yet in the latest version of LangChain (0.0.190) from Conda. Maybe in the next update. @eyurtsev @hwchase17	2023-06-19 21:43:35 -07:00
Ismail Pelaseyed	d4e8e0f5ab	Add example for question answering over documents with OpenAI Function Agent (#6448 ) This PR adds an example of doing question answering over documents using OpenAI Function Agents. #### Who can review? @hwchase17 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-19 21:35:45 -07:00
Harrison Chase	286452c7f0	remove mongo	2023-06-19 10:04:14 -07:00
Harrison Chase	e9c2b280db	Harrison/refactor functions (#6408 )	2023-06-18 23:13:42 -07:00
Harrison Chase	6a4a950a3c	changes to llm chain (#6328 ) - return raw and full output (but keep run shortcut method functional) - change output parser to take in generations (good for working with messages) - add output parser to base class, always run (default to same as current) --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-06-18 22:49:47 -07:00
Fei Wang	50556f3b35	support memory for functions (#6165 ) #### Before submitting Add memory support for `OpenAIFunctionsAgent` like `StructuredChatAgent`. #### Who can review? @hwchase17 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-18 19:00:40 -07:00
Dhruvil Shah	ba90e3c990	Update web_base.ipynb for guiding purposes (#6248 ) To bypass SSL verification errors during fetching, you can include the `verify=False` parameter. This markdown proves useful, especially for beginners in the field of web scraping. <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> Fixes #6079 #### Who can review? Tag maintainers/contributors who might be interested: @hwchase17 @eyurtsev --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-18 17:47:10 -07:00
Dhruvil Shah	92f05a67a4	Add markdown to specify important arguments (#6246 ) To bypass SSL verification errors during web scraping, you can include the ssl_verify=False parameter along with the headers parameter. This combination of arguments proves useful, especially for beginners in the field of web scraping. <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> Fixes #1829 #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: @hwchase17 @eyurtsev --> --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-18 17:47:00 -07:00
Harrison Chase	495128ba95	Harrison/functions docs improvements (#6389 ) Co-authored-by: Sumanth Donthula <46747610+sumanthdonthula@users.noreply.github.com>	2023-06-18 16:57:33 -07:00
Harrison Chase	c0c2fd0782	Harrison/zep mem (#6388 ) Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	2023-06-18 16:53:35 -07:00
Harrison Chase	b7159c15cc	Harrison/metaphor search fix (#6387 ) Co-authored-by: jeffzwang <jeffreyzhiyuanwang@gmail.com>	2023-06-18 16:53:24 -07:00
Harrison Chase	9bf5b0defa	Harrison/myscale self query (#6376 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-06-18 16:53:10 -07:00
Harrison Chase	bd8d418a95	Merge branch 'master' of github.com:hwchase17/langchain	2023-06-18 16:45:49 -07:00
Harrison Chase	3a75d59c3d	searx - docs	2023-06-18 16:45:42 -07:00
Harrison Chase	a8cb9ee013	Harrison/gdrive enhancements (#6375 ) Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>	2023-06-18 11:07:23 -07:00
Lance Martin	370becdfc2	Add self query retriever example with MD header splitting (#6359 ) Flesh out the notebook example for `MarkdownHeaderTextSplitter`	2023-06-17 21:40:20 -07:00
Lance Martin	2c97fbabbd	Update MD header text splitter notebook (#6339 ) Highlight use case for maintaining header groups when splitting.	2023-06-17 13:19:27 -07:00
Harrison Chase	a2bbe3dda4	Harrison/mmr support for opensearch (#6349 ) Co-authored-by: Mehmet Öner Yalçın <oneryalcin@gmail.com>	2023-06-17 12:22:37 -07:00
Harrison Chase	680d6bbbf8	fix titles in documentation	2023-06-17 11:09:11 -07:00
Saba Sturua	427551eabf	DocArray as a Retriever (#6031 ) ## DocArray as a Retriever [DocArray](https://github.com/docarray/docarray) is an open-source tool for managing your multi-modal data. It offers flexibility to store and search through your data using various document index backends. This PR introduces `DocArrayRetriever` - which works with any available backend and serves as a retriever for Langchain apps. Also, I added 2 notebooks: DocArray Backends - intro to all 5 currently supported backends, how to initialize, index, and use them as a retriever DocArray Usage - showcasing what additional search parameters you can pass to create versatile retrievers Example: ```python from docarray.index import InMemoryExactNNIndex from docarray import BaseDoc, DocList from docarray.typing import NdArray from langchain.embeddings.openai import OpenAIEmbeddings from langchain.retrievers import DocArrayRetriever # define document schema class MyDoc(BaseDoc): description: str description_embedding: NdArray[1536] embeddings = OpenAIEmbeddings() # create documents descriptions = ["description 1", "description 2"] desc_embeddings = embeddings.embed_documents(texts=descriptions) docs = DocList[MyDoc]( [ MyDoc(description=desc, description_embedding=embedding) for desc, embedding in zip(descriptions, desc_embeddings) ] ) # initialize document index with data db = InMemoryExactNNIndex[MyDoc](docs) # create a retriever retriever = DocArrayRetriever( index=db, embeddings=embeddings, search_field="description_embedding", content_field="description", ) # find the relevant document doc = retriever.get_relevant_documents("action movies") print(doc) ``` #### Who can review? @dev2049 --------- Signed-off-by: jupyterjazz <saba.sturua@jina.ai>	2023-06-17 09:09:33 -07:00
Francisco Ingham	83eea230f3	changed height in the nb example (#6327 ) changed height in the example to a more reasonable number (from 9 feet to 6 feet)	2023-06-17 00:05:48 -07:00
Harrison Chase	af18413d97	Harrison/deeplake new features (#6263 ) Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-16 17:53:55 -07:00
ljeagle	ad324a39ae	Improve the performance of add_texts interface and upgrade the AwaDB from 0.3.2 to 0.3.3 (#6316 ) 1. Changed the implementation of add_texts interface for the AwaDB vector store in order to improve the performance 2. Upgrade the AwaDB from 0.3.2 to 0.3.3 --------- Co-authored-by: vincent <awadb.vincent@gmail.com>	2023-06-16 16:50:01 -07:00
Davis Chase	87e502c6bc	Doc refactor (#6300 ) Co-authored-by: jacoblee93 <jacoblee93@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-16 11:52:56 -07:00

1 2 3 4 5

208 Commits