langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Harrison Chase	9218684759	Add a new vector store - AwaDB (#5971 ) (#5992 ) Added AwaDB vector store, which is a wrapper over the AwaDB, that can be used as a vector storage and has an efficient similarity search. Added integration tests for the vector store Added jupyter notebook with the example Delete a unneeded empty file and resolve the conflict(https://github.com/hwchase17/langchain/pull/5886) Please check, Thanks! @dev2049 @hwchase17 --------- <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: ljeagle <vincent_jieli@yeah.net> Co-authored-by: vincent <awadb.vincent@gmail.com>	2023-06-10 15:42:32 -07:00
Liang Zhang	b93638ef1e	Refactor and update databricks integration page (#5575 ) # Your PR Title (What it does) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) ## Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 -->	2023-06-07 20:45:47 -07:00
Ning Ren	f15763518a	docs: add Shale Protocol integration guide (#5814 ) This PR adds documentation for Shale Protocol's integration with LangChain. [Shale Protocol](https://shaleprotocol.com) provides forever-free production-ready inference APIs to the open-source community. We have global data centers and plan to support all major open LLMs (estimated ~1,000 by 2025). The team consists of software and ML engineers, AI researchers, designers, and operators across North America and Asia. Combined together, the team has 50+ years experience in machine learning, cloud infrastructure, software engineering and product development. Team members have worked at places like Google and Microsoft. #### Who can review? Tag maintainers/contributors who might be interested: - @hwchase17 - @agola11 --------- Co-authored-by: Karen Sheng <46656667+karensheng@users.noreply.github.com>	2023-06-07 19:25:59 -07:00
kourosh hakhamaneshi	a0d847f636	[Docs][Hotfix] Fix broken links (#5800 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Some links were broken from the previous merge. This PR fixes them. Tested locally. #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2023-06-06 17:17:16 -07:00
Gengliang Wang	2a4b32dee2	Revise DATABRICKS_API_TOKEN as DATABRICKS_TOKEN (#5796 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> In the [Databricks integration](https://python.langchain.com/en/latest/integrations/databricks.html) and [Databricks LLM](https://python.langchain.com/en/latest/modules/models/llms/integrations/databricks.html), we suggestted users to set the ENV variable `DATABRICKS_API_TOKEN`. However, this is inconsistent with the other Databricks library. To make it consistent, this PR changes the variable from `DATABRICKS_API_TOKEN` to `DATABRICKS_TOKEN` After changes, there is no more `DATABRICKS_API_TOKEN` in the doc ``` $ git grep DATABRICKS_API_TOKEN\|wc -l 0 $ git grep DATABRICKS_TOKEN\|wc -l 8 ``` cc @hwchase17 @dev2049 @mengxr since you have reviewed the previous PRs.	2023-06-06 14:22:49 -07:00
Leonid Ganeline	92a5f00ffb	docs: `ecosystem/integrations` update 5 (#5752 ) - added missed integration to `docs/ecosystem/integrations/` - updated notebooks to consistent format: changed titles, file names; added descriptions #### Who can review? @hwchase17 @dev2049	2023-06-05 16:08:55 -07:00
Hao Chen	a4c9053d40	Integrate Clickhouse as Vector Store (#5650 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> #### Description This PR is mainly to integrate open source version of ClickHouse as Vector Store as it is easy for both local development and adoption of LangChain for enterprises who already have large scale clickhouse deployment. ClickHouse is a open source real-time OLAP database with full SQL support and a wide range of functions to assist users in writing analytical queries. Some of these functions and data structures perform distance operations between vectors, [enabling ClickHouse to be used as a vector database](https://clickhouse.com/blog/vector-search-clickhouse-p1). Recently added ClickHouse capabilities like [Approximate Nearest Neighbour (ANN) indices](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes) support faster approximate matching of vectors and provide a promising development aimed to further enhance the vector matching capabilities of ClickHouse. In LangChain, some ClickHouse based commercial variant vector stores like [Chroma](https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py) and [MyScale](https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/myscale.py), etc are already integrated, but for some enterprises with large scale Clickhouse clusters deployment, it will be more straightforward to upgrade existing clickhouse infra instead of moving to another similar vector store solution, so we believe it's a valid requirement to integrate open source version of ClickHouse as vector store. As `clickhouse-connect` is already included by other integrations, this PR won't include any new dependencies. #### Before submitting <!-- If you're adding a new integration, please include: 1. Added a test for the integration: https://github.com/haoch/langchain/blob/clickhouse/tests/integration_tests/vectorstores/test_clickhouse.py 2. Added an example notebook and document showing its use: * Notebook: https://github.com/haoch/langchain/blob/clickhouse/docs/modules/indexes/vectorstores/examples/clickhouse.ipynb * Doc: https://github.com/haoch/langchain/blob/clickhouse/docs/integrations/clickhouse.md See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> 1. Added a test for the integration: https://github.com/haoch/langchain/blob/clickhouse/tests/integration_tests/vectorstores/test_clickhouse.py 2. Added an example notebook and document showing its use: * Notebook: https://github.com/haoch/langchain/blob/clickhouse/docs/modules/indexes/vectorstores/examples/clickhouse.ipynb * Doc: https://github.com/haoch/langchain/blob/clickhouse/docs/integrations/clickhouse.md #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> @hwchase17 @dev2049 Could you please help review? --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-05 13:32:04 -07:00
Jason Weill	6c11f94013	Retitles Bedrock doc to appear in correct alphabetical order in site nav (#5639 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes #5638. Retitles "Amazon Bedrock" page to "Bedrock" so that the Integrations section of the left nav is properly sorted in alphabetical order. #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 -->	2023-06-04 14:39:25 -07:00
Leonid Ganeline	b201cfaa0f	docs `ecosystem/integrations` update 4 (#5590 ) # docs `ecosystem/integrations` update 4 Added missed integrations. Fixed inconsistencies. ## Who can review? @hwchase17 @dev2049	2023-06-03 15:29:03 -07:00
Davis Chase	d784401215	Dev2049/add argilla callback (#5621 ) Co-authored-by: Alvaro Bartolome <alvarobartt@gmail.com> Co-authored-by: Daniel Vila Suero <daniel@argilla.io> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>	2023-06-02 09:05:06 -07:00
Bharat Ramanathan	28d6277396	docs(integration): update colab and external links in WandbTracing docs (#5602 ) # Update Wandb Tracking documentation This PR updates the Wandb Tracking documentation for formatting, updated broken links and colab notebook links --------- Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>	2023-06-02 02:58:42 -07:00
Davis Chase	4c572ffe95	nit (#5578 )	2023-06-01 14:21:15 -07:00
Bharat Ramanathan	22603d19e0	feat(integrations): Add WandbTracer (#4521 ) # WandbTracer This PR adds the `WandbTracer` and deprecates the existing `WandbCallbackHandler`. Added an example notebook under the docs section alongside the `LangchainTracer` Here's an example [colab](https://colab.research.google.com/drive/1pY13ym8ENEZ8Fh7nA99ILk2GcdUQu0jR?usp=sharing) with the same notebook and the [trace](https://wandb.ai/parambharat/langchain-tracing/runs/8i45cst6) generated from the colab run Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-01 00:01:19 -07:00
Leonid Ganeline	373ad49157	docs `ecosystem/integrations` update 3 (#5470 ) # docs: `ecosystem_integrations` update 3 Next cycle of updating the `ecosystem/integrations` * Added an integration `template` file * Added missed integration files * Fixed several document_loaders/notebooks ## Who can review? Is it possible to assign somebody to review PRs on docs? Thanks.	2023-05-31 17:54:05 -07:00
Matt Robinson	4c8aad0d1b	docs: unstructured no longer requires installing detectron2 from source (#5524 ) # Update Unstructured docs to remove the `detectron2` install instructions Removes `detectron2` installation instructions from the Unstructured docs because installing `detectron2` is no longer required for `unstructured>=0.7.0`. The `detectron2` model now runs using the ONNX runtime. ## Who can review? @hwchase17 @eyurtsev	2023-05-31 15:03:21 -07:00
Piyush Jain	562fdfc8f9	Bedrock llm and embeddings (#5464 ) # Bedrock LLM and Embeddings This PR adds a new LLM and an Embeddings class for the [Bedrock](https://aws.amazon.com/bedrock) service. The PR also includes example notebooks for using the LLM class in a conversation chain and embeddings usage in creating an embedding for a query and document. Note: AWS is doing a private release of the Bedrock service on 05/31/2023; users need to request access and added to an allowlist in order to start using the Bedrock models and embeddings. Please use the [Bedrock Home Page](https://aws.amazon.com/bedrock) to request access and to learn more about the models available in Bedrock. <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 -->	2023-05-31 07:17:01 -07:00
Leonid Ganeline	1f11f80641	docs: cleaning (#5413 ) # docs cleaning Changed docs to consistent format (probably, we need an official doc integration template): - ClearML - added product descriptions; changed title/headers - Rebuff - added product descriptions; changed title/headers - WhyLabs - added product descriptions; changed title/headers - Docugami - changed title/headers/structure - Airbyte - fixed title - Wolfram Alpha - added descriptions, fixed title - OpenWeatherMap - - added product descriptions; changed title/headers - Unstructured - changed description ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @hwchase17 @dev2049	2023-05-30 13:58:16 -07:00
Leonid Ganeline	1837caa70d	docs: `ecosystem/integrations` update 1 (#5219 ) # docs: ecosystem/integrations update It is the first in a series of `ecosystem/integrations` updates. The ecosystem/integrations list is missing many integrations. I'm adding the missing integrations in a consistent format: 1. description of the integrated system 2. `Installation and Setup` section with 'pip install ...`, Key setup, and other necessary settings 3. Sections like `LLM`, `Text Embedding Models`, `Chat Models`... with links to correspondent examples and imports of the used classes. This PR keeps new docs, that are presented in the `docs/modules/models/text_embedding/examples` but missed in the `ecosystem/integrations`. The next PRs will cover the next example sections. Also updated `integrations.rst`: added the `Dependencies` section with a link to the packages used in LangChain. ## Who can review? @hwchase17 @eyurtsev @dev2049	2023-05-29 07:25:17 -07:00
Leonid Ganeline	a3598193a0	docs: `ecosystem/integrations` update 2 (#5282 ) # docs: ecosystem/integrations update 2 #5219 - part 1 The second part of this update (parts are independent of each other! no overlap): - added diffbot.md - updated confluence.ipynb; added confluence.md - updated college_confidential.md - updated openai.md - added blackboard.md - added bilibili.md - added azure_blob_storage.md - added azlyrics.md - added aws_s3.md ## Who can review? @hwchase17@agola11 @agola11 @vowelparrot @dev2049	2023-05-29 07:19:43 -07:00
Harrison Chase	d6fb25c439	Harrison/prediction guard update (#5404 ) Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>	2023-05-29 07:14:59 -07:00
Harrison Chase	416c8b1da3	Harrison/deep infra (#5403 ) Co-authored-by: Yessen Kanapin <yessenzhar@gmail.com> Co-authored-by: Yessen Kanapin <yessen@deepinfra.com>	2023-05-29 07:10:50 -07:00
Janos Tolgyesi	5f4552391f	Add SKLearnVectorStore (#5305 ) # Add SKLearnVectorStore This PR adds SKLearnVectorStore, a simply vector store based on NearestNeighbors implementations in the scikit-learn package. This provides a simple drop-in vector store implementation with minimal dependencies (scikit-learn is typically installed in a data scientist / ml engineer environment). The vector store can be persisted and loaded from json, bson and parquet format. SKLearnVectorStore has soft (dynamic) dependency on the scikit-learn, numpy and pandas packages. Persisting to bson requires the bson package, persisting to parquet requires the pyarrow package. ## Before submitting Integration tests are provided under `tests/integration_tests/vectorstores/test_sklearn.py` Sample usage notebook is provided under `docs/modules/indexes/vectorstores/examples/sklear.ipynb` Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-28 08:17:42 -07:00
Michael Landis	7047a2c1af	feat: add Momento as a standard cache and chat message history provider (#5221 ) # Add Momento as a standard cache and chat message history provider This PR adds Momento as a standard caching provider. Implements the interface, adds integration tests, and documentation. We also add Momento as a chat history message provider along with integration tests, and documentation. [Momento](https://www.gomomento.com/) is a fully serverless cache. Similar to S3 or DynamoDB, it requires zero configuration, infrastructure management, and is instantly available. Users sign up for free and get 50GB of data in/out for free every month. ## Before submitting ✅ We have added documentation, notebooks, and integration tests demonstrating usage. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-25 19:13:21 -07:00
Ravindra Marella	b3988621c5	Add C Transformers for GGML Models (#5218 ) # Add C Transformers for GGML Models I created Python bindings for the GGML models: https://github.com/marella/ctransformers Currently it supports GPT-2, GPT-J, GPT-NeoX, LLaMA, MPT, etc. See [Supported Models](https://github.com/marella/ctransformers#supported-models). It provides a unified interface for all models: ```python from langchain.llms import CTransformers llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2') print(llm('AI is going to')) ``` It can be used with models hosted on the Hugging Face Hub: ```py llm = CTransformers(model='marella/gpt-2-ggml') ``` It supports streaming: ```py from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler llm = CTransformers(model='marella/gpt-2-ggml', callbacks=[StreamingStdOutCallbackHandler()]) ``` Please see [README](https://github.com/marella/ctransformers#readme) for more details. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-25 13:42:44 -07:00
Nolan Tremelling	faa26650c9	Beam (#4996 ) # Beam Calls the Beam API wrapper to deploy and make subsequent calls to an instance of the gpt2 LLM in a cloud deployment. Requires installation of the Beam library and registration of Beam Client ID and Client Secret. Additional calls can then be made through the instance of the large language model in your code or by calling the Beam API. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 01:25:18 -07:00
Ofer Mendelevitch	c81fb88035	Vectara (#5069 ) # Vectara Integration This PR provides integration with Vectara. Implemented here are: * langchain/vectorstore/vectara.py * tests/integration_tests/vectorstores/test_vectara.py * langchain/retrievers/vectara_retriever.py And two IPYNB notebooks to do more testing: * docs/modules/chains/index_examples/vectara_text_generation.ipynb * docs/modules/indexes/vectorstores/examples/vectara.ipynb --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 01:24:58 -07:00
Jamie Broomall	d4fd589638	WhyLabs callback (#4906 ) # Add a WhyLabs callback handler * Adds a simple WhyLabsCallbackHandler * Add required dependencies as optional * protect against missing modules with imports * Add docs/ecosystem basic example based on initial prototype from @andrewelizondo > this integration gathers privacy preserving telemetry on text with whylogs and sends stastical profiles to WhyLabs platform to monitoring these metrics over time. For more information on what WhyLabs is see: https://whylabs.ai After you run the notebook (if you have env variables set for the API Keys, org_id and dataset_id) you get something like this in WhyLabs: ![Screenshot (443)](https://github.com/hwchase17/langchain/assets/88007022/6bdb3e1c-4243-4ae8-b974-23a8bb12edac) Co-authored-by: Andre Elizondo <andre@whylabs.ai> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 20:29:47 -07:00
Harrison Chase	224f73e978	move docs	2023-05-21 09:22:35 -07:00
Gengliang Wang	f9f08c4b69	Add documentation for Databricks integration (#5013 ) # Add documentation for Databricks integration This is a follow-up of https://github.com/hwchase17/langchain/pull/4702 It documents the details of how to integrate Databricks using langchain. It also provides examples in a notebook. ## Who can review? @dev2049 @hwchase17 since you are aware of the context. We will promote the integration after this doc is ready. Thanks in advance!	2023-05-20 22:06:24 -07:00
Leonid Ganeline	e2d7677526	docs: compound ecosystem and integrations (#4870 ) # Docs: compound ecosystem and integrations Problem statement: We have a big overlap between the References/Integrations and Ecosystem/LongChain Ecosystem pages. It confuses users. It creates a situation when new integration is added only on one of these pages, which creates even more confusion. - removed References/Integrations page (but move all its information into the individual integration pages - in the next PR). - renamed Ecosystem/LongChain Ecosystem into Integrations/Integrations. I like the Ecosystem term. It is more generic and semantically richer than the Integration term. But it mentally overloads users. The `integration` term is more concrete. UPDATE: after discussion, the Ecosystem is the term. Ecosystem/Integrations is the page (in place of Ecosystem/LongChain Ecosystem). As a result, a user gets a single place to start with the individual integration.	2023-05-18 09:29:57 -07:00

30 Commits