langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Jeff Huber	2acf109c4b	update chroma notebook (#6664 ) @rlancemartin I updated the notebook for Chroma to hopefully be a lot easier for users.	2023-06-23 15:03:06 -07:00
Piyush Jain	b1de927f1b	Kendra retriever api (#6616 ) ## Description Replaces [Kendra Retriever](https://github.com/hwchase17/langchain/blob/master/langchain/retrievers/aws_kendra_index_retriever.py) with an updated version that uses the new [retriever API](https://docs.aws.amazon.com/kendra/latest/dg/searching-retrieve.html) which is better suited for retrieval augmented generation (RAG) systems. Note: This change requires the latest version (1.26.159) of boto3 to work. `pip install -U boto3` to upgrade the boto3 version. cc @hupe1980 cc @dev2049	2023-06-23 14:59:35 -07:00
Ikko Eltociear Ashimine	73da193a4b	Fix typo in myscale_self_query.ipynb (#6601 )	2023-06-23 14:57:12 -07:00
Lance Martin	c2b25c17c5	Recursive URL loader (#6455 ) We may want to process load all URLs under a root directory. For example, let's look at the [LangChain JS documentation](https://js.langchain.com/docs/). This has many interesting child pages that we may want to read in bulk. Of course, the `WebBaseLoader` can load a list of pages. But, the challenge is traversing the tree of child pages and actually assembling that list! We do this using the `RecusiveUrlLoader`. This also gives us the flexibility to exclude some children (e.g., the `api` directory with > 800 child pages).	2023-06-23 13:09:00 -07:00
Lance Martin	393f469eb3	Create merge loader that combines documents from a set of loaders (#6659 ) Simple utility loader that combines documents from a set of specified loaders.	2023-06-23 13:02:48 -07:00
Lance Martin	6e69bfbb28	Loader for OpenCityData and minor cleanups to Pandas, Airtable loaders (#6301 ) Many cities have open data portals for events like crime, traffic, etc. Socrata provides an API for many, including SF (e.g., see [here](https://dev.socrata.com/foundry/data.sfgov.org/tmnf-yvry)). This is a new data loader for city data that uses Socrata API.	2023-06-22 22:20:42 -07:00
Christoph Kahl	9d42621fa4	added redis method to delete entries by keys (#6222 ) In addition to my last pr (return keys of added entries), we also need a method to delete the entries by keys. @dev2049	2023-06-22 13:26:47 -07:00
Harrison Chase	a9108c1809	add mongo (HOLD) (#6437 ) do not merge in	2023-06-22 11:08:12 -07:00
Lance Martin	30f7288082	MD header text splitter returns Documents (#6571 ) Return `Documents` from MD header text splitter to simplify UX. Updates the test as well as example notebooks.	2023-06-22 09:25:38 -07:00
minhajul-clarifai	6e57306a13	Clarifai integration (#5954 ) # Changes This PR adds [Clarifai](https://www.clarifai.com/) integration to Langchain. Clarifai is an end-to-end AI Platform. Clarifai offers user the ability to use many types of LLM (OpenAI, cohere, ect and other open source models). As well, a clarifai app can be treated as a vector database to upload and retrieve data. The integrations includes: - Clarifai LLM integration: Clarifai supports many types of language model that users can utilize for their application - Clarifai VectorDB: A Clarifai application can hold data and embeddings. You can run semantic search with the embeddings #### Before submitting - [x] Added integration test for LLM - [x] Added integration test for VectorDB - [x] Added notebook for LLM - [x] Added notebook for VectorDB Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-22 08:00:15 -07:00
Andrey E. Vedishchev	a2a0715bd4	Minor Grammar Fixes in Docs and Comments (#6536 ) Just some grammar fixes: I found "retriver" instead of "retriever" in several comments across the documentation and in the comments. I fixed it. Co-authored-by: andrey.vedishchev <andrey.vedishchev@rgigroup.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-21 09:53:31 -07:00
dirtysalt	57cc3d1d3d	[Feature][VectorStore] Support StarRocks as vector db (#6119 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> Here are some examples to use StarRocks as vectordb ``` from langchain.vectorstores import StarRocks from langchain.vectorstores.starrocks import StarRocksSettings embeddings = OpenAIEmbeddings() # conifgure starrocks settings settings = StarRocksSettings() settings.port = 41003 settings.host = '127.0.0.1' settings.username = 'root' settings.password = '' settings.database = 'zya' # to fill new embeddings docsearch = StarRocks.from_documents(split_docs, embeddings, config = settings) # or to use already-built embeddings in database. docsearch = StarRocks(embeddings, settings) ``` #### Who can review? Tag maintainers/contributors who might be interested: @dev2049 <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-21 09:02:33 -07:00
Anubhav Bindlish	94c7899257	Integrate Rockset as Vectorstore (#6216 ) This PR adds Rockset as a vectorstore for langchain. [Rockset](https://rockset.com/blog/introducing-vector-search-on-rockset/) is a real time OLAP database which provides a fast and efficient vector search functionality. Further since it is entirely schemaless, it can store metadata in separate columns thereby allowing fast metadata filters during vector similarity search (as opposed to storing the entire metadata in a single JSON column). It currently supports three distance functions: `COSINE_SIMILARITY`, `EUCLIDEAN_DISTANCE`, and `DOT_PRODUCT`. This PR adds `rockset` client as an optional dependency. We would love a twitter shoutout, our handle is https://twitter.com/RocksetCloud --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-21 01:22:27 -07:00
Davis Chase	3298bf4f00	docs/fix links (#6498 )	2023-06-20 14:06:50 -07:00
Lance Martin	ae6196507d	Update notebook for MD header splitter and create new cookbook (#6399 ) Move MD header text splitter example to its own cookbook.	2023-06-20 13:53:41 -07:00
Stefano Lottini	22af93d851	Vector store support for Cassandra (#6426 ) This addresses #6291 adding support for using Cassandra (and compatible databases, such as DataStax Astra DB) as a [Vector Store](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor(ANN)+Vector+Search+via+Storage-Attached+Indexes). A new class `Cassandra` is introduced, which complies with the contract and interface for a vector store, along with the corresponding integration test, a sample notebook and modified dependency toml. Dependencies: the implementation relies on the library `cassio`, which simplifies interacting with Cassandra for ML- and LLM-oriented workloads. CassIO, in turn, uses the `cassandra-driver` low-lever drivers to communicate with the database. The former is added as optional dependency (+ in `extended_testing`), the latter was already in the project. Integration testing relies on a locally-running instance of Cassandra. [Here](https://cassio.org/more_info/#use-a-local-vector-capable-cassandra) a detailed description can be found on how to compile and run it (at the time of writing the feature has not made it yet to a release). During development of the integration tests, I added a new "fake embedding" class for what I consider a more controlled way of testing the MMR search method. Likewise, I had to amend what looked like a glitch in the behaviour of `ConsistentFakeEmbeddings` whereby an `embed_query` call would have bypassed storage of the requested text in the class cache for use in later repeated invocations. @dev2049 might be the right person to tag here for a review. Thank you! --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-06-20 10:46:20 -07:00
zhaoshengbo	ab44c24333	Add Alibaba Cloud OpenSearch as a new vector store (#6154 ) Hello Folks, Thanks for creating and maintaining this great project. I'm excited to submit this PR to add Alibaba Cloud OpenSearch as a new vector store. OpenSearch is a one-stop platform to develop intelligent search services. OpenSearch was built based on the large-scale distributed search engine developed by Alibaba. OpenSearch serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises. OpenSearch provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results. This PR includes: A AlibabaCloudOpenSearch class that can connect to the Alibaba Cloud OpenSearch instance. add embedings and metadata into a opensearch datasource. querying by squared euclidean and metadata. integration tests. ipython notebook and docs. I have read your contributing guidelines. And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test --------- Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>	2023-06-20 10:07:40 -07:00
Harrison Chase	9eec7c3206	Harrison/unstructured page number (#6464 ) Co-authored-by: Reza Sanaie <reza@sanaie.ca>	2023-06-19 22:31:43 -07:00
volodymyr-memsql	d2e9b621ab	Update SinglStoreDB vectorstore (#6423 ) 1. Introduced new distance strategies support: DOT_PRODUCT and EUCLIDEAN_DISTANCE for enhanced flexibility. 2. Implemented a feature to filter results based on metadata fields. 3. Incorporated connection attributes specifying "langchain python sdk" usage for enhanced traceability and debugging. 4. Expanded the suite of integration tests for improved code reliability. 5. Updated the existing notebook with the usage example @dev2049 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-19 22:08:58 -07:00
Leonid Ganeline	03b16ed2b1	docs `retrievers` fixes (#6299 ) Fixed several inconsistencies: - file names and notebook titles should be similar otherwise ToC on the [retrievers page](https://python.langchain.com/en/latest/modules/indexes/retrievers.html) and on the left ToC tab are different. For example, now, `Self-querying with Chroma` is not correctly alphabetically sorted because its file named `chroma_self_query.ipynb` - `Stringing compressors and document transformers...` demoted from `#` to `##`. Otherwise, it appears in Toc. - several formatting problems #### Who can review? @hwchase17 @dev2049 Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-19 22:04:35 -07:00
Dhruvil Shah	9494623869	Update web_base.ipynb (#6430 ) Minor new line character in the markdown. Also, this option is not yet in the latest version of LangChain (0.0.190) from Conda. Maybe in the next update. @eyurtsev @hwchase17	2023-06-19 21:43:35 -07:00
Harrison Chase	286452c7f0	remove mongo	2023-06-19 10:04:14 -07:00
Dhruvil Shah	ba90e3c990	Update web_base.ipynb for guiding purposes (#6248 ) To bypass SSL verification errors during fetching, you can include the `verify=False` parameter. This markdown proves useful, especially for beginners in the field of web scraping. <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> Fixes #6079 #### Who can review? Tag maintainers/contributors who might be interested: @hwchase17 @eyurtsev --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-18 17:47:10 -07:00
Dhruvil Shah	92f05a67a4	Add markdown to specify important arguments (#6246 ) To bypass SSL verification errors during web scraping, you can include the ssl_verify=False parameter along with the headers parameter. This combination of arguments proves useful, especially for beginners in the field of web scraping. <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> Fixes #1829 #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: @hwchase17 @eyurtsev --> --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-18 17:47:00 -07:00
Harrison Chase	c0c2fd0782	Harrison/zep mem (#6388 ) Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	2023-06-18 16:53:35 -07:00
Harrison Chase	9bf5b0defa	Harrison/myscale self query (#6376 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-06-18 16:53:10 -07:00
Harrison Chase	a8cb9ee013	Harrison/gdrive enhancements (#6375 ) Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>	2023-06-18 11:07:23 -07:00
Lance Martin	370becdfc2	Add self query retriever example with MD header splitting (#6359 ) Flesh out the notebook example for `MarkdownHeaderTextSplitter`	2023-06-17 21:40:20 -07:00
Lance Martin	2c97fbabbd	Update MD header text splitter notebook (#6339 ) Highlight use case for maintaining header groups when splitting.	2023-06-17 13:19:27 -07:00
Harrison Chase	a2bbe3dda4	Harrison/mmr support for opensearch (#6349 ) Co-authored-by: Mehmet Öner Yalçın <oneryalcin@gmail.com>	2023-06-17 12:22:37 -07:00
Harrison Chase	680d6bbbf8	fix titles in documentation	2023-06-17 11:09:11 -07:00
Saba Sturua	427551eabf	DocArray as a Retriever (#6031 ) ## DocArray as a Retriever [DocArray](https://github.com/docarray/docarray) is an open-source tool for managing your multi-modal data. It offers flexibility to store and search through your data using various document index backends. This PR introduces `DocArrayRetriever` - which works with any available backend and serves as a retriever for Langchain apps. Also, I added 2 notebooks: DocArray Backends - intro to all 5 currently supported backends, how to initialize, index, and use them as a retriever DocArray Usage - showcasing what additional search parameters you can pass to create versatile retrievers Example: ```python from docarray.index import InMemoryExactNNIndex from docarray import BaseDoc, DocList from docarray.typing import NdArray from langchain.embeddings.openai import OpenAIEmbeddings from langchain.retrievers import DocArrayRetriever # define document schema class MyDoc(BaseDoc): description: str description_embedding: NdArray[1536] embeddings = OpenAIEmbeddings() # create documents descriptions = ["description 1", "description 2"] desc_embeddings = embeddings.embed_documents(texts=descriptions) docs = DocList[MyDoc]( [ MyDoc(description=desc, description_embedding=embedding) for desc, embedding in zip(descriptions, desc_embeddings) ] ) # initialize document index with data db = InMemoryExactNNIndex[MyDoc](docs) # create a retriever retriever = DocArrayRetriever( index=db, embeddings=embeddings, search_field="description_embedding", content_field="description", ) # find the relevant document doc = retriever.get_relevant_documents("action movies") print(doc) ``` #### Who can review? @dev2049 --------- Signed-off-by: jupyterjazz <saba.sturua@jina.ai>	2023-06-17 09:09:33 -07:00
Harrison Chase	af18413d97	Harrison/deeplake new features (#6263 ) Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-16 17:53:55 -07:00
ljeagle	ad324a39ae	Improve the performance of add_texts interface and upgrade the AwaDB from 0.3.2 to 0.3.3 (#6316 ) 1. Changed the implementation of add_texts interface for the AwaDB vector store in order to improve the performance 2. Upgrade the AwaDB from 0.3.2 to 0.3.3 --------- Co-authored-by: vincent <awadb.vincent@gmail.com>	2023-06-16 16:50:01 -07:00
Davis Chase	87e502c6bc	Doc refactor (#6300 ) Co-authored-by: jacoblee93 <jacoblee93@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-16 11:52:56 -07:00

35 Commits