langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Roma	cb802edf75	[Feature] Add GraphQL Query Tool (#4409 ) # Add GraphQL Query Support This PR introduces a GraphQL API Wrapper tool that allows LLM agents to query GraphQL databases. The tool utilizes the httpx and gql Python packages to interact with GraphQL APIs and provides a simple interface for running queries with LLM agents. @vowelparrot --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-15 14:06:12 -07:00
Lester Yang	cd3f9865f3	Feature: pdfplumber PDF loader with BaseBlobParser (#4552 ) # Feature: pdfplumber PDF loader with BaseBlobParser * Adds pdfplumber as a PDF loader * Adds pdfplumber as a blob parser.	2023-05-15 09:47:02 -04:00
Harrison Chase	b6e3ac17c4	Harrison/sitemap local (#4704 ) Co-authored-by: Lukas Bauer <lukas.bauer@mayflower.de>	2023-05-14 22:04:38 -07:00
Harrison Chase	12b4ee1fc7	Harrison/telegram chat loader (#4698 ) Co-authored-by: Akinwande Komolafe <47945512+Sensei-akin@users.noreply.github.com> Co-authored-by: Akinwande Komolafe <akhinoz@gmail.com>	2023-05-14 22:04:27 -07:00
Leonid Ganeline	e17d0319d5	Add `arxiv` retriever (#4538 )	2023-05-11 22:48:38 -07:00
SimFG	7bcf238a1a	Optimize the initialization method of GPTCache (#4522 ) Optimize the initialization method of GPTCache, so that users can use GPTCache more quickly.	2023-05-11 16:15:23 -07:00
kYLe	0d51a1f12b	Add LLMs support for Anyscale Service (#4350 ) Add Anyscale service integration under LLM Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-11 00:39:59 -07:00
Evan Jones	f668251948	parameterized distance metrics; lint; format; tests (#4375 ) # Parameterize Redis vectorstore index Redis vectorstore allows for three different distance metrics: `L2` (flat L2), `COSINE`, and `IP` (inner product). Currently, the `Redis._create_index` method hard codes the distance metric to COSINE. I've parameterized this as an argument in the `Redis.from_texts` method -- pretty simple. Fixes #4368 ## Before submitting I've added an integration test showing indexes can be instantiated with all three values in the `REDIS_DISTANCE_METRICS` literal. An example notebook seemed overkill here. Normal API documentation would be more appropriate, but no standards are in place for that yet. ## Who can review? Not sure who's responsible for the vectorstore module... Maybe @eyurtsev / @hwchase17 / @agola11 ?	2023-05-11 00:20:01 -07:00
Davis Chase	9ec60ad832	Add azure cognitive search retriever (#4467 ) All credit to @UmerHA, made a couple small changes --------- Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>	2023-05-10 15:27:27 -07:00
Davis Chase	46b100ea63	Add DocArray vector stores (#4483 ) Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made few small changes to get it across the finish line --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai> Co-authored-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>	2023-05-10 15:22:16 -07:00
Harrison Chase	b2f920e891	add tracing v2 env var (#4465 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-05-10 11:08:29 -07:00
Matt Robinson	3637d6da6e	feat: add loader for open office odt files (#4405 ) # ODF File Loader Adds a data loader for handling Open Office ODT files. Requires `unstructured>=0.6.3`. ### Testing The following should work using the `fake.odt` example doc from the [`unstructured` repo](https://github.com/Unstructured-IO/unstructured). ```python from langchain.document_loaders import UnstructuredODTLoader loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements") loader.load() loader = UnstructuredODTLoader(file_path="fake.odt", mode="single") loader.load() ```	2023-05-10 01:37:17 -07:00
Rukmani	2b14036126	Update WhatsAppChatLoader to include the character ~ in the sender name (#4420 ) Fixes #4153 If the sender of a message in a group chat isn't in your contact list, they will appear with a ~ prefix in the exported chat. This PR adds support for parsing such lines.	2023-05-09 15:00:04 -07:00
Aivin V. Solatorio	6335cb5b3a	Add support for Qdrant nested filter (#4354 ) # Add support for Qdrant nested filter This extends the filter functionality for the Qdrant vectorstore. The current filter implementation is limited to a single-level metadata structure; however, Qdrant supports nested metadata filtering. This extends the functionality for users to maximize the filter functionality when using Qdrant as the vectorstore. Reference: https://qdrant.tech/documentation/filtering/#nested-key --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-09 10:34:11 -07:00
Martin Holzhauer	872605a5c5	Add an option to extract more metadata from crawled websites (#4347 ) This pr makes it possible to extract more metadata from websites for later use. my usecase: parsing ld+json or microdata from sites and store it as structured data in the metadata field	2023-05-09 10:18:33 -07:00
Leonid Ganeline	ce15ffae6a	added `Wikipedia` retriever (#4302 ) - added `Wikipedia` retriever. It is effectively a wrapper for `WikipediaAPIWrapper`. It wrapps load() into get_relevant_documents() - sorted `__all__` in the `retrievers/__init__` - added integration tests for the WikipediaRetriever - added an example (as Jupyter notebook) for the WikipediaRetriever	2023-05-09 10:08:39 -07:00
Eugene Yurtsev	2ceb807da2	Add PDF parser implementations (#4356 ) # Add PDF parser implementations This PR separates the data loading from the parsing for a number of existing PDF loaders. Parser tests have been designed to help encourage developers to create a consistent interface for parsing PDFs. This interface can be made more consistent in the future by adding information into the initializer on desired behavior with respect to splitting by page etc. This code is expected to be backwards compatible -- with the exception of a bug fix with pymupdf parser which was returning `bytes` in the page content rather than strings. Also changing the lazy parser method of document loader to return an Iterator rather than Iterable over documents. ## Before submitting <!-- If you're adding a new integration, include an integration test and an example notebook showing its use! --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @ <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoader Abstractions - @eyurtsev LLM/Chat Wrappers - @hwchase17 - @agola11 Tools / Toolkits - @vowelparrot -->	2023-05-09 10:24:17 -04:00
Davis Chase	ba0057c077	Check OpenAI model kwargs (#4366 ) Handle duplicate and incorrectly specified OpenAI params Thanks @PawelFaron for the fix! Made small update Closes #4331 --------- Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com> Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>	2023-05-08 16:37:34 -07:00
Davis Chase	02ebb15c4a	Fix TextSplitter.from_tiktoken(#4361 ) Thanks to @danb27 for the fix! Minor update Fixes https://github.com/hwchase17/langchain/issues/4357 --------- Co-authored-by: Dan Bianchini <42096328+danb27@users.noreply.github.com>	2023-05-08 16:36:38 -07:00
Naveen Tatikonda	782df1db10	OpenSearch: Add Similarity Search with Score (#4089 ) ### Description Add `similarity_search_with_score` method for OpenSearch to return scores along with documents in the search results Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-05-08 16:35:21 -07:00
Jinto Jose	8a338412fa	mongodb support for chat history (#4266 )	2023-05-08 08:34:05 -07:00
Leonid Ganeline	9544b30821	added `Wikipedia` document loader (#4141 ) - Added the `Wikipedia` document loader. It is based on the existing `unilities/WikipediaAPIWrapper` - Added a respective ut-s and example notebook - Sorted list of classes in __init__	2023-05-06 09:32:45 -07:00
Davis Chase	5ca13cc1f0	Dev2049/pypdfium2 (#4209 ) thanks @jerrytigerxu for the addition! --------- Co-authored-by: Jere Xu <jtxu2008@gmail.com> Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>	2023-05-05 17:55:31 -07:00
George	2324f19c85	Update qdrant interface (#3971 ) Hello 1) Passing `embedding_function` as a callable seems to be outdated and the common interface is to pass `Embeddings` instance 2) At the moment `Qdrant.add_texts` is designed to be used with `embeddings.embed_query`, which is 1) slow 2) causes ambiguity due to 1. It should be used with `embeddings.embed_documents` This PR solves both problems and also provides some new tests	2023-05-05 16:46:40 -07:00
Mike Wang	c3044b1bf0	[test] Add integration_test for PandasAgent (#4056 ) - confirm creation - confirm functionality with a simple dimension check. The test now is calling OpenAI API directly, but learning from @vowelparrot that we’re caching the requests, so that it’s not that expensive. I also found we’re calling OpenAI api in other integration tests. Please lmk if there is any concern of real external API calls. I can alternatively make a fake LLM for this test. Thanks	2023-05-05 14:49:02 -07:00
Aivin V. Solatorio	6567b73e1a	JSON loader (#4067 ) This implements a loader of text passages in JSON format. The `jq` syntax is used to define a schema for accessing the relevant contents from the JSON file. This requires dependency on the `jq` package: https://pypi.org/project/jq/. --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-05 14:48:13 -07:00
hp0404	2a3c5f8353	Update WhatsAppChatLoader regex to handle multiple date-time formats (#4186 ) This PR updates the `message_line_regex` used by `WhatsAppChatLoader` to support different date-time formats used in WhatsApp chat exports; resolves #4153. The new regex handles the following input formats: ```terminal [05.05.23, 15:48:11] James: Hi here [11/8/21, 9:41:32 AM] User name: Message 123 1/23/23, 3:19 AM - User 2: Bye! 1/23/23, 3:22_AM - User 1: And let me know if anything changes ``` Tests have been added to verify that the loader works correctly with all formats.	2023-05-05 13:13:05 -07:00
Zander Chase	84cfa76e00	Update Cohere Reranker (#4180 ) The forward ref annotations don't get updated if we only iimport with type checking --------- Co-authored-by: Abhinav Verma <abhinav_win12@yahoo.co.in>	2023-05-05 09:11:37 -07:00
Harrison Chase	d4cf1eb60a	Add firestore memory (#3792 ) (#3941 ) If you have any other suggestions or feedback, please let me know. --------- Co-authored-by: yakigac <10434946+yakigac@users.noreply.github.com>	2023-05-03 22:55:47 -07:00
rogerserper	b1446bea5f	google-serper: async + full json results + support for Google Images, Places and News (#4078 ) * implemented arun, results, and aresults. Reuses aiosession if available. * helper tools GoogleSerperRun and GoogleSerperResults * support for Google Images, Places and News (examples given) and filtering based on time (e.g. past hour) * updated docs	2023-05-03 22:35:48 -07:00
hp0404	374725a715	Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863 ) This PR includes two main changes: - Refactor the `TelegramChatLoader` and `FacebookChatLoader` classes by removing the dependency on pandas and simplifying the message filtering process. - Add test cases for the `TelegramChatLoader` and `FacebookChatLoader` classes. This test ensures that the class correctly loads and processes the example chat data, providing better test coverage for this functionality.	2023-05-03 15:59:19 -07:00
Jon Saginaw	ea64b1716d	Enhancement: option to Get All Tokens with a single Blockchain Document Loader call (#3797 ) The Blockchain Document Loader's default behavior is to return 100 tokens at a time which is the Alchemy API limit. The Document Loader exposes a startToken that can be used for pagination against the API. This enhancement includes an optional get_all_tokens param (default: False) which will: - Iterate over the Alchemy API until it receives all the tokens, and return the tokens in a single call to the loader. - Manage all/most tokenId formats (this can be int, hex16 with zero or all the leading zeros). There aren't constraints as to how smart contracts can represent this value, but these three are most common. Note that a contract with 10,000 tokens will issue 100 calls to the Alchemy API, and could take about a minute, which is why this param will default to False. But I've been using the doc loader with these utilities on the side, so figured it might make sense to build them in for others to use.	2023-05-03 15:46:44 -07:00
Harrison Chase	a5dd73c1a6	Revert "[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense" (#4014 ) Reverts hwchase17/langchain#3840	2023-05-02 18:58:05 -07:00
Harrison Chase	48ea27ba60	Harrison/blockwise sitemap (#3940 ) Co-authored-by: Martin Holzhauer <martin@holzhauer.eu>	2023-05-01 21:34:07 -07:00
Harrison Chase	f04faf8496	Harrison/spreedly (#3937 ) Co-authored-by: Esmit Pérez <esmitperez@users.noreply.github.com>	2023-05-01 20:56:56 -07:00
Zander Chase	c4cb55a0c5	[Breaking] Migrate GPT4All to use PyGPT4All (#3934 ) Seems the pyllamacpp package is no longer the supported bindings from gpt4all. Tested that this works locally. Given that the older models weren't very performant, I think it's better to migrate now without trying to include a lot of try / except blocks --------- Co-authored-by: Nissan Pow <npow@users.noreply.github.com> Co-authored-by: Nissan Pow <pownissa@amazon.com>	2023-05-01 20:42:45 -07:00
Mike Wang	ec21b7126c	[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense (#3840 ) - ActionAgent has a property called, `allowed_tools`, which is declared as `List`. It stores all provided tools which is available to use during agent action. - This collection shouldn’t allow duplicates. The original datatype List doesn’t make sense. Each tool should be unique. Even when there are variants (assuming in the future), it would be named differently in load_tools. Test: - confirm the functionality in an example by initializing an agent with a list of 2 tools and confirm everything works. ```python3 def test_agent_chain_chat_bot(): from langchain.agents import load_tools from langchain.agents import initialize_agent from langchain.agents import AgentType from langchain.chat_models import ChatOpenAI from langchain.llms import OpenAI from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper chat = ChatOpenAI(temperature=0) llm = OpenAI(temperature=0) tools = load_tools(["ddg-search", "llm-math"], llm=llm) agent = initialize_agent(tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True) agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?") test_agent_chain_chat_bot() ``` Result: <img width="863" alt="Screenshot 2023-05-01 at 7 58 11 PM" src="https://user-images.githubusercontent.com/62768671/235572157-0937594c-ddfb-4760-acb2-aea4cacacd89.png">	2023-05-01 20:30:10 -07:00
Davis Chase	e7e29f9937	Dev2049/add modern treasury (#3924 ) Modified Modern Treasury and Strip slightly so credentials don't have to be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury! --------- Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>	2023-05-01 20:28:02 -07:00
Davis Chase	5db6b796cf	Dev2049/hf emb encode kwargs (#3925 ) Thanks @amogkam for the addition! Refactored slightly --------- Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2023-05-01 20:27:41 -07:00
James Brotchie	921894960b	Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575 ) - Add langchain.llms.GooglePalm for text completion, - Add langchain.chat_models.ChatGooglePalm for chat completion, - Add langchain.embeddings.GooglePalmEmbeddings for sentence embeddings, - Add example field to HumanMessage and AIMessage so that users can feed in examples into the PaLM Chat API, - Add system and unit tests. Note async completion for the Text API is not yet supported and will be included in a future PR. Happy for feedback on any aspect of this PR, especially our choice of adding an example field to Human and AI Message objects to enable passing example messages to the API.	2023-05-01 15:23:16 -07:00
Davis Chase	2451310975	Chroma fix mmr (#3897 ) Fixes #3628, thanks @derekmoeller for the issue!	2023-05-01 10:47:15 -07:00
Zander Chase	19912d755e	Vwp/arxiv (#3855 ) Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>	2023-04-30 18:59:22 -07:00
Ankush Gola	d3ec00b566	Callbacks Refactor [base] (#3256 ) Co-authored-by: Nuno Campos <nuno@boringbits.io> Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com> Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-30 11:14:09 -07:00
Mike Wang	ce4fea983b	[simple] added test case and improve self class return type annotation (#3773 ) a simple follow up of https://github.com/hwchase17/langchain/pull/3748 - added test case - improve annotation when function return type is class itself.	2023-04-28 21:54:07 -07:00
Harrison Chase	0c0f14407c	Harrison/tair (#3770 ) Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>	2023-04-28 21:25:33 -07:00
Harrison Chase	be7a8e0824	Harrison/redis cache (#3766 ) Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>	2023-04-28 20:47:18 -07:00
Mike Wang	b588446bf9	[simple][test] Added test case for schema.py (#3692 ) - added unittest for schema.py covering utility functions and token counting. - fixed a nit. based on huggingface doc, the tokenizer model is gpt-2. [link](https://huggingface.co/transformers/v4.8.2/_modules/transformers/models/gpt2/tokenization_gpt2_fast.html) - make lint && make format, passed on local - screenshot of new test running result <img width="1283" alt="Screenshot 2023-04-27 at 9 51 55 PM" src="https://user-images.githubusercontent.com/62768671/235057441-c0ac3406-9541-453f-ba14-3ebb08656114.png">	2023-04-28 20:42:24 -07:00
Jon Saginaw	f8d69e4e52	Enhancement: Blockchain Document Loader with better Metadata support (#3710 ) This PR includes some minor alignment updates, including: - metadata object extended to support contractAddress, blockchainType, and tokenId - notebook doc better aligned to standard langchain format - startToken changed from int to str to support multiple hex value types on the Alchemy API The updated metadata will look like the below. It's possible for a single contractAddress to exist across multiple blockchains (e.g. Ethereum, Polygon, etc.) so it's important to include the blockchainType. ``` metadata = {"source": self.contract_address, "blockchain": self.blockchainType, "tokenId": tokenId} ```	2023-04-28 20:13:05 -07:00
Davis Chase	220a7076ac	Add Mathpix pdf loader (#3727 ) Inspo https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-28 20:11:22 -07:00
Harrison Chase	40f6e60e68	Harrison/stripe (#3762 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	2023-04-28 20:03:21 -07:00
plutopulp	6d6fd1b9e1	Add PipelineAI LLM integration (#3644 ) Add PipelineAI LLM integration	2023-04-27 08:22:26 -07:00
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	2023-04-27 08:14:36 -07:00
Harrison Chase	ab749fa1bb	Harrison/opensearch logic (#3631 ) Co-authored-by: engineer-matsuo <95115586+engineer-matsuo@users.noreply.github.com>	2023-04-26 22:08:03 -07:00
Ehsan M. Kermani	4a246e2fd6	Allow clearing cache and fix gptcache (#3493 ) This PR * Adds `clear` method for `BaseCache` and implements it for various caches * Adds the default `init_func=None` and fixes gptcache integtest * Since right now integtest is not running in CI, I've verified the changes by running `docs/modules/models/llms/examples/llm_caching.ipynb` (until proper e2e integtest is done in CI)	2023-04-26 22:03:50 -07:00
cs0lar	440c98e24b	Fix/issue 2695 (#3608 ) ## Background fixes #2695 ## Changes The `add_text` method uses the internal embedding function if one was passes to the `Weaviate` constructor. NOTE: the latest merge on the `Weaviate` class made the specification of a `weaviate_api_key` mandatory which might not be desirable for all users and connection methods (for example weaviate also support Embedded Weaviate which I am happy to add support to here if people think it's desirable). I wrapped the fetching of the api key into a try catch in order to allow the `weaviate_api_key` to be unspecified. Do let me know if this is unsatisfactory. ## Test Plan added test for `add_texts` method.	2023-04-26 21:45:03 -07:00
leo-gan	36c59e0c25	`Arxiv` document loader (#3627 ) It makes sense to use `arxiv` as another source of the documents for downloading. - Added the `arxiv` document_loader, based on the `utilities/arxiv.py:ArxivAPIWrapper` - added tests - added an example notebook - sorted `__all__` in `__init__.py` (otherwise it is hard to find a class in the very long list)	2023-04-26 21:04:56 -07:00
Zander Chase	443a893ffd	Align names of search tools (#3620 ) Tools for Bing, DDG and Google weren't consistent even though the underlying implementations were. All three services now have the same tools and implementations to easily switch and experiment when building chains.	2023-04-26 16:21:34 -07:00
Maciej Bryński	aa345a4bb7	Add get_text_separator parameter to BSHTMLLoader (#3551 ) By default get_text doesn't separate content of different HTML tag. Adding option for specifying separator helps with document splitting.	2023-04-26 16:10:16 -07:00
Davis Chase	d18b0caf0e	Add Anthropic default request timeout (#3540 ) thanks @hitflame! --------- Co-authored-by: Wenqiang Zhao <hitzhaowenqiang@sina.com> Co-authored-by: delta@com <delta@com>	2023-04-25 11:40:41 -07:00
yakigac	f338d6251c	Add a test for cosmos db memory (#3525 ) Test for #3434 @eavanvalkenburg Initially, I was unaware and had submitted a pull request #3450 for the same purpose, but I have now repurposed the one I used for that. And it worked.	2023-04-25 08:10:02 -07:00
Harrison Chase	0fc0aa62f2	Harrison/blockchain docloader (#3491 ) Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>	2023-04-25 08:07:06 -07:00
Harrison Chase	707741de58	Harrison/prediction guard (#3490 ) Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>	2023-04-24 22:27:22 -07:00
Harrison Chase	7257f9e015	Harrison/tfidf parameters (#3481 ) Co-authored-by: pao <go5kuramubon@gmail.com> Co-authored-by: KyoHattori <kyo.hattori@abejainc.com>	2023-04-24 22:19:58 -07:00
Harrison Chase	eda69b13f3	openai embeddings (#3488 )	2023-04-24 22:19:47 -07:00
Harrison Chase	408a0183cd	Harrison/weaviate (#3494 ) Co-authored-by: Nick Rubell <nick@rubell.com>	2023-04-24 22:15:32 -07:00
Zander Chase	416f3bdf11	Vwp/alpaca streaming (#3468 ) Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>	2023-04-24 16:27:51 -07:00
cs0lar	3033c6b964	fixes #1214 (#3003 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search_by_vector` method. ### Changes - a `max_marginal_relevance_search_by_vector` method implementation has been added in `weaviate.py` - tests have been added to the the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search_by_vector` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-24 11:50:55 -07:00
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-23 21:23:54 -07:00
Zander Chase	20f530e9c5	Add Sentence Transformers Embeddings (#3409 ) Add embeddings based on the sentence transformers library. Add a notebook and integration tests. Co-authored-by: khimaros <me@khimaros.com>	2023-04-23 18:25:20 -07:00
Luke Harris	b4de839ed8	Several confluence loader improvements (#3300 ) This PR addresses several improvements: - Previously it was not possible to load spaces of more than 100 pages. The `limit` was being used both as an overall page limit and as a per request pagination limit. This, in combination with the fact that atlassian seem to use a server-side hard limit of 100 when page content is expanded, meant it wasn't possible to download >100 pages. Now `limit` is used only as a per-request pagination limit and `max_pages` is introduced as the way to limit the total number of pages returned by the paginator. - Document metadata now includes `source` (the source url), making it compatible with `RetrievalQAWithSourcesChain`. - It is now possible to include inline and footer comments. - It is now possible to pass `verify_ssl=False` and other parameters to the confluence object for use cases that require it.	2023-04-23 15:06:10 -07:00
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-04-22 09:17:38 -07:00
Filip Haltmayer	215dcc2d26	Refactor Milvus/Zilliz (#3047 ) Refactoring milvus/zilliz to clean up and have a more consistent experience. Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>	2023-04-22 08:26:19 -07:00
Richy Wang	88a8f59aa7	Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135 ) Hi there！ I'm excited to open this PR to add support for using a fully Postgres syntax compatible database 'AnalyticDB' as a vector. As AnalyticDB has been proved can be used with AutoGPT, ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for you. AnalyticDB is a distributed Alibaba Cloud-Native vector database. It works better when data comes to large scale. The PR includes: - [x] A new memory: AnalyticDBVector - [x] A suite of integration tests verifies the AnalyticDB integration I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test	2023-04-22 08:25:41 -07:00
Zander Chase	05a8aa5447	Fix linting on master (#3327 )	2023-04-21 15:49:46 -07:00
Varun Srinivas	d2f922f525	Change in method name for creating an issue on JIRA (#3307 ) The awesome JIRA tool created by @zywilliamli calls the `create_issue()` method to create issues, however, the actual method is `issue_create()`. Details in the Documentation here: https://atlassian-python-api.readthedocs.io/jira.html#manage-issues	2023-04-21 13:01:33 -07:00
Paul Garner	aa9d5707e0	Add PythonLoader which auto-detects encoding of Python files (#3311 ) This PR contributes a `PythonLoader`, which inherits from `TextLoader` but detects and sets the encoding automatically.	2023-04-21 10:47:57 -07:00
Davis Chase	2fd24d31a4	Cleanup integration test dir (#3308 )	2023-04-21 09:44:09 -07:00
Naveen Tatikonda	bb6c459f7a	OpenSearch: Add Support for Lucene Filter (#3201 ) ### Description Add Support for Lucene Filter. When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. This filter is supported only for approximate search with the indexes that are created using `lucene` engine. OpenSearch Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-20 20:42:53 -07:00
Davis Chase	46542dc774	Contextual compression retriever (#2915 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-20 17:01:14 -07:00
Gabriel Altay	34fb56b633	fix copy/pasta typos wikipedia->arxiv (#3222 ) just updates a few module level docstrings from Wikipedia -> Arxiv	2023-04-20 07:15:41 -07:00
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 21:32:26 -07:00
Harrison Chase	9181cd9b22	Harrison/playwright selector (#3185 ) Co-authored-by: zhyuri <4649294+zhyuri@users.noreply.github.com>	2023-04-19 16:54:15 -07:00
Harrison Chase	68cd37175e	Harrison/arxiv tool (#3186 ) Co-authored-by: leo-gan <leo.gan.57@gmail.com>	2023-04-19 16:53:34 -07:00
Naveen Tatikonda	3453b7457c	OpenSearch: Add Support for Boolean Filter with ANN search (#3038 ) ### Description Add Support for Boolean Filter with ANN search Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#boolean-filter-with-ann-search ### Issues Resolved https://github.com/hwchase17/langchain/issues/2924 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-17 20:26:26 -07:00
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	2023-04-17 20:23:45 -07:00
Jan Backes	a9310a3e8b	Add Annoy as VectorStore (#2939 ) Adds Annoy (https://github.com/spotify/annoy) as vector Store. RESOLVES hwchase17/langchain#2842 discord ref: https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-16 13:44:04 -07:00
cs0lar	8b9e02da9d	Fix/issue 1213 (#2932 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search` method. ### Changes - a `max_marginal_relevance_search` method implementation has been added in `weaviate.py` - tests have been added to the the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-16 13:11:30 -07:00
vowelparrot	4ffc58e07b	Add similarity_search_with_normalized_similarities (#2916 ) Add a method that exposes a similarity search with corresponding normalized similarity scores. Implement only for FAISS now. ### Motivation: Some memory definitions combine `relevance` with other scores, like recency , importance, etc. While many (but not all) of the `VectorStore`'s expose a `similarity_search_with_score` method, they don't all interpret the units of that score (depends on the distance metric and whether or not the the embeddings are normalized). This PR proposes a `similarity_search_with_normalized_similarities` method that lets consumers of the vector store not have to worry about the metric and embedding scale. Most providers default to euclidean distance, with Pinecone being one exception (defaults to cosine _similarity_). --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-15 21:06:08 -07:00
Davit Buniatyan	b3a5b51728	[minor] Deep Lake auth improvements in docs, kwargs pass, faster tests (#2927 ) Minor cosmetic changes - Activeloop environment cred authentication in notebooks with `getpass.getpass` (instead of CLI which not always works) - much faster tests with Deep Lake pytest mode on - Deep Lake kwargs pass Notes - I put pytest environment creds inside `vectorstores/conftest.py`, but feel free to suggest a better location. For context, if I put in `test_deeplake.py`, `ruff` doesn't let me to set them before import deeplake --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-15 10:49:16 -07:00
Ankush Gola	ec59e9d886	Fix ChatAnthropic stop_sequences error (#2919 ) (#2920 ) Note to self: Always run integration tests, even on "that last minute change you thought would be safe" :) --------- Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>	2023-04-14 17:22:01 -07:00
Mike Lambert	392f1b3218	Add Anthropic ChatModel to langchain (#2293 ) * Adds an Anthropic ChatModel * Factors out common code in our LLMModel and ChatModel * Supports streaming llm-tokens to the callbacks on a delta basis (until a future V2 API does that for us) * Some fixes	2023-04-14 15:09:07 -07:00
Harrison Chase	1e9378d0a8	Harrison/weaviate fixes (#2872 ) Co-authored-by: cs0lar <cristiano.solarino@gmail.com> Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>	2023-04-13 22:37:34 -07:00
sergerdn	04c458a270	feat: improve pinecone tests (#2806 ) Improve the integration tests for Pinecone by adding an `.env.example` file for local testing. Additionally, add some dev dependencies specifically for integration tests. This change also helps me understand how Pinecone deals with certain things, see related issues https://github.com/hwchase17/langchain/issues/2484 https://github.com/hwchase17/langchain/issues/2816	2023-04-13 21:49:31 -07:00
vowelparrot	bf0887c486	Add Slack Directory Loader (#2841 ) Fixes linting issue from #2835 Adds a loader for Slack Exports which can be a very valuable source of knowledge to use for internal QA bots and other use cases. ```py # Export data from your Slack Workspace first. from langchain.document_loaders import SLackDirectoryLoader SLACK_WORKSPACE_URL = "https://awesome.slack.com" loader = ("Slack_Exports", SLACK_WORKSPACE_URL) docs = loader.load() ```	2023-04-13 21:31:59 -07:00
Tim Asp	be4fb24b32	OpenAI LLM: update `modelname_to_contextsize` with new models (#2843 ) Token counts pulled from https://openai.com/pricing	2023-04-13 11:13:34 -07:00
Harrison Chase	e49f1e628c	Harrison/gpt cache (#2744 ) Co-authored-by: SimFG <bang.fu@zilliz.com>	2023-04-12 14:16:58 -07:00
Johnny Lee	0ab364404e	add continue to fix 'continue_on_failure' parameter for URL doc loader (#2735 ) Currently, the function still fails if `continue_on_failure` is set to True, because `elements` is not set. --------- Co-authored-by: leecjohnny <johnny-lee1255@users.noreply.github.com>	2023-04-11 21:12:39 -07:00
sergerdn	4bdcedab54	fix: some imports for integration tests (#2612 ) Add more missed imports for integration tests. Bump `pytest` to the current latest version. Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to update its cassette(easy fix). Related PR: https://github.com/hwchase17/langchain/pull/2560	2023-04-11 20:45:36 -07:00
Ankush Gola	c1521ddbdb	Add workaround for not having async vector store methods (#2733 ) This allows us to use the async API for the Retrieval chains, though it is not guaranteed to be thread safe.	2023-04-11 18:49:08 -07:00
vowelparrot	709f26b69e	Added bilibili loader (#2673 ) (#2724 ) I've added a bilibili loader, bilibili is a very active video site in China and I think we need this loader. Example: ```python from langchain.document_loaders.bilibili import BiliBiliLoader loader = BiliBiliLoader( ["https://www.bilibili.com/video/BV1xt411o7Xu/", "https://www.bilibili.com/video/av330407025/"] ) docs = loader.load() ``` Co-authored-by: 了空 <568250549@qq.com>	2023-04-11 10:40:32 -07:00
Naveen Tatikonda	4364d3316e	Add custom vector fields and text fields for OpenSearch (#2652 ) Description Add custom vector field name and text field name while indexing and querying for OpenSearch Issues https://github.com/hwchase17/langchain/issues/2500 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-10 21:02:02 -07:00
Chetanya Rastogi	50c511d75f	Add new loader to load pdf as html content (#2607 ) Adds a new pdf loader using the existing dependency on PDFMiner. The new loader can be helpful for chunking texts semantically into sections as the output html content can be parsed via `BeautifulSoup` to get more structured and rich information about font size, page numbers, pdf headers/footers, etc. which may not be available otherwise with other pdf loaders	2023-04-09 17:57:25 -07:00
sergerdn	6dc86ad48f	feat: add pytest-vcr for recording HTTP interactions in integration tests (#2445 ) Using `pytest-vcr` in integration tests has several benefits. Firstly, it removes the need to mock external services, as VCR records and replays HTTP interactions on the fly. Secondly, it simplifies the integration test setup by eliminating the need to set up and tear down external services in some cases. Finally, it allows for more reliable and deterministic integration tests by ensuring that HTTP interactions are always replayed with the same response. Overall, `pytest-vcr` is a valuable tool for simplifying integration test setup and improving their reliability This commit adds the `pytest-vcr` package as a dependency for integration tests in the `pyproject.toml` file. It also introduces two new fixtures in `tests/integration_tests/conftest.py` files for managing cassette directories and VCR configurations. In addition, the `tests/integration_tests/vectorstores/test_elasticsearch.py` file has been updated to use the `@pytest.mark.vcr` decorator for recording and replaying HTTP interactions. Finally, this commit removes the `documents` fixture from the `test_elasticsearch.py` file and replaces it with a new fixture defined in `tests/integration_tests/vectorstores/conftest.py` that yields a list of documents to use in any other tests. This also includes my second attempt to fix issue : https://github.com/hwchase17/langchain/issues/2386 Maybe related https://github.com/hwchase17/langchain/issues/2484	2023-04-07 07:28:57 -07:00
Alex Rad	bd780a8223	Add support for rwkv (#2422 ) This adds support for running RWKV with pytorch. https://github.com/hwchase17/langchain/issues/2398 This does not yet support rwkv.cpp	2023-04-06 14:41:06 -07:00
Davit Buniatyan	b4914888a7	Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455 ) ### Features include - Metadata based embedding search - Choice of distance metric function (`L2` for Euclidean, `L1` for Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot' for dot product. Defaults to `L2` - Returning scores - Max Marginal Relevance Search - Deleting samples from the dataset ### Notes - Added numerous tests, let me know if you would like to shorten them or make smarter --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-06 12:47:33 -07:00
leo-gan	fd69cc7e42	Removed duplicate BaseModel dependencies (#2471 ) Removed duplicate BaseModel dependencies in class inheritances. Also, sorted imports by `isort`.	2023-04-06 12:45:16 -07:00
sergerdn	b410dc76aa	fix: elasticsearch (#2402 ) - Create a new docker-compose file to start an Elasticsearch instance for integration tests. - Add new tests to `test_elasticsearch.py` to verify Elasticsearch functionality. - Include an optional group `test_integration` in the `pyproject.toml` file. This group should contain dependencies for integration tests and can be installed using the command `poetry install --with test_integration`. Any new dependencies should be added by running `poetry add some_new_deps --group "test_integration" ` Note: New tests running in live mode, which involve end-to-end testing of the OpenAI API. In the future, adding `pytest-vcr` to record and replay all API requests would be a nice feature for testing process.More info: https://pytest-vcr.readthedocs.io/en/latest/ Fixes https://github.com/hwchase17/langchain/issues/2386	2023-04-05 06:51:32 -07:00
Harrison Chase	0a9f04bad9	Harrison/gpt4all (#2366 ) Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-04-04 06:49:17 -07:00
Harrison Chase	e90d007db3	Harrison/msg files (#2375 ) Co-authored-by: Sahil Masand <masand.sahil@gmail.com> Co-authored-by: Sahil Masand <masands@cbh.com.au>	2023-04-04 06:48:34 -07:00
Kacper Łukawski	585f60a5aa	Qdrant update to 1.1.1 & docs polishing (#2388 ) This PR updates Qdrant to 1.1.1 and introduces local mode, so there is no need to spin up the Qdrant server. By that occasion, the Qdrant example notebooks also got updated, covering more cases and answering some commonly asked questions. All the Qdrant's integration tests were switched to local mode, so no Docker container is required to launch them.	2023-04-04 06:48:21 -07:00
Harrison Chase	d85f57ef9c	Harrison/llama (#2314 ) Co-authored-by: RJ Adriaansen <adriaansen@eshcc.eur.nl>	2023-04-02 14:57:45 -07:00
akmhmgc	94b2f536f3	Modify output for wikipedia api wrapper (#2287 ) ## Description Thanks for the quick maintenance for great repository!! I modified wikipedia api wrapper ## Details - Add output for missing search results - Add tests	2023-04-02 14:00:27 -07:00
Sam Cordner-Matthews	1ddd6dbf0b	Add ability to pass kwargs to loader classes in `DirectoryLoader`, add ability to modify encoding and BeautifulSoup behaviour in `BSHTMLLoader` (#2275 ) Solves #2247. Noted that the only test I added checks for the BeautifulSoup behaviour change. Happy to add a test for `DirectoryLoader` if deemed necessary.	2023-04-01 12:48:27 -07:00
Harrison Chase	b35260ed47	Harrison/memory base (#2122 ) @3coins + @zoltan-fedor.... heres the pr + some minor changes i made. thoguhts? can try to get it into tmrws release --------- Co-authored-by: Zoltan Fedor <zoltan.0.fedor@gmail.com> Co-authored-by: Piyush Jain <piyushjain@duck.com>	2023-03-29 10:10:09 -07:00
Ankush Gola	ccee1aedd2	add async support for anthropic (#2114 ) should not be merged in before https://github.com/anthropics/anthropic-sdk-python/pull/11 gets released	2023-03-28 22:49:14 -04:00
Harrison Chase	3e879b47c1	Harrison/gitbook (#2044 ) Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>	2023-03-28 15:28:33 -07:00
Honkware	aff33d52c5	Add OpenWeatherMap API Tool (#2083 ) Added tool for OpenWeatherMap API	2023-03-28 12:02:14 -07:00
Charlie Holtz	f16c1fb6df	Add replicate take 2 (#2077 ) This PR adds a replicate integration to langchain. It's an updated version of https://github.com/hwchase17/langchain/pull/1993, but with updates to match latest replicate-python code. https://github.com/replicate/replicate-python. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Zeke Sikelianos <zeke@sikelianos.com>	2023-03-28 11:56:57 -07:00
Harrison Chase	f281033362	rm pandas dependency (#2102 )	2023-03-28 08:38:19 -07:00
Harrison Chase	410bf37fb8	Harrison/big query (#2100 ) Co-authored-by: lu-cashmoney <lucas.corley@gmail.com>	2023-03-28 08:17:22 -07:00
Harrison Chase	eff5eed719	Harrison/jina (#2043 ) Co-authored-by: numb3r3 <wangfelix87@gmail.com> Co-authored-by: felix-wang <35718120+numb3r3@users.noreply.github.com>	2023-03-28 08:16:17 -07:00
Harrison Chase	f74a1bebf5	Harrison/duckdb (#2064 ) Co-authored-by: Trent Hauck <trent@trenthauck.com>	2023-03-27 19:51:34 -07:00
Ankush Gola	b7ebb8fe30	enable streaming in anthropic llm wrapper (#2065 )	2023-03-27 20:25:00 -04:00
Harrison Chase	a0cd6672aa	Harrison/site map (#2061 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-27 16:28:08 -07:00
Mario Kostelac	e7d6de6b1c	(ChatOpenAI) Add model_name to LLMResult.llm_output (#1960 ) This makes sure OpenAI and ChatOpenAI have the same llm_output, and allow tracking usage per model. Same work for OpenAI was done in https://github.com/hwchase17/langchain/pull/1713.	2023-03-24 08:51:16 -07:00
Harrison Chase	6e1b5b8f7e	Harrison/figma doc loader (#1908 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	2023-03-22 19:57:46 -07:00
Eli	12f868b292	Propagate "filter" arg in Chroma similarity_search (#1869 ) Technically a duplicate fix to #1619 but with unit tests and a small documentation update - Propagate `filter` arg in Chroma `similarity_search` to delegated call to `similarity_search_with_score` - Add `filter` arg to `similarity_search_by_vector` - Clarify doc strings on FakeEmbeddings	2023-03-22 19:40:10 -07:00
Maurício Maia	f155d9d3ec	Add metadata filter to PGVector search (#1872 ) Add ability to filter pgvector documents by metadata.	2023-03-22 15:21:40 -07:00
Maurício Maia	2212520a6c	Add PGVector collection metadata (#1887 ) The `CollectionStore` for `PGVector` has a `cmetadata` field but it's never used. This PR add the ability to save metadata information to the collection.	2023-03-22 11:27:07 -07:00
LeoGrin	3701b2901e	use namespace argument in Pinecone constructor (#1757 ) Fix #1756 Use the `namespace` argument of `Pinecone.from_exisiting_index` to set the default value of `namespace` for other methods. Leads to more expected behavior and easier integration in chains. For the test, I've added a line to delete and rebuild the `langchain-demo` index at the beginning of the test. I'm not 100% sure if it's a good idea but it makes the test reproducible.	2023-03-18 19:55:38 -07:00
Mario Kostelac	aff44d0a98	(OpenAI) Add model_name to LLMResult.llm_output (#1713 ) Given that different models have very different latencies and pricings, it's benefitial to pass the information about the model that generated the response. Such information allows implementing custom callback managers and track usage and price per model. Addresses https://github.com/hwchase17/langchain/issues/1557.	2023-03-16 21:55:55 -07:00
Daniel Chalef	b157e0c1c3	Add HTML document_loader that includes page title metadata (#1720 ) This `BSHTMLLoader` document_loader loads an HTML document, extracts text and adds the page title to the returned Document's metadata. The loader uses the already installed bs4 package to extract both text content and the page title. Included in this PR is an example HTML file and an integration test that tests against this file. --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org>	2023-03-16 21:47:17 -07:00
Kacper Łukawski	4a327dd1d6	Implement basic metadata filtering in Qdrant (#1689 ) This PR implements a basic metadata filtering mechanism similar to the ones in Chroma and Pinecone. It still cannot express complex conditions, as there are no operators, but some users requested to have that feature available.	2023-03-15 07:31:39 -07:00
Harrison Chase	0b29e68c17	Harrison/pgvector (#1679 ) Co-authored-by: Aman Kumar <krsingh.aman@gmail.com>	2023-03-14 21:13:58 -07:00
Xin Qiu	4e13cef05a	feat: add redisearch vectorstore (#1307 ) # Description Add `RediSearch` vectorstore for LangChain RediSearch: [RediSearch quick start](https://redis.io/docs/stack/search/quick_start/) # How to use ``` from langchain.vectorstores.redisearch import RediSearch rds = RediSearch.from_documents(docs, embeddings,redisearch_url="redis://localhost:6379") ```	2023-03-14 18:06:03 -07:00
Tim Asp	b3234bf3b0	cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615 ) `OnlinePDFLoader` and `PagedPDFSplitter` lived separate from the rest of the pdf loaders. Because they're all similar, I propose moving all to `pdy.py` and the same docs/examples page. Additionally, `PagedPDFSplitter` naming doesn't match the pattern the rest of the loaders follow, so I renamed to `PyPDFLoader` and had it inherit from `BasePDFLoader` so it can now load from remote file sources.	2023-03-13 23:06:50 -07:00
Harrison Chase	cb04ba0136	Add support for intermediate steps to SQLDatabaseSequentialChain (#1583 ) (#1601 ) for https://github.com/hwchase17/langchain/issues/1582 I simply added the `return_intermediate_steps` and changed the `output_keys` function. I added 2 simple tests, 1 for SQLDatabaseSequentialChain without the intermediate steps and 1 with Co-authored-by: brad-nemetski <115185478+brad-nemetski@users.noreply.github.com>	2023-03-11 15:44:41 -08:00
Harrison Chase	3ee32a01ea	Harrison/prompt layer (#1547 ) Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com> Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>	2023-03-08 21:24:27 -08:00
Harrison Chase	c844d1fd46	Harrison/chunk size (#1549 ) Co-authored-by: Florian Leuerer <31259070+floleuerer@users.noreply.github.com>	2023-03-08 21:24:18 -08:00
Harrison Chase	357d808484	Harrison/remote paths pdf (#1544 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-08 20:53:37 -08:00
Ankush Gola	27104d4921	fix `ChatOpenAI.agenerate` (#1504 )	2023-03-07 15:22:05 -08:00
Harrison Chase	7bec461782	Harrison/memory refactor (#1478 ) moves memory to own module, factors out common stuff	2023-03-07 07:59:37 -08:00
Harrison Chase	0e21463f07	(rfc) chat models (#1424 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-03-06 08:34:24 -08:00
Harrison Chase	a1b9dfc099	Harrison/similarity search chroma (#1434 ) Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>	2023-03-04 08:10:15 -08:00
Tim Asp	23231d65a9	Add PyMuPDF PDF loader (#1426 ) Different PDF libraries have different strengths and weaknesses. PyMuPDF does a good job at extracting the most amount of content from the doc, regardless of the source quality, extremely fast (especially compared to Unstructured). https://pymupdf.readthedocs.io/en/latest/index.html	2023-03-03 20:59:28 -08:00
Nuno Campos	499e76b199	Allow the regular openai class to be used for ChatGPT models (#1393 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-03-02 09:04:18 -08:00
Kacper Łukawski	9ac442624c	Add Qdrant named arguments (#1386 ) This PR: - Increases `qdrant-client` version to 1.0.4 - Introduces custom content and metadata keys (as requested in #1087) - Moves all the `QdrantClient` parameters into the method parameters to simplify code completion	2023-03-02 07:05:14 -08:00
Ankush Gola	fe30be6fba	add async and streaming support to `OpenAIChat` (#1378 ) title says it all	2023-03-01 21:55:43 -08:00
Tim Asp	72ef69d1ba	Add new iFixit document loader (#1333 ) iFixit is a wikipedia-like site that has a huge amount of open content on how to fix things, questions/answers for common troubleshooting and "things" related content that is more technical in nature. All content is licensed under CC-BY-SA-NC 3.0 Adding docs from iFixit as context for user questions like "I dropped my phone in water, what do I do?" or "My macbook pro is making a whining noise, what's wrong with it?" can yield significantly better responses than context free response from LLMs.	2023-02-27 20:40:20 -08:00
Harrison Chase	166cda2cc6	Harrison/deeplake (#1316 ) Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-02-26 22:35:04 -08:00
Harrison Chase	aaad6cc954	Harrison/atlas db (#1315 ) Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>	2023-02-26 22:11:38 -08:00
Enrico Shippole	9becdeaadf	Add Writer, Banana, Modal, StochasticAI (#1270 ) Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI Added rigid json format for Banana and Modal	2023-02-24 06:58:58 -08:00
Dennis Antela Martinez	53c67e04d4	add aleph alpha llm (#1207 ) Integrate Aleph Alpha's client into Langchain to provide access to the luminous models - more info on latest benchmarks here: https://www.aleph-alpha.com/luminous-performance-benchmarks	2023-02-22 10:37:36 -08:00
Harrison Chase	44c8d8a9ac	move serpapi wrapper (#1199 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-02-20 21:15:45 -08:00
Naveen Tatikonda	0118706fd6	Add Support for OpenSearch Vector database (#1191 ) ### Description This PR adds a wrapper which adds support for the OpenSearch vector database. Using opensearch-py client we are ingesting the embeddings of given text into opensearch cluster using Bulk API. We can perform the `similarity_search` on the index using the 3 popular searching methods of OpenSearch k-NN plugin: - `Approximate k-NN Search` use approximate nearest neighbor (ANN) algorithms from the [nmslib](https://github.com/nmslib/nmslib), [faiss](https://github.com/facebookresearch/faiss), and [Lucene](https://lucene.apache.org/) libraries to power k-NN search. - `Script Scoring` extends OpenSearch’s script scoring functionality to execute a brute force, exact k-NN search. - `Painless Scripting` adds the distance functions as painless extensions that can be used in more complex combinations. Also, supports brute force, exact k-NN search like Script Scoring. ### Issues Resolved https://github.com/hwchase17/langchain/issues/1054 --------- Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-02-20 18:39:34 -08:00
Andrew White	c5015d77e2	Allow k to be higher than doc size in max_marginal_relevance_search (#1187 ) Fixes issue #1186. For some reason, #1117 didn't seem to fix it.	2023-02-20 16:39:13 -08:00
Harrison Chase	9d6d8f85da	Harrison/self hosted runhouse (#1154 ) Co-authored-by: Donny Greenberg <dongreenberg2@gmail.com> Co-authored-by: John Dagdelen <jdagdelen@users.noreply.github.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net> Co-authored-by: Andrew White <white.d.andrew@gmail.com> Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com> Co-authored-by: Matt Robinson <mthw.wm.robinson@gmail.com> Co-authored-by: jeff <tangj1122@gmail.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local> Co-authored-by: zanderchase <zander@unfold.ag> Co-authored-by: Charles Frye <cfrye59@gmail.com> Co-authored-by: zanderchase <zanderchase@gmail.com> Co-authored-by: Shahriar Tajbakhsh <sh.tajbakhsh@gmail.com> Co-authored-by: Stefan Keselj <skeselj@princeton.edu> Co-authored-by: Francisco Ingham <fpingham@gmail.com> Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com> Co-authored-by: cragwolfe <cragcw@gmail.com> Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com> Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com> Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com> Co-authored-by: blob42 <contact@blob42.xyz> Co-authored-by: blob42 <spike@w530> Co-authored-by: Enrico Shippole <henryshippole@gmail.com> Co-authored-by: Ibis Prevedello <ibiscp@gmail.com> Co-authored-by: jped <jonathanped@gmail.com> Co-authored-by: Justin Torre <justintorre75@gmail.com> Co-authored-by: Ivan Vendrov <ivan@anthropic.com> Co-authored-by: Sasmitha Manathunga <70096033+mmz-001@users.noreply.github.com> Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com> Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io> Co-authored-by: Jeff Huber <jeffchuber@gmail.com> Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com> Co-authored-by: Andrew Huang <jhuang16888@gmail.com> Co-authored-by: rogerserper <124558887+rogerserper@users.noreply.github.com> Co-authored-by: seanaedmiston <seane999@gmail.com> Co-authored-by: Hasegawa Yuya <52068175+Hase-U@users.noreply.github.com> Co-authored-by: Ivan Vendrov <ivendrov@gmail.com> Co-authored-by: Chen Wu (吴尘) <henrychenwu@cmu.edu> Co-authored-by: Dennis Antela Martinez <dennis.antela@gmail.com> Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr> Co-authored-by: Rishabh Raizada <110235735+rishabh-ti@users.noreply.github.com>	2023-02-19 09:53:45 -08:00
Noah Gundotra	8c5fbab72d	[Integration Tests] Cast fake embeddings to ALL float values (#1102 ) Pydantic validation breaks tests for example (`test_qdrant.py`) because fake embeddings contain an integer. This PR casts the embeddings array to all floats. Now the `qdrant` test passes, `poetry run pytest tests/integration_tests/vectorstores/test_qdrant.py`	2023-02-17 15:18:09 -08:00
yakigac	1ed708391e	Fix a bug that shows "KeyError 'items'" (#1118 ) Fix KeyError 'items' when no result found. ## Problem When no result found for a query, google search crashed with `KeyError 'items'`. ## Solution I added a check for an empty response before accessing the 'items' key. It will handle the case correctly. ## Other my twitter: yakigac (I don't mind even if you don't mention me for this PR. But just because last time my real name was shout out :) )	2023-02-17 13:04:02 -08:00
Hasegawa Yuya	e08961ab25	Fixed openai embeddings to be safe by batching them based on token size calculation. (#991 ) I modified the logic of the batch calculation for embedding according to this cookbook https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb	2023-02-15 23:02:32 -08:00
seanaedmiston	f0a258555b	Support similarity search by vector (in FAISS) (#961 ) Alternate implementation to PR #960 Again - only FAISS is implemented. If accepted can add this to other vectorstores or leave as NotImplemented? Suggestions welcome...	2023-02-15 22:50:00 -08:00
rogerserper	e46cd3b7db	Google Search API integration with serper.dev (wrapper, tests, docs, … (#909 ) Adds Google Search integration with [Serper](https://serper.dev) a low-cost alternative to SerpAPI (10x cheaper + generous free tier). Includes documentation, tests and examples. Hopefully I am not missing anything. Developers can sign up for a free account at [serper.dev](https://serper.dev) and obtain an api key. ## Usage ```python from langchain.utilities import GoogleSerperAPIWrapper from langchain.llms.openai import OpenAI from langchain.agents import initialize_agent, Tool import os os.environ["SERPER_API_KEY"] = "" os.environ['OPENAI_API_KEY'] = "" llm = OpenAI(temperature=0) search = GoogleSerperAPIWrapper() tools = [ Tool( name="Intermediate Answer", func=search.run ) ] self_ask_with_search = initialize_agent(tools, llm, agent="self-ask-with-search", verbose=True) self_ask_with_search.run("What is the hometown of the reigning men's U.S. Open champion?") ``` ### Output ``` Entering new AgentExecutor chain... Yes. Follow up: Who is the reigning men's U.S. Open champion? Intermediate answer: Current champions Carlos Alcaraz, 2022 men's singles champion. Follow up: Where is Carlos Alcaraz from? Intermediate answer: El Palmar, Spain So the final answer is: El Palmar, Spain > Finished chain. 'El Palmar, Spain' ```	2023-02-15 22:47:17 -08:00
Ankush Gola	caa8e4742e	Enable streaming for OpenAI LLM (#986 ) * Support a callback `on_llm_new_token` that users can implement when `OpenAI.streaming` is set to `True`	2023-02-14 15:06:14 -08:00
Harrison Chase	88bebb4caa	Harrison/llm integrations (#1039 ) Co-authored-by: jped <jonathanped@gmail.com> Co-authored-by: Justin Torre <justintorre75@gmail.com> Co-authored-by: Ivan Vendrov <ivan@anthropic.com>	2023-02-13 22:06:25 -08:00
Enrico Shippole	f30dcc6359	Add GooseAI, CerebriumAI, Petals, ForefrontAI (#981 ) Add GooseAI, CerebriumAI, Petals, ForefrontAI	2023-02-13 21:20:19 -08:00
Anton Troynikov	d43d430d86	Chroma persistence (#1028 ) This PR adds persistence to the Chroma vector store. Users can supply a `persist_directory` with any of the `Chroma` creation methods. If supplied, the store will be automatically persisted at that directory. If a user creates a new `Chroma` instance with the same persistence directory, it will get loaded up automatically. If they use `from_texts` or `from_documents` in this way, the documents will be loaded into the existing store. There is the chance of some funky behavior if the user passes a different embedding function from the one used to create the collection - we will make this easier in future updates. For now, we log a warning.	2023-02-13 21:09:06 -08:00
Anton Troynikov	78abd277ff	Chroma in LangChain (#1010 ) Chroma is a simple to use, open-source, zero-config, zero setup vectorstore. Simply `pip install chromadb`, and you're good to go. Out-of-the-box Chroma is suitable for most LangChain workloads, but is highly flexible. I tested to 1M embs on my M1 mac, with out issues and reasonably fast query times. Look out for future releases as we integrate more Chroma features with LangChain!	2023-02-12 17:43:48 -08:00
Harrison Chase	c64f98e2bb	Harrison/format agent instructions (#973 ) Co-authored-by: Andrew White <white.d.andrew@gmail.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net> Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>	2023-02-10 10:07:26 -08:00
Harrison Chase	91c6cea227	Harrison/batch embeds (#972 ) Co-authored-by: John Dagdelen <jdagdelen@users.noreply.github.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-10 06:59:50 -08:00
Ankush Gola	bc7e56e8df	Add asyncio support for LLM (OpenAI), Chain (LLMChain, LLMMathChain), and Agent (#841 ) Supporting asyncio in langchain primitives allows for users to run them concurrently and creates more seamless integration with asyncio-supported frameworks (FastAPI, etc.) Summary of changes: LLM * Add `agenerate` and `_agenerate` * Implement in OpenAI by leveraging `client.Completions.acreate` Chain * Add `arun`, `acall`, `_acall` * Implement them in `LLMChain` and `LLMMathChain` for now Agent * Refactor and leverage async chain and llm methods * Add ability for `Tools` to contain async coroutine * Implement async SerpaPI `arun` Create demo notebook. Open questions: * Should all the async stuff go in separate classes? I've seen both patterns (keeping the same class and having async and sync methods vs. having class separation)	2023-02-07 21:21:57 -08:00
Harrison Chase	bc53c928fc	Harrison/athropic (#921 ) Co-authored-by: Mike Lambert <mlambert@gmail.com> Co-authored-by: mrbean <sam@you.com> Co-authored-by: mrbean <43734688+sam-h-bean@users.noreply.github.com> Co-authored-by: Ivan Vendrov <ivendrov@gmail.com>	2023-02-06 22:29:25 -08:00
Harrison Chase	1e56879d38	Harrison/save faiss (#916 ) Co-authored-by: Shrey Joshi <shreyjoshi2004@gmail.com>	2023-02-06 21:44:50 -08:00
Harrison Chase	ba5a2f06b9	Harrison/inference endpoint (#861 ) Co-authored-by: Eno Reyes <enoreyes@gmail.com>	2023-02-06 18:14:25 -08:00
Kevin Huo	31b054f69d	Add pinecone integration test (#911 ) Basic integration test for pinecone	2023-02-06 18:13:35 -08:00
Harrison Chase	3f48eed5bd	Harrison/milvus (#856 ) Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com> Signed-off-by: Frank Liu <frank.liu@zilliz.com> Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com> Co-authored-by: Frank Liu <frank@frankzliu.com>	2023-02-02 22:05:47 -08:00
kahkeng	4a8f5cdf4b	Add alternative token-based text splitter (#816 ) This does not involve a separator, and will naively chunk input text at the appropriate boundaries in token space. This is helpful if we have strict token length limits that we need to strictly follow the specified chunk size, and we can't use aggressive separators like spaces to guarantee the absence of long strings. CharacterTextSplitter will let these strings through without splitting them, which could cause overflow errors downstream. Splitting at arbitrary token boundaries is not ideal but is hopefully mitigated by having a decent overlap quantity. Also this results in chunks which has exact number of tokens desired, instead of sometimes overcounting if we concatenate shorter strings. Potentially also helps with #528.	2023-02-02 19:55:13 -08:00
Harrison Chase	23d5f64bda	Harrison/ngram example (#846 ) Co-authored-by: Sean Spriggens <ssprigge@syr.edu>	2023-02-02 09:44:42 -08:00
Harrison Chase	d564308e0f	rfc: instruct embeddings (#811 ) Co-authored-by: seanaedmiston <seane999@gmail.com>	2023-02-02 08:44:02 -08:00
Harrison Chase	7b4882a2f4	Harrison/tf embeddings (#817 ) Co-authored-by: Ryohei Kuroki <10434946+yakigac@users.noreply.github.com>	2023-01-31 00:00:08 -08:00
dham	e04b063ff4	add faiss local saving/loading (#676 ) - This uses the faiss built-in `write_index` and `load_index` to save and load faiss indexes locally - Also fixes #674 - The save/load functions also use the faiss library, so I refactored the dependency into a function	2023-01-21 16:08:14 -08:00
Harrison Chase	0b204d8c21	Harrison/quadrant (#665 ) Co-authored-by: Kacper Łukawski <kacperlukawski@users.noreply.github.com>	2023-01-20 09:45:01 -08:00
Harrison Chase	4d4cff0530	Harrison/cohere experimental (#638 ) Co-authored-by: inyourhead <44607279+xettrisomeman@users.noreply.github.com>	2023-01-17 22:28:55 -08:00
Harrison Chase	ffc7e04d44	Harrison/wolfram alpha (#579 ) Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2023-01-11 05:52:19 -08:00
Harrison Chase	0072686aab	Harrison/new search engine (#477 ) Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2022-12-30 08:06:57 -05:00
Harrison Chase	f8b605293f	Harrison/improve memory (#432 ) add AI prefix add new type of memory Co-authored-by: Jason <chisanch@usc.edu>	2022-12-27 08:23:51 -05:00
Harrison Chase	cf98f219f9	Harrison/tools exp (#372 )	2022-12-18 21:51:23 -05:00
Harrison Chase	3474f39e21	Harrison/improve cache (#368 ) make it so everything goes through generate, which removes the need for two types of caches	2022-12-18 16:22:42 -05:00
Harrison Chase	a7084ad6e4	Harrison/version 0040 (#366 )	2022-12-17 07:53:22 -08:00
mrbean	50257fce59	Support Streaming Tokens from OpenAI (#364 ) https://github.com/hwchase17/langchain/issues/363 @hwchase17 how much does this make you want to cry?	2022-12-17 07:02:58 -08:00
mrbean	fe6695b9e7	Add HuggingFacePipeline LLM (#353 ) https://github.com/hwchase17/langchain/issues/354 Add support for running your own HF pipeline locally. This would allow you to get a lot more dynamic with what HF features and models you support since you wouldn't be beholden to what is hosted in HF hub. You could also do stuff with HF Optimum to quantize your models and stuff to get pretty fast inference even running on a laptop.	2022-12-17 07:00:04 -08:00
Harrison Chase	9bb7195085	Harrison/llm saving (#331 ) Co-authored-by: Akash Samant <70665700+asamant21@users.noreply.github.com>	2022-12-13 06:46:01 -08:00
Harrison Chase	3ca2c8d6c5	allow passing of stop params into openai (#232 )	2022-11-30 22:20:13 -08:00
Harrison Chase	ca2394028f	move search to not be a chain (#226 )	2022-11-29 20:07:44 -08:00
Andrew Gleave	ea67c049f0	Support SQL statements that return no results (#222 ) Adds support for statements such as insert, update etc which do not return any rows. `engine.execute` is deprecated and so execution has been updated to use `connection.exec_driver_sql` as-per: https://docs.sqlalchemy.org/en/14/core/connections.html#sqlalchemy.engine.Engine.execute	2022-11-29 08:28:45 -08:00
Harrison Chase	1b9b8efbc9	pal chain (#207 ) from https://arxiv.org/pdf/2211.10435.pdf	2022-11-28 21:38:34 -08:00
Harrison Chase	b94244eb12	nits (#210 ) use json.dump move test to integration tests (since it requires huggingface_hub)	2022-11-27 13:03:09 -08:00
Bagatur	b90e25f786	Add HuggingFace Hub Embeddings (#125 ) Add support for calling HuggingFace embedding models using the HuggingFaceHub Inference API. New class mirrors the existing HuggingFaceHub LLM implementation. Currently only supports 'sentence-transformers' models. Closes #86	2022-11-27 00:24:59 -08:00
Harrison Chase	ae9c6257fe	Harrison/arbitrary params (#186 )	2022-11-24 20:01:20 -08:00
Harrison Chase	d3a7429f61	(WIP) agents (#171 )	2022-11-22 06:16:26 -08:00
Samantha Whitmore	315b0c09c6	wip: add method for both docstore and embeddings (#119 ) this will break atm but wanted to get thoughts on implementation. 1. should add() be on docstore interface? 2. should InMemoryDocstore change to take a list of documents as init? (makes this slightly easier to implement in FAISS -- if we think it is less clean then could expose a method to get the number of documents currently in the dict, and perform the logic of creating the necessary dictionary in the FAISS.add_texts method. Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2022-11-20 16:23:58 -08:00

... 2 3 4 5 6 ...

371 Commits