langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Alon Diament	44abe925df	Add Joplin document loader (#5153 ) # Add Joplin document loader [Joplin](https://joplinapp.org/) is an open source note-taking app. Joplin has a [REST API](https://joplinapp.org/api/references/rest_api/) for accessing its local database. The proposed `JoplinLoader` uses the API to retrieve all notes in the database and their metadata. Joplin needs to be installed and running locally, and an access token is required. - The PR includes an integration test. - The PR includes an example notebook. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 12:31:55 -07:00
Davis Chase	2b2176a3c1	tfidf retriever (#5114 ) Co-authored-by: vempaliakhil96 <vempaliakhil96@gmail.com>	2023-05-24 10:02:09 -07:00
Harrison Chase	11c26ebb55	Harrison/modelscope (#5156 ) Co-authored-by: thomas-yanxin <yx20001210@163.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 08:06:45 -07:00
Nolan Tremelling	faa26650c9	Beam (#4996 ) # Beam Calls the Beam API wrapper to deploy and make subsequent calls to an instance of the gpt2 LLM in a cloud deployment. Requires installation of the Beam library and registration of Beam Client ID and Client Secret. Additional calls can then be made through the instance of the large language model in your code or by calling the Beam API. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 01:25:18 -07:00
Ofer Mendelevitch	c81fb88035	Vectara (#5069 ) # Vectara Integration This PR provides integration with Vectara. Implemented here are: * langchain/vectorstore/vectara.py * tests/integration_tests/vectorstores/test_vectara.py * langchain/retrievers/vectara_retriever.py And two IPYNB notebooks to do more testing: * docs/modules/chains/index_examples/vectara_text_generation.ipynb * docs/modules/indexes/vectorstores/examples/vectara.ipynb --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 01:24:58 -07:00
Daniel King	de6e6c764e	Add MosaicML inference endpoints (#4607 ) # Add MosaicML inference endpoints This PR adds support in langchain for MosaicML inference endpoints. We both serve a select few open source models, and allow customers to deploy their own models using our inference service. Docs are here (https://docs.mosaicml.com/en/latest/inference.html), and sign up form is here (https://forms.mosaicml.com/demo?utm_source=langchain). I'm not intimately familiar with the details of langchain, or the contribution process, so please let me know if there is anything that needs fixing or this is the wrong way to submit a new integration, thanks! I'm also not sure what the procedure is for integration tests. I have tested locally with my api key. ## Who can review? @hwchase17 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-05-23 15:59:08 -07:00
Jeff Vestal	0b542a9706	Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models (#3401 ) This PR introduces a new module, `elasticsearch_embeddings.py`, which provides a wrapper around Elasticsearch embedding models. The new ElasticsearchEmbeddings class allows users to generate embeddings for documents and query texts using a [model deployed in an Elasticsearch cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding). ### Main features: 1. The ElasticsearchEmbeddings class initializes with an Elasticsearch connection object and a model_id, providing an interface to interact with the Elasticsearch ML client through [infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model) . 2. The `embed_documents()` method generates embeddings for a list of documents, and the `embed_query()` method generates an embedding for a single query text. 3. The class supports custom input text field names in case the deployed model expects a different field name than the default `text_field`. 4. The implementation is compatible with any model deployed in Elasticsearch that generates embeddings as output. ### Benefits: 1. Simplifies the process of generating embeddings using Elasticsearch models. 2. Provides a clean and intuitive interface to interact with the Elasticsearch ML client. 3. Allows users to easily integrate Elasticsearch-generated embeddings. Related issue https://github.com/hwchase17/langchain/issues/3400 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-23 14:50:33 -07:00
Jettro Coenradie	b950022894	Fixes issue #5072 - adds additional support to Weaviate (#5085 ) Implementation is similar to search_distance and where_filter # adds 'additional' support to Weaviate queries Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 18:57:10 -07:00
Matt Rickard	de6a401a22	Add OpenLM LLM multi-provider (#4993 ) OpenLM is a zero-dependency OpenAI-compatible LLM provider that can call different inference endpoints directly via HTTP. It implements the OpenAI Completion class so that it can be used as a drop-in replacement for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added code. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 18:09:53 -07:00
Gergely Imreh	69de33e024	Add Mastodon toots loader (#5036 ) # Add Mastodon toots loader. Loader works either with public toots, or Mastodon app credentials. Toot text and user info is loaded. I've also added integration test for this new loader as it works with public data, and a notebook with example output run now. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 16:43:07 -07:00
Donger	039f8f1abb	Add the usage of SSL certificates for Elasticsearch and user password authentication (#5058 ) Enhance the code to support SSL authentication for Elasticsearch when using the VectorStore module, as previous versions did not provide this capability. @dev2049 --------- Co-authored-by: caidong <zhucaidong1992@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 11:51:32 -07:00
Harrison Chase	10ba201d05	Harrison/neo4j (#5078 ) Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 07:31:48 -07:00
Zander Chase	785502edb3	Add 'get_token_ids' method (#4784 ) Let user inspect the token ids in addition to getting th enumber of tokens --------- Co-authored-by: Zach Schillaci <40636930+zachschillaci27@users.noreply.github.com>	2023-05-22 13:17:26 +00:00
Matt Robinson	bf3f554357	feat: batch multiple files in a single Unstructured API request (#4525 ) ### Submit Multiple Files to the Unstructured API Enables batching multiple files into a single Unstructured API requests. Support for requests with multiple files was added to both `UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. Note that if you submit multiple files in "single" mode, the result will be concatenated into a single document. We recommend using this feature in "elements" mode. ### Testing The following should load both documents, using two of the example docs from the integration tests folder. ```python from langchain.document_loaders import UnstructuredAPIFileLoader file_paths = ["examples/layout-parser-paper.pdf", "examples/whatsapp_chat.txt"] loader = UnstructuredAPIFileLoader( file_paths=file_paths, api_key="FAKE_API_KEY", strategy="fast", mode="elements", ) docs = loader.load() ```	2023-05-21 20:48:20 -07:00
Davis Chase	080eb1b3fc	Fix graphql tool (#4984 ) Fix construction and add unit test.	2023-05-19 15:27:50 -07:00
Eugene Yurtsev	0ff59569dc	Adds 'IN' metadata filter for pgvector for checking set presence (#4982 ) # Adds "IN" metadata filter for pgvector to all checking for set presence PGVector currently supports metadata filters of the form: ``` {"filter": {"key": "value"}} ``` which will return documents where the "key" metadata field is equal to "value". This PR adds support for metadata filters of the form: ``` {"filter": {"key": { "IN" : ["list", "of", "values"]}}} ``` Other vector stores support this via an "$in" syntax. I chose to use "IN" to match postgres' syntax, though happy to switch. Tested locally with PGVector and ChatVectorDBChain. @dev2049 --------- Co-authored-by: jade@spanninglabs.com <jade@spanninglabs.com>	2023-05-19 13:53:23 -07:00
Eugene Yurtsev	06e524416c	power bi api wrapper integration tests & bug fix (#4983 ) # Powerbi API wrapper bug fix + integration tests - Bug fix by removing `TYPE_CHECKING` in in utilities/powerbi.py - Added integration test for power bi api in utilities/test_powerbi_api.py - Added integration test for power bi agent in agent/test_powerbi_agent.py - Edited .env.examples to help set up power bi related environment variables - Updated demo notebook with working code in docs../examples/powerbi.ipynb - AzureOpenAI -> ChatOpenAI Notes: Chat models (gpt3.5, gpt4) are much more capable than davinci at writing DAX queries, so that is important to getting the agent to work properly. Interestingly, gpt3.5-turbo needed the examples=DEFAULT_FEWSHOT_EXAMPLES to write consistent DAX queries, so gpt4 seems necessary as the smart llm. Fixes #4325 ## Before submitting Azure-core and Azure-identity are necessary dependencies check integration tests with the following: `pytest tests/integration_tests/utilities/test_powerbi_api.py` `pytest tests/integration_tests/agent/test_powerbi_agent.py` You will need a power bi account with a dataset id + table name in order to test. See .env.examples for details. ## Who can review? @hwchase17 @vowelparrot --------- Co-authored-by: aditya-pethe <adityapethe1@gmail.com>	2023-05-19 11:25:52 -04:00
Davis Chase	55baa0d153	Update redis integration tests (#4937 )	2023-05-18 10:22:17 -07:00
Harrison Chase	c9a362e482	add alias for model (#4553 ) Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-18 09:12:23 -07:00
Harrison Chase	9e2227ba11	Harrison/serper api bug (#4902 ) Co-authored-by: Jerry Luan <xmaswillyou@gmail.com>	2023-05-17 21:40:39 -07:00
Eugene Yurtsev	0dc304ca80	Add html parsers (#4874 ) # Add bs4 html parser * Some minor refactors * Extract the bs4 html parsing code from the bs html loader * Move some tests from integration tests to unit tests	2023-05-17 22:39:11 -04:00
yujiosaka	2f8eb95a91	Remove unnecessary comment (#4845 ) # Remove unnecessary comment Remove unnecessary comment accidentally included in #4800 ## Before submitting - no test - no document ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:	2023-05-17 11:53:03 -04:00
yujiosaka	6561efebb7	Accept uuids kwargs for weaviate (#4800 ) # Accept uuids kwargs for weaviate Fixes #4791	2023-05-16 15:26:46 -07:00
Magnus Friberg	d126276693	Specify which data to return from chromadb (#4393 ) # Improve the Chroma get() method by adding the optional "include" parameter. The Chroma get() method excludes embeddings by default. You can customize the response by specifying the "include" parameter to selectively retrieve the desired data from the collection. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-16 14:43:09 -07:00
Raduan Al-Shedivat	00c6ec8a2d	fix(document_loaders/telegram): fix pandas calls + add tests (#4806 ) # Fix Telegram API loader + add tests. I was testing this integration and it was broken with next error: ```python message_threads = loader._get_message_threads(df) KeyError: False ``` Also, this particular loader didn't have any tests / related group in poetry, so I added those as well. @hwchase17 / @eyurtsev please take a look on this fix PR. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-16 14:35:25 -07:00
了空	f7e3d97b19	Remove unnecessary spaces from document object’s page_content of BiliBiliLoader (#4619 ) - Remove unnecessary spaces from document object’s page_content of BiliBiliLoader - Fix BiliBiliLoader document and test file	2023-05-16 13:13:57 -04:00
Harrison Chase	a7af32c274	Cassandra support for chat history (#4378 ) (#4764 ) # Cassandra support for chat history ### Description - Store chat messages in cassandra ### Dependency - cassandra-driver - Python Module ## Before submitting - Added Integration Test ## Who can review? @hwchase17 @agola11 # Your PR Title (What it does) <!-- Thank you for contributing to LangChain! Your PR will appear in our next release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. --> <!-- Remove if not applicable --> Fixes # (issue) ## Before submitting <!-- If you're adding a new integration, include an integration test and an example notebook showing its use! --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> Co-authored-by: Jinto Jose <129657162+jj701@users.noreply.github.com>	2023-05-15 23:43:09 -07:00
Anirudh Suresh	03ac39368f	Fixing DeepLake Overwrite Flag (#4683 ) # Fix DeepLake Overwrite Flag Issue Fixes Issue #4682: essentially, setting overwrite to False in the DeepLake constructor still triggers an overwrite, because the logic is just checking for the presence of "overwrite" in kwargs. The fix is simple--just add some checks to inspect if "overwrite" in kwargs AND kwargs["overwrite"]==True. Added a new test in tests/integration_tests/vectorstores/test_deeplake.py to reflect the desired behavior. Co-authored-by: Anirudh Suresh <ani@Anirudhs-MBP.cable.rcn.com> Co-authored-by: Anirudh Suresh <ani@Anirudhs-MacBook-Pro.local> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-15 17:39:16 -07:00
whuwxl	3f0357f94a	Add summarization task type for HuggingFace APIs (#4721 ) # Add summarization task type for HuggingFace APIs Add summarization task type for HuggingFace APIs. This task type is described by [HuggingFace inference API](https://huggingface.co/docs/api-inference/detailed_parameters#summarization-task) My project utilizes LangChain to connect multiple LLMs, including various HuggingFace models that support the summarization task. Integrating this task type is highly convenient and beneficial. Fixes #4720	2023-05-15 16:26:17 -07:00
Roma	cb802edf75	[Feature] Add GraphQL Query Tool (#4409 ) # Add GraphQL Query Support This PR introduces a GraphQL API Wrapper tool that allows LLM agents to query GraphQL databases. The tool utilizes the httpx and gql Python packages to interact with GraphQL APIs and provides a simple interface for running queries with LLM agents. @vowelparrot --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-15 14:06:12 -07:00
Lester Yang	cd3f9865f3	Feature: pdfplumber PDF loader with BaseBlobParser (#4552 ) # Feature: pdfplumber PDF loader with BaseBlobParser * Adds pdfplumber as a PDF loader * Adds pdfplumber as a blob parser.	2023-05-15 09:47:02 -04:00
Harrison Chase	b6e3ac17c4	Harrison/sitemap local (#4704 ) Co-authored-by: Lukas Bauer <lukas.bauer@mayflower.de>	2023-05-14 22:04:38 -07:00
Harrison Chase	12b4ee1fc7	Harrison/telegram chat loader (#4698 ) Co-authored-by: Akinwande Komolafe <47945512+Sensei-akin@users.noreply.github.com> Co-authored-by: Akinwande Komolafe <akhinoz@gmail.com>	2023-05-14 22:04:27 -07:00
Leonid Ganeline	e17d0319d5	Add `arxiv` retriever (#4538 )	2023-05-11 22:48:38 -07:00
SimFG	7bcf238a1a	Optimize the initialization method of GPTCache (#4522 ) Optimize the initialization method of GPTCache, so that users can use GPTCache more quickly.	2023-05-11 16:15:23 -07:00
kYLe	0d51a1f12b	Add LLMs support for Anyscale Service (#4350 ) Add Anyscale service integration under LLM Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-11 00:39:59 -07:00
Evan Jones	f668251948	parameterized distance metrics; lint; format; tests (#4375 ) # Parameterize Redis vectorstore index Redis vectorstore allows for three different distance metrics: `L2` (flat L2), `COSINE`, and `IP` (inner product). Currently, the `Redis._create_index` method hard codes the distance metric to COSINE. I've parameterized this as an argument in the `Redis.from_texts` method -- pretty simple. Fixes #4368 ## Before submitting I've added an integration test showing indexes can be instantiated with all three values in the `REDIS_DISTANCE_METRICS` literal. An example notebook seemed overkill here. Normal API documentation would be more appropriate, but no standards are in place for that yet. ## Who can review? Not sure who's responsible for the vectorstore module... Maybe @eyurtsev / @hwchase17 / @agola11 ?	2023-05-11 00:20:01 -07:00
Davis Chase	9ec60ad832	Add azure cognitive search retriever (#4467 ) All credit to @UmerHA, made a couple small changes --------- Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>	2023-05-10 15:27:27 -07:00
Davis Chase	46b100ea63	Add DocArray vector stores (#4483 ) Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made few small changes to get it across the finish line --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai> Co-authored-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>	2023-05-10 15:22:16 -07:00
Harrison Chase	b2f920e891	add tracing v2 env var (#4465 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-05-10 11:08:29 -07:00
Matt Robinson	3637d6da6e	feat: add loader for open office odt files (#4405 ) # ODF File Loader Adds a data loader for handling Open Office ODT files. Requires `unstructured>=0.6.3`. ### Testing The following should work using the `fake.odt` example doc from the [`unstructured` repo](https://github.com/Unstructured-IO/unstructured). ```python from langchain.document_loaders import UnstructuredODTLoader loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements") loader.load() loader = UnstructuredODTLoader(file_path="fake.odt", mode="single") loader.load() ```	2023-05-10 01:37:17 -07:00
Rukmani	2b14036126	Update WhatsAppChatLoader to include the character ~ in the sender name (#4420 ) Fixes #4153 If the sender of a message in a group chat isn't in your contact list, they will appear with a ~ prefix in the exported chat. This PR adds support for parsing such lines.	2023-05-09 15:00:04 -07:00
Aivin V. Solatorio	6335cb5b3a	Add support for Qdrant nested filter (#4354 ) # Add support for Qdrant nested filter This extends the filter functionality for the Qdrant vectorstore. The current filter implementation is limited to a single-level metadata structure; however, Qdrant supports nested metadata filtering. This extends the functionality for users to maximize the filter functionality when using Qdrant as the vectorstore. Reference: https://qdrant.tech/documentation/filtering/#nested-key --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-09 10:34:11 -07:00
Martin Holzhauer	872605a5c5	Add an option to extract more metadata from crawled websites (#4347 ) This pr makes it possible to extract more metadata from websites for later use. my usecase: parsing ld+json or microdata from sites and store it as structured data in the metadata field	2023-05-09 10:18:33 -07:00
Leonid Ganeline	ce15ffae6a	added `Wikipedia` retriever (#4302 ) - added `Wikipedia` retriever. It is effectively a wrapper for `WikipediaAPIWrapper`. It wrapps load() into get_relevant_documents() - sorted `__all__` in the `retrievers/__init__` - added integration tests for the WikipediaRetriever - added an example (as Jupyter notebook) for the WikipediaRetriever	2023-05-09 10:08:39 -07:00
Eugene Yurtsev	2ceb807da2	Add PDF parser implementations (#4356 ) # Add PDF parser implementations This PR separates the data loading from the parsing for a number of existing PDF loaders. Parser tests have been designed to help encourage developers to create a consistent interface for parsing PDFs. This interface can be made more consistent in the future by adding information into the initializer on desired behavior with respect to splitting by page etc. This code is expected to be backwards compatible -- with the exception of a bug fix with pymupdf parser which was returning `bytes` in the page content rather than strings. Also changing the lazy parser method of document loader to return an Iterator rather than Iterable over documents. ## Before submitting <!-- If you're adding a new integration, include an integration test and an example notebook showing its use! --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @ <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoader Abstractions - @eyurtsev LLM/Chat Wrappers - @hwchase17 - @agola11 Tools / Toolkits - @vowelparrot -->	2023-05-09 10:24:17 -04:00
Davis Chase	ba0057c077	Check OpenAI model kwargs (#4366 ) Handle duplicate and incorrectly specified OpenAI params Thanks @PawelFaron for the fix! Made small update Closes #4331 --------- Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com> Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>	2023-05-08 16:37:34 -07:00
Davis Chase	02ebb15c4a	Fix TextSplitter.from_tiktoken(#4361 ) Thanks to @danb27 for the fix! Minor update Fixes https://github.com/hwchase17/langchain/issues/4357 --------- Co-authored-by: Dan Bianchini <42096328+danb27@users.noreply.github.com>	2023-05-08 16:36:38 -07:00
Naveen Tatikonda	782df1db10	OpenSearch: Add Similarity Search with Score (#4089 ) ### Description Add `similarity_search_with_score` method for OpenSearch to return scores along with documents in the search results Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-05-08 16:35:21 -07:00
Jinto Jose	8a338412fa	mongodb support for chat history (#4266 )	2023-05-08 08:34:05 -07:00
Leonid Ganeline	9544b30821	added `Wikipedia` document loader (#4141 ) - Added the `Wikipedia` document loader. It is based on the existing `unilities/WikipediaAPIWrapper` - Added a respective ut-s and example notebook - Sorted list of classes in __init__	2023-05-06 09:32:45 -07:00
Davis Chase	5ca13cc1f0	Dev2049/pypdfium2 (#4209 ) thanks @jerrytigerxu for the addition! --------- Co-authored-by: Jere Xu <jtxu2008@gmail.com> Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>	2023-05-05 17:55:31 -07:00
George	2324f19c85	Update qdrant interface (#3971 ) Hello 1) Passing `embedding_function` as a callable seems to be outdated and the common interface is to pass `Embeddings` instance 2) At the moment `Qdrant.add_texts` is designed to be used with `embeddings.embed_query`, which is 1) slow 2) causes ambiguity due to 1. It should be used with `embeddings.embed_documents` This PR solves both problems and also provides some new tests	2023-05-05 16:46:40 -07:00
Mike Wang	c3044b1bf0	[test] Add integration_test for PandasAgent (#4056 ) - confirm creation - confirm functionality with a simple dimension check. The test now is calling OpenAI API directly, but learning from @vowelparrot that we’re caching the requests, so that it’s not that expensive. I also found we’re calling OpenAI api in other integration tests. Please lmk if there is any concern of real external API calls. I can alternatively make a fake LLM for this test. Thanks	2023-05-05 14:49:02 -07:00
Aivin V. Solatorio	6567b73e1a	JSON loader (#4067 ) This implements a loader of text passages in JSON format. The `jq` syntax is used to define a schema for accessing the relevant contents from the JSON file. This requires dependency on the `jq` package: https://pypi.org/project/jq/. --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-05 14:48:13 -07:00
hp0404	2a3c5f8353	Update WhatsAppChatLoader regex to handle multiple date-time formats (#4186 ) This PR updates the `message_line_regex` used by `WhatsAppChatLoader` to support different date-time formats used in WhatsApp chat exports; resolves #4153. The new regex handles the following input formats: ```terminal [05.05.23, 15:48:11] James: Hi here [11/8/21, 9:41:32 AM] User name: Message 123 1/23/23, 3:19 AM - User 2: Bye! 1/23/23, 3:22_AM - User 1: And let me know if anything changes ``` Tests have been added to verify that the loader works correctly with all formats.	2023-05-05 13:13:05 -07:00
Zander Chase	84cfa76e00	Update Cohere Reranker (#4180 ) The forward ref annotations don't get updated if we only iimport with type checking --------- Co-authored-by: Abhinav Verma <abhinav_win12@yahoo.co.in>	2023-05-05 09:11:37 -07:00
Harrison Chase	d4cf1eb60a	Add firestore memory (#3792 ) (#3941 ) If you have any other suggestions or feedback, please let me know. --------- Co-authored-by: yakigac <10434946+yakigac@users.noreply.github.com>	2023-05-03 22:55:47 -07:00
rogerserper	b1446bea5f	google-serper: async + full json results + support for Google Images, Places and News (#4078 ) * implemented arun, results, and aresults. Reuses aiosession if available. * helper tools GoogleSerperRun and GoogleSerperResults * support for Google Images, Places and News (examples given) and filtering based on time (e.g. past hour) * updated docs	2023-05-03 22:35:48 -07:00
hp0404	374725a715	Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863 ) This PR includes two main changes: - Refactor the `TelegramChatLoader` and `FacebookChatLoader` classes by removing the dependency on pandas and simplifying the message filtering process. - Add test cases for the `TelegramChatLoader` and `FacebookChatLoader` classes. This test ensures that the class correctly loads and processes the example chat data, providing better test coverage for this functionality.	2023-05-03 15:59:19 -07:00
Jon Saginaw	ea64b1716d	Enhancement: option to Get All Tokens with a single Blockchain Document Loader call (#3797 ) The Blockchain Document Loader's default behavior is to return 100 tokens at a time which is the Alchemy API limit. The Document Loader exposes a startToken that can be used for pagination against the API. This enhancement includes an optional get_all_tokens param (default: False) which will: - Iterate over the Alchemy API until it receives all the tokens, and return the tokens in a single call to the loader. - Manage all/most tokenId formats (this can be int, hex16 with zero or all the leading zeros). There aren't constraints as to how smart contracts can represent this value, but these three are most common. Note that a contract with 10,000 tokens will issue 100 calls to the Alchemy API, and could take about a minute, which is why this param will default to False. But I've been using the doc loader with these utilities on the side, so figured it might make sense to build them in for others to use.	2023-05-03 15:46:44 -07:00
Harrison Chase	a5dd73c1a6	Revert "[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense" (#4014 ) Reverts hwchase17/langchain#3840	2023-05-02 18:58:05 -07:00
Harrison Chase	48ea27ba60	Harrison/blockwise sitemap (#3940 ) Co-authored-by: Martin Holzhauer <martin@holzhauer.eu>	2023-05-01 21:34:07 -07:00
Harrison Chase	f04faf8496	Harrison/spreedly (#3937 ) Co-authored-by: Esmit Pérez <esmitperez@users.noreply.github.com>	2023-05-01 20:56:56 -07:00
Zander Chase	c4cb55a0c5	[Breaking] Migrate GPT4All to use PyGPT4All (#3934 ) Seems the pyllamacpp package is no longer the supported bindings from gpt4all. Tested that this works locally. Given that the older models weren't very performant, I think it's better to migrate now without trying to include a lot of try / except blocks --------- Co-authored-by: Nissan Pow <npow@users.noreply.github.com> Co-authored-by: Nissan Pow <pownissa@amazon.com>	2023-05-01 20:42:45 -07:00
Mike Wang	ec21b7126c	[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense (#3840 ) - ActionAgent has a property called, `allowed_tools`, which is declared as `List`. It stores all provided tools which is available to use during agent action. - This collection shouldn’t allow duplicates. The original datatype List doesn’t make sense. Each tool should be unique. Even when there are variants (assuming in the future), it would be named differently in load_tools. Test: - confirm the functionality in an example by initializing an agent with a list of 2 tools and confirm everything works. ```python3 def test_agent_chain_chat_bot(): from langchain.agents import load_tools from langchain.agents import initialize_agent from langchain.agents import AgentType from langchain.chat_models import ChatOpenAI from langchain.llms import OpenAI from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper chat = ChatOpenAI(temperature=0) llm = OpenAI(temperature=0) tools = load_tools(["ddg-search", "llm-math"], llm=llm) agent = initialize_agent(tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True) agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?") test_agent_chain_chat_bot() ``` Result: <img width="863" alt="Screenshot 2023-05-01 at 7 58 11 PM" src="https://user-images.githubusercontent.com/62768671/235572157-0937594c-ddfb-4760-acb2-aea4cacacd89.png">	2023-05-01 20:30:10 -07:00
Davis Chase	e7e29f9937	Dev2049/add modern treasury (#3924 ) Modified Modern Treasury and Strip slightly so credentials don't have to be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury! --------- Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>	2023-05-01 20:28:02 -07:00
Davis Chase	5db6b796cf	Dev2049/hf emb encode kwargs (#3925 ) Thanks @amogkam for the addition! Refactored slightly --------- Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2023-05-01 20:27:41 -07:00
James Brotchie	921894960b	Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575 ) - Add langchain.llms.GooglePalm for text completion, - Add langchain.chat_models.ChatGooglePalm for chat completion, - Add langchain.embeddings.GooglePalmEmbeddings for sentence embeddings, - Add example field to HumanMessage and AIMessage so that users can feed in examples into the PaLM Chat API, - Add system and unit tests. Note async completion for the Text API is not yet supported and will be included in a future PR. Happy for feedback on any aspect of this PR, especially our choice of adding an example field to Human and AI Message objects to enable passing example messages to the API.	2023-05-01 15:23:16 -07:00
Davis Chase	2451310975	Chroma fix mmr (#3897 ) Fixes #3628, thanks @derekmoeller for the issue!	2023-05-01 10:47:15 -07:00
Zander Chase	19912d755e	Vwp/arxiv (#3855 ) Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>	2023-04-30 18:59:22 -07:00
Ankush Gola	d3ec00b566	Callbacks Refactor [base] (#3256 ) Co-authored-by: Nuno Campos <nuno@boringbits.io> Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com> Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-30 11:14:09 -07:00
Mike Wang	ce4fea983b	[simple] added test case and improve self class return type annotation (#3773 ) a simple follow up of https://github.com/hwchase17/langchain/pull/3748 - added test case - improve annotation when function return type is class itself.	2023-04-28 21:54:07 -07:00
Harrison Chase	0c0f14407c	Harrison/tair (#3770 ) Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>	2023-04-28 21:25:33 -07:00
Harrison Chase	be7a8e0824	Harrison/redis cache (#3766 ) Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>	2023-04-28 20:47:18 -07:00
Mike Wang	b588446bf9	[simple][test] Added test case for schema.py (#3692 ) - added unittest for schema.py covering utility functions and token counting. - fixed a nit. based on huggingface doc, the tokenizer model is gpt-2. [link](https://huggingface.co/transformers/v4.8.2/_modules/transformers/models/gpt2/tokenization_gpt2_fast.html) - make lint && make format, passed on local - screenshot of new test running result <img width="1283" alt="Screenshot 2023-04-27 at 9 51 55 PM" src="https://user-images.githubusercontent.com/62768671/235057441-c0ac3406-9541-453f-ba14-3ebb08656114.png">	2023-04-28 20:42:24 -07:00
Jon Saginaw	f8d69e4e52	Enhancement: Blockchain Document Loader with better Metadata support (#3710 ) This PR includes some minor alignment updates, including: - metadata object extended to support contractAddress, blockchainType, and tokenId - notebook doc better aligned to standard langchain format - startToken changed from int to str to support multiple hex value types on the Alchemy API The updated metadata will look like the below. It's possible for a single contractAddress to exist across multiple blockchains (e.g. Ethereum, Polygon, etc.) so it's important to include the blockchainType. ``` metadata = {"source": self.contract_address, "blockchain": self.blockchainType, "tokenId": tokenId} ```	2023-04-28 20:13:05 -07:00
Davis Chase	220a7076ac	Add Mathpix pdf loader (#3727 ) Inspo https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-28 20:11:22 -07:00
Harrison Chase	40f6e60e68	Harrison/stripe (#3762 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	2023-04-28 20:03:21 -07:00
plutopulp	6d6fd1b9e1	Add PipelineAI LLM integration (#3644 ) Add PipelineAI LLM integration	2023-04-27 08:22:26 -07:00
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	2023-04-27 08:14:36 -07:00
Harrison Chase	ab749fa1bb	Harrison/opensearch logic (#3631 ) Co-authored-by: engineer-matsuo <95115586+engineer-matsuo@users.noreply.github.com>	2023-04-26 22:08:03 -07:00
Ehsan M. Kermani	4a246e2fd6	Allow clearing cache and fix gptcache (#3493 ) This PR * Adds `clear` method for `BaseCache` and implements it for various caches * Adds the default `init_func=None` and fixes gptcache integtest * Since right now integtest is not running in CI, I've verified the changes by running `docs/modules/models/llms/examples/llm_caching.ipynb` (until proper e2e integtest is done in CI)	2023-04-26 22:03:50 -07:00
cs0lar	440c98e24b	Fix/issue 2695 (#3608 ) ## Background fixes #2695 ## Changes The `add_text` method uses the internal embedding function if one was passes to the `Weaviate` constructor. NOTE: the latest merge on the `Weaviate` class made the specification of a `weaviate_api_key` mandatory which might not be desirable for all users and connection methods (for example weaviate also support Embedded Weaviate which I am happy to add support to here if people think it's desirable). I wrapped the fetching of the api key into a try catch in order to allow the `weaviate_api_key` to be unspecified. Do let me know if this is unsatisfactory. ## Test Plan added test for `add_texts` method.	2023-04-26 21:45:03 -07:00
leo-gan	36c59e0c25	`Arxiv` document loader (#3627 ) It makes sense to use `arxiv` as another source of the documents for downloading. - Added the `arxiv` document_loader, based on the `utilities/arxiv.py:ArxivAPIWrapper` - added tests - added an example notebook - sorted `__all__` in `__init__.py` (otherwise it is hard to find a class in the very long list)	2023-04-26 21:04:56 -07:00
Zander Chase	443a893ffd	Align names of search tools (#3620 ) Tools for Bing, DDG and Google weren't consistent even though the underlying implementations were. All three services now have the same tools and implementations to easily switch and experiment when building chains.	2023-04-26 16:21:34 -07:00
Maciej Bryński	aa345a4bb7	Add get_text_separator parameter to BSHTMLLoader (#3551 ) By default get_text doesn't separate content of different HTML tag. Adding option for specifying separator helps with document splitting.	2023-04-26 16:10:16 -07:00
Davis Chase	d18b0caf0e	Add Anthropic default request timeout (#3540 ) thanks @hitflame! --------- Co-authored-by: Wenqiang Zhao <hitzhaowenqiang@sina.com> Co-authored-by: delta@com <delta@com>	2023-04-25 11:40:41 -07:00
yakigac	f338d6251c	Add a test for cosmos db memory (#3525 ) Test for #3434 @eavanvalkenburg Initially, I was unaware and had submitted a pull request #3450 for the same purpose, but I have now repurposed the one I used for that. And it worked.	2023-04-25 08:10:02 -07:00
Harrison Chase	0fc0aa62f2	Harrison/blockchain docloader (#3491 ) Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>	2023-04-25 08:07:06 -07:00
Harrison Chase	707741de58	Harrison/prediction guard (#3490 ) Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>	2023-04-24 22:27:22 -07:00
Harrison Chase	7257f9e015	Harrison/tfidf parameters (#3481 ) Co-authored-by: pao <go5kuramubon@gmail.com> Co-authored-by: KyoHattori <kyo.hattori@abejainc.com>	2023-04-24 22:19:58 -07:00
Harrison Chase	eda69b13f3	openai embeddings (#3488 )	2023-04-24 22:19:47 -07:00
Harrison Chase	408a0183cd	Harrison/weaviate (#3494 ) Co-authored-by: Nick Rubell <nick@rubell.com>	2023-04-24 22:15:32 -07:00
Zander Chase	416f3bdf11	Vwp/alpaca streaming (#3468 ) Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>	2023-04-24 16:27:51 -07:00
cs0lar	3033c6b964	fixes #1214 (#3003 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search_by_vector` method. ### Changes - a `max_marginal_relevance_search_by_vector` method implementation has been added in `weaviate.py` - tests have been added to the the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search_by_vector` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-24 11:50:55 -07:00
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-23 21:23:54 -07:00
Zander Chase	20f530e9c5	Add Sentence Transformers Embeddings (#3409 ) Add embeddings based on the sentence transformers library. Add a notebook and integration tests. Co-authored-by: khimaros <me@khimaros.com>	2023-04-23 18:25:20 -07:00
Luke Harris	b4de839ed8	Several confluence loader improvements (#3300 ) This PR addresses several improvements: - Previously it was not possible to load spaces of more than 100 pages. The `limit` was being used both as an overall page limit and as a per request pagination limit. This, in combination with the fact that atlassian seem to use a server-side hard limit of 100 when page content is expanded, meant it wasn't possible to download >100 pages. Now `limit` is used only as a per-request pagination limit and `max_pages` is introduced as the way to limit the total number of pages returned by the paginator. - Document metadata now includes `source` (the source url), making it compatible with `RetrievalQAWithSourcesChain`. - It is now possible to include inline and footer comments. - It is now possible to pass `verify_ssl=False` and other parameters to the confluence object for use cases that require it.	2023-04-23 15:06:10 -07:00
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-04-22 09:17:38 -07:00
Filip Haltmayer	215dcc2d26	Refactor Milvus/Zilliz (#3047 ) Refactoring milvus/zilliz to clean up and have a more consistent experience. Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>	2023-04-22 08:26:19 -07:00
Richy Wang	88a8f59aa7	Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135 ) Hi there！ I'm excited to open this PR to add support for using a fully Postgres syntax compatible database 'AnalyticDB' as a vector. As AnalyticDB has been proved can be used with AutoGPT, ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for you. AnalyticDB is a distributed Alibaba Cloud-Native vector database. It works better when data comes to large scale. The PR includes: - [x] A new memory: AnalyticDBVector - [x] A suite of integration tests verifies the AnalyticDB integration I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test	2023-04-22 08:25:41 -07:00
Zander Chase	05a8aa5447	Fix linting on master (#3327 )	2023-04-21 15:49:46 -07:00
Varun Srinivas	d2f922f525	Change in method name for creating an issue on JIRA (#3307 ) The awesome JIRA tool created by @zywilliamli calls the `create_issue()` method to create issues, however, the actual method is `issue_create()`. Details in the Documentation here: https://atlassian-python-api.readthedocs.io/jira.html#manage-issues	2023-04-21 13:01:33 -07:00
Paul Garner	aa9d5707e0	Add PythonLoader which auto-detects encoding of Python files (#3311 ) This PR contributes a `PythonLoader`, which inherits from `TextLoader` but detects and sets the encoding automatically.	2023-04-21 10:47:57 -07:00
Davis Chase	2fd24d31a4	Cleanup integration test dir (#3308 )	2023-04-21 09:44:09 -07:00
Naveen Tatikonda	bb6c459f7a	OpenSearch: Add Support for Lucene Filter (#3201 ) ### Description Add Support for Lucene Filter. When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. This filter is supported only for approximate search with the indexes that are created using `lucene` engine. OpenSearch Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-20 20:42:53 -07:00
Davis Chase	46542dc774	Contextual compression retriever (#2915 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-20 17:01:14 -07:00
Gabriel Altay	34fb56b633	fix copy/pasta typos wikipedia->arxiv (#3222 ) just updates a few module level docstrings from Wikipedia -> Arxiv	2023-04-20 07:15:41 -07:00
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 21:32:26 -07:00
Harrison Chase	9181cd9b22	Harrison/playwright selector (#3185 ) Co-authored-by: zhyuri <4649294+zhyuri@users.noreply.github.com>	2023-04-19 16:54:15 -07:00
Harrison Chase	68cd37175e	Harrison/arxiv tool (#3186 ) Co-authored-by: leo-gan <leo.gan.57@gmail.com>	2023-04-19 16:53:34 -07:00
Naveen Tatikonda	3453b7457c	OpenSearch: Add Support for Boolean Filter with ANN search (#3038 ) ### Description Add Support for Boolean Filter with ANN search Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#boolean-filter-with-ann-search ### Issues Resolved https://github.com/hwchase17/langchain/issues/2924 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-17 20:26:26 -07:00
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	2023-04-17 20:23:45 -07:00
Jan Backes	a9310a3e8b	Add Annoy as VectorStore (#2939 ) Adds Annoy (https://github.com/spotify/annoy) as vector Store. RESOLVES hwchase17/langchain#2842 discord ref: https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-16 13:44:04 -07:00
cs0lar	8b9e02da9d	Fix/issue 1213 (#2932 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search` method. ### Changes - a `max_marginal_relevance_search` method implementation has been added in `weaviate.py` - tests have been added to the the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-16 13:11:30 -07:00
vowelparrot	4ffc58e07b	Add similarity_search_with_normalized_similarities (#2916 ) Add a method that exposes a similarity search with corresponding normalized similarity scores. Implement only for FAISS now. ### Motivation: Some memory definitions combine `relevance` with other scores, like recency , importance, etc. While many (but not all) of the `VectorStore`'s expose a `similarity_search_with_score` method, they don't all interpret the units of that score (depends on the distance metric and whether or not the the embeddings are normalized). This PR proposes a `similarity_search_with_normalized_similarities` method that lets consumers of the vector store not have to worry about the metric and embedding scale. Most providers default to euclidean distance, with Pinecone being one exception (defaults to cosine _similarity_). --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-15 21:06:08 -07:00
Davit Buniatyan	b3a5b51728	[minor] Deep Lake auth improvements in docs, kwargs pass, faster tests (#2927 ) Minor cosmetic changes - Activeloop environment cred authentication in notebooks with `getpass.getpass` (instead of CLI which not always works) - much faster tests with Deep Lake pytest mode on - Deep Lake kwargs pass Notes - I put pytest environment creds inside `vectorstores/conftest.py`, but feel free to suggest a better location. For context, if I put in `test_deeplake.py`, `ruff` doesn't let me to set them before import deeplake --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-15 10:49:16 -07:00
Ankush Gola	ec59e9d886	Fix ChatAnthropic stop_sequences error (#2919 ) (#2920 ) Note to self: Always run integration tests, even on "that last minute change you thought would be safe" :) --------- Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>	2023-04-14 17:22:01 -07:00
Mike Lambert	392f1b3218	Add Anthropic ChatModel to langchain (#2293 ) * Adds an Anthropic ChatModel * Factors out common code in our LLMModel and ChatModel * Supports streaming llm-tokens to the callbacks on a delta basis (until a future V2 API does that for us) * Some fixes	2023-04-14 15:09:07 -07:00
Harrison Chase	1e9378d0a8	Harrison/weaviate fixes (#2872 ) Co-authored-by: cs0lar <cristiano.solarino@gmail.com> Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>	2023-04-13 22:37:34 -07:00
sergerdn	04c458a270	feat: improve pinecone tests (#2806 ) Improve the integration tests for Pinecone by adding an `.env.example` file for local testing. Additionally, add some dev dependencies specifically for integration tests. This change also helps me understand how Pinecone deals with certain things, see related issues https://github.com/hwchase17/langchain/issues/2484 https://github.com/hwchase17/langchain/issues/2816	2023-04-13 21:49:31 -07:00
vowelparrot	bf0887c486	Add Slack Directory Loader (#2841 ) Fixes linting issue from #2835 Adds a loader for Slack Exports which can be a very valuable source of knowledge to use for internal QA bots and other use cases. ```py # Export data from your Slack Workspace first. from langchain.document_loaders import SLackDirectoryLoader SLACK_WORKSPACE_URL = "https://awesome.slack.com" loader = ("Slack_Exports", SLACK_WORKSPACE_URL) docs = loader.load() ```	2023-04-13 21:31:59 -07:00
Tim Asp	be4fb24b32	OpenAI LLM: update `modelname_to_contextsize` with new models (#2843 ) Token counts pulled from https://openai.com/pricing	2023-04-13 11:13:34 -07:00
Harrison Chase	e49f1e628c	Harrison/gpt cache (#2744 ) Co-authored-by: SimFG <bang.fu@zilliz.com>	2023-04-12 14:16:58 -07:00
Johnny Lee	0ab364404e	add continue to fix 'continue_on_failure' parameter for URL doc loader (#2735 ) Currently, the function still fails if `continue_on_failure` is set to True, because `elements` is not set. --------- Co-authored-by: leecjohnny <johnny-lee1255@users.noreply.github.com>	2023-04-11 21:12:39 -07:00
sergerdn	4bdcedab54	fix: some imports for integration tests (#2612 ) Add more missed imports for integration tests. Bump `pytest` to the current latest version. Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to update its cassette(easy fix). Related PR: https://github.com/hwchase17/langchain/pull/2560	2023-04-11 20:45:36 -07:00
Ankush Gola	c1521ddbdb	Add workaround for not having async vector store methods (#2733 ) This allows us to use the async API for the Retrieval chains, though it is not guaranteed to be thread safe.	2023-04-11 18:49:08 -07:00
vowelparrot	709f26b69e	Added bilibili loader (#2673 ) (#2724 ) I've added a bilibili loader, bilibili is a very active video site in China and I think we need this loader. Example: ```python from langchain.document_loaders.bilibili import BiliBiliLoader loader = BiliBiliLoader( ["https://www.bilibili.com/video/BV1xt411o7Xu/", "https://www.bilibili.com/video/av330407025/"] ) docs = loader.load() ``` Co-authored-by: 了空 <568250549@qq.com>	2023-04-11 10:40:32 -07:00
Naveen Tatikonda	4364d3316e	Add custom vector fields and text fields for OpenSearch (#2652 ) Description Add custom vector field name and text field name while indexing and querying for OpenSearch Issues https://github.com/hwchase17/langchain/issues/2500 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-10 21:02:02 -07:00
Chetanya Rastogi	50c511d75f	Add new loader to load pdf as html content (#2607 ) Adds a new pdf loader using the existing dependency on PDFMiner. The new loader can be helpful for chunking texts semantically into sections as the output html content can be parsed via `BeautifulSoup` to get more structured and rich information about font size, page numbers, pdf headers/footers, etc. which may not be available otherwise with other pdf loaders	2023-04-09 17:57:25 -07:00
sergerdn	6dc86ad48f	feat: add pytest-vcr for recording HTTP interactions in integration tests (#2445 ) Using `pytest-vcr` in integration tests has several benefits. Firstly, it removes the need to mock external services, as VCR records and replays HTTP interactions on the fly. Secondly, it simplifies the integration test setup by eliminating the need to set up and tear down external services in some cases. Finally, it allows for more reliable and deterministic integration tests by ensuring that HTTP interactions are always replayed with the same response. Overall, `pytest-vcr` is a valuable tool for simplifying integration test setup and improving their reliability This commit adds the `pytest-vcr` package as a dependency for integration tests in the `pyproject.toml` file. It also introduces two new fixtures in `tests/integration_tests/conftest.py` files for managing cassette directories and VCR configurations. In addition, the `tests/integration_tests/vectorstores/test_elasticsearch.py` file has been updated to use the `@pytest.mark.vcr` decorator for recording and replaying HTTP interactions. Finally, this commit removes the `documents` fixture from the `test_elasticsearch.py` file and replaces it with a new fixture defined in `tests/integration_tests/vectorstores/conftest.py` that yields a list of documents to use in any other tests. This also includes my second attempt to fix issue : https://github.com/hwchase17/langchain/issues/2386 Maybe related https://github.com/hwchase17/langchain/issues/2484	2023-04-07 07:28:57 -07:00
Alex Rad	bd780a8223	Add support for rwkv (#2422 ) This adds support for running RWKV with pytorch. https://github.com/hwchase17/langchain/issues/2398 This does not yet support rwkv.cpp	2023-04-06 14:41:06 -07:00
Davit Buniatyan	b4914888a7	Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455 ) ### Features include - Metadata based embedding search - Choice of distance metric function (`L2` for Euclidean, `L1` for Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot' for dot product. Defaults to `L2` - Returning scores - Max Marginal Relevance Search - Deleting samples from the dataset ### Notes - Added numerous tests, let me know if you would like to shorten them or make smarter --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-06 12:47:33 -07:00
leo-gan	fd69cc7e42	Removed duplicate BaseModel dependencies (#2471 ) Removed duplicate BaseModel dependencies in class inheritances. Also, sorted imports by `isort`.	2023-04-06 12:45:16 -07:00
sergerdn	b410dc76aa	fix: elasticsearch (#2402 ) - Create a new docker-compose file to start an Elasticsearch instance for integration tests. - Add new tests to `test_elasticsearch.py` to verify Elasticsearch functionality. - Include an optional group `test_integration` in the `pyproject.toml` file. This group should contain dependencies for integration tests and can be installed using the command `poetry install --with test_integration`. Any new dependencies should be added by running `poetry add some_new_deps --group "test_integration" ` Note: New tests running in live mode, which involve end-to-end testing of the OpenAI API. In the future, adding `pytest-vcr` to record and replay all API requests would be a nice feature for testing process.More info: https://pytest-vcr.readthedocs.io/en/latest/ Fixes https://github.com/hwchase17/langchain/issues/2386	2023-04-05 06:51:32 -07:00
Harrison Chase	0a9f04bad9	Harrison/gpt4all (#2366 ) Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-04-04 06:49:17 -07:00
Harrison Chase	e90d007db3	Harrison/msg files (#2375 ) Co-authored-by: Sahil Masand <masand.sahil@gmail.com> Co-authored-by: Sahil Masand <masands@cbh.com.au>	2023-04-04 06:48:34 -07:00
Kacper Łukawski	585f60a5aa	Qdrant update to 1.1.1 & docs polishing (#2388 ) This PR updates Qdrant to 1.1.1 and introduces local mode, so there is no need to spin up the Qdrant server. By that occasion, the Qdrant example notebooks also got updated, covering more cases and answering some commonly asked questions. All the Qdrant's integration tests were switched to local mode, so no Docker container is required to launch them.	2023-04-04 06:48:21 -07:00
Harrison Chase	d85f57ef9c	Harrison/llama (#2314 ) Co-authored-by: RJ Adriaansen <adriaansen@eshcc.eur.nl>	2023-04-02 14:57:45 -07:00
akmhmgc	94b2f536f3	Modify output for wikipedia api wrapper (#2287 ) ## Description Thanks for the quick maintenance for great repository!! I modified wikipedia api wrapper ## Details - Add output for missing search results - Add tests	2023-04-02 14:00:27 -07:00
Sam Cordner-Matthews	1ddd6dbf0b	Add ability to pass kwargs to loader classes in `DirectoryLoader`, add ability to modify encoding and BeautifulSoup behaviour in `BSHTMLLoader` (#2275 ) Solves #2247. Noted that the only test I added checks for the BeautifulSoup behaviour change. Happy to add a test for `DirectoryLoader` if deemed necessary.	2023-04-01 12:48:27 -07:00
Harrison Chase	b35260ed47	Harrison/memory base (#2122 ) @3coins + @zoltan-fedor.... heres the pr + some minor changes i made. thoguhts? can try to get it into tmrws release --------- Co-authored-by: Zoltan Fedor <zoltan.0.fedor@gmail.com> Co-authored-by: Piyush Jain <piyushjain@duck.com>	2023-03-29 10:10:09 -07:00
Ankush Gola	ccee1aedd2	add async support for anthropic (#2114 ) should not be merged in before https://github.com/anthropics/anthropic-sdk-python/pull/11 gets released	2023-03-28 22:49:14 -04:00
Harrison Chase	3e879b47c1	Harrison/gitbook (#2044 ) Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>	2023-03-28 15:28:33 -07:00
Honkware	aff33d52c5	Add OpenWeatherMap API Tool (#2083 ) Added tool for OpenWeatherMap API	2023-03-28 12:02:14 -07:00
Charlie Holtz	f16c1fb6df	Add replicate take 2 (#2077 ) This PR adds a replicate integration to langchain. It's an updated version of https://github.com/hwchase17/langchain/pull/1993, but with updates to match latest replicate-python code. https://github.com/replicate/replicate-python. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Zeke Sikelianos <zeke@sikelianos.com>	2023-03-28 11:56:57 -07:00
Harrison Chase	f281033362	rm pandas dependency (#2102 )	2023-03-28 08:38:19 -07:00
Harrison Chase	410bf37fb8	Harrison/big query (#2100 ) Co-authored-by: lu-cashmoney <lucas.corley@gmail.com>	2023-03-28 08:17:22 -07:00
Harrison Chase	eff5eed719	Harrison/jina (#2043 ) Co-authored-by: numb3r3 <wangfelix87@gmail.com> Co-authored-by: felix-wang <35718120+numb3r3@users.noreply.github.com>	2023-03-28 08:16:17 -07:00

1 2 3 4 5 ...

350 Commits