langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Stanko Kuveljic	9d13dcd17c	Pinecone: Add V4 support (#7473 )	2023-07-10 08:39:47 -07:00
Adilkhan Sarsen	5debd5043e	Added deeplake use case examples of the new features (#6528 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 --> 1. Added use cases of the new features 2. Done some code refactoring --------- Co-authored-by: Ivo Stranic <istranic@gmail.com>	2023-07-10 07:04:29 -07:00
Yifei Song	7d29bb2c02	Add Xorbits Dataframe as a Document Loader (#7319 ) - [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source computing framework that makes it easy to scale data science and machine learning workloads in parallel. Xorbits can leverage multi cores or GPUs to accelerate computation on a single machine, or scale out up to thousands of machines to support processing terabytes of data. - This PR added support for the Xorbits document loader, which allows langchain to leverage Xorbits to parallelize and distribute the loading of data. - Dependencies: This change requires the Xorbits library to be installed in order to be used. `pip install xorbits` - Request for review: @rlancemartin, @eyurtsev - Twitter handle: https://twitter.com/Xorbitsio Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-10 04:24:47 -04:00
Sergio Moreno	21a353e9c2	feat: ctransformers support async chain (#6859 ) - Description: Adding async method for CTransformers - Issue: I've found impossible without this code to run Websockets inside a FastAPI micro service and a CTransformers model. - Tag maintainer: Not necessary yet, I don't like to mention directly - Twitter handle: @_semoal	2023-07-10 04:23:41 -04:00
Paul-Emile Brotons	d2cf0d16b3	adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310 ) Adding a maximal_marginal_relevance method to the MongoDBAtlasVectorSearch vectorstore enhances the user experience by providing more diverse search results Issue: #7304	2023-07-10 04:04:19 -04:00
Matt Robinson	bcab894f4e	feat: Add `UnstructuredTSVLoader` (#7367 ) ### Summary Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc strings for `UnstructuredCSV` and `UnstructuredExcel` loaders. ### Testing ```python from langchain.document_loaders.tsv import UnstructuredTSVLoader loader = UnstructuredTSVLoader( file_path="example_data/mlb_teams_2012.csv", mode="elements" ) docs = loader.load() ```	2023-07-10 03:07:10 -04:00
Jona Sassenhagen	7ffc431b3a	Add spacy sentencizer (#7442 ) `SpacyTextSplitter` currently uses spacy's statistics-based `en_core_web_sm` model for sentence splitting. This is a good splitter, but it's also pretty slow, and in this case it's doing a lot of work that's not needed given that the spacy parse is then just thrown away. However, there is also a simple rules-based spacy sentencizer. Using this is at least an order of magnitude faster than using `en_core_web_sm` according to my local tests. Also, spacy sentence tokenization based on `en_core_web_sm` can be sped up in this case by not doing the NER stage. This shaves some cycles too, both when loading the model and when parsing the text. Consequently, this PR adds the option to use the basic spacy sentencizer, and it disables the NER stage for the current approach, which is kept as the default. Lastly, when extracting the tokenized sentences, the `text` attribute is called directly instead of doing the string conversion, which is IMO a bit more idiomatic.	2023-07-10 02:52:05 -04:00
Daniel Chalef	c7f7788d0b	Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444 ) Hey @hwchase17 - This PR adds a `ZepMemory` class, improves handling of Zep's message metadata, and makes it easier for folks building custom chains to persist metadata alongside their chat history. We've had plenty confused users unfamiliar with ChatMessageHistory classes and how to wrap the `ZepChatMessageHistory` in a `ConversationBufferMemory`. So we've created the `ZepMemory` class as a light wrapper for `ZepChatMessageHistory`. Details: - add ZepMemory, modify notebook to demo use of ZepMemory - Modify summary to be SystemMessage - add metadata argument to add_message; add Zep metadata to Message.additional_kwargs - support passing in metadata	2023-07-10 01:53:49 -04:00
Delgermurun	a1603fccfb	integrate JinaChat (#6927 ) Integration with https://chat.jina.ai/api. It is OpenAI compatible API. - Twitter handle: [https://twitter.com/JinaAI_](https://twitter.com/JinaAI_) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-07-08 02:17:04 -04:00
William FH	612a74eb7e	Make Ref Example Threadsafe (#7383 ) Have noticed transient ref example misalignment. I believe this is caused by the logic of assigning an example within the thread executor rather than before.	2023-07-07 21:50:42 -07:00
William FH	4789c99bc2	Add String Distance and Embedding Evaluators (#7123 ) Add a string evaluator and pairwise string evaluator implementation for: - Embedding distance - String distance Update docs	2023-07-07 21:44:31 -07:00
William FH	c5edbea34a	Load Run Evaluator (#7101 ) Current problems: 1. Evaluating LLMs or Chat models isn't smooth. Even specifying 'generations' as the output inserts a redundant list into the eval template 2. Configuring input / prediction / reference keys in the `get_qa_evaluator` function is confusing. Unless you are using a chain with the default keys, you have to specify all the variables and need to reason about whether the key corresponds to the traced run's inputs, outputs or the examples inputs or outputs. Proposal: - Configure the run evaluator according to a model. Use the model type and input/output keys to assert compatibility where possible. Only need to specify a reference_key for certain evaluators (which is less confusing than specifying input keys) When does this work: - If you have your langchain model available (assumed always for run_on_dataset flow) - If you are evaluating an LLM, Chat model, or chain - If the LLM or chat models are traced by langchain (wouldn't work if you add an incompatible schema via the REST API) When would this fail: - Currently if you directly create an example from an LLM run, the outputs are generations with all the extra metadata present. A simple `example_key` and dumping all to the template could make the evaluations unreliable - Doesn't help if you're not using the low level API - If you want to instantiate the evaluator without instantiating your chain or LLM (maybe common for monitoring, for instance) -> could also load from run or run type though What's ugly: - Personally think it's better to load evaluators one by one since passing a config down is pretty confusing. - Lots of testing needs to be added - Inconsistent in that it makes a separate run and example input mapper instead of the original `RunEvaluatorInputMapper`, which maps a run and example to a single input. Example usage running the for an LLM, Chat Model, and Agent. ``` # Test running for the string evaluators evaluator_names = ["qa", "criteria"] model = ChatOpenAI() configured_evaluators = load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer") run_on_dataset(ds_name, model, run_evaluators=configured_evaluators) ``` <details> <summary>Full code with dataset upload</summary> ``` ## Create dataset from langchain.evaluation.run_evaluators.loading import load_run_evaluators_for_model from langchain.evaluation import load_dataset import pandas as pd lcds = load_dataset("llm-math") df = pd.DataFrame(lcds) from uuid import uuid4 from langsmith import Client client = Client() ds_name = "llm-math - " + str(uuid4())[0:8] ds = client.upload_dataframe(df, name=ds_name, input_keys=["question"], output_keys=["answer"]) ## Define the models we'll test over from langchain.llms import OpenAI from langchain.chat_models import ChatOpenAI from langchain.agents import initialize_agent, AgentType from langchain.tools import tool llm = OpenAI(temperature=0) chat_model = ChatOpenAI(temperature=0) @tool def sum(a: float, b: float) -> float: """Add two numbers""" return a + b def construct_agent(): return initialize_agent( llm=chat_model, tools=[sum], agent=AgentType.OPENAI_MULTI_FUNCTIONS, ) agent = construct_agent() # Test running for the string evaluators evaluator_names = ["qa", "criteria"] models = [llm, chat_model, agent] run_evaluators = [] for model in models: run_evaluators.append(load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer")) # Run on LLM, Chat Model, and Agent from langchain.client.runner_utils import run_on_dataset to_test = [llm, chat_model, construct_agent] for model, configured_evaluators in zip(to_test, run_evaluators): run_on_dataset(ds_name, model, run_evaluators=configured_evaluators, verbose=True) ``` </details> --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-07-07 19:57:59 -07:00
Bagatur	4d427b2397	Base language model docstrings (#7104 )	2023-07-07 16:09:10 -04:00
William FH	4e180dc54e	Unset Cache in Tests (#7362 ) This is impacting other unit tests that use callbacks since the cache is still set (just empty)	2023-07-07 11:05:09 -07:00
German Martin	3ce4e46c8c	The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015 ) Continuing with Tolkien inspired series of langchain tools. I bring to you: The Fellowship of the Vectors, AKA EmbeddingsClusteringFilter. This document filter uses embeddings to group vectors together into clusters, then allows you to pick an arbitrary number of documents vector based on proximity to the cluster centers. That's a representative sample of the cluster. The original idea is from [Greg Kamradt](https://github.com/gkamradt) from this video (Level4): https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s I added few tricks to make it a bit more versatile, so you can parametrize what to do with duplicate documents in case of cluster overlap: replace the duplicates with the next closest document or remove it. This allow you to use it as an special kind of redundant filter too. Additionally you can choose 2 diff orders: grouped by cluster or respecting the original retriever scores. In my use case I was using the docs grouped by cluster to run refine chains per cluster to generate summarization over a large corpus of documents. Let me know if you want to change anything! @rlancemartin, @eyurtsev, @hwchase17, --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-07-07 10:28:17 -07:00
Bagatur	927c8eb91a	Refac package version check (#7312 )	2023-07-07 01:21:53 -04:00
Jason B. Koh	d642609a23	Fix: Recognize `List` at `from_function` (#7178 ) - Description: pydantic's `ModelField.type_` only exposes the native data type but not complex type hints like `List`. Thus, generating a Tool with `from_function` through function signature produces incorrect argument schemas (e.g., `str` instead of `List[str]`) - Issue: N/A - Dependencies: N/A - Tag maintainer: @hinthornw - Twitter handle: `mapped` All the unittest (with an additional one in this PR) passed, though I didn't try integration tests...	2023-07-06 17:22:09 -04:00
William FH	e736d60516	Load Evaluator (#6942 ) Create a `load_evaluators()` function so you don't have to import all the individual evaluator classes	2023-07-06 13:58:58 -07:00
Jan Kubica	fed64ae060	Chroma: add vector search with scores (#6864 ) - Description: Adding to Chroma integration the option to run a similarity search by a vector with relevance scores. Fixing two minor typos. - Issue: The "lambda_mult" typo is related to #4861 - Maintainer: @rlancemartin, @eyurtsev	2023-07-06 10:01:55 -04:00
William FH	576880abc5	Re-use Trajectory Evaluator (#7248 ) Use the trajectory eval chain in the run evaluation implementation and update the prepare inputs method to apply to both asynca nd sync	2023-07-06 07:00:24 -07:00
William FH	ec66d5188c	Add Better Errors for Comparison Chain (#7033 ) + change to ABC - this lets us add things like the evaluation name for loading	2023-07-06 06:37:04 -07:00
Sasmitha Manathunga	0c7a5cb206	Fix inconsistent behavior of `CharacterTextSplitter` when changing `keep_separator` (#7263 ) - Description: - When `keep_separator` is `True` the `_split_text_with_regex()` method in `text_splitter` uses regex to split, but when `keep_separator` is `False` it uses `str.split()`. This causes problems when the separator is a special regex character like `.` or `*`. This PR fixes that by using `re.split()` in both cases. - Issue: #7262 - Tag maintainer: @baskaryan	2023-07-06 09:30:03 -04:00
Mike Nitsenko	d669b9ece9	Document loader for Cube Semantic Layer (#6882 ) ### Description This pull request introduces the "Cube Semantic Layer" document loader, which demonstrates the retrieval of Cube's data model metadata in a format suitable for passing to LLMs as embeddings. This enhancement aims to provide contextual information and improve the understanding of data. Twitter handle: @the_cube_dev --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-07-05 15:18:12 -07:00
Tom	e533da8bf2	Adding Marqo to vectorstore ecosystem (#7068 ) This PR brings in a vectorstore interface for [Marqo](https://www.marqo.ai/). The Marqo vectorstore exposes some of Marqo's functionality in addition the the VectorStore base class. The Marqo vectorstore also makes the embedding parameter optional because inference for embeddings is an inherent part of Marqo. Docs, notebook examples and integration tests included. Related PR: https://github.com/hwchase17/langchain/pull/2807 --------- Co-authored-by: Tom Hamer <tom@marqo.ai> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-07-05 14:44:12 -07:00
Mike Salvatore	265f05b10e	Enable InMemoryDocstore to be constructed without providing a dict (#6976 ) - Description: Allow `InMemoryDocstore` to be created without passing a dict to the constructor; the constructor can create a dict at runtime if one isn't provided. - Tag maintainer: @dev2049	2023-07-05 16:56:31 -04:00
Harrison Chase	6711854e30	Harrison/dataforseo (#7214 ) Co-authored-by: Alexander <sune357@gmail.com>	2023-07-05 16:02:02 -04:00
Richy Wang	cab7d86f23	Implement delete interface of vector store on AnalyticDB (#7170 ) Hi, there This pull request contains two commit: 1. Implement delete interface with optional ids parameter on AnalyticDB. 2. Allow customization of database connection behavior by exposing engine_args parameter in interfaces. - This commit adds the `engine_args` parameter to the interfaces, allowing users to customize the behavior of the database connection. The `engine_args` parameter accepts a dictionary of additional arguments that will be passed to the create_engine function. Users can now modify various aspects of the database connection, such as connection pool size and recycle time. This enhancement provides more flexibility and control to users when interacting with the database through the exposed interfaces. This commit is related to VectorStores @rlancemartin @eyurtsev Thank you for your attention and consideration.	2023-07-05 13:01:00 -07:00
Jamal	a2f191a322	Replace JIRA Arbitrary Code Execution vulnerability with finer grain API wrapper (#6992 ) This fixes #4833 and the critical vulnerability https://nvd.nist.gov/vuln/detail/CVE-2023-34540 Previously, the JIRA API Wrapper had a mode that simply pipelined user input into an `exec()` function. [The intended use of the 'other' mode is to cover any of Atlassian's API that don't have an existing interface](`cc33bde74f/langchain/tools/jira/prompt.py (L24)`) Fortunately all of the [Atlassian JIRA API methods are subfunctions of their `Jira` class](https://atlassian-python-api.readthedocs.io/jira.html), so this implementation calls these subfunctions directly. As well as passing a string representation of the function to call, the implementation flexibly allows for optionally passing args and/or keyword-args. These are given as part of the dictionary input. Example: ``` { "function": "update_issue_field", #function to execute "args": [ #list of ordered args similar to other examples in this JiraAPIWrapper "key", {"summary": "New summary"} ], "kwargs": {} #dict of key value keyword-args pairs } ``` the above is equivalent to `self.jira.update_issue_field("key", {"summary": "New summary"})` Alternate query schema designs are welcome to make querying easier without passing and evaluating arbitrary python code. I considered parsing (without evaluating) input python code and extracting the function, args, and kwargs from there and then pipelining them into the callable function via `f(args, *kwargs)` - but this seemed more direct. @vowelparrot @dev2049 --------- Co-authored-by: Jamal Rahman <jamal.rahman@builder.ai>	2023-07-05 15:56:01 -04:00
Santiago Delgado	fa55c5a16b	Fixed Office365 tool __init__.py files, tests, and get_tools() function (#7046 ) ## Description Added Office365 tool modules to `__init__.py` files ## Issue As described in Issue https://github.com/hwchase17/langchain/issues/6936, the Office365 toolkit can't be loaded easily because it is not included in the `__init__.py` files. ## Reviewer @dev2049	2023-07-05 15:46:21 -04:00
Ankush Gola	4c1c05c2c7	support adding custom metadata to runs (#7120 ) - [x] wire up tools - [x] wire up retrievers - [x] add integration test <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @baskaryan - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @baskaryan - Memory: @hwchase17 - Agents / Tools / Toolkits: @hinthornw - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md -->	2023-07-05 11:11:38 -07:00
Mohammad Mohtashim	7d92e9407b	Jinja2 validation changed to issue warnings rather than issuing exceptions. (#7161 ) - Description: If their are missing or extra variables when validating Jinja 2 template then a warning is issued rather than raising an exception. This allows for better flexibility for the developer as described in #7044. Also changed the relevant test so pytest is checking for raised warnings rather than exceptions. - Issue: #7044 - Tag maintainer: @hwchase17, @baskaryan --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-07-05 14:04:29 -04:00
Nuno Campos	81e5b1ad36	Add serialized object to retriever start callback (#7074 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @dev2049 - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @dev2049 - Memory: @hwchase17 - Agents / Tools / Toolkits: @vowelparrot - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md -->	2023-07-05 18:04:43 +01:00
felixocker	db98c44f8f	Support for SPARQL (#7165 ) # [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for [LangChain](https://github.com/hwchase17/langchain) ## Description LangChain support for knowledge graphs relying on W3C standards using RDFlib: SPARQL/ RDF(S)/ OWL with special focus on RDF \ * Works with local files, files from the web, and SPARQL endpoints * Supports both SELECT and UPDATE queries * Includes both a Jupyter notebook with an example and integration tests ## Contribution compared to related PRs and discussions * [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) - uses SPARQL, but specifically for wikibase querying * [Cypher qa](https://github.com/hwchase17/langchain/pull/5078) - graph DB question answering for Neo4J via Cypher * [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries something similar, but does not cover UPDATE queries and supports only RDF * Discussions on [w3c mailing list](mailto:semantic-web@w3.org) related to the combination of LLMs (specifically ChatGPT) and knowledge graphs ## Dependencies * [RDFlib](https://github.com/RDFLib/rdflib) ## Tag maintainer Graph database related to memory -> @hwchase17	2023-07-05 13:00:16 -04:00
Harrison Chase	0ad984fa27	Docs combine document chain (#6994 ) Co-authored-by: Dev 2049 <dev.dev2049@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-04 12:51:04 -06:00
Simon Cheung	81eebc4070	Add HugeGraphQAChain to support gremlin generating chain (#7132 ) [Apache HugeGraph](https://github.com/apache/incubator-hugegraph) is a convenient, efficient, and adaptable graph database, compatible with the Apache TinkerPop3 framework and the Gremlin query language. In this PR, the HugeGraph and HugeGraphQAChain provide the same functionality as the existing integration with Neo4j and enables query generation and question answering over HugeGraph database. The difference is that the graph query language supported by HugeGraph is not cypher but another very popular graph query language [Gremlin](https://tinkerpop.apache.org/gremlin.html). A notebook example and a simple test case have also been added. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-04 10:21:21 -06:00
Nuno Campos	696886f397	Use serialized format for messages in tracer (#6827 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @dev2049 - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @dev2049 - Memory: @hwchase17 - Agents / Tools / Toolkits: @vowelparrot - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md -->	2023-07-04 10:19:08 +01:00
Nuno Campos	c8f8b1b327	Add events to tracer runs (#7090 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @dev2049 - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @dev2049 - Memory: @hwchase17 - Agents / Tools / Toolkits: @vowelparrot - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md -->	2023-07-03 12:43:43 -07:00
Mike Salvatore	d0c7f7c317	Remove `None` default value for FAISS relevance_score_fn (#7085 ) ## Description The type hint for `FAISS.__init__()`'s `relevance_score_fn` parameter allowed the parameter to be set to `None`. However, a default function is provided by the constructor. This led to an unnecessary check in the code, as well as a test to verify this check. ASSUMPTION: There's no reason to ever set `relevance_score_fn` to `None`. This PR changes the type hint and removes the unnecessary code.	2023-07-03 10:11:49 -06:00
Sergey Kozlov	6d15854cda	Add JSON Lines support to JSONLoader (#6913 ) Description: The JSON Lines format is used by some services such as OpenAI and HuggingFace. It's also a convenient alternative to CSV. This PR adds JSON Lines support to `JSONLoader` and also updates related tests. Tag maintainer: @rlancemartin, @eyurtsev. PS I was not able to build docs locally so didn't update related section.	2023-07-02 12:32:41 -07:00
Ofer Mendelevitch	153b56d19b	Vectara upd2 (#6506 ) Update to Vectara integration - By user request added "add_files" to take advantage of Vectara capabilities to process files on the backend, without the need for separate loading of documents and chunking in the chain. - Updated vectara.ipynb example notebook to be broader and added testing of add_file() @hwchase17 - project lead --------- Co-authored-by: rlm <pexpresss31@gmail.com>	2023-07-02 12:15:50 -07:00
Bagatur	7acd524210	Rm retriever kwargs (#7013 ) Doesn't actually limit the Retriever interface but hopefully in practice it does	2023-07-02 08:22:24 -06:00
skspark	e5f6f0ffc4	Support params on GoogleSearchApiWrapper (#6810 ) (#7014 ) ## Description Support search params in GoogleSearchApiWrapper's result call, for the extra filtering on search, to support extra query parameters that google cse provides: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list?hl=ko ## Issue #6810	2023-07-02 01:18:38 -06:00
Stefano Lottini	8d2281a8ca	Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017 ) Retrying with the same improvements as in #6772, this time trying not to mess up with branches. @rlancemartin doing a fresh new PR from a branch with a new name. This should do. Thank you for your help! --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com> Co-authored-by: rlm <pexpresss31@gmail.com>	2023-07-01 11:09:52 -07:00
Harrison Chase	3bfe7cf467	Harrison/split schema dir (#7025 ) should be no functional changes also keep __init__ exposing a lot for backwards compat --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-01 13:39:19 -04:00
Matt Robinson	0498dad562	feat: enable `UnstructuredEmailLoader` to process attachments (#6977 ) ### Summary Updates `UnstructuredEmailLoader` so that it can process attachments in addition to the e-mail content. The loader will process attachments if the `process_attachments` kwarg is passed when the loader is instantiated. ### Testing ```python file_path = "fake-email-attachment.eml" loader = UnstructuredEmailLoader( file_path, mode="elements", process_attachments=True ) docs = loader.load() docs[-1] ``` ### Reviewers - @rlancemartin - @eyurtsev - @hwchase17	2023-07-01 06:09:26 -07:00
Zander Chase	b0859c9b18	Add New Retriever Interface with Callbacks (#5962 ) Handle the new retriever events in a way that (I think) is entirely backwards compatible? Needs more testing for some of the chain changes and all. This creates an entire new run type, however. We could also just treat this as an event within a chain run presumably (same with memory) Adds a subclass initializer that upgrades old retriever implementations to the new schema, along with tests to ensure they work. First commit doesn't upgrade any of our retriever implementations (to show that we can pass the tests along with additional ones testing the upgrade logic). Second commit upgrades the known universe of retrievers in langchain. - [X] Add callback handling methods for retriever start/end/error (open to renaming to 'retrieval' if you want that) - [X] Update BaseRetriever schema to support callbacks - [X] Tests for upgrading old "v1" retrievers for backwards compatibility - [X] Update existing retriever implementations to implement the new interface - [X] Update calls within chains to .{a]get_relevant_documents to pass the child callback manager - [X] Update the notebooks/docs to reflect the new interface - [X] Test notebooks thoroughly Not handled: - Memory pass throughs: retrieval memory doesn't have a parent callback manager passed through the method --------- Co-authored-by: Nuno Campos <nuno@boringbits.io> Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	2023-06-30 14:44:03 -07:00
Bagatur	e3b7effc8f	Beef up import test (#6979 )	2023-06-30 09:26:05 -07:00
William FH	8c73037dff	Simplify eval arg names (#6944 ) It'll be easier to switch between these if the names of predictions are consistent	2023-06-30 07:47:53 -07:00
Tahjyei Thompson	7d8830f707	Add `OpenAIMultiFunctionsAgent` to import list in agents directory (#6824 ) - Added OpenAIMultiFunctionsAgent to the import list of the Agents directory --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-29 18:34:26 -07:00
Zander Chase	429f4dbe4d	Add Input Mapper in run_on_dataset (#6894 ) If you create a dataset from runs and run the same chain or llm on it later, it usually works great. If you have an agent dataset and want to run a different agent on it, or have more complex schema, it's hard for us to automatically map these values every time. This PR lets you pass in an input_mapper function that converts the example inputs to whatever format your model expects	2023-06-29 16:53:49 -07:00

1 2 3 4 5 ...

693 Commits