langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Lance Martin	4a94f56258	Minor edits to QA docs (#7507) Small clean-ups	2023-07-10 22:15:05 -07:00
Raymond Yuan	5171c3bcca	Refactor vector storage to correctly handle relevancy scores (#6570) Description: This pull request aims to support generating the correct generic relevancy scores for different vector stores by refactoring the relevance score functions and their selection in the base class and subclasses of VectorStore. This is especially relevant for VectorStores that require a distance metric upon initialization. Note that many of the current implementations of `_similarity_search_with_relevance_scores` are not technically correct, as they just return `self.similarity_search_with_score(query, k, **kwargs)` without applying the relevance score function. Also includes changes associated with https://github.com/hwchase17/langchain/pull/6564 and https://github.com/hwchase17/langchain/pull/6494. See the more in-depth discussion in the thread in #6494. Issue: https://github.com/hwchase17/langchain/issues/6526 https://github.com/hwchase17/langchain/issues/6481 https://github.com/hwchase17/langchain/issues/6346 Dependencies: None. The changes include: - Properly handling score thresholding in FAISS `similarity_search_with_score_by_vector` for the corresponding distance metric. - Refactoring the `_similarity_search_with_relevance_scores` method in the base class and removing it from the incorrectly implemented subclasses. - Adding a `_select_relevance_score_fn` method in the base class and implementing it in the subclasses to select the appropriate relevance score function based on the distance strategy. - Updating the `__init__` methods of the subclasses to set the `relevance_score_fn` attribute. - Removing the `_default_relevance_score_fn` function from the FAISS class and using the base class's `_euclidean_relevance_score_fn` instead. - Adding the `DistanceStrategy` enum to the `utils.py` file and updating the imports in the vector store classes. - Updating the tests to import the `DistanceStrategy` enum from the `utils.py` file.
--------- Co-authored-by: Hanit <37485638+hanit-com@users.noreply.github.com>	2023-07-10 20:37:03 -07:00
Lance Martin	bd0c6381f5	Minor update to clarify map-reduce custom prompt usage (#7453) Update docs for map-reduce custom prompt usage	2023-07-10 16:43:44 -07:00
Lance Martin	28d2b213a4	Update landing page for "question answering over documents" (#7152) Improve documentation for a central use case, QA / chat over documents. This will be merged as an update to `index.mdx` [here](https://python.langchain.com/docs/use_cases/question_answering/). Testing with a local Docusaurus server:
```
# From the `docs` directory:
mkdir _dist
cp -r {docs_skeleton,snippets} _dist
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
yarn install
yarn start
```
--------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-10 14:15:13 -07:00
William FH	dd648183fa	Rm create_project line (#7486) not needed	2023-07-10 10:49:55 -07:00
Leonid Ganeline	5eec74d9a5	docstrings `document_loaders` 3 (#6937) - Updated docstrings for `document_loaders` - Mass update `"""Loader that loads` to `"""Loads` @baskaryan, please review	2023-07-10 08:56:53 -07:00
Stanko Kuveljic	9d13dcd17c	Pinecone: Add V4 support (#7473)	2023-07-10 08:39:47 -07:00
Adilkhan Sarsen	5debd5043e	Added deeplake use case examples of the new features (#6528) 1. Added use cases of the new features 2. Did some code refactoring --------- Co-authored-by: Ivo Stranic <istranic@gmail.com>	2023-07-10 07:04:29 -07:00
Bagatur	9b615022e2	bump 229 (#7467)	2023-07-10 04:38:55 -04:00
Kazuki Maeda	92b4418c8c	Datadog logs loader (#7356) ### Description Created a loader to get a list of specific logs from Datadog Logs. ### Dependencies `datadog_api_client` is required. ### Twitter handle [kzk_maeda](https://twitter.com/kzk_maeda) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-10 04:27:55 -04:00
Yifei Song	7d29bb2c02	Add Xorbits Dataframe as a Document Loader (#7319) - [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source computing framework that makes it easy to scale data science and machine learning workloads in parallel. Xorbits can leverage multiple cores or GPUs to accelerate computation on a single machine, or scale out to thousands of machines to support processing terabytes of data. - This PR adds support for the Xorbits document loader, which allows langchain to leverage Xorbits to parallelize and distribute the loading of data. - Dependencies: This change requires the Xorbits library to be installed: `pip install xorbits` - Request for review: @rlancemartin, @eyurtsev - Twitter handle: https://twitter.com/Xorbitsio Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-10 04:24:47 -04:00
Sergio Moreno	21a353e9c2	feat: ctransformers support async chain (#6859) - Description: Adding an async method for CTransformers - Issue: Without this code, I found it impossible to run WebSockets inside a FastAPI microservice with a CTransformers model. - Tag maintainer: Not necessary yet; I prefer not to mention anyone directly - Twitter handle: @_semoal	2023-07-10 04:23:41 -04:00
Paul-Emile Brotons	d2cf0d16b3	adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310) Adding a `max_marginal_relevance_search` method to the MongoDBAtlasVectorSearch vector store enhances the user experience by providing more diverse search results. Issue: #7304	2023-07-10 04:04:19 -04:00
Bagatur	04cddfba0d	Add lark import error (#7465)	2023-07-10 03:21:23 -04:00
Matt Robinson	bcab894f4e	feat: Add `UnstructuredTSVLoader` (#7367) ### Summary Adds an `UnstructuredTSVLoader` for TSV files. Also updates the docstrings for the `UnstructuredCSV` and `UnstructuredExcel` loaders. ### Testing
```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader

loader = UnstructuredTSVLoader(
    file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
	2023-07-10 03:07:10 -04:00
Ronald Li	490f4a9ff0	Fixes KeyError in AmazonKendraRetriever initializer (#7464) ### Description The argument `client` was marked as required in commit `81e5b1ad36`, which breaks the default way of initializing the retriever by providing only `index_id`. This commit avoids the KeyError exception when the retriever is initialized without a `client`. ### Dependencies No dependencies required.	2023-07-10 03:02:36 -04:00
Jona Sassenhagen	7ffc431b3a	Add spacy sentencizer (#7442) `SpacyTextSplitter` currently uses spacy's statistics-based `en_core_web_sm` model for sentence splitting. This is a good splitter, but it's also pretty slow, and in this case it's doing a lot of work that isn't needed, given that the spacy parse is then just thrown away. However, spacy also provides a simple rules-based sentencizer. According to my local tests, using it is at least an order of magnitude faster than using `en_core_web_sm`. Also, spacy sentence tokenization based on `en_core_web_sm` can be sped up in this case by skipping the NER stage. This shaves some cycles too, both when loading the model and when parsing the text. Consequently, this PR adds the option to use the basic spacy sentencizer, and it disables the NER stage for the current approach, which is kept as the default. Lastly, when extracting the tokenized sentences, the `text` attribute is accessed directly instead of doing the string conversion, which is IMO a bit more idiomatic.	2023-07-10 02:52:05 -04:00
charosen	50a9fcccb0	feat(module): add param ids to ElasticVectorSearch.from_texts method (#7425) # add param ids to ElasticVectorSearch.from_texts method. - Description: add the `ids` param to the `ElasticVectorSearch.from_texts` method. - Issue: NA. `add_texts` already supports passing in document ids, but the `ids` param is omitted from the `from_texts` classmethod. - Dependencies: None - Tag maintainer: @rlancemartin, @eyurtsev please have a look, thanks
```
# ElasticVectorSearch add_texts
def add_texts(
    self,
    texts: Iterable[str],
    metadatas: Optional[List[dict]] = None,
    refresh_indices: bool = True,
    ids: Optional[List[str]] = None,
    **kwargs: Any,
) -> List[str]:
    ...
```
```
# ElasticVectorSearch from_texts
@classmethod
def from_texts(
    cls,
    texts: List[str],
    embedding: Embeddings,
    metadatas: Optional[List[dict]] = None,
    elasticsearch_url: Optional[str] = None,
    index_name: Optional[str] = None,
    refresh_indices: bool = True,
    **kwargs: Any,
) -> ElasticVectorSearch:
    ...
```
Co-authored-by: charosen <charosen@bupt.cn>	2023-07-10 02:25:35 -04:00
James Yin	a5fd8873b1	fix: type hint of get_chat_history in BaseConversationalRetrievalChain (#7461) The type hint of the `get_chat_history` property in `BaseConversationalRetrievalChain` is incorrect. @baskaryan	2023-07-10 02:14:00 -04:00
nikkie	dfc3f83b0f	docs(vectorstores/integrations/chroma): Fix loading and saving (#7437) - Description: Fix the loading and saving code for Chroma - Issue: #7436 - Dependencies: - - Twitter handle: https://twitter.com/ftnext	2023-07-10 02:05:15 -04:00
Daniel Chalef	c7f7788d0b	Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444) Hey @hwchase17 - This PR adds a `ZepMemory` class, improves handling of Zep's message metadata, and makes it easier for folks building custom chains to persist metadata alongside their chat history. We've had plenty of confused users who were unfamiliar with ChatMessageHistory classes and with how to wrap the `ZepChatMessageHistory` in a `ConversationBufferMemory`, so we've created the `ZepMemory` class as a light wrapper for `ZepChatMessageHistory`. Details: - add ZepMemory, modify notebook to demo use of ZepMemory - Modify summary to be SystemMessage - add metadata argument to add_message; add Zep metadata to Message.additional_kwargs - support passing in metadata	2023-07-10 01:53:49 -04:00
Saurabh Chaturvedi	8f8e8d701e	Fix info about YouTube (#7447) (Unintentionally mean 😅) nit: YouTube wasn't created by Google; this PR fixes the mention in the docs.	2023-07-10 01:52:55 -04:00
Leonid Ganeline	560c4dfc98	docstrings: `docstore` and `client` (#6783) Updated docstrings in `docstore/` and `client/` @baskaryan	2023-07-09 01:34:28 -04:00
Jeroen Van Goey	f5bd88757e	Fix typo (#7416) `quesitons` -> `questions`.	2023-07-09 00:54:48 -04:00
Alejandro Garrido Mota	ea9c3cc9c9	Fix syntax errors in documentation (#7409) - Description: Tiny documentation fix. In Python, default parameter values in a function definition and keyword arguments passed to a function or class constructor are written with `=`, not with the `:` character. - Issue: N/A - Dependencies: N/A - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @mogaal	2023-07-08 19:52:01 -04:00
Nolan	5da9f9abcb	docs(agents/toolkits): Fix error in document_comparison_toolkit.ipynb (#7417) - Description: Removes an unneeded output warning in the documentation at https://python.langchain.com/docs/modules/agents/toolkits/document_comparison_toolkit - Issue: - - Dependencies: - - Tag maintainer: @baskaryan - Twitter handle: @finnless	2023-07-08 19:51:08 -04:00
nikkie	2eb4a2ceea	docs(retrievers/get-started): Fix broken state_of_the_union.txt link (#7399) Thank you for this awesome library. - Description: Fix broken link in documentation - Issue: - https://python.langchain.com/docs/modules/data_connection/retrievers/#get-started - the URL: https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt - I think the right one is https://github.com/hwchase17/langchain/blob/master/docs/extras/modules/state_of_the_union.txt - Dependencies: - - Tag maintainer: @baskaryan - Twitter handle: -	2023-07-08 11:11:05 -04:00
Delgermurun	e7420789e4	improve description of JinaChat (#7397) Very small docstring change in the `JinaChat` class.	2023-07-08 10:57:11 -04:00
Bagatur	26c86a197c	bump 228 (#7393)	2023-07-08 03:05:20 -04:00
SvMax	1d649b127e	Added param to return only a structured json from the get_format_instructions method (#5848) I added a parameter to the `get_format_instructions` method that makes it return only the JSON instructions, without the leading instruction sentence. I'm planning to use the output of `get_format_instructions()` to define the structure of a JSON object passed as input. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-08 02:57:26 -04:00
Bagatur	362bc301df	fix jina (#7392)	2023-07-08 02:41:54 -04:00
Delgermurun	a1603fccfb	integrate JinaChat (#6927) Integration with https://chat.jina.ai/api. It is an OpenAI-compatible API. - Twitter handle: [https://twitter.com/JinaAI_](https://twitter.com/JinaAI_) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-07-08 02:17:04 -04:00
William FH	4ba7396f96	Add single run eval loader (#7390) Plus - add evaluation name to make string and embedding validators work with the run evaluator loader. - Rm unused root validator	2023-07-07 23:06:49 -07:00
Roger Yu	633b673b85	Update pinecone.ipynb (#7382) Fix typo	2023-07-08 01:48:03 -04:00
Oleg Zabluda	4d697d3f24	Allow passing custom prompts to GraphIndexCreator (#7381) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-08 01:47:53 -04:00
William FH	612a74eb7e	Make Ref Example Threadsafe (#7383) I have noticed transient misalignment of reference examples. I believe this is caused by assigning the example inside the thread executor rather than before the work is submitted.	2023-07-07 21:50:42 -07:00
William FH	4789c99bc2	Add String Distance and Embedding Evaluators (#7123) Add a string evaluator and pairwise string evaluator implementation for: - Embedding distance - String distance Update docs	2023-07-07 21:44:31 -07:00
ljeagle	fb6e63dc36	Upgrade the AwaDB from 0.3.5 to 0.3.6 (#7363)	2023-07-07 20:41:17 -07:00
William FH	c5edbea34a	Load Run Evaluator (#7101) Current problems: 1. Evaluating LLMs or chat models isn't smooth. Even specifying 'generations' as the output inserts a redundant list into the eval template. 2. Configuring input / prediction / reference keys in the `get_qa_evaluator` function is confusing. Unless you are using a chain with the default keys, you have to specify all the variables and need to reason about whether each key corresponds to the traced run's inputs or outputs, or to the example's inputs or outputs. Proposal: - Configure the run evaluator according to a model. Use the model type and input/output keys to assert compatibility where possible. You only need to specify a reference_key for certain evaluators (which is less confusing than specifying input keys). When does this work: - If you have your langchain model available (always assumed for the run_on_dataset flow) - If you are evaluating an LLM, chat model, or chain - If the LLM or chat models are traced by langchain (wouldn't work if you add an incompatible schema via the REST API) When would this fail: - Currently, if you directly create an example from an LLM run, the outputs are generations with all the extra metadata present. A simple `example_key` and dumping everything into the template could make the evaluations unreliable - Doesn't help if you're not using the low-level API - If you want to instantiate the evaluator without instantiating your chain or LLM (maybe common for monitoring, for instance) -> could also load from run or run type though What's ugly: - Personally, I think it's better to load evaluators one by one, since passing a config down is pretty confusing. - Lots of testing needs to be added - Inconsistent in that it makes separate run and example input mappers instead of the original `RunEvaluatorInputMapper`, which maps a run and example to a single input. Example usage for an LLM, a chat model, and an agent:
```
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]
model = ChatOpenAI()
configured_evaluators = load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer")
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators)
```
<details>
<summary>Full code with dataset upload</summary>

```
## Create dataset
from langchain.evaluation.run_evaluators.loading import load_run_evaluators_for_model
from langchain.evaluation import load_dataset
import pandas as pd

lcds = load_dataset("llm-math")
df = pd.DataFrame(lcds)

from uuid import uuid4
from langsmith import Client

client = Client()
ds_name = "llm-math - " + str(uuid4())[0:8]
ds = client.upload_dataframe(df, name=ds_name, input_keys=["question"], output_keys=["answer"])

## Define the models we'll test over
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain.tools import tool

llm = OpenAI(temperature=0)
chat_model = ChatOpenAI(temperature=0)

@tool
def sum(a: float, b: float) -> float:
    """Add two numbers"""
    return a + b

def construct_agent():
    return initialize_agent(
        llm=chat_model,
        tools=[sum],
        agent=AgentType.OPENAI_MULTI_FUNCTIONS,
    )

agent = construct_agent()

# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]
models = [llm, chat_model, agent]
run_evaluators = []
for model in models:
    run_evaluators.append(load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer"))

# Run on LLM, Chat Model, and Agent
from langchain.client.runner_utils import run_on_dataset

to_test = [llm, chat_model, construct_agent]
for model, configured_evaluators in zip(to_test, run_evaluators):
    run_on_dataset(ds_name, model, run_evaluators=configured_evaluators, verbose=True)
```

</details>
--------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-07-07 19:57:59 -07:00
Bagatur	1ac347b4e3	update databerry-chaindesk redirect (#7378)	2023-07-07 19:11:46 -04:00
Joshua Carroll	705d2f5b92	Update the API Reference link in Streamlit integration docs (#7377) This page: https://python.langchain.com/docs/modules/callbacks/integrations/streamlit currently has a bad API Reference link. This PR fixes it to point to the correct link. It also updates the embedded app link to https://langchain-mrkl.streamlit.app/ (a better name), which is hosted in the langchain-ai/streamlit-agent repo.	2023-07-07 17:35:57 -04:00
Georges Petrov	ec033ae277	Rename Databerry to Chaindesk (#7022) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-07 17:28:04 -04:00
Philip Meier	da5b0723d2	update MosaicML inputs and outputs (#7348) As of today (July 7, 2023), the [MosaicML API](https://docs.mosaicml.com/en/latest/inference.html#text-completion-requests) uses `"inputs"` for the prompt. This PR adds support for this new format. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-07 17:23:11 -04:00
Bearnardd	184ede4e48	Fix buggy output from GraphQAChain (#7372) Fixes https://github.com/hwchase17/langchain/issues/7289. A simple fix for the buggy output of `graph_qa`. When several entities have triplets, the last entry of `triplets` for one entity is merged with the first entry of `triplets` for the next entity.	2023-07-07 17:19:53 -04:00
Harrison Chase	7cdf97ba9b	Harrison/add to imports (#7370) pgvector cleanup	2023-07-07 16:27:44 -04:00
Bagatur	4d427b2397	Base language model docstrings (#7104)	2023-07-07 16:09:10 -04:00
ॐ shivam mamgain	2179d4eef8	Fix for KeyError in MlflowCallbackHandler (#7051) - Description: `MlflowCallbackHandler` fails with `KeyError: "['name'] not in index"`. See https://github.com/hwchase17/langchain/issues/5770 for more details. The root cause is that LangChain does not pass "name" as part of the `serialized` argument to the `on_llm_start()` callback method. The commit that introduced this change is probably `18af149e91`. My bug fix derives "name" from the "id" field. - Issue: https://github.com/hwchase17/langchain/issues/5770 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-07 16:08:06 -04:00
Alex Gamble	df746ad821	Add a callback handler for Context (https://getcontext.ai) (#7151) ### Description Adding a callback handler for Context. Context is a product analytics platform for AI chat experiences that helps you understand how users are interacting with your product. I've added the callback library and an example notebook showing its use. ### Dependencies Requires the user to install the `context-python` library. The library is lazily loaded when the callback is instantiated. ### Announcing the feature We spoke with Harrison a few weeks ago about also doing a blog post announcing our integration, so we will coordinate this with him. Our Twitter handle for the company is @getcontextai, and the founders are @_agamble and @HenrySG. Thanks in advance!	2023-07-07 15:33:29 -04:00
Austin	c9a0f24646	Add verbose parameter for llamacpp (#7253) Title: Add verbose parameter for llamacpp Description: This pull request adds a boolean `verbose` parameter to the llamacpp module. When set to True, it prints detailed logs to stderr during execution of the Llama model, which can aid in debugging and in understanding the module's internal processes. It defaults to True and can be set to False if less output is desired. The parameter has been added to the `validate_environment` method of the `LlamaCpp` class, which initializes the `llama_cpp.Llama` API:
```python
class LlamaCpp(LLM):
    ...

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        ...
        model_param_names = [
            ...
            "verbose",  # New verbose parameter added
        ]
        ...
        values["client"] = Llama(model_path, **model_params)
        ...
```
--------- Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>	2023-07-07 15:08:25 -04:00
Kenny	34a2755a54	Allow passing api key into OpenAIWhisperParser (#7281) This just allows the user to pass an api_key directly into OpenAIWhisperParser. Very simple addition.	2023-07-07 15:07:45 -04:00
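
The relevancy-score refactor in 5171c3bcca (#6570) chooses a distance-to-relevance conversion based on the vector store's distance strategy. The block below is a minimal, standalone sketch of that idea, not LangChain's code. `DistanceStrategy` here is a simplified two-member stand-in, and `select_relevance_score_fn` and `filter_by_threshold` are hypothetical helpers. The sketch assumes unit-normalized embeddings that are at most 90 degrees apart, so Euclidean distances fall in [0, sqrt(2)] and cosine distances in [0, 1], and both conversions land in [0, 1].

```python
import math
from enum import Enum
from typing import Callable, List, Tuple


class DistanceStrategy(str, Enum):
    """Simplified stand-in for a vector store's distance metric (illustrative only)."""

    EUCLIDEAN_DISTANCE = "EUCLIDEAN_DISTANCE"
    COSINE = "COSINE"


def euclidean_relevance_score(distance: float) -> float:
    # Assumes distance in [0, sqrt(2)]: distance 0 -> score 1, sqrt(2) -> score 0.
    return 1.0 - distance / math.sqrt(2)


def cosine_relevance_score(distance: float) -> float:
    # Cosine distance is 1 - cosine similarity, so this recovers the similarity.
    return 1.0 - distance


def select_relevance_score_fn(strategy: DistanceStrategy) -> Callable[[float], float]:
    """Hypothetical helper: pick the conversion that matches the store's metric."""
    if strategy is DistanceStrategy.EUCLIDEAN_DISTANCE:
        return euclidean_relevance_score
    if strategy is DistanceStrategy.COSINE:
        return cosine_relevance_score
    raise ValueError(f"No relevance score function for {strategy!r}")


def filter_by_threshold(
    docs_and_distances: List[Tuple[str, float]],
    strategy: DistanceStrategy,
    score_threshold: float,
) -> List[Tuple[str, float]]:
    """Convert raw distances to relevance scores, then keep scores >= threshold."""
    score_fn = select_relevance_score_fn(strategy)
    scored = [(doc, score_fn(dist)) for doc, dist in docs_and_distances]
    return [(doc, score) for doc, score in scored if score >= score_threshold]


if __name__ == "__main__":
    results = [("doc-a", 0.1), ("doc-b", 0.9), ("doc-c", 1.3)]
    print(filter_by_threshold(results, DistanceStrategy.EUCLIDEAN_DISTANCE, 0.5))
```

With a threshold of 0.5 and Euclidean distances, only `doc-a` (score about 0.93) is kept. The threshold is compared against the converted score rather than the raw distance, so a single threshold value means the same thing whichever metric the store uses.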
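
d2cf0d16b3 (#7310) adds maximal marginal relevance (MMR) search to `MongoDBAtlasVectorSearch`. MMR greedily picks results that are relevant to the query but dissimilar to results already picked, and a weight trades the two goals off. Below is a minimal pure-Python sketch of the greedy selection step. It assumes the candidate embeddings have already been fetched, and `mmr_select` is a hypothetical name. It is not the MongoDB integration's implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance.

    lambda_mult = 1.0 ranks purely by relevance; 0.0 rewards diversity only.
    """
    relevance = [cosine_similarity(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is plain similarity ranking. Lowering it lets a less similar but non-redundant candidate beat a near-duplicate of an earlier pick.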
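
612a74eb7e (#7383) attributes transient misalignment between runs and reference examples to assigning the example inside the thread executor rather than beforehand. The general remedy is to bind each example to its task at submission time, so no worker reads shared state that the main thread is still changing. The following is a minimal sketch with `concurrent.futures`. `evaluate`, `run_all`, and the example dicts are hypothetical placeholders, not LangChain's runner code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def evaluate(example: Dict[str, str]) -> Tuple[str, str]:
    # Placeholder for running a model on one example; returns (example id, output).
    return example["id"], example["question"].upper()


def run_all(examples: List[Dict[str, str]], max_workers: int = 4) -> List[Tuple[str, str]]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Each example is passed as an argument when its task is submitted, so
        # every task is bound to its own example before any worker starts.
        futures = [executor.submit(evaluate, example) for example in examples]
        # Collect in submission order so results line up with `examples`.
        return [future.result() for future in futures]


if __name__ == "__main__":
    data = [{"id": f"ex-{i}", "question": f"q{i}"} for i in range(8)]
    for example, (example_id, _) in zip(data, run_all(data)):
        assert example["id"] == example_id
    print("all results aligned")
```

The race-prone alternative has workers read a shared "current example" variable that the submitting thread keeps overwriting. Which example a worker sees then depends on scheduling.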
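
4789c99bc2 (#7123) adds evaluators that compare a prediction with a reference by string distance or by embedding distance. As a rough illustration of the two underlying measures, here are a length-normalized Levenshtein distance and a cosine distance between two vectors. This is not the evaluators' implementation, which may use a different edit-distance metric or library. The function names are hypothetical, and the vectors stand in for embeddings you would get from an embedding model.

```python
import math
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ca != cb),  # substitution (free if chars match)
                )
            )
        prev = curr
    return prev[-1]


def normalized_string_distance(prediction: str, reference: str) -> float:
    """Edit distance scaled to [0, 1]; 0 means identical strings."""
    longest = max(len(prediction), len(reference))
    return levenshtein(prediction, reference) / longest if longest else 0.0


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norms
```

For example, "kitten" versus "sitting" needs three edits, giving a normalized distance of 3/7.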
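
184ede4e48 (#7372) describes the last triplet of one entity running into the first triplet of the next. One pattern that produces exactly this symptom is joining each entity's triplets with newlines but then concatenating the per-entity strings with no separator. The sketch below reproduces the symptom and shows one way to avoid it. It is a hypothetical reconstruction with illustrative function names, not the actual `GraphQAChain` code.

```python
from typing import List


def build_context_buggy(triplets_by_entity: List[List[str]]) -> str:
    # Each entity's block is newline-joined, but the blocks are glued together
    # directly, so one entity's last triplet runs into the next entity's first.
    context = ""
    for triplets in triplets_by_entity:
        context += "\n".join(triplets)
    return context


def build_context_fixed(triplets_by_entity: List[List[str]]) -> str:
    # Collect every triplet first, then join once, so each stays on its own line.
    all_triplets: List[str] = []
    for triplets in triplets_by_entity:
        all_triplets.extend(triplets)
    return "\n".join(all_triplets)


if __name__ == "__main__":
    data = [["Alice knows Bob", "Alice works at Acme"], ["Bob lives in Paris"]]
    print(repr(build_context_buggy(data)))
    print(repr(build_context_fixed(data)))
```

The buggy version yields `"Alice works at AcmeBob lives in Paris"` as a single merged line. The fixed version keeps all three triplets on separate lines.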
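
2179d4eef8 (#7051) derives the missing `"name"` from the `"id"` field of the `serialized` dict passed to `on_llm_start()`. The sketch below shows that fallback, assuming `"id"` is a list of path components ending in the class name (for example `["langchain", "llms", "openai", "OpenAI"]`). That shape is an assumption here. `get_component_name` is a hypothetical helper, not the handler's actual method.

```python
from typing import Any, Dict


def get_component_name(serialized: Dict[str, Any], default: str = "unknown") -> str:
    """Prefer an explicit "name"; otherwise fall back to the last element of "id"."""
    name = serialized.get("name")
    if name:
        return name
    # Assumed shape: "id" is a list such as ["langchain", "llms", "openai", "OpenAI"].
    id_path = serialized.get("id") or []
    return id_path[-1] if id_path else default
```

Using `.get` for both keys means a payload without `"name"` no longer triggers a missing-key error, which is the failure mode the commit reports.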


3027 Commits