langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-06 03:20:49 +00:00

Author	SHA1	Message	Date
Bagatur	22525bad65	bump 231 (#7584 )	2023-07-12 10:43:12 -04:00
Subsegment	6e1000dc8d	docs : Use more meaningful cnosdb examples (#7587 ) This change makes the ecosystem integrations cnosdb documentation more realistic and easy to understand. - change examples of question and table - modify typo and format	2023-07-12 10:31:55 -04:00
Samuel ROZE	f3c9bf5e4b	fix(typo): Clarify the point of `llm_chain` (#7593 ) Fixes a typo introduced in https://github.com/hwchase17/langchain/pull/7080 by @hwchase17. In the example (visible on [the online documentation](https://api.python.langchain.com/en/latest/chains/langchain.chains.conversational_retrieval.base.ConversationalRetrievalChain.html#langchain-chains-conversational-retrieval-base-conversationalretrievalchain)), the `llm_chain` variable is unused as opposed to being used for the question generator. This change makes it clearer.	2023-07-12 10:31:00 -04:00
Alec Flett	6cdd4b5edc	only add handlers if they are new (#7504 ) When using callbacks, there are times when callbacks can be added redundantly: for instance sometimes you might need to create an llm with specific callbacks, but then also create and agent that uses a chain that has those callbacks already set. This means that "callbacks" might get passed down again to the llm at predict() time, resulting in duplicate calls to the `on_llm_start` callback. For the sake of simplicity, I made it so that langchain never adds an exact handler/callbacks object in `add_handler`, thus avoiding the duplicate handler issue. Tagging @hwchase17 for callback review --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-12 03:48:29 -04:00
ausboss	50316f6477	Adding LLM wrapper for Kobold AI (#7560 ) - Description: add wrapper that lets you use KoboldAI api in langchain - Issue: n/a - Dependencies: none extra, just what exists in lanchain - Tag maintainer: @baskaryan - Twitter handle: @zanzibased --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-12 03:48:12 -04:00
Rohit Kumar Singh	603a0bea29	Fixes incorrect docstore creation in faiss.py (#7026 ) - Description: Current implementation assumes that the length of `texts` and `ids` should be same but if the passed `ids` length is not equal to the passed length of `texts`, current code `dict(zip(index_to_id.values(), documents))` is not failing or giving any warning and silently creating docstores only for the passed `ids` i.e. if `ids = ['A']` and `texts=["I love Open Source","I love langchain"]` then only one `docstore` will be created. But either two docstores should be created assuming same id value for all the elements of `texts` or an error should be raised. - Issue: My change fixes this by using dictionary comprehension instead of `zip`. This was if lengths of `ids` and `texts` mismatches an explicit `IndexError` will be raised. @rlancemartin, @eyurtsev	2023-07-12 03:35:49 -04:00
Tommy Hyeonwoo Kim	3f7213586e	add supported properties for notiondb document loader's metadata (#7570 ) fix #7569 add following properties for Notion DB document loader's metadata - `unique_id` - `status` - `people` @rlancemartin, @eyurtsev (Since this is a change related to `DataLoaders`) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-12 03:34:54 -04:00
Junlin Zhou	5f17c57174	Update chat agents' output parser to extract action by regex (#7511 ) Currently `ChatOutputParser` extracts actions by splitting the text on "```", and then load the second part as a json string. But sometimes the LLM will wrap the action in markdown code block like: ````markdown ```json { "action": "foo", "action_input": "bar" } ``` ```` Splitting text on "```" will cause `OutputParserException` in such case. This PR changes the behaviour to extract the `$JSON_BLOB` by regex, so that it can handle both ` ``` ``` ` and ` ```json ``` ` @hinthornw --------- Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>	2023-07-12 03:12:02 -04:00
Bagatur	ebcb144342	unit test sqlalachemy (#7582 )	2023-07-12 03:03:16 -04:00
Harrison Chase	641fd74baa	Harrison/pg vector move (#7580 )	2023-07-12 02:22:34 -04:00
os1ma	2667ddc686	Fix `make docs_build` and related scripts (#7276 ) Description: a description of the change Fixed `make docs_build` and related scripts which caused errors. There are several changes. First, I made the build of the documentation and the API Reference into two separate commands. This is because it takes less time to build. The commands for documents are `make docs_build`, `make docs_clean`, and `make docs_linkcheck`. The commands for API Reference are `make api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`. It looked like `docs/.local_build.sh` could be used to build the documentation, so I used that. Since `.local_build.sh` was also building API Rerefence internally, I removed that process. `.local_build.sh` also added some Bash options to stop in error or so. Futher more added `cd "${SCRIPT_DIR}"` at the beginning so that the script will work no matter which directory it is executed in. `docs/api_reference/api_reference.rst` is removed, because which is generated by `docs/api_reference/create_api_rst.py`, and added it to .gitignore. Finally, the description of CONTRIBUTING.md was modified. Issue: the issue # it fixes (if applicable) https://github.com/hwchase17/langchain/issues/6413 Dependencies: any dependencies required for this change `nbdoc` was missing in group docs so it was added. I installed it with the `poetry add --group docs nbdoc` command. I am concerned if any modifications are needed to poetry.lock. I would greatly appreciate it if you could pay close attention to this file during the review. Tag maintainer - General / Misc / if you don't know who to tag: @baskaryan If this PR needs any additional changes, I'll be happy to make them! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-11 22:05:14 -04:00
Pharbie	74c28df363	Update Pinecone Upsert method usage (#7358 ) Description: Refactor the upsert method in the Pinecone class to allow for additional keyword arguments. This change adds flexibility and extensibility to the method, allowing for future modifications or enhancements. The upsert method now accepts the `**kwargs` parameter, which can be used to pass any additional arguments to the Pinecone index. This change has been made in both the `upsert` method in the `Pinecone` class and the `upsert` method in the `similarity_search_with_score` class method. Falls in line with the usage of the upsert method in [Pinecone-Python-Client](`4640c4cf27/pinecone/index.py (L73)`) Issue: [This feature request in Pinecone Repo](https://github.com/pinecone-io/pinecone-python-client/issues/184) Maintainer responsibilities: - General / Misc / if you don't know who to tag: @baskaryan - Memory: @hwchase17 --------- Co-authored-by: kwesi <22204443+yankskwesi@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>	2023-07-11 21:14:42 -04:00
Kazuki Maeda	5c3fe8b0d1	Enhance Makefile with 'format_diff' Option and Improved Readability (#7394 ) ### Description: This PR introduces a new option format_diff to the existing Makefile. This option allows us to apply the formatting tools (Black and isort) only to the changed Python and ipynb files since the last commit. This will make our development process more efficient as we only format the codes that we modify. Along with this change, comments were added to make the Makefile more understandable and maintainable. ### Issue: N/A ### Dependencies: Add dependency to black. ### Tag maintainer: @baskaryan ### Twitter handle: [kzk_maeda](https://twitter.com/kzk_maeda) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-11 21:03:17 -04:00
Bagatur	2babe3069f	Revert pinecone v4 support (#7566 ) Revert `9d13dcd`	2023-07-11 20:58:59 -04:00
schop-rob	e811c5e8c6	Add OpenAI organization ID to docs (#7398 ) Description: I added an example of how to reference the OpenAI API Organization ID, because I couldn't find it before. In the example, it is mentioned how to achieve this using environment variables as well as parameters for the OpenAI()-class Issue: - Dependencies: - Twitter @schop-rob	2023-07-11 20:51:58 -04:00
Kenny	8741e55e7c	Template formats documentation (#7404 ) Simple addition to the documentation, adding the correct import statement & showcasing using Python FStrings.	2023-07-11 18:24:24 -04:00
Fielding Johnston	00c466627a	minor bug fix: properly await AsyncRunManager's method call in MulitRouteChain (#7487 ) This simply awaits `AsyncRunManager`'s method call in `MulitRouteChain`. Noticed this while playing around with Langchain's implementation of `MultiPromptChain`. @baskaryan cheers --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-11 18:18:47 -04:00
tonomura	cc0585af42	Improvement/add finish reason to generation info in chat open ai (#7478 ) Description: ChatOpenAI model does not return finish_reason in generation_info. Issue: #2702 Dependencies: None Tag maintainer: @baskaryan Thank you --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-11 18:12:57 -04:00
Junlin Zhou	b96ac13f3d	Minor update to reference other sql tool by tool names instead of hard coded string. (#7514 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out! If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @baskaryan - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @baskaryan - Memory: @hwchase17 - Agents / Tools / Toolkits: @hinthornw - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> Currently there are 4 tools in SQL agent-toolkits, and 2 of them have reference to the other 2. This PR change the reference from hard coded string to `{tool.name}` Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>	2023-07-11 17:44:23 -04:00
OwenElliott	9cb2347453	Fix broken link from Marqo Ecosystem (#7510 ) Small fix to a link from the Marqo page in the ecosystem. The link was not updated correctly when the documentation structure changed to html pages instead of links to notebooks.	2023-07-11 17:15:15 -04:00
Matt Robinson	c4d53f98dc	docs: update unstructured docstrings (#7561 ) ### Summary Updates the docstrings in the Unstructured document loaders to display more useful information on the integrations page.	2023-07-11 17:12:05 -04:00
Ben Auffarth	2c2f0e15a6	clarify about api key (#7540 ) I found it unclear, where to get the API keys for JinaChat. Mentioning this in the docstring should be helpful. #7490 Twitter handle: benji1a @delgermurun	2023-07-11 16:46:06 -04:00
Jona Sassenhagen	0ea7224535	[Minor] Remove tagger from spacy sentencizer (#7534 ) @svlandeg gave me a tip for how to improve a bit on https://github.com/hwchase17/langchain/pull/7442 for some extra speed and memory gains. The tagger isn't needed for sentencization, so can be disabled too.	2023-07-11 16:43:46 -04:00
Kacper Łukawski	1f83b5f47e	Reuse the existing collection if configured properly in Qdrant.from_texts (#7530 ) This PR changes the behavior of `Qdrant.from_texts` so the collection is reused if not requested to recreate it. Previously, calling `Qdrant.from_texts` or `Qdrant.from_documents` resulted in removing the old data which was confusing for many.	2023-07-11 16:24:35 -04:00
Leonid Kuligin	6674b33cf5	Added support for chat_history (#7555 ) #7469 Co-authored-by: Leonid Kuligin <kuligin@google.com>	2023-07-11 15:27:26 -04:00
Felix Brockmeier	406a9dc11f	Add notebook example for Lemon AI NLP Workflow Automation (#7556 ) - Description: Added notebook to LangChain docs that explains how to use Lemon AI NLP Workflow Automation tool with Langchain - Issue: not applicable - Dependencies: not applicable - Tag maintainer: @agola11 - Twitter handle: felixbrockm	2023-07-11 15:15:11 -04:00
Lance Martin	9e067b8cc9	Add env setup (#7550 ) Include setup	2023-07-11 09:48:40 -07:00
Bagatur	3c4338470e	bump 230 (#7544 )	2023-07-11 11:24:08 -04:00
Bagatur	d2137eea9f	fix cpal docs (#7545 )	2023-07-11 11:07:45 -04:00
Boris	9129318466	CPAL (#6255 ) # Causal program-aided language (CPAL) chain ## Motivation This builds on the recent [PAL](https://arxiv.org/abs/2211.10435) to stop LLM hallucination. The problem with the [PAL](https://arxiv.org/abs/2211.10435) approach is that it hallucinates on a math problem with a nested chain of dependence. The innovation here is that this new CPAL approach includes causal structure to fix hallucination. For example, using the below word problem, PAL answers with 5, and CPAL answers with 13. "Tim buys the same number of pets as Cindy and Boris." "Cindy buys the same number of pets as Bill plus Bob." "Boris buys the same number of pets as Ben plus Beth." "Bill buys the same number of pets as Obama." "Bob buys the same number of pets as Obama." "Ben buys the same number of pets as Obama." "Beth buys the same number of pets as Obama." "If Obama buys one pet, how many pets total does everyone buy?" The CPAL chain represents the causal structure of the above narrative as a causal graph or DAG, which it can also plot, as shown below. ![complex-graph](https://github.com/hwchase17/langchain/assets/367522/d938db15-f941-493d-8605-536ad530f576) . The two major sections below are: 1. Technical overview 2. Future application Also see [this jupyter notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb) doc. ## 1. Technical overview ### CPAL versus PAL Like [PAL](https://arxiv.org/abs/2211.10435), CPAL intends to reduce large language model (LLM) hallucination. The CPAL chain is different from the PAL chain for a couple of reasons. * CPAL adds a causal structure (or DAG) to link entity actions (or math expressions). * The CPAL math expressions are modeling a chain of cause and effect relations, which can be intervened upon, whereas for the PAL chain math expressions are projected math identities. PAL's generated python code is wrong. It hallucinates when complexity increases. ```python def solution(): """Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?""" obama_pets = 1 tim_pets = obama_pets cindy_pets = obama_pets + obama_pets boris_pets = obama_pets + obama_pets total_pets = tim_pets + cindy_pets + boris_pets result = total_pets return result # math result is 5 ``` CPAL's generated python code is correct. ```python story outcome data name code value depends_on 0 obama pass 1.0 [] 1 bill bill.value = obama.value 1.0 [obama] 2 bob bob.value = obama.value 1.0 [obama] 3 ben ben.value = obama.value 1.0 [obama] 4 beth beth.value = obama.value 1.0 [obama] 5 cindy cindy.value = bill.value + bob.value 2.0 [bill, bob] 6 boris boris.value = ben.value + beth.value 2.0 [ben, beth] 7 tim tim.value = cindy.value + boris.value 4.0 [cindy, boris] query data { "question": "how many pets total does everyone buy?", "expression": "SELECT SUM(value) FROM df", "llm_error_msg": "" } # query result is 13 ``` Based on the comments below, CPAL's intended location in the library is `experimental/chains/cpal` and PAL's location is`chains/pal`. ### CPAL vs Graph QA Both the CPAL chain and the Graph QA chain extract entity-action-entity relations into a DAG. The CPAL chain is different from the Graph QA chain for a few reasons. * Graph QA does not connect entities to math expressions * Graph QA does not associate actions in a sequence of dependence. * Graph QA does not decompose the narrative into these three parts: 1. Story plot or causal model 4. Hypothetical question 5. Hypothetical condition ### Evaluation Preliminary evaluation on simple math word problems shows that this CPAL chain generates less hallucination than the PAL chain on answering questions about a causal narrative. Two examples are in [this jupyter notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb) doc. ## 2. Future application ### "Describe as Narrative, Test as Code" The thesis here is that the Describe as Narrative, Test as Code approach allows you to represent a causal mental model both as code and as a narrative, giving you the best of both worlds. #### Why describe a causal mental mode as a narrative? The narrative form is quick. At a consensus building meeting, people use narratives to persuade others of their causal mental model, aka. plan. You can share, version control and index a narrative. #### Why test a causal mental model as a code? Code is testable, complex narratives are not. Though fast, narratives are problematic as their complexity increases. The problem is LLMs and humans are prone to hallucination when predicting the outcomes of a narrative. The cost of building a consensus around the validity of a narrative outcome grows as its narrative complexity increases. Code does not require tribal knowledge or social power to validate. Code is composable, complex narratives are not. The answer of one CPAL chain can be the hypothetical conditions of another CPAL Chain. For stochastic simulations, a composable plan can be integrated with the [DoWhy library](https://github.com/py-why/dowhy). Lastly, for the futuristic folk, a composable plan as code allows ordinary community folk to design a plan that can be integrated with a blockchain for funding. An explanation of a dependency planning application is [here.](https://github.com/borisdev/cpal-llm-chain-demo) --- Twitter handle: @boris_dev --------- Co-authored-by: Boris Dev <borisdev@Boriss-MacBook-Air.local>	2023-07-11 10:11:21 -04:00
Alejandra De Luna	2e4047e5e7	feat: support generate as an early stopping method for `OpenAIFunctionsAgent` (#7229 ) This PR proposes an implementation to support `generate` as an `early_stopping_method` for the new `OpenAIFunctionsAgent` class. The motivation behind is to facilitate the user to set a maximum number of actions the agent can take with `max_iterations` and force a final response with this new agent (as with the `Agent` class). The following changes were made: - The `OpenAIFunctionsAgent.return_stopped_response` method was overwritten to support `generate` as an `early_stopping_method` - A boolean `with_functions` parameter was added to the `OpenAIFunctionsAgent.plan` method This way the `OpenAIFunctionsAgent.return_stopped_response` method can call the `OpenAIFunctionsAgent.plan` method with `with_function=False` when the `early_stopping_method` is set to `generate`, making a call to the LLM with no functions and forcing a final response from the `"assistant"`. - Relevant maintainer: @hinthornw - Twitter handle: @aledelunap --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-11 09:25:02 -04:00
Hashem Alsaket	1dd4236177	Fix HF endpoint returns blank for text-generation (#7386 ) Description: Current `_call` function in the `langchain.llms.HuggingFaceEndpoint` class truncates response when `task=text-generation`. Same error discussed a few days ago on Hugging Face: https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/51 Issue: Fixes #7353 Tag maintainer: @hwchase17 @baskaryan @hinthornw --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-11 03:06:05 -04:00
Lance Martin	4a94f56258	Minor edits to QA docs (#7507 ) Small clean-ups	2023-07-10 22:15:05 -07:00
Raymond Yuan	5171c3bcca	Refactor vector storage to correctly handle relevancy scores (#6570 ) Description: This pull request aims to support generating the correct generic relevancy scores for different vector stores by refactoring the relevance score functions and their selection in the base class and subclasses of VectorStore. This is especially relevant with VectorStores that require a distance metric upon initialization. Note many of the current implenetations of `_similarity_search_with_relevance_scores` are not technically correct, as they just return `self.similarity_search_with_score(query, k, **kwargs)` without applying the relevant score function Also includes changes associated with: https://github.com/hwchase17/langchain/pull/6564 and https://github.com/hwchase17/langchain/pull/6494 See more indepth discussion in thread in #6494 Issue: https://github.com/hwchase17/langchain/issues/6526 https://github.com/hwchase17/langchain/issues/6481 https://github.com/hwchase17/langchain/issues/6346 Dependencies: None The changes include: - Properly handling score thresholding in FAISS `similarity_search_with_score_by_vector` for the corresponding distance metric. - Refactoring the `_similarity_search_with_relevance_scores` method in the base class and removing it from the subclasses for incorrectly implemented subclasses. - Adding a `_select_relevance_score_fn` method in the base class and implementing it in the subclasses to select the appropriate relevance score function based on the distance strategy. - Updating the `__init__` methods of the subclasses to set the `relevance_score_fn` attribute. - Removing the `_default_relevance_score_fn` function from the FAISS class and using the base class's `_euclidean_relevance_score_fn` instead. - Adding the `DistanceStrategy` enum to the `utils.py` file and updating the imports in the vector store classes. - Updating the tests to import the `DistanceStrategy` enum from the `utils.py` file. --------- Co-authored-by: Hanit <37485638+hanit-com@users.noreply.github.com>	2023-07-10 20:37:03 -07:00
Lance Martin	bd0c6381f5	Minor update to clarify map-reduce custom prompt usage (#7453 ) Update docs for map-reduce custom prompt usage	2023-07-10 16:43:44 -07:00
Lance Martin	28d2b213a4	Update landing page for "question answering over documents" (#7152 ) Improve documentation for a central use-case, qa / chat over documents. This will be merged as an update to `index.mdx` [here](https://python.langchain.com/docs/use_cases/question_answering/). Testing w/ local Docusaurus server: ``` From `docs` directory: mkdir _dist cp -r {docs_skeleton,snippets} _dist cp -r extras/* _dist/docs_skeleton/docs cd _dist/docs_skeleton yarn install yarn start ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-10 14:15:13 -07:00
William FH	dd648183fa	Rm create_project line (#7486 ) not needed	2023-07-10 10:49:55 -07:00
Leonid Ganeline	5eec74d9a5	docstrings `document_loaders` 3 (#6937 ) - Updated docstrings for `document_loaders` - Mass update `"""Loader that loads` to `"""Loads` @baskaryan - please, review	2023-07-10 08:56:53 -07:00
Stanko Kuveljic	9d13dcd17c	Pinecone: Add V4 support (#7473 )	2023-07-10 08:39:47 -07:00
Adilkhan Sarsen	5debd5043e	Added deeplake use case examples of the new features (#6528 ) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @hwchase17 VectorStores / Retrievers / Memory - @dev2049 --> 1. Added use cases of the new features 2. Done some code refactoring --------- Co-authored-by: Ivo Stranic <istranic@gmail.com>	2023-07-10 07:04:29 -07:00
Bagatur	9b615022e2	bump 229 (#7467 )	2023-07-10 04:38:55 -04:00
Kazuki Maeda	92b4418c8c	Datadog logs loader (#7356 ) ### Description Created a Loader to get a list of specific logs from Datadog Logs. ### Dependencies `datadog_api_client` is required. ### Twitter handle [kzk_maeda](https://twitter.com/kzk_maeda) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-10 04:27:55 -04:00
Yifei Song	7d29bb2c02	Add Xorbits Dataframe as a Document Loader (#7319 ) - [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source computing framework that makes it easy to scale data science and machine learning workloads in parallel. Xorbits can leverage multi cores or GPUs to accelerate computation on a single machine, or scale out up to thousands of machines to support processing terabytes of data. - This PR added support for the Xorbits document loader, which allows langchain to leverage Xorbits to parallelize and distribute the loading of data. - Dependencies: This change requires the Xorbits library to be installed in order to be used. `pip install xorbits` - Request for review: @rlancemartin, @eyurtsev - Twitter handle: https://twitter.com/Xorbitsio Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-07-10 04:24:47 -04:00
Sergio Moreno	21a353e9c2	feat: ctransformers support async chain (#6859 ) - Description: Adding async method for CTransformers - Issue: I've found impossible without this code to run Websockets inside a FastAPI micro service and a CTransformers model. - Tag maintainer: Not necessary yet, I don't like to mention directly - Twitter handle: @_semoal	2023-07-10 04:23:41 -04:00
Paul-Emile Brotons	d2cf0d16b3	adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310 ) Adding a maximal_marginal_relevance method to the MongoDBAtlasVectorSearch vectorstore enhances the user experience by providing more diverse search results Issue: #7304	2023-07-10 04:04:19 -04:00
Bagatur	04cddfba0d	Add lark import error (#7465 )	2023-07-10 03:21:23 -04:00
Matt Robinson	bcab894f4e	feat: Add `UnstructuredTSVLoader` (#7367 ) ### Summary Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc strings for `UnstructuredCSV` and `UnstructuredExcel` loaders. ### Testing ```python from langchain.document_loaders.tsv import UnstructuredTSVLoader loader = UnstructuredTSVLoader( file_path="example_data/mlb_teams_2012.csv", mode="elements" ) docs = loader.load() ```	2023-07-10 03:07:10 -04:00
Ronald Li	490f4a9ff0	Fixes KeyError in AmazonKendraRetriever initializer (#7464 ) ### Description argument variable client is marked as required in commit `81e5b1ad36` which breaks the default way of initialization providing only index_id. This commit avoid KeyError exception when it is initialized without a client variable ### Dependencies no dependency required	2023-07-10 03:02:36 -04:00
Jona Sassenhagen	7ffc431b3a	Add spacy sentencizer (#7442 ) `SpacyTextSplitter` currently uses spacy's statistics-based `en_core_web_sm` model for sentence splitting. This is a good splitter, but it's also pretty slow, and in this case it's doing a lot of work that's not needed given that the spacy parse is then just thrown away. However, there is also a simple rules-based spacy sentencizer. Using this is at least an order of magnitude faster than using `en_core_web_sm` according to my local tests. Also, spacy sentence tokenization based on `en_core_web_sm` can be sped up in this case by not doing the NER stage. This shaves some cycles too, both when loading the model and when parsing the text. Consequently, this PR adds the option to use the basic spacy sentencizer, and it disables the NER stage for the current approach, which is kept as the default. Lastly, when extracting the tokenized sentences, the `text` attribute is called directly instead of doing the string conversion, which is IMO a bit more idiomatic.	2023-07-10 02:52:05 -04:00
charosen	50a9fcccb0	feat(module): add param ids to ElasticVectorSearch.from_texts method (#7425 ) # add param ids to ElasticVectorSearch.from_texts method. - Description: add param ids to ElasticVectorSearch.from_texts method. - Issue: NA. It seems `add_texts` already supports passing in document ids, but param `ids` is omitted in `from_texts` classmethod, - Dependencies: None, - Tag maintainer: @rlancemartin, @eyurtsev please have a look, thanks ``` # ElasticVectorSearch add_texts def add_texts( self, texts: Iterable[str], metadatas: Optional[List[dict]] = None, refresh_indices: bool = True, ids: Optional[List[str]] = None, kwargs: Any, ) -> List[str]: ... ``` ``` # ElasticVectorSearch from_texts @classmethod def from_texts( cls, texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, elasticsearch_url: Optional[str] = None, index_name: Optional[str] = None, refresh_indices: bool = True, kwargs: Any, ) -> ElasticVectorSearch: ``` Co-authored-by: charosen <charosen@bupt.cn>	2023-07-10 02:25:35 -04:00

... 3 4 5 6 7 ...

3259 Commits