langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-11 19:11:02 +00:00

Author	SHA1	Message	Date
Ricardo Reis	33ea606f45	Update youtube.py - Fix metadata validation error in YoutubeLoader (#5479 ) This commit addresses a ValueError occurring when the YoutubeLoader class tries to add datetime metadata from a YouTube video's publish date. The error was happening because the ChromaDB metadata validation only accepts str, int, or float data types. In the `_get_video_info` method of the `YoutubeLoader` class, the publish date retrieved from the YouTube video was of datetime type. This commit fixes the issue by converting the datetime object to a string before adding it to the metadata dictionary. Additionally, this commit introduces error handling in the `_get_video_info` method to ensure that all metadata fields have valid values. If a metadata field is found to be None, a default value is assigned. This prevents potential errors during metadata validation when metadata fields are None. The file modified in this commit is youtube.py. # Your PR Title (What it does) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) ## Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-03 16:56:17 -07:00
Shuqian	5af2c51e78	refactor: BaseStringMessagePromptTemplate from_template method (#5332 ) # refactor BaseStringMessagePromptTemplate from_template method Refactor the `from_template` method of the `BaseStringMessagePromptTemplate` class to allow passing keyword arguments to the `from_template` method of `PromptTemplate`. Enable the usage of arguments like `template_format`. In my scenario, I intend to utilize Jinja2 for formatting the human message prompt in the chat template. ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Models - @hwchase17 - @agola11 - @jonasalexander --> --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-03 16:55:58 -07:00
mbchang	d3bdb8ea6d	FileCallbackHandler (#5589 ) # like [StdoutCallbackHandler](https://github.com/hwchase17/langchain/blob/master/langchain/callbacks/stdout.py), but writes to a file When running experiments I have found myself wanting to log the outputs of my chains in a more lightweight way than using WandB tracing. This PR contributes a callback handler that writes to file what `StdoutCallbackHandler` would print. <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> ## Example Notebook <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> See the included `filecallbackhandler.ipynb` notebook for usage. Would it be better to include this notebook under `modules/callbacks` or under `integrations/`? ![image](https://github.com/hwchase17/langchain/assets/6439365/c624de0e-343f-4eab-a55b-8808a887489f) ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @agola11 <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 -->	2023-06-03 16:48:48 -07:00
rajib	1c51d3db0f	Created fix for 5475 (#5659 ) Created fix for 5475 Currently in PGvector, we do not have any function that returns the instance of an existing store. The from_documents always adds embeddings and then returns the store. This fix is to add a function that will return the instance of an existing store Also changed the jupyter example for PGVector to show the example of using the function <!-- Remove if not applicable --> Fixes # 5475 #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? @dev2049 @hwchase17 Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: rajib76 <rajib76@yahoo.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-03 16:47:52 -07:00
Michael Landis	475007d63a	fix: correct momento chat history notebook typo and title (#5646 ) This PR corrects a minor typo in the Momento chat message history notebook and also expands the title from "Momento" to "Momento Chat History", inline with other chat history storage providers. #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? cc @dev2049 who reviewed the original integration	2023-06-03 16:39:27 -07:00
Paul-Emile Brotons	92f218207b	removing client+namespace in favor of collection (#5610 ) removing client+namespace in favor of collection for an easier instantiation and to be similar to the typescript library @dev2049	2023-06-03 16:27:31 -07:00
Harrison Chase	ad09367a92	Harrison/pubmed integration (#5664 ) Co-authored-by: younis basher <71520361+younis-ba@users.noreply.github.com> Co-authored-by: Younis Bashir <younis@omicmd.com>	2023-06-03 16:25:28 -07:00
Harrison Chase	9921f8cc3a	Harrison/update azure nb (#5665 ) Co-authored-by: NEWTON MALLICK <38786893+N-E-W-T-O-N@users.noreply.github.com>	2023-06-03 16:25:08 -07:00
C.J. Jameson	4e71a1702b	nit: pgvector python example notebook, fix variable reference (#5595 ) # Your PR Title (What it does) Fixes the pgvector python example notebook : one of the variables was not referencing anything ## Before submitting ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: VectorStores / Retrievers / Memory - @dev2049	2023-06-03 15:29:34 -07:00
Leonid Ganeline	b201cfaa0f	docs `ecosystem/integrations` update 4 (#5590 ) # docs `ecosystem/integrations` update 4 Added missed integrations. Fixed inconsistencies. ## Who can review? @hwchase17 @dev2049	2023-06-03 15:29:03 -07:00
Davis Chase	ae3611730a	handle single arg to and/or (#5637 ) @ryderwishart @eyurtsev thoughts on handling this in the parser itself? related to #5570	2023-06-03 15:18:46 -07:00
khallbobo	934319fc28	Add parameters to send_message() call for vertexai chat models (PaLM2) (#5566 ) # Ensure parameters are used by vertexai chat models (PaLM2) The current version of the google aiplatform contains a bug where parameters for a chat model are not used as intended. See https://github.com/googleapis/python-aiplatform/issues/2263 Params can be passed both to start_chat() and send_message(); however, the parameters passed to start_chat() will not be used if send_message() is called without the overrides. This is due to the defaults in send_message() being global values rather than None (there is code in send_message() which would use the params from start_chat() if the param passed to send_message() evaluates to False, but that won't happen as the defaults are global values). Fixes # 5531 @hwchase17 @agola11	2023-06-03 15:17:38 -07:00
UmerHA	44ad9628c9	QuickFix for FinalStreamingStdOutCallbackHandler: Ignore new lines & white spaces (#5497 ) # Make FinalStreamingStdOutCallbackHandler more robust by ignoring new lines & white spaces `FinalStreamingStdOutCallbackHandler` doesn't work out of the box with `ChatOpenAI`, as it tokenized slightly differently than `OpenAI`. The response of `OpenAI` contains the tokens `["\nFinal", " Answer", ":"]` while `ChatOpenAI` contains `["Final", " Answer", ":"]`. This PR make `FinalStreamingStdOutCallbackHandler` more robust by ignoring new lines & white spaces when determining if the answer prefix has been reached. Fixes #5433 ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: Tracing / Callbacks - @agola11 Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) \| Discord: RicChilligerDude#7589	2023-06-03 15:05:58 -07:00
Nathan Azrak	1f4abb265a	Adds the option to pass the original prompt into the AgentExecutor for PlanAndExecute agents (#5401 ) # Adds the option to pass the original prompt into the AgentExecutor for PlanAndExecute agents This PR allows the user to optionally specify that they wish for the original prompt/objective to be passed into the Executor agent used by the PlanAndExecute agent. This solves a potential problem where the plan is formed referring to some context contained in the original prompt, but which is not included in the current prompt. Currently, the prompt format given to the Executor is: ``` System: Respond to the human as helpfully and accurately as possible. You have access to the following tools: <Tool and Action Description> <Output Format Description> Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:. Thought: Human: <Previous steps> <Current step> ``` This PR changes the final part after `Human:` to optionally insert the objective: ``` Human: <objective> <Previous steps> <Current step> ``` I have given a specific example in #5400 where the context of a database path is lost, since the plan refers to the "given path". The PR has been linted and formatted. So that existing behaviour is not changed, I have defaulted the argument to `False` and added it as the last argument in the signature, so it does not cause issues for any users passing args positionally as opposed to using keywords. Happy to take any feedback or make required changes! Fixes #5400 ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @vowelparrot --------- Co-authored-by: Nathan Azrak <nathan.azrak@gmail.com>	2023-06-03 14:59:09 -07:00
Felipe Ferreira	ae2cf1f598	Implements support for Personal Access Token Authentication in the ConfluenceLoader (#5385 ) # Implements support for Personal Access Token Authentication in the ConfluenceLoader Fixes #5191 Implements a new optional parameter for the ConfluenceLoader: `token`. This allows the use of personal access authentication when using the on-prem server version of Confluence. ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @eyurtsev @Jflick58 Twitter Handle: felipe_yyc --------- Co-authored-by: Felipe <feferreira@ea.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-03 14:57:49 -07:00
Gardner Bickford	b81f98b8a6	Update confluence.py to return spaces between elements (#5383 ) # Update confluence.py to return spaces between elements like headers and links. Please see https://stackoverflow.com/questions/48913975/how-to-return-nicely-formatted-text-in-beautifulsoup4-when-html-text-is-across-m Given: ```html <address> 183 Main St<br>East Copper<br>Massachusetts<br>U S A<br> MA 01516-113 </address> ``` The document loader currently returns: ``` '183 Main StEast CopperMassachusettsU S A MA 01516-113' ``` After this change, the document loader will return: ``` 183 Main St East Copper Massachusetts U S A MA 01516-113 ``` @eyurtsev would you prefer this to be an option that can be passed in?	2023-06-03 14:57:25 -07:00
Zeeland	b72401b47b	pref: reduce DB query error rate (#5339 ) # Reduce DB query error rate If you use sql agent of `SQLDatabaseToolkit` to query data, it is prone to errors in query fields and often uses fields that do not exist in database tables for queries. However, the existing prompt does not effectively make the agent aware that there are problems with the fields they query. At this time, we urgently need to improve the prompt so that the agent realizes that they have queried non-existent fields and allows them to use the `schema_sql_db`, that is,` ListSQLDatabaseTool` first queries the corresponding fields in the table in the database, and then uses `QuerySQLDatabaseTool` for querying. There is a demo of my project to show this problem. Original Agent ```python def create_mysql_kit(): db = SQLDatabase.from_uri("mysql+pymysql://xxxxxxx") llm = OpenAI(temperature=0) toolkit = SQLDatabaseToolkit(db=db, llm=llm) agent_executor = create_sql_agent( llm=OpenAI(temperature=0), toolkit=toolkit, verbose=True ) agent_executor.run("Who are the users of sysuser in this system? Tell me the username of all users") if __name__ == '__main__': create_mysql_kit() ``` original output ``` > Entering new AgentExecutor chain... Action: list_tables_sql_db Action Input: "" Observation: app_sysrole_menus, app_bimfacemodel, app_project_users, app_measuringpointdata, auth_user, auth_user_groups, django_apscheduler_djangojobexecution, app_project, app_elementpoint, django_apscheduler_djangojob, django_content_type, app_sysrole, django_admin_log, app_bimfaceaccount, app_measuringpoint_warning_thresholds, app_measuringpoint, app_company, auth_group_permissions, app_sysuser, app_sysuser_companies, app_sysmenu, app_datawarningthreshold, auth_group, auth_permission, app_datawarningrecord, auth_user_user_permissions, app_bimfaceaccount_bimface_models, django_migrations, app_measuringitem, django_session Thought: I should query the app_sysuser table to get the users in the system. Action: query_sql_db Action Input: SELECT username FROM app_sysuser LIMIT 10; Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'") [SQL: SELECT username FROM app_sysuser LIMIT 10;] (Background on this error at: https://sqlalche.me/e/20/e3q8) Thought: I should double check my query before executing it. Action: query_checker_sql_db Action Input: SELECT username FROM app_sysuser LIMIT 10; Observation: SELECT username FROM app_sysuser LIMIT 10; Thought: The query looks correct, so I should execute it. Action: query_sql_db Action Input: SELECT username FROM app_sysuser LIMIT 10; Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'") [SQL: SELECT username FROM app_sysuser LIMIT 10;] (Background on this error at: https://sqlalche.me/e/20/e3q8) Thought: I should double check my query before executing it. Action: query_checker_sql_db Action Input: SELECT name FROM app_sysuser LIMIT 10; Observation: SELECT name FROM app_sysuser LIMIT 10; Thought: The query looks correct, so I should execute it. Action: query_sql_db Action Input: SELECT name FROM app_sysuser LIMIT 10; Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'name' in 'field list'") [SQL: SELECT name FROM app_sysuser LIMIT 10;] (Background on this error at: https://sqlalche.me/e/20/e3q8) Thought: I should double check my query before executing it. Action: query_checker_sql_db Action Input: SELECT id, username FROM app_sysuser LIMIT 10; Observation: SELECT id, username FROM app_sysuser LIMIT 10; Thought: The query looks correct, so I should execute it. Action: query_sql_db Action Input: SELECT id, username FROM app_sysuser LIMIT 10; Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'id' in 'field list'") [SQL: SELECT id, username FROM app_sysuser LIMIT 10;] (Background on this error at: https://sqlalche.me/e/20/e3q8) Thought: I should double check my query before executing it. Action: query_checker_sql_db Action Input: SELECT user_id, username FROM app_sysuser LIMIT 10; Observation: SELECT user_id, username FROM app_sysuser WHERE user_id IS NOT NULL AND user_id BETWEEN 0 AND 10 UNION ALL SELECT user_id, username FROM app_sysuser WHERE username IS NOT NULL ORDER BY user_id ASC LIMIT 10; Thought: The query looks correct, so I should execute it. Action: query_sql_db Action Input: SELECT user_id, username FROM app_sysuser WHERE user_id IS NOT NULL AND user_id BETWEEN 0 AND 10 UNION ALL SELECT user_id, username FROM app_sysuser WHERE username IS NOT NULL ORDER BY user_id ASC LIMIT 10; Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'") [SQL: SELECT user_id, username FROM app_sysuser WHERE user_id IS NOT NULL AND user_id BETWEEN 0 AND 10 UNION ALL SELECT user_id, username FROM app_sysuser WHERE username IS NOT NULL ORDER BY user_id ASC LIMIT 10;] (Background on this error at: https://sqlalche.me/e/20/e3q8) Thought: I should double check my query before executing it. Action: query_checker_sql_db Action Input: SELECT user_id, user_name FROM app_sysuser LIMIT 10; Observation: SELECT user_id, user_name FROM app_sysuser WHERE user_id NOT IN (NULL) AND user_name BETWEEN 'A' AND 'Z' UNION ALL SELECT user_id, user_name FROM app_sysuser WHERE user_id IS NOT NULL AND user_name LIKE '%a%' LIMIT 10; Thought: The query looks correct, so I should execute it. Action: query_sql_db Action Input: SELECT user_id, user_name FROM app_sysuser WHERE user_id NOT IN (NULL) AND user_name BETWEEN 'A' AND 'Z' UNION ALL SELECT user_id, user_name FROM app_sysuser WHERE user_id IS NOT NULL AND user_name LIKE '%a%' LIMIT 10; Observation: [] Thought: I now know the final answer. Final Answer: There are no users in the sysuser table. > Finished chain. Process finished with exit code 0 ``` As you can see, agent always use incorrect sql to query the data. Actually, the username field name of my `app_sysuser` table is `user_name`, not the `username` believed by the agent. Optimized agent output ``` > Entering new AgentExecutor chain... Action: list_tables_sql_db Action Input: "" Observation: app_project_users, django_content_type, app_measuringpoint_warning_thresholds, app_bimfaceaccount_bimface_models, django_session, app_sysuser, django_migrations, app_bimfacemodel, app_elementpoint, app_measuringpoint, django_apscheduler_djangojobexecution, auth_permission, app_sysuser_companies, app_measuringpointdata, app_measuringitem, app_sysrole_menus, django_admin_log, auth_group, django_apscheduler_djangojob, app_sysmenu, app_project, app_bimfaceaccount, app_datawarningthreshold, app_datawarningrecord, auth_user_groups, auth_user_user_permissions, auth_group_permissions, auth_user, app_company, app_sysrole Thought: I should query the app_sysuser table to get the usernames of all the users. Action: query_sql_db Action Input: SELECT username FROM app_sysuser Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'") [SQL: SELECT username FROM app_sysuser] (Background on this error at: https://sqlalche.me/e/20/e3q8) Thought: I should use the schema_sql_db tool to get the correct table fields. Action: schema_sql_db Action Input: app_sysuser Observation: CREATE TABLE app_sysuser ( user_id INTEGER NOT NULL AUTO_INCREMENT, user_account VARCHAR(50) COLLATE utf8mb4_general_ci NOT NULL, user_password VARCHAR(255) COLLATE utf8mb4_general_ci NOT NULL, user_name VARCHAR(50) COLLATE utf8mb4_general_ci NOT NULL, user_avatar VARCHAR(255) COLLATE utf8mb4_general_ci, user_email VARCHAR(254) COLLATE utf8mb4_general_ci, user_phone VARCHAR(20) COLLATE utf8mb4_general_ci, user_create_time DATETIME(6) NOT NULL, user_update_time DATETIME(6) NOT NULL, user_status SMALLINT, role_id_id INTEGER, PRIMARY KEY (user_id), CONSTRAINT app_sysuser_role_id_id_51b50f39_fk_app_sysrole_role_id FOREIGN KEY(role_id_id) REFERENCES app_sysrole (role_id) )COLLATE utf8mb4_general_ci DEFAULT CHARSET=utf8mb4 ENGINE=InnoDB /* 3 rows from app_sysuser table: user_id user_account user_password user_name user_avatar user_email user_phone user_create_time user_update_time user_status role_id_id xxxxxxxxxxxxxx */ Thought: I should query the app_sysuser table to get the usernames of all the users. Action: query_sql_db Action Input: SELECT user_account FROM app_sysuser LIMIT 10 Observation: [('baiyun',), ('eatrice',), ('lisi',), ('pingxiang',), ('wangwu',), ('zeeland',), ('zsj',), ('zzw',)] Thought: I now know the final answer Final Answer: The usernames of the users in the sysuser table are baiyun, eatrice, lisi, pingxiang, wangwu, zeeland, zsj, and zzw. > Finished chain. Process finished with exit code 0 ``` I have tested about 10 related prompts and they all work properly, with a much lower error rate compared to before ## Who can review? @vowelparrot --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-03 14:55:56 -07:00
mbchang	ce6dbe41a9	minor refactor GenerativeAgentMemory (#5315 ) # minor refactor of GenerativeAgentMemory <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. --> <!-- Remove if not applicable --> - refactor `format_memories_detail` to be more reusable - modified prompts for getting topics for reflection and for generating insights - update `characters.ipynb` to reflect changes ## Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> @vowelparrot @hwchase17 @dev2049	2023-06-03 14:53:14 -07:00
Leonid Ganeline	95c6ed0568	docs: `modules` pages simplified (#5116 ) # docs: modules pages simplified Fixied #5627 issue Merged several repetitive sections in the `modules` pages. Some texts, that were hard to understand, were also simplified. ## Who can review? @hwchase17 @dev2049	2023-06-03 14:44:32 -07:00
Chandan Routray	bc875a9df1	Fixed multi input prompt for MapReduceChain (#4979 ) # Fixed multi input prompt for MapReduceChain Added `kwargs` support for inner chains of `MapReduceChain` via `from_params` method Currently the `from_method` method of intialising `MapReduceChain` chain doesn't work if prompt has multiple inputs. It happens because it uses `StuffDocumentsChain` and `MapReduceDocumentsChain` underneath, both of them require specifying `document_variable_name` if `prompt` of their `llm_chain` has more than one `input`. With this PR, I have added support for passing their respective `kwargs` via the `from_params` method. ## Fixes https://github.com/hwchase17/langchain/issues/4752 ## Who can review? @dev2049 @hwchase17 @agola11 --------- Co-authored-by: imeckr <chandanroutray2012@gmail.com>	2023-06-03 14:41:03 -07:00
Matt Robinson	a97e4252e3	feat: add `UnstructuredExcelLoader` for `.xlsx` and `.xls` files (#5617 ) # Unstructured Excel Loader Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files. Works with `unstructured>=0.6.7`. A plain text representation of the Excel file will be available under the `page_content` attribute in the doc. If you use the loader in `"elements"` mode, an HTML representation of the Excel file will be available under the `text_as_html` metadata key. Each sheet in the Excel document is its own document. ### Testing ```python from langchain.document_loaders import UnstructuredExcelLoader loader = UnstructuredExcelLoader( "example_data/stanley-cups.xlsx", mode="elements" ) docs = loader.load() ``` ## Who can review? @hwchase17 @eyurtsev	2023-06-03 12:44:12 -07:00
Leonid Ganeline	9a7488a5ce	fix import issue (#5636 ) # fix for the import issue Added document loader classes from [`figma`, `iugu`, `onedrive_file`] to `document_loaders/__inti__.py` imports Also sorted `__all__` Fixed #5623 issue	2023-06-02 14:58:41 -07:00
Zander Chase	20ec1173f4	Update Tracer Auth / Reduce Num Calls (#5517 ) Update the session creation and calls --------- Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-06-02 12:13:56 -07:00
Sean Morgan	949729ff5c	Fix bedrock llm boto3 client instantiation (#5629 ) Same issue as https://github.com/hwchase17/langchain/pull/5574	2023-06-02 12:04:49 -07:00
Caleb Ellington	c5a7a85a4e	fix chroma update_document to embed entire documents, fixes a characer-wise embedding bug (#5584 ) # Chroma update_document full document embeddings bugfix Chroma update_document takes a single document, but treats the page_content sting of that document as a list when getting the new document embedding. This is a two-fold problem, where the resulting embedding for the updated document is incorrect (it's only an embedding of the first character in the new page_content) and it calls the embedding function for every character in the new page_content string, using many tokens in the process. Fixes #5582 Co-authored-by: Caleb Ellington <calebellington@Calebs-MBP.hsd1.ca.comcast.net>	2023-06-02 11:12:48 -07:00
Davis Chase	3c6fa9126a	bump 189 (#5620 )	2023-06-02 09:09:22 -07:00
Davis Chase	d784401215	Dev2049/add argilla callback (#5621 ) Co-authored-by: Alvaro Bartolome <alvarobartt@gmail.com> Co-authored-by: Daniel Vila Suero <daniel@argilla.io> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>	2023-06-02 09:05:06 -07:00
Kacper Łukawski	71a7c16ee0	Fix: Qdrant ids (#5515 ) # Fix Qdrant ids creation There has been a bug in how the ids were created in the Qdrant vector store. They were previously calculated based on the texts. However, there are some scenarios in which two documents may have the same piece of text but different metadata, and that's a valid case. Deduplication should be done outside of insertion. It has been fixed and covered with the integration tests. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-02 08:57:34 -07:00
Jeff Vestal	d1f65d8dc1	Es knn index search 5346 (#5569 ) # Create elastic_vector_search.ElasticKnnSearch class This extends `langchain/vectorstores/elastic_vector_search.py` by adding a new class `ElasticKnnSearch` Features: - Allow creating an index with the `dense_vector` mapping compataible with kNN search - Store embeddings in index for use with kNN search (correct mapping creates HNSW data structure) - Perform approximate kNN search - Perform hybrid BM25 (`query{}`) + kNN (`knn{}`) search - perform knn search by either providing a `query_vector` or passing a hosted `model_id` to use query_vector_builder to automatically generate a query_vector at search time Connection options - Using `cloud_id` from Elastic Cloud - Passing elasticsearch client object search options - query - k - query_vector - model_id - size - source - knn_boost (hybrid search) - query_boost (hybrid search) - fields This also adds examples to `docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb` Fixes # [5346](https://github.com/hwchase17/langchain/issues/5346) cc: @dev2049 --> --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-02 08:40:35 -07:00
Davis Chase	8b3df18bcc	human approval callback (#5581 ) ![Screenshot 2023-06-01 at 2 39 40 PM](https://github.com/hwchase17/langchain/assets/130488702/769f1480-7e51-46d9-bcde-698d0b091803)	2023-06-02 06:59:33 -07:00
Zander Chase	6655f43282	Rm Template Title (#5616 ) Remove the redundant title from the PR template #### Before submitting	2023-06-02 06:54:55 -07:00
Bharat Ramanathan	28d6277396	docs(integration): update colab and external links in WandbTracing docs (#5602 ) # Update Wandb Tracking documentation This PR updates the Wandb Tracking documentation for formatting, updated broken links and colab notebook links --------- Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>	2023-06-02 02:58:42 -07:00
Waldecir Santos	db45970a66	Fix SQLAlchemy truncating text when it is too big (#5206 ) # Fixes SQLAlchemy truncating the result if you have a big/text column with many chars. SQLAlchemy truncates columns if you try to convert a Row or Sequence to a string directly For comparison: - Before: ```[('Harrison', 'That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio ... (2 characters truncated) ... hat is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio ')]``` - After: ```[('Harrison', 'That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio ')]``` ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: I'm not sure who to tag for chains, maybe @vowelparrot ?	2023-06-01 21:33:31 -04:00
Davis Chase	4c572ffe95	nit (#5578 )	2023-06-01 14:21:15 -07:00
sseide	001b147450	Documentation fixes (linting and broken links) (#5563 ) # Lint sphinx documentation and fix broken links This PR lints multiple warnings shown in generation of the project documentation (using "make docs_linkcheck" and "make docs_build"). Additionally documentation internal links to (now?) non-existent files are modified to point to existing documents as it seemed the new correct target. The documentation is not updated content wise. There are no source code changes. Fixes # (issue) - broken documentation links to other files within the project - sphinx formatting (linting) ## Before submitting No source code changes, so no new tests added. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-01 13:06:17 -07:00
Sean Morgan	8441cff1d7	Fix bedrock auth validation (#5574 ) https://github.com/hwchase17/langchain/pull/5523 has a small bug if client was not passed in constructor	2023-06-01 12:35:06 -07:00
Andrew Lei	6258f72a00	Add missing comma in conv chat agent prompt json (#5573 ) # Add missing comma in conversational chat agent prompt json Inspired by: https://github.com/hwchase17/langchainjs/pull/1498	2023-06-01 12:12:44 -07:00
Ikko Eltociear Ashimine	14a611775c	Fix typo in docugami.ipynb (#5571 ) # Fix typo in docugami.ipynb Fixed typo. infromation -> information	2023-06-01 11:45:56 -07:00
Blithe	80b3fdf2f7	make the elasticsearch api support version which below 8.x (#5495 ) the api which create index or search in the elasticsearch below 8.x is different with 8.x. When use the es which below 8.x , it will throw error. I fix the problem Co-authored-by: gaofeng27692 <gaofeng27692@hundsun.com>	2023-06-01 10:58:20 -07:00
Davis Chase	6632188606	bump 188 (#5568 )	2023-06-01 08:50:54 -07:00
Davis Chase	6afb463e9b	Qdrant self query (#5567 ) Add self query abilities to qdrant vectorstore	2023-06-01 08:40:31 -07:00
Patrick Keane	47c2ec2d0b	Corrects inconsistently misspelled variable name. (#5559 ) Corrects a spelling error (of the word separator) in several variable names. Three cut/paste instances of this were corrected, amidst instances of it also being named properly, which would likely would lead to issues for someone in the future. Here is one such example: ``` seperators = self.get_separators_for_language(Language.PYTHON) super().__init__(separators=seperators, kwargs) ``` becomes ``` separators = self.get_separators_for_language(Language.PYTHON) super().__init__(separators=separators, kwargs) ``` Make test results below: ``` ============================== 708 passed, 52 skipped, 27 warnings in 11.70s ============================== ```	2023-06-01 10:27:58 -04:00
Harrison Chase	342b671d05	add brave search util (#5538 ) Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-01 01:11:51 -07:00
Davis Chase	983a213bdc	add maxcompute (#5533 ) cc @pengwork (fresh branch, no creds)	2023-06-01 00:54:42 -07:00
Bharat Ramanathan	22603d19e0	feat(integrations): Add WandbTracer (#4521 ) # WandbTracer This PR adds the `WandbTracer` and deprecates the existing `WandbCallbackHandler`. Added an example notebook under the docs section alongside the `LangchainTracer` Here's an example [colab](https://colab.research.google.com/drive/1pY13ym8ENEZ8Fh7nA99ILk2GcdUQu0jR?usp=sharing) with the same notebook and the [trace](https://wandb.ai/parambharat/langchain-tracing/runs/8i45cst6) generated from the colab run Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-06-01 00:01:19 -07:00
Leonid Ganeline	373ad49157	docs `ecosystem/integrations` update 3 (#5470 ) # docs: `ecosystem_integrations` update 3 Next cycle of updating the `ecosystem/integrations` * Added an integration `template` file * Added missed integration files * Fixed several document_loaders/notebooks ## Who can review? Is it possible to assign somebody to review PRs on docs? Thanks.	2023-05-31 17:54:05 -07:00
Aditi Viswanathan	bc66b3fb8d	make BaseEntityStore inherit from BaseModel (#5478 ) # Make BaseEntityStore inherit from BaseModel This enables initializing InMemoryEntityStore by optionally passing in a value for the store field. ## Who can review? It's a small change so I think any of the reviewers can review, but tagging @dev2049 who seems most relevant since the change relates to Memory.	2023-05-31 17:32:19 -07:00
Sheng Han Lim	3bae595182	Add texts with embeddings to PGVector wrapper (#5500 ) Similar to #1813 for faiss, this PR is to extend functionality to pass text and its vector pair to initialize and add embeddings to the PGVector wrapper. Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: - @dev2049	2023-05-31 17:31:52 -07:00
Tobias van der Werff	8d07ba0d51	Fix wrong class instantiation in docs MMR example (#5501 ) # Fix wrong class instantiation in docs MMR example <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> When looking at the Maximal Marginal Relevance ExampleSelector example at https://python.langchain.com/en/latest/modules/prompts/example_selectors/examples/mmr.html, I noticed that there seems to be an error. Initially, the `MaxMarginalRelevanceExampleSelector` class is used as an `example_selector` argument to the `FewShotPromptTemplate` class. Then, according to the text, a comparison is made to regular similarity search. However, the `FewShotPromptTemplate` still uses the `MaxMarginalRelevanceExampleSelector` class, so the output is the same. To fix it, I added an instantiation of the `SemanticSimilarityExampleSelector` class, because this seems to be what is intended. ## Who can review? @hwchase17	2023-05-31 17:30:59 -07:00
Taras Tsugrii	b61f50665e	[retrievers][knn] Replace loop appends with list comprehension. (#5529 ) # Replace loop appends with list comprehension. It's much faster, more idiomatic and slightly more readable.	2023-05-31 16:57:24 -07:00

... 3 4 5 6 7 ...

2529 Commits