langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Harrison Chase	705431aecc	big docs refactor (#1978 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-03-26 19:49:46 -07:00
Mario Kostelac	e7d6de6b1c	(ChatOpenAI) Add model_name to LLMResult.llm_output (#1960 ) This makes sure OpenAI and ChatOpenAI have the same llm_output, and allow tracking usage per model. Same work for OpenAI was done in https://github.com/hwchase17/langchain/pull/1713.	2023-03-24 08:51:16 -07:00
Harrison Chase	6e1b5b8f7e	Harrison/figma doc loader (#1908 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	2023-03-22 19:57:46 -07:00
Eli	12f868b292	Propagate "filter" arg in Chroma similarity_search (#1869 ) Technically a duplicate fix to #1619 but with unit tests and a small documentation update - Propagate `filter` arg in Chroma `similarity_search` to delegated call to `similarity_search_with_score` - Add `filter` arg to `similarity_search_by_vector` - Clarify doc strings on FakeEmbeddings	2023-03-22 19:40:10 -07:00
Maurício Maia	f155d9d3ec	Add metadata filter to PGVector search (#1872 ) Add ability to filter pgvector documents by metadata.	2023-03-22 15:21:40 -07:00
Maurício Maia	2212520a6c	Add PGVector collection metadata (#1887 ) The `CollectionStore` for `PGVector` has a `cmetadata` field but it's never used. This PR add the ability to save metadata information to the collection.	2023-03-22 11:27:07 -07:00
Harrison Chase	ce5d97bcb3	Harrison/guarded output parser (#1804 ) Co-authored-by: jerwelborn <jeremy.welborn@gmail.com>	2023-03-21 22:07:23 -07:00
Matt Tucker	a92344f476	Use regex match for bash process error output test assertion. (#1837 ) I was getting the same issue reported in #1339 by [MacYang555](https://github.com/MacYang555) when running the test suite on my Mac. I implemented the fix they suggested to use a regex match in the output assertion for the scenario under test. Resolves #1339	2023-03-21 09:06:52 -07:00
LeoGrin	3701b2901e	use namespace argument in Pinecone constructor (#1757 ) Fix #1756 Use the `namespace` argument of `Pinecone.from_exisiting_index` to set the default value of `namespace` for other methods. Leads to more expected behavior and easier integration in chains. For the test, I've added a line to delete and rebuild the `langchain-demo` index at the beginning of the test. I'm not 100% sure if it's a good idea but it makes the test reproducible.	2023-03-18 19:55:38 -07:00
Mario Kostelac	aff44d0a98	(OpenAI) Add model_name to LLMResult.llm_output (#1713 ) Given that different models have very different latencies and pricings, it's benefitial to pass the information about the model that generated the response. Such information allows implementing custom callback managers and track usage and price per model. Addresses https://github.com/hwchase17/langchain/issues/1557.	2023-03-16 21:55:55 -07:00
Daniel Chalef	b157e0c1c3	Add HTML document_loader that includes page title metadata (#1720 ) This `BSHTMLLoader` document_loader loads an HTML document, extracts text and adds the page title to the returned Document's metadata. The loader uses the already installed bs4 package to extract both text content and the page title. Included in this PR is an example HTML file and an integration test that tests against this file. --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org>	2023-03-16 21:47:17 -07:00
Kacper Łukawski	4a327dd1d6	Implement basic metadata filtering in Qdrant (#1689 ) This PR implements a basic metadata filtering mechanism similar to the ones in Chroma and Pinecone. It still cannot express complex conditions, as there are no operators, but some users requested to have that feature available.	2023-03-15 07:31:39 -07:00
Harrison Chase	0b29e68c17	Harrison/pgvector (#1679 ) Co-authored-by: Aman Kumar <krsingh.aman@gmail.com>	2023-03-14 21:13:58 -07:00
Xin Qiu	4e13cef05a	feat: add redisearch vectorstore (#1307 ) # Description Add `RediSearch` vectorstore for LangChain RediSearch: [RediSearch quick start](https://redis.io/docs/stack/search/quick_start/) # How to use ``` from langchain.vectorstores.redisearch import RediSearch rds = RediSearch.from_documents(docs, embeddings,redisearch_url="redis://localhost:6379") ```	2023-03-14 18:06:03 -07:00
Jon Luo	0a1b1806e9	sql: do not hard code the LIMIT clause in the table_info section (#1563 ) Seeing a lot of issues in Discord in which the LLM is not using the correct LIMIT clause for different SQL dialects. ie, it's using `LIMIT` for mssql instead of `TOP`, or instead of `ROWNUM` for Oracle, etc. I think this could be due to us specifying the LIMIT statement in the example rows portion of `table_info`. So the LLM is seeing the `LIMIT` statement used in the prompt. Since we can't specify each dialect's method here, I think it's fine to just replace the `SELECT... LIMIT 3;` statement with `3 rows from table_name table:`, and wrap everything in a block comment directly following the `CREATE` statement. The Rajkumar et al paper wrapped the example rows and `SELECT` statement in a block comment as well anyway. Thoughts @fpingham?	2023-03-13 23:08:27 -07:00
Tim Asp	b3234bf3b0	cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615 ) `OnlinePDFLoader` and `PagedPDFSplitter` lived separate from the rest of the pdf loaders. Because they're all similar, I propose moving all to `pdy.py` and the same docs/examples page. Additionally, `PagedPDFSplitter` naming doesn't match the pattern the rest of the loaders follow, so I renamed to `PyPDFLoader` and had it inherit from `BasePDFLoader` so it can now load from remote file sources.	2023-03-13 23:06:50 -07:00
Luis	562d9891ea	Add regex dict: (#1616 ) This class enables us to send a dictionary containing an output key and the expected format, which in turn allows us to retrieve the result of the matching formats and extract specific information from it. To exclude irrelevant information from our return dictionary, we can prompt the LLM to use a specific command that notifies us when it doesn't know the answer. We refer to this variable as the "no_update_value". Regarding the updated regular expression pattern (r"{}:\s?([^.'\n']).?"), it enables us to retrieve a format as 'Output Key':'value'. We have improved the regex by adding an optional space between ':' and 'value' with "s?", and by excluding points and line jumps from the matches using "[^.'\n']".	2023-03-13 23:05:39 -07:00
Harrison Chase	aed9f9febe	Harrison/return intermediate (#1633 ) Co-authored-by: Mario Kostelac <mario@intercom.io>	2023-03-13 07:54:29 -07:00
yakigac	acd86d33bc	Add read only shared memory (#1491 ) Provide shared memory capability for the Agent. Inspired by #1293 . ## Problem If both Agent and Tools (i.e., LLMChain) use the same memory, both of them will save the context. It can be annoying in some cases. ## Solution Create a memory wrapper that ignores the save and clear, thereby preventing updates from Agent or Tools.	2023-03-12 09:34:36 -07:00
Harrison Chase	c9b5a30b37	move output parsing (#1605 )	2023-03-11 16:41:03 -08:00
Harrison Chase	cb04ba0136	Add support for intermediate steps to SQLDatabaseSequentialChain (#1583 ) (#1601 ) for https://github.com/hwchase17/langchain/issues/1582 I simply added the `return_intermediate_steps` and changed the `output_keys` function. I added 2 simple tests, 1 for SQLDatabaseSequentialChain without the intermediate steps and 1 with Co-authored-by: brad-nemetski <115185478+brad-nemetski@users.noreply.github.com>	2023-03-11 15:44:41 -08:00
Harrison Chase	f95d551f7a	Harrison/shallow metadata (#1599 ) Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>	2023-03-11 09:18:25 -08:00
Harrison Chase	9f78717b3c	Harrison/callbacks (#1587 )	2023-03-10 12:53:09 -08:00
Harrison Chase	3ee32a01ea	Harrison/prompt layer (#1547 ) Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com> Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>	2023-03-08 21:24:27 -08:00
Harrison Chase	c844d1fd46	Harrison/chunk size (#1549 ) Co-authored-by: Florian Leuerer <31259070+floleuerer@users.noreply.github.com>	2023-03-08 21:24:18 -08:00
Harrison Chase	357d808484	Harrison/remote paths pdf (#1544 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-08 20:53:37 -08:00
Harrison Chase	cc423f40f1	Harrison/youtube loader (#1545 ) Co-authored-by: Julian Wustl <57504258+Julianwustl@users.noreply.github.com>	2023-03-08 20:53:27 -08:00
Harrison Chase	7ade419a0e	allow passing of messages into prompt template (#1505 )	2023-03-07 21:10:12 -08:00
Harrison Chase	064741db58	Harrison/fix text splitter (#1511 ) Co-authored-by: ajaysolanky <ajsolanky@gmail.com> Co-authored-by: Ajay Solanky <ajaysolanky@saw-l14668307kd.myfiosgateway.com>	2023-03-07 15:42:28 -08:00
Ankush Gola	27104d4921	fix `ChatOpenAI.agenerate` (#1504 )	2023-03-07 15:22:05 -08:00
Harrison Chase	7bec461782	Harrison/memory refactor (#1478 ) moves memory to own module, factors out common stuff	2023-03-07 07:59:37 -08:00
Harrison Chase	0e21463f07	(rfc) chat models (#1424 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-03-06 08:34:24 -08:00
Harrison Chase	63a5614d23	Harrison/simple memory (#1435 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-04 08:15:52 -08:00
Harrison Chase	a1b9dfc099	Harrison/similarity search chroma (#1434 ) Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>	2023-03-04 08:10:15 -08:00
Tim Asp	23231d65a9	Add PyMuPDF PDF loader (#1426 ) Different PDF libraries have different strengths and weaknesses. PyMuPDF does a good job at extracting the most amount of content from the doc, regardless of the source quality, extremely fast (especially compared to Unstructured). https://pymupdf.readthedocs.io/en/latest/index.html	2023-03-03 20:59:28 -08:00
Nuno Campos	499e76b199	Allow the regular openai class to be used for ChatGPT models (#1393 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-03-02 09:04:18 -08:00
Kacper Łukawski	9ac442624c	Add Qdrant named arguments (#1386 ) This PR: - Increases `qdrant-client` version to 1.0.4 - Introduces custom content and metadata keys (as requested in #1087) - Moves all the `QdrantClient` parameters into the method parameters to simplify code completion	2023-03-02 07:05:14 -08:00
Ankush Gola	fe30be6fba	add async and streaming support to `OpenAIChat` (#1378 ) title says it all	2023-03-01 21:55:43 -08:00
Harrison Chase	1cd8996074	Harrison/summarizer chain (#1356 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-01 20:59:07 -08:00
Ankush Gola	82baecc892	Add a SQL agent for interacting with SQL Databases and JSON Agent for interacting with large JSON blobs (#1150 ) This PR adds * `ZeroShotAgent.as_sql_agent`, which returns an agent for interacting with a sql database. This builds off of `SQLDatabaseChain`. The main advantages are 1) answering general questions about the db, 2) access to a tool for double checking queries, and 3) recovering from errors * `ZeroShotAgent.as_json_agent` which returns an agent for interacting with json blobs. * Several examples in notebooks --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-02-28 19:44:39 -08:00
Harrison Chase	786852e9e6	partial variables (#1308 )	2023-02-28 08:40:35 -08:00
Tim Asp	72ef69d1ba	Add new iFixit document loader (#1333 ) iFixit is a wikipedia-like site that has a huge amount of open content on how to fix things, questions/answers for common troubleshooting and "things" related content that is more technical in nature. All content is licensed under CC-BY-SA-NC 3.0 Adding docs from iFixit as context for user questions like "I dropped my phone in water, what do I do?" or "My macbook pro is making a whining noise, what's wrong with it?" can yield significantly better responses than context free response from LLMs.	2023-02-27 20:40:20 -08:00
Harrison Chase	166cda2cc6	Harrison/deeplake (#1316 ) Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-02-26 22:35:04 -08:00
Harrison Chase	aaad6cc954	Harrison/atlas db (#1315 ) Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>	2023-02-26 22:11:38 -08:00
Enrico Shippole	9becdeaadf	Add Writer, Banana, Modal, StochasticAI (#1270 ) Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI Added rigid json format for Banana and Modal	2023-02-24 06:58:58 -08:00
Dennis Antela Martinez	53c67e04d4	add aleph alpha llm (#1207 ) Integrate Aleph Alpha's client into Langchain to provide access to the luminous models - more info on latest benchmarks here: https://www.aleph-alpha.com/luminous-performance-benchmarks	2023-02-22 10:37:36 -08:00
Harrison Chase	b7708bbec6	rfc: callback changes (#1165 ) conceptually, no reason a tool should know what an "agent action" is unless any objections, can change in all callback handlers	2023-02-20 22:54:15 -08:00
Harrison Chase	44c8d8a9ac	move serpapi wrapper (#1199 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-02-20 21:15:45 -08:00
Naveen Tatikonda	0118706fd6	Add Support for OpenSearch Vector database (#1191 ) ### Description This PR adds a wrapper which adds support for the OpenSearch vector database. Using opensearch-py client we are ingesting the embeddings of given text into opensearch cluster using Bulk API. We can perform the `similarity_search` on the index using the 3 popular searching methods of OpenSearch k-NN plugin: - `Approximate k-NN Search` use approximate nearest neighbor (ANN) algorithms from the [nmslib](https://github.com/nmslib/nmslib), [faiss](https://github.com/facebookresearch/faiss), and [Lucene](https://lucene.apache.org/) libraries to power k-NN search. - `Script Scoring` extends OpenSearch’s script scoring functionality to execute a brute force, exact k-NN search. - `Painless Scripting` adds the distance functions as painless extensions that can be used in more complex combinations. Also, supports brute force, exact k-NN search like Script Scoring. ### Issues Resolved https://github.com/hwchase17/langchain/issues/1054 --------- Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-02-20 18:39:34 -08:00
Andrew White	c5015d77e2	Allow k to be higher than doc size in max_marginal_relevance_search (#1187 ) Fixes issue #1186. For some reason, #1117 didn't seem to fix it.	2023-02-20 16:39:13 -08:00

1 2 3 4

198 Commits