langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
Harrison Chase	7bec461782	Harrison/memory refactor (#1478 ) moves memory to own module, factors out common stuff	2023-03-07 07:59:37 -08:00
kahkeng	df6865cd52	Allow no token limit for ChatGPT API (#1481 ) The endpoint default is inf if we don't specify max_tokens, so unlike regular completion API, we don't need to calculate this based on the prompt.	2023-03-06 13:18:55 -08:00
Harrison Chase	312c319d8b	bump version to 102 (#1471 )	2023-03-06 10:50:44 -08:00
Harrison Chase	0e21463f07	(rfc) chat models (#1424 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-03-06 08:34:24 -08:00
Juanky Soriano	dec3750875	Change method to calculate number of tokens for OpenAIChat (#1457 ) Solves https://github.com/hwchase17/langchain/issues/1412 Currently `OpenAIChat` inherits the way it calculates the number of tokens, `get_num_token`, from `BaseLLM`. On the other hand, `OpenAI` inherits from `BaseOpenAI`. `BaseOpenAI` and `BaseLLM` use different methodologies for doing this. The first relies on `tiktoken` while the second relies on `GPT2TokenizerFast`. The motivation of this PR is: 1. Bring consistency to the way the number of tokens is calculated (`get_num_token`) across the `OpenAI` family, regardless of `Chat` vs `non Chat` scenarios. 2. Give preference to the `tiktoken` method as it's serverless friendly. It doesn't require downloading models, a step that can be incompatible with `readonly` filesystems.	2023-03-06 07:20:25 -08:00
Tim Asp	763f879536	fix always verbose on summarization checker (#1440 )	2023-03-05 07:10:08 -08:00
Harrison Chase	56b850648f	cr (#1436 )	2023-03-04 08:38:56 -08:00
Harrison Chase	63a5614d23	Harrison/simple memory (#1435 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-04 08:15:52 -08:00
Harrison Chase	a1b9dfc099	Harrison/similarity search chroma (#1434 ) Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>	2023-03-04 08:10:15 -08:00
Peng Qu	68ce68f290	Fix an unusual issue that occurs when using OpenAIChat for llm_math (#1410 ) Fix an issue that occurs when using OpenAIChat for llm_math, following the code style of the "Final Answer:" handling in Mrkl. When I tried a question in Chinese with OpenAIChat for llm_math, the model generated output in the format "\n\nQuestion: What is the square of 29?\nAnswer: 841": it translates the question first, then answers. Screenshot: https://user-images.githubusercontent.com/82029664/222642193-10ecca77-db7b-4759-bc46-32a8f8ddc48f.png	2023-03-04 07:56:07 -08:00
Ikko Eltociear Ashimine	b8a7828d1f	Update huggingface_datasets.ipynb (#1417 ) HuggingFace -> Hugging Face	2023-03-04 00:22:31 -08:00
Kentaro Tanaka	6a4ee07e4f	Fix type hint of 'vectorstore_cls' arg in `SemanticSimilarityExampleSelector` (#1427 ) Hello! Thank you for the amazing library you've created! While following the tutorial at [the link(`Using an example selector`)](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/few_shot_examples.html#using-an-example-selector), I noticed that passing Chroma as an argument to from_examples results in a type hint error. Error message(mypy): ``` Argument 3 to "from_examples" of "SemanticSimilarityExampleSelector" has incompatible type "Type[Chroma]"; expected "VectorStore" [arg-type]mypy(error) ``` This pull request fixes the type hint and allows the VectorStore class to be specified as an argument.	2023-03-04 00:20:18 -08:00
Tim Asp	23231d65a9	Add PyMuPDF PDF loader (#1426 ) Different PDF libraries have different strengths and weaknesses. PyMuPDF does a good job of extracting the most content from the doc, regardless of the source quality, and is extremely fast (especially compared to Unstructured). https://pymupdf.readthedocs.io/en/latest/index.html	2023-03-03 20:59:28 -08:00
blob42	3d54b05863	searx: add install instructions, update doc and notebooks (#1420 ) - Added instructions on setting up self hosted searx - Add notebook example with agent - Use `localhost:8888` as example url to stay consistent since public instances are not really usable. Co-authored-by: blob42 <spike@w530>	2023-03-03 20:57:50 -08:00
Tim Asp	bca0935d90	[docs] fix minor import error (#1425 )	2023-03-03 16:10:07 -08:00
Jon Luo	882f7964fb	fix sql misinterpretation of % in query (#1408 ) % is being misinterpreted by sqlalchemy as parameter passing, so any `LIKE 'asdf%'` will result in a value error with mysql, mariadb, and maybe some others. This is one way to fix it - the alternative is to simply double up %, like `LIKE 'asdf%%'` but this seemed cleaner in terms of output. Fixes #1383	2023-03-02 16:03:16 -08:00
JonLuca De Caro	443992c4d5	[Docs] Add missing word from prompt docs (#1406 ) The prompt in the first example of the quickstart guide was missing `for `	2023-03-02 16:02:54 -08:00
Eugene Yurtsev	a83a371069	Minor documentation update in initialize_agent (#1397 ) Updating documentation in initialize_agent. One thing that could benefit from further clarification is the responsibility breakdown between an AgentExecutor and an Agent. The documentation for an AgentExecutor does not clarify that. From the class attributes, it appears that the executor has access to the tools, while the agent is only aware of the tool names. Anyway, additional clarification would be beneficial on the AgentExecutor class.	2023-03-02 11:46:35 -08:00
Nuno Campos	499e76b199	Allow the regular openai class to be used for ChatGPT models (#1393 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-03-02 09:04:18 -08:00
Kacper Łukawski	8947797250	Return Cohere embeddings as lists of floats (#1394 ) This PR fixes the types returned by Cohere embeddings. Currently, Cohere client returns instances of `cohere.embeddings.Embeddings`. Since the transport layer relies on JSON, some numbers might be represented as ints, not floats, which happens quite often. While that doesn't seem to be an issue, it breaks some pydantic models if they require strict floats.	2023-03-02 09:02:10 -08:00
Jason Gill	1989e7d4c2	Update examples to prevent confusing missing _type warning (#1391 ) The YAML and JSON examples of prompt serialization now give a strange `No '_type' key found, defaulting to 'prompt'` message when you try to run them yourself or copy the format of the files. The reason for this harmless warning is that the _type key was not in the config files, which means they are parsed as a standard prompt. This could be confusing to new users (like it was confusing to me after upgrading from 0.0.85 to 0.0.86+ for my few_shot prompts that needed a _type added to the example_prompt config), so this update includes the _type key just for clarity. Obviously this is not critical as the warning is harmless, but it could be confusing to track down or be interpreted as an error by a new user, so this update should resolve that.	2023-03-02 07:39:57 -08:00
Harrison Chase	dda5259f68	bump version to 0.0.99 (#1390 )	2023-03-02 07:25:59 -08:00
Kacper Łukawski	f032609f8d	Add `recursive` parameter to `DirectoryLoader` (#1389 ) This PR allows loading a directory recursively.	2023-03-02 07:06:26 -08:00
Kacper Łukawski	9ac442624c	Add Qdrant named arguments (#1386 ) This PR: - Increases `qdrant-client` version to 1.0.4 - Introduces custom content and metadata keys (as requested in #1087) - Moves all the `QdrantClient` parameters into the method parameters to simplify code completion	2023-03-02 07:05:14 -08:00
Francisco Ingham	34abcd31b9	remove limit clause from prompt for compatibility with ms sql server (#1385 ) For reference see: `8a35811556` Co-authored-by: Francisco Ingham <>	2023-03-02 07:02:42 -08:00
Ankush Gola	fe30be6fba	add async and streaming support to `OpenAIChat` (#1378 ) title says it all	2023-03-01 21:55:43 -08:00
Lakshya Agarwal	cfed0497ac	Minor grammatical fixes (#1325 ) Fixed typos and links in a few places across documents	2023-03-01 21:18:09 -08:00
Ryan Dao	59157b6891	Bug: Fix Python version validation in PythonAstREPLTool (#1373 ) The current logic checks if the Python major version is < 8, which is wrong. This checks if the major and minor version is < 3.9.	2023-03-01 21:15:27 -08:00
Harrison Chase	e178008b75	Harrison/track token usage (#1382 ) Co-authored-by: Zak King <zaking17@gmail.com>	2023-03-01 21:15:13 -08:00
Harrison Chase	1cd8996074	Harrison/summarizer chain (#1356 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-01 20:59:07 -08:00
yakigac	cfae03042d	Fix the openaichat example (#1377 ) The example was wrong.	2023-03-01 18:24:32 -08:00
Harrison Chase	4b5e850361	chatgpt wrapper (#1367 )	2023-03-01 11:47:01 -08:00
Harrison Chase	4d4b43cf5a	fix doc names (#1354 )	2023-03-01 09:40:31 -08:00
Harrison Chase	c01f9100e4	bump version to 0097 (#1365 )	2023-03-01 08:20:24 -08:00
Christie Jacob	edb3915ee7	typo in vectorstores (#1362 )	2023-03-01 07:21:37 -08:00
Harrison Chase	fe7dbecfe6	pandas and csv agents (#1353 )	2023-02-28 22:19:11 -08:00
Harrison Chase	02ec72df87	improve docs (#1351 )	2023-02-28 21:37:18 -08:00
Jon Luo	92ab27e4b8	sql doc formatting (#1350 ) My bad, missed a few tabs between the two PRs	2023-02-28 19:54:46 -08:00
Ankush Gola	82baecc892	Add a SQL agent for interacting with SQL Databases and JSON Agent for interacting with large JSON blobs (#1150 ) This PR adds * `ZeroShotAgent.as_sql_agent`, which returns an agent for interacting with a sql database. This builds off of `SQLDatabaseChain`. The main advantages are 1) answering general questions about the db, 2) access to a tool for double checking queries, and 3) recovering from errors * `ZeroShotAgent.as_json_agent` which returns an agent for interacting with json blobs. * Several examples in notebooks --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-02-28 19:44:39 -08:00
Jon Luo	35f1e8f569	separate columns by tabs instead of single space in sql sample rows (#1348 ) Use tabs to separate columns instead of a single space - confusing when there are spaces in a cell	2023-02-28 18:59:53 -08:00
kurehajime	6c629b54e6	Fixed arguments passed to InvalidTool.run(). (#1340 ) [InvalidTool.run()](`72ef69d1ba/langchain/agents/tools.py (L43)`) returns "{arg}is not a valid tool, try another one.". However, no function name is actually given in the argument. This causes the LLM to get stuck in a loop, unable to find the right tool. This may resolve these issues: https://github.com/hwchase17/langchain/issues/998 https://github.com/hwchase17/langchain/issues/702	2023-02-28 18:58:23 -08:00
James Brotchie	3574418a40	Fix link in summarization.md (#1344 ) "Utilities for working with Documents" was linking to a non-useful page. Re-linked to the utils page that includes info about working with docs.	2023-02-28 18:58:12 -08:00
Jon Luo	5bf8772f26	add option to use user-defined SQL table info (#1347 ) Currently, table information is gathered through SQLAlchemy as complete table DDL and a user-selected number of sample rows from each table. This PR adds the option to use user-defined table information instead of automatically collecting it. This will use the provided table information and fall back to the automatic gathering for tables that the user didn't provide information for. Off the top of my head, there are a few cases where this can be quite useful: - The first n rows of a table are uninformative, or very similar to one another. In this case, hand-crafting example rows for a table such that they provide good, diverse information can be very helpful. Another approach we can think about later is getting a random sample of n rows instead of the first n rows, but there are some performance considerations that need to be taken into account there. Even so, hand-crafting the sample rows is useful and can guarantee the model sees informative data. - The user doesn't want every column to be available to the model. This is not an elegant way to fulfill this specific need since the user would have to provide the table definition instead of a simple list of columns to include or ignore, but it does work for this purpose. - For the developers, this makes it a lot easier to compare/benchmark the performance of different prompting structures for providing table information in the prompt. These are cases I've run into myself (particularly cases 1 and 3) and I've found these changes useful. Personally, I keep custom table info for a few tables in a yaml file for versioning and easy loading. Definitely open to other opinions/approaches though!	2023-02-28 18:58:04 -08:00
Harrison Chase	924bba5ce9	bump version (#1342 )	2023-02-28 08:48:32 -08:00
Harrison Chase	786852e9e6	partial variables (#1308 )	2023-02-28 08:40:35 -08:00
Tim Asp	72ef69d1ba	Add new iFixit document loader (#1333 ) iFixit is a Wikipedia-like site that has a huge amount of open content on how to fix things, questions/answers for common troubleshooting and "things" related content that is more technical in nature. All content is licensed under CC-BY-SA-NC 3.0. Adding docs from iFixit as context for user questions like "I dropped my phone in water, what do I do?" or "My macbook pro is making a whining noise, what's wrong with it?" can yield significantly better responses than context-free responses from LLMs.	2023-02-27 20:40:20 -08:00
Matt Robinson	1aa41b5741	feat: document loader for image files (#1330 ) ### Summary Adds a document loader for image files such as `.jpg` and `.png` files. ### Testing Run the following using the example document from the [`unstructured` repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs). ```python from langchain.document_loaders.image import UnstructuredImageLoader loader = UnstructuredImageLoader("layout-parser-paper-fast.jpg") loader.load() ```	2023-02-27 14:43:32 -08:00
Eugene Yurtsev	c14cff60d0	Documentation: Minor typo fixes (#1327 ) Fixing a few minor typos in the documentation (and likely introducing other ones in the process).	2023-02-27 14:40:43 -08:00
Harrison Chase	f61858163d	bump version to 0.0.95 (#1324 )	2023-02-27 07:45:54 -08:00
Harrison Chase	0824d65a5c	Harrison/indexing pipeline (#1317 )	2023-02-27 00:31:36 -08:00
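
Note on commit 59157b6891 (#1373): the message describes replacing a check on a single version component with a check on the major and minor version together. A minimal sketch of that kind of check follows, assuming a 3.9 minimum as stated in the message. The names `MIN_VERSION` and `meets_min_version` are hypothetical and are not taken from the langchain source.

```python
import sys

# Hypothetical constant: the minimum (major, minor) version named in the commit message.
MIN_VERSION = (3, 9)


def meets_min_version(version_info=None, minimum=MIN_VERSION):
    """Return True if (major, minor) is at least `minimum`.

    Comparing tuples checks major first and minor second, so (3, 10)
    compares as newer than (3, 9), and (4, 0) as newer than both.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print(meets_min_version())
```

Comparing a single component in isolation gives wrong answers across major versions; the tuple comparison avoids that.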

4540 Commits