langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
fabi.s	5b0e747f9a	Fix description of UnstructuredURLLoader & UnstructuredHTMLLoader (#1570 )	2023-03-10 07:08:58 -08:00
Zach Schillaci	624c72c266	Add wikipedia tool doc (#1579 )	2023-03-10 07:07:27 -08:00
Ryan Dao	a950287206	Strip trailing whitespaces in agent's stop sequences (#1566 ) Fixes #1489	2023-03-09 16:36:15 -08:00
Tim Asp	30383abb12	Add CSVLoader document loader (#1573 ) Simple CSV document loader which wraps `csv` reader, and preps the file with a single `Document` per row. The column header is prepended to each value, which provides useful context for embedding and semantic search	2023-03-09 16:35:18 -08:00
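The row-per-document idea in the CSVLoader commit above can be sketched with the standard library alone. This is a minimal illustration, not the loader's actual code: the `Document` dataclass is a hypothetical stand-in for langchain's class, and `rows_to_documents`, the `source` label, and the `row` metadata key are names assumed for the example.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for langchain's Document class."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def rows_to_documents(csv_text: str, source: str = "in-memory") -> List[Document]:
    """Build one Document per CSV row, prefixing each value with its column header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for index, row in enumerate(reader):
        # "header: value" lines keep the column context next to each value.
        content = "\n".join(f"{header}: {value}" for header, value in row.items())
        docs.append(Document(page_content=content, metadata={"source": source, "row": index}))
    return docs


docs = rows_to_documents("name,team\nAda,core\nBob,docs\n")
```

Keeping the header next to each value means an embedded row still carries its column names, which is the context the commit message refers to.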
Zach Schillaci	cdb97f3dfb	Add Wikipedia search utility and tool (#1561 ) The Python `wikipedia` package gives easy access for searching and fetching pages from Wikipedia, see https://pypi.org/project/wikipedia/. It can serve as an additional search and retrieval tool, like the existing Google and SerpAPI helpers, for both chains and agents.	2023-03-09 16:34:39 -08:00
Felix Altenberger	b44c8bd969	Add optional `base_url` arg to `GitbookLoader` (#1552 ) First of all, big kudos on what you guys are doing, langchain is enabling some really amazing use cases and I'm having lots of fun playing around with it. It's really cool how many data sources it supports out of the box. However, I noticed some limitations of the current `GitbookLoader` which this PR addresses: The main change is that I added an optional `base_url` arg to `GitbookLoader`. This enables use cases where one wants to crawl docs from a start page other than the index page, e.g., the following call would scrape all pages that are reachable via nav bar links from "https://docs.zenml.io/v/0.35.0": ```python GitbookLoader( web_page="https://docs.zenml.io/v/0.35.0", load_all_paths=True, base_url="https://docs.zenml.io", ) ``` Previously, this would fail because relative links would be of the form `/v/0.35.0/...` and the full link URLs would become `docs.zenml.io/v/0.35.0/v/0.35.0/...`. I also fixed another issue of the `GitbookLoader` where the link URLs were constructed incorrectly as `website//relative_url` if the provided `web_page` had a trailing slash.	2023-03-09 16:32:40 -08:00
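The two URL problems described in the `GitbookLoader` commit above (a duplicated path prefix and a doubled slash) can be reproduced and avoided with the standard library. This is a sketch under assumptions, not the PR's implementation: `naive_join` and `resolve_link` are hypothetical helpers, and the `/intro` page path is invented for the example.

```python
from urllib.parse import urljoin


def naive_join(web_page: str, relative_link: str) -> str:
    """Plain concatenation, which shows both failure modes."""
    return web_page + relative_link


def resolve_link(base_url: str, relative_link: str) -> str:
    """Resolve a root-relative link against the site root rather than the start page."""
    # Normalizing to exactly one trailing slash avoids the "website//relative_url" form.
    return urljoin(base_url.rstrip("/") + "/", relative_link)


duplicated = naive_join("https://docs.zenml.io/v/0.35.0", "/v/0.35.0/intro")
double_slash = naive_join("https://docs.zenml.io/", "/v/0.35.0/intro")
fixed = resolve_link("https://docs.zenml.io", "/v/0.35.0/intro")
fixed_trailing = resolve_link("https://docs.zenml.io/", "/v/0.35.0/intro")
```

Passing the site root separately from the start page is what lets root-relative nav links resolve correctly, whichever page the crawl begins on.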
Andriy Mulyar	c9189d354a	AtlasDB vector store documentation updates. (#1572 ) - Updated errors in the AtlasDB vector store documentation - Removed extraneous output logs in example notebook.	2023-03-09 16:31:14 -08:00
blob42	622578a022	docs: fix typo in searx tool (#1569 ) Co-authored-by: blob42 <spike@w530>	2023-03-09 15:58:33 -08:00
Matt Robinson	7018806a92	feat: document loader for markdown files (#1558 ) ### Summary Adds a document loader for handling markdown files. This document loader requires `unstructured>=0.4.16`. ### Testing ```python from langchain.document_loaders import UnstructuredMarkdownLoader loader = UnstructuredMarkdownLoader("README.md") loader.load() ```	2023-03-09 10:55:07 -08:00
Harrison Chase	bd335ffd64	bump version to 106 (#1562 )	2023-03-09 10:20:54 -08:00
Harrison Chase	a094c49153	add chat agent (#1509 )	2023-03-09 09:12:08 -08:00
Brenton Wheeler	99fe023496	docs: fix typo in modules/indexes/chain_examples/question_answering (#1551 ) docs: fix typo in modules/indexes/chain_examples/question_answering ![image](https://user-images.githubusercontent.com/11394076/224007874-3a52adf6-ff7a-4f22-9dbf-18c83d08167f.png)	2023-03-09 09:11:43 -08:00
Harrison Chase	3ee32a01ea	Harrison/prompt layer (#1547 ) Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com> Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>	2023-03-08 21:24:27 -08:00
Harrison Chase	c844d1fd46	Harrison/chunk size (#1549 ) Co-authored-by: Florian Leuerer <31259070+floleuerer@users.noreply.github.com>	2023-03-08 21:24:18 -08:00
Harrison Chase	9405af6919	Harrison/hf inf error (#1543 ) Co-authored-by: Konstantin Hebenstreit <57603012+KonstantinHebenstreit@users.noreply.github.com>	2023-03-08 20:53:46 -08:00
Harrison Chase	357d808484	Harrison/remote paths pdf (#1544 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-08 20:53:37 -08:00
Harrison Chase	cc423f40f1	Harrison/youtube loader (#1545 ) Co-authored-by: Julian Wustl <57504258+Julianwustl@users.noreply.github.com>	2023-03-08 20:53:27 -08:00
Harrison Chase	b053f831cd	Harrison/contributing (#1542 ) Co-authored-by: Saurav Maheshkar <sauravvmaheshkar@gmail.com>	2023-03-08 20:53:16 -08:00
Harrison Chase	523ad8d2e2	Harrison/chat history formatter1 (#1538 ) Co-authored-by: Youssef A. Abukwaik <yousseb@users.noreply.github.com>	2023-03-08 20:46:37 -08:00
Graham Neubig	31303d0b11	Added other evaluation metrics for data-augmented QA (#1521 ) This PR adds additional evaluation metrics for data-augmented QA, resulting in a report like this at the end of the notebook: ![Screen Shot 2023-03-08 at 8 53 23 AM](https://user-images.githubusercontent.com/398875/223731199-8eb8e77f-5ff3-40a2-a23e-f3bede623344.png) The score calculation is based on the [Critique](https://docs.inspiredco.ai/critique/) toolkit, an API-based toolkit (like OpenAI) that has minimal dependencies, so it should be easy for people to run if they choose. The code could further be simplified by actually adding a chain that calls Critique directly, but that probably should be saved for another PR if necessary. Any comments or change requests are welcome!	2023-03-08 20:41:03 -08:00
gidler	494c9d341a	[DOCS] Assorted wording, punctuation, and consistency revisions (#1443 ) Contributing some small fixes I noticed while reading through the documentation. Thank you for creating and maintaining this project!	2023-03-08 20:16:09 -08:00
Harrison Chase	519f0187b6	Harrison/gdrive pdf (#1433 ) Co-authored-by: LM <93918064+LuisMalhadas@users.noreply.github.com> Co-authored-by: Luis Malhadas <luis@sia.so>	2023-03-08 20:15:36 -08:00
Florian Leuerer	64c6435545	Added client_settings support for chromadb vecstore (#1528 ) # Problem The ChromaDB vecstore only supported local connection. There was no way to use a chromadb server. # Fix Added `client_settings` as Chroma attribute. # Usage ``` from chromadb.config import Settings from langchain.vectorstores import Chroma chroma_settings = Settings(chroma_api_impl="rest", chroma_server_host="localhost", chroma_server_http_port="80") docsearch = Chroma.from_documents(chunks, embeddings, metadatas=metadatas, client_settings=chroma_settings, collection_name=COLLECTION_NAME) ```	2023-03-08 17:42:09 -08:00
Harrison Chase	7eba828e1b	Harrison/update regex (#1534 ) Co-authored-by: Luis <57528712+LuisLechugaRuiz@users.noreply.github.com>	2023-03-08 17:41:17 -08:00
Harrison Chase	2a7215bc3b	Harrison/prompt issues (#1537 )	2023-03-08 16:56:10 -08:00
Alpri Else	784d24a1d5	Support S3 Object keys with `/` in `S3FileLoader` (#1517 ) Resolves https://github.com/hwchase17/langchain/issues/1510 ### Problem When loading S3 Objects with `/` in the object key (eg. `folder/some-document.txt`) using `S3FileLoader`, the objects are downloaded into a temporary directory and saved as a file. This errors out when the parent directory does not exist within the temporary directory. See https://github.com/hwchase17/langchain/issues/1510#issuecomment-1459583696 on how to reproduce this bug ### What this pr does Creates parent directories based on object key. This also works with deeply nested keys: `folder/subfolder/some-document.txt`	2023-03-08 16:17:26 -08:00
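The fix described in the `S3FileLoader` commit above comes down to creating the key's parent directories inside the temporary directory before writing the file. A minimal sketch, assuming a hypothetical helper `local_path_for_key`; the actual S3 download call is omitted and replaced by a plain file write.

```python
import os
import tempfile


def local_path_for_key(temp_dir: str, key: str) -> str:
    """Return the local path for an S3 object key, creating parent directories first."""
    file_path = os.path.join(temp_dir, key)
    # exist_ok=True makes this safe for flat keys and for repeated calls.
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    return file_path


with tempfile.TemporaryDirectory() as temp_dir:
    path = local_path_for_key(temp_dir, "folder/subfolder/some-document.txt")
    # Stand-in for the download step: without the makedirs call this open() would fail.
    with open(path, "w") as handle:
        handle.write("example")
    nested_dir_exists = os.path.isdir(os.path.join(temp_dir, "folder", "subfolder"))
    file_exists = os.path.isfile(path)
```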
Harrison Chase	aba58e9e2e	Harrison/bumpver104 (#1525 )	2023-03-08 09:46:02 -08:00
Harrison Chase	c4a557bdd4	add concept of prompt collection (#1507 )	2023-03-08 08:31:29 -08:00
Ivan	97e3666e0d	changed requests.run to requests.get (#1485 ) This pull request proposes an update to the Lightweight wrapper library's documentation. The current documentation provides an example of how to use the library's requests.run method, as follows: requests.run("https://www.google.com"). However, this example does not work for the 0.0.102 version of the library. Testing: The changes have been tested locally to ensure they are working as intended. Thank you for considering this pull request.	2023-03-07 21:10:23 -08:00
Harrison Chase	7ade419a0e	allow passing of messages into prompt template (#1505 )	2023-03-07 21:10:12 -08:00
Harrison Chase	a4a2d79087	Harrison/rtd loader (#1513 ) Co-authored-by: Youssef A. Abukwaik <yousseb@users.noreply.github.com>	2023-03-07 21:09:54 -08:00
Harrison Chase	8f21605d71	add return source docs (#1515 )	2023-03-07 21:09:36 -08:00
Harrison Chase	064741db58	Harrison/fix text splitter (#1511 ) Co-authored-by: ajaysolanky <ajsolanky@gmail.com> Co-authored-by: Ajay Solanky <ajaysolanky@saw-l14668307kd.myfiosgateway.com>	2023-03-07 15:42:28 -08:00
Tom Dyson	e3354404ad	Fix link to Pinecone notebook (#1492 )	2023-03-07 15:24:03 -08:00
Harrison Chase	3610ef2830	add fake embeddings class (#1503 )	2023-03-07 15:23:46 -08:00
Ankush Gola	27104d4921	fix `ChatOpenAI.agenerate` (#1504 )	2023-03-07 15:22:05 -08:00
Harrison Chase	4f41e20f09	memory docs (#1501 )	2023-03-07 11:02:46 -08:00
Harrison Chase	d0062c7a9a	bump version to 103 (#1498 )	2023-03-07 10:08:01 -08:00
Harrison Chase	8e6f599822	change to baselanguagemodel (#1496 )	2023-03-07 09:29:59 -08:00
Harrison Chase	f276bfad8e	Harrison/chat memory (#1495 )	2023-03-07 09:02:40 -08:00
Harrison Chase	7bec461782	Harrison/memory refactor (#1478 ) moves memory to own module, factors out common stuff	2023-03-07 07:59:37 -08:00
kahkeng	df6865cd52	Allow no token limit for ChatGPT API (#1481 ) The endpoint default is inf if we don't specify max_tokens, so unlike regular completion API, we don't need to calculate this based on the prompt.	2023-03-06 13:18:55 -08:00
Harrison Chase	312c319d8b	bump version to 102 (#1471 )	2023-03-06 10:50:44 -08:00
Harrison Chase	0e21463f07	(rfc) chat models (#1424 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-03-06 08:34:24 -08:00
Juanky Soriano	dec3750875	Change method to calculate number of tokens for OpenAIChat (#1457 ) Solves https://github.com/hwchase17/langchain/issues/1412 Currently `OpenAIChat` inherits the way it calculates the number of tokens, `get_num_token`, from `BaseLLM`. On the other hand, `OpenAI` inherits from `BaseOpenAI`. `BaseOpenAI` and `BaseLLM` use different methodologies for doing this. The first relies on `tiktoken` while the second on `GPT2TokenizerFast`. The motivation of this PR is: 1. Bring consistency about the way of calculating number of tokens `get_num_token` to the `OpenAI` family, regardless of `Chat` vs `non Chat` scenarios. 2. Give preference to the `tiktoken` method as it's serverless friendly. It doesn't require downloading models which might make it incompatible with `readonly` filesystems.	2023-03-06 07:20:25 -08:00
Tim Asp	763f879536	fix always verbose on summarization checker (#1440 )	2023-03-05 07:10:08 -08:00
Harrison Chase	56b850648f	cr (#1436 )	2023-03-04 08:38:56 -08:00
Harrison Chase	63a5614d23	Harrison/simple memory (#1435 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	2023-03-04 08:15:52 -08:00
Harrison Chase	a1b9dfc099	Harrison/similarity search chroma (#1434 ) Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>	2023-03-04 08:10:15 -08:00
Peng Qu	68ce68f290	Fix an unusual issue that occurs when using OpenAIChat for llm_math (#1410 ) Fix an issue that occurs when using OpenAIChat for llm_math, following the code style of the "Final Answer:" handling in Mrkl. The reason is that I found an issue when I tried OpenAIChat for llm_math: when I ask the question in Chinese, the model generates output in a format like "\n\nQuestion: What is the square of 29?\nAnswer: 841". It translates the question first, then answers. Below is my snapshot: <img width="945" alt="snapshot" src="https://user-images.githubusercontent.com/82029664/222642193-10ecca77-db7b-4759-bc46-32a8f8ddc48f.png">	2023-03-04 07:56:07 -08:00

1130 Commits