langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
Matt Robinson	3ea1e5af1e	feat: added element metadata to unstructured loader (#1068 ) ### Summary Adds tracked metadata from `unstructured` elements to the document metadata when `UnstructuredFileLoader` is used in `"elements"` mode. Tracked metadata is available in `unstructured>=0.4.9`, but the code is written for backward compatibility with older `unstructured` versions. ### Testing Before running, make sure to upgrade to `unstructured==0.4.9`. In the code snippet below, you should see `page_number`, `filename`, and `category` in the metadata for each document. `doc[0]` should have `page_number: 1` and `doc[-1]` should have `page_number: 2`. The example document is `layout-parser-paper-fast.pdf` from the [`unstructured` sample docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs). ```python from langchain.document_loaders import UnstructuredFileLoader loader = UnstructuredFileLoader(file_path=f"layout-parser-paper-fast.pdf", mode="elements") docs = loader.load() ```	2023-02-15 22:36:18 -08:00
Harrison Chase	bac676c8e7	bump version (#1057 )	2023-02-15 07:09:10 -08:00
Ankush Gola	d8ac274fc2	add to async chain notebook (#1056 )	2023-02-14 18:20:38 -08:00
Ankush Gola	caa8e4742e	Enable streaming for OpenAI LLM (#986 ) * Support a callback `on_llm_new_token` that users can implement when `OpenAI.streaming` is set to `True`	2023-02-14 15:06:14 -08:00
Harrison Chase	f05f025e41	bump version to 0086 (#1050 )	2023-02-14 07:14:40 -08:00
Sasmitha Manathunga	c67c5383fd	docs: fix typo in notebook (#1046 )	2023-02-14 07:06:08 -08:00
Harrison Chase	88bebb4caa	Harrison/llm integrations (#1039 ) Co-authored-by: jped <jonathanped@gmail.com> Co-authored-by: Justin Torre <justintorre75@gmail.com> Co-authored-by: Ivan Vendrov <ivan@anthropic.com>	2023-02-13 22:06:25 -08:00
Harrison Chase	ec727bf166	Align table info (#999) (#1034) Currently the chain gets the column names and types on one side and the example rows on the other. It is easier for the LLM to read the table information if the column names and examples are shown together, so that it can easily tell which columns the examples refer to. For an instance of this, please refer to the changes in the `sqlite.ipynb` notebook. Also replaced `eval` with `ast.literal_eval` when interpreting the results from the sample row query, since it is better practice. --------- Co-authored-by: Francisco Ingham <fpingham@gmail.com>	2023-02-13 21:48:41 -08:00
Harrison Chase	8c45f06d58	Harrison/standarize prompt loading (#1036 ) Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>	2023-02-13 21:48:09 -08:00
Enrico Shippole	f30dcc6359	Add GooseAI, CerebriumAI, Petals, ForefrontAI (#981 ) Add GooseAI, CerebriumAI, Petals, ForefrontAI	2023-02-13 21:20:19 -08:00
Anton Troynikov	d43d430d86	Chroma persistence (#1028 ) This PR adds persistence to the Chroma vector store. Users can supply a `persist_directory` with any of the `Chroma` creation methods. If supplied, the store will be automatically persisted at that directory. If a user creates a new `Chroma` instance with the same persistence directory, it will get loaded up automatically. If they use `from_texts` or `from_documents` in this way, the documents will be loaded into the existing store. There is the chance of some funky behavior if the user passes a different embedding function from the one used to create the collection - we will make this easier in future updates. For now, we log a warning.	2023-02-13 21:09:06 -08:00
Harrison Chase	012a6dfb16	Harrison/makefile (#1033 ) Co-authored-by: blob42 <contact@blob42.xyz> Co-authored-by: blob42 <spike@w530>	2023-02-13 21:08:47 -08:00
Harrison Chase	6a31a59400	add links (#1027 )	2023-02-13 16:33:30 -08:00
Oliver Klingefjord	20889205e8	Added retry for openai.error.ServiceUnavailableError (#1022 ) Imho retries should be performed for ServiceUnavailableError (which tends to happen to me quite often).	2023-02-13 13:30:06 -08:00
Harrison Chase	fc2502cd81	bump version to 0085 (#1017 )	2023-02-13 07:32:36 -08:00
Harrison Chase	0f0e69adce	agent refactors (#997 )	2023-02-12 23:02:13 -08:00
Harrison Chase	7fb33fca47	chroma docs (#1012 )	2023-02-12 23:02:01 -08:00
Harrison Chase	0c553d2064	Harrion/kg (#1016 ) Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2023-02-12 23:01:26 -08:00
Anton Troynikov	78abd277ff	Chroma in LangChain (#1010) Chroma is a simple-to-use, open-source, zero-config, zero-setup vectorstore. Simply `pip install chromadb`, and you're good to go. Out of the box, Chroma is suitable for most LangChain workloads, but it is also highly flexible. I tested up to 1M embeddings on my M1 Mac without issues and with reasonably fast query times. Look out for future releases as we integrate more Chroma features with LangChain!	2023-02-12 17:43:48 -08:00
cragwolfe	05d8969c79	Unstructured example notebook: add a pdf, related deps (#1011 ) Updates the Unstructured example notebook with a PDF example. Includes additional dependencies for PDF processing (and images, etc).	2023-02-12 14:56:48 -08:00
Dhruv Anand	03e5794978	typo fix on chat vector db docs (#1007 ) simple typo fix: because --> between	2023-02-12 12:09:21 -08:00
Harrison Chase	6d44a2285c	bump version to 0084 (#1005 )	2023-02-12 07:47:10 -08:00
Harrison Chase	0998577dfe	Harrison/unstructured structured (#1004 )	2023-02-12 07:36:11 -08:00
Harrison Chase	bbb06ca4cf	pdfminer (#1003 )	2023-02-12 07:29:26 -08:00
Francisco Ingham	0b6aa6a024	Added initial capital letter to bullet points that had it missing (#1000 ) Co-authored-by: Francisco Ingham <>	2023-02-11 20:31:34 -08:00
Harrison Chase	10e7297306	Harrison/fake llm (#990 ) Co-authored-by: Stefan Keselj <skeselj@princeton.edu> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-11 15:12:35 -08:00
Harrison Chase	e51fad1488	Harrison/0083 (#996 ) Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-11 08:29:28 -08:00
Shahriar Tajbakhsh	b7747017d7	Import of `declarative_base` when SQLAlchemy <1.4 (#883) In [pyproject.toml](https://github.com/hwchase17/langchain/blob/master/pyproject.toml), the expectation is `SQLAlchemy = "^1"`. However, the way `declarative_base` is imported in [cache.py](https://github.com/hwchase17/langchain/blob/master/langchain/cache.py) only works with SQLAlchemy >=1.4. This PR makes sure LangChain can run in environments with SQLAlchemy <1.4.	2023-02-10 18:33:47 -08:00
Harrison Chase	2e96704d59	Harrison/airbyte (#989 ) Co-authored-by: zanderchase <zanderchase@gmail.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>	2023-02-10 18:08:00 -08:00
Charles Frye	e9799d6821	improves huggingface_hub example (#988 ) The provided example uses the default `max_length` of `20` tokens, which leads to the example generation getting cut off. 20 tokens is way too short to show CoT reasoning, so I boosted it to `64`. Without knowing HF's API well, it can be hard to figure out just where those `model_kwargs` come from, and `max_length` is a super critical one.	2023-02-10 17:56:15 -08:00
zanderchase	c2d1d903fa	Zander/online pdf loader (#984 )	2023-02-10 15:42:30 -08:00
Harrison Chase	055a53c27f	add texts example (#985 ) Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>	2023-02-10 12:32:44 -08:00
Harrison Chase	231da14771	bump version to 0082 (#980 ) Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>	2023-02-10 11:38:24 -08:00
jeff	6ab432d62e	docs: update spelling typos (#982) Wonder why "with" is misspelled as "wiht" so many times by humans	2023-02-10 11:37:59 -08:00
Matt Robinson	07a407d89a	feat: adds `UnstructuredURLLoader` for loading data from urls (#979 ) ### Summary Adds a `UnstructuredURLLoader` that supports loading data from a list of URLs. ### Testing ```python from langchain.document_loaders import UnstructuredURLLoader urls = [ "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023", "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023" ] loader = UnstructuredURLLoader(urls=urls) raw_documents = loader.load() ```	2023-02-10 10:18:38 -08:00
Harrison Chase	c64f98e2bb	Harrison/format agent instructions (#973 ) Co-authored-by: Andrew White <white.d.andrew@gmail.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net> Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>	2023-02-10 10:07:26 -08:00
Harrison Chase	5469d898a9	Harrison/everynote (#974 ) Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-10 08:02:35 -08:00
Harrison Chase	3d639d1539	update lint (#975 ) Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-10 08:01:13 -08:00
Harrison Chase	91c6cea227	Harrison/batch embeds (#972 ) Co-authored-by: John Dagdelen <jdagdelen@users.noreply.github.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-10 06:59:50 -08:00
Harrison Chase	ba54d36787	Harrison/tiktoken spec (#964 ) Co-authored-by: James Briggs <35938317+jamescalam@users.noreply.github.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-09 23:30:18 -08:00
Harrison Chase	5f8082bdd7	Harrison/deps (#963 ) Co-authored-by: Jon Luo <20971593+jzluo@users.noreply.github.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-09 23:19:19 -08:00
Kevin Huo	512c523368	remove sample_row_in_table_info and simplify set operations in SQLDB (#932) Addresses the TODO to deprecate `sample_row_in_table_info`. Simplifies set operations by casting to sets up front, avoiding multiple set casts and `.difference()` calls.	2023-02-09 23:15:41 -08:00
Harrison Chase	e323d0cfb1	bump version 0081 (#956 ) Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-09 08:29:11 -08:00
Harrison Chase	01fa2d8117	Harrison/youtube fixes (#955 ) Co-authored-by: Ji <jizhang.work@gmail.com> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-09 08:12:22 -08:00
zanderchase	8e126bc9bd	adding webpage loading logic (#942 )	2023-02-09 07:52:50 -08:00
Harrison Chase	c71027e725	add docs for steamship deployment (#949 ) Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-08 16:01:19 -08:00
Usama Navid	e85c53ce68	Update readthedocs.py (#943) Sometimes the docs may be empty. For example, `text = soup.find_all("main", {"id": "main-content"})` can return an empty list. To handle this edge case, the clean function needs to check whether the result is empty.	2023-02-08 16:01:07 -08:00
Harrison Chase	3e1901e1aa	gutenberg books (#946 ) Co-authored-by: zanderchase <zander@unfold.ag> Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>	2023-02-08 12:00:47 -08:00
jeff	6a4f602156	docs: fix spelling typo (#934 )	2023-02-08 11:13:35 -08:00
Ikko Eltociear Ashimine	6023d5be09	Update huggingface_hub.ipynb (#944 ) HuggingFace -> Hugging Face	2023-02-08 11:05:28 -08:00
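
Commit ec727bf166 in the log above swaps `eval` for `ast.literal_eval` when interpreting sample-row query results. A minimal sketch of why that is the safer choice, assuming the result arrives as the string form of a list of tuples. The `parse_rows` helper and the `raw_result` value are hypothetical and are not taken from the LangChain source:

```python
import ast


def parse_rows(raw_result: str) -> list:
    """Parse the string form of query rows without executing arbitrary code."""
    # literal_eval only accepts Python literals (strings, numbers, tuples,
    # lists, dicts, sets, booleans, None); anything else raises an error.
    return ast.literal_eval(raw_result)


# Hypothetical sample-row output, serialized as a string.
raw_result = "[(1, 'Alice'), (2, 'Bob')]"
rows = parse_rows(raw_result)

# An expression that eval() would happily execute is rejected instead.
try:
    parse_rows("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
```

With `eval`, the second string would have run a function call; `ast.literal_eval` refuses it because a call is not a literal.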
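
Commit 512c523368 in the log above describes simplifying set operations by casting to sets rather than chaining set casts with `.difference()` calls. The idea can be sketched roughly as follows; the `find_missing_tables` helper and the table names are hypothetical illustrations, not the actual SQLDB code:

```python
from typing import Iterable, Set


def find_missing_tables(requested: Iterable[str], available: Iterable[str]) -> Set[str]:
    """Return the requested table names that are not available."""
    # Cast each input to a set once, then use the set difference operator
    # instead of repeated set(...) casts and .difference() calls.
    requested_set = set(requested)
    available_set = set(available)
    return requested_set - available_set


# Hypothetical table names; duplicates in the input collapse naturally.
missing = find_missing_tables(["users", "orders", "users"], ["users", "products"])
```

Casting up front keeps each later comparison a single operator expression, which is the readability gain the commit message points at.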


973 Commits