langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Matt Robinson	2bee8d4941	feat: add support for `.ppt` files in `UnstructuredPowerPointLoader` (#1124 ) ### Summary Adds support for older `.ppt` file in the PowerPoint loader. ### Testing The following should work on `unstructured==0.4.11` using the example docs from the `unstructured` repo. ```python from langchain.document_loaders import UnstructuredPowerPointLoader filename = "../unstructured/example-docs/fake-power-point.pptx" loader = UnstructuredPowerPointLoader(filename) loader.load() filename = "../unstructured/example-docs/fake-power-point.ppt" loader = UnstructuredPowerPointLoader(filename) loader.load() ``` Now downgrade `unstructured` to version `0.4.10`. The following should work: ```python from langchain.document_loaders import UnstructuredPowerPointLoader filename = "../unstructured/example-docs/fake-power-point.pptx" loader = UnstructuredPowerPointLoader(filename) loader.load() ``` and the following should give you a `ValueError` and invite you to upgrade `unstructured`. ```python from langchain.document_loaders import UnstructuredPowerPointLoader filename = "../unstructured/example-docs/fake-power-point.ppt" loader = UnstructuredPowerPointLoader(filename) loader.load() ```	2023-02-17 13:03:25 -08:00
Matt Robinson	b956070f08	docs: add an unstructured section to the ecosystem page (#1125 ) ### Summary Adds an Unstructured section to the ecosystem page.	2023-02-17 13:02:23 -08:00
Hasegawa Yuya	383c67c1b2	Fix Issue #1100 (#1101 ) https://github.com/hwchase17/langchain/issues/1100 When faiss data and doc.index are created in past versions, error occurs that say there was no attribute. So I put hasattr in the check as a simple solution. However, increasing the number of such checks is not good for conservatism, so I think there is a better solution. Also, the code for the batch process was left out, so I put it back in.	2023-02-17 00:53:16 -08:00
Harrison Chase	3f50feb280	fix telegram imports (#1110 )	2023-02-17 00:53:01 -08:00
trigaten	6fafcd0a70	Strange behavior with LLM import requirements (#1104 ) This import works fine: ```python from langchain import Anthropic ``` This import does not: ```python from langchain import AI21 ``` ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: cannot import name 'AI21' from 'langchain' (/opt/anaconda3/envs/fed_nlp/lib/python3.9/site-packages/langchain/__init__.py) ``` I think there is a slight documentation inconsistency here: https://langchain.readthedocs.io/en/latest/reference/modules/llms.html This PR starts to solve that. Should all the import examples be `from langchain.llms import X` instead of `from langchain import X`?	2023-02-16 23:13:34 -08:00
Kacper Łukawski	ab1a3cccac	Hotfix: Qdrant content retrieval (revert: #1088 ) (#1093 ) The #1088 introduced a bug in Qdrant integration. That PR reverts those changes and provides class attributes to ensure consistent payload keys. In addition to that, an exception will be thrown if any of texts is None (that could have been an issue reported in #1087)	2023-02-16 12:46:06 -08:00
Harrison Chase	6322b6f657	bump version 0.0.88 (#1090 )	2023-02-16 07:32:32 -08:00
Francisco Ingham	3462130e2d	Modify number of types of chains (#1089 ) Changed number of types of chains to make it consistent with the rest of the docs	2023-02-16 07:06:30 -08:00
Rishabh Raizada	5d11e5da40	Update qdrant.py (#1088 ) Fixes #1087	2023-02-16 07:06:02 -08:00
Harrison Chase	7745505482	chat qa with sources (#1084 )	2023-02-16 00:29:47 -08:00
Harrison Chase	badeeb37b0	fix stuff count (#1083 )	2023-02-15 23:57:13 -08:00
Harrison Chase	971458c5de	docs for batch size (#1082 )	2023-02-15 23:53:56 -08:00
Harrison Chase	5e10e19bfe	Harrison/align table (#1081 ) Co-authored-by: Francisco Ingham <fpingham@gmail.com>	2023-02-15 23:53:37 -08:00
Harrison Chase	c60954d0f8	Harrison/telegram loader (#1080 ) Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>	2023-02-15 23:24:32 -08:00
Dennis Antela Martinez	a1c296bc3c	docs: increase width (#1049 ) This addresses #948. I set the documentation max width to 2560px, but can be adjusted - see screenshot below. <img width="1741" alt="Screenshot 2023-02-14 at 13 05 57" src="https://user-images.githubusercontent.com/23406704/218749076-ea51e90a-a220-4558-b4fe-5a95b39ebf15.png">	2023-02-15 23:07:01 -08:00
Harrison Chase	c96ac3e591	Harrison/semantic subset (#1079 ) Co-authored-by: Chen Wu (吴尘) <henrychenwu@cmu.edu>	2023-02-15 23:06:48 -08:00
Harrison Chase	19c2797bed	add anthropic example (#1041 ) Co-authored-by: Ivan Vendrov <ivendrov@gmail.com> Co-authored-by: Sasmitha Manathunga <70096033+mmz-001@users.noreply.github.com>	2023-02-15 23:04:28 -08:00
blob42	3ecdea8be4	SearxNG meta search api helper (#854 ) This is a work in progress PR to track my progres. ## TODO: - [x] Get results using the specifed searx host - [x] Prioritize returning an `answer` or results otherwise - [ ] expose the field `infobox` when available - [ ] expose `score` of result to help agent's decision - [ ] expose the `suggestions` field to agents so they could try new queries if no results are found with the orignial query ? - [ ] Dynamic tool description for agents ? - Searx offers many engines and a search syntax that agents can take advantage of. It would be nice to generate a dynamic Tool description so that it can be used many times as a tool but for different purposes. - [x] Limit number of results - [ ] Implement paging - [x] Miror the usage of the Google Search tool - [x] easy selection of search engines - [x] Documentation - [ ] update HowTo guide notebook on Search Tools - [ ] Handle async - [ ] Tests ### Add examples / documentation on possible uses with - [ ] getting factual answers with `!wiki` option and `infoboxes` - [ ] getting `suggestions` - [ ] getting `corrections` --------- Co-authored-by: blob42 <spike@w530> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-02-15 23:03:57 -08:00
Hasegawa Yuya	e08961ab25	Fixed openai embeddings to be safe by batching them based on token size calculation. (#991 ) I modified the logic of the batch calculation for embedding according to this cookbook https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb	2023-02-15 23:02:32 -08:00
seanaedmiston	f0a258555b	Support similarity search by vector (in FAISS) (#961 ) Alternate implementation to PR #960 Again - only FAISS is implemented. If accepted can add this to other vectorstores or leave as NotImplemented? Suggestions welcome...	2023-02-15 22:50:00 -08:00
Jonathan Pedoeem	05ad399abe	Update PromptLayerOpenAI LLM to include support for ASYNC API (#1066 ) This PR updates `PromptLayerOpenAI` to now support requests using the [Async API](https://langchain.readthedocs.io/en/latest/modules/llms/async_llm.html) It also updates the documentation on Async API to let users know that PromptLayerOpenAI also supports this. `PromptLayerOpenAI` now redefines `_agenerate` a similar was to how it redefines `_generate`	2023-02-15 22:48:09 -08:00
Harrison Chase	98186ef180	Harrison/evernote nb (#1078 ) Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>	2023-02-15 22:47:30 -08:00
rogerserper	e46cd3b7db	Google Search API integration with serper.dev (wrapper, tests, docs, … (#909 ) Adds Google Search integration with [Serper](https://serper.dev) a low-cost alternative to SerpAPI (10x cheaper + generous free tier). Includes documentation, tests and examples. Hopefully I am not missing anything. Developers can sign up for a free account at [serper.dev](https://serper.dev) and obtain an api key. ## Usage ```python from langchain.utilities import GoogleSerperAPIWrapper from langchain.llms.openai import OpenAI from langchain.agents import initialize_agent, Tool import os os.environ["SERPER_API_KEY"] = "" os.environ['OPENAI_API_KEY'] = "" llm = OpenAI(temperature=0) search = GoogleSerperAPIWrapper() tools = [ Tool( name="Intermediate Answer", func=search.run ) ] self_ask_with_search = initialize_agent(tools, llm, agent="self-ask-with-search", verbose=True) self_ask_with_search.run("What is the hometown of the reigning men's U.S. Open champion?") ``` ### Output ``` Entering new AgentExecutor chain... Yes. Follow up: Who is the reigning men's U.S. Open champion? Intermediate answer: Current champions Carlos Alcaraz, 2022 men's singles champion. Follow up: Where is Carlos Alcaraz from? Intermediate answer: El Palmar, Spain So the final answer is: El Palmar, Spain > Finished chain. 'El Palmar, Spain' ```	2023-02-15 22:47:17 -08:00
Harrison Chase	52753066ef	Harrison/handle stop tokens ai21 (#1077 ) Co-authored-by: Andrew Huang <jhuang16888@gmail.com>	2023-02-15 22:44:55 -08:00
Akshay	d8ed286200	Update and rename everynote.py to evernote.py (#1060 ) Updating this base file as well as the .ipynb file of the example on the website: https://github.com/hwchase17/langchain/compare/master...akshayvkt:langchain:patch-1 https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/everynote.html	2023-02-15 22:41:42 -08:00
Jeff Huber	34cba2da32	Fix typo in integration with Chroma (#1070 ) We introduced a breaking change but missed this call. This PR fixes `langchain` to work with upstream `chroma`.	2023-02-15 22:37:58 -08:00
Jonathan Pedoeem	05df480376	Update `PromptLayerOpenAI` LLM usage instructions in documentation (#1053 ) This PR updates the usage instructions for PromptLayerOpenAI in Langchain's documentation. The updated instructions provide more detail and conform better to the style of other LLM integration documentation pages. No code changes were made in this PR, only improvements to the documentation. This update will make it easier for users to understand how to use `PromptLayerOpenAI`	2023-02-15 22:37:48 -08:00
Matt Robinson	3ea1e5af1e	feat: added element metadata to unstructured loader (#1068 ) ### Summary Adds tracked metadata from `unstructured` elements to the document metadata when `UnstructuredFileLoader` is used in `"elements"` mode. Tracked metadata is available in `unstructured>=0.4.9`, but the code is written for backward compatibility with older `unstructured` versions. ### Testing Before running, make sure to upgrade to `unstructured==0.4.9`. In the code snippet below, you should see `page_number`, `filename`, and `category` in the metadata for each document. `doc[0]` should have `page_number: 1` and `doc[-1]` should have `page_number: 2`. The example document is `layout-parser-paper-fast.pdf` from the [`unstructured` sample docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs). ```python from langchain.document_loaders import UnstructuredFileLoader loader = UnstructuredFileLoader(file_path=f"layout-parser-paper-fast.pdf", mode="elements") docs = loader.load() ```	2023-02-15 22:36:18 -08:00
Harrison Chase	bac676c8e7	bump version (#1057 )	2023-02-15 07:09:10 -08:00
Ankush Gola	d8ac274fc2	add to async chain notebook (#1056 )	2023-02-14 18:20:38 -08:00
Ankush Gola	caa8e4742e	Enable streaming for OpenAI LLM (#986 ) * Support a callback `on_llm_new_token` that users can implement when `OpenAI.streaming` is set to `True`	2023-02-14 15:06:14 -08:00
Harrison Chase	f05f025e41	bump version to 0086 (#1050 )	2023-02-14 07:14:40 -08:00
Sasmitha Manathunga	c67c5383fd	docs: fix typo in notebook (#1046 )	2023-02-14 07:06:08 -08:00
Harrison Chase	88bebb4caa	Harrison/llm integrations (#1039 ) Co-authored-by: jped <jonathanped@gmail.com> Co-authored-by: Justin Torre <justintorre75@gmail.com> Co-authored-by: Ivan Vendrov <ivan@anthropic.com>	2023-02-13 22:06:25 -08:00
Harrison Chase	ec727bf166	Align table info (#999 ) (#1034 ) Currently the chain is getting the column names and types on the one side and the example rows on the other. It is easier for the llm to read the table information if the column name and examples are shown together so that it can easily understand to which columns do the examples refer to. For an instantiation of this, please refer to the changes in the `sqlite.ipynb` notebook. Also changed `eval` for `ast.literal_eval` when interpreting the results from the sample row query since it is a better practice. --------- Co-authored-by: Francisco Ingham <> --------- Co-authored-by: Francisco Ingham <fpingham@gmail.com>	2023-02-13 21:48:41 -08:00
Harrison Chase	8c45f06d58	Harrison/standarize prompt loading (#1036 ) Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>	2023-02-13 21:48:09 -08:00
Enrico Shippole	f30dcc6359	Add GooseAI, CerebriumAI, Petals, ForefrontAI (#981 ) Add GooseAI, CerebriumAI, Petals, ForefrontAI	2023-02-13 21:20:19 -08:00
Anton Troynikov	d43d430d86	Chroma persistence (#1028 ) This PR adds persistence to the Chroma vector store. Users can supply a `persist_directory` with any of the `Chroma` creation methods. If supplied, the store will be automatically persisted at that directory. If a user creates a new `Chroma` instance with the same persistence directory, it will get loaded up automatically. If they use `from_texts` or `from_documents` in this way, the documents will be loaded into the existing store. There is the chance of some funky behavior if the user passes a different embedding function from the one used to create the collection - we will make this easier in future updates. For now, we log a warning.	2023-02-13 21:09:06 -08:00
Harrison Chase	012a6dfb16	Harrison/makefile (#1033 ) Co-authored-by: blob42 <contact@blob42.xyz> Co-authored-by: blob42 <spike@w530>	2023-02-13 21:08:47 -08:00
Harrison Chase	6a31a59400	add links (#1027 )	2023-02-13 16:33:30 -08:00
Oliver Klingefjord	20889205e8	Added retry for openai.error.ServiceUnavailableError (#1022 ) Imho retries should be performed for ServiceUnavailableError (which tends to happen to me quite often).	2023-02-13 13:30:06 -08:00
Harrison Chase	fc2502cd81	bump version to 0085 (#1017 )	2023-02-13 07:32:36 -08:00
Harrison Chase	0f0e69adce	agent refactors (#997 )	2023-02-12 23:02:13 -08:00
Harrison Chase	7fb33fca47	chroma docs (#1012 )	2023-02-12 23:02:01 -08:00
Harrison Chase	0c553d2064	Harrion/kg (#1016 ) Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2023-02-12 23:01:26 -08:00
Anton Troynikov	78abd277ff	Chroma in LangChain (#1010 ) Chroma is a simple to use, open-source, zero-config, zero setup vectorstore. Simply `pip install chromadb`, and you're good to go. Out-of-the-box Chroma is suitable for most LangChain workloads, but is highly flexible. I tested to 1M embs on my M1 mac, with out issues and reasonably fast query times. Look out for future releases as we integrate more Chroma features with LangChain!	2023-02-12 17:43:48 -08:00
cragwolfe	05d8969c79	Unstructured example notebook: add a pdf, related deps (#1011 ) Updates the Unstructured example notebook with a PDF example. Includes additional dependencies for PDF processing (and images, etc).	2023-02-12 14:56:48 -08:00
Dhruv Anand	03e5794978	typo fix on chat vector db docs (#1007 ) simple typo fix: because --> between	2023-02-12 12:09:21 -08:00
Harrison Chase	6d44a2285c	bump version to 0084 (#1005 )	2023-02-12 07:47:10 -08:00
Harrison Chase	0998577dfe	Harrison/unstructured structured (#1004 )	2023-02-12 07:36:11 -08:00

... 7 8 9 10 11 ...

1000 Commits