langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
John-David Wuarin	a63bfb6c9f	fix: kwargs.pop("redis_url") KeyError: 'redis_url' (#3121 ) This occurred when redis_url was not passed as a parameter even though a REDIS_URL env variable was present. This occurred for all methods that eventually called any of: (from_texts, drop_index, from_existing_index) - i.e. virtually all methods in the class. This fixes it	2023-04-19 16:44:39 -07:00
engkheng	dbbc340f25	Validate `input_variables` when using `jinja2` templates (#3140 ) `langchain.prompts.PromptTemplate` and `langchain.prompts.FewShotPromptTemplate` do not validate `input_variables` when initialized as `jinja2` template. ```python # Using langchain v0.0.144 template = """"\ Your variable: {{ foo }} {% if bar %} You just set bar boolean variable to true {% endif %} """ # Missing variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar"], template_format="jinja2", validate_template=True) # Extra variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar", "foo", "extra", "thing"], template_format="jinja2", validate_template=True) ```	2023-04-19 16:18:32 -07:00
Matt Robinson	3e0c44bae8	enhancement: support headers for non-html urls (#3166 ) ### Summary Updates the `UnstructuredURLLoader` to support passing in headers for non-HTML content types. While this update maintains backward compatibility with older versions of `unstructured`, we strongly recommend upgrading to `unstructured>=0.5.13` if you are using the `UnstructuredURLLoader`. ### Testing #### With headers ```python from langchain.document_loaders import UnstructuredURLLoader urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"] loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast") docs = loader.load() print(docs[0].page_content[:1000]) ``` #### Without headers ```python from langchain.document_loaders import UnstructuredURLLoader urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"] loader = UnstructuredURLLoader(urls=urls, strategy="fast") docs = loader.load() print(docs[0].page_content[:1000]) ``` --------- Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>	2023-04-19 16:16:24 -07:00
Pranabendra Prasad Chandra	7b1f0656b8	Fix typo in ElasticSearch sample notebook (#3171 ) Added missing parenthesis in example notebook [elasticsearch.ipynb](https://github.com/hwchase17/langchain/blob/master/docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb)	2023-04-19 16:06:31 -07:00
Davis Chase	10e4b32ecb	Add document transformer abstraction (#3182 ) Add DocumentTransformer abstraction so that in #2915 we don't have to wrap TextSplitter and RedundantEmbeddingFilter (neither of which uses the query) in the contextual doc compression abstractions. with this change, doc filter (doc extractor, whatever we call it) would look something like ```python class BaseDocumentFilter(BaseDocumentTransformer[_RetrievedDocument], ABC): @abstractmethod def filter(self, documents: List[_RetrievedDocument], query: str) -> List[_RetrievedDocument]: ... def transform_documents(self, documents: List[_RetrievedDocument], query: Optional[str] = None, **kwargs: Any) -> List[_RetrievedDocument]: if query is None: raise ValueError("Must pass in non-null query to DocumentFilter") return self.filter(documents, query) ```	2023-04-19 16:05:05 -07:00
Zander Chase	74342ab209	Update the marathon notebook (#3183 ) There were some steps that didn't make sense. Update now. This time it produced a nice markdown formatted table too	2023-04-19 16:03:21 -07:00
leo-gan	a78f55b851	Additional resources - `YouTube` (#3180 ) Added links to the YouTube tutorials and videos in the `youtube.md`. Added link to the ^ in `index.rst`.	2023-04-19 15:16:29 -07:00
det-sys	26c8cd1ea2	Update gallery.rst (#3176 ) Add https://anysummary.app to the gallery	2023-04-19 15:06:59 -07:00
Happydog	5e66d05928	Fix: typo in custom_mrkl_agents.ipynb document (#3159 ) I have noticed a typo error in the `custom_mrkl_agents.ipynb` document while trying the example from the documentation page. As a result, I have opened a pull request (PR) to address this minor issue, even though it may seem insignificant 😂.	2023-04-19 14:57:33 -07:00
Harrison Chase	99b1983461	add example	2023-04-19 14:35:24 -07:00
Zander Chase	89c63cf8a6	Add Marathon Notebook (#3163 ) Add an example using autogpt to get the boston marathon winning times Add a web browser + summarization tool in the notebook	2023-04-19 11:23:08 -07:00
Dariel Dato-on	0b542661b4	Prevent `kwargs` from being overwritten (#3158 ) Fixes #3157. Prevents `kwargs` from being overwritten by `_to_args_and_kwargs()` and sending the wrong `kwargs` in line 109.	2023-04-19 09:00:10 -07:00
Quentin Pleplé	126d7f11dd	Fix notebook example (#3142 ) The following calls were throwing an exception: `575b717d10/docs/use_cases/evaluation/agent_vectordb_sota_pg.ipynb (L192)` `575b717d10/docs/use_cases/evaluation/agent_vectordb_sota_pg.ipynb (L239)` Exception: ``` --------------------------------------------------------------------------- ValidationError Traceback (most recent call last) Cell In[14], line 1 ----> 1 chain_sota = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff", retriever=vectorstore_sota, input_key="question") File ~/github/langchain/venv/lib/python3.9/site-packages/langchain/chains/retrieval_qa/base.py:89, in BaseRetrievalQA.from_chain_type(cls, llm, chain_type, chain_type_kwargs, **kwargs) 85 _chain_type_kwargs = chain_type_kwargs or {} 86 combine_documents_chain = load_qa_chain( 87 llm, chain_type=chain_type, **_chain_type_kwargs 88 ) ---> 89 return cls(combine_documents_chain=combine_documents_chain, **kwargs) File ~/github/langchain/venv/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__() ValidationError: 1 validation error for RetrievalQA retriever instance of BaseRetriever expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseRetriever) ``` The vectorstores had to be converted to retrievers: `vectorstore_sota.as_retriever()` and `vectorstore_pg.as_retriever()`. The PR also: - adds the file `paul_graham_essay.txt` referenced by this notebook - adds to gitignore .pkl and *.bin files that are generated by this notebook Interestingly enough, the performance of the prediction greatly increased (new version of langchain or new version of OpenAI models since the last run of the notebook): from 19/33 correct to 28/33 correct!	2023-04-19 08:55:06 -07:00
Jakub Kukul	599e17cea8	Working example for Anthropic (#3151 ) would be great if the provided example worked out of the box 😄	2023-04-19 08:52:33 -07:00
Harrison Chase	575b717d10	bump version to 144 (#3136 )	2023-04-18 23:29:23 -07:00
ProxyCausal	72b7d76d79	Print exception type for Python tool (#3126 ) Useful for debugging agents e.g. KeyError in addition to just printing the missing key	2023-04-18 22:45:06 -07:00
Harrison Chase	b7dc04c086	fix links	2023-04-18 22:44:53 -07:00
Zander Chase	8a050ba4bf	Notebook Nit (#3125 ) The required arg is `question` not `query`	2023-04-18 22:43:52 -07:00
Harrison Chase	364257d967	agent docs fixes (#3128 )	2023-04-18 21:54:30 -07:00
Zander Chase	f329196cf4	Agents 4 18 (#3122 ) Creating an experimental agents folder, containing BabyAGI, AutoGPT, and later, other examples --------- Co-authored-by: Rahul Behal <rahulbehal01@hotmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-18 21:41:03 -07:00
engkheng	8e386613ac	Import jinja2 only when used (#3123 ) Addressing #3113	2023-04-18 21:23:03 -07:00
Zander Chase	90ef705ced	Update Tool Input (#3103 ) - Remove dynamic model creation in the `args()` property. _Only infer for the decorator (and add an argument to NOT infer if someone wishes to only pass as a string)_ - Update the validation example to make it less likely to be misinterpreted as a "safe" way to run a repl There is one example of "Multi-argument tools" in the custom_tools.ipynb from yesterday, but we could add more. The output parsing for the base MRKL agent hasn't been adapted to handle structured args at this point in time --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-18 18:18:33 -07:00
Francesco	19116010ee	Add exception for when version metadata cannot be found for package (#3107 ) Solves #3097 Already ran tests and lint.	2023-04-18 16:44:40 -07:00
Carmen Sam	d54c88aa21	Add allowed and disallowed special arguments to BaseOpenAI (#3012 ) ## Background This PR fixes this error when there are special tokens when querying the chain: ``` Encountered text corresponding to disallowed special token '<|endofprompt|>'. If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endofprompt|>', ...}`. If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endofprompt|>'})`. To disable this check for all special tokens, pass `disallowed_special=()`. ``` Refer to the code snippet below; it breaks on the chain line. ``` chain = ConversationalRetrievalChain.from_llm( ChatOpenAI(openai_api_key=OPENAI_API_KEY), retriever=vectorstore.as_retriever(), qa_prompt=prompt, condense_question_prompt=condense_prompt, ) answer = chain({"question": f"{question}"}) ``` However, the `ChatOpenAI` class does not accept `allowed_special` and `disallowed_special` at the moment, so they cannot be passed to `encode()` in the `get_num_tokens` method to avoid the errors. ## Change - Add `allowed_special` and `disallowed_special` attributes to `BaseOpenAI` class. - Pass in `allowed_special` and `disallowed_special` as arguments of `encode()` in tiktoken. --------- Co-authored-by: samcarmen <“carmen.samkahman@gmail.com”>	2023-04-18 09:34:08 -07:00
Harrison Chase	9d23cfc7dd	bump version to 143 (#3095 )	2023-04-18 09:12:57 -07:00
Harrison Chase	aad0a498ac	Harrison/output error (#3094 ) Co-authored-by: yummydum <sumita@nowcast.co.jp>	2023-04-18 08:59:56 -07:00
Harrison Chase	1c1b77bbfe	Harrison/discord (#3092 ) Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>	2023-04-18 08:19:23 -07:00
Boris Feld	14e4d30659	Comet ml updates 17 04 2023 (#3074 ) I made a couple of improvements to the Comet tracker: * The Comet project name is configurable in various ways (code, environment variable or file), having a default value in code meant that users couldn't set the project name in an environment variable or in a file. * I added error catching when the `flush_tracker` is called in order to avoid crashing the whole process. Instead we are gonna display a warning or error log message (`extra={"show_traceback": True}` is an internal convention to force the display of the traceback when using our own logger). I decided to add the error catching after seeing the following error in the third example of the notebook: ``` COMET ERROR: Failed to export agent or LLM to Comet Traceback (most recent call last): File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 484, in _log_model langchain_asset.save(langchain_asset_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 591, in save raise ValueError( ValueError: Saving not supported for agent executors. 
If you are trying to save the agent, please use the `.save_agent(...)` During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 449, in flush_tracker self._log_model(langchain_asset) File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 488, in _log_model langchain_asset.save_agent(langchain_asset_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 599, in save_agent return self.agent.save(file_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 145, in save agent_dict = self.dict() File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 119, in dict _dict = super().dict() File "pydantic/main.py", line 449, in pydantic.main.BaseModel.dict File "pydantic/main.py", line 868, in _iter File "pydantic/main.py", line 743, in pydantic.main.BaseModel._get_value File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 381, in dict output_parser_dict["_type"] = self._type File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 376, in _type raise NotImplementedError NotImplementedError ``` I still need to investigate and try to fix it, it looks related to saving an agent to a file.	2023-04-18 07:32:29 -07:00
engkheng	fe68051d34	Fix typo in `docs/reference.rst` (#3081 ) fix typo	2023-04-18 07:31:00 -07:00
Azam Iftikhar	188e9b9beb	Allowing HuggingFaceEmbeddings from the cached weight (#3084 ) ### https://github.com/hwchase17/langchain/issues/3079 Allow initializing HuggingFaceEmbeddings from the cached weight	2023-04-18 07:30:35 -07:00
Roma	55f6f80a59	fix typo (#3085 )	2023-04-18 07:29:33 -07:00
TysBradford	7dae39b57d	slightly clearer docs (#3088 ) Took me a second to realise the examples required to manually print the output of the conversation predict. This might make it clearer for others	2023-04-18 07:28:29 -07:00
James O'Dwyer	0257829776	Bump Metal to use index_id (#3089 ) ## Use `index_id` over `app_id` We made a major update to index + retrieve based on Metal Indexes (instead of apps). With this change, we accept an index instead of an app in each of our respective core apis. [More details here](https://docs.getmetal.io/api-reference/core/indexing).	2023-04-18 07:28:13 -07:00
Hamza Kyamanywa	064a1db2b2	[Documentation] Show how to initiate pinecone from an existing index (#3070 ) ## What is this PR for: * This PR adds a commented line of code in the documentation that shows how someone can use the Pinecone client with an already existing Pinecone index * The documentation currently only shows how to create a pinecone index from langchain documents but not how to load one that already exists	2023-04-18 07:27:46 -07:00
Harrison Chase	894c272a56	tool validation logic	2023-04-17 21:59:32 -07:00
Harrison Chase	1920536d99	Harrison/obsidian (#3060 ) Co-authored-by: Ben Hofferber <hofferber.ben@gmail.com>	2023-04-17 21:57:32 -07:00
Zander Chase	93c0514105	Add Twitter Tweet Loader (#3050 ) Reformatted version of #3022 --------- Co-authored-by: LiaoKong <568250549@qq.com>	2023-04-17 21:44:54 -07:00
__Jay__	2984ad3964	updated llm response parsing action (#3058 ) Sometimes the LLM response (generated code) tends to miss the ending ticks "```". Therefore causing the text parsing to fail due to not enough values to unpack. The 2 extra `_` don't add value and can cause errors. Suggest to simply update the `_, action, _` to just `action` then with index. Fixes issue #3057	2023-04-17 21:42:13 -07:00
Harrison Chase	db968284f8	tools refactor (#2961 ) Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-17 21:35:29 -07:00
Sebastian	7a8c935b90	Edited for better readability (#3059 ) It looks like some dropdown functionality was intended, but it caused the markdown code to glitch which hurt readability.	2023-04-17 21:34:57 -07:00
Matthieu	822cdb161b	Adding shared chromaDB client option (#2886 ) This pull request addresses the need to share a single `chromadb.Client` instance across multiple instances of the `Chroma` class. By implementing a shared client, we can maintain consistency and reduce resource usage when multiple instances of the `Chroma` classes are created. This is especially relevant in a web app, where having multiple `Chroma` instances with a `persist_directory` leads to these clients not being synced. This PR implements this option while keeping the rest of the architecture unchanged. Changes: 1. Add a client attribute to the `Chroma` class to store the shared `chromadb.Client` instance. 2. Modify the `from_documents` method to accept an optional client parameter. 3. Update the `from_documents` method to use the shared client if provided or create a new client if not provided. Let me know if anything needs to be modified - thanks again for your work on this incredible repo	2023-04-17 21:22:39 -07:00
Harrison Chase	b140d366e3	Harrison/jira (#3055 ) Co-authored-by: William Li <32046231+zywilliamli@users.noreply.github.com> Co-authored-by: William Li <twelvehertz@Williams-MacBook-Air.local>	2023-04-17 21:14:40 -07:00
Amir Karimi	ae7ed31386	Fix redundancy check about config_type in AGENT_TO_CLASS (#2934 ) Fix of issue #2874	2023-04-17 21:05:48 -07:00
J Wynia	b40f90ea04	Spelling: correct conservation to conversation (#3049 ) Issue #3048 corrected spelling	2023-04-17 21:03:03 -07:00
leo-gan	c33883a40e	fixed the Cohere example title (#3053 ) - fixed the Cohere example title (bug in #3041, sorry for it) - fixed the runhouse.ipynb file name inconsistency	2023-04-17 21:02:52 -07:00
Harrison Chase	5107fac656	Harrison/rec gd (#3054 ) Co-authored-by: Benjamin Scholtz <BenSchZA@users.noreply.github.com>	2023-04-17 21:02:35 -07:00
Harrison Chase	eee2f23a79	Harrison/qa eg (#3052 ) Co-authored-by: Sukhpal Saini <bdcorps@users.noreply.github.com>	2023-04-17 20:56:42 -07:00
Harrison Chase	db7106cb79	Harrison/image caption loader (#3051 ) Co-authored-by: Sean Saito <saitosean@ymail.com>	2023-04-17 20:49:10 -07:00
Benjamin Scholtz	36138f28c8	Add GoogleSQL prompt (#2992 ) This PR extends upon @jzluo 's PR #2748 which addressed dialect-specific issues with SQL prompts, and adds a prompt that uses backticks for column names when querying BigQuery. See [GoogleSQL quoted identifiers](https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#quoted_identifiers). Additionally, the SQL agent currently uses a generic prompt. Not sure how best to adopt the same optional dialect-specific prompts as above, but will consider making an issue and PR for that too. See [langchain/agents/agent_toolkits/sql/prompt.py](langchain/agents/agent_toolkits/sql/prompt.py).	2023-04-17 20:44:54 -07:00
Naveen Tatikonda	bb619cd535	Pass kwargs to get OpenSearch client from_texts (#2993 ) ### Description Pass kwargs to get OpenSearch client from `from_texts` function ### Issues Resolved https://github.com/hwchase17/langchain/issues/2819 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-17 20:44:30 -07:00
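
Commit a63bfb6c9f (#3121) above describes a `KeyError` raised by `kwargs.pop("redis_url")` when only the `REDIS_URL` environment variable was set. The general pattern behind that kind of fix can be sketched as follows. This is a minimal illustration, not the actual langchain patch: `resolve_redis_url` is a hypothetical helper name, and the lookup order (explicit argument first, then environment) is an assumption.

```python
import os
from typing import Any, Dict, Optional


def resolve_redis_url(kwargs: Dict[str, Any], env_key: str = "REDIS_URL") -> str:
    """Hypothetical helper: return the Redis URL from kwargs or the environment.

    Not part of langchain; it only illustrates the pop-with-default pattern.
    """
    # pop() with a default never raises KeyError when the key is absent.
    explicit: Optional[str] = kwargs.pop("redis_url", None)
    # Assumed precedence: an explicit argument wins over the environment variable.
    url = explicit or os.environ.get(env_key)
    if not url:
        raise ValueError(
            f"Pass `redis_url` as a keyword argument or set the {env_key} environment variable."
        )
    return url
```

Popping with a default also removes `redis_url` from `kwargs` when it is present, so the remaining keyword arguments can be forwarded without the URL appearing twice.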


1609 Commits