langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 21:32:26 -07:00
Harrison Chase	36c10f8a52	nits (#3203 )	2023-04-19 21:14:46 -07:00
Daniel Chalef	27cdf8d675	supabase vectorstore - first cut (#3100 ) First cut of a supabase vectorstore loosely patterned on the langchainjs equivalent. Doesn't support async operations which is a limitation of the supabase python client. --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org>	2023-04-19 21:06:44 -07:00
Harrison Chase	9a0356d276	Harrison/file chat history (#3198 ) Co-authored-by: Young Lee <joybro201@gmail.com>	2023-04-19 21:05:20 -07:00
Kazon Wilson	a66cab8b71	Add new line to refine prompt tmpl (#3197 ) Adding a new line to fix issue #3117	2023-04-19 21:04:52 -07:00
Harrison Chase	96809b5794	Harrison/discord loader (#3200 ) Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>	2023-04-19 21:04:12 -07:00
Justin Flick	8faef1a91a	Confluence DL retry/backoff (#3168 ) Implemented a retry/backoff logic in response to #2473 --------- Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 20:50:39 -07:00
Adilzhan Ismailov	c03a65c6dc	Fix from_embeddings method examples (#3174 ) Fix examples for `from_embeddings` method for annoy and faiss vectorstores	2023-04-19 20:49:33 -07:00
Harrison Chase	f19b3890c9	Harrison/site map tqdm (#3184 ) Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>	2023-04-19 20:48:47 -07:00
Harrison Chase	e55db5841a	Harrison/svm speedup (#3195 ) Co-authored-by: Lance Martin <122662504+PineappleExpress808@users.noreply.github.com>	2023-04-19 20:14:01 -07:00
obbiondo	d6b2f2b9bd	add ConfluenceLoader to document_loaders init (#3143 ) Fix ConfluenceLoader import Co-authored-by: Andrea Biondo <a.biondo@reply.it>	2023-04-19 20:05:31 -07:00
Zander Chase	c757c3cde4	Add HuggingFace Examples (#3187 ) Add a Pipeline example and add other models in th ehub notebook To close issue [#3077](https://github.com/hwchase17/langchain/issues/3099)	2023-04-19 17:08:10 -07:00
Donald "Max" Ziff	6adf2d1c39	first draft (#2690 ) There is a long way to go on this! --------- Co-authored-by: Max Ziff <max.ziff@concur.com>	2023-04-19 17:06:55 -07:00
Harrison Chase	9181cd9b22	Harrison/playwright selector (#3185 ) Co-authored-by: zhyuri <4649294+zhyuri@users.noreply.github.com>	2023-04-19 16:54:15 -07:00
Harrison Chase	68cd37175e	Harrison/arxiv tool (#3186 ) Co-authored-by: leo-gan <leo.gan.57@gmail.com>	2023-04-19 16:53:34 -07:00
Tunay Okumus	6e48107734	fix: separate model and deployment for OpenAIEmbeddings (#3076 ) Separated the deployment from model to support Azure OpenAI Embeddings properly. Also removed the deprecated document_model_name and query_model_name attributes.	2023-04-19 16:49:18 -07:00
Zander Chase	4adfd790f0	Update File Management Tools to Include Root Directory (#3112 ) - Permit the specification of a `root_dir` to the read/write file tools to specify a working directory - Add validation for attempts to read/write outside the directory (e.g., through `../../` or symlinks or `/abs/path`'s that don't lie in the correct path) - Add some tests for all One question is whether we should make a default root directory for these? tradeoffs either way	2023-04-19 16:46:10 -07:00
John-David Wuarin	a63bfb6c9f	fix: kwargs.pop("redis_url") KeyError: 'redis_url' (#3121 ) This occurred when redis_url was not passed as a parameter even though a REDIS_URL env variable was present. This occurred for all methods that eventually called any of: (from_texts, drop_index, from_existing_index) - i.e. virtually all methods in the class. This fixes it	2023-04-19 16:44:39 -07:00
engkheng	dbbc340f25	Validate `input_variables` when using `jinja2` templates (#3140 ) `langchain.prompts.PromptTemplate` and `langchain.prompts.FewShotPromptTemplate` do not validate `input_variables` when initialized as `jinja2` template. ```python # Using langchain v0.0.144 template = """"\ Your variable: {{ foo }} {% if bar %} You just set bar boolean variable to true {% endif %} """ # Missing variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar"], template_format="jinja2", validate_template=True) # Extra variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar", "foo", "extra", "thing"], template_format="jinja2", validate_template=True) ```	2023-04-19 16:18:32 -07:00
Matt Robinson	3e0c44bae8	enhancement: support headers for non-html urls (#3166 ) ### Summary Updates the `UnstructuredURLLoader` to support passing in headers for non HTML content types. While this update maintains backward compatibility with older versions of `unstructured`, we strongly recommended upgrading to `unstructured>=0.5.13` if you are using the `UnstructuredURLLoader`. ### Testing #### With headers ```python from langchain.document_loaders import UnstructuredURLLoader urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"] loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast") docs = loader.load() print(docs[0].page_content[:1000]) ``` #### Without headers ```python from langchain.document_loaders import UnstructuredURLLoader urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"] loader = UnstructuredURLLoader(urls=urls, strategy="fast") docs = loader.load() print(docs[0].page_content[:1000]) ``` --------- Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>	2023-04-19 16:16:24 -07:00
Pranabendra Prasad Chandra	7b1f0656b8	Fix typo in ElasticSearch sample notebook (#3171 ) Added missing parenthesis in example notebook [elasticsearch.ipynb](https://github.com/hwchase17/langchain/blob/master/docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb)	2023-04-19 16:06:31 -07:00
Davis Chase	10e4b32ecb	Add document transformer abstraction (#3182 ) Add DocumentTransformer abstraction so that in #2915 we don't have to wrap TextSplitter and RedundantEmbeddingFilter (neither of which uses the query) in the contextual doc compression abstractions. with this change, doc filter (doc extractor, whatever we call it) would look something like ```python class BaseDocumentFilter(BaseDocumentTransformer[_RetrievedDocument], ABC): @abstractmethod def filter(self, documents: List[_RetrievedDocument], query: str) -> List[_RetrievedDocument]: ... def transform_documents(self, documents: List[_RetrievedDocument], query: Optional[str] = None, **kwargs: Any) -> List[_RetrievedDocument]: if query is None: raise ValueError("Must pass in non-null query to DocumentFilter") return self.filter(documents, query) ```	2023-04-19 16:05:05 -07:00
Zander Chase	74342ab209	Update the marathon notebook (#3183 ) There were some steps that didn't make sense. Update now. This time it produced a nice markdown formatted table too	2023-04-19 16:03:21 -07:00
leo-gan	a78f55b851	Additional resources - `YouTube` (#3180 ) Added links to the YouTube tutorials and videos in the `youtube.md`. Added link to the ^ in `index.rst`.	2023-04-19 15:16:29 -07:00
det-sys	26c8cd1ea2	Update gallery.rst (#3176 ) Add https://anysummary.app to the gallery	2023-04-19 15:06:59 -07:00
Happydog	5e66d05928	Fix: typo in custom_mrkl_agents.ipynb document (#3159 ) I have noticed a typo error in the `custom_mrkl_agents.ipynb` document while trying the example from the documentation page. As a result, I have opened a pull request (PR) to address this minor issue, even though it may seem insignificant 😂.	2023-04-19 14:57:33 -07:00
Harrison Chase	99b1983461	add example	2023-04-19 14:35:24 -07:00
Zander Chase	89c63cf8a6	Add Marathon Notebook (#3163 ) Add an example using autogpt to get the boston marathon winning times Add a web browser + summarization tool in the notebook	2023-04-19 11:23:08 -07:00
Dariel Dato-on	0b542661b4	Prevent `kwargs` from being overwritten (#3158 ) Fixes #3157. Prevents `kwargs` from being overwritten by `_to_args_and_kwargs()` and sending the wrong `kwargs` in line 109.	2023-04-19 09:00:10 -07:00
Quentin Pleplé	126d7f11dd	Fix notebook example (#3142 ) The following calls were throwing an exception: `575b717d10/docs/use_cases/evaluation/agent_vectordb_sota_pg.ipynb (L192)` `575b717d10/docs/use_cases/evaluation/agent_vectordb_sota_pg.ipynb (L239)` Exception: ``` --------------------------------------------------------------------------- ValidationError Traceback (most recent call last) Cell In[14], line 1 ----> 1 chain_sota = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff", retriever=vectorstore_sota, input_key="question") File ~/github/langchain/venv/lib/python3.9/site-packages/langchain/chains/retrieval_qa/base.py:89, in BaseRetrievalQA.from_chain_type(cls, llm, chain_type, chain_type_kwargs, kwargs) 85 _chain_type_kwargs = chain_type_kwargs or {} 86 combine_documents_chain = load_qa_chain( 87 llm, chain_type=chain_type, _chain_type_kwargs 88 ) ---> 89 return cls(combine_documents_chain=combine_documents_chain, *kwargs) File ~/github/langchain/venv/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__() ValidationError: 1 validation error for RetrievalQA retriever instance of BaseRetriever expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseRetriever) ``` The vectorstores had to be converted to retrievers: `vectorstore_sota.as_retriever()` and `vectorstore_pg.as_retriever()`. The PR also: - adds the file `paul_graham_essay.txt` referenced by this notebook - adds to gitignore .pkl and *.bin files that are generated by this notebook Interestingly enough, the performance of the prediction greatly increased (new version of langchain or ne version of OpenAI models since the last run of the notebook): from 19/33 correct to 28/33 correct!	2023-04-19 08:55:06 -07:00
Jakub Kukul	599e17cea8	Working example for Anthropic (#3151 ) would be great if the provided example worked out of the box 😄	2023-04-19 08:52:33 -07:00
Harrison Chase	575b717d10	bump version to 144 (#3136 )	2023-04-18 23:29:23 -07:00
ProxyCausal	72b7d76d79	Print exception type for Python tool (#3126 ) Useful for debugging agents e.g. KeyError in addition to just printing the missing key	2023-04-18 22:45:06 -07:00
Harrison Chase	b7dc04c086	fix links	2023-04-18 22:44:53 -07:00
Zander Chase	8a050ba4bf	Notebook Nit (#3125 ) The required arg is `question` not `query`	2023-04-18 22:43:52 -07:00
Harrison Chase	364257d967	agent docs fixes (#3128 )	2023-04-18 21:54:30 -07:00
Zander Chase	f329196cf4	Agents 4 18 (#3122 ) Creating an experimental agents folder, containing BabyAGI, AutoGPT, and later, other examples --------- Co-authored-by: Rahul Behal <rahulbehal01@hotmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-18 21:41:03 -07:00
engkheng	8e386613ac	Import jinja2 only when used (#3123 ) Addressing #3113	2023-04-18 21:23:03 -07:00
Zander Chase	90ef705ced	Update Tool Input (#3103 ) - Remove dynamic model creation in the `args()` property. _Only infer for the decorator (and add an argument to NOT infer if someone wishes to only pass as a string)_ - Update the validation example to make it less likely to be misinterpreted as a "safe" way to run a repl There is one example of "Multi-argument tools" in the custom_tools.ipynb from yesterday, but we could add more. The output parsing for the base MRKL agent hasn't been adapted to handle structured args at this point in time --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-18 18:18:33 -07:00
Francesco	19116010ee	Add exeption for when version metadata cannot be found for package (#3107 ) Solves #3097 Already ran tests and lint.	2023-04-18 16:44:40 -07:00
Carmen Sam	d54c88aa21	Add allowed and disallowed special arguments to BaseOpenAI (#3012 ) ## Background This PR fixes this error when there are special tokens when querying the chain: ``` Encountered text corresponding to disallowed special token '<\|endofprompt\|>'. If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<\|endofprompt\|>', ...}`. If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<\|endofprompt\|>'})`. To disable this check for all special tokens, pass `disallowed_special=()`. ``` Refer to the code snippet below, it breaks in the chain line. ``` chain = ConversationalRetrievalChain.from_llm( ChatOpenAI(openai_api_key=OPENAI_API_KEY), retriever=vectorstore.as_retriever(), qa_prompt=prompt, condense_question_prompt=condense_prompt, ) answer = chain({"question": f"{question}"}) ``` However `ChatOpenAI` class is not accepting `allowed_special` and `disallowed_special` at the moment so they cannot be passed to the `encode()` in `get_num_tokens` method to avoid the errors. ## Change - Add `allowed_special` and `disallowed_special` attributes to `BaseOpenAI` class. - Pass in `allowed_special` and `disallowed_special` as arguments of `encode()` in tiktoken. --------- Co-authored-by: samcarmen <“carmen.samkahman@gmail.com”>	2023-04-18 09:34:08 -07:00
Harrison Chase	9d23cfc7dd	bump version to 143 (#3095 )	2023-04-18 09:12:57 -07:00
Harrison Chase	aad0a498ac	Harrison/output error (#3094 ) Co-authored-by: yummydum <sumita@nowcast.co.jp>	2023-04-18 08:59:56 -07:00
Harrison Chase	1c1b77bbfe	Harrison/discord (#3092 ) Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>	2023-04-18 08:19:23 -07:00
Boris Feld	14e4d30659	Comet ml updates 17 04 2023 (#3074 ) I made a couple of improvements to the Comet tracker: * The Comet project name is configurable in various ways (code, environment variable or file), having a default value in code meant that users couldn't set the project name in an environment variable or in a file. * I added error catching when the `flush_tracker` is called in order to avoid crashing the whole process. Instead we are gonna display a warning or error log message (`extra={"show_traceback": True}` is an internal convention to force the display of the traceback when using our own logger). I decided to add the error catching after seeing the following error in the third example of the notebook: ``` COMET ERROR: Failed to export agent or LLM to Comet Traceback (most recent call last): File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 484, in _log_model langchain_asset.save(langchain_asset_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 591, in save raise ValueError( ValueError: Saving not supported for agent executors. If you are trying to save the agent, please use the `.save_agent(...)` During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 449, in flush_tracker self._log_model(langchain_asset) File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 488, in _log_model langchain_asset.save_agent(langchain_asset_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 599, in save_agent return self.agent.save(file_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 145, in save agent_dict = self.dict() File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 119, in dict _dict = super().dict() File "pydantic/main.py", line 449, in pydantic.main.BaseModel.dict File "pydantic/main.py", line 868, in _iter File "pydantic/main.py", line 743, in pydantic.main.BaseModel._get_value File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 381, in dict output_parser_dict["_type"] = self._type File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 376, in _type raise NotImplementedError NotImplementedError ``` I still need to investigate and try to fix it, it looks related to saving an agent to a file.	2023-04-18 07:32:29 -07:00
engkheng	fe68051d34	Fix typo in `docs/reference.rst` (#3081 ) fix typo	2023-04-18 07:31:00 -07:00
Azam Iftikhar	188e9b9beb	Allowing HuggingFaceEmbeddings from the cached weight (#3084 ) ### https://github.com/hwchase17/langchain/issues/3079 Allow initializing HuggingFaceEmbeddings from the cached weight	2023-04-18 07:30:35 -07:00
Roma	55f6f80a59	fix typo (#3085 )	2023-04-18 07:29:33 -07:00
TysBradford	7dae39b57d	slightly clearer docs (#3088 ) Took me a second to realise the examples required to manually print the output of the conversation predict. This might make it clearer for others	2023-04-18 07:28:29 -07:00
James O'Dwyer	0257829776	Bump Metal to use index_id (#3089 ) ## Use `index_id` over `app_id` We made a major update to index + retrieve based on Metal Indexes (instead of apps). With this change, we accept an index instead of an app in each of our respective core apis. [More details here](https://docs.getmetal.io/api-reference/core/indexing).	2023-04-18 07:28:13 -07:00

1 2 3 4 5 ...

1426 Commits