langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-06 03:20:49 +00:00

Author	SHA1	Message	Date
Carmen Sam	d54c88aa21	Add allowed and disallowed special arguments to BaseOpenAI (#3012 ) ## Background This PR fixes this error when there are special tokens when querying the chain: ``` Encountered text corresponding to disallowed special token '<|endofprompt|>'. If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endofprompt|>', ...}`. If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endofprompt|>'})`. To disable this check for all special tokens, pass `disallowed_special=()`. ``` Refer to the code snippet below, it breaks in the chain line. ``` chain = ConversationalRetrievalChain.from_llm( ChatOpenAI(openai_api_key=OPENAI_API_KEY), retriever=vectorstore.as_retriever(), qa_prompt=prompt, condense_question_prompt=condense_prompt, ) answer = chain({"question": f"{question}"}) ``` However `ChatOpenAI` class is not accepting `allowed_special` and `disallowed_special` at the moment so they cannot be passed to the `encode()` in `get_num_tokens` method to avoid the errors. ## Change - Add `allowed_special` and `disallowed_special` attributes to `BaseOpenAI` class. - Pass in `allowed_special` and `disallowed_special` as arguments of `encode()` in tiktoken. --------- Co-authored-by: samcarmen <“carmen.samkahman@gmail.com”>	2023-04-18 09:34:08 -07:00
Harrison Chase	9d23cfc7dd	bump version to 143 (#3095 )	2023-04-18 09:12:57 -07:00
Harrison Chase	aad0a498ac	Harrison/output error (#3094 ) Co-authored-by: yummydum <sumita@nowcast.co.jp>	2023-04-18 08:59:56 -07:00
Harrison Chase	1c1b77bbfe	Harrison/discord (#3092 ) Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>	2023-04-18 08:19:23 -07:00
Boris Feld	14e4d30659	Comet ml updates 17 04 2023 (#3074 ) I made a couple of improvements to the Comet tracker: * The Comet project name is configurable in various ways (code, environment variable or file), having a default value in code meant that users couldn't set the project name in an environment variable or in a file. * I added error catching when the `flush_tracker` is called in order to avoid crashing the whole process. Instead we are gonna display a warning or error log message (`extra={"show_traceback": True}` is an internal convention to force the display of the traceback when using our own logger). I decided to add the error catching after seeing the following error in the third example of the notebook: ``` COMET ERROR: Failed to export agent or LLM to Comet Traceback (most recent call last): File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 484, in _log_model langchain_asset.save(langchain_asset_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 591, in save raise ValueError( ValueError: Saving not supported for agent executors. If you are trying to save the agent, please use the `.save_agent(...)` During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 449, in flush_tracker self._log_model(langchain_asset) File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 488, in _log_model langchain_asset.save_agent(langchain_asset_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 599, in save_agent return self.agent.save(file_path) File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 145, in save agent_dict = self.dict() File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 119, in dict _dict = super().dict() File "pydantic/main.py", line 449, in pydantic.main.BaseModel.dict File "pydantic/main.py", line 868, in _iter File "pydantic/main.py", line 743, in pydantic.main.BaseModel._get_value File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 381, in dict output_parser_dict["_type"] = self._type File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 376, in _type raise NotImplementedError NotImplementedError ``` I still need to investigate and try to fix it, it looks related to saving an agent to a file.	2023-04-18 07:32:29 -07:00
engkheng	fe68051d34	Fix typo in `docs/reference.rst` (#3081 ) fix typo	2023-04-18 07:31:00 -07:00
Azam Iftikhar	188e9b9beb	Allowing HuggingFaceEmbeddings from the cached weight (#3084 ) ### https://github.com/hwchase17/langchain/issues/3079 Allow initializing HuggingFaceEmbeddings from the cached weight	2023-04-18 07:30:35 -07:00
Roma	55f6f80a59	fix typo (#3085 )	2023-04-18 07:29:33 -07:00
TysBradford	7dae39b57d	slightly clearer docs (#3088 ) Took me a second to realise the examples required to manually print the output of the conversation predict. This might make it clearer for others	2023-04-18 07:28:29 -07:00
James O'Dwyer	0257829776	Bump Metal to use index_id (#3089 ) ## Use `index_id` over `app_id` We made a major update to index + retrieve based on Metal Indexes (instead of apps). With this change, we accept an index instead of an app in each of our respective core apis. [More details here](https://docs.getmetal.io/api-reference/core/indexing).	2023-04-18 07:28:13 -07:00
Hamza Kyamanywa	064a1db2b2	[Documentation] Show how to initiate pinecone from an existing index (#3070 ) ## What is this PR for: * This PR adds a commented line of code in the documentation that shows how someone can use the Pinecone client with an already existing Pinecone index * The documentation currently only shows how to create a pinecone index from langchain documents but not how to load one that already exists	2023-04-18 07:27:46 -07:00
Harrison Chase	894c272a56	tool validation logic	2023-04-17 21:59:32 -07:00
Harrison Chase	1920536d99	Harrison/obsidian (#3060 ) Co-authored-by: Ben Hofferber <hofferber.ben@gmail.com>	2023-04-17 21:57:32 -07:00
Zander Chase	93c0514105	Add Twitter Tweet Loader (#3050 ) Reformatted version of #3022 --------- Co-authored-by: LiaoKong <568250549@qq.com>	2023-04-17 21:44:54 -07:00
__Jay__	2984ad3964	updated llm response parsing action (#3058 ) Sometimes the LLM response (generated code) tends to miss the ending ticks "```". Therefore causing the text parsing to fail due to not enough values to unpack. The 2 extra `_` don't add value and can cause errors. Suggest to simply update the `_, action, _` to just `action` then with index. Fixes issue #3057	2023-04-17 21:42:13 -07:00
Harrison Chase	db968284f8	tools refactor (#2961 ) Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-17 21:35:29 -07:00
Sebastian	7a8c935b90	Edited for better readability (#3059 ) It looks like some dropdown functionality was intended, but it caused the markdown code to glitch which hurt readability.	2023-04-17 21:34:57 -07:00
Matthieu	822cdb161b	Adding shared chromaDB client option (#2886 ) This pull request addresses the need to share a single `chromadb.Client` instance across multiple instances of the `Chroma` class. By implementing a shared client, we can maintain consistency and reduce resource usage when multiple instances of the `Chroma` classes are created. This is especially relevant in a web app, where having multiple `Chroma` instances with a `persist_directory` leads to these clients not being synced. This PR implements this option while keeping the rest of the architecture unchanged. Changes: 1. Add a client attribute to the `Chroma` class to store the shared `chromadb.Client` instance. 2. Modify the `from_documents` method to accept an optional client parameter. 3. Update the `from_documents` method to use the shared client if provided or create a new client if not provided. Let me know if anything needs to be modified - thanks again for your work on this incredible repo	2023-04-17 21:22:39 -07:00
Harrison Chase	b140d366e3	Harrison/jira (#3055 ) Co-authored-by: William Li <32046231+zywilliamli@users.noreply.github.com> Co-authored-by: William Li <twelvehertz@Williams-MacBook-Air.local>	2023-04-17 21:14:40 -07:00
Amir Karimi	ae7ed31386	Fix redundancy check about config_type in AGENT_TO_CLASS (#2934 ) Fix of issue #2874	2023-04-17 21:05:48 -07:00
J Wynia	b40f90ea04	Spelling to correct conservation to conversation (#3049 ) Issue #3048 corrected spelling	2023-04-17 21:03:03 -07:00
leo-gan	c33883a40e	fixed the Cohere example title (#3053 ) - fixed the Cohere example title (bug in #3041, sorry for it) - fixed the runhouse.ipynb file name inconsistency	2023-04-17 21:02:52 -07:00
Harrison Chase	5107fac656	Harrison/rec gd (#3054 ) Co-authored-by: Benjamin Scholtz <BenSchZA@users.noreply.github.com>	2023-04-17 21:02:35 -07:00
Harrison Chase	eee2f23a79	Harrison/qa eg (#3052 ) Co-authored-by: Sukhpal Saini <bdcorps@users.noreply.github.com>	2023-04-17 20:56:42 -07:00
Harrison Chase	db7106cb79	Harrison/image caption loader (#3051 ) Co-authored-by: Sean Saito <saitosean@ymail.com>	2023-04-17 20:49:10 -07:00
Benjamin Scholtz	36138f28c8	Add GoogleSQL prompt (#2992 ) This PR extends upon @jzluo 's PR #2748 which addressed dialect-specific issues with SQL prompts, and adds a prompt that uses backticks for column names when querying BigQuery. See [GoogleSQL quoted identifiers](https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#quoted_identifiers). Additionally, the SQL agent currently uses a generic prompt. Not sure how best to adopt the same optional dialect-specific prompts as above, but will consider making an issue and PR for that too. See [langchain/agents/agent_toolkits/sql/prompt.py](langchain/agents/agent_toolkits/sql/prompt.py).	2023-04-17 20:44:54 -07:00
Naveen Tatikonda	bb619cd535	Pass kwargs to get OpenSearch client from_texts (#2993 ) ### Description Pass kwargs to get OpenSearch client from `from_texts` function ### Issues Resolved https://github.com/hwchase17/langchain/issues/2819 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-17 20:44:30 -07:00
Harutaka Kawamura	ba9cc230fa	Stringify `AgentType` before saving to yaml (#2998 ) Code to reproduce the issue (with `langchain==0.0.141`): ```python from langchain.agents import initialize_agent, load_tools from langchain.llms import OpenAI llm = OpenAI(temperature=0.9, verbose=True) tools = load_tools(["llm-math"], llm=llm) agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True) agent.save_agent("agent.yaml") with open("agent.yaml") as f: print(f.read()) ``` Output: ``` _type: !!python/object/apply:langchain.agents.agent_types.AgentType - zero-shot-react-description allowed_tools: - Calculator ... ``` I expected `_type` to be `zero-shot-react-description` but it's actually not. This PR fixes it by stringifying `AgentType` (`Enum`). Signed-off-by: harupy <hkawamura0130@gmail.com>	2023-04-17 20:43:39 -07:00
Nuno Campos	e25528c4f0	Fix incorrect value of outputKeys on AnalyzeDocumentsChain (#3010 )	2023-04-17 20:32:46 -07:00
engkheng	19febc77d6	Support inference of `input_variables` from `jinja2` template (#3013 ) `langchain.prompts.PromptTemplate` is unable to infer `input_variables` from jinja2 template. ```python # Using langchain v0.0.141 template_string = """\ Hello world Your variable: {{ var }} {# This will not get rendered #} {% if verbose %} Congrats! You just turned on verbose mode and got extra messages! {% endif %} """ template = PromptTemplate.from_template(template_string, template_format="jinja2") print(template.input_variables) # Output ['# This will not get rendered #', '% endif %', '% if verbose %'] ``` --------- Co-authored-by: engkheng <ongengkheng929@example.com>	2023-04-17 20:31:03 -07:00
Nuno Campos	dac32c59e5	Nc/combining output parser (#3014 ) Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-17 20:29:53 -07:00
Nuno Campos	79bb5c4f95	Port format instructions fix from js (#3015 )	2023-04-17 20:29:17 -07:00
Harrison Chase	e3cf00b88b	redis from url (#3024 )	2023-04-17 20:28:12 -07:00
Davis Chase	19c85aa990	Factor out doc formatting and add validation (#3026 ) @cnhhoang850 slightly more generic fix for #2944, works for whatever the expected metadata keys are not just `source`	2023-04-17 20:28:01 -07:00
Naveen Tatikonda	3453b7457c	OpenSearch: Add Support for Boolean Filter with ANN search (#3038 ) ### Description Add Support for Boolean Filter with ANN search Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#boolean-filter-with-ann-search ### Issues Resolved https://github.com/hwchase17/langchain/issues/2924 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-17 20:26:26 -07:00
leo-gan	5420a0e404	updated langchain/docs/modules/models/llms/integrations/ notebooks (#3041 ) - Updated `langchain/docs/modules/models/llms/integrations/` notebooks: added links to the original sites, the install information, etc. - Added the `nlpcloud` notebook. - Removed "Example" from Titles of some notebooks, so all notebook titles are consistent.	2023-04-17 20:25:32 -07:00
Azam Iftikhar	471ef84835	Examples fixed (#3042 ) ### https://github.com/hwchase17/langchain/issues/2997 Replaced `conversation.memory.store` to `conversation.memory.entity_store.store` As conversation.memory.store doesn't exist and re-ran the whole file.	2023-04-17 20:25:01 -07:00
Tim Asp	dcdcd3f636	bugfix: throw exception if structured output parser doesn't get what it wants (#3044 ) allows the user to catch the issue and handle it rather than failing hard. This happens more than you'd expect when using output parsers with chatgpt, especially if the temp is anything but 0. Sometimes it doesn't want to listen and just does its own thing.	2023-04-17 20:24:40 -07:00
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	2023-04-17 20:23:45 -07:00
Altay Sansal	95d578d246	Fix type hint regression (#3033 ) Not sure what happened here but some of the file got overwritten by #2859 which broke filtering logic. Here is it fixed back to normal. @hwchase17 can we expedite this if possible :-) --------- Co-authored-by: Altay Sansal <altay.sansal@tgs.com>	2023-04-17 15:49:18 -07:00
Noah Gundotra	577ec92f16	Include testing instructions for getting set up in CONTRIBUTING.md (#3020 ) Running tests is a good sanity check for new users to ensure their development environment is set up correctly.	2023-04-17 08:34:07 -07:00
Harrison Chase	98c70bc190	bump version to 142 (#3021 )	2023-04-17 08:00:00 -07:00
vowelparrot	2356447323	Update Characters notebook (#3019 ) - Most important - fixes the relevance_fn name in the notebook to align with the docs - Updates comments for the summary: <img width="787" alt="image" src="https://user-images.githubusercontent.com/130414180/232520616-2a99e8c3-a821-40c2-a0d5-3f3ea196c9bb.png"> - The new conversation is a bit better, still unfortunate they try to schedule a followup. - Rm the max dialogue turns argument to the conversation function	2023-04-17 07:48:48 -07:00
Harrison Chase	f1d15b4a75	update nb	2023-04-16 22:09:31 -07:00
Harrison Chase	e54f1b69ca	add notebook	2023-04-16 21:54:15 -07:00
vowelparrot	99c0382209	Generative Characters (#2859 ) Add a time-weighted memory retriever and a notebook that approximates a Generative Agent from https://arxiv.org/pdf/2304.03442.pdf The "daily plan" components are removed for now since they are less useful without a virtual world, but the memory is an interesting component to build off. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-16 21:41:00 -07:00
Jan Backes	a9310a3e8b	Add Annoy as VectorStore (#2939 ) Adds Annoy (https://github.com/spotify/annoy) as vector Store. RESOLVES hwchase17/langchain#2842 discord ref: https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-16 13:44:04 -07:00
Harrison Chase	e12e00df12	use output parsers in agents (#2987 )	2023-04-16 13:15:21 -07:00
cs0lar	8b9e02da9d	Fix/issue 1213 (#2932 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search` method. ### Changes - a `max_marginal_relevance_search` method implementation has been added in `weaviate.py` - tests have been added to the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-16 13:11:30 -07:00
Harrison Chase	4c02f4bc30	Fix bug in svm.LinearSVC, add support for a relevancy_threshold (#2959 ) (#2981 ) - Modify SVMRetriever class to add an optional relevancy_threshold - Modify SVMRetriever.get_relevant_documents method to filter out documents with similarity scores below the relevancy threshold - Normalized the similarities to be between 0 and 1 so the relevancy_threshold makes more sense - The number of results are limited to the top k documents or the maximum number of relevant documents above the threshold, whichever is smaller This code will now return the top self.k results (or less, if there are not enough results that meet the self.relevancy_threshold criteria). The svm.LinearSVC implementation in scikit-learn is non-deterministic, which means SVMRetriever.from_texts(["bar", "world", "foo", "hello", "foo bar"]) could return [3 0 5 4 2 1] instead of [0 3 5 4 2 1] with a query of "foo". If you pass in multiple "foo" texts, the order could be different each time. Here, we only care if the 0 is the first element, otherwise it will offset the text and similarities. Example: ```python retriever = SVMRetriever.from_texts( ["foo", "bar", "world", "hello", "foo bar"], OpenAIEmbeddings(), k=4, relevancy_threshold=.25 ) result = retriever.get_relevant_documents("foo") ``` yields ```python [Document(page_content='foo', metadata={}), Document(page_content='foo bar', metadata={})] ``` --------- Co-authored-by: Brandon Sandoval <52767641+account00001@users.noreply.github.com>	2023-04-16 12:57:18 -07:00
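
Note on 4c02f4bc30: the selection behaviour described in that message (normalize similarities to between 0 and 1, drop anything below `relevancy_threshold`, return at most `k` results) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the commit: `select_relevant` is a hypothetical helper, min-max scaling is assumed as the normalization, and the scores are invented rather than produced by `svm.LinearSVC`.

```python
from typing import List, Optional, Sequence


def select_relevant(
    texts: Sequence[str],
    scores: Sequence[float],
    k: int = 4,
    relevancy_threshold: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: top-k texts whose normalized score meets the threshold."""
    if not texts:
        return []
    lo, hi = min(scores), max(scores)
    # Assumed min-max scaling; if all scores are equal, every normalized score is 0.
    span = (hi - lo) or 1.0
    normalized = [(s - lo) / span for s in scores]
    # Rank best-first, then filter by threshold, then cut to k.
    ranked = sorted(zip(texts, normalized), key=lambda pair: pair[1], reverse=True)
    if relevancy_threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= relevancy_threshold]
    return [text for text, _ in ranked[:k]]


# Invented similarity scores, for illustration only.
texts = ["foo", "bar", "world", "hello", "foo bar"]
scores = [0.9, 0.1, 0.0, 0.05, 0.5]
print(select_relevant(texts, scores, k=4, relevancy_threshold=0.25))
```

With these invented scores only "foo" and "foo bar" clear the 0.25 threshold, so fewer than `k` results come back, which matches the "top k or fewer" behaviour the message describes.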

1486 Commits
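
Note on 19febc77d6: the surprising `input_variables` quoted in that message are what one gets when a jinja2 template is scanned with Python's f-string-style field parser. The sketch below reproduces this with the standard library only. It is an assumption that the pre-fix code path behaved this way (the commit body does not say), and `fstring_fields` is a hypothetical helper, not a LangChain function.

```python
from string import Formatter

# The jinja2 template from the commit message, one construct per line.
template_string = (
    "Hello world\n"
    "Your variable: {{ var }}\n"
    "{# This will not get rendered #}\n"
    "{% if verbose %}\n"
    "Congrats! You just turned on verbose mode and got extra messages!\n"
    "{% endif %}\n"
)


def fstring_fields(template: str) -> list:
    """Hypothetical helper: field names seen by str.format-style parsing."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    # "{{" and "}}" are escaped braces, so "{{ var }}" produces no field at all,
    # while "{# ... #}" and "{% ... %}" are each read as a single field name.
    names = {name for _, name, _, _ in Formatter().parse(template) if name is not None}
    return sorted(names)


print(fstring_fields(template_string))
# ['# This will not get rendered #', '% endif %', '% if verbose %']
```

The real variables `var` and `verbose` never appear, because jinja2 syntax has to be analysed by jinja2 itself. One option for that is `jinja2.meta.find_undeclared_variables`, though whether the commit uses it is not stated here.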