langchain

Commit Graph

Author	SHA1	Message	Date
Eugene Yurtsev	e0f6ba08d6	FileSystemBlobLoader: Expand user path (#10133) Fix for: https://github.com/langchain-ai/langchain/issues/10019 Verified fix manually	1 year ago
Krish Dholakia	31bbe80758	add additional model support to chatlitellm (#10134 ) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
IlyaKIS1	de3322609e	Implemented Milvus translator for self-querying (#10162 ) - Implemented the MilvusTranslator for self-querying using Milvus vector store - Made unit tests to test its functionality - Documented the Milvus self-querying	1 year ago
Christophe Bornet	803d0d9656	Add the possibility to configure boto3 in the S3 loaders (#9304 ) - Description: this PR adds the possibility to configure boto3 in the S3 loaders. Any named argument you add will be used to create the Boto3 session. This is useful when the AWS credentials can't be passed as env variables or can't be read from the credentials file. - Issue: N/A - Dependencies: N/A - Tag maintainer: ? - Twitter handle: cbornet_ --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Xiaoyu Xee	9bcfd58580	Add dashvector self query retriever (#9684 ) ## Description Add `Dashvector` retriever and self-query retriever ## How to use ```python from langchain.vectorstores.dashvector import DashVector vectorstore = DashVector.from_documents(docs, embeddings) retriever = SelfQueryRetriever.from_llm( llm, vectorstore, document_content_description, metadata_field_info, verbose=True ) ``` --------- Co-authored-by: smallrain.xuxy <smallrain.xuxy@alibaba-inc.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Sajal Sharma	0b6993987f	feature: add verbosity to create_qa_with_sources_chain (#9742 ) Adds a verbose parameter to the create_qa_with_sources_chain and create_qa_with_structure_chain functions	1 year ago
Jayson Ng	68f2363f5d	Allow specifying arbitrary keyword arguments in `langchain.llms.VLLM` (#9683 ) Description: add arbitrary keyword arguments for VLLM Issue: https://github.com/langchain-ai/langchain/issues/9682 Dependencies: none Tag maintainer: @hwchase17, @baskaryan	1 year ago
Ackermann Yuriy	c585351bdc	Fixed query/instruction typos (#10158) Fixed typos in embedding parameters.	1 year ago
Stefano Lottini	c9ff0ab2e9	Cassandra support for LLM cache (exact-match and semantic) (#9772 ) This PR implements two new classes in the cache module: `CassandraCache` and `CassandraSemanticCache`, similar in structure and functionality to their Redis counterpart: providing a cache for the response to a (prompt, llm) pair. Integration tests are included. Moreover, linting and type checks are all passing on my machine. Dependencies: the `pyproject.toml` and `poetry.lock` have the newest version of cassIO (the very same as in the Cassandra vector store metadata PR, submitted as #9280). If I may suggest, this issue and #9280 might be reviewed together (as they bring the same poetry changes along), so I'm tagging @baskaryan who already helped out a little with poetry-related conflicts there. (Thank you!) I'd be happy to add a short notebook if this is deemed necessary (but it seems to me that, contrary e.g. to vector stores, caches are not covered in specific notebooks). Thank you! --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Terry Tan	8bc452a466	Enhance Google search tool SerpApi response (#10157) Enhance the SerpApi response, which has the potential to yield more relevant output. Query: What is the weather in Pomfret? Before: > I should look up the current weather conditions. ... Final Answer: The current weather in Pomfret is 73°F with 1% chance of precipitation and winds at 10 mph. After: > I should look up the current weather conditions. ... Final Answer: The current weather in Pomfret is 62°F, 1% precipitation, 61% humidity, and 4 mph wind. --- Query: Top team in english premier league? Before: > I need to find out which team is currently at the top of the English Premier League ... Final Answer: Liverpool FC is currently at the top of the English Premier League. After: > I need to find out which team is currently at the top of the English Premier League ... Final Answer: Man City is currently at the top of the English Premier League. --- Query: Any upcoming events in Paris? Before: > I should look for events in Paris Action: Search ... Final Answer: Upcoming events in Paris this month include Whit Sunday & Whit Monday (French National Holiday), Makeup in Paris, Paris Jazz Festival, Fete de la Musique, and Salon International de la Maison de. After: > I should look for events in Paris Action: Search ... Final Answer: Upcoming events in Paris include Elektric Park 2023, The Aces, and BEING AS AN OCEAN.	1 year ago
liunux4odoo	7d48c2884e	Update json_loader.py: encoding bug (#9785) JSONLoader.load does not specify `encoding` in `self.file_path.read_text()` as it does for `self.file_path.open()`.	1 year ago
Juhee Kim	50ca44c79f	fix multipart email body retrieval (#9790 ) Description: Gmail message retrieval in GmailGetMessage and GmailSearch returned an empty string when encountering multipart emails. This change correctly extracts the email body for multipart emails. Dependencies: None @hwchase17 @vowelparrot	1 year ago
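The multipart handling described in this commit can be illustrated with the standard library's `email` package. This is a minimal sketch and not the code from the PR: `extract_body` is a hypothetical helper name, and it assumes the first `text/plain` part carries the body.

```python
from email import message_from_bytes
from email.message import EmailMessage


def extract_body(raw: bytes) -> str:
    """Return the plain-text body of a raw message (hypothetical helper)."""
    msg = message_from_bytes(raw)
    if msg.is_multipart():
        # walk() yields the container itself and then every nested part.
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True)
                charset = part.get_content_charset() or "utf-8"
                return payload.decode(charset, errors="replace")
        return ""
    # Single-part message: the payload is the body.
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace")


# Build a small multipart/alternative message to exercise the helper.
demo = EmailMessage()
demo["Subject"] = "demo"
demo.set_content("plain body")
demo.add_alternative("<p>html body</p>", subtype="html")
body = extract_body(demo.as_bytes())
```

Reading only the top-level payload of a multipart message yields no usable text, which matches the empty-string symptom the commit describes.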
Cameron Hutchison	7d8bb78e5c	Extraction Chain - Custom Prompt (#9828 ) # Description This change allows you to customize the prompt used in `create_extraction_chain` as well as `create_extraction_chain_pydantic`. It also adds the `verbose` argument to `create_extraction_chain_pydantic` - because `create_extraction_chain` had it already and `create_extraction_chain_pydantic` did not. # Issue N/A # Dependencies N/A # Twitter https://twitter.com/CamAHutchison	1 year ago
mgvalverde	33f43cc1b0	Bugfix/jsonloader metadata (#9793) Hi, - Description: - Solves the issue #6478. - Includes some additional rework on the `JSONLoader` class: - Getting metadata is decoupled from `_get_text` - Validating metadata_func is now performed by `_validate_metadata_func`, instead of `_validate_content_key` - Issue: #6478 - Dependencies: NA - Tag maintainer: @hwchase17	1 year ago
Dane Summers	7d1b0fbe79	Adds dataview fields and tags to metadata #9800 (#9801 ) Description: Adds tags and dataview fields to ObsidianLoader doc metadata. - Issue: #9800, #4991 - Dependencies: none - Tag maintainer: My best guess is @hwchase17 looking through the git logs - Twitter handle: I don't use twitter, sorry!	1 year ago
Harrison Chase	ce47124e8f	add numbered list parser (#9837 )	1 year ago
Viktor Zhemchuzhnikov	507e46844e	Extend SQLChatMessageHistory (#9849 ) ### Description There is a really nice class for saving chat messages into a database - SQLChatMessageHistory. It leverages SqlAlchemy to be compatible with any supported database (in contrast with PostgresChatMessageHistory, which is basically the same but is limited to Postgres). However, the class is not really customizable in terms of what you can store. I can imagine a lot of use cases, when one will need to save a message date, along with some additional metadata. To solve this, I propose to extract the converting logic from BaseMessage to SQLAlchemy model (and vice versa) into a separate class - message converter. So instead of rewriting the whole SQLChatMessageHistory class, a user will only need to write a custom model and a simple mapping class, and pass its instance as a parameter. I also noticed that there is no documentation on this class, so I added that too, with an example of custom message converter. ### Issue N/A ### Dependencies N/A ### Tag maintainer Not yet ### Twitter handle N/A	1 year ago
Jon Bennion	fed137a8a9	adding new chain for logical fallacy removal from model output in chain (#9887 ) Description: new chain for logical fallacy removal from model output in chain and docs Issue: n/a see above Dependencies: none Tag maintainer: @hinthornw in past from my end but not sure who that would be for maintenance of chains Twitter handle: no twitter feel free to call out my git user if shout out j-space-b Note: created documentation in docs/extras --------- Co-authored-by: Jon Bennion <jb@Jons-MacBook-Pro.local> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Harrison Chase	794ff2dae8	Harrison/hf lru (#10154 ) Co-authored-by: Pascal Bro <git@pascalbrokmeier.de> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Stanko Kuveljic	4765c09703	Pinecone upsert parallelization (#9859 ) Issue: closes #9855 * consolidates `from_texts` and `add_texts` functions for pinecone upsert * adds two types of batching (one for embeddings and one for index upsert) * adds thread pool size when instantiating pinecone index	1 year ago
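The batching and thread-pool idea in this commit can be sketched without any Pinecone dependency. Everything below is illustrative: `batched`, `parallel_upsert`, and `fake_upsert` are hypothetical names, and the real integration's parameters may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = Tuple[str, List[float]]


def batched(items: Sequence[Vector], size: int) -> Iterator[Sequence[Vector]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parallel_upsert(
    vectors: Sequence[Vector],
    upsert: Callable[[Sequence[Vector]], int],
    batch_size: int = 32,
    pool_threads: int = 4,
) -> int:
    """Send batches to `upsert` from a thread pool and return the total count."""
    with ThreadPoolExecutor(max_workers=pool_threads) as pool:
        return sum(pool.map(upsert, batched(vectors, batch_size)))


# Stand-in for a real index client: it only reports the batch size.
def fake_upsert(batch: Sequence[Vector]) -> int:
    return len(batch)


vectors = [(f"id-{i}", [float(i)]) for i in range(100)]
total = parallel_upsert(vectors, fake_upsert, batch_size=32)
```

Keeping the batch size and the pool size as separate knobs mirrors the commit's distinction between batching and the thread pool used for the index.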
Lorenzo	00a7c31ffd	Fix: Nested Dicts Handling of Document Metadata (#9880 ) ## Description When the `MultiQueryRetriever` is used to get the list of documents relevant according to a query, inside a vector store, and at least one of these contain metadata with nested dictionaries, a `TypeError: unhashable type: 'dict'` exception is thrown. This is caused by the `unique_union` function which, to guarantee the uniqueness of the returned documents, tries, unsuccessfully, to hash the nested dictionaries and use them as a part of key. ```python unique_documents_dict = { (doc.page_content, tuple(sorted(doc.metadata.items()))): doc for doc in documents } ``` ## Issue #9872 (MultiQueryRetriever (get_relevant_documents) raises TypeError: unhashable type: 'dict' with dic metadata) ## Solution A possible solution is to dump the metadata dict to a string and use it as a part of hashed key. ```python unique_documents_dict = { (doc.page_content, json.dumps(doc.metadata, sort_keys=True)): doc for doc in documents } ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
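The failure and the proposed fix can be reproduced in isolation. In this sketch, `Doc` is a minimal stand-in for a LangChain document (an assumption made so the example runs without the library), while the two key expressions follow the snippets quoted in the commit message.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (assumption for this sketch)."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


def unique_union(documents: List[Doc]) -> List[Doc]:
    """Deduplicate documents, keeping the last occurrence for each key."""
    unique = {
        (doc.page_content, json.dumps(doc.metadata, sort_keys=True)): doc
        for doc in documents
    }
    return list(unique.values())


docs = [
    Doc("a", {"source": {"page": 1}}),
    Doc("a", {"source": {"page": 1}}),
    Doc("b", {"source": {"page": 2}}),
]

# The tuple-of-items key fails because the nested dict is unhashable.
try:
    {(d.page_content, tuple(sorted(d.metadata.items()))): d for d in docs}
    old_key_failed = False
except TypeError:
    old_key_failed = True

deduped = unique_union(docs)
```

Note that `json.dumps` requires the metadata values to be JSON-serializable, so this approach trades the hashability requirement for a serializability one.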
Davide Menini	b8baead70c	fix (Html2TextTransformer): allow configuration of html2text (#9914) Hi, this PR enables configuring the html2text package, instead of being bound to use the hardcoded values. While simply passing `ignore_links` and `ignore_images` to the `transform_documents` method was possible, I preferred passing them to the `__init__` method for 2 reasons: 1. It is more efficient in case of subsequent calls to `transform_documents`. 2. It moves the "complexity" to the instantiation, keeping the actual execution simple and general enough. IMO the transformers should all follow this pattern, allowing something like this: ```python # Instantiate transformers transformers = [ TransformerA(foo='bar'), TransformerB(bar='foo'), # others ] # During execution, call them sequentially documents = ... for tr in transformers: documents = tr.transform_documents(documents) ``` Thanks for the reviews! --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com>	1 year ago
Frédéric Lepied	4dc47bd3ac	time_weighted_retriever: use a timestamp if needed (#9906) If the last_accessed_at metadata is a float, use it as a timestamp. This makes it possible to support vector stores that do not store datetime objects, like ChromaDb. Fixes: https://github.com/langchain-ai/langchain/issues/3685	1 year ago
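A minimal sketch of the normalization this commit describes, assuming the float is a POSIX timestamp in seconds. `as_datetime` is a hypothetical helper name, not the function used in the retriever.

```python
from datetime import datetime
from typing import Union


def as_datetime(value: Union[datetime, float, int]) -> datetime:
    """Normalize a `last_accessed_at` value to a datetime (hypothetical helper)."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, (float, int)):
        # Interpreted as seconds since the epoch, converted to local time.
        return datetime.fromtimestamp(value)
    raise ValueError(f"unsupported last_accessed_at value: {value!r}")


now = datetime.now()
round_trip = as_datetime(now.timestamp())
```

Accepting both forms lets the same scoring code run against stores that keep `datetime` objects and stores that can only persist numbers.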
Josh White	bc8cceebf7	Extend DynamoDBChatMessageHistory to support composite keys (#9896 ) - Description: Adds two optional parameters to the DynamoDBChatMessageHistory class to enable users to pass in a name for their PrimaryKey, or a Key object itself to enable the use of composite keys, a common DynamoDB paradigm. [AWS DynamoDB Key docs](https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/) - Issue: N/A - Dependencies: N/A - Twitter handle: N/A --------- Co-authored-by: Josh White <josh@ctrlstack.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Programmers Emperor	872d829201	Update __init__.py (#9955) Add SQLDatabaseSequentialChain Class to __init__.py so it can be accessed and used - Description: SQLDatabaseSequentialChain is not found when importing the Langchain_experimental package; this change imports SQLDatabaseSequentialChain in the __init__.py of Langchain_experimental.sql and adds it to the __all__ list - Issue: SQLDatabaseSequentialChain is not found in Langchain_experimental package - Dependencies: None - Tag maintainer: None - Twitter handle: None	1 year ago
Lucas Rodrigues Pereira	5c7afe8aae	Fix json parsing error of MULTI_PROMPT_ROUTER_TEMPLATE (#9944) The output at times lacks the closing markdown code block. The prompt is changed to explicitly request the closing backticks.	1 year ago
Lance Martin	387813bfb2	Sort by most recent chatIDs (#9946 ) When we `lazy_load` iMessage chats, return chats w/ most recent msg first (matches what is visualized in app).	1 year ago
German Martin	cf5a50469f	TextGen is missing async methods. (#9986) Adds the `_acall` and `_astream` methods that were missing; their absence prevented streaming during async executions. @rlancemartin.	1 year ago
Blake (Yung Cher Ho)	f4bed8a04c	Takeoff baseurl support (#10091 ) ## Description This PR introduces a minor change to the TitanTakeoff integration. Instead of specifying a port on localhost, this PR will allow users to specify a baseURL instead. This will allow users to use the integration if they have TitanTakeoff deployed externally (not on localhost). This removes the hardcoded reference to localhost "http://localhost:{port}". ### Info about Titan Takeoff Titan Takeoff is an inference server created by [TitanML](https://www.titanml.co/) that allows you to deploy large language models locally on your hardware in a single command. Most generative model architectures are included, such as Falcon, Llama 2, GPT2, T5 and many more. Read more about Titan Takeoff here: - [Blog](https://medium.com/@TitanML/introducing-titan-takeoff-6c30e55a8e1e) - [Docs](https://docs.titanml.co/docs/titan-takeoff/getting-started) ### Dependencies No new dependencies are introduced. However, users will need to install the titan-iris package in their local environment and start the Titan Takeoff inferencing server in order to use the Titan Takeoff integration. Thanks for your help and please let me know if you have any questions. cc: @hwchase17 @baskaryan --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Eddie Cohen	565c021730	Add ne comparator (#10006 ) Description: Adds the not comparator and operator to pinecone, chroma and deeplake. Issue: Not a registered issue but when using a selfqueryretriever with pinecone I got this error + stacktrace when I entered a query that asked to not include specific data: > raised following `error:` > Received unrecognized function ne. Valid functions are [<Operator.AND: 'and'>, <Operator.OR: 'or'>, <Operator.NOT: 'not'>, <Comparator.EQ: 'eq'>, <Comparator.GT: 'gt'>, <Comparator.GTE: 'gte'>, <Comparator.LT: 'lt'>, <Comparator.LTE: 'lte'>] I noticed that chroma and deeplake also support not equals/not filtering so I added it there as well [pinecone](https://docs.pinecone.io/docs/metadata-filtering#metadata-query-language) [chroma](https://docs.trychroma.com/usage-guide#filtering-by-metadata) [deeplake](https://docs.activeloop.ai/enterprise-features/compute-engine/querying-datasets/query-syntax#and-or-not)	1 year ago
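How a `ne` comparator might map onto a dictionary-style metadata filter can be sketched as follows. This is illustrative only: the `Comparator` enum here is a local stand-in, `to_filter` is a hypothetical helper, and the `$ne` operator spelling follows the Pinecone and Chroma filter documentation linked in the commit message. DeepLake uses a query-string syntax that this sketch does not cover.

```python
from enum import Enum
from typing import Any, Dict


class Comparator(str, Enum):
    """Comparison operators a structured query may use (sketch only)."""
    EQ = "eq"
    NE = "ne"  # the comparator this commit adds support for
    GT = "gt"
    GTE = "gte"
    LT = "lt"
    LTE = "lte"


def to_filter(comparator: Comparator, attribute: str, value: Any) -> Dict[str, Any]:
    """Render one comparison in the `$`-prefixed operator style (hypothetical helper)."""
    return {attribute: {f"${comparator.value}": value}}


# "Exclude documents whose genre is drama."
ne_filter = to_filter(Comparator("ne"), "genre", "drama")
```

Without `NE` in the allowed set, parsing the string `"ne"` into a comparator fails, which corresponds to the "unrecognized function ne" error quoted in the commit message.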
Leonid Ganeline	2221194450	`Yahoo Finance News` tool (#10014) Added: - the `Yahoo Finance News` tool - unit tests - an example	1 year ago
Lars von Wedel	6d82503eb1	Add parser and loader for Azure document intelligence service. (#10136 ) Hi, this PR contains loader / parser for Azure Document intelligence which is a ML-based service to ingest arbitrary PDFs / images, even if scanned. The loader generates Documents by pages of the original document. This is my first contribution to LangChain. Unfortunately I could not find the correct place for test cases. Happy to add one if you can point me to the location, but as this is a cloud-based service, a test would require network access and credentials - so might be of limited help. Dependencies: The needed dependency was already part of pyproject.toml, no change. Twitter: feel free to mention @LarsAC on the announcement	1 year ago
Harrison Chase	4abe85be57	Harrison/string inplace (#10153 ) Co-authored-by: Wrick Talukdar <wrick.talukdar@gmail.com> Co-authored-by: Anjan Biswas <anjanavb@amazon.com> Co-authored-by: Jha <nikjha@amazon.com> Co-authored-by: Lucky-Lance <77819606+Lucky-Lance@users.noreply.github.com> Co-authored-by: 陆徐东 <luxudong@MacBook-Pro.local>	1 year ago
Harrison Chase	f5af756397	fake messages list model (#10152 ) create a fake chat model that you can configure with list of messages	1 year ago
Harrison Chase	9e6cc7b236	make hub push public by default (#10138 )	1 year ago
Bagatur	0e4c5dd176	bump 13 (#10130 )	1 year ago
Bagatur	42582adb66	bump 280 (#10117 )	1 year ago
Bagatur	9e196cb470	rm sqlite3 import (#10115 )	1 year ago
Arpan Pokharel	f8bca156d4	Add where filter in weaviate similarity search with score (#9978) - Description: Add where filter in weaviate similarity search with score - Issue: #9853	1 year ago
Leonid Kuligin	30239b3025	added support for inference from Model Garden (#9367 ) #8850 --------- Co-authored-by: Leonid Kuligin <kuligin@google.com>	1 year ago
Benjamin Matson	58d7d86e51	feat: add bedrock chat model (#8017) - Description: Add Bedrock implementation of Anthropic Claude for Chat - Tag maintainer: @hwchase17, @baskaryan - Twitter handle: @bwmatson --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Massimiliano Pronesti	a7c9bd30d4	feat(llms): add missing params to huggingface text-generation (#9724 ) This small PR aims at supporting the following missing parameters in the `HuggingfaceTextGen` LLM: - `return_full_text` - sometimes useful for completion tasks - `do_sample` - quite handy to control the randomness of the model. - `watermark` @hwchase17 @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
KyrianC	491089754d	EdenAI LLM update. Add models name option (#8963) This PR follows the Eden AI (LLM + embeddings) integration. #8633 We added an optional parameter to choose different AI models for providers (like 'text-bison' for provider 'google', 'text-davinci-003' for provider 'openai', etc.). Usage: ```python llm = EdenAI( feature="text", provider="google", params={ "model": "text-bison", # new "temperature": 0.2, "max_tokens": 250, }, ) ``` You can also change the provider + model after initialization ```python llm = EdenAI( feature="text", provider="google", params={ "temperature": 0.2, "max_tokens": 250, }, ) prompt = """ hi """ llm(prompt, providers='openai', model='text-davinci-003') # change provider & model ``` The jupyter notebook has been updated with an example as well. Ping: @hwchase17, @baskaryan --------- Co-authored-by: RedhaWassim <rwasssim@gmail.com> Co-authored-by: sam <melaine.samy@gmail.com>	1 year ago
maks-operlejn-ds	b5a74fb973	Temporarily remove language selection (#10097 ) Adapting Microsoft Presidio to other languages requires a bit more work, so for now it will be good idea to remove the language option to choose, so as not to cause errors and confusion. https://microsoft.github.io/presidio/analyzer/languages/ I will handle different languages after the weekend 😄	1 year ago
Bagatur	71c418725f	index rename delete_mode -> cleanup (#10103 )	1 year ago
Nuno Campos	427f696fb0	Nc/runnables seqmap tags (#9753 )	1 year ago
Harrison Chase	d7bf7dc412	add repr for not serializable (#10071 ) Co-authored-by: Nuno Campos <nuno@boringbits.io>	1 year ago
Bagatur	355ff09cce	bump 279 (#10098 )	1 year ago
Pihplipe Oegr	3dafbd852e	Add sqlite-vss as a vector database (#10047 ) This adds sqlite-vss as an option for a vector database. Contains the code and a few tests. Tests are passing and the library sqlite-vss is added as optional as explained in the contributing guidelines. I adjusted the code for lint/black/ and mypy. It looks that everything is currently passing. Adding sqlite-vss was mentioned in this issue: https://github.com/langchain-ai/langchain/issues/1019. Also mentioned here in the sqlite-vss repo for the curious: https://github.com/asg017/sqlite-vss/issues/66 Maintainer tag: @baskaryan --------- Co-authored-by: Philippe Oger <philippe.oger@adevinta.com>	1 year ago
KyrianC	c7a5504789	Add EdenAI Tools (#9764 ) This PR follows the Eden AI (LLM + embeddings) integration. #8633 We added different Tools to empower agents with new capabilities : - text: explicit content detection - image: explicit content detection - image: object detection - OCR: invoice parsing - OCR: ID parsing - audio: speech to text - audio: text to speech We plan to add more in the future (like translation, language detection, + others). Usage: ```python llm=EdenAI(feature="text",provider="openai", params={"temperature" : 0.2,"max_tokens" : 250}) tools = [ EdenAiTextModerationTool(providers=["openai"],language="en"), EdenAiObjectDetectionTool(providers=["google","api4ai"]), EdenAiTextToSpeechTool(providers=["amazon"],language="en",voice="MALE"), EdenAiExplicitImageTool(providers=["amazon","google"]), EdenAiSpeechToTextTool(providers=["amazon"]), EdenAiParsingIDTool(providers=["amazon","klippa"],language="en"), EdenAiParsingInvoiceTool(providers=["amazon","google"],language="en"), ] agent_chain = initialize_agent( tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, return_intermediate_steps=True, ) result = agent_chain(""" i have this text : 'i want to slap you' first : i want to know if this text contains explicit content or not . second : if it does contain explicit content i want to know what is the explicit content in this text, third : i want to make the text into speech . if there is URL in the observations , you will always put it in the output (final answer) . """) ``` output: > Entering new AgentExecutor chain... 
> I need to extract the information from the ID and then convert it to text and then to speech > Action: edenai_identity_parsing > Action Input: "https://www.citizencard.com/images/citizencard-uk-id-card-2023.jpg" > Observation: last_name : > value : ANGELA > given_names : > value : GREENE > birth_place : > birth_date : > value : 2000-11-09 > issuance_date : > expire_date : > document_id : > issuing_state : > address : > age : > country : > document_type : > value : DRIVER LICENSE FRONT > gender : > image_id : > image_signature : > mrz : > nationality : > Thought: I now need to convert the information to text and then to speech > Action: edenai_text_to_speech > Action Input: "Welcome Angela Greene!" > Observation: https://d14uq1pz7dzsdq.cloudfront.net/0c494819-0bbc-4433-bfa4-6e99bd9747ea_.mp3?Expires=1693316851&Signature=YcMoVQgPuIMEOuSpFuvhkFM8JoBMSoGMcZb7MVWdqw7JEf5~67q9dEI90o5todE5mYXB5zSYoib6rGrmfBl4Rn5~yqDwZ~Tmc24K75zpQZIEyt5~ZSnHuXy4IFWGmlIVuGYVGMGKxTGNeCRNUXDhT6TXGZlr4mwa79Ei1YT7KcNyc1dsTrYB96LphnsqOERx4X9J9XriSwxn70X8oUPFfQmLcitr-syDhiwd9Wdpg6J5yHAJjf657u7Z1lFTBMoXGBuw1VYmyno-3TAiPeUcVlQXPueJ-ymZXmwaITmGOfH7HipZngZBziofRAFdhMYbIjYhegu5jS7TxHwRuox32A__&Key-Pair-Id=K1F55BTI9AHGIK > Thought: I now know the final answer > Final Answer: https://d14uq1pz7dzsdq.cloudfront.net/0c494819-0bbc-4433-bfa4-6e99bd9747ea_.mp3?Expires=1693316851&Signature=YcMoVQgPuIMEOuSpFuvhkFM8JoBMSoGMcZb7MVWdqw7JEf5~67q9dEI90o5todE5mYXB5zSYoib6rGrmfBl4Rn5~yqDwZ~Tmc24K75zpQZIEyt5~ZSnHuXy4IFWGmlIVuGYVGMGKxTGNeCRNUXDhT6TXGZlr4mwa79Ei1YT7KcNyc1dsTrYB96LphnsqOERx4X9J9XriSwxn70X8oUPFfQmLcitr-syDhiwd9Wdpg6J5y > > Finished chain. Other examples are available in the jupyter notebook. This PR is made in parallel with EdenAI LLM update #8963 I apologize for the messy PR. While working in implementing Tools we realized there was a few problems we needed to fix on LLM as well. Ping: @hwchase17, @baskaryan --------- Co-authored-by: RedhaWassim <rwasssim@gmail.com>	1 year ago
Nuno Campos	5569385ee1	Lint	1 year ago
Nuno Campos	e17275ee57	Add root run wrapping call to RunnableEach()	1 year ago
Nuno Campos	63306899a2	PR review suggestions	1 year ago
Nuno Campos	7966af1e9c	Lint	1 year ago
Nuno Campos	4c0e1e501c	Re-implement retry, adding a root run, and implement return_exception for batch() and abatch()	1 year ago
Nuno Campos	0eba80912f	Lint	1 year ago
Nuno Campos	af2e4ce2cd	Use a non-inheritable tag	1 year ago
Nuno Campos	85088dc5df	Lint	1 year ago
Nuno Campos	4eecf90f33	Lint	1 year ago
Nuno Campos	2242e2160f	Lint	1 year ago
Nuno Campos	b2ac835466	Add .with_retry() to Runnables	1 year ago
Nuno Campos	81ebcc161e	Lint	1 year ago
Nuno Campos	fc42726ea0	Styling	1 year ago
Nuno Campos	897f791940	Remove run_id from patch	1 year ago
William Fu-Hinthorn	4d7cd6db5f	add cm	1 year ago
Nuno Campos	f9a845b382	Lint	1 year ago
Nuno Campos	06e89c1caa	Lint	1 year ago
Nuno Campos	738d93215d	Allow patching run_name and max_concurrency	1 year ago
Nuno Campos	9a07032055	Lint	1 year ago
Nuno Campos	5426712311	Adjust merge logic	1 year ago
Nuno Campos	f95bd0bcd9	Fix issue	1 year ago
Nuno Campos	f69155b4f7	Add run_id, run_name to RunnableConfig	1 year ago
Nuno Campos	a3c69cf41d	Add .with_config() method to Runnables which allows binding any config values to a Runnable	1 year ago
jmhayes3	324c86acd5	fix typo in web_research.py (#10076 ) fix spelling	1 year ago
Davide Menini	3f8f3de28e	fix (parsers/json): do not escape double quotes if already escaped (#9916) This PR fixes an issue I found when upgrading to a more recent version of Langchain. I was using 0.0.142 before, and this issue popped up already when the `_custom_parser` was added to `output_parsers/json`. Anyway, the issue is that the parser tries to escape quotes when they are double-escaped (e.g. `\\"`), leading to OutputParserException. This is particularly undesired in my app, because I have an Agent that uses a single input Tool, which expects as input a JSON string with the structure: ```python { "foo": string, "bar": string } ``` The LLM (GPT3.5) response is (almost) always something like `"action_input": "{\\"foo\\": \\"bar\\", \\"bar\\": \\"foo\\"}"` and since the upgrade this is not correctly parsed. --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com>	1 year ago
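One way to avoid escaping quotes twice is a negative lookbehind that skips any quote already preceded by a backslash. This is a sketch of the idea only: `escape_quotes` is a hypothetical helper, and the regex is an assumption rather than a copy of what `_custom_parser` does.

```python
import re

# Match a double quote that is NOT immediately preceded by a backslash.
_UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')


def escape_quotes(text: str) -> str:
    """Escape bare double quotes, leaving already-escaped ones untouched."""
    return _UNESCAPED_QUOTE.sub(r'\\"', text)


bare = 'say "hi"'
already_escaped = 'say \\"hi\\"'

escaped_once = escape_quotes(bare)
unchanged = escape_quotes(already_escaped)
```

The lookbehind inspects only one preceding character, so a quote that follows an escaped backslash would be left alone even though it is technically unescaped. A full solution would need to count consecutive backslashes.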
Harrison Chase	566ce06f4a	add async support for tools (#10058 )	1 year ago
Jiří Moravčík	86646ec555	feat: Add `ApifyWrapper` class (#10067 ) If you look at documentation https://python.langchain.com/docs/integrations/tools/apify (or the actual file https://github.com/langchain-ai/langchain/blob/master/docs/extras/integrations/tools/apify.ipynb ), there's a class `ApifyWrapper` mentioned. It seems it got lost in some refactoring, i.e. it does not exist in the codebase ATM. I just propose to add it back. It would fix issues e.g. https://github.com/langchain-ai/langchain/issues/8307 or https://github.com/langchain-ai/langchain/issues/8201 To add, Apify is a wanted integration, e.g. see https://twitter.com/hwchase17/status/1695490295914545626 or https://twitter.com/hwchase17/status/1695470765343461756 Lastly, I offer taking ownership of the Apify-related parts of the codebase, so you can tag me if anything is needed.	1 year ago
Robert Perrotta	02e51f4217	update_forward_refs for Run (#9969 ) Adds a call to Pydantic's `update_forward_refs` for the `Run` class (in addition to the `ChainRun` and `ToolRun` classes, for which that method is already called). Without it, the self-reference of child classes (type `List[Run]`) is problematic. For example: ```python from langchain.callbacks import StdOutCallbackHandler from langchain.chains import LLMChain from langchain.llms import OpenAI from langchain.prompts import PromptTemplate from wandb.integration.langchain import WandbTracer llm = OpenAI() prompt = PromptTemplate.from_template("1 + {number} = ") chain = LLMChain(llm=llm, prompt=prompt, callbacks=[StdOutCallbackHandler(), WandbTracer()]) print(chain.run(number=2)) ``` results in the following output before the change ``` WARNING:root:Error in on_chain_start callback: field "child_runs" not yet prepared so type is still a ForwardRef, you might need to call Run.update_forward_refs(). > Entering new LLMChain chain... Prompt after formatting: 1 + 2 = WARNING:root:Error in on_chain_end callback: No chain Run found to be traced > Finished chain. 3 ``` but afterwards the callback error messages are gone.	1 year ago
Eugene Yurtsev	74fcfed4e2	lint for pydantic imports (#9937 ) Catch pydantic imports	1 year ago
Zizhong Zhang	641b71e2cd	refactor: rename to OpaquePrompts (#10013 ) Renamed to OpaquePrompts cc @baskaryan Thanks in advance!	1 year ago
Bagatur	19400ba253	bump 278 (#10052 )	1 year ago
Bagatur	29270e0378	fix #3117 (#9957 ) fix #3117	1 year ago
Bagatur	5b913003e0	bump	1 year ago
Bagatur	4b15328767	Add indexing support for postgresql (#9933 ) Add support to postgresql for the SQL Manager Record This code was tested locally. I'm looking at how to add testing with postgres in a separate PR.	1 year ago
Bagatur	e60e1cdf23	fixed openai_functions api_response format args err (#9968 ) root cause: args may not have a key (params) resulting in an error	1 year ago
Bagatur	3efab8d3df	implement vectorstores by tencent vectordb (#9989 ) Hi there！ I'm excited to open this PR to add support for using 'Tencent Cloud VectorDB' as a vector store. Tencent Cloud VectorDB is a fully-managed, self-developed, enterprise-level distributed database service designed for storing, retrieving, and analyzing multi-dimensional vector data. The database supports multiple index types and similarity calculation methods, with a single index supporting vector scales up to 1 billion and capable of handling millions of QPS with millisecond-level query latency. Tencent Cloud VectorDB not only provides external knowledge bases for large models to improve their accuracy, but also has wide applications in AI fields such as recommendation systems, NLP services, computer vision, and intelligent customer service. The PR includes: Implementation of Vectorstore. I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below make format make lint make coverage make test	1 year ago
Bagatur	d43a36c32a	Bagatur/dereference tool schema (#10007 ) fix for #9375	1 year ago
Bagatur	6b5a970949	refactor(document_loaders): abstract page evaluation logic in PlaywrightURLLoader (#9995 ) This PR brings structural updates to `PlaywrightURLLoader`, aiming at making the code more readable and extensible through the abstraction of page evaluation logic. These changes also align this implementation with a similar structure used in LangChain.js. The key enhancements include: 1. Introduction of 'PlaywrightEvaluator', an abstract base class for all evaluators. 2. Creation of 'UnstructuredHtmlEvaluator', a concrete class implementing 'PlaywrightEvaluator', which uses `unstructured` library for processing page's HTML content. 3. Extension of 'PlaywrightURLLoader' constructor to optionally accept an evaluator of the type 'PlaywrightEvaluator'. It defaults to 'UnstructuredHtmlEvaluator' if no evaluator is provided. 4. Refactoring of 'load' and 'aload' methods to use the 'evaluate' and 'evaluate_async' methods of the provided 'PageEvaluator' for page content handling. This update brings flexibility to 'PlaywrightURLLoader' as it can now utilize different evaluators for page processing depending on the requirement. The abstraction also improves code maintainability and readability. Twitter: @ywkim	1 year ago
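The evaluator abstraction described in the commit above can be sketched roughly as follows. This is a minimal illustration rather than the LangChain implementation: `PageEvaluator`, `TagStrippingEvaluator`, `UppercaseEvaluator` and `SketchLoader` are hypothetical names, and the evaluators receive an HTML string instead of a live Playwright page so that the sketch runs without a browser.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class PageEvaluator(ABC):
    """Hypothetical abstract base: turns raw page HTML into document text."""

    @abstractmethod
    def evaluate(self, html: str) -> str:
        ...


class TagStrippingEvaluator(PageEvaluator):
    """Default evaluator in this sketch: drop tags, collapse whitespace."""

    def evaluate(self, html: str) -> str:
        text = re.sub(r"<[^>]+>", " ", html)
        return " ".join(text.split())


class UppercaseEvaluator(PageEvaluator):
    """Alternative evaluator, only here to show that the strategy is swappable."""

    def evaluate(self, html: str) -> str:
        return TagStrippingEvaluator().evaluate(html).upper()


class SketchLoader:
    """Hypothetical loader: takes pre-fetched pages (url -> html), not real URLs."""

    def __init__(self, pages: Dict[str, str], evaluator: Optional[PageEvaluator] = None):
        self.pages = pages
        # Fall back to a default evaluator when none is provided.
        self.evaluator = evaluator or TagStrippingEvaluator()

    def load(self) -> List[str]:
        return [self.evaluator.evaluate(html) for html in self.pages.values()]


pages = {"https://example.com": "<h1>Hello</h1><p>world</p>"}
default_docs = SketchLoader(pages).load()
upper_docs = SketchLoader(pages, evaluator=UppercaseEvaluator()).load()
```

The loader never needs to know how a page is turned into text, which is what lets a different evaluator be passed in without touching `load`.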
Bagatur	b1644bc9ad	cr	1 year ago
Hunsmore	13fef1e5d3	add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to ErnieBotChat (#10024 ) - Description: Add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to ErnieBotChat, which only supported ERNIE-Bot-turbo and ERNIE-Bot. - Issue: #10022, - Dependencies: no extra dependencies --------- Co-authored-by: hetianfeng <hetianfeng@meituan.com>	1 year ago
skspark	52a3e8a261	Add integration TCs on bing search (#8068 ) (#10021 ) ## Description Added integration TCs on bing search utility ## Issue #8068 ## Dependencies None	1 year ago
William FH	5341b04d68	Update error message (#9970 ) in evals	1 year ago
William FH	b82ad19ed2	Check memory address (#9971 ) Don't want to dup the collector but can have multiple	1 year ago
Bagatur	e805f8e263	add tests	1 year ago
Bagatur	1f5c579ef4	add	1 year ago
Bagatur	240cc289e6	wip	1 year ago
maks-operlejn-ds	a8f804a618	Add data anonymizer (#9863 ) ### Description The feature for anonymizing data has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. Anonymization consists of two steps: 1. Identification: Identify all data fields that contain personally identifiable information (PII). 2. Replacement: Replace all PIIs with pseudo values or codes that do not reveal any personal information about the individual but can be used for reference. We're not using regular encryption, because the language model won't be able to understand the meaning or context of the encrypted data. We use Microsoft Presidio together with Faker framework for anonymization purposes because of the wide range of functionalities they provide. The full implementation is available in `PresidioAnonymizer`. ### Future works - deanonymization - add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. By doing this, we will retain anonymity in requests to, for example, OpenAI, and then be able to restore the original data. - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
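The two "future works" items in the commit above (reversible anonymization and instance-level consistency) can be illustrated with a toy sketch. It is not `PresidioAnonymizer` and uses neither Presidio nor Faker: the e-mail regex, the `<EMAIL_n>` placeholders and the function names are all assumptions made for the illustration.

```python
import re
from typing import Dict, Tuple

# Deliberately simple e-mail pattern; real PII detection is far more involved.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct e-mail with one stable placeholder.

    Returns the anonymized text and a placeholder -> original mapping.
    """
    seen: Dict[str, str] = {}     # original -> placeholder
    mapping: Dict[str, str] = {}  # placeholder -> original

    def replace(match: "re.Match[str]") -> str:
        original = match.group(0)
        if original not in seen:
            placeholder = f"<EMAIL_{len(seen) + 1}>"
            seen[original] = placeholder
            mapping[placeholder] = original
        # Repeated occurrences reuse the same placeholder (instance consistency).
        return seen[original]

    return EMAIL_RE.sub(replace, text), mapping


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Reverse the substitution using the stored mapping."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


source = "Mail john@example.com or jane@example.org; john@example.com replies fastest"
masked, mapping = anonymize(source)
restored = deanonymize(masked, mapping)
```

Keeping the mapping outside the text is what makes an `anonymize -> LLMChain -> deanonymize` workflow possible: only the masked string would be sent to the external API.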
Bagatur	b3e3a31240	bump 277 (#9997 )	1 year ago
Bagatur	9828701de1	mv base cache to schema (#9953 ) if you remove all other imports from langchain.init it exposes a circular dep	1 year ago
Christophe Bornet	9870bfb9cd	Add bucket and object key to metadata in S3 loader (#9317 ) - Description: this PR adds `s3_object_key` and `s3_bucket` to the doc metadata when loading an S3 file. This is particularly useful when using `S3DirectoryLoader` to remove the files from the dir once they have been processed (getting the object keys from the metadata `source` field seems brittle) - Dependencies: N/A - Tag maintainer: ? - Twitter handle: _cbornet --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	1 year ago
Eugene Yurtsev	6da158388b	Merge branch 'master' into ywkim/master	1 year ago
Guy Korland	24c0b01c38	Extend the FalkorDB QA demo (#9992 ) - Description: Extend the FalkorDB QA demo - Tag maintainer: @baskaryan	1 year ago
Eugene Yurtsev	588237ef30	Make document serializable, create utility to create a docstore (#9674 ) This PR makes the following changes: 1. Documents become serializable using langchain serialization 2. Make a utility to create a docstore kw store Will help to address issue here: https://github.com/langchain-ai/langchain/issues/9345	1 year ago
Eugene Yurtsev	e8f29be350	x	1 year ago
Buckler89	a28e888b36	fix call _get_keys for custom_evaluator (#9763 ) In the function _load_run_evaluators the function _get_keys was not called if only custom_evaluators parameter is used - Description: In the function _load_run_evaluators the function _get_keys was not called if only custom_evaluators parameter is used, - Issue: no issue created for this yet, - Dependencies: None, - Tag maintainer: @vowelparrot, - Twitter handle: Buckler89 --------- Co-authored-by: ddroghini <d.droghini@mflgroup.com>	1 year ago
Eugene Yurtsev	cafce9ed23	x	1 year ago
wlleiiwang	8c4e29240c	implement vectorstores by tencent vectordb	1 year ago
Bagatur	2d2b097fab	mv chat history (#9725 )	1 year ago
Bagatur	d762a6b51f	rm mutable defaults (#9974 )	1 year ago
Arjun Aravindan	6a51672164	Update SeleniumURLLoader to use webdriver Service in favor of deprecated executable_path parameter (#9814 ) Description: This commit uses the new Service object in Selenium webdriver as executable_path has been [deprecated and removed in selenium version 4.11.2](`9f5801c82f`) Issue: https://github.com/langchain-ai/langchain/issues/9808 Tag Maintainer: @eyurtsev	1 year ago
William FH	c844aaa7a6	Weakref to tracer (#9954 ) Prevent memory/thread leakage	1 year ago
Jurik-001	a05fed9369	Fix add callbacks to spark_sql due to deprecation of callback_manager (#9831 ) Description: Because callback_manager is deprecated (see line 109 in [langchain/libs/langchain/langchain/chains/base.py](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/base.py)), I replaced several parts. Issue: None Dependencies: Maintainer: @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
dafu	c26deb6b38	fixed openai_functions api_response format args err root cause: args may not have a key (params) resulting in an error	1 year ago
axiangcoding	ffa5625134	feat(llms): improve ERNIE-Bot chat model (#9833 ) - Description: improve ERNIE-Bot chat model, add request timeout and more testcases. - Issue: None - Dependencies: None - Tag maintainer: @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Bagatur	d966ba63e2	fixed GoogleCloudEnterpriseSearchRetriever returning an empty array (#9858 ) `GoogleCloudEnterpriseSearchRetriever` returned an empty array of documents earlier, fixed	1 year ago
Bagatur	ec362ecbe2	Fixed regex bug in RetrievalQAWithSources in previous update (#9898 ) - Description: In my previous PR, I had modified the code to catch all kinds of [SOURCES, sources, Source, Sources]. However, this change included checking for a colon or a white space which should actually have been only checking for a colon.	1 year ago
Nikhil Suresh	56a0165a4e	cleaned up unit test example	1 year ago
William FH	cedfad541d	don't emit none from eval config (#9963 )	1 year ago
Nikhil Suresh	b31475c622	minor updates to regex	1 year ago
Bagatur	8fb0a9594c	Add LLMonitor Callback Handler Integration - open-source observability & analytics (#9870 ) Adds support for [llmonitor](https://llmonitor.com) callbacks. It enables: - Requests tracking / logging / analytics - Error debugging - Cost analytics - User tracking Let me know if anything needs to be changed for merge. Thank you!	1 year ago
William FH	d799963870	Wfh/async tool (#9878 ) Co-authored-by: Daniel Brenot <dbrenot@pelmorex.com> Co-authored-by: Daniel <daniel.alexander.brenot@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Bagatur	16eb935469	Fix for similarity_search_with_score (#9903 ) - Description: the implementation for similarity_search_with_score did not actually include a score or logic to filter. Now fixed. - Tag maintainer: @rlancemartin - Twitter handle: @ofermend	1 year ago
Bagatur	c70bb0ec28	Activeloopai runtime arg (#9961 )	1 year ago
Bagatur	0f85671630	fmt	1 year ago
Bagatur	78c014399f	fmt	1 year ago
Eugene Yurtsev	5cce6529a4	Speed up openai tests (#9943 ) Saves ~8-10 seconds from total unit tests times --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Guy Korland	7cbe872af8	Add support for Falkordb (ex-RedisGraph) (#9821 ) - Description: Add support for Falkordb (ex-RedisGraph) - Tag maintainer: @hwchase17 - Twitter handle: @g_korland	1 year ago
Bagatur	9f2d908316	cr	1 year ago
Bagatur	3c1547925a	fix	1 year ago
William FH	fbd792ac7c	Fix import (#9945 )	1 year ago
Zizhong Zhang	8bd7a9d18e	feat: PromptGuard takes a list of str (#9948 ) Recently we made the decision that PromptGuard takes a list of strings instead of a string. @ggroode implemented the integration change. --------- Co-authored-by: ggroode <ggroode@berkeley.edu> Co-authored-by: ggroode <46691276+ggroode@users.noreply.github.com>	1 year ago
Predrag Gruevski	8dbf4cbe80	Add notice about security-sensitive experimental code to experimental README. (#9936 ) It renders like this: https://github.com/langchain-ai/langchain/tree/pg/experimental-readme/libs/experimental ![image](https://github.com/langchain-ai/langchain/assets/2348618/a5f9569d-96f6-44c6-8559-921adb3e337d)	1 year ago
Predrag Gruevski	b5cd1e0fed	Add security notices on PAL and CPAL experimental chains. (#9938 ) Clearly document that the PAL and CPAL techniques involve generating code, and that such code must be properly sandboxed and given appropriate narrowly-scoped credentials in order to ensure security. While our implementations include some mitigations, Python and SQL sandboxing is well-known to be a very hard problem and our mitigations are no replacement for proper sandboxing and permissions management. The implementation of such techniques must be performed outside the scope of the Python process where this package's code runs, so its correct setup and administration must therefore be the responsibility of the user of this code.	1 year ago
Jan-Luca Barthel	f5faac8859	addition of cosine distance function for faiss (#9939 ) - Description: added the _cosine_relevance_score_fn to _select_relevance_score_fn of faiss.py to enable the use of cosine distance for similarity with this vector store and to comply with the error message, which implies that cosine should be a valid distance strategy - Issue: no relevant issue found, but I needed this function myself and tested it in a private repo - Dependencies: none	1 year ago
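For readers unfamiliar with the idea behind a cosine relevance score, here is a small standard-library sketch. It assumes the common convention that cosine distance is `1 - cosine similarity` and that the relevance score is `1 - distance`; the function names below are illustrative and are not taken from faiss.py.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance between two vectors: 1 - cos(angle between them)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def cosine_relevance_score(distance: float) -> float:
    """Map a cosine distance to a relevance score (assumed: 1 - distance)."""
    return 1.0 - distance


same = cosine_relevance_score(cosine_distance([1.0, 0.0], [1.0, 0.0]))
orthogonal = cosine_relevance_score(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions identical directions score close to 1 and orthogonal vectors score close to 0, so higher always means more relevant.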
Eugene Yurtsev	880bf06290	x	1 year ago
Eugene Yurtsev	9efc29e3d1	x	1 year ago
Bagatur	d6957921f0	bump 276 (#9931 )	1 year ago
Tomaz Bratanic	db13fba7ea	Add neo4j vector support (#9770 ) Neo4j has added vector index integration just recently. To allow both ingestion and integrating it as vector RAG applications, I wrapped it as a vector store as the implementation is completely different from `GraphCypherQAChain`. Here, we are not generating any Cypher statements at query time, we are simply doing the vector similarity search using the new vector index as if we were dealing with a vector database. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Bagatur	49ebbe4bcd	fix pydantic import (#9930 )	1 year ago
Mike Nitsenko	c80e406e95	Cube semantic loader: allow cubes processing (#9927 ) We've started to receive feedback (after launch) that using only views is confusing. We're considering this as a good practice, as a view serves as a "facade" for your data - however, we decided to let users decide this on their own. Solves the questions from: - https://github.com/cube-js/cube/issues/7028 - https://github.com/langchain-ai/langchain/pull/9690	1 year ago
Nikhil Suresh	dd10cf945c	fixed minor linting issues	1 year ago
adilkhan	bbae8cb88f	Added runtime argument	1 year ago
Ofer Mendelevitch	4454204455	reformat black	1 year ago
Ofer Mendelevitch	318a21e267	fixed typo in spelling	1 year ago
hughcrt	e71f4760db	Change multiline comment width	1 year ago
Ofer Mendelevitch	a5450be32e	fixed lint	1 year ago
Ofer Mendelevitch	8b8d2a6535	fixed similarity_search_with_score to really use a score updated unit test with a test for score threshold Updated demo notebook	1 year ago
hughcrt	7979cef06a	Replace `\|` by `Union`	1 year ago
Nikhil Suresh	23ef836b48	matches colon and any number of white spaces after colon	1 year ago
Nikhil Suresh	64eb5a6082	removed unnecessary white space in regex that breaks qa with sources chain	1 year ago
Nikhil Suresh	8a4670e127	updated formatting changes	1 year ago
Nikhil Suresh	b1f649bca5	fixed issue with white space and added unit tests	1 year ago
Nikhil Suresh	6d3485e798	fixed regex to match sources for all cases, also includes source	1 year ago
Predrag Gruevski	47499c6db4	Avoid `type: ignore` suppression by adding mypy type hint. (#9881 ) Mypy was not able to determine a good type for `type_to_loader_dict`, since the values in the dict are functions whose return types are related to each other in a complex way. One can see this by adding a line like `reveal_type(type_to_loader_dict)` and running mypy, which will get mypy to show what type it has inferred for that value. Adding an explicit type hint to help out mypy avoids the need for a mypy suppression and allows the code to type-check cleanly.	1 year ago
maks-operlejn-ds	f327535eda	Add conftest file to langchain experimental (#9886 ) In order to use `requires` marker in langchain-experimental, there's a need for conftest.py file inside. Everything is identical to the main langchain module. Co-authored-by: maks-operlejn-ds <maks.operlejn@gmail.com>	1 year ago
William FH	907c57e324	Add collect_runs callback (#9885 )	1 year ago
William FH	3103f07e03	Use existing required args obj if specified (#9883 ) We always overwrote the required args but we infer them by default. Doing it only the old way makes it so the llm guesses even if an arg is optional (e.g., for uuids)	1 year ago
William FH	b14d74dd4d	iMessage loader (#9832 ) Add an iMessage chat loader	1 year ago
Predrag Gruevski	eb3d1fa93c	Add security warning to experimental `SQLDatabaseChain` class. (#9867 ) The most reliable way to not have a chain run an undesirable SQL command is to not give it database permissions to run that command. That way the database itself performs the rule enforcement, so it's much easier to configure and use properly than anything we could add in ourselves.	1 year ago
hughcrt	97741d41c5	Add LLMonitorCallbackHandler	1 year ago
eryk-dsai	7f5713b80a	feat: grammar-based sampling in llama-cpp (#9712 ) ## Description The following PR enables the [grammar-based sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars) in llama-cpp LLM. In short, loading file with formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or generate only python lists. In the follow-up PR we will add: * docs with some description why it is cool and how it works * maybe some code sample for some task such as in llama repo --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
William FH	cb642ef658	Return feedback (#9629 ) Return the feedback values in an eval run result Also made a helper method to display as a dataframe but it may be overkill	1 year ago
Bagatur	5e2d0cf54e	bump 275 (#9860 )	1 year ago
Leonid Kuligin	00baddf34c	fixed enterprise search returning an empty array	1 year ago
Eugene Yurtsev	5edf819524	Qdrant Client: Expose instance for creating client (#9706 ) Expose classmethods to conveniently initialize the vectorstore. The purpose of this PR is to make it easy for users to initialize an empty vectorstore that's properly pre-configured without having to index documents into it via `from_documents`. This will make it easier for users to rely on the following indexing code: https://github.com/langchain-ai/langchain/pull/9614 to help manage data in the qdrant vectorstore.	1 year ago
Harrison Chase	610f46d83a	accept openai terms (#9826 )	1 year ago
Harrison Chase	c1badc1fa2	add gmail loader (#9810 )	1 year ago
Bagatur	0d01cede03	bump 274 (#9805 )	1 year ago
Nikhil Suresh	0da5803f5a	fixed regex to match sources for all cases, also includes source (#9775 ) - Description: Updated the regex to handle all the different cases for string matching (SOURCES, sources, Sources), - Issue: https://github.com/langchain-ai/langchain/issues/9774 - Dependencies: N/A	1 year ago
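The source-matching behaviour discussed in this commit and its follow-up fixes (match SOURCES, sources, Source or Sources, require a colon, and tolerate any amount of whitespace after it) can be sketched with the standard `re` module. The pattern and the helper name are assumptions for illustration and are not necessarily what the chain uses.

```python
import re
from typing import Tuple

# Case-insensitive "source" or "sources", a mandatory colon, optional whitespace.
SOURCES_RE = re.compile(r"\bsources?:\s*", re.IGNORECASE)


def split_answer_and_sources(text: str) -> Tuple[str, str]:
    """Split model output into (answer, sources); sources is '' if absent."""
    parts = SOURCES_RE.split(text, maxsplit=1)
    if len(parts) == 2:
        return parts[0].strip(), parts[1].strip()
    return text.strip(), ""


upper = split_answer_and_sources("The sky is blue.\nSOURCES: a.txt, b.txt")
singular = split_answer_and_sources("Paris.\nSource:c.pdf")
# No colon after "sources", so this must not be treated as a citation marker.
prose = split_answer_and_sources("Many sources agree.")
```

Requiring the colon is what keeps ordinary prose such as "Many sources agree." from being split.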
Sam Partee	a28eea5767	Redis metadata filtering and specification, index customization (#8612 ) ### Description The previous Redis implementation did not allow for the user to specify the index configuration (i.e. changing the underlying algorithm) or add additional metadata to use for querying (i.e. hybrid or "filtered" search). This PR introduces the ability to specify custom index attributes and metadata attributes as well as use that metadata in filtered queries. Overall, more structure was introduced to the Redis implementation that should allow for easier maintainability moving forward. # New Features The following features are now available with the Redis integration into Langchain ## Index schema generation The schema for the index will now be automatically generated if not specified by the user. For example, the data above has multiple metadata categories. The following example ```python from langchain.embeddings import OpenAIEmbeddings from langchain.vectorstores.redis import Redis embeddings = OpenAIEmbeddings() rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` Loading the data in through this and the other ``from_documents`` and ``from_texts`` methods will now generate index schema in Redis like the following. View index schema with the ``redisvl`` tool. 
[link](redisvl.com) ```bash $ rvl index info -i users ``` Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|---------------\|-----------------\|------------\| \| users \| HASH \| ['doc:users'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Metadata specification The metadata schema generation has the following rules: 1. All text fields are indexed as text fields. 2. All numeric fields are indexed as numeric fields. To have a text field indexed as a tag field, users can specify overrides like the following for the example data ```python # this can also be a path to a yaml file index_schema = { "text": [{"name": "user"}, {"name": "job"}], "tag": [{"name": "credit_score"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users2", index_schema=index_schema ) ``` This will change the index specification to Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users2 \| HASH \| ['doc:users2'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| 
age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| and throw a warning to the user (log output) that the generated schema does not match the specified schema. ```text index_schema does not match generated schema from metadata. index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} ``` As long as this is on purpose, this is fine. The schema can be defined as a yaml file or a dictionary ```yaml text: - name: user - name: job tag: - name: credit_score numeric: - name: age ``` and you pass in a path like ```python rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", index_schema=Path("sample1.yml").resolve() ) ``` Which will create the same schema as defined in the dictionary example Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users3 \| HASH \| ['doc:users3'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Vector Indexing Schema Users with large use cases may want to change how they formulate the vector index created by Langchain To utilize all the features of Redis for vector database use cases like this, you can now do the following to pass in index attribute modifiers like changing the indexing algorithm to HNSW. 
```python vector_schema = { "algorithm": "HNSW" } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` A more complex example may look like ```python vector_schema = { "algorithm": "HNSW", "ef_construction": 200, "ef_runtime": 20 } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` All names correspond to the arguments you would set if using Redis-py or RedisVL. (put in doc link later) ### Better Querying Both vector queries and Range (limit) queries are now available and metadata is returned by default. The outputs are shown. ```python >>> query = "foo" >>> results = rds.similarity_search(query, k=1) >>> print(results) [Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})] >>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False) >>> print(results) # no metadata, but with scores [(Document(page_content='foo', metadata={}), 7.15255737305e-07)] >>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001) >>> print(len(results)) # range query (only above threshold even if k is higher) 4 ``` ### Custom metadata filtering A big advantage of Redis in this space is being able to do filtering on data stored alongside the vector itself. With the example above, the following is now possible in langchain. The equivalence operators are overridden to describe a new expression language that mimic that of [redisvl](redisvl.com). This allows for arbitrarily long sequences of filters that resemble SQL commands that can be used directly with vector queries and range queries. There are two interfaces by which to do so and both are shown. 
```python >>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText >>> age_filter = RedisFilter.num("age") > 18 >>> age_filter = RedisNum("age") > 18 # equivalent >>> results = rds.similarity_search(query, filter=age_filter) >>> print(len(results)) 3 >>> job_filter = RedisFilter.text("job") == "engineer" >>> job_filter = RedisText("job") == "engineer" # equivalent >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # fuzzy match text search >>> job_filter = RedisFilter.text("job") % "eng*" >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # combined filters (AND) >>> combined = age_filter & job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 1 # combined filters (OR) >>> combined = age_filter \| job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 4 ``` All the above filter results can be checked against the data above. ### Other - Issue: #3967 - Dependencies: No added dependencies - Tag maintainer: @hwchase17 @baskaryan @rlancemartin - Twitter handle: @sampartee --------- Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
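The commit above says the comparison operators are overridden to build a filter expression language. A minimal sketch of that technique follows. The classes `NumField`, `TextField` and `FilterExpression` are hypothetical stand-ins, not the `RedisFilter`, `RedisNum` or `RedisText` classes, and the rendered query strings only approximate RediSearch-style syntax as an assumption.

```python
class FilterExpression:
    """Holds a query fragment; & and | combine fragments."""

    def __init__(self, query: str) -> None:
        self.query = query

    def __and__(self, other: "FilterExpression") -> "FilterExpression":
        # Juxtaposition is assumed to mean AND in the target query syntax.
        return FilterExpression(f"({self.query} {other.query})")

    def __or__(self, other: "FilterExpression") -> "FilterExpression":
        return FilterExpression(f"({self.query} | {other.query})")

    def __str__(self) -> str:
        return self.query


class NumField:
    """Numeric field: `field > value` yields an expression instead of a bool."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: float) -> FilterExpression:
        return FilterExpression(f"@{self.name}:[({value} +inf]")


class TextField:
    """Text field: `field == value` yields an exact-match expression."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: object) -> FilterExpression:  # type: ignore[override]
        return FilterExpression(f'@{self.name}:("{value}")')


age_filter = NumField("age") > 18
job_filter = TextField("job") == "engineer"
both = age_filter & job_filter
either = age_filter | job_filter
```

Because each operator returns a new `FilterExpression`, filters of arbitrary length can be composed before anything is sent to the database.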
nikhilkjha	d57d08fd01	Initial commit for comprehend moderator (#9665 ) This PR implements a custom chain that wraps Amazon Comprehend API calls. The custom chain is aimed to be used with LLM chains to provide moderation capability that let’s you detect and redact PII, Toxic and Intent content in the LLM prompt, or the LLM response. The implementation accepts a configuration object to control what checks will be performed on a LLM prompt and can be used in a variety of setups using the LangChain expression language to not only detect the configured info in chains, but also other constructs such as a retriever. The included sample notebook goes over the different configuration options and how to use it with other chains. ### Usage sample ```python from langchain_experimental.comprehend_moderation import BaseModerationActions, BaseModerationFilters moderation_config = { "filters":[ BaseModerationFilters.PII, BaseModerationFilters.TOXICITY, BaseModerationFilters.INTENT ], "pii":{ "action": BaseModerationActions.ALLOW, "threshold":0.5, "labels":["SSN"], "mask_character": "X" }, "toxicity":{ "action": BaseModerationActions.STOP, "threshold":0.5 }, "intent":{ "action": BaseModerationActions.STOP, "threshold":0.5 } } comp_moderation_with_config = AmazonComprehendModerationChain( moderation_config=moderation_config, #specify the configuration client=comprehend_client, #optionally pass the Boto3 Client verbose=True ) template = """Question: {question} Answer:""" prompt = PromptTemplate(template=template, input_variables=["question"]) responses = [ "Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like 323-22-9980. John Doe's phone number is (999)253-9876.", "Final Answer: This is a really shitty way of constructing a birdhouse. This is fucking insane to think that any birds would actually create their motherfucking nests here." 
] llm = FakeListLLM(responses=responses) llm_chain = LLMChain(prompt=prompt, llm=llm) chain = ( prompt \| comp_moderation_with_config \| {llm_chain.input_keys[0]: lambda x: x['output'] } \| llm_chain \| { "input": lambda x: x['text'] } \| comp_moderation_with_config ) response = chain.invoke({"question": "A sample SSN number looks like this 123-456-7890. Can you give me some more samples?"}) print(response['output']) ``` ### Output ``` > Entering new AmazonComprehendModerationChain chain... Running AmazonComprehendModerationChain... Running pii validation... Found PII content..stopping.. The prompt contains PII entities and cannot be processed ``` --------- Co-authored-by: Piyush Jain <piyushjain@duck.com> Co-authored-by: Anjan Biswas <anjanavb@amazon.com> Co-authored-by: Jha <nikjha@amazon.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
William FH	1960ac8d25	token chunks (#9739 ) Co-authored-by: Andrew <abatutin@gmail.com>	1 year ago
Bagatur	9731ce5a40	bump 273 (#9751 )	1 year ago
Fabrizio Ruocco	cacaf487c3	Azure Cognitive Search - update sdk b8, mod user agent, search with scores (#9191 ) Description: Update Azure Cognitive Search SDK to version b8 (breaking change) Customizable User Agent. Implemented Similarity search with scores @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Sergey Kozlov	135cb86215	Fix QuestionListOutputParser (#9738 ) This PR fixes `QuestionListOutputParser` text splitting. `QuestionListOutputParser` incorrectly splits numbered list text into lines. If the text doesn't end with `\n`, the regex doesn't capture the last item. So it always returns `n - 1` items, and `WebResearchRetriever.llm_chain` generates fewer queries than requested in the search prompt. How to reproduce: ```python from langchain.retrievers.web_research import QuestionListOutputParser parser = QuestionListOutputParser() good = parser.parse( """1. This is line one. 2. This is line two. """ # <-- ! ) bad = parser.parse( """1. This is line one. 2. This is line two.""" # <-- No new line. ) assert good.lines == ['1. This is line one.\n', '2. This is line two.\n'], good.lines assert bad.lines == ['1. This is line one.\n', '2. This is line two.'], bad.lines ``` NOTE: Last item will not contain a line break but this seems ok because the items are stripped in the `WebResearchRetriever.clean_search_query()`.	1 year ago
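The failure mode described in this commit can be reproduced with plain `re`, without LangChain. The two patterns below are illustrative assumptions (a pattern that requires a trailing newline versus one that also accepts end of string) and are not necessarily the exact expressions used by the parser.

```python
import re

# Requires a newline after every item, so an unterminated last item is lost.
NEWLINE_ONLY = re.compile(r"\d+\..*?\n")
# Accepts either a newline or the end of the string after an item.
NEWLINE_OR_END = re.compile(r"\d+\..*?(?:\n|$)")

text = "1. This is line one.\n2. This is line two."  # no trailing newline

lost_last_item = NEWLINE_ONLY.findall(text)
all_items = NEWLINE_OR_END.findall(text)
with_trailing_newline = NEWLINE_OR_END.findall(text + "\n")
```

With the first pattern the list is one item short whenever the model output does not end in a line break, which matches the `n - 1` behaviour reported above.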
Jurik-001	d04fe0d3ea	remove Value error "pyspark is not installed. Please install it with `pip i… (#9723 ) Description: You cannot execute spark_sql with pyspark versions prior to 3.4, because pyspark.errors was introduced in version 3.4. On an older version you get "pyspark is not installed. Please install it with pip install pyspark", which is not helpful. Also, if you do not have pyspark installed, you already get that error in init. I would return all errors. But if you have a different idea, feel free to comment. Issue: None Dependencies: None Maintainer: --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Margaret Qian	30151c99c7	Update Mosaic endpoint input/output api (#7391 ) As noted in prior PRs (https://github.com/hwchase17/langchain/pull/6060, https://github.com/hwchase17/langchain/pull/7348), the input/output format has changed a few times as we've stabilized our inference API. This PR updates the API to the latest stable version as indicated in our docs: https://docs.mosaicml.com/en/latest/inference.html The input format looks like this: `{"inputs": [<prompt>]} ` The output format looks like this: ` {"outputs": [<output_text>]} ` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
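The input and output shapes quoted in the commit above can be illustrated with plain JSON handling. This is only a sketch of the two payload formats; the helper names `build_request_body` and `parse_response_body` are hypothetical, and no endpoint URL, authentication, or HTTP client behavior is implied.

```python
import json


def build_request_body(prompt: str) -> str:
    """Serialize a single prompt as {"inputs": [<prompt>]} (hypothetical helper)."""
    return json.dumps({"inputs": [prompt]})


def parse_response_body(body: str) -> str:
    """Extract the first text from {"outputs": [<output_text>]} (hypothetical helper)."""
    data = json.loads(body)
    outputs = data["outputs"]
    if not outputs:
        raise ValueError("response contained no outputs")
    return outputs[0]


request_body = build_request_body("Say hello")
# A hand-written response in the documented shape, not real endpoint output.
sample_response = '{"outputs": ["Hello!"]}'
text = parse_response_body(sample_response)
```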
Leonid Kuligin	87da56fb1e	Added a pdf parser based on DocAI (#9579 ) #9578 --------- Co-authored-by: Leonid Kuligin <kuligin@google.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	1 year ago
Naama Magami	adb21782b8	Add del vector pgvector + adding modification time to confluence and google drive docs (#9604 ) Description: - adding implementation of delete for pgvector - adding modification time in docs metadata for confluence and google drive. Issue: https://github.com/langchain-ai/langchain/issues/9312 Tag maintainer: @baskaryan, @eyurtsev, @hwchase17, @rlancemartin. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	1 year ago
Erick Friis	3e5cda3405	Hub Push Ergonomics (#9731 ) Improves the hub pushing experience, returning a url instead of just a commit hash. Requires hub sdk 0.1.8	1 year ago
Tudor Golubenco	dc30edf51c	Xata as a chat message memory store (#9719 ) This adds Xata as a memory store also to the python version of LangChain, similar to the [one for LangChain.js](https://github.com/hwchase17/langchainjs/pull/2217). I have added a Jupyter Notebook with a simple and a more complex example using an agent. To run the integration test, you need to execute something like: ``` XATA_API_KEY='xau_...' XATA_DB_URL="https://demo-uni3q8.eu-west-1.xata.sh/db/langchain" poetry run pytest tests/integration_tests/memory/test_xata.py ``` Where `langchain` is the database you create in Xata.	1 year ago
William FH	dff00ea91e	Chat Loaders (#9708 ) Still working out interface/notebooks + need discord data dump to test out things other than copy+paste Update: - Going to remove the 'user_id' arg in the loaders themselves and just standardize on putting the "sender" arg in the extra kwargs. Then can provide a utility function to map these to ai and human messages - Going to move the discord one into just a notebook since I don't have a good dump to test on and copy+paste maybe isn't the greatest thing to support in v0 - Need to do more testing on slack since it seems the dump only includes channels and NOT 1 on 1 convos - --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Bagatur	0f48e6c36e	fix integration deps (#9722 )	1 year ago
Bagatur	a0800c9f15	rm google api core and add more dependency testing (#9721 )	1 year ago
Andrew White	2bcf581a23	Added search parameters to qdrant max_marginal_relevance_search (#7745 ) Adds the qdrant search filter/params to the `max_marginal_relevance_search` method, which is present on others. I did not add `offset` for pagination, because its behavior would be ambiguous in this setting (since we fetch extra and down-select). --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Kacper Łukawski <lukawski.kacper@gmail.com>	1 year ago
Tomaz Bratanic	dacf96895a	Add the option to use separate LLMs for GraphCypherQA chain (#9689 ) The Graph Chains differ from the retrievalQA chains in that they use two LLMChains instead of one. Therefore, sometimes you want to use a different LLM to generate the database query than to generate the final answer. This feature makes it more convenient to use different LLMs in the same chain. I have also renamed the Graph DB QA Chain to Neo4j DB QA Chain in the documentation only, as it is used only for Neo4j. The naming was ambiguous because it was the first graphQA chain added, and I wasn't sure how you wanted to spin it.	1 year ago
Bagatur	f5ea725796	bump 272 (#9704 )	1 year ago
Patrick Loeber	6bedfdf25a	Fix docs for AssemblyAIAudioTranscriptLoader (shorter import path) (#9687 ) Uses the shorter import path `from langchain.document_loaders import` instead of the full path `from langchain.document_loaders.assemblyai` Applies those changes to the docs and the unit test. See #9667 that adds this new loader.	1 year ago
Nuno Campos	78ffcdd9a9	Lint	1 year ago
Nuno Campos	20d2c0571c	Do not share executors between parent and child tasks	1 year ago
Harrison Chase	9963b32e59	Harrison/multi vector (#9700 )	1 year ago
Leonid Ganeline	c19888c12c	⏳ docstrings: `vectorstores` consistency (#9349 ) ⏳ - updated the top-level descriptions to a consistent format; - changed several `ValueError` to `ImportError` in the import cases; - changed the format of several internal functions from "name" to "_name". So, these functions are not shown in the Top-level API Reference page (with lists of classes/functions)	1 year ago
Kim Minjong	d0ff0db698	Update ChatOpenAI._stream to respect finish_reason (#9672 ) Currently, ChatOpenAI._stream does not reflect finish_reason to generation_info. Change it to reflect that. Same patch as https://github.com/langchain-ai/langchain/pull/9431 , but also applies to _stream.	1 year ago
Patrick Loeber	5990651070	Add new document_loader: AssemblyAIAudioTranscriptLoader (#9667 ) This PR adds a new document loader `AssemblyAIAudioTranscriptLoader` that allows to transcribe audio files with the [AssemblyAI API](https://www.assemblyai.com) and loads the transcribed text into documents. - Add new document_loader with class `AssemblyAIAudioTranscriptLoader` - Add optional dependency `assemblyai` - Add unit tests (using a Mock client) - Add docs notebook This is the equivalent to the JS integration already available in LangChain.js. See the [LangChain JS docs AssemblyAI page](https://js.langchain.com/docs/modules/data_connection/document_loaders/integrations/web_loaders/assemblyai_audio_transcription). At its simplest, you can use the loader to get a transcript back from an audio file like this: ```python from langchain.document_loaders.assemblyai import AssemblyAIAudioTranscriptLoader loader = AssemblyAIAudioTranscriptLoader(file_path="./testfile.mp3") docs = loader.load() ``` To use it, it needs the `assemblyai` python package installed, and the environment variable `ASSEMBLYAI_API_KEY` set with your API key. Alternatively, the API key can also be passed as an argument. Twitter handles to shout out if so kindly 🙇 [@AssemblyAI](https://twitter.com/AssemblyAI) and [@patloeber](https://twitter.com/patloeber) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	1 year ago
Eugene Yurtsev	9e1dbd4b49	x	1 year ago
Eugene Yurtsev	b88dfcb42a	Add indexing support (#9614 ) This PR introduces a persistence layer to help with indexing workflows into vectorstores. The indexing code helps users to: 1. Avoid writing duplicated content into the vectorstore 2. Avoid over-writing content if it's unchanged Importantly, this keeps on working even if the content being written is derived via a set of transformations from some source content (e.g., indexing children documents that were derived from parent documents by chunking.) The two main components are: 1. Persistence layer that keeps track of which keys were updated and when. Keeping track of the timestamp of updates allows cleaning up old content safely and with minimal complexity. 2. HashedDocument which is used to hash the contents (including metadata) of the documents. We rely on the hashes for identifying duplicates. The indexing code works with ANY document loader. To add transformations to the documents, users for now can add a custom document loader that composes an existing loader together with document transformers. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
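The two components described in the commit above (content hashing and a timestamped record of written keys) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the PR's implementation: plain dictionaries stand in for the persistence layer and the vector store, SHA-1 over a JSON dump stands in for the hashing scheme, and `hash_document`, `index_documents`, and `cleanup` are hypothetical names.

```python
import hashlib
import json


def hash_document(page_content: str, metadata: dict) -> str:
    """Hash content plus metadata so identical documents share one key."""
    payload = json.dumps(
        {"page_content": page_content, "metadata": metadata}, sort_keys=True
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def index_documents(docs, record_store: dict, vector_store: dict, now: float) -> dict:
    """Write only unseen documents; refresh the timestamp of the rest."""
    written = skipped = 0
    for page_content, metadata in docs:
        key = hash_document(page_content, metadata)
        if key in record_store:
            record_store[key] = now  # unchanged content: mark as still current
            skipped += 1
            continue
        vector_store[key] = page_content  # stand-in for embedding and upsert
        record_store[key] = now
        written += 1
    return {"written": written, "skipped": skipped}


def cleanup(record_store: dict, vector_store: dict, before: float) -> int:
    """Delete entries whose last update is older than the given timestamp."""
    stale = [key for key, updated_at in record_store.items() if updated_at < before]
    for key in stale:
        del record_store[key]
        vector_store.pop(key, None)
    return len(stale)


records, vectors = {}, {}
first = index_documents([("a", {"src": "x"}), ("b", {"src": "x"})], records, vectors, now=1.0)
second = index_documents([("a", {"src": "x"}), ("c", {"src": "x"})], records, vectors, now=2.0)
removed = cleanup(records, vectors, before=2.0)  # drops "b", not seen in run two
```

Because every key carries the time it was last seen, content that no longer appears in the source can be removed by comparing timestamps rather than by diffing the full document set.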
刘方瑞	c215481531	Update default index type and metric type for MyScale vector store (#9353 ) We update the default index type from `IVFFLAT` to `MSTG`, a new vector type developed by MyScale.	1 year ago
Joshua Sundance Bailey	a9c86774da	Anthropic: Allow the use of kwargs consistent with ChatOpenAI. (#9515 ) - Description: ~~Creates a new root_validator in `_AnthropicCommon` that allows the use of `model_name` and `max_tokens` keyword arguments.~~ Adds pydantic field aliases to support `model_name` and `max_tokens` as keyword arguments. Ultimately, this makes `ChatAnthropic` more consistent with `ChatOpenAI`, making the two classes more interchangeable for the developer. - Issue: https://github.com/langchain-ai/langchain/issues/9510 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Bagatur	342087bdfa	fix integration test imports (#9669 )	1 year ago
Keras Conv3d	cbaea8d63b	tair fix distance_type error, and add hybrid search (#9531 ) - fix: distance_type error, - feature: Tair add hybrid search --------- Co-authored-by: thw <hanwen.thw@alibaba-inc.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago

901 Commits (71025013f85e1954abd2bd340d46728c177b7269)