langchain

Commit Graph

Author	SHA1	Message	Date
Lars von Wedel	6d82503eb1	Add parser and loader for Azure document intelligence service. (#10136 ) Hi, this PR contains loader / parser for Azure Document intelligence which is a ML-based service to ingest arbitrary PDFs / images, even if scanned. The loader generates Documents by pages of the original document. This is my first contribution to LangChain. Unfortunately I could not find the correct place for test cases. Happy to add one if you can point me to the location, but as this is a cloud-based service, a test would require network access and credentials - so might be of limited help. Dependencies: The needed dependency was already part of pyproject.toml, no change. Twitter: feel free to mention @LarsAC on the announcement	1 year ago
Harrison Chase	4abe85be57	Harrison/string inplace (#10153 ) Co-authored-by: Wrick Talukdar <wrick.talukdar@gmail.com> Co-authored-by: Anjan Biswas <anjanavb@amazon.com> Co-authored-by: Jha <nikjha@amazon.com> Co-authored-by: Lucky-Lance <77819606+Lucky-Lance@users.noreply.github.com> Co-authored-by: 陆徐东 <luxudong@MacBook-Pro.local>	1 year ago
Nino Risteski	0c0a7d19eb	Update openai_multi_functions_agent.ipynb (#10144 ) typo fix	1 year ago
Nino Risteski	f968b86652	Update apis.ipynb (#10145 ) few typo fixes	1 year ago
Guy Korland	765ef3b486	Add FalkorDB to imports (#10151 )	1 year ago
Nino Risteski	746c6ff9c3	Update index.mdx (#10142 ) fixed typos	1 year ago
Nino Risteski	fdebd3e02f	Update chat_vector_db.mdx (#10141 ) typo fix	1 year ago
Leonid Kuligin	30239b3025	added support for inference from Model Garden (#9367 ) #8850 --------- Co-authored-by: Leonid Kuligin <kuligin@google.com>	1 year ago
Leonid Ganeline	54a8df87b9	📖 docs: fixed `integration/llms` navbar (#9277 ) Fixed navbar: - renamed several files, so ToC is sorted correctly - made ToC items consistent: formatted several Titles - added several links - reformatted several docs to a consistent format - renamed several files (removed `_example` suffix) - added renamed files to the `docs/docs_skeleton/vercel.json`	1 year ago
Bagatur	b485c3048b	rm base64 images from docs (#10110 ) Causing problems indexing docs and notebook images don't render after markdown conversion anyways	1 year ago
William FH	f2fc4173c3	Update redirects meta tags (#10109 )	1 year ago
Leonid Ganeline	37e435bd00	docs: `youtube_search` tool example update (#9958 ) Added a link to source package; updated title, description.	1 year ago
Leonid Ganeline	3b8ee74e38	docs: `google-drive-tool` example fix (#10000 ) This notebook was mistakenly placed in the `toolkits` folder and appears within `Agents & Toolkits` menu. But it should be in `Tools`. Moved example into `tools/`; updated title to consistent format.	1 year ago
seamusp	afd96b2460	docs: agents & callbacks fixes (#10066 ) Various improvements to the Agents & Callbacks sections of the documentation including formatting, spelling, and grammar fixes to improve readability.	1 year ago
Benjamin Matson	58d7d86e51	feat: add bedrock chat model (#8017 ) Replace this comment with: - Description: Add Bedrock implementation of Anthropic Claude for Chat - Tag maintainer: @hwchase17, @baskaryan - Twitter handle: @bwmatson --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
KyrianC	491089754d	EdenAI LLM update. Add models name option (#8963 ) This PR follows the Eden AI (LLM + embeddings) integration. #8633 We added an optional parameter to choose different AI models for providers (like 'text-bison' for provider 'google', 'text-davinci-003' for provider 'openai', etc.). Usage: ```python llm = EdenAI( feature="text", provider="google", params={ "model": "text-bison", # new "temperature": 0.2, "max_tokens": 250, }, ) ``` You can also change the provider + model after initialization ```python llm = EdenAI( feature="text", provider="google", params={ "temperature": 0.2, "max_tokens": 250, }, ) prompt = """ hi """ llm(prompt, providers='openai', model='text-davinci-003') # change provider & model ``` The jupyter notebook as been updated with an example well. Ping: @hwchase17, @baskaryan --------- Co-authored-by: RedhaWassim <rwasssim@gmail.com> Co-authored-by: sam <melaine.samy@gmail.com>	1 year ago
Bagatur	71c418725f	index rename delete_mode -> cleanup (#10103 )	1 year ago
Bagatur	b927277809	Bagatur/eden type 2 (#10102 )	1 year ago
Bagatur	d4380339c1	eden tool nb nit (#10101 )	1 year ago
KyrianC	c7a5504789	Add EdenAI Tools (#9764 ) This PR follows the Eden AI (LLM + embeddings) integration. #8633 We added different Tools to empower agents with new capabilities : - text: explicit content detection - image: explicit content detection - image: object detection - OCR: invoice parsing - OCR: ID parsing - audio: speech to text - audio: text to speech We plan to add more in the future (like translation, language detection, + others). Usage: ```python llm=EdenAI(feature="text",provider="openai", params={"temperature" : 0.2,"max_tokens" : 250}) tools = [ EdenAiTextModerationTool(providers=["openai"],language="en"), EdenAiObjectDetectionTool(providers=["google","api4ai"]), EdenAiTextToSpeechTool(providers=["amazon"],language="en",voice="MALE"), EdenAiExplicitImageTool(providers=["amazon","google"]), EdenAiSpeechToTextTool(providers=["amazon"]), EdenAiParsingIDTool(providers=["amazon","klippa"],language="en"), EdenAiParsingInvoiceTool(providers=["amazon","google"],language="en"), ] agent_chain = initialize_agent( tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, return_intermediate_steps=True, ) result = agent_chain(""" i have this text : 'i want to slap you' first : i want to know if this text contains explicit content or not . second : if it does contain explicit content i want to know what is the explicit content in this text, third : i want to make the text into speech . if there is URL in the observations , you will always put it in the output (final answer) . """) ``` output: > Entering new AgentExecutor chain... > I need to extract the information from the ID and then convert it to text and then to speech > Action: edenai_identity_parsing > Action Input: "https://www.citizencard.com/images/citizencard-uk-id-card-2023.jpg" > Observation: last_name : > value : ANGELA > given_names : > value : GREENE > birth_place : > birth_date : > value : 2000-11-09 > issuance_date : > expire_date : > document_id : > issuing_state : > address : > age : > country : > document_type : > value : DRIVER LICENSE FRONT > gender : > image_id : > image_signature : > mrz : > nationality : > Thought: I now need to convert the information to text and then to speech > Action: edenai_text_to_speech > Action Input: "Welcome Angela Greene!" > Observation: https://d14uq1pz7dzsdq.cloudfront.net/0c494819-0bbc-4433-bfa4-6e99bd9747ea_.mp3?Expires=1693316851&Signature=YcMoVQgPuIMEOuSpFuvhkFM8JoBMSoGMcZb7MVWdqw7JEf5~67q9dEI90o5todE5mYXB5zSYoib6rGrmfBl4Rn5~yqDwZ~Tmc24K75zpQZIEyt5~ZSnHuXy4IFWGmlIVuGYVGMGKxTGNeCRNUXDhT6TXGZlr4mwa79Ei1YT7KcNyc1dsTrYB96LphnsqOERx4X9J9XriSwxn70X8oUPFfQmLcitr-syDhiwd9Wdpg6J5yHAJjf657u7Z1lFTBMoXGBuw1VYmyno-3TAiPeUcVlQXPueJ-ymZXmwaITmGOfH7HipZngZBziofRAFdhMYbIjYhegu5jS7TxHwRuox32A__&Key-Pair-Id=K1F55BTI9AHGIK > Thought: I now know the final answer > Final Answer: https://d14uq1pz7dzsdq.cloudfront.net/0c494819-0bbc-4433-bfa4-6e99bd9747ea_.mp3?Expires=1693316851&Signature=YcMoVQgPuIMEOuSpFuvhkFM8JoBMSoGMcZb7MVWdqw7JEf5~67q9dEI90o5todE5mYXB5zSYoib6rGrmfBl4Rn5~yqDwZ~Tmc24K75zpQZIEyt5~ZSnHuXy4IFWGmlIVuGYVGMGKxTGNeCRNUXDhT6TXGZlr4mwa79Ei1YT7KcNyc1dsTrYB96LphnsqOERx4X9J9XriSwxn70X8oUPFfQmLcitr-syDhiwd9Wdpg6J5y > > Finished chain. Other examples are available in the jupyter notebook. This PR is made in parallel with EdenAI LLM update #8963 I apologize for the messy PR. While working in implementing Tools we realized there was a few problems we needed to fix on LLM as well. Ping: @hwchase17, @baskaryan --------- Co-authored-by: RedhaWassim <rwasssim@gmail.com>	1 year ago
Bagatur	5f1c67b47c	Mv LCEL docs up a level (#10073 )	1 year ago
Harrison Chase	ad9e242a7a	add snippet for max concurrency (#9892 )	1 year ago
Stefano Lottini	c710c7303f	fix wrong import line in cassandra doc page for vector store (#10041 ) This fixes the exampe import line in the general "cassandra" doc page mdx file. (it was erroneously a copy of the chat message history import statement found below).	1 year ago
Jon Bennion	cc6a20d3e6	updated prompt name in documentation for sequential chain (#10048 ) Description: updated the prompt name in a sequential chain example so that it is not overwritten by the same prompt name in the next chain (this is a sequential chain example) Issue: n/a Dependencies: none Tag maintainer: not known Twitter handle: not on twitter, feel free to use my git username for anything	1 year ago
Zizhong Zhang	641b71e2cd	refactor: rename to OpaquePrompts (#10013 ) Renamed to OpaquePrompts cc @baskaryan Thanks in advance!	1 year ago
Bagatur	8d66b00c73	Data anonymizer notebook nit (#10062 )	1 year ago
Bagatur	3efab8d3df	implement vectorstores by tencent vectordb (#9989 ) Hi there！ I'm excited to open this PR to add support for using 'Tencent Cloud VectorDB' as a vector store. Tencent Cloud VectorDB is a fully-managed, self-developed, enterprise-level distributed database service designed for storing, retrieving, and analyzing multi-dimensional vector data. The database supports multiple index types and similarity calculation methods, with a single index supporting vector scales up to 1 billion and capable of handling millions of QPS with millisecond-level query latency. Tencent Cloud VectorDB not only provides external knowledge bases for large models to improve their accuracy, but also has wide applications in AI fields such as recommendation systems, NLP services, computer vision, and intelligent customer service. The PR includes: Implementation of Vectorstore. I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below make format make lint make coverage make test	1 year ago
Bagatur	b1644bc9ad	cr	1 year ago
Cameron Vetter	e37d51cab6	fix scoring profile example (#10016 ) - Description: A change in the documentation example for Azure Cognitive Vector Search with Scoring Profile so the example works as written - Issue: #10015 - Dependencies: None - Tag maintainer: @baskaryan @ruoccofabrizio - Twitter handle: @poshporcupine	1 year ago
Hyeokjun seo	e2e05ad89e	Fix Typo : `openai_api_key` -> `serpapi_api_key` (#10020 ) Fixed typo in the comments Notebook. (which says `openai_api_key` for SerpAPI)	1 year ago
Tomaz Bratanic	f2e8399cc8	Fix link in Neo4j provider page (#10023 )	1 year ago
Bagatur	7fa82900cb	guides docs nits (#10005 )	1 year ago
Bagatur	2f03e71e67	rename local llm guide (#10004 )	1 year ago
Bagatur	781f274d19	make privacy guide section (#10003 )	1 year ago
maks-operlejn-ds	a8f804a618	Add data anonymizer (#9863 ) ### Description The feature for anonymizing data has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. Anonynization consists of two steps: 1. Identification: Identify all data fields that contain personally identifiable information (PII). 2. Replacement: Replace all PIIs with pseudo values or codes that do not reveal any personal information about the individual but can be used for reference. We're not using regular encryption, because the language model won't be able to understand the meaning or context of the encrypted data. We use Microsoft Presidio together with Faker framework for anonymization purposes because of the wide range of functionalities they provide. The full implementation is available in `PresidioAnonymizer`. ### Future works - deanonymization - add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. By doing this, we will retain anonymity in requests to, for example, OpenAI, and then be able restore the original data. - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Bagatur	98cce7dcd3	update moderation docs (#10002 )	1 year ago
Christophe Bornet	9870bfb9cd	Add bucket and object key to metadata in S3 loader (#9317 ) - Description: this PR adds `s3_object_key` and `s3_bucket` to the doc metadata when loading an S3 file. This is particularly useful when using `S3DirectoryLoader` to remove the files from the dir once they have been processed (getting the object keys from the metadata `source` field seems brittle) - Dependencies: N/A - Tag maintainer: ? - Twitter handle: _cbornet --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	1 year ago
Guy Korland	24c0b01c38	Extend the FalkorDB QA demo (#9992 ) - Description: Extend the FalkorDB QA demo - Tag maintainer: @baskaryan	1 year ago
wlleiiwang	8c4e29240c	implement vectorstores by tencent vectordb	1 year ago
Leonid Ganeline	d03d6f6fd9	Merge branch 'master' into docs-tools-menu	1 year ago
Bagatur	8fb0a9594c	Add LLMonitor Callback Handler Integration - open-source observability & analytics (#9870 ) Adds support for [llmonitor](https://llmonitor.com) callbacks. It enables: - Requests tracking / logging / analytics - Error debugging - Cost analytics - User tracking Let me know if anythings neds to be changed for merge. Thank you!	1 year ago
leo-gan	8c1678a8c7	Updated titles, descriptions.	1 year ago
Bagatur	7bba1d911b	Fix typo in code_understanding.ipynb (#9899 ) seperate -> separate	1 year ago
Bagatur	2e65434568	docs: Fix the syntax error, replace "dotenv.load_env()" with "dotenv.… (#9900 ) Description: The documents incorrectly mentions "dotenv.load_env()", but it should actually be "dotenv.load_dotenv()". You can see the screenshot below for reference: python-dotenv: 1.0.0 ![image](https://github.com/langchain-ai/langchain/assets/2959046/94dc4b51-cc2f-412d-92e9-16b8ff0d513e)	1 year ago
Bagatur	b416f5c0c8	fix a link name format to the dependents document (#9928 )	1 year ago
Bagatur	8f199239b8	docs: `llms/google vertex AI` example update (#9960 ) Updated title, description, added sections.	1 year ago
Bagatur	2a03a0087d	docs: `memory` menu (#9947 ) The [Memory](https://python.langchain.com/docs/modules/memory/) menu is clogged with unnecessary wording. I've made it more concise by simplifying titles of the example notebooks. As results, menu is shorter and better for comprehend.	1 year ago
Bagatur	f7cc125cac	docs: `memory types` menu (#9949 ) The [Memory Types](https://python.langchain.com/docs/modules/memory/types/) menu is clogged with unnecessary wording. I've made it more concise by simplifying titles of the example notebooks. As results, menu is shorter and better for comprehend.	1 year ago
Bagatur	16eb935469	Fix for similarity_search_with_score (#9903 ) - Description: the implementation for similarity_search_with_score did not actually include a score or logic to filter. Now fixed. - Tag maintainer: @rlancemartin - Twitter handle: @ofermend	1 year ago
Fredrik Gullberg	f69d236a4a	docs: Fix spelling mistakes in apis.ipynb (#9911 ) - Description: Fix spelling mistakes in apis.ipynb - Issue: [#9910](https://github.com/langchain-ai/langchain/issues/9910) Co-authored-by: Fredrik Gullberg <fredrik.gullberg@klarna.com>	1 year ago
Nate Nethercott	0024824a6e	docs: Fix spelling mistakes in retrievers/get_started.mdx (#9920 ) Description: Fix spelling mistakes in retrievers/get_started.mdx	1 year ago
leo-gan	210de0c66b	Updated title, description, added sections	1 year ago
Cameron Hutchison	bcc3463ff4	docs: Azure AD Authentication for Azure OpenAI (#9951 ) # Description This PR adds additional documentation on how to use Azure Active Directory to authenticate to an OpenAI service within Azure. This method of authentication allows organizations with more complex security requirements to use Azure OpenAI. # Issue N/A # Dependencies N/A # Twitter https://twitter.com/CamAHutchison	1 year ago
Guy Korland	7cbe872af8	Add support for Falkordb (ex-RedisGraph) (#9821 ) Replace this entire comment with: - Description: Add support for Falkordb (ex-RedisGraph) - Tag maintainer: @hwchase17 - Twitter handle: @g_korland	1 year ago
Bagatur	ede45f535e	fix intro docs (#9950 )	1 year ago
Leonid Ganeline	393816e7bd	Merge branch 'master' into docs-memory-type-menu	1 year ago
Corvus Lee	0fb95ebe66	Docs: enrich SageMaker endpoint embeddings with docstrings and examples (#9924 ) Description: added comments to address the relationship between input/output transformations and the customised inference.py script.	1 year ago
leo-gan	7c7ae34eeb	updated .mdx titles and text.	1 year ago
leo-gan	d578efba35	updated notebook titles and text.	1 year ago
Leonid Ganeline	4b6e41a939	Merge branch 'master' into docs-memory-menu	1 year ago
Tomaz Bratanic	6092422e10	Add neo4j provider page (#9941 )	1 year ago
leo-gan	c906041aa8	updated notebook titles and text.	1 year ago
Tomaz Bratanic	db13fba7ea	Add neo4j vector support (#9770 ) Neo4j has added vector index integration just recently. To allow both ingestion and integrating it as vector RAG applications, I wrapped it as a vector store as the implementation is completely different from `GraphCypherQAChain`. Here, we are not generating any Cypher statements at query time, we are simply doing the vector similarity search using the new vector index as if we were dealing with a vector database. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Tudor Golubenco	171b0b183b	Pre-release Xata version no longer required (#9915 ) Tiny PR: Since we've released version 1.0.0 of the python SDK, we no longer need to specify the pre-release version when pip installing.	1 year ago
Mike Nitsenko	c80e406e95	Cube semantic loader: allow cubes processing (#9927 ) We've started to receive feedback (after launch) that using only views is confusing. We're considering this as a good practice, as a view serves as a "facade" for your data - however, we decided to let users decide this on their own. Solves the questions from: - https://github.com/cube-js/cube/issues/7028 - https://github.com/langchain-ai/langchain/pull/9690	1 year ago
LiaoKong	8f8455b24d	fix a link name format to the dependents document	1 year ago
Ofer Mendelevitch	8b8d2a6535	fixed similarity_search_with_score to really use a score updated unit test with a test for score threshold Updated demo notebook	1 year ago
Ikko Eltociear Ashimine	766bbd6c6b	Fix typo in code_understanding.ipynb seperate -> separate	1 year ago
tongtie	82a3c2a557	docs: Fix the syntax error, replace "dotenv.load_env()" with "dotenv.load_dotenv()".	1 year ago
Mazhar (Taha) Mumbaiwala	e80834d783	docs: Fix spelling mistakes in Etherscan.ipynb (#9845 )	1 year ago
Philippe PRADOS	7fdb7439e0	Update google drive notebooks (#9851 ) Update google drive doc loader and retriever notebooks. Show how to use with langchain-googledrive package. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Xiaobing Mi	5d47833ae1	Fix typo in web_scraping.ipynb (#9835 )	1 year ago
Leonid Ganeline	b1bffea9c7	docs: fix for title of `llm_caching` nb (#9891 ) Fixed title for the `extras/integrations/llms/llm_caching.ipynb`. Existing title breaks the sorted order of items in the navbar. Updated some formatting.	1 year ago
Leonid Ganeline	e01b00aa54	docs: `ainetwork` update (#9871 ) * Added links to the AI Network * Made title consistent to other tool kits * Added `integrations/providers/` integration card page * No changes in the example code!	1 year ago
Leonid Ganeline	cf122b6269	docs: `Infino` example fix (#9888 ) - Fixed a broken link in the `integrations/providers/infino.mdx` - Fixed a title in the `integration/collbacks/infino.ipynb` example - Updated text format in this example.	1 year ago
Piyush Jain	fe1b9ee6b8	Updated notebook for comprehend moderation (#9875 ) ### Description Updated the notebook for comprehend moderation. cc @baskaryan	1 year ago
William FH	b14d74dd4d	iMessage loader (#9832 ) Add an iMessage chat loader	1 year ago
Lance Martin	8393ba9dab	Add instructions for GGUF (#9874 ) llama.cpp migrated to GGUF model format, and new releases (e.g., [here](https://huggingface.co/TheBloke)) now use GGUF.	1 year ago
hughcrt	3a4d4c940c	Change video width	1 year ago
hughcrt	97741d41c5	Add LLMonitorCallbackHandler	1 year ago
eryk-dsai	7f5713b80a	feat: grammar-based sampling in llama-cpp (#9712 ) ## Description The following PR enables the [grammar-based sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars) in llama-cpp LLM. In short, loading file with formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or generate only python lists. In the follow-up PR we will add: * docs with some description why it is cool and how it works * maybe some code sample for some task such as in llama repo --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Harrison Chase	c1badc1fa2	add gmail loader (#9810 )	1 year ago
Vikas Sheoran	63921e327d	docs: Fix a spelling mistake in adding_memory.ipynb (#9794 ) # Description This pull request fixes a small spelling mistake found while reading docs.	1 year ago
Rosário P. Fernandes	aab01b55db	typo: funtions --> functions (#9784 ) Minor typo in the extractions use-case	1 year ago
Sam Partee	a28eea5767	Redis metadata filtering and specification, index customization (#8612 ) ### Description The previous Redis implementation did not allow for the user to specify the index configuration (i.e. changing the underlying algorithm) or add additional metadata to use for querying (i.e. hybrid or "filtered" search). This PR introduces the ability to specify custom index attributes and metadata attributes as well as use that metadata in filtered queries. Overall, more structure was introduced to the Redis implementation that should allow for easier maintainability moving forward. # New Features The following features are now available with the Redis integration into Langchain ## Index schema generation The schema for the index will now be automatically generated if not specified by the user. For example, the data above has the multiple metadata categories. The the following example ```python from langchain.embeddings import OpenAIEmbeddings from langchain.vectorstores.redis import Redis embeddings = OpenAIEmbeddings() rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` Loading the data in through this and the other ``from_documents`` and ``from_texts`` methods will now generate index schema in Redis like the following. view index schema with the ``redisvl`` tool. [link](redisvl.com) ```bash $ rvl index info -i users ``` Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|---------------\|-----------------\|------------\| \| users \| HASH \| ['doc:users'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Metadata specification The metadata schema generation has the following rules 1. All text fields are indexed as text fields. 2. All numeric fields are index as numeric fields. If you would like to have a text field as a tag field, users can specify overrides like the following for the example data ```python # this can also be a path to a yaml file index_schema = { "text": [{"name": "user"}, {"name": "job"}], "tag": [{"name": "credit_score"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` This will change the index specification to Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users2 \| HASH \| ['doc:users2'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| and throw a warning to the user (log output) that the generated schema does not match the specified schema. ```text index_schema does not match generated schema from metadata. index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} ``` As long as this is on purpose, this is fine. The schema can be defined as a yaml file or a dictionary ```yaml text: - name: user - name: job tag: - name: credit_score numeric: - name: age ``` and you pass in a path like ```python rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", index_schema=Path("sample1.yml").resolve() ) ``` Which will create the same schema as defined in the dictionary example Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users3 \| HASH \| ['doc:users3'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Vector Indexing Schema Users with large use cases may want to change how they formulate the vector index created by Langchain To utilize all the features of Redis for vector database use cases like this, you can now do the following to pass in index attribute modifiers like changing the indexing algorithm to HNSW. ```python vector_schema = { "algorithm": "HNSW" } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` A more complex example may look like ```python vector_schema = { "algorithm": "HNSW", "ef_construction": 200, "ef_runtime": 20 } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` All names correspond to the arguments you would set if using Redis-py or RedisVL. (put in doc link later) ### Better Querying Both vector queries and Range (limit) queries are now available and metadata is returned by default. The outputs are shown. ```python >>> query = "foo" >>> results = rds.similarity_search(query, k=1) >>> print(results) [Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})] >>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False) >>> print(results) # no metadata, but with scores [(Document(page_content='foo', metadata={}), 7.15255737305e-07)] >>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001) >>> print(len(results)) # range query (only above threshold even if k is higher) 4 ``` ### Custom metadata filtering A big advantage of Redis in this space is being able to do filtering on data stored alongside the vector itself. With the example above, the following is now possible in langchain. The equivalence operators are overridden to describe a new expression language that mimic that of [redisvl](redisvl.com). This allows for arbitrarily long sequences of filters that resemble SQL commands that can be used directly with vector queries and range queries. There are two interfaces by which to do so and both are shown. ```python >>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText >>> age_filter = RedisFilter.num("age") > 18 >>> age_filter = RedisNum("age") > 18 # equivalent >>> results = rds.similarity_search(query, filter=age_filter) >>> print(len(results)) 3 >>> job_filter = RedisFilter.text("job") == "engineer" >>> job_filter = RedisText("job") == "engineer" # equivalent >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # fuzzy match text search >>> job_filter = RedisFilter.text("job") % "eng*" >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # combined filters (AND) >>> combined = age_filter & job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 1 # combined filters (OR) >>> combined = age_filter \| job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 4 ``` All the above filter results can be checked against the data above. ### Other - Issue: #3967 - Dependencies: No added dependencies - Tag maintainer: @hwchase17 @baskaryan @rlancemartin - Twitter handle: @sampartee --------- Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Anish Shah	fa0b8f3368	fix broken wandb link in debugging page (#9771 ) - Description: Fix broken hyperlink in debugging page	1 year ago
Monami Sharma	12a373810c	Fixing broken links to Moderation and Constitutional chain (#9768 ) - Description: Fixing broken links for Moderation and Constitutional chain - Issue: N/A - Twitter handle: MonamiSharma	1 year ago
nikhilkjha	d57d08fd01	Initial commit for comprehend moderator (#9665 ) This PR implements a custom chain that wraps Amazon Comprehend API calls. The custom chain is aimed to be used with LLM chains to provide moderation capability that let’s you detect and redact PII, Toxic and Intent content in the LLM prompt, or the LLM response. The implementation accepts a configuration object to control what checks will be performed on a LLM prompt and can be used in a variety of setups using the LangChain expression language to not only detect the configured info in chains, but also other constructs such as a retriever. The included sample notebook goes over the different configuration options and how to use it with other chains. ### Usage sample ```python from langchain_experimental.comprehend_moderation import BaseModerationActions, BaseModerationFilters moderation_config = { "filters":[ BaseModerationFilters.PII, BaseModerationFilters.TOXICITY, BaseModerationFilters.INTENT ], "pii":{ "action": BaseModerationActions.ALLOW, "threshold":0.5, "labels":["SSN"], "mask_character": "X" }, "toxicity":{ "action": BaseModerationActions.STOP, "threshold":0.5 }, "intent":{ "action": BaseModerationActions.STOP, "threshold":0.5 } } comp_moderation_with_config = AmazonComprehendModerationChain( moderation_config=moderation_config, #specify the configuration client=comprehend_client, #optionally pass the Boto3 Client verbose=True ) template = """Question: {question} Answer:""" prompt = PromptTemplate(template=template, input_variables=["question"]) responses = [ "Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like 323-22-9980. John Doe's phone number is (999)253-9876.", "Final Answer: This is a really shitty way of constructing a birdhouse. This is fucking insane to think that any birds would actually create their motherfucking nests here." ] llm = FakeListLLM(responses=responses) llm_chain = LLMChain(prompt=prompt, llm=llm) chain = ( prompt \| comp_moderation_with_config \| {llm_chain.input_keys[0]: lambda x: x['output'] } \| llm_chain \| { "input": lambda x: x['text'] } \| comp_moderation_with_config ) response = chain.invoke({"question": "A sample SSN number looks like this 123-456-7890. Can you give me some more samples?"}) print(response['output']) ``` ### Output ``` > Entering new AmazonComprehendModerationChain chain... Running AmazonComprehendModerationChain... Running pii validation... Found PII content..stopping.. The prompt contains PII entities and cannot be processed ``` --------- Co-authored-by: Piyush Jain <piyushjain@duck.com> Co-authored-by: Anjan Biswas <anjanavb@amazon.com> Co-authored-by: Jha <nikjha@amazon.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Lance Martin	4339d21cf1	Code LLaMA in code understanding use case (#9779 ) Update Code Understanding use case doc w/ Code-llama.	1 year ago
Lance Martin	2ab04a4e32	Update agent docs, move to use-case sub-directory (#9344 ) Re-structure and add new agent page	1 year ago
Lance Martin	985873c497	Update RAG use case (move to ntbk) (#9340 )	1 year ago
Harrison Chase	709a67d9bf	multivector notebook (#9740 )	1 year ago
Fabrizio Ruocco	cacaf487c3	Azure Cognitive Search - update sdk b8, mod user agent, search with scores (#9191 ) Description: Update Azure Cognitive Search SDK to version b8 (breaking change) Customizable User Agent. Implemented Similarity search with scores @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Margaret Qian	30151c99c7	Update Mosaic endpoint input/output api (#7391 ) As noted in prior PRs (https://github.com/hwchase17/langchain/pull/6060, https://github.com/hwchase17/langchain/pull/7348), the input/output format has changed a few times as we've stabilized our inference API. This PR updates the API to the latest stable version as indicated in our docs: https://docs.mosaicml.com/en/latest/inference.html The input format looks like this: `{"inputs": [<prompt>]} ` The output format looks like this: ` {"outputs": [<output_text>]} ` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Harrison Chase	ade482c17e	add twitter chat loader doc (#9737 )	1 year ago
Leonid Kuligin	87da56fb1e	Added a pdf parser based on DocAI (#9579 ) #9578 --------- Co-authored-by: Leonid Kuligin <kuligin@google.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	1 year ago
Tudor Golubenco	dc30edf51c	Xata as a chat message memory store (#9719 ) This adds Xata as a memory store also to the python version of LangChain, similar to the [one for LangChain.js](https://github.com/hwchase17/langchainjs/pull/2217). I have added a Jupyter Notebook with a simple and a more complex example using an agent. To run the integration test, you need to execute something like: ``` XATA_API_KEY='xau_...' XATA_DB_URL="https://demo-uni3q8.eu-west-1.xata.sh/db/langchain" poetry run pytest tests/integration_tests/memory/test_xata.py ``` Where `langchain` is the database you create in Xata.	1 year ago
William FH	dff00ea91e	Chat Loaders (#9708 ) Still working out interface/notebooks + need discord data dump to test out things other than copy+paste Update: - Going to remove the 'user_id' arg in the loaders themselves and just standardize on putting the "sender" arg in the extra kwargs. Then can provide a utility function to map these to ai and human messages - Going to move the discord one into just a notebook since I don't have a good dump to test on and copy+paste maybe isn't the greatest thing to support in v0 - Need to do more testing on slack since it seems the dump only includes channels and NOT 1 on 1 convos - --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Bagatur	22b6549a34	sort api classes (#9710 )	1 year ago
Tomaz Bratanic	dacf96895a	Add the option to use separate LLMs for GraphCypherQA chain (#9689 ) The Graph Chains are different in the way that it uses two LLMChains instead of one like the retrievalQA chains. Therefore, sometimes you want to use different LLM to generate the database query and to generate the final answer. This feature would make it more convenient to use different LLMs in the same chain. I have also renamed the Graph DB QA Chain to Neo4j DB QA Chain in the documentation only as it is used only for Neo4j. The naming was ambigious as it was the first graphQA chain added and wasn't sure how do you want to spin it.	1 year ago

1 2 3 4 5 ...

1962 Commits (f7f3c025855e89aac8849b8d90b0e58018a0c78e)