langchain

Commit Graph

Author	SHA1	Message	Date
Bagatur	7fa82900cb	guides docs nits (#10005 )	1 year ago
Bagatur	2f03e71e67	rename local llm guide (#10004 )	1 year ago
Bagatur	781f274d19	make privacy guide section (#10003 )	1 year ago
maks-operlejn-ds	a8f804a618	Add data anonymizer (#9863 ) ### Description The feature for anonymizing data has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. Anonymization consists of two steps: 1. Identification: Identify all data fields that contain personally identifiable information (PII). 2. Replacement: Replace all PIIs with pseudo values or codes that do not reveal any personal information about the individual but can be used for reference. We're not using regular encryption, because the language model won't be able to understand the meaning or context of the encrypted data. We use Microsoft Presidio together with the Faker framework for anonymization purposes because of the wide range of functionalities they provide. The full implementation is available in `PresidioAnonymizer`. ### Future works - deanonymization - add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. By doing this, we will retain anonymity in requests to, for example, OpenAI, and then be able to restore the original data. - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Bagatur	98cce7dcd3	update moderation docs (#10002 )	1 year ago
Christophe Bornet	9870bfb9cd	Add bucket and object key to metadata in S3 loader (#9317 ) - Description: this PR adds `s3_object_key` and `s3_bucket` to the doc metadata when loading an S3 file. This is particularly useful when using `S3DirectoryLoader` to remove the files from the dir once they have been processed (getting the object keys from the metadata `source` field seems brittle) - Dependencies: N/A - Tag maintainer: ? - Twitter handle: _cbornet --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	1 year ago
Guy Korland	24c0b01c38	Extend the FalkorDB QA demo (#9992 ) - Description: Extend the FalkorDB QA demo - Tag maintainer: @baskaryan	1 year ago
Leonid Ganeline	d03d6f6fd9	Merge branch 'master' into docs-tools-menu	1 year ago
Bagatur	8fb0a9594c	Add LLMonitor Callback Handler Integration - open-source observability & analytics (#9870 ) Adds support for [llmonitor](https://llmonitor.com) callbacks. It enables: - Requests tracking / logging / analytics - Error debugging - Cost analytics - User tracking Let me know if anything needs to be changed for merge. Thank you!	1 year ago
leo-gan	8c1678a8c7	Updated titles, descriptions.	1 year ago
Bagatur	7bba1d911b	Fix typo in code_understanding.ipynb (#9899 ) seperate -> separate	1 year ago
Bagatur	2e65434568	docs: Fix the syntax error, replace "dotenv.load_env()" with "dotenv.… (#9900 ) Description: The documentation incorrectly mentions "dotenv.load_env()", but it should actually be "dotenv.load_dotenv()". You can see the screenshot below for reference: python-dotenv: 1.0.0 ![image](https://github.com/langchain-ai/langchain/assets/2959046/94dc4b51-cc2f-412d-92e9-16b8ff0d513e)	1 year ago
Bagatur	b416f5c0c8	fix a link name format to the dependents document (#9928 )	1 year ago
Bagatur	8f199239b8	docs: `llms/google vertex AI` example update (#9960 ) Updated title, description, added sections.	1 year ago
Bagatur	2a03a0087d	docs: `memory` menu (#9947 ) The [Memory](https://python.langchain.com/docs/modules/memory/) menu is clogged with unnecessary wording. I've made it more concise by simplifying the titles of the example notebooks. As a result, the menu is shorter and easier to comprehend.	1 year ago
Bagatur	f7cc125cac	docs: `memory types` menu (#9949 ) The [Memory Types](https://python.langchain.com/docs/modules/memory/types/) menu is clogged with unnecessary wording. I've made it more concise by simplifying the titles of the example notebooks. As a result, the menu is shorter and easier to comprehend.	1 year ago
Bagatur	16eb935469	Fix for similarity_search_with_score (#9903 ) - Description: the implementation for similarity_search_with_score did not actually include a score or logic to filter. Now fixed. - Tag maintainer: @rlancemartin - Twitter handle: @ofermend	1 year ago
Fredrik Gullberg	f69d236a4a	docs: Fix spelling mistakes in apis.ipynb (#9911 ) - Description: Fix spelling mistakes in apis.ipynb - Issue: [#9910](https://github.com/langchain-ai/langchain/issues/9910) Co-authored-by: Fredrik Gullberg <fredrik.gullberg@klarna.com>	1 year ago
leo-gan	210de0c66b	Updated title, description, added sections	1 year ago
Cameron Hutchison	bcc3463ff4	docs: Azure AD Authentication for Azure OpenAI (#9951 ) # Description This PR adds additional documentation on how to use Azure Active Directory to authenticate to an OpenAI service within Azure. This method of authentication allows organizations with more complex security requirements to use Azure OpenAI. # Issue N/A # Dependencies N/A # Twitter https://twitter.com/CamAHutchison	1 year ago
Guy Korland	7cbe872af8	Add support for Falkordb (ex-RedisGraph) (#9821 ) Replace this entire comment with: - Description: Add support for Falkordb (ex-RedisGraph) - Tag maintainer: @hwchase17 - Twitter handle: @g_korland	1 year ago
Leonid Ganeline	393816e7bd	Merge branch 'master' into docs-memory-type-menu	1 year ago
Corvus Lee	0fb95ebe66	Docs: enrich SageMaker endpoint embeddings with docstrings and examples (#9924 ) Description: added comments to address the relationship between input/output transformations and the customised inference.py script.	1 year ago
leo-gan	d578efba35	updated notebook titles and text.	1 year ago
Leonid Ganeline	4b6e41a939	Merge branch 'master' into docs-memory-menu	1 year ago
Tomaz Bratanic	6092422e10	Add neo4j provider page (#9941 )	1 year ago
leo-gan	c906041aa8	updated notebook titles and text.	1 year ago
Tomaz Bratanic	db13fba7ea	Add neo4j vector support (#9770 ) Neo4j has added vector index integration just recently. To allow both ingestion and integrating it as vector RAG applications, I wrapped it as a vector store as the implementation is completely different from `GraphCypherQAChain`. Here, we are not generating any Cypher statements at query time, we are simply doing the vector similarity search using the new vector index as if we were dealing with a vector database. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Tudor Golubenco	171b0b183b	Pre-release Xata version no longer required (#9915 ) Tiny PR: Since we've released version 1.0.0 of the python SDK, we no longer need to specify the pre-release version when pip installing.	1 year ago
Mike Nitsenko	c80e406e95	Cube semantic loader: allow cubes processing (#9927 ) We've started to receive feedback (after launch) that using only views is confusing. We're considering this as a good practice, as a view serves as a "facade" for your data - however, we decided to let users decide this on their own. Solves the questions from: - https://github.com/cube-js/cube/issues/7028 - https://github.com/langchain-ai/langchain/pull/9690	1 year ago
LiaoKong	8f8455b24d	fix a link name format to the dependents document	1 year ago
Ofer Mendelevitch	8b8d2a6535	fixed similarity_search_with_score to really use a score; updated unit test with a test for score threshold; updated demo notebook	1 year ago
Ikko Eltociear Ashimine	766bbd6c6b	Fix typo in code_understanding.ipynb seperate -> separate	1 year ago
tongtie	82a3c2a557	docs: Fix the syntax error, replace "dotenv.load_env()" with "dotenv.load_dotenv()".	1 year ago
Mazhar (Taha) Mumbaiwala	e80834d783	docs: Fix spelling mistakes in Etherscan.ipynb (#9845 )	1 year ago
Philippe PRADOS	7fdb7439e0	Update google drive notebooks (#9851 ) Update google drive doc loader and retriever notebooks. Show how to use with langchain-googledrive package. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Xiaobing Mi	5d47833ae1	Fix typo in web_scraping.ipynb (#9835 )	1 year ago
Leonid Ganeline	b1bffea9c7	docs: fix for title of `llm_caching` nb (#9891 ) Fixed title for the `extras/integrations/llms/llm_caching.ipynb`. Existing title breaks the sorted order of items in the navbar. Updated some formatting.	1 year ago
Leonid Ganeline	e01b00aa54	docs: `ainetwork` update (#9871 ) * Added links to the AI Network * Made the title consistent with other toolkits * Added `integrations/providers/` integration card page * No changes in the example code!	1 year ago
Leonid Ganeline	cf122b6269	docs: `Infino` example fix (#9888 ) - Fixed a broken link in the `integrations/providers/infino.mdx` - Fixed a title in the `integration/collbacks/infino.ipynb` example - Updated text format in this example.	1 year ago
William FH	b14d74dd4d	iMessage loader (#9832 ) Add an iMessage chat loader	1 year ago
Lance Martin	8393ba9dab	Add instructions for GGUF (#9874 ) llama.cpp migrated to GGUF model format, and new releases (e.g., [here](https://huggingface.co/TheBloke)) now use GGUF.	1 year ago
hughcrt	3a4d4c940c	Change video width	1 year ago
hughcrt	97741d41c5	Add LLMonitorCallbackHandler	1 year ago
eryk-dsai	7f5713b80a	feat: grammar-based sampling in llama-cpp (#9712 ) ## Description The following PR enables the [grammar-based sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars) in llama-cpp LLM. In short, loading a file with a formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or generate only python lists. In the follow-up PR we will add: * docs with some description why it is cool and how it works * maybe some code sample for some task such as in llama repo --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Harrison Chase	c1badc1fa2	add gmail loader (#9810 )	1 year ago
Vikas Sheoran	63921e327d	docs: Fix a spelling mistake in adding_memory.ipynb (#9794 ) # Description This pull request fixes a small spelling mistake found while reading docs.	1 year ago
Rosário P. Fernandes	aab01b55db	typo: funtions --> functions (#9784 ) Minor typo in the extractions use-case	1 year ago
Sam Partee	a28eea5767	Redis metadata filtering and specification, index customization (#8612 ) ### Description The previous Redis implementation did not allow for the user to specify the index configuration (i.e. changing the underlying algorithm) or add additional metadata to use for querying (i.e. hybrid or "filtered" search). This PR introduces the ability to specify custom index attributes and metadata attributes as well as use that metadata in filtered queries. Overall, more structure was introduced to the Redis implementation that should allow for easier maintainability moving forward. # New Features The following features are now available with the Redis integration into Langchain ## Index schema generation The schema for the index will now be automatically generated if not specified by the user. For example, the data above has multiple metadata categories. The following example ```python from langchain.embeddings import OpenAIEmbeddings from langchain.vectorstores.redis import Redis embeddings = OpenAIEmbeddings() rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` Loading the data in through this and the other ``from_documents`` and ``from_texts`` methods will now generate an index schema in Redis like the following. View the index schema with the ``redisvl`` tool. [link](redisvl.com) ```bash $ rvl index info -i users ``` Index Information: | Index Name | Storage Type | Prefixes | Index Options | Indexing | |--------------|----------------|---------------|-----------------|------------| | users | HASH | ['doc:users'] | [] | 0 | Index Fields: | Name | Attribute | Type | Field Option | Option Value | |----------------|----------------|---------|----------------|----------------| | user | user | TEXT | WEIGHT | 1 | | job | job | TEXT | WEIGHT | 1 | | credit_score | credit_score | TEXT | WEIGHT | 1 | | content | content | TEXT | WEIGHT | 1 | | age | age | NUMERIC | | | | content_vector | content_vector | VECTOR | | | ### Custom Metadata specification The metadata schema generation has the following rules 1. All text fields are indexed as text fields. 2. All numeric fields are indexed as numeric fields. To index a text field as a tag field, users can specify overrides like the following for the example data ```python # this can also be a path to a yaml file index_schema = { "text": [{"name": "user"}, {"name": "job"}], "tag": [{"name": "credit_score"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users2", index_schema=index_schema ) ``` This will change the index specification to Index Information: | Index Name | Storage Type | Prefixes | Index Options | Indexing | |--------------|----------------|----------------|-----------------|------------| | users2 | HASH | ['doc:users2'] | [] | 0 | Index Fields: | Name | Attribute | Type | Field Option | Option Value | |----------------|----------------|---------|----------------|----------------| | user | user | TEXT | WEIGHT | 1 | | job | job | TEXT | WEIGHT | 1 | | content | content | TEXT | WEIGHT | 1 | | credit_score | credit_score | TAG | SEPARATOR | , | | age | age | NUMERIC | | | | content_vector | content_vector | VECTOR | | | and throw a warning to the user (log output) that the generated schema does not match the specified schema. ```text index_schema does not match generated schema from metadata. index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} ``` As long as this is on purpose, this is fine. The schema can be defined as a yaml file or a dictionary ```yaml text: - name: user - name: job tag: - name: credit_score numeric: - name: age ``` and you pass in a path like ```python rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", index_schema=Path("sample1.yml").resolve() ) ``` Which will create the same schema as defined in the dictionary example Index Information: | Index Name | Storage Type | Prefixes | Index Options | Indexing | |--------------|----------------|----------------|-----------------|------------| | users3 | HASH | ['doc:users3'] | [] | 0 | Index Fields: | Name | Attribute | Type | Field Option | Option Value | |----------------|----------------|---------|----------------|----------------| | user | user | TEXT | WEIGHT | 1 | | job | job | TEXT | WEIGHT | 1 | | content | content | TEXT | WEIGHT | 1 | | credit_score | credit_score | TAG | SEPARATOR | , | | age | age | NUMERIC | | | | content_vector | content_vector | VECTOR | | | ### Custom Vector Indexing Schema Users with large use cases may want to change how they formulate the vector index created by Langchain. To utilize all the features of Redis for vector database use cases like this, you can now do the following to pass in index attribute modifiers like changing the indexing algorithm to HNSW. ```python vector_schema = { "algorithm": "HNSW" } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` A more complex example may look like ```python vector_schema = { "algorithm": "HNSW", "ef_construction": 200, "ef_runtime": 20 } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` All names correspond to the arguments you would set if using Redis-py or RedisVL. (put in doc link later) ### Better Querying Both vector queries and Range (limit) queries are now available and metadata is returned by default. The outputs are shown. ```python >>> query = "foo" >>> results = rds.similarity_search(query, k=1) >>> print(results) [Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})] >>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False) >>> print(results) # no metadata, but with scores [(Document(page_content='foo', metadata={}), 7.15255737305e-07)] >>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001) >>> print(len(results)) # range query (only above threshold even if k is higher) 4 ``` ### Custom metadata filtering A big advantage of Redis in this space is being able to do filtering on data stored alongside the vector itself. With the example above, the following is now possible in langchain. The equivalence operators are overridden to describe a new expression language that mimics that of [redisvl](redisvl.com). This allows for arbitrarily long sequences of filters that resemble SQL commands that can be used directly with vector queries and range queries. There are two interfaces by which to do so and both are shown. ```python >>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText >>> age_filter = RedisFilter.num("age") > 18 >>> age_filter = RedisNum("age") > 18 # equivalent >>> results = rds.similarity_search(query, filter=age_filter) >>> print(len(results)) 3 >>> job_filter = RedisFilter.text("job") == "engineer" >>> job_filter = RedisText("job") == "engineer" # equivalent >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # fuzzy match text search >>> job_filter = RedisFilter.text("job") % "eng*" >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # combined filters (AND) >>> combined = age_filter & job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 1 # combined filters (OR) >>> combined = age_filter | job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 4 ``` All the above filter results can be checked against the data above. ### Other - Issue: #3967 - Dependencies: No added dependencies - Tag maintainer: @hwchase17 @baskaryan @rlancemartin - Twitter handle: @sampartee --------- Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Anish Shah	fa0b8f3368	fix broken wandb link in debugging page (#9771 ) - Description: Fix broken hyperlink in debugging page	1 year ago
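
The data anonymizer commit (a8f804a618) describes a two-step flow, identification then replacement, and lists deanonymization and instance-level handling as future work. The sketch below illustrates that flow using only the Python standard library. It is not `PresidioAnonymizer` and does not use Presidio or Faker: the regex patterns, the `<LABEL_n>` placeholder format, and the function names `anonymize` and `deanonymize` are all assumptions made for illustration. Unlike the behavior the commit describes, this sketch gives repeated occurrences of the same value the same placeholder.

```python
import re
from itertools import count

# Hypothetical, deliberately simple detectors; a real system would rely on a
# dedicated PII recognizer rather than two regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text):
    """Replace detected PII with placeholders; return (masked_text, mapping)."""
    mapping = {}  # placeholder -> original value, used to reverse the masking
    seen = {}     # original value -> placeholder, so repeats share one placeholder
    counters = {label: count(1) for label in PII_PATTERNS}

    def make_replacer(label):
        def replace(match):
            original = match.group(0)
            if original not in seen:
                placeholder = f"<{label}_{next(counters[label])}>"
                seen[original] = placeholder
                mapping[placeholder] = original
            return seen[original]
        return replace

    # Step 1 (identification) and step 2 (replacement) happen together in sub().
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(make_replacer(label), text)
    return text, mapping


def deanonymize(text, mapping):
    """Restore the original values from the placeholder mapping."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


if __name__ == "__main__":
    raw = "Mail john@example.com or call +1 555-123-4567; john@example.com is preferred."
    masked, mapping = anonymize(raw)
    print(masked)
    print(deanonymize(masked, mapping) == raw)
```

Keeping the placeholder-to-value mapping outside the masked text is what would make an `anonymize -> LLMChain -> deanonymize` round trip possible: only the masked text is sent to the external API, and the mapping stays local.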


614 Commits (e805f8e26373b24431401f02ce1a4654cb2d2078)