langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Zizhong Zhang	641b71e2cd	refactor: rename to OpaquePrompts (#10013 ) Renamed to OpaquePrompts cc @baskaryan Thanks in advance!	2023-08-31 12:21:24 -07:00
Bagatur	8d66b00c73	Data anonymizer notebook nit (#10062 )	2023-08-31 10:58:13 -07:00
Bagatur	3efab8d3df	implement vectorstores by tencent vectordb (#9989 ) Hi there！ I'm excited to open this PR to add support for using 'Tencent Cloud VectorDB' as a vector store. Tencent Cloud VectorDB is a fully-managed, self-developed, enterprise-level distributed database service designed for storing, retrieving, and analyzing multi-dimensional vector data. The database supports multiple index types and similarity calculation methods, with a single index supporting vector scales up to 1 billion and capable of handling millions of QPS with millisecond-level query latency. Tencent Cloud VectorDB not only provides external knowledge bases for large models to improve their accuracy, but also has wide applications in AI fields such as recommendation systems, NLP services, computer vision, and intelligent customer service. The PR includes: Implementation of Vectorstore. I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below make format make lint make coverage make test	2023-08-31 00:48:25 -07:00
Bagatur	b1644bc9ad	cr	2023-08-31 00:43:34 -07:00
Cameron Vetter	e37d51cab6	fix scoring profile example (#10016 ) - Description: A change in the documentation example for Azure Cognitive Vector Search with Scoring Profile so the example works as written - Issue: #10015 - Dependencies: None - Tag maintainer: @baskaryan @ruoccofabrizio - Twitter handle: @poshporcupine	2023-08-31 00:35:06 -07:00
Hyeokjun seo	e2e05ad89e	Fix Typo : `openai_api_key` -> `serpapi_api_key` (#10020 ) Fixed typo in the comments Notebook. (which says `openai_api_key` for SerpAPI)	2023-08-31 00:33:13 -07:00
Tomaz Bratanic	f2e8399cc8	Fix link in Neo4j provider page (#10023 )	2023-08-31 00:32:42 -07:00
Bagatur	7fa82900cb	guides docs nits (#10005 )	2023-08-30 11:07:42 -07:00
Bagatur	2f03e71e67	rename local llm guide (#10004 )	2023-08-30 10:52:46 -07:00
Bagatur	781f274d19	make privacy guide section (#10003 )	2023-08-30 10:49:20 -07:00
maks-operlejn-ds	a8f804a618	Add data anonymizer (#9863 ) ### Description The feature for anonymizing data has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. Anonynization consists of two steps: 1. Identification: Identify all data fields that contain personally identifiable information (PII). 2. Replacement: Replace all PIIs with pseudo values or codes that do not reveal any personal information about the individual but can be used for reference. We're not using regular encryption, because the language model won't be able to understand the meaning or context of the encrypted data. We use Microsoft Presidio together with Faker framework for anonymization purposes because of the wide range of functionalities they provide. The full implementation is available in `PresidioAnonymizer`. ### Future works - deanonymization - add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. By doing this, we will retain anonymity in requests to, for example, OpenAI, and then be able restore the original data. - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-30 10:39:44 -07:00
Bagatur	98cce7dcd3	update moderation docs (#10002 )	2023-08-30 10:34:25 -07:00
Christophe Bornet	9870bfb9cd	Add bucket and object key to metadata in S3 loader (#9317 ) - Description: this PR adds `s3_object_key` and `s3_bucket` to the doc metadata when loading an S3 file. This is particularly useful when using `S3DirectoryLoader` to remove the files from the dir once they have been processed (getting the object keys from the metadata `source` field seems brittle) - Dependencies: N/A - Tag maintainer: ? - Twitter handle: _cbornet --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-30 11:03:24 -04:00
Guy Korland	24c0b01c38	Extend the FalkorDB QA demo (#9992 ) - Description: Extend the FalkorDB QA demo - Tag maintainer: @baskaryan	2023-08-30 10:13:18 -04:00
wlleiiwang	8c4e29240c	implement vectorstores by tencent vectordb	2023-08-30 16:40:58 +08:00
Leonid Ganeline	d03d6f6fd9	Merge branch 'master' into docs-tools-menu	2023-08-29 15:57:25 -07:00
Bagatur	8fb0a9594c	Add LLMonitor Callback Handler Integration - open-source observability & analytics (#9870 ) Adds support for [llmonitor](https://llmonitor.com) callbacks. It enables: - Requests tracking / logging / analytics - Error debugging - Cost analytics - User tracking Let me know if anythings neds to be changed for merge. Thank you!	2023-08-29 15:49:01 -07:00
leo-gan	8c1678a8c7	Updated titles, descriptions.	2023-08-29 15:42:28 -07:00
Bagatur	7bba1d911b	Fix typo in code_understanding.ipynb (#9899 ) seperate -> separate	2023-08-29 15:21:32 -07:00
Bagatur	2e65434568	docs: Fix the syntax error, replace "dotenv.load_env()" with "dotenv.… (#9900 ) Description: The documents incorrectly mentions "dotenv.load_env()", but it should actually be "dotenv.load_dotenv()". You can see the screenshot below for reference: python-dotenv: 1.0.0 ![image](https://github.com/langchain-ai/langchain/assets/2959046/94dc4b51-cc2f-412d-92e9-16b8ff0d513e)	2023-08-29 15:20:24 -07:00
Bagatur	b416f5c0c8	fix a link name format to the dependents document (#9928 )	2023-08-29 15:20:06 -07:00
Bagatur	8f199239b8	docs: `llms/google vertex AI` example update (#9960 ) Updated title, description, added sections.	2023-08-29 15:07:18 -07:00
Bagatur	2a03a0087d	docs: `memory` menu (#9947 ) The [Memory](https://python.langchain.com/docs/modules/memory/) menu is clogged with unnecessary wording. I've made it more concise by simplifying titles of the example notebooks. As results, menu is shorter and better for comprehend.	2023-08-29 15:06:11 -07:00
Bagatur	f7cc125cac	docs: `memory types` menu (#9949 ) The [Memory Types](https://python.langchain.com/docs/modules/memory/types/) menu is clogged with unnecessary wording. I've made it more concise by simplifying titles of the example notebooks. As results, menu is shorter and better for comprehend.	2023-08-29 15:05:23 -07:00
Bagatur	16eb935469	Fix for similarity_search_with_score (#9903 ) - Description: the implementation for similarity_search_with_score did not actually include a score or logic to filter. Now fixed. - Tag maintainer: @rlancemartin - Twitter handle: @ofermend	2023-08-29 15:04:48 -07:00
Fredrik Gullberg	f69d236a4a	docs: Fix spelling mistakes in apis.ipynb (#9911 ) - Description: Fix spelling mistakes in apis.ipynb - Issue: [#9910](https://github.com/langchain-ai/langchain/issues/9910) Co-authored-by: Fredrik Gullberg <fredrik.gullberg@klarna.com>	2023-08-29 14:53:00 -07:00
leo-gan	210de0c66b	Updated title, description, added sections	2023-08-29 14:31:33 -07:00
Cameron Hutchison	bcc3463ff4	docs: Azure AD Authentication for Azure OpenAI (#9951 ) # Description This PR adds additional documentation on how to use Azure Active Directory to authenticate to an OpenAI service within Azure. This method of authentication allows organizations with more complex security requirements to use Azure OpenAI. # Issue N/A # Dependencies N/A # Twitter https://twitter.com/CamAHutchison	2023-08-29 14:29:27 -07:00
Guy Korland	7cbe872af8	Add support for Falkordb (ex-RedisGraph) (#9821 ) Replace this entire comment with: - Description: Add support for Falkordb (ex-RedisGraph) - Tag maintainer: @hwchase17 - Twitter handle: @g_korland	2023-08-29 14:22:33 -07:00
Leonid Ganeline	393816e7bd	Merge branch 'master' into docs-memory-type-menu	2023-08-29 11:46:29 -07:00
Corvus Lee	0fb95ebe66	Docs: enrich SageMaker endpoint embeddings with docstrings and examples (#9924 ) Description: added comments to address the relationship between input/output transformations and the customised inference.py script.	2023-08-29 11:38:52 -07:00
leo-gan	d578efba35	updated notebook titles and text.	2023-08-29 11:25:53 -07:00
Leonid Ganeline	4b6e41a939	Merge branch 'master' into docs-memory-menu	2023-08-29 10:24:07 -07:00
Tomaz Bratanic	6092422e10	Add neo4j provider page (#9941 )	2023-08-29 10:09:51 -07:00
leo-gan	c906041aa8	updated notebook titles and text.	2023-08-29 09:58:26 -07:00
Tomaz Bratanic	db13fba7ea	Add neo4j vector support (#9770 ) Neo4j has added vector index integration just recently. To allow both ingestion and integrating it as vector RAG applications, I wrapped it as a vector store as the implementation is completely different from `GraphCypherQAChain`. Here, we are not generating any Cypher statements at query time, we are simply doing the vector similarity search using the new vector index as if we were dealing with a vector database. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-29 07:54:20 -07:00
Tudor Golubenco	171b0b183b	Pre-release Xata version no longer required (#9915 ) Tiny PR: Since we've released version 1.0.0 of the python SDK, we no longer need to specify the pre-release version when pip installing.	2023-08-29 07:21:22 -07:00
Mike Nitsenko	c80e406e95	Cube semantic loader: allow cubes processing (#9927 ) We've started to receive feedback (after launch) that using only views is confusing. We're considering this as a good practice, as a view serves as a "facade" for your data - however, we decided to let users decide this on their own. Solves the questions from: - https://github.com/cube-js/cube/issues/7028 - https://github.com/langchain-ai/langchain/pull/9690	2023-08-29 07:21:01 -07:00
LiaoKong	8f8455b24d	fix a link name format to the dependents document	2023-08-29 21:55:05 +08:00
Ofer Mendelevitch	8b8d2a6535	fixed similarity_search_with_score to really use a score updated unit test with a test for score threshold Updated demo notebook	2023-08-28 22:26:55 -07:00
Ikko Eltociear Ashimine	766bbd6c6b	Fix typo in code_understanding.ipynb seperate -> separate	2023-08-29 12:57:19 +09:00
tongtie	82a3c2a557	docs: Fix the syntax error, replace "dotenv.load_env()" with "dotenv.load_dotenv()".	2023-08-29 11:52:50 +08:00
Mazhar (Taha) Mumbaiwala	e80834d783	docs: Fix spelling mistakes in Etherscan.ipynb (#9845 )	2023-08-28 19:30:00 -07:00
Philippe PRADOS	7fdb7439e0	Update google drive notebooks (#9851 ) Update google drive doc loader and retriever notebooks. Show how to use with langchain-googledrive package. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-28 19:29:35 -07:00
Xiaobing Mi	5d47833ae1	Fix typo in web_scraping.ipynb (#9835 )	2023-08-28 19:26:23 -07:00
Leonid Ganeline	b1bffea9c7	docs: fix for title of `llm_caching` nb (#9891 ) Fixed title for the `extras/integrations/llms/llm_caching.ipynb`. Existing title breaks the sorted order of items in the navbar. Updated some formatting.	2023-08-28 18:34:04 -07:00
Leonid Ganeline	e01b00aa54	docs: `ainetwork` update (#9871 ) * Added links to the AI Network * Made title consistent to other tool kits * Added `integrations/providers/` integration card page * No changes in the example code!	2023-08-28 18:16:22 -07:00
Leonid Ganeline	cf122b6269	docs: `Infino` example fix (#9888 ) - Fixed a broken link in the `integrations/providers/infino.mdx` - Fixed a title in the `integration/collbacks/infino.ipynb` example - Updated text format in this example.	2023-08-28 17:42:11 -07:00
William FH	b14d74dd4d	iMessage loader (#9832 ) Add an iMessage chat loader	2023-08-28 13:43:59 -07:00
Lance Martin	8393ba9dab	Add instructions for GGUF (#9874 ) llama.cpp migrated to GGUF model format, and new releases (e.g., [here](https://huggingface.co/TheBloke)) now use GGUF.	2023-08-28 12:56:46 -07:00
hughcrt	3a4d4c940c	Change video width	2023-08-28 19:26:33 +02:00
hughcrt	97741d41c5	Add LLMonitorCallbackHandler	2023-08-28 19:24:50 +02:00
eryk-dsai	7f5713b80a	feat: grammar-based sampling in llama-cpp (#9712 ) ## Description The following PR enables the [grammar-based sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars) in llama-cpp LLM. In short, loading file with formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or generate only python lists. In the follow-up PR we will add: * docs with some description why it is cool and how it works * maybe some code sample for some task such as in llama repo --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-28 09:52:55 -07:00
Harrison Chase	c1badc1fa2	add gmail loader (#9810 )	2023-08-27 17:18:09 -07:00
Vikas Sheoran	63921e327d	docs: Fix a spelling mistake in adding_memory.ipynb (#9794 ) # Description This pull request fixes a small spelling mistake found while reading docs.	2023-08-26 12:04:43 -07:00
Rosário P. Fernandes	aab01b55db	typo: funtions --> functions (#9784 ) Minor typo in the extractions use-case	2023-08-26 11:47:47 -07:00
Sam Partee	a28eea5767	Redis metadata filtering and specification, index customization (#8612 ) ### Description The previous Redis implementation did not allow for the user to specify the index configuration (i.e. changing the underlying algorithm) or add additional metadata to use for querying (i.e. hybrid or "filtered" search). This PR introduces the ability to specify custom index attributes and metadata attributes as well as use that metadata in filtered queries. Overall, more structure was introduced to the Redis implementation that should allow for easier maintainability moving forward. # New Features The following features are now available with the Redis integration into Langchain ## Index schema generation The schema for the index will now be automatically generated if not specified by the user. For example, the data above has the multiple metadata categories. The the following example ```python from langchain.embeddings import OpenAIEmbeddings from langchain.vectorstores.redis import Redis embeddings = OpenAIEmbeddings() rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` Loading the data in through this and the other ``from_documents`` and ``from_texts`` methods will now generate index schema in Redis like the following. view index schema with the ``redisvl`` tool. [link](redisvl.com) ```bash $ rvl index info -i users ``` Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|---------------\|-----------------\|------------\| \| users \| HASH \| ['doc:users'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Metadata specification The metadata schema generation has the following rules 1. All text fields are indexed as text fields. 2. All numeric fields are index as numeric fields. If you would like to have a text field as a tag field, users can specify overrides like the following for the example data ```python # this can also be a path to a yaml file index_schema = { "text": [{"name": "user"}, {"name": "job"}], "tag": [{"name": "credit_score"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` This will change the index specification to Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users2 \| HASH \| ['doc:users2'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| and throw a warning to the user (log output) that the generated schema does not match the specified schema. ```text index_schema does not match generated schema from metadata. index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} ``` As long as this is on purpose, this is fine. The schema can be defined as a yaml file or a dictionary ```yaml text: - name: user - name: job tag: - name: credit_score numeric: - name: age ``` and you pass in a path like ```python rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", index_schema=Path("sample1.yml").resolve() ) ``` Which will create the same schema as defined in the dictionary example Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users3 \| HASH \| ['doc:users3'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Vector Indexing Schema Users with large use cases may want to change how they formulate the vector index created by Langchain To utilize all the features of Redis for vector database use cases like this, you can now do the following to pass in index attribute modifiers like changing the indexing algorithm to HNSW. ```python vector_schema = { "algorithm": "HNSW" } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` A more complex example may look like ```python vector_schema = { "algorithm": "HNSW", "ef_construction": 200, "ef_runtime": 20 } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` All names correspond to the arguments you would set if using Redis-py or RedisVL. (put in doc link later) ### Better Querying Both vector queries and Range (limit) queries are now available and metadata is returned by default. The outputs are shown. ```python >>> query = "foo" >>> results = rds.similarity_search(query, k=1) >>> print(results) [Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})] >>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False) >>> print(results) # no metadata, but with scores [(Document(page_content='foo', metadata={}), 7.15255737305e-07)] >>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001) >>> print(len(results)) # range query (only above threshold even if k is higher) 4 ``` ### Custom metadata filtering A big advantage of Redis in this space is being able to do filtering on data stored alongside the vector itself. With the example above, the following is now possible in langchain. The equivalence operators are overridden to describe a new expression language that mimic that of [redisvl](redisvl.com). This allows for arbitrarily long sequences of filters that resemble SQL commands that can be used directly with vector queries and range queries. There are two interfaces by which to do so and both are shown. ```python >>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText >>> age_filter = RedisFilter.num("age") > 18 >>> age_filter = RedisNum("age") > 18 # equivalent >>> results = rds.similarity_search(query, filter=age_filter) >>> print(len(results)) 3 >>> job_filter = RedisFilter.text("job") == "engineer" >>> job_filter = RedisText("job") == "engineer" # equivalent >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # fuzzy match text search >>> job_filter = RedisFilter.text("job") % "eng*" >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # combined filters (AND) >>> combined = age_filter & job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 1 # combined filters (OR) >>> combined = age_filter \| job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 4 ``` All the above filter results can be checked against the data above. ### Other - Issue: #3967 - Dependencies: No added dependencies - Tag maintainer: @hwchase17 @baskaryan @rlancemartin - Twitter handle: @sampartee --------- Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-25 17:22:50 -07:00
Anish Shah	fa0b8f3368	fix broken wandb link in debugging page (#9771 ) - Description: Fix broken hyperlink in debugging page	2023-08-25 15:34:08 -07:00
Lance Martin	4339d21cf1	Code LLaMA in code understanding use case (#9779 ) Update Code Understanding use case doc w/ Code-llama.	2023-08-25 14:24:38 -07:00
Lance Martin	2ab04a4e32	Update agent docs, move to use-case sub-directory (#9344 ) Re-structure and add new agent page	2023-08-25 11:28:55 -07:00
Lance Martin	985873c497	Update RAG use case (move to ntbk) (#9340 )	2023-08-25 11:27:27 -07:00
Harrison Chase	709a67d9bf	multivector notebook (#9740 )	2023-08-25 07:07:27 -07:00
Fabrizio Ruocco	cacaf487c3	Azure Cognitive Search - update sdk b8, mod user agent, search with scores (#9191 ) Description: Update Azure Cognitive Search SDK to version b8 (breaking change) Customizable User Agent. Implemented Similarity search with scores @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-25 02:34:09 -07:00
Margaret Qian	30151c99c7	Update Mosaic endpoint input/output api (#7391 ) As noted in prior PRs (https://github.com/hwchase17/langchain/pull/6060, https://github.com/hwchase17/langchain/pull/7348), the input/output format has changed a few times as we've stabilized our inference API. This PR updates the API to the latest stable version as indicated in our docs: https://docs.mosaicml.com/en/latest/inference.html The input format looks like this: `{"inputs": [<prompt>]} ` The output format looks like this: ` {"outputs": [<output_text>]} ` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-24 22:13:17 -07:00
Harrison Chase	ade482c17e	add twitter chat loader doc (#9737 )	2023-08-24 21:55:22 -07:00
Leonid Kuligin	87da56fb1e	Added a pdf parser based on DocAI (#9579 ) #9578 --------- Co-authored-by: Leonid Kuligin <kuligin@google.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-24 21:44:49 -07:00
Tudor Golubenco	dc30edf51c	Xata as a chat message memory store (#9719 ) This adds Xata as a memory store also to the python version of LangChain, similar to the [one for LangChain.js](https://github.com/hwchase17/langchainjs/pull/2217). I have added a Jupyter Notebook with a simple and a more complex example using an agent. To run the integration test, you need to execute something like: ``` XATA_API_KEY='xau_...' XATA_DB_URL="https://demo-uni3q8.eu-west-1.xata.sh/db/langchain" poetry run pytest tests/integration_tests/memory/test_xata.py ``` Where `langchain` is the database you create in Xata.	2023-08-24 17:37:46 -07:00
William FH	dff00ea91e	Chat Loaders (#9708 ) Still working out interface/notebooks + need discord data dump to test out things other than copy+paste Update: - Going to remove the 'user_id' arg in the loaders themselves and just standardize on putting the "sender" arg in the extra kwargs. Then can provide a utility function to map these to ai and human messages - Going to move the discord one into just a notebook since I don't have a good dump to test on and copy+paste maybe isn't the greatest thing to support in v0 - Need to do more testing on slack since it seems the dump only includes channels and NOT 1 on 1 convos - --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-24 17:23:27 -07:00
Tomaz Bratanic	dacf96895a	Add the option to use separate LLMs for GraphCypherQA chain (#9689 ) The Graph Chains are different in the way that it uses two LLMChains instead of one like the retrievalQA chains. Therefore, sometimes you want to use different LLM to generate the database query and to generate the final answer. This feature would make it more convenient to use different LLMs in the same chain. I have also renamed the Graph DB QA Chain to Neo4j DB QA Chain in the documentation only as it is used only for Neo4j. The naming was ambigious as it was the first graphQA chain added and wasn't sure how do you want to spin it.	2023-08-24 11:50:38 -07:00
Lance Martin	c37be7f5fb	Add Code LLaMA to code QA use case (#9713 ) Use [Ollama integration](https://ollama.ai/blog/run-code-llama-locally).	2023-08-24 11:03:35 -07:00
Patrick Loeber	6bedfdf25a	Fix docs for AssemblyAIAudioTranscriptLoader (shorter import path) (#9687 ) Uses the shorter import path `from langchain.document_loaders import` instead of the full path `from langchain.document_loaders.assemblyai` Applies those changes to the docs and the unit test. See #9667 that adds this new loader.	2023-08-24 07:24:53 -07:00
了空	7cf5c582d2	Added a link to the dependencies document (#9703 )	2023-08-24 07:23:48 -07:00
Harrison Chase	9963b32e59	Harrison/multi vector (#9700 )	2023-08-24 06:42:42 -07:00
Leonid Ganeline	b048236c1a	📖 docs: `integrations/agent_toolkits` (#9333 ) Note: There are no changes in the file names! - The group name on the main navbar changed: `Agent toolkits` -> `Agents & Toolkits`. Examples here are the mix of the Agent and Toolkit examples because Agents and Toolkits in examples are always used together. - Titles changed: removed "Agent" and "Toolkit" suffixes. The reason is the same. - Formatting: mostly cleaning the header structure, so it could be better on the right-side navbar. Main navbar is looking much cleaner now.	2023-08-23 23:17:47 -07:00
Patrick Loeber	5990651070	Add new document_loader: AssemblyAIAudioTranscriptLoader (#9667 ) This PR adds a new document loader `AssemblyAIAudioTranscriptLoader` that allows to transcribe audio files with the [AssemblyAI API](https://www.assemblyai.com) and loads the transcribed text into documents. - Add new document_loader with class `AssemblyAIAudioTranscriptLoader` - Add optional dependency `assemblyai` - Add unit tests (using a Mock client) - Add docs notebook This is the equivalent to the JS integration already available in LangChain.js. See the [LangChain JS docs AssemblyAI page](https://js.langchain.com/docs/modules/data_connection/document_loaders/integrations/web_loaders/assemblyai_audio_transcription). At its simplest, you can use the loader to get a transcript back from an audio file like this: ```python from langchain.document_loaders.assemblyai import AssemblyAIAudioTranscriptLoader loader = AssemblyAIAudioTranscriptLoader(file_path="./testfile.mp3") docs = loader.load() ``` To use it, it needs the `assemblyai` python package installed, and the environment variable `ASSEMBLYAI_API_KEY` set with your API key. Alternatively, the API key can also be passed as an argument. Twitter handles to shout out if so kindly 🙇 [@AssemblyAI](https://twitter.com/AssemblyAI) and [@patloeber](https://twitter.com/patloeber) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-23 22:51:19 -07:00
seamusp	25f2c82ae8	docs:misc fixes (#9671 ) Improve internal consistency in LangChain documentation - Change occurrences of eg and eg. to e.g. - Fix headers containing unnecessary capital letters. - Change instances of "few shot" to "few-shot". - Add periods to end of sentences where missing. - Minor spelling and grammar fixes.	2023-08-23 22:36:54 -07:00
Eugene Yurtsev	b88dfcb42a	Add indexing support (#9614 ) This PR introduces a persistence layer to help with indexing workflows into vectostores. The indexing code helps users to: 1. Avoid writing duplicated content into the vectostore 2. Avoid over-writing content if it's unchanged Importantly, this keeps on working even if the content being written is derived via a set of transformations from some source content (e.g., indexing children documents that were derived from parent documents by chunking.) The two main components are: 1. Persistence layer that keeps track of which keys were updated and when. Keeping track of the timestamp of updates, allows to clean up old content safely, and with minimal complexity. 2. HashedDocument which is used to hash the contents (including metadata) of the documents. We rely on the hashes for identifying duplicates. The indexing code works with ANY document loader. To add transformations to the documents, users for now can add a custom document loader that composes an existing loader together with document transformers. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-23 21:41:38 -04:00
Lakshay Kansal	a8c916955f	Updates to Nomic Atlas and GPT4All documentation (#9414 ) Description: Updates for Nomic AI Atlas and GPT4All integrations documentation. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-23 17:49:44 -07:00
Keras Conv3d	cbaea8d63b	tair fix distance_type error, and add hybrid search (#9531 ) - fix: distance_type error, - feature: Tair add hybrid search --------- Co-authored-by: thw <hanwen.thw@alibaba-inc.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-23 16:38:31 -07:00
Jacob Lee	278ef0bdcf	Adds ChatOllama (#9628 ) @rlancemartin --------- Co-authored-by: Adilkhan Sarsen <54854336+adolkhan@users.noreply.github.com> Co-authored-by: Kim Minjong <make.dirty.code@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-23 13:02:26 -07:00
Bagatur	80dd162e0d	mv embedding cache docs (#9664 )	2023-08-23 11:46:04 -07:00
Bagatur	a40c12bb88	Update the nlpcloud connector after some changes on the NLP Cloud API (#9586 ) - Description: remove some text generation deprecated parameters and update the embeddings doc, - Tag maintainer: @rlancemartin	2023-08-23 11:35:08 -07:00
Bagatur	d8e2dd4c89	mv	2023-08-23 11:30:44 -07:00
Bagatur	e2e582f1f6	Fixed source key name for docugami loader (#8598 ) The Docugami loader was not returning the source metadata key. This was triggering this exception when used with retrievers, per https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/schema/prompt_template.py#L193C1-L195C41 The fix is simple and just updates the metadata key name for the document each chunk is sourced from, from "name" to "source" as expected. I tested by running the python notebook that has an end to end scenario in it. Tagging DataLoader maintainers @rlancemartin @eyurtsev	2023-08-23 11:24:55 -07:00
Zizhong Zhang	8a03836160	docs: fix PromptGuard docs (#9659 ) Fix PromptGuard docs. Noticed several trivial issues on the docs when integrating the new class. cc @baskaryan	2023-08-23 10:04:53 -07:00
Yong woo Song	f0ae10a20e	Fix typo in tigris (#9637 ) The link has a typo in [tigirs docs](https://python.langchain.com/docs/integrations/providers/tigris), so I couldn't access it. So, I have corrected it. Thanks! ☺️	2023-08-23 07:15:18 -07:00
Junlin Zhou	5b9bdcac1b	docs: fix link url (#9643 ) This pull request corrects the URL links in the Async API documentation to align with the updated project layout. The links had not been updated despite the changes in layout.	2023-08-23 07:05:02 -07:00
Joseph McElroy	2a06e7b216	ElasticsearchStore: improve error logging for adding documents (#9648 ) Not obvious what the error is when you cannot index. This pr adds the ability to log the first errors reason, to help the user diagnose the issue. Also added some more documentation for when you want to use the vectorstore with an embedding model deployed in elasticsearch. Credit: @elastic and @phoey1	2023-08-23 07:04:09 -07:00
Julien Salinas	f1072cc31f	Merge branch 'master' into master	2023-08-23 14:42:40 +02:00
Leonid Ganeline	e1f4f9ac3e	docs: `integrations/providers` (#9631 ) Added missed pages for `integrations/providers` from `vectorstores`. Updated several `vectorstores` notebooks.	2023-08-22 20:28:11 -07:00
anifort	900c1f3e8d	Add support for structured data sources with google enterprise search (#9037 ) <!-- Thank you for contributing to LangChain! Replace this comment with: - Description: Added the capability to handles structured data from google enterprise search, - Issue: Retriever failed when underline search engine was integrated with structured data, - Dependencies: google-api-core - Tag maintainer: @jarokaz - Twitter handle: anifort Please make sure you're PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally. If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. Maintainer responsibilities: - General / Misc / if you don't know who to tag: @baskaryan - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev - Models / Prompts: @hwchase17, @baskaryan - Memory: @hwchase17 - Agents / Tools / Toolkits: @hinthornw - Tracing / Callbacks: @agola11 - Async: @agola11 If no one reviews your PR within a few days, feel free to @-mention the same people again. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> --------- Co-authored-by: Christos Aniftos <aniftos@google.com> Co-authored-by: Holt Skinner <13262395+holtskinner@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-22 23:18:10 -04:00
Jacob Lee	632a83c48e	Update ChatOpenAI docs with fine-tuning example (#9632 )	2023-08-22 16:56:53 -07:00
Adilkhan Sarsen	f29312eb84	Fixing deeplake.mdx file as it uses outdates links (#9602 ) deeplake.mdx was using old links and was not working properly, in the PR we fix the issue.	2023-08-22 15:12:24 -07:00
klae01	b868ef23bc	Add AINetwork blockchain toolkit integration (#9527 ) # Description This PR introduces a new toolkit for interacting with the AINetwork blockchain. The toolkit provides a set of tools for performing various operations on the AINetwork blockchain, such as transferring AIN, reading and writing values to the blockchain database, managing apps, setting rules and owners. # Dependencies [ain-py](https://github.com/ainblockchain/ain-py) >= 1.0.2 # Misc The example notebook (langchain/docs/extras/integrations/toolkits/ainetwork.ipynb) is in the PR --------- Co-authored-by: kriii <kriii@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-22 08:03:33 -07:00
Vanessa Arndorfer	1ea2f9adf4	Document AzureML Deployment Example (#9571 ) Description: Link an example of deploying a Langchain app to an AzureML online endpoint to the deployments documentation page. Co-authored-by: Vanessa Arndorfer <vaarndor@microsoft.com>	2023-08-22 07:36:47 -07:00
toddkim95	fba29f203a	Add to support polars (#9610 ) ### Description Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust. Polars is faster to read than pandas, so I'm looking forward to seeing it added to the document loader. ### Dependencies polars (https://pola-rs.github.io/polars-book/user-guide/) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-22 07:36:24 -07:00
Julien Salinas	4d0b7bb8e1	Remove Dolphin and GPT-J from the embeddings docs. These models are not proposed anymore.	2023-08-22 09:28:22 +02:00
Jacob Lee	0fea987dd2	Add missing param to parent document retriever notebook (#9569 )	2023-08-21 15:02:12 -07:00
Zizhong Zhang	00eff8c4a7	feat: Add PromptGuard integration (#9481 ) Add PromptGuard integration ------- There are two approaches to integrate PromptGuard with a LangChain application. 1. PromptGuardLLMWrapper 2. functions that can be used in LangChain expression. ----- - Dependencies `promptguard` python package, which is a runtime requirement if you'd try out the demo. - @baskaryan @hwchase17 Thanks for the ideas and suggestions along the development process. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-21 14:59:36 -07:00
Oleksandr Ichenskyi	8bc1a3dca8	docs: Add memgraph notebook (#9448 ) - Description: added graph_memgraph_qa.ipynb which shows how to use LLMs to provide a natural language interface to a Memgraph database using [MemgraphGraph](https://github.com/langchain-ai/langchain/pull/8591) class. - Dependencies: given that the notebook utilizes the MemgraphGraph class, it relies on both this class and several Python packages that are installed in the notebook using pip (langchain, openai, neo4j, gqlalchemy). The notebook is dependent on having a functional Memgraph instance running, as it requires this instance to establish a connection.	2023-08-21 13:45:04 -07:00

1 2 3 4 5 ...

672 Commits