langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-06 03:20:49 +00:00

Author	SHA1	Message	Date
Nikhil Suresh	56a0165a4e	cleaned up unit test example	2023-08-29 23:37:54 +00:00
Nikhil Suresh	b31475c622	minor updates to regex	2023-08-29 23:13:31 +00:00
Nikhil Suresh	dd10cf945c	fixed minor linting issues	2023-08-29 14:15:59 +00:00
Nikhil Suresh	23ef836b48	matches colon and any number of white spaces after colon	2023-08-29 04:18:33 +00:00
Nikhil Suresh	64eb5a6082	removed unnecessary white space in regex that breaks qa with sources chain	2023-08-29 03:54:38 +00:00
Nikhil Suresh	8a4670e127	updated formatting changes	2023-08-29 03:54:38 +00:00
Nikhil Suresh	b1f649bca5	fixed issue with white space and added unit tests	2023-08-29 03:54:38 +00:00
Nikhil Suresh	6d3485e798	fixed regex to match sources for all cases, also includes source	2023-08-29 03:54:25 +00:00
Mazhar (Taha) Mumbaiwala	e80834d783	docs: Fix spelling mistakes in Etherscan.ipynb (#9845 )	2023-08-28 19:30:00 -07:00
Philippe PRADOS	7fdb7439e0	Update google drive notebooks (#9851 ) Update google drive doc loader and retriever notebooks. Show how to use with langchain-googledrive package. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-28 19:29:35 -07:00
Xiaobing Mi	5d47833ae1	Fix typo in web_scraping.ipynb (#9835 )	2023-08-28 19:26:23 -07:00
Leonid Ganeline	b1bffea9c7	docs: fix for title of `llm_caching` nb (#9891 ) Fixed title for the `extras/integrations/llms/llm_caching.ipynb`. Existing title breaks the sorted order of items in the navbar. Updated some formatting.	2023-08-28 18:34:04 -07:00
Leonid Ganeline	e01b00aa54	docs: `ainetwork` update (#9871 ) * Added links to the AI Network * Made title consistent to other tool kits * Added `integrations/providers/` integration card page * No changes in the example code!	2023-08-28 18:16:22 -07:00
Predrag Gruevski	47499c6db4	Avoid `type: ignore` suppression by adding mypy type hint. (#9881 ) Mypy was not able to determine a good type for `type_to_loader_dict`, since the values in the dict are functions whose return types are related to each other in a complex way. One can see this by adding a line like `reveal_type(type_to_loader_dict)` and running mypy, which will get mypy to show what type it has inferred for that value. Adding an explicit type hint to help out mypy avoids the need for a mypy suppression and allows the code to type-check cleanly.	2023-08-28 17:53:33 -07:00
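A minimal sketch of the idea in the commit above: give a dict of loader functions an explicit annotation so the type checker does not have to infer one. Only the name `type_to_loader_dict` comes from the commit message; the loader functions and the chosen value type `Callable[[str], object]` are assumptions made for illustration, not the annotation used in the actual change.

```python
from typing import Callable, Dict


def load_int(raw: str) -> int:
    """Hypothetical loader: parse an integer."""
    return int(raw)


def load_words(raw: str) -> list:
    """Hypothetical loader: split text into whitespace-separated words."""
    return raw.split()


# Explicit annotation: every value takes a str and returns some object.
# Stating the intended type up front means no `# type: ignore` is needed
# on this assignment.
type_to_loader_dict: Dict[str, Callable[[str], object]] = {
    "int": load_int,
    "words": load_words,
}


def load(kind: str, raw: str) -> object:
    """Look up the loader registered for `kind` and apply it to `raw`."""
    return type_to_loader_dict[kind](raw)
```

Adding `reveal_type(type_to_loader_dict)` after the assignment and running mypy, as the commit message suggests, shows which type the checker is working with.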
maks-operlejn-ds	f327535eda	Add conftest file to langchain experimental (#9886 ) In order to use `requires` marker in langchain-experimental, there's a need for conftest.py file inside. Everything is identical to the main langchain module. Co-authored-by: maks-operlejn-ds <maks.operlejn@gmail.com>	2023-08-28 17:52:16 -07:00
Leonid Ganeline	cf122b6269	docs: `Infino` example fix (#9888 ) - Fixed a broken link in the `integrations/providers/infino.mdx` - Fixed a title in the `integration/callbacks/infino.ipynb` example - Updated text format in this example.	2023-08-28 17:42:11 -07:00
Piyush Jain	fe1b9ee6b8	Updated notebook for comprehend moderation (#9875 ) ### Description Updated the notebook for comprehend moderation. cc @baskaryan	2023-08-28 16:01:43 -07:00
William FH	907c57e324	Add collect_runs callback (#9885 )	2023-08-28 15:30:41 -07:00
William FH	3103f07e03	Use existing required args obj if specified (#9883 ) We always overwrote the required args but we infer them by default. Doing it only the old way makes it so the llm guesses even if an arg is optional (e.g., for uuids)	2023-08-28 14:40:22 -07:00
William FH	b14d74dd4d	iMessage loader (#9832 ) Add an iMessage chat loader	2023-08-28 13:43:59 -07:00
Lance Martin	8393ba9dab	Add instructions for GGUF (#9874 ) llama.cpp migrated to GGUF model format, and new releases (e.g., [here](https://huggingface.co/TheBloke)) now use GGUF.	2023-08-28 12:56:46 -07:00
Predrag Gruevski	eb3d1fa93c	Add security warning to experimental `SQLDatabaseChain` class. (#9867 ) The most reliable way to not have a chain run an undesirable SQL command is to not give it database permissions to run that command. That way the database itself performs the rule enforcement, so it's much easier to configure and use properly than anything we could add in ourselves.	2023-08-28 13:53:27 -04:00
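The point about letting the database enforce the rules can be sketched with the standard-library `sqlite3` module. This is only an illustration under the assumption of a SQLite file; it does not use `SQLDatabaseChain`, and the helper names `open_read_only` and `try_statement` are hypothetical. The connection is opened read-only, so a destructive statement is rejected by the database itself rather than by application-side filtering.

```python
import sqlite3


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file in read-only mode via a URI."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def try_statement(conn: sqlite3.Connection, sql: str) -> str:
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        # Writes on a read-only connection fail here, inside the database layer.
        return f"rejected: {exc}"
```

With a server database the equivalent step would be connecting as a role that has only the privileges the chain needs.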
eryk-dsai	7f5713b80a	feat: grammar-based sampling in llama-cpp (#9712 ) ## Description The following PR enables the [grammar-based sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars) in llama-cpp LLM. In short, loading file with formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or generate only python lists. In the follow-up PR we will add: * docs with some description why it is cool and how it works * maybe some code sample for some task such as in llama repo --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-28 09:52:55 -07:00
William FH	cb642ef658	Return feedback (#9629 ) Return the feedback values in an eval run result Also made a helper method to display as a dataframe but it may be overkill	2023-08-28 09:15:05 -07:00
Bagatur	5e2d0cf54e	bump 275 (#9860 )	2023-08-28 07:27:07 -07:00
XUEYANZ	f97d3a76e7	Update CONTRIBUTING.md (#9817 ) Hi LangChain :) Thank you for such a great project! I was going through the CONTRIBUTING.md and found a few minor issues.	2023-08-28 09:38:34 -04:00
Eugene Yurtsev	5edf819524	Qdrant Client: Expose instance for creating client (#9706 ) Expose classmethods to conveniently initialize the vectorstore. The purpose of this PR is to make it easy for users to initialize an empty vectorstore that's properly pre-configured without having to index documents into it via `from_documents`. This will make it easier for users to rely on the following indexing code: https://github.com/langchain-ai/langchain/pull/9614 to help manage data in the qdrant vectorstore.	2023-08-28 09:30:59 -04:00
Harrison Chase	610f46d83a	accept openai terms (#9826 )	2023-08-27 17:18:24 -07:00
Harrison Chase	c1badc1fa2	add gmail loader (#9810 )	2023-08-27 17:18:09 -07:00
Bagatur	0d01cede03	bump 274 (#9805 )	2023-08-26 12:16:26 -07:00
Vikas Sheoran	63921e327d	docs: Fix a spelling mistake in adding_memory.ipynb (#9794 ) # Description This pull request fixes a small spelling mistake found while reading docs.	2023-08-26 12:04:43 -07:00
Rosário P. Fernandes	aab01b55db	typo: funtions --> functions (#9784 ) Minor typo in the extractions use-case	2023-08-26 11:47:47 -07:00
Nikhil Suresh	0da5803f5a	fixed regex to match sources for all cases, also includes source (#9775 ) - Description: Updated the regex to handle all the different cases for string matching (SOURCES, sources, Sources), - Issue: https://github.com/langchain-ai/langchain/issues/9774 - Dependencies: N/A	2023-08-25 18:10:33 -07:00
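The matching behaviour described in this commit and in the follow-ups listed earlier (any casing of "sources", a colon, then any amount of whitespace) can be sketched as below. The pattern and the helper `split_sources` are assumptions written for illustration; they are not the regex that was merged.

```python
import re
from typing import Tuple

# Hypothetical pattern: "sources" in any casing, a colon, then optional whitespace.
SOURCES_PATTERN = re.compile(r"\bsources:\s*", re.IGNORECASE)


def split_sources(text: str) -> Tuple[str, str]:
    """Split an answer into (answer, sources) at the first sources marker."""
    match = SOURCES_PATTERN.search(text)
    if match is None:
        # No marker found: the whole text is the answer.
        return text.strip(), ""
    answer = text[: match.start()].strip()
    sources = text[match.end():].strip()
    return answer, sources
```

For example, `split_sources("The answer.\nSOURCES: a.txt, b.txt")` separates the answer text from the source list regardless of how the marker is capitalized.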
Sam Partee	a28eea5767	Redis metadata filtering and specification, index customization (#8612 ) ### Description The previous Redis implementation did not allow for the user to specify the index configuration (i.e. changing the underlying algorithm) or add additional metadata to use for querying (i.e. hybrid or "filtered" search). This PR introduces the ability to specify custom index attributes and metadata attributes as well as use that metadata in filtered queries. Overall, more structure was introduced to the Redis implementation that should allow for easier maintainability moving forward. # New Features The following features are now available with the Redis integration into Langchain ## Index schema generation The schema for the index will now be automatically generated if not specified by the user. For example, the data above has multiple metadata categories. The following example ```python from langchain.embeddings import OpenAIEmbeddings from langchain.vectorstores.redis import Redis embeddings = OpenAIEmbeddings() rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` Loading the data in through this and the other ``from_documents`` and ``from_texts`` methods will now generate an index schema in Redis like the following. View the index schema with the ``redisvl`` tool. 
[link](redisvl.com) ```bash $ rvl index info -i users ``` Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|---------------\|-----------------\|------------\| \| users \| HASH \| ['doc:users'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Metadata specification The metadata schema generation has the following rules 1. All text fields are indexed as text fields. 2. All numeric fields are index as numeric fields. If you would like to have a text field as a tag field, users can specify overrides like the following for the example data ```python # this can also be a path to a yaml file index_schema = { "text": [{"name": "user"}, {"name": "job"}], "tag": [{"name": "credit_score"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users" ) ``` This will change the index specification to Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users2 \| HASH \| ['doc:users2'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| 
age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| and throw a warning to the user (log output) that the generated schema does not match the specified schema. ```text index_schema does not match generated schema from metadata. index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]} ``` As long as this is on purpose, this is fine. The schema can be defined as a yaml file or a dictionary ```yaml text: - name: user - name: job tag: - name: credit_score numeric: - name: age ``` and you pass in a path like ```python rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", index_schema=Path("sample1.yml").resolve() ) ``` Which will create the same schema as defined in the dictionary example Index Information: \| Index Name \| Storage Type \| Prefixes \| Index Options \| Indexing \| \|--------------\|----------------\|----------------\|-----------------\|------------\| \| users3 \| HASH \| ['doc:users3'] \| [] \| 0 \| Index Fields: \| Name \| Attribute \| Type \| Field Option \| Option Value \| \|----------------\|----------------\|---------\|----------------\|----------------\| \| user \| user \| TEXT \| WEIGHT \| 1 \| \| job \| job \| TEXT \| WEIGHT \| 1 \| \| content \| content \| TEXT \| WEIGHT \| 1 \| \| credit_score \| credit_score \| TAG \| SEPARATOR \| , \| \| age \| age \| NUMERIC \| \| \| \| content_vector \| content_vector \| VECTOR \| \| \| ### Custom Vector Indexing Schema Users with large use cases may want to change how they formulate the vector index created by Langchain To utilize all the features of Redis for vector database use cases like this, you can now do the following to pass in index attribute modifiers like changing the indexing algorithm to HNSW. 
```python vector_schema = { "algorithm": "HNSW" } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` A more complex example may look like ```python vector_schema = { "algorithm": "HNSW", "ef_construction": 200, "ef_runtime": 20 } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users3", vector_schema=vector_schema ) ``` All names correspond to the arguments you would set if using Redis-py or RedisVL. (put in doc link later) ### Better Querying Both vector queries and Range (limit) queries are now available and metadata is returned by default. The outputs are shown. ```python >>> query = "foo" >>> results = rds.similarity_search(query, k=1) >>> print(results) [Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})] >>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False) >>> print(results) # no metadata, but with scores [(Document(page_content='foo', metadata={}), 7.15255737305e-07)] >>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001) >>> print(len(results)) # range query (only above threshold even if k is higher) 4 ``` ### Custom metadata filtering A big advantage of Redis in this space is being able to do filtering on data stored alongside the vector itself. With the example above, the following is now possible in langchain. The equivalence operators are overridden to describe a new expression language that mimic that of [redisvl](redisvl.com). This allows for arbitrarily long sequences of filters that resemble SQL commands that can be used directly with vector queries and range queries. There are two interfaces by which to do so and both are shown. 
```python >>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText >>> age_filter = RedisFilter.num("age") > 18 >>> age_filter = RedisNum("age") > 18 # equivalent >>> results = rds.similarity_search(query, filter=age_filter) >>> print(len(results)) 3 >>> job_filter = RedisFilter.text("job") == "engineer" >>> job_filter = RedisText("job") == "engineer" # equivalent >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # fuzzy match text search >>> job_filter = RedisFilter.text("job") % "eng*" >>> results = rds.similarity_search(query, filter=job_filter) >>> print(len(results)) 2 # combined filters (AND) >>> combined = age_filter & job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 1 # combined filters (OR) >>> combined = age_filter \| job_filter >>> results = rds.similarity_search(query, filter=combined) >>> print(len(results)) 4 ``` All the above filter results can be checked against the data above. ### Other - Issue: #3967 - Dependencies: No added dependencies - Tag maintainer: @hwchase17 @baskaryan @rlancemartin - Twitter handle: @sampartee --------- Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-25 17:22:50 -07:00
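The filter interface described in this commit relies on overloaded comparison and bitwise operators. The sketch below shows that general technique only: the classes `NumField`, `TextField`, and `Expr` are hypothetical stand-ins, not the `RedisNum`, `RedisText`, or `RedisFilter` classes from the PR, and the rendered string format is an illustrative assumption rather than the query the library sends.

```python
class Expr:
    """A filter expression that can be combined with & (AND) and | (OR)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __and__(self, other: "Expr") -> "Expr":
        # AND: both sub-expressions side by side inside one group.
        return Expr(f"({self.text} {other.text})")

    def __or__(self, other: "Expr") -> "Expr":
        # OR: sub-expressions separated by a pipe inside one group.
        return Expr(f"({self.text} | {other.text})")

    def __str__(self) -> str:
        return self.text


class NumField:
    """Hypothetical numeric field; `>` builds an expression instead of a bool."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: float) -> Expr:
        return Expr(f"@{self.name}:[({value} +inf]")


class TextField:
    """Hypothetical text field; `==` builds an expression instead of a bool."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: object) -> Expr:  # type: ignore[override]
        return Expr(f'@{self.name}:("{value}")')


age_filter = NumField("age") > 18
job_filter = TextField("job") == "engineer"
combined = age_filter & job_filter
```

Because each operator returns a new `Expr`, filters compose to arbitrary depth, which is what allows the long SQL-like chains mentioned in the commit message.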
Anish Shah	fa0b8f3368	fix broken wandb link in debugging page (#9771 ) - Description: Fix broken hyperlink in debugging page	2023-08-25 15:34:08 -07:00
Monami Sharma	12a373810c	Fixing broken links to Moderation and Constitutional chain (#9768 ) - Description: Fixing broken links for Moderation and Constitutional chain - Issue: N/A - Twitter handle: MonamiSharma	2023-08-25 15:19:32 -07:00
nikhilkjha	d57d08fd01	Initial commit for comprehend moderator (#9665 ) This PR implements a custom chain that wraps Amazon Comprehend API calls. The custom chain is aimed to be used with LLM chains to provide moderation capability that let’s you detect and redact PII, Toxic and Intent content in the LLM prompt, or the LLM response. The implementation accepts a configuration object to control what checks will be performed on a LLM prompt and can be used in a variety of setups using the LangChain expression language to not only detect the configured info in chains, but also other constructs such as a retriever. The included sample notebook goes over the different configuration options and how to use it with other chains. ### Usage sample ```python from langchain_experimental.comprehend_moderation import BaseModerationActions, BaseModerationFilters moderation_config = { "filters":[ BaseModerationFilters.PII, BaseModerationFilters.TOXICITY, BaseModerationFilters.INTENT ], "pii":{ "action": BaseModerationActions.ALLOW, "threshold":0.5, "labels":["SSN"], "mask_character": "X" }, "toxicity":{ "action": BaseModerationActions.STOP, "threshold":0.5 }, "intent":{ "action": BaseModerationActions.STOP, "threshold":0.5 } } comp_moderation_with_config = AmazonComprehendModerationChain( moderation_config=moderation_config, #specify the configuration client=comprehend_client, #optionally pass the Boto3 Client verbose=True ) template = """Question: {question} Answer:""" prompt = PromptTemplate(template=template, input_variables=["question"]) responses = [ "Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like 323-22-9980. John Doe's phone number is (999)253-9876.", "Final Answer: This is a really shitty way of constructing a birdhouse. This is fucking insane to think that any birds would actually create their motherfucking nests here." 
] llm = FakeListLLM(responses=responses) llm_chain = LLMChain(prompt=prompt, llm=llm) chain = ( prompt \| comp_moderation_with_config \| {llm_chain.input_keys[0]: lambda x: x['output'] } \| llm_chain \| { "input": lambda x: x['text'] } \| comp_moderation_with_config ) response = chain.invoke({"question": "A sample SSN number looks like this 123-456-7890. Can you give me some more samples?"}) print(response['output']) ``` ### Output ``` > Entering new AmazonComprehendModerationChain chain... Running AmazonComprehendModerationChain... Running pii validation... Found PII content..stopping.. The prompt contains PII entities and cannot be processed ``` --------- Co-authored-by: Piyush Jain <piyushjain@duck.com> Co-authored-by: Anjan Biswas <anjanavb@amazon.com> Co-authored-by: Jha <nikjha@amazon.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-25 15:11:27 -07:00
Lance Martin	4339d21cf1	Code LLaMA in code understanding use case (#9779 ) Update Code Understanding use case doc w/ Code-llama.	2023-08-25 14:24:38 -07:00
William FH	1960ac8d25	token chunks (#9739 ) Co-authored-by: Andrew <abatutin@gmail.com>	2023-08-25 12:52:07 -07:00
Lance Martin	2ab04a4e32	Update agent docs, move to use-case sub-directory (#9344 ) Re-structure and add new agent page	2023-08-25 11:28:55 -07:00
Lance Martin	985873c497	Update RAG use case (move to ntbk) (#9340 )	2023-08-25 11:27:27 -07:00
Harrison Chase	709a67d9bf	multivector notebook (#9740 )	2023-08-25 07:07:27 -07:00
Bagatur	9731ce5a40	bump 273 (#9751 )	2023-08-25 03:05:04 -07:00
Fabrizio Ruocco	cacaf487c3	Azure Cognitive Search - update sdk b8, mod user agent, search with scores (#9191 ) Description: Update Azure Cognitive Search SDK to version b8 (breaking change) Customizable User Agent. Implemented Similarity search with scores @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-25 02:34:09 -07:00
Sergey Kozlov	135cb86215	Fix QuestionListOutputParser (#9738 ) This PR fixes `QuestionListOutputParser` text splitting. `QuestionListOutputParser` incorrectly splits numbered list text into lines. If text doesn't end with `\n` , the regex doesn't capture the last item. So it always returns `n - 1` items, and `WebResearchRetriever.llm_chain` generates less queries than requested in the search prompt. How to reproduce: ```python from langchain.retrievers.web_research import QuestionListOutputParser parser = QuestionListOutputParser() good = parser.parse( """1. This is line one. 2. This is line two. """ # <-- ! ) bad = parser.parse( """1. This is line one. 2. This is line two.""" # <-- No new line. ) assert good.lines == ['1. This is line one.\n', '2. This is line two.\n'], good.lines assert bad.lines == ['1. This is line one.\n', '2. This is line two.'], bad.lines ``` NOTE: Last item will not contain a line break but this seems ok because the items are stripped in the `WebResearchRetriever.clean_search_query()`.	2023-08-25 01:47:17 -07:00
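The failure mode described above can be reproduced with plain `re`, without importing the parser. Both patterns below are assumptions chosen to illustrate the bug and one possible fix; they are not quoted from the `QuestionListOutputParser` source.

```python
import re
from typing import List

# Assumed "before" pattern: every item must end with a newline.
STRICT_PATTERN = r"\d+\..*?\n"
# Assumed "after" pattern: an item may end with a newline or the end of the text.
LENIENT_PATTERN = r"\d+\..*?(?:\n|$)"


def split_numbered_list(text: str, pattern: str = LENIENT_PATTERN) -> List[str]:
    """Return the numbered items found in `text` using the given pattern."""
    return re.findall(pattern, text)


text_without_trailing_newline = "1. This is line one.\n2. This is line two."
strict_items = split_numbered_list(text_without_trailing_newline, STRICT_PATTERN)
lenient_items = split_numbered_list(text_without_trailing_newline)
```

With the strict pattern the final item is dropped when the text lacks a trailing newline, which matches the `n - 1` behaviour reported in the commit message.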
Jurik-001	d04fe0d3ea	remove Value error "pyspark is not installed. Please install it with `pip i… (#9723 ) Description: spark_sql cannot be run with pyspark versions prior to 3.4, because `pyspark.errors` was introduced in version 3.4. On an older version you get the error "pyspark is not installed. Please install it with pip install pyspark", which is not helpful. If pyspark is not installed at all, the error is already raised in init. I would return all errors, but if you have a different idea feel free to comment. Issue: None Dependencies: None Maintainer: --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-24 22:18:55 -07:00
Margaret Qian	30151c99c7	Update Mosaic endpoint input/output api (#7391 ) As noted in prior PRs (https://github.com/hwchase17/langchain/pull/6060, https://github.com/hwchase17/langchain/pull/7348), the input/output format has changed a few times as we've stabilized our inference API. This PR updates the API to the latest stable version as indicated in our docs: https://docs.mosaicml.com/en/latest/inference.html The input format looks like this: `{"inputs": [<prompt>]} ` The output format looks like this: ` {"outputs": [<output_text>]} ` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-24 22:13:17 -07:00
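The request and response shapes quoted in this commit can be sketched with the standard `json` module. The helper names `build_request_body` and `parse_response_body` are hypothetical; no endpoint URL, authentication, or HTTP call is shown, since none is given here.

```python
import json
from typing import List


def build_request_body(prompts: List[str]) -> str:
    """Serialize prompts into the {"inputs": [...]} shape from the commit."""
    return json.dumps({"inputs": prompts})


def parse_response_body(body: str) -> List[str]:
    """Extract generated texts from the {"outputs": [...]} shape."""
    data = json.loads(body)
    outputs = data.get("outputs")
    if not isinstance(outputs, list):
        # Fail loudly if the response does not follow the expected shape.
        raise ValueError("response has no 'outputs' list")
    return [str(item) for item in outputs]
```

Validating the shape before indexing into it gives a clear error if the format changes again, as the commit notes it has several times.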
Harrison Chase	ade482c17e	add twitter chat loader doc (#9737 )	2023-08-24 21:55:22 -07:00
Leonid Kuligin	87da56fb1e	Added a pdf parser based on DocAI (#9579 ) #9578 --------- Co-authored-by: Leonid Kuligin <kuligin@google.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-24 21:44:49 -07:00
Naama Magami	adb21782b8	Add del vector pgvector + adding modification time to confluence and google drive docs (#9604 ) Description: - adding implementation of delete for pgvector - adding modification time in docs metadata for confluence and google drive. Issue: https://github.com/langchain-ai/langchain/issues/9312 Tag maintainer: @baskaryan, @eyurtsev, @hwchase17, @rlancemartin. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-24 21:09:30 -07:00


4116 Commits