langchain

Commit Graph

Author	SHA1	Message	Date
Tudor Golubenco	171b0b183b	Pre-release Xata version no longer required (#9915 ) Tiny PR: Since we've released version 1.0.0 of the python SDK, we no longer need to specify the pre-release version when pip installing.	1 year ago
Mike Nitsenko	c80e406e95	Cube semantic loader: allow cubes processing (#9927 ) We've started to receive feedback (after launch) that using only views is confusing. We're considering this as a good practice, as a view serves as a "facade" for your data - however, we decided to let users decide this on their own. Solves the questions from: - https://github.com/cube-js/cube/issues/7028 - https://github.com/langchain-ai/langchain/pull/9690	1 year ago
Nikhil Suresh	dd10cf945c	fixed minor linting issues	1 year ago
LiaoKong	8f8455b24d	fix a link name format to the dependents document	1 year ago
adilkhan	bbae8cb88f	Added runtime argument	1 year ago
Ofer Mendelevitch	4454204455	reformat black	1 year ago
Ofer Mendelevitch	318a21e267	fixed typo in spelling	1 year ago
hughcrt	e71f4760db	Change multiline comment width	1 year ago
Ofer Mendelevitch	a5450be32e	fixed lint	1 year ago
Ofer Mendelevitch	8b8d2a6535	fixed similarity_search_with_score to really use a score; updated unit test with a test for score threshold; updated demo notebook	1 year ago
Ofer Mendelevitch	1b6947e56c	Merge branch 'langchain-ai:master' into master	1 year ago
hughcrt	7979cef06a	Replace `\|` by `Union`	1 year ago
Nikhil Suresh	23ef836b48	matches colon and any number of white spaces after colon	1 year ago
Ikko Eltociear Ashimine	766bbd6c6b	Fix typo in code_understanding.ipynb seperate -> separate	1 year ago
Nikhil Suresh	64eb5a6082	removed unnecessary white space in regex that breaks qa with sources chain	1 year ago
Nikhil Suresh	8a4670e127	updated formatting changes	1 year ago
Nikhil Suresh	b1f649bca5	fixed issue with white space and added unit tests	1 year ago
Nikhil Suresh	6d3485e798	fixed regex to match sources for all cases, also includes source	1 year ago
tongtie	82a3c2a557	docs: Fix the syntax error, replace "dotenv.load_env()" with "dotenv.load_dotenv()".	1 year ago
Mazhar (Taha) Mumbaiwala	e80834d783	docs: Fix spelling mistakes in Etherscan.ipynb (#9845 )	1 year ago
Philippe PRADOS	7fdb7439e0	Update google drive notebooks (#9851 ) Update google drive doc loader and retriever notebooks. Show how to use with langchain-googledrive package. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Xiaobing Mi	5d47833ae1	Fix typo in web_scraping.ipynb (#9835 )	1 year ago
Leonid Ganeline	b1bffea9c7	docs: fix for title of `llm_caching` nb (#9891 ) Fixed title for the `extras/integrations/llms/llm_caching.ipynb`. Existing title breaks the sorted order of items in the navbar. Updated some formatting.	1 year ago
Leonid Ganeline	e01b00aa54	docs: `ainetwork` update (#9871 ) * Added links to the AI Network * Made title consistent to other tool kits * Added `integrations/providers/` integration card page * No changes in the example code!	1 year ago
Predrag Gruevski	47499c6db4	Avoid `type: ignore` suppression by adding mypy type hint. (#9881 ) Mypy was not able to determine a good type for `type_to_loader_dict`, since the values in the dict are functions whose return types are related to each other in a complex way. One can see this by adding a line like `reveal_type(type_to_loader_dict)` and running mypy, which will get mypy to show what type it has inferred for that value. Adding an explicit type hint to help out mypy avoids the need for a mypy suppression and allows the code to type-check cleanly.	1 year ago
maks-operlejn-ds	f327535eda	Add conftest file to langchain experimental (#9886 ) In order to use `requires` marker in langchain-experimental, there's a need for conftest.py file inside. Everything is identical to the main langchain module. Co-authored-by: maks-operlejn-ds <maks.operlejn@gmail.com>	1 year ago
Leonid Ganeline	cf122b6269	docs: `Infino` example fix (#9888 ) - Fixed a broken link in the `integrations/providers/infino.mdx` - Fixed a title in the `integration/collbacks/infino.ipynb` example - Updated text format in this example.	1 year ago
Piyush Jain	fe1b9ee6b8	Updated notebook for comprehend moderation (#9875 ) ### Description Updated the notebook for comprehend moderation. cc @baskaryan	1 year ago
William FH	907c57e324	Add collect_runs callback (#9885 )	1 year ago
William FH	3103f07e03	Use existing required args obj if specified (#9883 ) We always overwrote the required args but we infer them by default. Doing it only the old way makes it so the llm guesses even if an arg is optional (e.g., for uuids)	1 year ago
William FH	b14d74dd4d	iMessage loader (#9832 ) Add an iMessage chat loader	1 year ago
Lance Martin	8393ba9dab	Add instructions for GGUF (#9874 ) llama.cpp migrated to GGUF model format, and new releases (e.g., [here](https://huggingface.co/TheBloke)) now use GGUF.	1 year ago
Predrag Gruevski	eb3d1fa93c	Add security warning to experimental `SQLDatabaseChain` class. (#9867 ) The most reliable way to not have a chain run an undesirable SQL command is to not give it database permissions to run that command. That way the database itself performs the rule enforcement, so it's much easier to configure and use properly than anything we could add in ourselves.	1 year ago
hughcrt	3a4d4c940c	Change video width	1 year ago
hughcrt	97741d41c5	Add LLMonitorCallbackHandler	1 year ago
eryk-dsai	7f5713b80a	feat: grammar-based sampling in llama-cpp (#9712 ) ## Description This PR enables [grammar-based sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars) in the llama-cpp LLM. In short, loading a file with a formal grammar definition will constrain model outputs. For instance, one can force the model to generate valid JSON or to generate only Python lists. In the follow-up PR we will add: * docs with some description of why it is cool and how it works * maybe a code sample for some task, such as in the llama repo --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
William FH	cb642ef658	Return feedback (#9629 ) Return the feedback values in an eval run result Also made a helper method to display as a dataframe but it may be overkill	1 year ago
Bagatur	5e2d0cf54e	bump 275 (#9860 )	1 year ago
Predrag Gruevski	9aaa0fdce0	Use unified Python setup steps for release workflow.	1 year ago
Leonid Kuligin	00baddf34c	fixed enterprise search returning an empty array	1 year ago
XUEYANZ	f97d3a76e7	Update CONTRIBUTING.md (#9817 ) Hi LangChain :) Thank you for such a great project! I was going through the CONTRIBUTING.md and found a few minor issues.	1 year ago
Eugene Yurtsev	5edf819524	Qdrant Client: Expose instance for creating client (#9706 ) Expose classmethods to conveniently initialize the vectorstore. The purpose of this PR is to make it easy for users to initialize an empty vectorstore that's properly pre-configured without having to index documents into it via `from_documents`. This will make it easier for users to rely on the following indexing code: https://github.com/langchain-ai/langchain/pull/9614 to help manage data in the qdrant vectorstore.	1 year ago
Harrison Chase	610f46d83a	accept openai terms (#9826 )	1 year ago
Harrison Chase	c1badc1fa2	add gmail loader (#9810 )	1 year ago
Bagatur	0d01cede03	bump 274 (#9805 )	1 year ago
Vikas Sheoran	63921e327d	docs: Fix a spelling mistake in adding_memory.ipynb (#9794 ) # Description This pull request fixes a small spelling mistake found while reading docs.	1 year ago
Rosário P. Fernandes	aab01b55db	typo: funtions --> functions (#9784 ) Minor typo in the extractions use-case	1 year ago
Nikhil Suresh	0da5803f5a	fixed regex to match sources for all cases, also includes source (#9775 ) - Description: Updated the regex to handle all the different cases for string matching (SOURCES, sources, Sources), - Issue: https://github.com/langchain-ai/langchain/issues/9774 - Dependencies: N/A	1 year ago
Sam Partee	a28eea5767	Redis metadata filtering and specification, index customization (#8612 )	1 year ago

### Description

The previous Redis implementation did not allow for the user to specify the index configuration (i.e. changing the underlying algorithm) or add additional metadata to use for querying (i.e. hybrid or "filtered" search).

This PR introduces the ability to specify custom index attributes and metadata attributes as well as use that metadata in filtered queries. Overall, more structure was introduced to the Redis implementation that should allow for easier maintainability moving forward.

# New Features

The following features are now available with the Redis integration into Langchain.

## Index schema generation

The schema for the index will now be automatically generated if not specified by the user. For example, the data above has multiple metadata categories. The following example

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

embeddings = OpenAIEmbeddings()

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users"
)
```

Loading the data in through this and the other ``from_documents`` and ``from_texts`` methods will now generate an index schema in Redis like the following.

View the index schema with the ``redisvl`` tool. [link](redisvl.com)

```bash
$ rvl index info -i users
```

Index Information:

| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|---------------|-----------------|------------|
| users | HASH | ['doc:users'] | [] | 0 |

Index Fields:

| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |

### Custom Metadata specification

The metadata schema generation has the following rules:

1. All text fields are indexed as text fields.
2. All numeric fields are indexed as numeric fields.

If you would like to have a text field as a tag field, you can specify overrides like the following for the example data

```python
# this can also be a path to a yaml file
index_schema = {
    "text": [{"name": "user"}, {"name": "job"}],
    "tag": [{"name": "credit_score"}],
    "numeric": [{"name": "age"}],
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users2",        # matches the index shown below
    index_schema=index_schema,  # pass the overrides defined above
)
```

This will change the index specification to

Index Information:

| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users2 | HASH | ['doc:users2'] | [] | 0 |

Index Fields:

| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |

and throw a warning to the user (log output) that the generated schema does not match the specified schema.

```text
index_schema does not match generated schema from metadata.
index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
```

As long as this is on purpose, this is fine.

The schema can be defined as a yaml file or a dictionary

```yaml
text:
  - name: user
  - name: job
tag:
  - name: credit_score
numeric:
  - name: age
```

and you pass in a path like

```python
from pathlib import Path

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    index_schema=Path("sample1.yml").resolve()
)
```

which will create the same schema as defined in the dictionary example

Index Information:

| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users3 | HASH | ['doc:users3'] | [] | 0 |

Index Fields:

| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |

### Custom Vector Indexing Schema

Users with large use cases may want to change how they formulate the vector index created by Langchain.

To utilize all the features of Redis for vector database use cases like this, you can now do the following to pass in index attribute modifiers like changing the indexing algorithm to HNSW.

```python
vector_schema = {
    "algorithm": "HNSW"
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    vector_schema=vector_schema
)
```

A more complex example may look like

```python
vector_schema = {
    "algorithm": "HNSW",
    "ef_construction": 200,
    "ef_runtime": 20
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    vector_schema=vector_schema
)
```

All names correspond to the arguments you would set if using Redis-py or RedisVL. (put in doc link later)

### Better Querying

Both vector queries and Range (limit) queries are now available and metadata is returned by default. The outputs are shown.

```python
>>> query = "foo"
>>> results = rds.similarity_search(query, k=1)
>>> print(results)
[Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})]

>>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False)
>>> print(results) # no metadata, but with scores
[(Document(page_content='foo', metadata={}), 7.15255737305e-07)]

>>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001)
>>> print(len(results)) # range query (only above threshold even if k is higher)
4
```

### Custom metadata filtering

A big advantage of Redis in this space is being able to do filtering on data stored alongside the vector itself. With the example above, the following is now possible in langchain. The equivalence operators are overridden to describe a new expression language that mimics that of [redisvl](redisvl.com). This allows for arbitrarily long sequences of filters that resemble SQL commands that can be used directly with vector queries and range queries.

There are two interfaces by which to do so and both are shown.

```python
>>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText

>>> age_filter = RedisFilter.num("age") > 18
>>> age_filter = RedisNum("age") > 18 # equivalent
>>> results = rds.similarity_search(query, filter=age_filter)
>>> print(len(results))
3

>>> job_filter = RedisFilter.text("job") == "engineer"
>>> job_filter = RedisText("job") == "engineer" # equivalent
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2

# fuzzy match text search
>>> job_filter = RedisFilter.text("job") % "eng*"
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2

# combined filters (AND)
>>> combined = age_filter & job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
1

# combined filters (OR)
>>> combined = age_filter | job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
4
```

All the above filter results can be checked against the data above.

### Other

- Issue: #3967
- Dependencies: No added dependencies
- Tag maintainer: @hwchase17 @baskaryan @rlancemartin
- Twitter handle: @sampartee

---------

Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>

Anish Shah	fa0b8f3368	fix broken wandb link in debugging page (#9771 ) - Description: Fix broken hyperlink in debugging page	1 year ago
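
Several commits above (23ef836b48, 64eb5a6082, 6d3485e798, 0da5803f5a) describe a regex that matches the sources marker in any letter case (SOURCES, sources, Sources), followed by a colon and any amount of whitespace. The sketch below illustrates that idea only; the pattern and the `split_answer_and_sources` helper are hypothetical and are not taken from the LangChain code.

```python
import re
from typing import Tuple

# Hypothetical pattern (an assumption, not the one from the commits):
# "source" or "sources" in any letter case, then a colon and any whitespace.
SOURCES_PATTERN = re.compile(r"\bsources?:\s*", re.IGNORECASE)


def split_answer_and_sources(text: str) -> Tuple[str, str]:
    """Split a reply into (answer, sources) at the first sources marker."""
    match = SOURCES_PATTERN.search(text)
    if match is None:
        # No marker found: treat the whole reply as the answer.
        return text.strip(), ""
    answer = text[: match.start()].strip()
    sources = text[match.end():].strip()
    return answer, sources
```

The `\s*` after the colon is what absorbs stray whitespace, and `re.IGNORECASE` covers the three spellings listed in 0da5803f5a.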

4341 Commits (9bcfd58580c52dd49eecc7503934d13c3dc89ec5)
