"""Test Redis cache functionality."""
Redis metadata filtering and specification, index customization (#8612)
### Description
The previous Redis implementation did not allow for the user to specify
the index configuration (i.e. changing the underlying algorithm) or add
additional metadata to use for querying (i.e. hybrid or "filtered"
search).
This PR introduces the ability to specify custom index attributes and
metadata attributes as well as use that metadata in filtered queries.
Overall, more structure was introduced to the Redis implementation that
should allow for easier maintainability moving forward.
# New Features
The following features are now available with the Redis integration in Langchain.
## Index schema generation
The schema for the index will now be generated automatically if not specified by the user. For example, the data above has multiple metadata categories. Consider the following example:
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

embeddings = OpenAIEmbeddings()

# texts and metadata refer to the example documents and metadata described above
rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users",
)
```
Loading the data in through this and the other ``from_documents`` and ``from_texts`` methods will now generate an index schema in Redis like the following, which can be viewed with the ``redisvl`` tool ([link](redisvl.com)):
```bash
$ rvl index info -i users
```
Index Information:

| Index Name | Storage Type | Prefixes      | Index Options | Indexing |
|------------|--------------|---------------|---------------|----------|
| users      | HASH         | ['doc:users'] | []            | 0        |

Index Fields:

| Name           | Attribute      | Type    | Field Option | Option Value |
|----------------|----------------|---------|--------------|--------------|
| user           | user           | TEXT    | WEIGHT       | 1            |
| job            | job            | TEXT    | WEIGHT       | 1            |
| credit_score   | credit_score   | TEXT    | WEIGHT       | 1            |
| content        | content        | TEXT    | WEIGHT       | 1            |
| age            | age            | NUMERIC |              |              |
| content_vector | content_vector | VECTOR  |              |              |
### Custom Metadata specification
The metadata schema generation follows these rules (illustrated in the sketch below):
1. All text fields are indexed as text fields.
2. All numeric fields are indexed as numeric fields.
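Applied to the example metadata above, these rules produce a generated schema like the sketch below (the ``content`` and ``content_vector`` fields come from the document text and its embedding and are added by the vector store itself, not from the metadata):
```python
# Sketch of the schema generated from the example metadata by the two rules
# above; it matches the generated_schema shown in the warning output further below.
generated_schema = {
    "text": [{"name": "user"}, {"name": "job"}, {"name": "credit_score"}],
    "numeric": [{"name": "age"}],
}
```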
If you would instead like a text field to be indexed as a tag field, you can specify overrides like the following for the example data:
```python
# this can also be a path to a yaml file
index_schema = {
    "text": [{"name": "user"}, {"name": "job"}],
    "tag": [{"name": "credit_score"}],
    "numeric": [{"name": "age"}],
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users2",
    index_schema=index_schema,  # apply the user-specified overrides
)
```
This will change the index specification to
Index Information:

| Index Name | Storage Type | Prefixes       | Index Options | Indexing |
|------------|--------------|----------------|---------------|----------|
| users2     | HASH         | ['doc:users2'] | []            | 0        |

Index Fields:

| Name           | Attribute      | Type    | Field Option | Option Value |
|----------------|----------------|---------|--------------|--------------|
| user           | user           | TEXT    | WEIGHT       | 1            |
| job            | job            | TEXT    | WEIGHT       | 1            |
| content        | content        | TEXT    | WEIGHT       | 1            |
| credit_score   | credit_score   | TAG     | SEPARATOR    | ,            |
| age            | age            | NUMERIC |              |              |
| content_vector | content_vector | VECTOR  |              |              |
It will also log a warning to the user that the generated schema does not match the specified schema:
```text
index_schema does not match generated schema from metadata.
index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
```
As long as the mismatch is intentional, this is fine. The schema can also be defined as a YAML file instead of a dictionary:
```yaml
text:
- name: user
- name: job
tag:
- name: credit_score
numeric:
- name: age
```
which you then pass in as a path:
```python
from pathlib import Path

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    index_schema=Path("sample1.yml").resolve(),
)
```
This will create the same schema as defined in the dictionary example:
Index Information:

| Index Name | Storage Type | Prefixes       | Index Options | Indexing |
|------------|--------------|----------------|---------------|----------|
| users3     | HASH         | ['doc:users3'] | []            | 0        |

Index Fields:

| Name           | Attribute      | Type    | Field Option | Option Value |
|----------------|----------------|---------|--------------|--------------|
| user           | user           | TEXT    | WEIGHT       | 1            |
| job            | job            | TEXT    | WEIGHT       | 1            |
| content        | content        | TEXT    | WEIGHT       | 1            |
| credit_score   | credit_score   | TAG     | SEPARATOR    | ,            |
| age            | age            | NUMERIC |              |              |
| content_vector | content_vector | VECTOR  |              |              |
### Custom Vector Indexing Schema
Users with larger use cases may want to change how the vector index created by Langchain is formulated. To utilize all of the vector database features of Redis, you can now pass in index attribute modifiers, such as changing the indexing algorithm to HNSW:
```python
vector_schema = {
    "algorithm": "HNSW"
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    vector_schema=vector_schema,
)
```
A more complex example may look like
```python
vector_schema = {
    "algorithm": "HNSW",
    "ef_construction": 200,
    "ef_runtime": 20,
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    vector_schema=vector_schema,
)
```
All names correspond to the arguments you would set if using Redis-py or
RedisVL. (put in doc link later)
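For reference, a rough sketch of an equivalent hand-written redis-py field definition is shown below. The ``DIM``, ``TYPE``, and ``DISTANCE_METRIC`` values are illustrative assumptions (they normally come from the embedding model and the vector store defaults), not values taken from this PR.
```python
from redis.commands.search.field import VectorField

# Hypothetical redis-py equivalent of the HNSW vector_schema example above.
# DIM, TYPE, and DISTANCE_METRIC are assumed values for illustration only.
content_vector = VectorField(
    "content_vector",
    "HNSW",
    {
        "TYPE": "FLOAT32",
        "DIM": 1536,  # depends on the embedding model used
        "DISTANCE_METRIC": "COSINE",
        "EF_CONSTRUCTION": 200,
        "EF_RUNTIME": 20,
    },
)
```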
### Better Querying
Both vector queries and range (limit) queries are now available, and metadata is returned by default. Example outputs are shown below.
```python
>>> query = "foo"
>>> results = rds.similarity_search(query, k=1)
>>> print(results)
[Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})]
>>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False)
>>> print(results) # no metadata, but with scores
[(Document(page_content='foo', metadata={}), 7.15255737305e-07)]
>>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001)
>>> print(len(results)) # range query (only above threshold even if k is higher)
4
```
### Custom metadata filtering
A big advantage of Redis in this space is the ability to filter on data stored alongside the vector itself. With the example above, the following is now possible in Langchain. The standard comparison operators are overloaded to describe a new expression language that mimics that of [redisvl](redisvl.com). This allows for arbitrarily long sequences of filters, resembling SQL predicates, that can be used directly with vector queries and range queries. There are two interfaces for doing so, and both are shown below.
```python
>>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText
>>> age_filter = RedisFilter.num("age") > 18
>>> age_filter = RedisNum("age") > 18 # equivalent
>>> results = rds.similarity_search(query, filter=age_filter)
>>> print(len(results))
3
>>> job_filter = RedisFilter.text("job") == "engineer"
>>> job_filter = RedisText("job") == "engineer" # equivalent
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# fuzzy match text search
>>> job_filter = RedisFilter.text("job") % "eng*"
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# combined filters (AND)
>>> combined = age_filter & job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
1
# combined filters (OR)
>>> combined = age_filter | job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
4
```
All the above filter results can be checked against the data above.
### Other
- Issue: #3967
- Dependencies: No added dependencies
- Tag maintainer: @hwchase17 @baskaryan @rlancemartin
- Twitter handle: @sampartee
---------
Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>

import uuid
from contextlib import asynccontextmanager, contextmanager
from typing import AsyncGenerator, Generator, List, Optional, cast

import pytest
from langchain.globals import get_llm_cache, set_llm_cache
from langchain_core.embeddings import Embeddings
from langchain_core.load.dump import dumps
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_core.outputs import ChatGeneration, Generation, LLMResult

from langchain_community.cache import AsyncRedisCache, RedisCache, RedisSemanticCache
from tests.integration_tests.cache.fake_embeddings import (
    ConsistentFakeEmbeddings,
    FakeEmbeddings,
)
from tests.unit_tests.llms.fake_chat_model import FakeChatModel
from tests.unit_tests.llms.fake_llm import FakeLLM

# Using a non-standard port to avoid conflicts with potentially local running
# redis instances
# You can spin up a local redis using docker compose
# cd [repository-root]/docker
# docker-compose up redis
REDIS_TEST_URL = "redis://localhost:6020"


def random_string() -> str:
    return str(uuid.uuid4())


@contextmanager
def get_sync_redis(*, ttl: Optional[int] = 1) -> Generator[RedisCache, None, None]:
    """Get a sync RedisCache instance."""
    import redis

    cache = RedisCache(redis_=redis.Redis.from_url(REDIS_TEST_URL), ttl=ttl)
    try:
        yield cache
    finally:
        cache.clear()


@asynccontextmanager
async def get_async_redis(
    *, ttl: Optional[int] = 1
) -> AsyncGenerator[AsyncRedisCache, None]:
    """Get an async RedisCache instance."""
    from redis.asyncio import Redis

    cache = AsyncRedisCache(redis_=Redis.from_url(REDIS_TEST_URL), ttl=ttl)
    try:
        yield cache
    finally:
        await cache.aclear()


def test_redis_cache_ttl() -> None:
    from redis import Redis

    with get_sync_redis() as llm_cache:
        set_llm_cache(llm_cache)
        llm_cache.update("foo", "bar", [Generation(text="fizz")])
        key = llm_cache._key("foo", "bar")
        assert isinstance(llm_cache.redis, Redis)
        assert llm_cache.redis.pttl(key) > 0


async def test_async_redis_cache_ttl() -> None:
    from redis.asyncio import Redis as AsyncRedis

    async with get_async_redis() as redis_cache:
        set_llm_cache(redis_cache)
        llm_cache = cast(RedisCache, get_llm_cache())
        await llm_cache.aupdate("foo", "bar", [Generation(text="fizz")])
        key = llm_cache._key("foo", "bar")
        assert isinstance(llm_cache.redis, AsyncRedis)
        assert await llm_cache.redis.pttl(key) > 0


def test_sync_redis_cache() -> None:
    with get_sync_redis() as llm_cache:
        set_llm_cache(llm_cache)
        llm = FakeLLM()
        params = llm.dict()
        params["stop"] = None
        llm_string = str(sorted([(k, v) for k, v in params.items()]))
        llm_cache.update("prompt", llm_string, [Generation(text="fizz0")])
        output = llm.generate(["prompt"])
        expected_output = LLMResult(
            generations=[[Generation(text="fizz0")]],
            llm_output={},
        )
        assert output == expected_output


async def test_sync_in_async_redis_cache() -> None:
    """Test the sync RedisCache invoked with async methods"""
    with get_sync_redis() as llm_cache:
        set_llm_cache(llm_cache)
        llm = FakeLLM()
        params = llm.dict()
        params["stop"] = None
        llm_string = str(sorted([(k, v) for k, v in params.items()]))
        # llm_cache.update("meow", llm_string, [Generation(text="meow")])
        await llm_cache.aupdate("prompt", llm_string, [Generation(text="fizz1")])
        output = await llm.agenerate(["prompt"])
        expected_output = LLMResult(
            generations=[[Generation(text="fizz1")]],
            llm_output={},
        )
        assert output == expected_output


async def test_async_redis_cache() -> None:
    async with get_async_redis() as redis_cache:
        set_llm_cache(redis_cache)
        llm = FakeLLM()
        params = llm.dict()
        params["stop"] = None
        llm_string = str(sorted([(k, v) for k, v in params.items()]))
        llm_cache = cast(RedisCache, get_llm_cache())
        await llm_cache.aupdate("prompt", llm_string, [Generation(text="fizz2")])
        output = await llm.agenerate(["prompt"])
        expected_output = LLMResult(
            generations=[[Generation(text="fizz2")]],
            llm_output={},
        )
        assert output == expected_output


async def test_async_in_sync_redis_cache() -> None:
    async with get_async_redis() as redis_cache:
        set_llm_cache(redis_cache)
        llm = FakeLLM()
        params = llm.dict()
        params["stop"] = None
        llm_string = str(sorted([(k, v) for k, v in params.items()]))
        llm_cache = cast(RedisCache, get_llm_cache())
        with pytest.raises(NotImplementedError):
            llm_cache.update("foo", llm_string, [Generation(text="fizz")])


def test_redis_cache_chat() -> None:
    with get_sync_redis() as redis_cache:
        set_llm_cache(redis_cache)
        llm = FakeChatModel()
        params = llm.dict()
        params["stop"] = None
        llm_string = str(sorted([(k, v) for k, v in params.items()]))
        prompt: List[BaseMessage] = [HumanMessage(content="foo")]
        llm_cache = cast(RedisCache, get_llm_cache())
        llm_cache.update(
            dumps(prompt),
            llm_string,
            [ChatGeneration(message=AIMessage(content="fizz"))],
        )
        output = llm.generate([prompt])
        expected_output = LLMResult(
            generations=[[ChatGeneration(message=AIMessage(content="fizz"))]],
            llm_output={},
        )
        assert output == expected_output


async def test_async_redis_cache_chat() -> None:
    async with get_async_redis() as redis_cache:
        set_llm_cache(redis_cache)
        llm = FakeChatModel()
        params = llm.dict()
        params["stop"] = None
        llm_string = str(sorted([(k, v) for k, v in params.items()]))
        prompt: List[BaseMessage] = [HumanMessage(content="foo")]
        llm_cache = cast(RedisCache, get_llm_cache())
        await llm_cache.aupdate(
            dumps(prompt),
            llm_string,
            [ChatGeneration(message=AIMessage(content="fizz"))],
        )
        output = await llm.agenerate([prompt])
        expected_output = LLMResult(
            generations=[[ChatGeneration(message=AIMessage(content="fizz"))]],
            llm_output={},
        )
        assert output == expected_output


def test_redis_semantic_cache() -> None:
    """Test redis semantic cache functionality."""
    set_llm_cache(
        RedisSemanticCache(
            embedding=FakeEmbeddings(), redis_url=REDIS_TEST_URL, score_threshold=0.1
        )
    )
    llm = FakeLLM()
    params = llm.dict()
    params["stop"] = None
    llm_string = str(sorted([(k, v) for k, v in params.items()]))
    llm_cache = cast(RedisSemanticCache, get_llm_cache())
    llm_cache.update("foo", llm_string, [Generation(text="fizz")])
    output = llm.generate(
        ["bar"]
    )  # foo and bar will have the same embedding produced by FakeEmbeddings
    expected_output = LLMResult(
        generations=[[Generation(text="fizz")]],
        llm_output={},
    )
    assert output == expected_output
    # clear the cache
    llm_cache.clear(llm_string=llm_string)
    output = llm.generate(
        ["bar"]
    )  # foo and bar will have the same embedding produced by FakeEmbeddings
    # expect different output now without cached result
    assert output != expected_output
    llm_cache.clear(llm_string=llm_string)


def test_redis_semantic_cache_multi() -> None:
    set_llm_cache(
        RedisSemanticCache(
            embedding=FakeEmbeddings(), redis_url=REDIS_TEST_URL, score_threshold=0.1
        )
Redis metadata filtering and specification, index customization (#8612)
### Description
The previous Redis implementation did not allow for the user to specify
the index configuration (i.e. changing the underlying algorithm) or add
additional metadata to use for querying (i.e. hybrid or "filtered"
search).
This PR introduces the ability to specify custom index attributes and
metadata attributes as well as use that metadata in filtered queries.
Overall, more structure was introduced to the Redis implementation that
should allow for easier maintainability moving forward.
# New Features
The following features are now available with the Redis integration into
Langchain
## Index schema generation
The schema for the index will now be automatically generated if not
specified by the user. For example, the data above has the multiple
metadata categories. The the following example
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis
embeddings = OpenAIEmbeddings()
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users"
)
```
Loading the data in through this and the other ``from_documents`` and
``from_texts`` methods will now generate index schema in Redis like the
following.
view index schema with the ``redisvl`` tool. [link](redisvl.com)
```bash
$ rvl index info -i users
```
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|---------------|-----------------|------------|
| users | HASH | ['doc:users'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
### Custom Metadata specification
The metadata schema generation has the following rules
1. All text fields are indexed as text fields.
2. All numeric fields are index as numeric fields.
If you would like to have a text field as a tag field, users can specify
overrides like the following for the example data
```python
# this can also be a path to a yaml file
index_schema = {
"text": [{"name": "user"}, {"name": "job"}],
"tag": [{"name": "credit_score"}],
"numeric": [{"name": "age"}],
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users"
)
```
This will change the index specification to
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users2 | HASH | ['doc:users2'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
and throw a warning to the user (log output) that the generated schema
does not match the specified schema.
```text
index_schema does not match generated schema from metadata.
index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
```
As long as this is on purpose, this is fine.
The schema can be defined as a yaml file or a dictionary
```yaml
text:
- name: user
- name: job
tag:
- name: credit_score
numeric:
- name: age
```
and you pass in a path like
```python
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
index_schema=Path("sample1.yml").resolve()
)
```
Which will create the same schema as defined in the dictionary example
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users3 | HASH | ['doc:users3'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
### Custom Vector Indexing Schema
Users with large use cases may want to change how they formulate the
vector index created by Langchain
To utilize all the features of Redis for vector database use cases like
this, you can now do the following to pass in index attribute modifiers
like changing the indexing algorithm to HNSW.
```python
vector_schema = {
"algorithm": "HNSW"
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
vector_schema=vector_schema
)
```
A more complex example might look like this:
```python
vector_schema = {
"algorithm": "HNSW",
"ef_construction": 200,
"ef_runtime": 20
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
vector_schema=vector_schema
)
```
All names correspond to the arguments you would set when using redis-py or
RedisVL directly. (put in doc link later)
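For illustration only, a fuller HNSW configuration might look like the sketch below. Apart from ``algorithm``, ``ef_construction`` and ``ef_runtime`` (shown above), the remaining keys (``dims``, ``distance_metric``, ``m``) are assumptions mirroring the redis-py HNSW vector field arguments and the same lowercase naming convention:
```python
# a sketch under the assumptions stated above, not a confirmed option list
vector_schema = {
    "algorithm": "HNSW",          # graph-based ANN index instead of the default FLAT
    "distance_metric": "COSINE",  # similarity metric used at query time
    "dims": 1536,                 # embedding dimensionality (must match the embedding model)
    "m": 16,                      # max outgoing edges per node in the HNSW graph
    "ef_construction": 200,       # candidate list size while building the graph
    "ef_runtime": 20,             # candidate list size while querying
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    vector_schema=vector_schema,
)
```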
### Better Querying
Both vector queries and range (limit) queries are now available, and metadata
is returned by default. Example outputs are shown below.
```python
>>> query = "foo"
>>> results = rds.similarity_search(query, k=1)
>>> print(results)
[Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})]
>>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False)
>>> print(results) # no metadata, but with scores
[(Document(page_content='foo', metadata={}), 7.15255737305e-07)]
>>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001)
>>> print(len(results)) # range query (only above threshold even if k is higher)
4
```
### Custom metadata filtering
A big advantage of Redis in this space is the ability to filter on metadata
stored alongside the vector itself. With the example above, the following is
now possible in langchain. The comparison operators are overridden to provide
an expression language that mimics that of [redisvl](redisvl.com), allowing
arbitrarily long, SQL-like chains of filters to be combined directly with
vector queries and range queries.
There are two interfaces for building filters, and both are shown below.
```python
>>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText
>>> age_filter = RedisFilter.num("age") > 18
>>> age_filter = RedisNum("age") > 18 # equivalent
>>> results = rds.similarity_search(query, filter=age_filter)
>>> print(len(results))
3
>>> job_filter = RedisFilter.text("job") == "engineer"
>>> job_filter = RedisText("job") == "engineer" # equivalent
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# fuzzy match text search
>>> job_filter = RedisFilter.text("job") % "eng*"
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# combined filters (AND)
>>> combined = age_filter & job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
1
# combined filters (OR)
>>> combined = age_filter | job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
4
```
All of the filter results above can be verified against the example data.
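The same filter objects should also compose with the retriever interface; here is a minimal sketch, assuming the generic ``as_retriever(search_kwargs=...)`` pattern forwards ``filter`` to ``similarity_search`` for the Redis vector store as well:
```python
from langchain.vectorstores.redis import RedisNum, RedisText

# assumption: `filter` and `k` are forwarded to similarity_search via search_kwargs
retriever = rds.as_retriever(
    search_kwargs={
        "k": 3,
        "filter": (RedisNum("age") > 18) & (RedisText("job") == "engineer"),
    }
)
docs = retriever.get_relevant_documents("foo")
```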
### Other
- Issue: #3967
- Dependencies: No added dependencies
- Tag maintainer: @hwchase17 @baskaryan @rlancemartin
- Twitter handle: @sampartee
---------
Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
    )
    llm = FakeLLM()
    params = llm.dict()
    params["stop"] = None
    llm_string = str(sorted([(k, v) for k, v in params.items()]))
    llm_cache = cast(RedisSemanticCache, get_llm_cache())
    llm_cache.update(
        "foo", llm_string, [Generation(text="fizz"), Generation(text="Buzz")]
    )
    output = llm.generate(
        ["bar"]
    )  # foo and bar will have the same embedding produced by FakeEmbeddings
    expected_output = LLMResult(
        generations=[[Generation(text="fizz"), Generation(text="Buzz")]],
        llm_output={},
    )
    assert output == expected_output
    # clear the cache
    llm_cache.clear(llm_string=llm_string)


def test_redis_semantic_cache_chat() -> None:
    set_llm_cache(
        RedisSemanticCache(
            embedding=FakeEmbeddings(), redis_url=REDIS_TEST_URL, score_threshold=0.1
        )
    )
    llm = FakeChatModel()
    params = llm.dict()
    params["stop"] = None
    llm_string = str(sorted([(k, v) for k, v in params.items()]))
    prompt: List[BaseMessage] = [HumanMessage(content="foo")]
    llm_cache = cast(RedisSemanticCache, get_llm_cache())
    llm_cache.update(
        dumps(prompt), llm_string, [ChatGeneration(message=AIMessage(content="fizz"))]
    )
    output = llm.generate([prompt])
    expected_output = LLMResult(
        generations=[[ChatGeneration(message=AIMessage(content="fizz"))]],
        llm_output={},
    )
    assert output == expected_output
    llm_cache.clear(llm_string=llm_string)


@pytest.mark.parametrize("embedding", [ConsistentFakeEmbeddings()])
@pytest.mark.parametrize(
    "prompts, generations",
    [
        # Single prompt, single generation
        ([random_string()], [[random_string()]]),
        # Single prompt, multiple generations
        ([random_string()], [[random_string(), random_string()]]),
        # Single prompt, multiple generations
        ([random_string()], [[random_string(), random_string(), random_string()]]),
        # Multiple prompts, multiple generations
        (
            [random_string(), random_string()],
            [[random_string()], [random_string(), random_string()]],
        ),
    ],
    ids=[
        "single_prompt_single_generation",
        "single_prompt_multiple_generations",
        "single_prompt_multiple_generations",
        "multiple_prompts_multiple_generations",
    ],
)
def test_redis_semantic_cache_hit(
    embedding: Embeddings, prompts: List[str], generations: List[List[str]]
) -> None:
    set_llm_cache(RedisSemanticCache(embedding=embedding, redis_url=REDIS_TEST_URL))
    llm = FakeLLM()
    params = llm.dict()
    params["stop"] = None
    llm_string = str(sorted([(k, v) for k, v in params.items()]))

    llm_generations = [
        [
            Generation(text=generation, generation_info=params)
            for generation in prompt_i_generations
        ]
        for prompt_i_generations in generations
    ]
    llm_cache = cast(RedisSemanticCache, get_llm_cache())
# Seed the semantic cache with the expected generations for every prompt.
for prompt_i, llm_generations_i in zip(prompts, llm_generations):
    print(prompt_i)  # noqa: T201
    print(llm_generations_i)  # noqa: T201
    llm_cache.update(prompt_i, llm_string, llm_generations_i)
# Generations should now come back from the cache rather than the fake LLM.
llm.generate(prompts)
assert llm.generate(prompts) == LLMResult(
    generations=llm_generations, llm_output={}
)