- Description:
* Baidu AI Cloud's [Qianfan
Platform](https://cloud.baidu.com/doc/WENXINWORKSHOP/index.html) is an
all-in-one platform for large model development and service deployment,
catering to enterprise developers in China. Qianfan Platform offers a
wide range of resources, including the Wenxin Yiyan model (ERNIE-Bot)
and various third-party open-source models.
- Issue: none
- Dependencies:
* qianfan
- Tag maintainer: @baskaryan
- Twitter handle:
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
The `self-que[ring`
navbar](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
has repeated `self-quering` repeated in each menu item. I've simplified
it to be more readable
- removed `self-quering` from a title of each page;
- added description to the vector stores
- added description and link to the Integration Card
(`integrations/providers`) of the vector stores when they are missed.
This PR addresses a few minor issues with the Cassandra vector store
implementation and extends the store to support Metadata search.
Thanks to the latest cassIO library (>=0.1.0), metadata filtering is
available in the store.
Further,
- the "relevance" score is prevented from being flipped in the [0,1]
interval, thus ensuring that 1 corresponds to the closest vector (this
is related to how the underlying cassIO class returns the cosine
difference);
- bumped the cassIO package version both in the notebooks and the
pyproject.toml;
- adjusted the textfile location for the vector-store example after the
reshuffling of the Langchain repo dir structure;
- added demonstration of metadata filtering in the Cassandra vector
store notebook;
- better docstring for the Cassandra vector store class;
- fixed test flakiness and removed offending out-of-place escape chars
from a test module docstring;
To my knowledge all relevant tests pass and mypy+black+ruff don't
complain. (mypy gives unrelated errors in other modules, which clearly
don't depend on the content of this PR).
Thank you!
Stefano
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
* More clarity around how geometry is handled. Not returned by default;
when returned, stored in metadata. This is because it's usually a waste
of tokens, but it should be accessible if needed.
* User can supply layer description to avoid errors when layer
properties are inaccessible due to passthrough access.
* Enhanced testing
* Updated notebook
---------
Co-authored-by: Connor Sutton <connor.sutton@swca.com>
Co-authored-by: connorsutton <135151649+connorsutton@users.noreply.github.com>
**Description:**
The latest version of HazyResearch/manifest doesn't support accessing
the "client" directly. The latest version supports connection pools and
a client has to be requested from the client pool.
**Issue:**
No matching issue was found
**Dependencies:**
The manifest.ipynb file in docs/extras/integrations/llms need to be
updated
**Twitter handle:**
@hrk_cbe
## Description:
I've integrated CTranslate2 with LangChain. CTranlate2 is a recently
popular library for efficient inference with Transformer models that
compares favorably to alternatives such as HF Text Generation Inference
and vLLM in
[benchmarks](https://hamel.dev/notes/llm/inference/03_inference.html).
Hi @baskaryan,
I've made updates to LLMonitorCallbackHandler to address a few bugs
reported by users
These changes don't alter the fundamental behavior of the callback
handler.
Thanks you!
---------
Co-authored-by: vincelwt <vince@lyser.io>
_Thank you to the LangChain team for the great project and in advance
for your review. Let me know if I can provide any other additional
information or do things differently in the future to make your lives
easier 🙏 _
@hwchase17 please let me know if you're not the right person to review 😄
This PR enables LangChain to access the Konko API via the chat_models
API wrapper.
Konko API is a fully managed API designed to help application
developers:
1. Select the right LLM(s) for their application
2. Prototype with various open-source and proprietary LLMs
3. Move to production in-line with their security, privacy, throughput,
latency SLAs without infrastructure set-up or administration using Konko
AI's SOC 2 compliant infrastructure
_Note on integration tests:_
We added 14 integration tests. They will all fail unless you export the
right API keys. 13 will pass with a KONKO_API_KEY provided and the other
one will pass with a OPENAI_API_KEY provided. When both are provided,
all 14 integration tests pass. If you would like to test this yourself,
please let me know and I can provide some temporary keys.
### Installation and Setup
1. **First you'll need an API key**
2. **Install Konko AI's Python SDK**
1. Enable a Python3.8+ environment
`pip install konko`
3. **Set API Keys**
**Option 1:** Set Environment Variables
You can set environment variables for
1. KONKO_API_KEY (Required)
2. OPENAI_API_KEY (Optional)
In your current shell session, use the export command:
`export KONKO_API_KEY={your_KONKO_API_KEY_here}`
`export OPENAI_API_KEY={your_OPENAI_API_KEY_here} #Optional`
Alternatively, you can add the above lines directly to your shell
startup script (such as .bashrc or .bash_profile for Bash shell and
.zshrc for Zsh shell) to have them set automatically every time a new
shell session starts.
**Option 2:** Set API Keys Programmatically
If you prefer to set your API keys directly within your Python script or
Jupyter notebook, you can use the following commands:
```python
konko.set_api_key('your_KONKO_API_KEY_here')
konko.set_openai_api_key('your_OPENAI_API_KEY_here') # Optional
```
### Calling a model
Find a model on the [[Konko Introduction
page](https://docs.konko.ai/docs#available-models)](https://docs.konko.ai/docs#available-models)
For example, for this [[LLama 2
model](https://docs.konko.ai/docs/meta-llama-2-13b-chat)](https://docs.konko.ai/docs/meta-llama-2-13b-chat).
The model id would be: `"meta-llama/Llama-2-13b-chat-hf"`
Another way to find the list of models running on the Konko instance is
through this
[[endpoint](https://docs.konko.ai/reference/listmodels)](https://docs.konko.ai/reference/listmodels).
From here, we can initialize our model:
```python
chat_instance = ChatKonko(max_tokens=10, model = 'meta-llama/Llama-2-13b-chat-hf')
```
And run it:
```python
msg = HumanMessage(content="Hi")
chat_response = chat_instance([msg])
```
## Description
Adds Supabase Vector as a self-querying retriever.
- Designed to be backwards compatible with existing `filter` logic on
`SupabaseVectorStore`.
- Adds new filter `postgrest_filter` to `SupabaseVectorStore`
`similarity_search()` methods
- Supports entire PostgREST [filter query
language](https://postgrest.org/en/stable/references/api/tables_views.html#read)
(used by self-querying retriever, but also works as an escape hatch for
more query control)
- `SupabaseVectorTranslator` converts Langchain filter into the above
PostgREST query
- Adds Jupyter Notebook for the self-querying retriever
- Adds tests
## Tag maintainer
@hwchase17
## Twitter handle
[@ggrdson](https://twitter.com/ggrdson)
- Description: Adding support for self-querying to Vectara integration
- Issue: per customer request
- Tag maintainer: @rlancemartin @baskaryan
- Twitter handle: @ofermend
Also updated some documentation, added self-query testing, and a demo
notebook with self-query example.
# Description
This pull request allows to use the
[NucliaDB](https://docs.nuclia.dev/docs/docs/nucliadb/intro) as a vector
store in LangChain.
It works with both a [local NucliaDB
instance](https://docs.nuclia.dev/docs/docs/nucliadb/deploy/basics) or
with [Nuclia Cloud](https://nuclia.cloud).
# Dependencies
It requires an up-to-date version of the `nuclia` Python package.
@rlancemartin, @eyurtsev, @hinthornw, please review it when you have a
moment :)
Note: our Twitter handler is `@NucliaAI`
Follow-up PR for https://github.com/langchain-ai/langchain/pull/10047,
simply adding a notebook quickstart example for the vector store with
SQLite, using the class SQLiteVSS.
Maintainer tag @baskaryan
Co-authored-by: Philippe Oger <philippe.oger@adevinta.com>
- Description: this PR adds the possibility to configure boto3 in the S3
loaders. Any named argument you add will be used to create the Boto3
session. This is useful when the AWS credentials can't be passed as env
variables or can't be read from the credentials file.
- Issue: N/A
- Dependencies: N/A
- Tag maintainer: ?
- Twitter handle: cbornet_
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Various improvements to the Model I/O section of the documentation
- Changed "Chat Model" to "chat model" in a few spots for internal
consistency
- Minor spelling & grammar fixes to improve readability & comprehension
Enhance SerpApi response which potential to have more relevant output.
<img width="345" alt="Screenshot 2023-09-01 at 8 26 13 AM"
src="https://github.com/langchain-ai/langchain/assets/10222402/80ff684d-e02e-4143-b218-5c1b102cbf75">
Query: What is the weather in Pomfret?
**Before:**
> I should look up the current weather conditions.
...
Final Answer: The current weather in Pomfret is 73°F with 1% chance of
precipitation and winds at 10 mph.
**After:**
> I should look up the current weather conditions.
...
Final Answer: The current weather in Pomfret is 62°F, 1% precipitation,
61% humidity, and 4 mph wind.
---
Query: Top team in english premier league?
**Before:**
> I need to find out which team is currently at the top of the English
Premier League
...
Final Answer: Liverpool FC is currently at the top of the English
Premier League.
**After:**
> I need to find out which team is currently at the top of the English
Premier League
...
Final Answer: Man City is currently at the top of the English Premier
League.
---
Query: Top team in english premier league?
**Before:**
> I need to find out which team is currently at the top of the English
Premier League
...
Final Answer: Liverpool FC is currently at the top of the English
Premier League.
**After:**
> I need to find out which team is currently at the top of the English
Premier League
...
Final Answer: Man City is currently at the top of the English Premier
League.
---
Query: Any upcoming events in Paris?
**Before:**
> I should look for events in Paris
Action: Search
...
Final Answer: Upcoming events in Paris this month include Whit Sunday &
Whit Monday (French National Holiday), Makeup in Paris, Paris Jazz
Festival, Fete de la Musique, and Salon International de la Maison de.
**After:**
> I should look for events in Paris
Action: Search
...
Final Answer: Upcoming events in Paris include Elektric Park 2023, The
Aces, and BEING AS AN OCEAN.
### Description
There is a really nice class for saving chat messages into a database -
SQLChatMessageHistory.
It leverages SqlAlchemy to be compatible with any supported database (in
contrast with PostgresChatMessageHistory, which is basically the same
but is limited to Postgres).
However, the class is not really customizable in terms of what you can
store. I can imagine a lot of use cases, when one will need to save a
message date, along with some additional metadata.
To solve this, I propose to extract the converting logic from
BaseMessage to SQLAlchemy model (and vice versa) into a separate class -
message converter. So instead of rewriting the whole
SQLChatMessageHistory class, a user will only need to write a custom
model and a simple mapping class, and pass its instance as a parameter.
I also noticed that there is no documentation on this class, so I added
that too, with an example of custom message converter.
### Issue
N/A
### Dependencies
N/A
### Tag maintainer
Not yet
### Twitter handle
N/A
- Description: Adds two optional parameters to the
DynamoDBChatMessageHistory class to enable users to pass in a name for
their PrimaryKey, or a Key object itself to enable the use of composite
keys, a common DynamoDB paradigm.
[AWS DynamoDB Key
docs](https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/)
- Issue: N/A
- Dependencies: N/A
- Twitter handle: N/A
---------
Co-authored-by: Josh White <josh@ctrlstack.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
This PR introduces a minor change to the TitanTakeoff integration.
Instead of specifying a port on localhost, this PR will allow users to
specify a baseURL instead. This will allow users to use the integration
if they have TitanTakeoff deployed externally (not on localhost). This
removes the hardcoded reference to localhost "http://localhost:{port}".
### Info about Titan Takeoff
Titan Takeoff is an inference server created by
[TitanML](https://www.titanml.co/) that allows you to deploy large
language models locally on your hardware in a single command. Most
generative model architectures are included, such as Falcon, Llama 2,
GPT2, T5 and many more.
Read more about Titan Takeoff here:
-
[Blog](https://medium.com/@TitanML/introducing-titan-takeoff-6c30e55a8e1e)
- [Docs](https://docs.titanml.co/docs/titan-takeoff/getting-started)
### Dependencies
No new dependencies are introduced. However, users will need to install
the titan-iris package in their local environment and start the Titan
Takeoff inferencing server in order to use the Titan Takeoff
integration.
Thanks for your help and please let me know if you have any questions.
cc: @hwchase17 @baskaryan
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- Description: Added example of running Q&A over structured data using
the `Airbyte` loaders and `pandas`
- Dependencies: any dependencies required for this change,
- Tag maintainer: @hwchase17
- Twitter handle: @pelaseyed
Hi,
this PR contains loader / parser for Azure Document intelligence which
is a ML-based service to ingest arbitrary PDFs / images, even if
scanned. The loader generates Documents by pages of the original
document. This is my first contribution to LangChain.
Unfortunately I could not find the correct place for test cases. Happy
to add one if you can point me to the location, but as this is a
cloud-based service, a test would require network access and credentials
- so might be of limited help.
Dependencies: The needed dependency was already part of pyproject.toml,
no change.
Twitter: feel free to mention @LarsAC on the announcement
Fixed navbar:
- renamed several files, so ToC is sorted correctly
- made ToC items consistent: formatted several Titles
- added several links
- reformatted several docs to a consistent format
- renamed several files (removed `_example` suffix)
- added renamed files to the `docs/docs_skeleton/vercel.json`
This notebook was mistakenly placed in the `toolkits` folder and appears
within `Agents & Toolkits` menu. But it should be in `Tools`.
Moved example into `tools/`; updated title to consistent format.
This PR follows the **Eden AI (LLM + embeddings) integration**. #8633
We added an optional parameter to choose different AI models for
providers (like 'text-bison' for provider 'google', 'text-davinci-003'
for provider 'openai', etc.).
Usage:
```python
llm = EdenAI(
feature="text",
provider="google",
params={
"model": "text-bison", # new
"temperature": 0.2,
"max_tokens": 250,
},
)
```
You can also change the provider + model after initialization
```python
llm = EdenAI(
feature="text",
provider="google",
params={
"temperature": 0.2,
"max_tokens": 250,
},
)
prompt = """
hi
"""
llm(prompt, providers='openai', model='text-davinci-003') # change provider & model
```
The jupyter notebook as been updated with an example well.
Ping: @hwchase17, @baskaryan
---------
Co-authored-by: RedhaWassim <rwasssim@gmail.com>
Co-authored-by: sam <melaine.samy@gmail.com>
This fixes the exampe import line in the general "cassandra" doc page
mdx file. (it was erroneously a copy of the chat message history import
statement found below).
Hi there!
I'm excited to open this PR to add support for using 'Tencent Cloud
VectorDB' as a vector store.
Tencent Cloud VectorDB is a fully-managed, self-developed,
enterprise-level distributed database service designed for storing,
retrieving, and analyzing multi-dimensional vector data. The database
supports multiple index types and similarity calculation methods, with a
single index supporting vector scales up to 1 billion and capable of
handling millions of QPS with millisecond-level query latency. Tencent
Cloud VectorDB not only provides external knowledge bases for large
models to improve their accuracy, but also has wide applications in AI
fields such as recommendation systems, NLP services, computer vision,
and intelligent customer service.
The PR includes:
Implementation of Vectorstore.
I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
make format
make lint
make coverage
make test
- Description: A change in the documentation example for Azure Cognitive
Vector Search with Scoring Profile so the example works as written
- Issue: #10015
- Dependencies: None
- Tag maintainer: @baskaryan @ruoccofabrizio
- Twitter handle: @poshporcupine
- Description: this PR adds `s3_object_key` and `s3_bucket` to the doc
metadata when loading an S3 file. This is particularly useful when using
`S3DirectoryLoader` to remove the files from the dir once they have been
processed (getting the object keys from the metadata `source` field
seems brittle)
- Dependencies: N/A
- Tag maintainer: ?
- Twitter handle: _cbornet
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Adds support for [llmonitor](https://llmonitor.com) callbacks.
It enables:
- Requests tracking / logging / analytics
- Error debugging
- Cost analytics
- User tracking
Let me know if anythings neds to be changed for merge.
Thank you!
- Description: the implementation for similarity_search_with_score did
not actually include a score or logic to filter. Now fixed.
- Tag maintainer: @rlancemartin
- Twitter handle: @ofermend
# Description
This PR adds additional documentation on how to use Azure Active
Directory to authenticate to an OpenAI service within Azure. This method
of authentication allows organizations with more complex security
requirements to use Azure OpenAI.
# Issue
N/A
# Dependencies
N/A
# Twitter
https://twitter.com/CamAHutchison
Neo4j has added vector index integration just recently. To allow both
ingestion and integrating it as vector RAG applications, I wrapped it as
a vector store as the implementation is completely different from
`GraphCypherQAChain`. Here, we are not generating any Cypher statements
at query time, we are simply doing the vector similarity search using
the new vector index as if we were dealing with a vector database.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Update google drive doc loader and retriever notebooks. Show how to use with langchain-googledrive package.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Fixed title for the `extras/integrations/llms/llm_caching.ipynb`.
Existing title breaks the sorted order of items in the navbar.
Updated some formatting.
* Added links to the AI Network
* Made title consistent to other tool kits
* Added `integrations/providers/` integration card page
* **No changes** in the example code!
- Fixed a broken link in the `integrations/providers/infino.mdx`
- Fixed a title in the `integration/collbacks/infino.ipynb` example
- Updated text format in this example.
## Description
The following PR enables the [grammar-based
sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars)
in llama-cpp LLM.
In short, loading file with formal grammar definition will constrain
model outputs. For instance, one can force the model to generate valid
JSON or generate only python lists.
In the follow-up PR we will add:
* docs with some description why it is cool and how it works
* maybe some code sample for some task such as in llama repo
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Description
The previous Redis implementation did not allow for the user to specify
the index configuration (i.e. changing the underlying algorithm) or add
additional metadata to use for querying (i.e. hybrid or "filtered"
search).
This PR introduces the ability to specify custom index attributes and
metadata attributes as well as use that metadata in filtered queries.
Overall, more structure was introduced to the Redis implementation that
should allow for easier maintainability moving forward.
# New Features
The following features are now available with the Redis integration into
Langchain
## Index schema generation
The schema for the index will now be automatically generated if not
specified by the user. For example, the data above has the multiple
metadata categories. The the following example
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis
embeddings = OpenAIEmbeddings()
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users"
)
```
Loading the data in through this and the other ``from_documents`` and
``from_texts`` methods will now generate index schema in Redis like the
following.
view index schema with the ``redisvl`` tool. [link](redisvl.com)
```bash
$ rvl index info -i users
```
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|---------------|-----------------|------------|
| users | HASH | ['doc:users'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
### Custom Metadata specification
The metadata schema generation has the following rules
1. All text fields are indexed as text fields.
2. All numeric fields are index as numeric fields.
If you would like to have a text field as a tag field, users can specify
overrides like the following for the example data
```python
# this can also be a path to a yaml file
index_schema = {
"text": [{"name": "user"}, {"name": "job"}],
"tag": [{"name": "credit_score"}],
"numeric": [{"name": "age"}],
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users"
)
```
This will change the index specification to
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users2 | HASH | ['doc:users2'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
and throw a warning to the user (log output) that the generated schema
does not match the specified schema.
```text
index_schema does not match generated schema from metadata.
index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
```
As long as this is on purpose, this is fine.
The schema can be defined as a yaml file or a dictionary
```yaml
text:
- name: user
- name: job
tag:
- name: credit_score
numeric:
- name: age
```
and you pass in a path like
```python
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
index_schema=Path("sample1.yml").resolve()
)
```
Which will create the same schema as defined in the dictionary example
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users3 | HASH | ['doc:users3'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
### Custom Vector Indexing Schema
Users with large use cases may want to change how they formulate the
vector index created by Langchain
To utilize all the features of Redis for vector database use cases like
this, you can now do the following to pass in index attribute modifiers
like changing the indexing algorithm to HNSW.
```python
vector_schema = {
"algorithm": "HNSW"
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
vector_schema=vector_schema
)
```
A more complex example may look like
```python
vector_schema = {
"algorithm": "HNSW",
"ef_construction": 200,
"ef_runtime": 20
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
vector_schema=vector_schema
)
```
All names correspond to the arguments you would set if using Redis-py or
RedisVL. (put in doc link later)
### Better Querying
Both vector queries and Range (limit) queries are now available and
metadata is returned by default. The outputs are shown.
```python
>>> query = "foo"
>>> results = rds.similarity_search(query, k=1)
>>> print(results)
[Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})]
>>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False)
>>> print(results) # no metadata, but with scores
[(Document(page_content='foo', metadata={}), 7.15255737305e-07)]
>>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001)
>>> print(len(results)) # range query (only above threshold even if k is higher)
4
```
### Custom metadata filtering
A big advantage of Redis in this space is being able to do filtering on
data stored alongside the vector itself. With the example above, the
following is now possible in langchain. The equivalence operators are
overridden to describe a new expression language that mimic that of
[redisvl](redisvl.com). This allows for arbitrarily long sequences of
filters that resemble SQL commands that can be used directly with vector
queries and range queries.
There are two interfaces by which to do so and both are shown.
```python
>>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText
>>> age_filter = RedisFilter.num("age") > 18
>>> age_filter = RedisNum("age") > 18 # equivalent
>>> results = rds.similarity_search(query, filter=age_filter)
>>> print(len(results))
3
>>> job_filter = RedisFilter.text("job") == "engineer"
>>> job_filter = RedisText("job") == "engineer" # equivalent
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# fuzzy match text search
>>> job_filter = RedisFilter.text("job") % "eng*"
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# combined filters (AND)
>>> combined = age_filter & job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
1
# combined filters (OR)
>>> combined = age_filter | job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
4
```
All the above filter results can be checked against the data above.
### Other
- Issue: #3967
- Dependencies: No added dependencies
- Tag maintainer: @hwchase17 @baskaryan @rlancemartin
- Twitter handle: @sampartee
---------
Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This adds Xata as a memory store also to the python version of
LangChain, similar to the [one for
LangChain.js](https://github.com/hwchase17/langchainjs/pull/2217).
I have added a Jupyter Notebook with a simple and a more complex example
using an agent.
To run the integration test, you need to execute something like:
```
XATA_API_KEY='xau_...' XATA_DB_URL="https://demo-uni3q8.eu-west-1.xata.sh/db/langchain" poetry run pytest tests/integration_tests/memory/test_xata.py
```
Where `langchain` is the database you create in Xata.
Still working out interface/notebooks + need discord data dump to test
out things other than copy+paste
Update:
- Going to remove the 'user_id' arg in the loaders themselves and just
standardize on putting the "sender" arg in the extra kwargs. Then can
provide a utility function to map these to ai and human messages
- Going to move the discord one into just a notebook since I don't have
a good dump to test on and copy+paste maybe isn't the greatest thing to
support in v0
- Need to do more testing on slack since it seems the dump only includes
channels and NOT 1 on 1 convos
-
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Uses the shorter import path
`from langchain.document_loaders import` instead of the full path
`from langchain.document_loaders.assemblyai`
Applies those changes to the docs and the unit test.
See #9667 that adds this new loader.