- Description: Added example of running Q&A over structured data using
the `Airbyte` loaders and `pandas`
- Dependencies: any dependencies required for this change,
- Tag maintainer: @hwchase17
- Twitter handle: @pelaseyed
Hi,
this PR contains loader / parser for Azure Document intelligence which
is a ML-based service to ingest arbitrary PDFs / images, even if
scanned. The loader generates Documents by pages of the original
document. This is my first contribution to LangChain.
Unfortunately I could not find the correct place for test cases. Happy
to add one if you can point me to the location, but as this is a
cloud-based service, a test would require network access and credentials
- so might be of limited help.
Dependencies: The needed dependency was already part of pyproject.toml,
no change.
Twitter: feel free to mention @LarsAC on the announcement
Fixed navbar:
- renamed several files, so ToC is sorted correctly
- made ToC items consistent: formatted several Titles
- added several links
- reformatted several docs to a consistent format
- renamed several files (removed `_example` suffix)
- added renamed files to the `docs/docs_skeleton/vercel.json`
This notebook was mistakenly placed in the `toolkits` folder and appears
within `Agents & Toolkits` menu. But it should be in `Tools`.
Moved example into `tools/`; updated title to consistent format.
This PR follows the **Eden AI (LLM + embeddings) integration**. #8633
We added an optional parameter to choose different AI models for
providers (like 'text-bison' for provider 'google', 'text-davinci-003'
for provider 'openai', etc.).
Usage:
```python
llm = EdenAI(
feature="text",
provider="google",
params={
"model": "text-bison", # new
"temperature": 0.2,
"max_tokens": 250,
},
)
```
You can also change the provider + model after initialization
```python
llm = EdenAI(
feature="text",
provider="google",
params={
"temperature": 0.2,
"max_tokens": 250,
},
)
prompt = """
hi
"""
llm(prompt, providers='openai', model='text-davinci-003') # change provider & model
```
The jupyter notebook as been updated with an example well.
Ping: @hwchase17, @baskaryan
---------
Co-authored-by: RedhaWassim <rwasssim@gmail.com>
Co-authored-by: sam <melaine.samy@gmail.com>
This fixes the exampe import line in the general "cassandra" doc page
mdx file. (it was erroneously a copy of the chat message history import
statement found below).
Hi there!
I'm excited to open this PR to add support for using 'Tencent Cloud
VectorDB' as a vector store.
Tencent Cloud VectorDB is a fully-managed, self-developed,
enterprise-level distributed database service designed for storing,
retrieving, and analyzing multi-dimensional vector data. The database
supports multiple index types and similarity calculation methods, with a
single index supporting vector scales up to 1 billion and capable of
handling millions of QPS with millisecond-level query latency. Tencent
Cloud VectorDB not only provides external knowledge bases for large
models to improve their accuracy, but also has wide applications in AI
fields such as recommendation systems, NLP services, computer vision,
and intelligent customer service.
The PR includes:
Implementation of Vectorstore.
I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
make format
make lint
make coverage
make test
- Description: A change in the documentation example for Azure Cognitive
Vector Search with Scoring Profile so the example works as written
- Issue: #10015
- Dependencies: None
- Tag maintainer: @baskaryan @ruoccofabrizio
- Twitter handle: @poshporcupine
- Description: this PR adds `s3_object_key` and `s3_bucket` to the doc
metadata when loading an S3 file. This is particularly useful when using
`S3DirectoryLoader` to remove the files from the dir once they have been
processed (getting the object keys from the metadata `source` field
seems brittle)
- Dependencies: N/A
- Tag maintainer: ?
- Twitter handle: _cbornet
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Adds support for [llmonitor](https://llmonitor.com) callbacks.
It enables:
- Requests tracking / logging / analytics
- Error debugging
- Cost analytics
- User tracking
Let me know if anythings neds to be changed for merge.
Thank you!
- Description: the implementation for similarity_search_with_score did
not actually include a score or logic to filter. Now fixed.
- Tag maintainer: @rlancemartin
- Twitter handle: @ofermend
# Description
This PR adds additional documentation on how to use Azure Active
Directory to authenticate to an OpenAI service within Azure. This method
of authentication allows organizations with more complex security
requirements to use Azure OpenAI.
# Issue
N/A
# Dependencies
N/A
# Twitter
https://twitter.com/CamAHutchison
Neo4j has added vector index integration just recently. To allow both
ingestion and integrating it as vector RAG applications, I wrapped it as
a vector store as the implementation is completely different from
`GraphCypherQAChain`. Here, we are not generating any Cypher statements
at query time, we are simply doing the vector similarity search using
the new vector index as if we were dealing with a vector database.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Update google drive doc loader and retriever notebooks. Show how to use with langchain-googledrive package.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Fixed title for the `extras/integrations/llms/llm_caching.ipynb`.
Existing title breaks the sorted order of items in the navbar.
Updated some formatting.
* Added links to the AI Network
* Made title consistent to other tool kits
* Added `integrations/providers/` integration card page
* **No changes** in the example code!
- Fixed a broken link in the `integrations/providers/infino.mdx`
- Fixed a title in the `integration/collbacks/infino.ipynb` example
- Updated text format in this example.
## Description
The following PR enables the [grammar-based
sampling](https://github.com/ggerganov/llama.cpp/tree/master/grammars)
in llama-cpp LLM.
In short, loading file with formal grammar definition will constrain
model outputs. For instance, one can force the model to generate valid
JSON or generate only python lists.
In the follow-up PR we will add:
* docs with some description why it is cool and how it works
* maybe some code sample for some task such as in llama repo
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Description
The previous Redis implementation did not allow for the user to specify
the index configuration (i.e. changing the underlying algorithm) or add
additional metadata to use for querying (i.e. hybrid or "filtered"
search).
This PR introduces the ability to specify custom index attributes and
metadata attributes as well as use that metadata in filtered queries.
Overall, more structure was introduced to the Redis implementation that
should allow for easier maintainability moving forward.
# New Features
The following features are now available with the Redis integration into
Langchain
## Index schema generation
The schema for the index will now be automatically generated if not
specified by the user. For example, the data above has the multiple
metadata categories. The the following example
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis
embeddings = OpenAIEmbeddings()
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users"
)
```
Loading the data in through this and the other ``from_documents`` and
``from_texts`` methods will now generate index schema in Redis like the
following.
view index schema with the ``redisvl`` tool. [link](redisvl.com)
```bash
$ rvl index info -i users
```
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|---------------|-----------------|------------|
| users | HASH | ['doc:users'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
### Custom Metadata specification
The metadata schema generation has the following rules
1. All text fields are indexed as text fields.
2. All numeric fields are index as numeric fields.
If you would like to have a text field as a tag field, users can specify
overrides like the following for the example data
```python
# this can also be a path to a yaml file
index_schema = {
"text": [{"name": "user"}, {"name": "job"}],
"tag": [{"name": "credit_score"}],
"numeric": [{"name": "age"}],
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users"
)
```
This will change the index specification to
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users2 | HASH | ['doc:users2'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
and throw a warning to the user (log output) that the generated schema
does not match the specified schema.
```text
index_schema does not match generated schema from metadata.
index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
```
As long as this is on purpose, this is fine.
The schema can be defined as a yaml file or a dictionary
```yaml
text:
- name: user
- name: job
tag:
- name: credit_score
numeric:
- name: age
```
and you pass in a path like
```python
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
index_schema=Path("sample1.yml").resolve()
)
```
Which will create the same schema as defined in the dictionary example
Index Information:
| Index Name | Storage Type | Prefixes | Index Options | Indexing |
|--------------|----------------|----------------|-----------------|------------|
| users3 | HASH | ['doc:users3'] | [] | 0 |
Index Fields:
| Name | Attribute | Type | Field Option | Option Value |
|----------------|----------------|---------|----------------|----------------|
| user | user | TEXT | WEIGHT | 1 |
| job | job | TEXT | WEIGHT | 1 |
| content | content | TEXT | WEIGHT | 1 |
| credit_score | credit_score | TAG | SEPARATOR | , |
| age | age | NUMERIC | | |
| content_vector | content_vector | VECTOR | | |
### Custom Vector Indexing Schema
Users with large use cases may want to change how they formulate the
vector index created by Langchain
To utilize all the features of Redis for vector database use cases like
this, you can now do the following to pass in index attribute modifiers
like changing the indexing algorithm to HNSW.
```python
vector_schema = {
"algorithm": "HNSW"
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
vector_schema=vector_schema
)
```
A more complex example may look like
```python
vector_schema = {
"algorithm": "HNSW",
"ef_construction": 200,
"ef_runtime": 20
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users3",
vector_schema=vector_schema
)
```
All names correspond to the arguments you would set if using Redis-py or
RedisVL. (put in doc link later)
### Better Querying
Both vector queries and Range (limit) queries are now available and
metadata is returned by default. The outputs are shown.
```python
>>> query = "foo"
>>> results = rds.similarity_search(query, k=1)
>>> print(results)
[Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})]
>>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False)
>>> print(results) # no metadata, but with scores
[(Document(page_content='foo', metadata={}), 7.15255737305e-07)]
>>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001)
>>> print(len(results)) # range query (only above threshold even if k is higher)
4
```
### Custom metadata filtering
A big advantage of Redis in this space is being able to do filtering on
data stored alongside the vector itself. With the example above, the
following is now possible in langchain. The equivalence operators are
overridden to describe a new expression language that mimic that of
[redisvl](redisvl.com). This allows for arbitrarily long sequences of
filters that resemble SQL commands that can be used directly with vector
queries and range queries.
There are two interfaces by which to do so and both are shown.
```python
>>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText
>>> age_filter = RedisFilter.num("age") > 18
>>> age_filter = RedisNum("age") > 18 # equivalent
>>> results = rds.similarity_search(query, filter=age_filter)
>>> print(len(results))
3
>>> job_filter = RedisFilter.text("job") == "engineer"
>>> job_filter = RedisText("job") == "engineer" # equivalent
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# fuzzy match text search
>>> job_filter = RedisFilter.text("job") % "eng*"
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2
# combined filters (AND)
>>> combined = age_filter & job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
1
# combined filters (OR)
>>> combined = age_filter | job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
4
```
All the above filter results can be checked against the data above.
### Other
- Issue: #3967
- Dependencies: No added dependencies
- Tag maintainer: @hwchase17 @baskaryan @rlancemartin
- Twitter handle: @sampartee
---------
Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This adds Xata as a memory store also to the python version of
LangChain, similar to the [one for
LangChain.js](https://github.com/hwchase17/langchainjs/pull/2217).
I have added a Jupyter Notebook with a simple and a more complex example
using an agent.
To run the integration test, you need to execute something like:
```
XATA_API_KEY='xau_...' XATA_DB_URL="https://demo-uni3q8.eu-west-1.xata.sh/db/langchain" poetry run pytest tests/integration_tests/memory/test_xata.py
```
Where `langchain` is the database you create in Xata.