Now it is hard to search for the integration points between
data_loaders, retrievers, tools, etc.
I've placed links to all groups of providers and integrations on the
`ecosystem` page.
So, it is easy to navigate between all integrations from a single
location.
### Background
Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search_by_vector` method.
### Changes
- a `max_marginal_relevance_search_by_vector` method implementation has
been added in `weaviate.py`
- tests have been added to the the new method
- vcr cassettes have been added for the weaviate tests
### Test Plan
Added tests for the `max_marginal_relevance_search_by_vector`
implementation
### Change Safety
- [x] I have added tests to cover my changes
kwargs shoud be passed into cls so that opensearch client can be
properly initlized in __init__(). Otherwise logic like below will not
work. as auth will not be passed into __init__
```python
docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url="http://localhost:9200")
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
```
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-28-97.ec2.internal>
- Proactively raise error if a tool subclasses BaseTool, defines its
own schema, but fails to add the type-hints
- fix the auto-inferred schema of the decorator to strip the
unneeded virtual kwargs from the schema dict
Helps avoid silent instances of #3297
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3
Notes
* please double check if poetry is not messed up (thanks!)
Asks
* Would be great to create a shared slack channel for quick questions
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
This PR addresses several improvements:
- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a per
request pagination limit. This, in combination with the fact that
atlassian seem to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
- It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
Small improvements for the YouTube loader:
a) use the YouTube API permission scope instead of Google Drive
b) bugfix: allow transcript loading for single videos
c) an additional parameter "continue_on_failure" for cases when videos
in a playlist do not have transcription enabled.
d) support automated translation for all languages, if available.
---------
Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
The detailed walkthrough of the Weaviate wrapper was pointing to the
getting-started notebook. Fixed it to point to the Weaviable notebook in
the examples folder.
This pull request adds a ChatGPT document loader to the document loaders
module in `langchain/document_loaders/chatgpt.py`. Additionally, it
includes an example Jupyter notebook in
`docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
which uses fake sample data based on the original structure of the
`conversations.json` file.
The following files were added/modified:
- `langchain/document_loaders/__init__.py`
- `langchain/document_loaders/chatgpt.py`
- `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
-
`docs/modules/indexes/document_loaders/examples/example_data/fake_conversations.json`
This pull request was made in response to the recent release of ChatGPT
data exports by email:
https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proved can be used with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for
you.
AnalyticDB is a distributed Alibaba Cloud-Native vector database. It
works better when data comes to large scale. The PR includes:
- [x] A new memory: AnalyticDBVector
- [x] A suite of integration tests verifies the AnalyticDB integration
I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
handles error when youtube video has transcripts disabled
```
youtube_transcript_api._errors.TranscriptsDisabled:
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=<URL> This is most likely caused by:
Subtitles are disabled for this video
If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
```
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>