4092fd21dc
This introduces the `YoutubeAudioLoader`, which will load blobs from a YouTube url and write them. Blobs are then parsed by `OpenAIWhisperParser()`, as show in this [PR](https://github.com/hwchase17/langchain/pull/5580), but we extend the parser to split audio such that each chuck meets the 25MB OpenAI size limit. As shown in the notebook, this enables a very simple UX: ``` # Transcribe the video to text loader = GenericLoader(YoutubeAudioLoader([url],save_dir),OpenAIWhisperParser()) docs = loader.load() ``` Tested on full set of Karpathy lecture videos: ``` # Karpathy lecture videos urls = ["https://youtu.be/VMj-3S1tku0" "https://youtu.be/PaCmpygFfXo", "https://youtu.be/TCH_1BHY58I", "https://youtu.be/P6sfmUTpUmc", "https://youtu.be/q8SA3rM6ckI", "https://youtu.be/t3YJ5hKiMQ0", "https://youtu.be/kCc8FmEb1nY"] # Directory to save audio files save_dir = "~/Downloads/YouTube" # Transcribe the videos to text loader = GenericLoader(YoutubeAudioLoader(urls,save_dir),OpenAIWhisperParser()) docs = loader.load() ``` |
||
---|---|---|
.. | ||
integration_tests | ||
mock_servers | ||
unit_tests | ||
__init__.py | ||
data.py | ||
README.md |
Readme tests(draft)
Integrations Tests
Prepare
This repository contains functional tests for several search engines and databases. The tests aim to verify the correct behavior of the engines and databases according to their specifications and requirements.
To run some integration tests, such as tests located in
tests/integration_tests/vectorstores/
, you will need to install the following
software:
- Docker
- Python 3.8.1 or later
We have optional group test_integration
in the pyproject.toml
file. This group
should contain dependencies for the integration tests and can be installed using the
command:
poetry install --with test_integration
Any new dependencies should be added by running:
# add package and install it after adding:
poetry add tiktoken@latest --group "test_integration" && poetry install --with test_integration
Before running any tests, you should start a specific Docker container that has all the
necessary dependencies installed. For instance, we use the elasticsearch.yml
container
for test_elasticsearch.py
:
cd tests/integration_tests/vectorstores/docker-compose
docker-compose -f elasticsearch.yml up
Prepare environment variables for local testing:
- copy
tests/.env.example
totests/.env
- set variables in
tests/.env
file, e.gOPENAI_API_KEY
Additionally, it's important to note that some integration tests may require certain
environment variables to be set, such as OPENAI_API_KEY
. Be sure to set any required
environment variables before running the tests to ensure they run correctly.
Recording HTTP interactions with pytest-vcr
Some of the integration tests in this repository involve making HTTP requests to external services. To prevent these requests from being made every time the tests are run, we use pytest-vcr to record and replay HTTP interactions.
When running tests in a CI/CD pipeline, you may not want to modify the existing cassettes. You can use the --vcr-record=none command-line option to disable recording new cassettes. Here's an example:
pytest --log-cli-level=10 tests/integration_tests/vectorstores/test_pinecone.py --vcr-record=none
pytest tests/integration_tests/vectorstores/test_elasticsearch.py --vcr-record=none
Run some tests with coverage:
pytest tests/integration_tests/vectorstores/test_elasticsearch.py --cov=langchain --cov-report=html
start "" htmlcov/index.html || open htmlcov/index.html