langchain/tests/unit_tests
Lance Martin 4092fd21dc
YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)
This introduces the `YoutubeAudioLoader`, which loads audio blobs from a
YouTube URL and writes them to disk. Blobs are then parsed by
`OpenAIWhisperParser()`, as shown in this
[PR](https://github.com/hwchase17/langchain/pull/5580), but we extend
the parser to split the audio so that each chunk meets the 25MB OpenAI
size limit. As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
``` 
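
The chunking idea mentioned above can be illustrated with a little arithmetic. This is only a sketch, not the parser's actual implementation: the helper names `chunk_duration_ms` and `split_ranges` are hypothetical, the audio is assumed to be encoded at a constant bitrate, and 25MB is taken to mean 25 * 1024 * 1024 bytes.

```python
from typing import Iterator, Tuple

# Assumption: "25MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def chunk_duration_ms(bitrate_kbps: int, max_bytes: int = MAX_BYTES) -> int:
    """Longest chunk (in ms) that stays within max_bytes at a constant bitrate.

    Hypothetical helper: kilobits per second divided by 8 gives bytes per ms.
    """
    if bitrate_kbps <= 0:
        raise ValueError("bitrate_kbps must be positive")
    bytes_per_ms = bitrate_kbps / 8
    return int(max_bytes / bytes_per_ms)


def split_ranges(total_ms: int, chunk_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_ms, end_ms) windows covering the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    for start in range(0, total_ms, chunk_ms):
        yield start, min(start + chunk_ms, total_ms)


if __name__ == "__main__":
    # A two-hour recording at an assumed 128 kbps constant bitrate.
    limit = chunk_duration_ms(128)
    for start, end in split_ranges(2 * 60 * 60 * 1000, limit):
        print(f"transcribe {start} ms to {end} ms")
```

Each window could then be exported as its own audio segment and sent for transcription separately, with the resulting texts collected in order.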

Tested on full set of Karpathy lecture videos:

```
# Import paths assumed from the langchain layout at the time of this commit;
# they may have moved in later releases.
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0",
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files 
save_dir = "~/Downloads/YouTube"
 
# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()
```
1 year ago
agents Raise an exception in MRKL and Chat Output Parsers if parsing text which contains both an action and a final answer (#5609) 1 year ago
callbacks py tracer fixes (#5377) 1 year ago
chains support returning run info for llms, chat models and chains (#5666) 1 year ago
chat_models Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575) 1 year ago
client Use client from LCP-SDK (#5695) 1 year ago
data Prompt from file proof of concept using plain text (#127) 2 years ago
docstore Add `DocstoreFn` - lookup doc via arbitrary function (#3760) 1 year ago
document_loaders YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772) 1 year ago
evaluation Adding an in-context QA evaluation chain + chain of thought reasoning chain for improved accuracy (#2444) 1 year ago
examples feat #4479: TextLoader auto detect encoding and improved exceptions (#4927) 1 year ago
llms Add Invocation Params (#4509) 1 year ago
memory Implemented appending arbitrary messages (#5293) 1 year ago
output_parsers convert the parameter 'text' to uppercase in the function 'parse' of the class BooleanOutputParser (#5397) 1 year ago
prompts Harrison/pipeline prompt (#5540) 1 year ago
retrievers Zep Hybrid Search (#5742) 1 year ago
tools Harrison/pubmed integration (#5664) 1 year ago
utilities Fix graphql tool (#4984) 1 year ago
vectorstores Add maximal relevance search to SKLearnVectorStore (#5430) 1 year ago
__init__.py initial commit 2 years ago
conftest.py Add pytest --only-extended and --only-core options (#4494) 1 year ago
test_bash.py Add Mastodon toots loader (#5036) 1 year ago
test_dependencies.py Use client from LCP-SDK (#5695) 1 year ago
test_document_transformers.py Contextual compression retriever (#2915) 1 year ago
test_formatting.py initial commit 2 years ago
test_math_utils.py add get_top_k_cosine_similarity method to get max top k score and index (#5059) 1 year ago
test_pytest_config.py Block sockets for unit-tests (#4803) 1 year ago
test_python.py option for csv agent to not include df in prompt (#4610) 1 year ago
test_schema.py [simple][test] Added test case for schema.py (#3692) 1 year ago
test_sql_database.py Fix SQLAlchemy truncating text when it is too big (#5206) 1 year ago
test_sql_database_schema.py Suppress duckdb warning in unit tests explicitly (#3653) 1 year ago
test_text_splitter.py Attribute support for html tags (#5782) 1 year ago