langchain/docs
Lance Martin 4092fd21dc
YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)
This introduces the `YoutubeAudioLoader`, which loads audio blobs from a
YouTube URL and writes them to a local directory. The blobs are then parsed
by `OpenAIWhisperParser()`, as shown in this
[PR](https://github.com/hwchase17/langchain/pull/5580), but we extend
the parser to split audio so that each chunk meets the 25MB OpenAI
size limit. As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
``` 
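
The splitting step mentioned above can be sketched roughly as follows. This is a minimal illustration and not the parser's actual code: the helper names `max_chunk_ms` and `chunk_ranges`, the constant-bitrate assumption, and reading 25MB as 25 * 2**20 bytes are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumption: "25MB" is read here as 25 * 2**20 bytes.
MAX_BYTES = 25 * 1024 * 1024


def max_chunk_ms(bitrate_kbps: int, max_bytes: int = MAX_BYTES) -> int:
    """Longest duration (ms) that stays within max_bytes at a constant bitrate.

    Hypothetical helper: real audio may be variable-bitrate, so a safety
    margin would be sensible in practice.
    """
    if bitrate_kbps <= 0:
        raise ValueError("bitrate_kbps must be positive")
    bytes_per_ms = bitrate_kbps / 8  # kbit/s -> bytes per millisecond
    return int(max_bytes / bytes_per_ms)


def chunk_ranges(duration_ms: int, chunk_ms: int) -> List[Tuple[int, int]]:
    """Split [0, duration_ms) into consecutive (start, end) ranges of at most chunk_ms."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


# Example: a 50-minute recording split into chunks of at most 20 minutes.
minute_ms = 60 * 1000
ranges = chunk_ranges(50 * minute_ms, 20 * minute_ms)
```

Each range could then be cut from the audio and sent to the transcription endpoint separately, with the resulting text joined or kept as one document per chunk.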

Tested on full set of Karpathy lecture videos:

```
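# Import paths are assumed from the langchain package layout at the time of this change
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser

# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0",
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()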
```
2023-06-06 15:15:08 -07:00
_static docs: Big Mendable Improvements (#4964) 2023-05-19 15:31:48 -07:00
additional_resources docs: Added Deploying LLMs into production + a new ecosystem (#4047) 2023-06-05 12:47:27 -07:00
ecosystem docs: Added Deploying LLMs into production + a new ecosystem (#4047) 2023-06-05 12:47:27 -07:00
getting_started Update tutorials.md (#5761) 2023-06-05 20:37:11 -07:00
integrations Revise DATABRICKS_API_TOKEN as DATABRICKS_TOKEN (#5796) 2023-06-06 14:22:49 -07:00
modules YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772) 2023-06-06 15:15:08 -07:00
reference Documentation fixes (linting and broken links) (#5563) 2023-06-01 13:06:17 -07:00
templates docs ecosystem/integrations update 3 (#5470) 2023-05-31 17:54:05 -07:00
tracing py tracer fixes (#5377) 2023-05-30 18:47:06 -07:00
use_cases minor refactor GenerativeAgentMemory (#5315) 2023-06-03 14:53:14 -07:00
conf.py docs: Mendable Search integration (#2803) 2023-04-13 21:52:25 -07:00
dependents.md docs: updated ecosystem/dependents (#5753) 2023-06-05 16:09:55 -07:00
index.rst docs: Added Deploying LLMs into production + a new ecosystem (#4047) 2023-06-05 12:47:27 -07:00
integrations.rst docs: ecosystem/integrations update 1 (#5219) 2023-05-29 07:25:17 -07:00
make.bat
Makefile
reference.rst docs: Deployments page moved into Ecosystem/ (#4949) 2023-05-21 21:18:22 -07:00
requirements.txt Harrison/docs reqs (#2199) 2023-03-30 08:20:30 -07:00