langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00


Lance Martin 4092fd21dc YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)

This introduces the `YoutubeAudioLoader`, which downloads the audio from YouTube URLs and saves it to disk as blobs. The blobs are then parsed by `OpenAIWhisperParser()`, as shown in this [PR](https://github.com/hwchase17/langchain/pull/5580). We also extend the parser to split the audio so that each chunk stays under OpenAI's 25MB file-size limit.

As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
```

Tested on the full set of Karpathy lecture videos:

```
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser

# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0", "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I", "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI", "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]
# Directory to save audio files
save_dir = "~/Downloads/YouTube"
# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()
```

2023-06-06 15:15:08 -07:00
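The parser update in #5772 splits long audio so each piece fits OpenAI's 25MB upload limit. As a minimal sketch of one way to plan such splits, the helper below assumes a roughly constant bitrate, so a span's size is estimated as proportional to its duration. `plan_chunks`, `MAX_BYTES`, and the byte/millisecond inputs are hypothetical illustrations, not code taken from `OpenAIWhisperParser`.

```python
# Assumed limit: 25 MB read as decimal megabytes, which is the conservative
# choice (it is smaller than 25 MiB).
MAX_BYTES = 25_000_000


def plan_chunks(total_bytes: int, duration_ms: int,
                max_bytes: int = MAX_BYTES) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) spans whose estimated size is <= max_bytes.

    Hypothetical helper: assumes a constant bitrate, so a span of length
    span_ms is estimated at total_bytes * span_ms / duration_ms bytes.
    """
    if total_bytes <= 0 or duration_ms <= 0:
        return []
    # Longest whole-millisecond span whose estimated size still fits the limit.
    # Integer arithmetic keeps the bound exact (no float rounding).
    chunk_ms = (max_bytes * duration_ms) // total_bytes
    if chunk_ms == 0:
        raise ValueError("even 1 ms of audio exceeds max_bytes")
    # Walk the timeline in fixed steps; the last span may be shorter.
    return [(start, min(start + chunk_ms, duration_ms))
            for start in range(0, duration_ms, chunk_ms)]


# Example: a 60 MB, one-hour file is planned as three spans of at most 25 minutes.
print(plan_chunks(60_000_000, 3_600_000))
```

Re-encoding each span can change its size, so in practice it is wise to pass a `max_bytes` somewhat below the real limit to leave headroom.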
_static	docs: Big Mendable Improvements (#4964)	2023-05-19 15:31:48 -07:00
additional_resources	docs: Added Deploying LLMs into production + a new ecosystem (#4047)	2023-06-05 12:47:27 -07:00
ecosystem	docs: Added Deploying LLMs into production + a new ecosystem (#4047)	2023-06-05 12:47:27 -07:00
getting_started	Update tutorials.md (#5761)	2023-06-05 20:37:11 -07:00
integrations	Revise DATABRICKS_API_TOKEN as DATABRICKS_TOKEN (#5796)	2023-06-06 14:22:49 -07:00
modules	YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)	2023-06-06 15:15:08 -07:00
reference	Documentation fixes (linting and broken links) (#5563)	2023-06-01 13:06:17 -07:00
templates	docs `ecosystem/integrations` update 3 (#5470)	2023-05-31 17:54:05 -07:00
tracing	py tracer fixes (#5377)	2023-05-30 18:47:06 -07:00
use_cases	minor refactor GenerativeAgentMemory (#5315)	2023-06-03 14:53:14 -07:00
conf.py
dependents.md	docs: updated `ecosystem/dependents` (#5753)	2023-06-05 16:09:55 -07:00
index.rst	docs: Added Deploying LLMs into production + a new ecosystem (#4047)	2023-06-05 12:47:27 -07:00
integrations.rst	docs: `ecosystem/integrations` update 1 (#5219)	2023-05-29 07:25:17 -07:00
make.bat
Makefile
reference.rst	docs: `Deployments` page moved into `Ecosystem/` (#4949)	2023-05-21 21:18:22 -07:00
requirements.txt