langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Lance Martin 4092fd21dc YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)

This introduces the `YoutubeAudioLoader`, which loads blobs from a YouTube URL and writes them to a local directory. The blobs are then parsed by `OpenAIWhisperParser()`, as shown in this [PR](https://github.com/hwchase17/langchain/pull/5580), but we extend the parser to split the audio so that each chunk meets the 25MB OpenAI size limit. As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
```

Tested on the full set of Karpathy lecture videos:

```
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0",
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()
```

2023-06-06 15:15:08 -07:00
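The size-limit splitting mentioned in the commit message can be illustrated with a little arithmetic. The sketch below is not the parser's actual implementation: `chunk_spans` is a hypothetical helper, the audio is assumed to have a constant bitrate, and the 25MB limit is read conservatively as 25,000,000 bytes. It only shows how to pick equal-length time spans so that no chunk exceeds the limit.

```python
import math

# Assumption: the 25MB limit is read conservatively as 25,000,000 bytes.
MAX_BYTES = 25 * 1000 * 1000


def chunk_spans(duration_s, bitrate_bps, max_bytes=MAX_BYTES):
    """Return (start, end) times in seconds for equal-length chunks.

    Hypothetical helper: assumes constant-bitrate audio, so the size of a
    chunk is simply its duration times the number of bytes per second.
    """
    if duration_s <= 0:
        return []
    bytes_per_second = bitrate_bps / 8
    # Longest chunk duration that still fits within the size limit.
    max_chunk_s = max_bytes / bytes_per_second
    # Smallest number of chunks that keeps every chunk within the limit.
    n_chunks = math.ceil(duration_s / max_chunk_s)
    chunk_s = duration_s / n_chunks
    return [
        (i * chunk_s, min((i + 1) * chunk_s, duration_s))
        for i in range(n_chunks)
    ]


# Example: a two-hour lecture encoded at 128 kbit/s.
spans = chunk_spans(7200, 128_000)
print(len(spans), spans[0], spans[-1])
```

For variable-bitrate audio the duration-to-size mapping is only approximate, so a real splitter would need a safety margin or would have to check the encoded size of each chunk.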
..
agents	Harrison/pubmed integration (#5664)	2023-06-03 16:25:28 -07:00
callbacks	FileCallbackHandler (#5589)	2023-06-03 16:48:48 -07:00
chains	Fixed multi input prompt for MapReduceChain (#4979)	2023-06-03 14:41:03 -07:00
indexes	YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)	2023-06-06 15:15:08 -07:00
memory	fix: correct momento chat history notebook typo and title (#5646)	2023-06-03 16:39:27 -07:00
models	Revise DATABRICKS_API_TOKEN as DATABRICKS_TOKEN (#5796)	2023-06-06 14:22:49 -07:00
prompts	Harrison/pipeline prompt (#5540)	2023-06-04 14:29:37 -07:00
utils/examples	Pass parsed inputs through to tool _run (#4309)	2023-05-08 09:13:05 -07:00
agents.rst	docs: `modules` pages simplified (#5116)	2023-06-03 14:44:32 -07:00
chains.rst	docs: `modules` pages simplified (#5116)	2023-06-03 14:44:32 -07:00
indexes.rst	docs: `modules` pages simplified (#5116)	2023-06-03 14:44:32 -07:00
memory.rst	docs: `modules` pages simplified (#5116)	2023-06-03 14:44:32 -07:00
models.rst	docs: `modules` pages simplified (#5116)	2023-06-03 14:44:32 -07:00
paul_graham_essay.txt
prompts.rst	docs: `modules` pages simplified (#5116)	2023-06-03 14:44:32 -07:00
state_of_the_union.txt