langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-06 03:20:49 +00:00


Lance Martin 4092fd21dc YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)	2023-06-06 15:15:08 -07:00

This introduces the `YoutubeAudioLoader`, which loads blobs from a YouTube URL and writes them. Blobs are then parsed by `OpenAIWhisperParser()`, as shown in this [PR](https://github.com/hwchase17/langchain/pull/5580), but we extend the parser to split audio so that each chunk meets the 25MB OpenAI size limit.

As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
```

Tested on the full set of Karpathy lecture videos:

```
# Import paths are assumed from the langchain package layout at the time of this commit
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser

# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0",
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()
```
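
The splitting step mentioned in the commit message can be illustrated with a minimal sketch. This is not the parser's actual implementation: the helper name `chunk_spans_ms`, the constant-bitrate size estimate, and reading "25MB" as 25 * 1024 * 1024 bytes are all assumptions made here for illustration.

```python
from typing import List, Tuple

# Assumption: "25MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def chunk_spans_ms(
    duration_ms: int,
    bytes_per_second: int,
    max_bytes: int = MAX_BYTES,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans whose estimated size is at most max_bytes.

    Hypothetical helper: it assumes constant-bitrate audio, so that size is
    proportional to duration.
    """
    if duration_ms < 0 or bytes_per_second <= 0 or max_bytes <= 0:
        raise ValueError("duration must be >= 0; bitrate and limit must be > 0")

    # Longest chunk, in milliseconds, that still fits under the size limit.
    max_chunk_ms = (max_bytes * 1000) // bytes_per_second
    if max_chunk_ms == 0:
        raise ValueError("size limit is too small for this bitrate")

    spans = []
    start = 0
    while start < duration_ms:
        end = min(start + max_chunk_ms, duration_ms)
        spans.append((start, end))
        start = end
    return spans


# Example: one hour of 128 kbit/s audio (16,000 bytes per second).
spans = chunk_spans_ms(3_600_000, 16_000)
```

Each span could then be cut out of the audio file and sent to the transcription endpoint separately, with the resulting texts concatenated in order.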
blob_loaders	YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)	2023-06-06 15:15:08 -07:00
loaders	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00
parsers	Create OpenAIWhisperParser for generating Documents from audio files (#5580)	2023-06-05 15:51:13 -07:00
sample_documents	Bibtex integration for document loader and retriever (#5137)	2023-05-25 00:21:31 -07:00
test_docs	Allow readthedoc loader to pass custom html tag (#5175)	2023-05-24 10:40:27 -07:00
__init__.py	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00
test_base.py	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00
test_bibtex.py	Bibtex integration for document loader and retriever (#5137)	2023-05-25 00:21:31 -07:00
test_bshtml.py	Add html parsers (#4874)	2023-05-17 22:39:11 -04:00
test_confluence.py	Add Confluence Loader unit tests (#3333)	2023-05-16 15:17:07 -07:00
test_csv_loader.py	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00
test_detect_encoding.py	feat #4479: TextLoader auto detect encoding and improved exceptions (#4927)	2023-05-18 09:55:14 -04:00
test_directory.py	Add path validation to DirectoryLoader (#5327)	2023-05-28 15:31:23 -04:00
test_evernote_loader.py	feature/4493 Improve Evernote Document Loader (#4577)	2023-05-19 14:28:17 -07:00
test_generic_loader.py	Add a generic document loader (#4875)	2023-05-17 22:38:55 -04:00
test_github.py	DocumentLoader for GitHub (#5408)	2023-05-29 20:11:21 -07:00
test_json_loader.py	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00
test_psychic.py	Harrison/psychic (#5063)	2023-05-21 09:13:20 -07:00
test_readthedoc.py	Allow readthedoc loader to pass custom html tag (#5175)	2023-05-24 10:40:27 -07:00
test_telegram.py	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00
test_trello.py	New Trello document loader (#4767)	2023-05-29 19:47:56 -07:00
test_web_base.py	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00
test_youtube.py	fix(document_loaders/telegram): fix pandas calls + add tests (#4806)	2023-05-16 14:35:25 -07:00