langchain

History

Lance Martin 4092fd21dc YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772 )

This introduces the `YoutubeAudioLoader`, which loads blobs from a YouTube URL and writes them to a save directory. Blobs are then parsed by `OpenAIWhisperParser()`, as shown in this [PR](https://github.com/hwchase17/langchain/pull/5580), but we extend the parser to split audio such that each chunk meets the 25MB OpenAI size limit. As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
```

Tested on the full set of Karpathy lecture videos:

```
# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0",
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()
```

2023-06-06 15:15:08 -07:00
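The splitting step described in the commit message can be pictured as cutting the audio into fixed-duration windows short enough that each exported chunk stays under the 25MB limit. The following is a minimal sketch of that idea only: the helper name `chunk_spans` and the 20-minute default window are assumptions for illustration, not taken from the commit, and the function computes offsets without touching any audio data.

```python
def chunk_spans(total_ms, chunk_ms=20 * 60 * 1000):
    """Return (start, end) millisecond offsets that cover [0, total_ms).

    Hypothetical helper: the 20-minute default is an illustrative
    assumption, chosen so that a typical compressed chunk would stay
    below a 25MB upload limit.
    """
    if total_ms < 0 or chunk_ms <= 0:
        raise ValueError("total_ms must be >= 0 and chunk_ms must be > 0")
    # Step through the recording in chunk_ms strides; the last span is
    # clipped to the end of the recording.
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


# A 50-minute recording yields two full windows and one shorter tail.
spans = chunk_spans(50 * 60 * 1000)
```

Each span could then be sliced out of the audio and sent to the transcription API separately, with the resulting texts concatenated in order.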
..
agents	Raise an exception in MRKL and Chat Output Parsers if parsing text which contains both an action and a final answer (#5609 )	2023-06-04 14:40:49 -07:00
callbacks	py tracer fixes (#5377 )	2023-05-30 18:47:06 -07:00
chains	support returning run info for llms, chat models and chains (#5666 )	2023-06-06 10:07:46 -07:00
chat_models	Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575 )	2023-05-01 15:23:16 -07:00
client	Use client from LCP-SDK (#5695 )	2023-06-06 06:51:05 -07:00
data
docstore	Add `DocstoreFn` - lookup doc via arbitrary function (#3760 )	2023-04-28 19:50:32 -07:00
document_loaders	YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772 )	2023-06-06 15:15:08 -07:00
evaluation
examples	feat #4479 : TextLoader auto detect encoding and improved exceptions (#4927 )	2023-05-18 09:55:14 -04:00
llms	Add Invocation Params (#4509 )	2023-05-11 15:34:06 -07:00
memory	Implemented appending arbitrary messages (#5293 )	2023-05-29 07:18:59 -07:00
output_parsers	convert the parameter 'text' to uppercase in the function 'parse' of the class BooleanOutputParser (#5397 )	2023-05-30 16:26:17 -07:00
prompts	Harrison/pipeline prompt (#5540 )	2023-06-04 14:29:37 -07:00
retrievers	Zep Hybrid Search (#5742 )	2023-06-05 12:59:28 -07:00
tools	Harrison/pubmed integration (#5664 )	2023-06-03 16:25:28 -07:00
utilities	Fix graphql tool (#4984 )	2023-05-19 15:27:50 -07:00
vectorstores	Add maximal relevance search to SKLearnVectorStore (#5430 )	2023-05-30 16:13:33 -07:00
__init__.py
conftest.py	Add pytest --only-extended and --only-core options (#4494 )	2023-05-12 11:35:22 -04:00
test_bash.py	Add Mastodon toots loader (#5036 )	2023-05-22 16:43:07 -07:00
test_dependencies.py	Use client from LCP-SDK (#5695 )	2023-06-06 06:51:05 -07:00
test_document_transformers.py	Contextual compression retriever (#2915 )	2023-04-20 17:01:14 -07:00
test_formatting.py
test_math_utils.py	add get_top_k_cosine_similarity method to get max top k score and index (#5059 )	2023-05-22 11:55:48 -07:00
test_pytest_config.py	Block sockets for unit-tests (#4803 )	2023-05-16 14:41:24 -04:00
test_python.py	option for csv agent to not include df in prompt (#4610 )	2023-05-12 21:55:22 -07:00
test_schema.py	[simple][test] Added test case for schema.py (#3692 )	2023-04-28 20:42:24 -07:00
test_sql_database_schema.py	Suppress duckdb warning in unit tests explicitly (#3653 )	2023-04-27 14:29:41 -04:00
test_sql_database.py	Fix SQLAlchemy truncating text when it is too big (#5206 )	2023-06-01 21:33:31 -04:00
test_text_splitter.py	Attribute support for html tags (#5782 )	2023-06-06 09:27:37 -07:00