langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00


Mr. Lance E Sloan «UMich» 84dc2dd059 2024-06-11 17:44:36 +00:00

community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710)

- Description: Add a new format, `CHUNKS`, to `langchain_community.document_loaders.youtube.YoutubeLoader` which creates multiple `Document` objects from YouTube video transcripts (captions), each of a fixed duration. The metadata of each chunk `Document` includes the start time of each one and a URL to that time in the video on the YouTube website. I had implemented this for UMich (@umich-its-ai) in a local module, but it makes sense to contribute this to LangChain community for all to benefit and to simplify maintenance.
- Issue: N/A
- Dependencies: N/A
- Twitter: lsloan_umich
- Mastodon: [lsloan@mastodon.social](https://mastodon.social/@lsloan)

With regards to tests and documentation, most existing features of the `YoutubeLoader` class are not tested. Only the `YoutubeLoader.extract_video_id()` static method had a test. However, while I was waiting for this PR to be reviewed and merged, I had time to add a test for the chunking feature I've proposed in this PR.

I have added an example of using chunking to the `docs/docs/integrations/document_loaders/youtube_transcript.ipynb` notebook.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
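
The commit message above describes grouping transcript captions into fixed-duration chunks, each carrying its start time and a URL to that moment in the video. The following is a minimal, standalone sketch of that idea and not the code merged in the PR. The names `chunk_transcript` and `_make_chunk`, the caption record shape (`text` and `start` keys), the output keys, and the `VIDEO_ID` placeholder are all assumptions made for illustration. A chunk here begins at the start time of its first caption and closes when a later caption starts at or beyond the window, so chunk lengths are only approximately fixed.

```python
from typing import Dict, List, Union

# Hypothetical caption record: caption text plus its start time in seconds.
Piece = Dict[str, Union[str, float]]
Chunk = Dict[str, Union[str, int]]


def _make_chunk(texts: List[str], start_seconds: int, video_id: str) -> Chunk:
    """Build one chunk with its text, start time, and a link to that time."""
    return {
        "text": " ".join(texts),
        "start_seconds": start_seconds,
        # Assumed URL shape: a watch link with a `t=<seconds>s` parameter.
        "source": f"https://www.youtube.com/watch?v={video_id}&t={start_seconds}s",
    }


def chunk_transcript(
    pieces: List[Piece], video_id: str, chunk_size_seconds: int = 120
) -> List[Chunk]:
    """Group caption pieces into chunks of roughly `chunk_size_seconds` each."""
    chunks: List[Chunk] = []
    texts: List[str] = []
    chunk_start = 0

    for piece in pieces:
        start = float(piece["start"])
        # Close the current chunk once a caption starts outside its window.
        if texts and start >= chunk_start + chunk_size_seconds:
            chunks.append(_make_chunk(texts, chunk_start, video_id))
            texts = []
        # The first caption of a chunk fixes that chunk's start time.
        if not texts:
            chunk_start = int(start)
        texts.append(str(piece["text"]).strip())

    # Emit whatever is left after the last caption.
    if texts:
        chunks.append(_make_chunk(texts, chunk_start, video_id))
    return chunks


if __name__ == "__main__":
    # Made-up captions, purely for demonstration.
    example = [
        {"text": "Hello", "start": 0.0},
        {"text": "world", "start": 4.0},
        {"text": "second chunk", "start": 31.5},
    ]
    for chunk in chunk_transcript(example, "VIDEO_ID", chunk_size_seconds=30):
        print(chunk["start_seconds"], chunk["source"], chunk["text"])
```

For the loader's real interface and options, the notebook named in the commit message is the place to look; this sketch only mirrors the behavior described there.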
adapters
agent_toolkits	community[patch]: Fix remaining __inits__ in community (#22037)	2024-05-22 17:42:17 +00:00
agents	community: update how OpenAIAssistantV2Runnable creates threads with tool_resources (#22549)	2024-06-05 14:19:41 -04:00
callbacks	community[patch]: Add missing type annotations (#22758)	2024-06-10 16:59:28 -04:00
chains	community[patch]: Add missing type annotations (#22758)	2024-06-10 16:59:28 -04:00
chat_loaders	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
chat_message_histories	community[minor]: Add native async support to SQLChatMessageHistory (#22065)	2024-06-05 15:10:38 +00:00
chat_models	Ollama vision support (#22734)	2024-06-11 16:10:19 +00:00
cross_encoders
docstore	community[patch]: Fix remaining __inits__ in community (#22037)	2024-05-22 17:42:17 +00:00
document_compressors	community[minor]: add Volcengine Rerank (#22700)	2024-06-10 13:41:05 -07:00
document_loaders	community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710)	2024-06-11 17:44:36 +00:00
document_transformers	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
embeddings	community[minor]: Add support for OVHcloud AI Endpoints Embedding (#22667)	2024-06-10 21:07:25 +00:00
example_selectors
graphs	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
indexes
llms	community[patch]: Add missing type annotations (#22758)	2024-06-10 16:59:28 -04:00
memory	community[minor]: Add Zep Cloud components + docs + examples (#21671)	2024-05-27 12:50:13 -07:00
output_parsers	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
query_constructors
retrievers	community[patch]: Add missing type annotations (#22758)	2024-06-10 16:59:28 -04:00
storage	community[minor]: fix redis store docstring and streamline initialization code (#22730)	2024-06-11 14:08:05 +00:00
tools	community[patch]: Add missing type annotations (#22758)	2024-06-10 16:59:28 -04:00
utilities	community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676)	2024-06-11 10:34:01 -07:00
utils	community[patch]: Use Custom Logger Instead of Root Logger in get_user_agent Function (#22691)	2024-06-08 02:33:07 +00:00
vectorstores	community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676)	2024-06-11 10:34:01 -07:00
__init__.py
cache.py
py.typed