langchain

Mirror of https://github.com/hwchase17/langchain, synced 2024-11-06 03:20:49 +00:00

Mr. Lance E Sloan «UMich» 84dc2dd059 community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710)

- Description: Add a new format, `CHUNKS`, to `langchain_community.document_loaders.youtube.YoutubeLoader` which creates multiple `Document` objects from YouTube video transcripts (captions), each of a fixed duration. The metadata of each chunk `Document` includes the start time of each one and a URL to that time in the video on the YouTube website. I had implemented this for UMich (@umich-its-ai) in a local module, but it makes sense to contribute this to LangChain community for all to benefit and to simplify maintenance.
- Issue: N/A
- Dependencies: N/A
- Twitter: lsloan_umich
- Mastodon: [lsloan@mastodon.social](https://mastodon.social/@lsloan)

With regards to tests and documentation, most existing features of the `YoutubeLoader` class are not tested. Only the `YoutubeLoader.extract_video_id()` static method had a test. However, while I was waiting for this PR to be reviewed and merged, I had time to add a test for the chunking feature I've proposed in this PR.

I have added an example of using chunking to the `docs/docs/integrations/document_loaders/youtube_transcript.ipynb` notebook.

Co-authored-by: Bagatur <baskaryan@gmail.com>

2024-06-11 17:44:36 +00:00
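
The chunking idea described in the commit message can be illustrated with a minimal, library-independent sketch. This is not the `YoutubeLoader` implementation: the function names `chunk_transcript` and `make_chunk`, the metadata keys, the default chunk size, and the choice to align chunks to fixed time windows are all assumptions made for illustration. It groups caption pieces (each with `text` and `start` in seconds) into fixed-duration chunks and attaches a start time plus a YouTube URL that jumps to that time.

```python
from typing import Dict, Iterator, List, Union

# Hypothetical shape of one caption piece: {"text": str, "start": float}
TranscriptPiece = Dict[str, Union[str, float]]


def make_chunk(texts: List[str], start_seconds: int, video_id: str) -> dict:
    """Build one chunk record; the metadata key names are illustrative assumptions."""
    return {
        "page_content": " ".join(texts),
        "metadata": {
            "start_seconds": start_seconds,
            # `&t=<N>s` asks the YouTube player to start at N seconds.
            "source": f"https://www.youtube.com/watch?v={video_id}&t={start_seconds}s",
        },
    }


def chunk_transcript(
    pieces: List[TranscriptPiece],
    video_id: str,
    chunk_size_seconds: int = 120,
) -> Iterator[dict]:
    """Group caption pieces into chunks that each cover one fixed-length time window."""
    if chunk_size_seconds <= 0:
        raise ValueError("chunk_size_seconds must be positive")

    texts: List[str] = []
    chunk_start = 0
    window_end = chunk_size_seconds

    for piece in pieces:
        start = float(piece["start"])
        # Advance the window until it contains this piece, emitting the
        # accumulated chunk (if any) each time a window is closed.
        while start >= window_end:
            if texts:
                yield make_chunk(texts, chunk_start, video_id)
                texts = []
            chunk_start = window_end
            window_end += chunk_size_seconds
        texts.append(str(piece["text"]).strip())

    # Emit whatever remains in the final, possibly partial, window.
    if texts:
        yield make_chunk(texts, chunk_start, video_id)
```

Each yielded record could be wrapped in a `Document`; empty windows (stretches with no captions) produce no chunk in this sketch.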
cli	couchbase: Add the initial version of Couchbase partner package (#22087)	2024-06-07 14:04:08 -07:00
community	community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710)	2024-06-11 17:44:36 +00:00
core	core: fix mustache falsy cases (#22747)	2024-06-10 14:00:12 -07:00
experimental	multiple: get rid of pyproject extras (#22581)	2024-06-06 15:45:22 -07:00
langchain	langchain[minor]: Add native async implementation to LLMFilter, add concurrency to both sync and async paths (#22739)	2024-06-11 10:55:40 -04:00
partners	docs: standardize ChatHuggingFace (#22693)	2024-06-10 20:54:36 +00:00
standard-tests	multiple: add `stop` attribute (#22573)	2024-06-06 12:11:52 -04:00
text-splitters	Community[minor]: Add language parser for Elixir (#22742)	2024-06-10 15:56:57 +00:00