diff --git a/README.md b/README.md
index 4706b063..76da5dc0 100644
--- a/README.md
+++ b/README.md
@@ -180,6 +180,7 @@ For some architectures it might be required to run Redis JSON on a nonstandard p
 ### Updating Tube Archivist
 You will see the current version number of **Tube Archivist** in the footer of the interface. There is a daily version check task querying tubearchivist.com, notifying you of any new releases in the footer. To take advantage of the latest fixes and improvements, make sure you are running the *latest and greatest*.
+* Updates are tested across a gap of at most one or two releases. Updating from further back may or may not work, and you might have to reset your index and configuration to update. Ideally, apply new updates at least once per month.
 * There can be breaking changes between updates, particularly as the application grows: new environment variables or settings might be required for you to set in your docker-compose file. *Always* check the **release notes**: Any breaking changes will be marked there.
 * All testing and development is done with the Elasticsearch version number as mentioned in the provided *docker-compose.yml* file. This will be updated when a new release of Elasticsearch is available. Running an older version of Elasticsearch is most likely not going to result in any issues, but it's still recommended to run the same version as mentioned. Use `bbilly1/tubearchivist-es` to automatically get the recommended version.
diff --git a/docker-compose.yml b/docker-compose.yml
index 72b65b90..5be45c08 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -34,7 +34,7 @@ services:
     depends_on:
       - archivist-es
   archivist-es:
-    image: bbilly1/tubearchivist-es # only for amd64, or use official es 8.6.0
+    image: bbilly1/tubearchivist-es # only for amd64, or use official es 8.6.2
     container_name: archivist-es
     restart: unless-stopped
     environment:
diff --git a/docs/Search.md b/docs/Search.md
index 2c7717b2..f6ad652c 100644
--- a/docs/Search.md
+++ b/docs/Search.md
@@ -1,31 +1,32 @@
 # Search Page
 Accessible at `/search/` of your **Tube Archivist**, search your archive for Videos, Channels and Playlists - or even full text search throughout your indexed subtitles.
+Just start typing to start a **simple** search *or* **start your query with a primary keyword** to search for a specific type and narrow down the result with secondary keywords. Secondary keywords can be in any order. Use *yes* or *no* for boolean values.
+
+- This will return 30 results per query; pagination is not implemented yet.
 - All your queries are case insensitive and are normalized to lowercase.
 - All your queries are analyzed for the english language, this means *singular*, *plural* and word variations like *-ing*, *-ed*, *-able* etc are treated as synonyms.
+- A keyword's value extends from the `keyword:` name until the end of the query or the next keyword, e.g. in `video:learn python channel:corey`, the keyword `video` has the value `learn python`.
 - Fuzzy search is activated for all your searches by default. This can catch typos in your queries or in the matching documents with one to two letters difference, depending on the query length. You can configure fuzziness with the secondary keyword `fuzzy:`, e.g:
   - `fuzzy:0` or `fuzzy:no`: Deactivate fuzzy matching.
   - `fuzzy:1`: Set fuzziness to one letter difference.
   - `fuzzy:2`: Set fuzziness to two letters difference.
 - All text searches are ranked, meaning the better a match the higher ranked the result. Unless otherwise stated, queries with multiple words are processed with the `and` operator, meaning all words need to match so each word will narrow down the result.
-- This will return 30 results per query, pagination is not implemented yet.
-
-Just start typing to start a *simple* search or start your query with a primary keyword to search for a specific type and narrow down the result with secondary keywords. Secondary keywords can be in any order. Use *yes* or *no* for boolean values.
 ## Simple
-Start your query without a keyword to make a simple query. This will search in *video titles*, *channel names* and *playlist titles* and will return matching videos, channels and playlists. Keyword searches will return more results in a particular category due to the fact that more fields are searched for matches.
+Start your query without a keyword to make a simple query (primary keyword `simple:` is implied). This will search in *video titles*, *channel names* and *playlist titles* and will return matching videos, channels and playlists. Keyword searches will return more results in a particular category because more fields are searched for matches. Simple queries do not have any secondary keywords.
 ## Video
-Start your query with the primary keyword `video:` to search for videos only. This will search through the *video titles*, *tags* and *category* fields. Narrow your search down with secondary keywords:
+Start your query with the **primary keyword** `video:` to search for videos only. This will search through the *video titles*, *tags* and *category* fields. Narrow your search down with secondary keywords:
 - `channel:` search for videos matching the channel name.
 - `active:` is a boolean value, to search for videos that are still active on youtube or that are not active any more.
 **Example**:
-- `video:learn python channel:corey shafer active:yes`: This will return all videos with the term *Learn Python* from the channel *Corey Shafer* that are still *Active* on YouTube.
+- `video:learn python channel:corey schafer active:yes`: This will return all videos with the term *Learn Python* from the channel *Corey Schafer* that are still *Active* on YouTube.
 - `video: channel:tom scott active:no`: Note the omitted term after the primary key, this will show all videos from the channel *Tom Scott* that are no longer active on YouTube.
 ## Channel
-Start with the `channel:` primary keyword to search for channels matching your query. This will search through the *channel name* and *channel description* fields. Narrow your search down with secondary keywords:
+Start with the `channel:` **primary keyword** to search for channels matching your query. This will search through the *channel name* and *channel description* fields. Narrow your search down with secondary keywords:
 - `subscribed:` is a boolean value, search for channels that you are subscribed to or not.
 - `active:` is a boolean value, to search for channels that are still active on YouTube or that are no longer active.
@@ -34,7 +35,7 @@ Start with the `channel:` primary keyword to search for channels matching your q
 - `channel: active:no`: Note the omitted term after the primary key, this will return all channels that are no longer active on YouTube.
 ## Playlist
-Start your query with the primary keyword `playlist:` to search for playlists only. This will search through the *playlist title* and *playlist description* fields. Narrow down your search with these secondary keywords:
+Start your query with the **primary keyword** `playlist:` to search for playlists only. This will search through the *playlist title* and *playlist description* fields. Narrow down your search with these secondary keywords:
 - `subscribed`: is a boolean value, search for playlists that you are subscribed to or not.
 - `active:` is a boolean value, to search for playlists that are still active on YouTube or that are no longer active.
@@ -44,7 +45,7 @@ Start your query with the primary keyword `playlist:` to search for playlists on
 - `playlist:html css active:yes`: Search for playlists containing *HTML CSS* that are still active on YouTube.
 ## Full
-Start a full text search by beginning your query with the primary keyword `full:`. This will search through your indexed Subtitles showing segments with possible matches. This will only show any results if you have activated *subtitle download and index* on the settings page. The operator for full text searches is `or` meaning when searching for multiple words not all words need to match, but additional words will change the ranking of the result, the more words match and the better they match, the higher ranked the result. The matching words will get highlighted in the text preview.
+Start a full text search by beginning your query with the **primary keyword** `full:`. This will search through your indexed subtitles, showing segments with possible matches. This will only show results if you have activated *subtitle download and index* on the settings page. The operator for full text searches is `or`, meaning that when searching for multiple words not all words need to match, but additional words will change the ranking of the result: the more words match and the better they match, the higher the result is ranked. The matching words will get highlighted in the text preview.
 Clicking the play button on the thumbnail will open the inplace player at the timestamp from where the segment starts. Same when clicking the video title, this will open the video page and put the player at the segment timestamp. This will overwrite any previous playback position.
diff --git a/docs/Settings.md b/docs/Settings.md
index b6651357..9ca15199 100644
--- a/docs/Settings.md
+++ b/docs/Settings.md
@@ -52,16 +52,11 @@ Cookies are used to store your session and contain your access token to your goo
 ### Auto import
 Easiest way to import your cookie is to use the **Tube Archivist Companion** [browser extension](https://github.com/tubearchivist/browser-extension) for Firefox and Chrome.
-### Alternative Manual Export your cookie
-- Install **Cookies.txt** addon for [chrome](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid) or [firefox](https://addons.mozilla.org/firefox/addon/cookies-txt).
-- Visit YouTube and login with whichever YouTube account you wish to use to generate the cookies.
-- Click on the extension icon in the toolbar - it will drop down showing the active cookies for YT.
-- Click Export to export the cookies, filename is by default *cookies.google.txt*.
+### Manual import
+Alternatively, you can manually import your cookie into Tube Archivist. Export your cookie as a *Netscape* formatted text file, name it *cookies.google.txt* and put it into the *cache/import* folder. After that, you can enable the option on the settings page and your cookie file will get imported.
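For reference, a *Netscape* formatted cookie file is a plain text file with one line per cookie, the fields being tab separated: domain, include-subdomains flag, path, secure flag, expiry as a unix timestamp, cookie name and cookie value. A minimal sketch of what *cookies.google.txt* could look like; the cookie names, timestamps and values below are placeholders for illustration, not real session data, and the fields are shown with widened spacing for readability:

```
# Netscape HTTP Cookie File
.youtube.com    TRUE    /    TRUE    1735689600    SID               placeholder-value
.youtube.com    TRUE    /    TRUE    1735689600    __Secure-3PSID    placeholder-value
```

Whichever export tool you use, make sure it writes this format and exports the cookies of the YouTube account you want Tube Archivist to use.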
-### Alternative Manual Import your cookie
-Place the file *cookies.google.txt* into the *cache/import* folder of Tube Archivist and enable the cookie import. Once you click on *Update Application Configurations* to save your changes, your cookie will get imported and stored internally.
-
-Once imported, a **Validate Cookie File** button will show, where you can confirm if your cookie is working or not.
+- There are various tools out there that allow you to export cookies from your browser. This project doesn't make any specific recommendations.
+- Once imported, a **Validate Cookie File** button will show, where you can confirm if your cookie is working or not.
 ### Use your cookie
 Once imported, additionally to the advantages above, your [Watch Later](https://www.youtube.com/playlist?list=WL) and [Liked Videos](https://www.youtube.com/playlist?list=LL) become a regular playlist you can download and subscribe to as any other [playlist](Playlists).
diff --git a/tubearchivist/config/management/commands/ta_startup.py b/tubearchivist/config/management/commands/ta_startup.py
index 852447a6..a74358de 100644
--- a/tubearchivist/config/management/commands/ta_startup.py
+++ b/tubearchivist/config/management/commands/ta_startup.py
@@ -5,6 +5,7 @@ Functionality:
 """
 
 import os
+from time import sleep
 
 from django.core.management.base import BaseCommand, CommandError
 from home.src.es.connect import ElasticWrap
@@ -151,6 +152,12 @@ class Command(BaseCommand):
         for index_name in index_list:
             path = f"{index_name}/_update_by_query"
             response, status_code = ElasticWrap(path).post(data=data)
+            if status_code == 503:
+                message = f"    🗙 {index_name} retry failed migration."
+                self.stdout.write(self.style.ERROR(message))
+                sleep(10)
+                response, status_code = ElasticWrap(path).post(data=data)
+
             if status_code == 200:
                 updated = response.get("updated", 0)
                 if not updated:
diff --git a/tubearchivist/config/settings.py b/tubearchivist/config/settings.py
index 91486f5f..865586da 100644
--- a/tubearchivist/config/settings.py
+++ b/tubearchivist/config/settings.py
@@ -265,4 +265,4 @@ CORS_ALLOW_HEADERS = list(default_headers) + [
 
 # TA application settings
 TA_UPSTREAM = "https://github.com/tubearchivist/tubearchivist"
-TA_VERSION = "v0.3.3"
+TA_VERSION = "v0.3.4"
diff --git a/tubearchivist/home/src/download/queue.py b/tubearchivist/home/src/download/queue.py
index 14298dc6..9bf4492a 100644
--- a/tubearchivist/home/src/download/queue.py
+++ b/tubearchivist/home/src/download/queue.py
@@ -158,13 +158,9 @@ class PendingList(PendingIndex):
     """manage the pending videos list"""
 
     yt_obs = {
-        "default_search": "ytsearch",
-        "quiet": True,
-        "check_formats": "selected",
         "noplaylist": True,
         "writethumbnail": True,
         "simulate": True,
-        "socket_timeout": 3,
     }
 
     def __init__(self, youtube_ids=False):
@@ -244,6 +240,7 @@ class PendingList(PendingIndex):
 
         for idx, (youtube_id, vid_type) in enumerate(self.missing_videos):
             print(f"{youtube_id} ({vid_type}): add to download queue")
+            self._notify_add(idx)
             video_details = self.get_youtube_details(youtube_id, vid_type)
             if not video_details:
                 continue
@@ -256,8 +253,6 @@ class PendingList(PendingIndex):
                 url = video_details["vid_thumb_url"]
                 ThumbManager(youtube_id).download_video_thumb(url)
 
-            self._notify_add(idx)
-
             if bulk_list:
                 # add last newline
                 bulk_list.append("\n")
diff --git a/tubearchivist/home/src/download/yt_dlp_base.py b/tubearchivist/home/src/download/yt_dlp_base.py
index fd59a9dc..5cecc35a 100644
--- a/tubearchivist/home/src/download/yt_dlp_base.py
+++ b/tubearchivist/home/src/download/yt_dlp_base.py
@@ -20,8 +20,9 @@ class YtWrap:
         "default_search": "ytsearch",
         "quiet": True,
         "check_formats": "selected",
-        "socket_timeout": 3,
+        "socket_timeout": 10,
         "extractor_retries": 3,
+        "retries": 10,
     }
 
     def __init__(self, obs_request, config=False):
diff --git a/tubearchivist/home/src/download/yt_dlp_handler.py b/tubearchivist/home/src/download/yt_dlp_handler.py
index 34855484..be7c71af 100644
--- a/tubearchivist/home/src/download/yt_dlp_handler.py
+++ b/tubearchivist/home/src/download/yt_dlp_handler.py
@@ -192,7 +192,7 @@ class VideoDownloader:
                 "vid_type", VideoTypeEnum.VIDEOS.value
             )
             video_type = VideoTypeEnum(tmp_vid_type)
-            print(f"Downloading type: {video_type}")
+            print(f"{youtube_id}: Downloading type: {video_type}")
 
             success = self._dl_single_vid(youtube_id)
             if not success:
@@ -204,7 +204,7 @@ class VideoDownloader:
                 "title": "Indexing....",
                 "message": "Add video metadata to index.",
             }
-            RedisArchivist().set_message(self.MSG, mess_dict, expire=60)
+            RedisArchivist().set_message(self.MSG, mess_dict, expire=120)
 
             vid_dict = index_new_video(
                 youtube_id,
@@ -223,8 +223,10 @@ class VideoDownloader:
 
             if queue.has_item():
                 message = "Continue with next video."
+                expire = False
             else:
                 message = "Download queue is finished."
+                expire = 10
 
             self.move_to_archive(vid_dict)
             mess_dict = {
@@ -233,7 +235,7 @@ class VideoDownloader:
                 "title": "Completed",
                 "message": message,
             }
-            RedisArchivist().set_message(self.MSG, mess_dict, expire=10)
+            RedisArchivist().set_message(self.MSG, mess_dict, expire=expire)
 
             self._delete_from_pending(youtube_id)
         # post processing
@@ -260,7 +262,7 @@ class VideoDownloader:
             "title": "Looking for videos to download",
             "message": "Scanning your download queue.",
         }
-        RedisArchivist().set_message(self.MSG, mess_dict, expire=True)
+        RedisArchivist().set_message(self.MSG, mess_dict)
         pending = PendingList()
         pending.get_download()
         to_add = [
@@ -293,8 +295,11 @@ class VideoDownloader:
         title = "Downloading: " + response["info_dict"]["title"]
 
         try:
+            size = response.get("_total_bytes_str")
+            if size.strip() == "N/A":
+                size = response.get("_total_bytes_estimate_str", "N/A")
+
             percent = response["_percent_str"]
-            size = response["_total_bytes_str"]
             speed = response["_speed_str"]
             eta = response["_eta_str"]
             message = f"{percent} of {size} at {speed} - time left: {eta}"
@@ -318,7 +323,6 @@ class VideoDownloader:
     def _build_obs_basic(self):
         """initial obs"""
         self.obs = {
-            "default_search": "ytsearch",
             "merge_output_format": "mp4",
             "outtmpl": (
                 self.config["application"]["cache_dir"]
@@ -326,13 +330,9 @@ class VideoDownloader:
             ),
             "progress_hooks": [self._progress_hook],
             "noprogress": True,
-            "quiet": True,
             "continuedl": True,
-            "retries": 3,
             "writethumbnail": False,
             "noplaylist": True,
-            "check_formats": "selected",
-            "socket_timeout": 3,
         }
 
     def _build_obs_user(self):
diff --git a/tubearchivist/home/src/index/comments.py b/tubearchivist/home/src/index/comments.py
index e9ec0d9a..32cea557 100644
--- a/tubearchivist/home/src/index/comments.py
+++ b/tubearchivist/home/src/index/comments.py
@@ -210,6 +210,9 @@ class CommentList:
             return
 
         total_videos = len(self.video_ids)
+        if notify:
+            self._notify(f"add comments for {total_videos} videos", False)
+
         for idx, video_id in enumerate(self.video_ids):
             comment = Comments(video_id, config=self.config)
             if notify:
@@ -219,16 +222,16 @@ class CommentList:
                 comment.upload_comments()
 
         if notify:
-            self.notify_final(total_videos)
+            self._notify(f"added comments for {total_videos} videos", 5)
 
     @staticmethod
-    def notify_final(total_videos):
-        """send final notification"""
+    def _notify(message, expire):
+        """send notification"""
         key = "message:download"
         message = {
             "status": key,
             "level": "info",
             "title": "Download and index comments finished",
-            "message": f"added comments for {total_videos} videos",
+            "message": message,
         }
-        RedisArchivist().set_message(key, message, expire=4)
+        RedisArchivist().set_message(key, message, expire=expire)
diff --git a/tubearchivist/home/templates/home/search.html b/tubearchivist/home/templates/home/search.html
index b333baa8..5bbfe153 100644
--- a/tubearchivist/home/templates/home/search.html
+++ b/tubearchivist/home/templates/home/search.html
@@ -7,30 +7,80 @@
[The HTML markup of this search.html hunk is not recoverable from this extraction; only the text content survives. The hunk groups the four static result sections (Video Results / "No videos found.", Channel Results / "No channels found.", Playlist Results / "No playlists found.", Fulltext Results / "No fulltext results found.") into a `multi-search-results` container and adds a `multi-search-results-placeholder` block that is shown while the search box is empty (see the style.css and script.js changes below). The placeholder contains:

Example queries
- music video — basic search
- video: active:no — all videos deleted from YouTube
- video:learn javascript channel:corey schafer active:yes
- channel:linux subscribed:yes
- playlist:backend engineering active:yes subscribed:yes

Keywords cheatsheet (for detailed usage check the wiki)
- simple: (implied) — search in video titles, channel names and playlist titles
- video: — search in video titles, tags and the category field
  - channel: — channel name
  - active:yes/no — whether the video is still active on YouTube
- channel: — search in channel name and channel description
  - subscribed:yes/no — whether you are subscribed to the channel
  - active:yes/no — whether the channel is still active on YouTube
- playlist: — search in playlist title and playlist description
  - subscribed:yes/no — whether you are subscribed to the playlist
  - active:yes/no — whether the playlist is still active on YouTube
- full: — search in video subtitles
  - lang: — subtitles language (use the two-letter ISO code, same as the one from the settings page)
  - source:auto/user — auto to search through auto-generated subtitles only, or user to search through user-uploaded subtitles only]
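The keyword grammar documented in *docs/Search.md* and summarized in the cheatsheet above follows one rule: a value runs from its `keyword:` up to the next keyword or the end of the query, and a query without a leading keyword is an implied `simple:` search. A minimal, hypothetical Python sketch of that rule, for illustration only and not Tube Archivist's actual parser:

```python
from __future__ import annotations

# hypothetical helper, not part of Tube Archivist: split a search query
# into its primary and secondary keyword values
KNOWN_KEYWORDS = {
    "simple", "video", "channel", "playlist", "full",
    "active", "subscribed", "fuzzy", "lang", "source",
}


def parse_query(query: str) -> dict[str, str]:
    """map each keyword to its value, lowercased"""
    values: dict[str, list[str]] = {}
    current = "simple"  # implied primary keyword for plain queries
    for token in query.strip().lower().split():
        head, sep, rest = token.partition(":")
        if sep and head in KNOWN_KEYWORDS:
            # a new keyword starts here, its value begins after the colon
            current = head
            values[current] = [rest] if rest else []
        else:
            # anything else extends the value of the current keyword
            values.setdefault(current, []).append(token)

    return {key: " ".join(val) for key, val in values.items()}


print(parse_query("video:learn python channel:corey active:yes"))
# {'video': 'learn python', 'channel': 'corey', 'active': 'yes'}
```

Treating a query without a leading keyword as `simple:` matches the behaviour described in *docs/Search.md*, where the primary keyword `simple:` is implied.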
diff --git a/tubearchivist/requirements.txt b/tubearchivist/requirements.txt index 00a9e1a8..721df00e 100644 --- a/tubearchivist/requirements.txt +++ b/tubearchivist/requirements.txt @@ -10,4 +10,4 @@ requests==2.28.2 ryd-client==0.0.6 uWSGI==2.0.21 whitenoise==6.4.0 -yt_dlp==2023.2.17 +yt_dlp==2023.3.3 diff --git a/tubearchivist/static/css/style.css b/tubearchivist/static/css/style.css index 56ab4e35..dbf76f90 100644 --- a/tubearchivist/static/css/style.css +++ b/tubearchivist/static/css/style.css @@ -892,10 +892,24 @@ video:-webkit-full-screen { width: 100%; } -.multi-search-result { +.multi-search-result, #multi-search-results-placeholder { padding: 1rem 0; } +#multi-search-results-placeholder span { + font-family: monospace; + color: var(--accent-font-dark); + background-color: var(--highlight-bg); +} + +#multi-search-results-placeholder span.value { + color: var(--accent-font-light); +} + +#multi-search-results-placeholder ul { + margin-top: 10px; +} + /* channel overview page */ .channel-list.list { display: block; diff --git a/tubearchivist/static/script.js b/tubearchivist/static/script.js index 8dfff95b..ff8e8cc5 100644 --- a/tubearchivist/static/script.js +++ b/tubearchivist/static/script.js @@ -865,21 +865,33 @@ function setProgressBar(videoId, currentTime, duration) { // multi search form let searchTimeout = null; +let searchHttpRequest = null; function searchMulti(query) { clearTimeout(searchTimeout); searchTimeout = setTimeout(function () { - if (query.length > 1) { - let http = new XMLHttpRequest(); - http.onreadystatechange = function () { - if (http.readyState === 4) { - let response = JSON.parse(http.response); + if (query.length > 0) { + if (searchHttpRequest) { + searchHttpRequest.abort(); + } + searchHttpRequest = new XMLHttpRequest(); + searchHttpRequest.onreadystatechange = function () { + if (searchHttpRequest.readyState === 4) { + const response = JSON.parse(searchHttpRequest.response); populateMultiSearchResults(response.results, response.queryType); } }; - http.open('GET', `/api/search/?query=${query}`, true); - http.setRequestHeader('X-CSRFToken', getCookie('csrftoken')); - http.setRequestHeader('Content-type', 'application/json'); - http.send(); + searchHttpRequest.open('GET', `/api/search/?query=${query}`, true); + searchHttpRequest.setRequestHeader('X-CSRFToken', getCookie('csrftoken')); + searchHttpRequest.setRequestHeader('Content-type', 'application/json'); + searchHttpRequest.send(); + } else { + if (searchHttpRequest) { + searchHttpRequest.abort(); + searchHttpRequest = null; + } + // show the placeholder container and hide the results container + document.getElementById('multi-search-results').style.display = 'none'; + document.getElementById('multi-search-results-placeholder').style.display = 'block'; } }, 500); } @@ -890,6 +902,9 @@ function getViewDefaults(view) { } function populateMultiSearchResults(allResults, queryType) { + // show the results container and hide the placeholder container + document.getElementById('multi-search-results').style.display = 'block'; + document.getElementById('multi-search-results-placeholder').style.display = 'none'; // videos let defaultVideo = getViewDefaults('home'); let allVideos = allResults.video_results;