Thumbsup used to stop at the first problem processing a file.
This was a problem on large galleries where you’d need to run it again and again, fixing files as you went.
This change:
- skips problematic files and shows a summary at the end
- logs all warnings/errors to <thumbsup.log> when running the default output
Also refactors and cleans up the logging logic.
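The skip-and-summarise behaviour could be sketched roughly as below. This is only an illustration: `processAll`, `formatSummary` and the `processFile` callback are hypothetical names, not the actual thumbsup code, and the loop is synchronous for brevity.

```javascript
// Hypothetical sketch, not the thumbsup implementation.
// Runs processFile on every file, recording failures instead of stopping.
function processAll (files, processFile) {
  const problems = []
  let succeeded = 0
  for (const file of files) {
    try {
      processFile(file)
      succeeded++
    } catch (err) {
      // Remember the failure and carry on with the next file
      problems.push({ file, message: err.message })
    }
  }
  return { succeeded, problems }
}

// Builds the summary text shown at the end of the run
function formatSummary (result) {
  const lines = [`Processed ${result.succeeded} files, skipped ${result.problems.length}`]
  for (const problem of result.problems) {
    lines.push(`  ${problem.file}: ${problem.message}`)
  }
  return lines.join('\n')
}
```

The same `problems` list could be written to the log file as well as printed, so a single run reports every file that needs fixing.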
The index is still updated, and the media folder is still scanned to log how many tasks are required.
This is useful to know what thumbsup is about to do, without actually running the expensive tasks.
This will help understand usage patterns to know what to focus on, e.g.
- are many people using thumbsup on Windows?
- are there many galleries with > 10,000 photos?
1. Move from a JSON index to a SQLite database.
- This allows the indexing to be interrupted & resumed
- Updating the index consumes less RAM than loading / saving an entire JSON object
- Loading the index consumes less RAM since it can be streamed, only extracting the properties we need every time (instead of loading all EXIF data in memory, only to discard most of it later)
- These make a big difference when processing 10,000+ photos
2. Switch from <glob> to a manual <readdir>
- Glob would take several hundred MB, or even GBs, of RAM when asked to find several thousand files
- Manual approach with <micromatch> library does the same thing in a fraction of the time / memory usage
3. Exiftool optimisations
- Run 1 exiftool process per CPU, still in batch mode (divide all files to be read into 1 bucket per CPU)
- Stream the exiftool output instead of buffering it in memory
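As an illustration of the "1 bucket per CPU" idea, the file list could be divided along these lines. `splitIntoBuckets` is a hypothetical helper and round-robin is just one possible way to divide the work; the real code may split the files differently.

```javascript
const os = require('os')

// Hypothetical helper: split files into at most `bucketCount` buckets,
// assigning files round-robin so the buckets end up roughly the same size.
function splitIntoBuckets (files, bucketCount) {
  // Never create more buckets than files, and always create at least one
  const count = Math.max(1, Math.min(bucketCount, files.length))
  const buckets = Array.from({ length: count }, () => [])
  files.forEach((file, i) => buckets[i % count].push(file))
  return buckets
}

// One bucket per CPU; each bucket would then go to its own exiftool process
const exampleFiles = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg']
const exampleBuckets = splitIntoBuckets(exampleFiles, os.cpus().length)
console.log(`${exampleBuckets.length} bucket(s) for ${exampleFiles.length} files`)
```

Capping the bucket count at the number of files avoids starting processes that would have nothing to read.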
Before changing it back, need to list the rules for extension changes. For example:
- should GIF thumbnails be JPG, to avoid animations on the album page?
- what about transparent GIFs, will they look weird in JPG?
- maybe GIFs should stay as GIFs, but kept to a single frame only for thumbnails
- same thing for PNGs, which might be better kept as PNG for transparency
- all other non-browser-friendly formats should become JPG
These rules will be a lot easier to implement when the new input data structure is in place
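One possible answer to the questions above, written as a lookup on the input extension. This is a sketch of an assumed rule set (GIF stays GIF, PNG stays PNG, everything else becomes JPG), not a decision that has been made, and `thumbnailExtension` is a hypothetical name.

```javascript
const path = require('path')

// Assumed rules, for illustration only:
// - GIF stays GIF (keeping a single frame would be handled when resizing)
// - PNG stays PNG, to preserve transparency
// - everything else becomes JPG
function thumbnailExtension (filename) {
  const ext = path.extname(filename).toLowerCase()
  if (ext === '.gif') return '.gif'
  if (ext === '.png') return '.png'
  return '.jpg'
}
```

Keeping the rules in a single function means they can be changed in one place once the open questions are settled.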
One major change here is that thumbnails will always be generated as ".jpg".
This is potentially a breaking change, in the sense that all "png" or "jpeg" thumbnails
would need to be re-calculated and re-uploaded.
Since we read all the file metadata for EXIF dates, and we need it as well for the view model,
we should use it to generate the thumbnails and save many calls to glob() and fs.stat()