thumbsup

mirror of https://github.com/thumbsup/thumbsup synced 2024-11-09 13:10:28 +00:00


..
beach.jpg	Major optimisations: SQLite index + faster disk glob + new exiftool streaming	2017-11-24 22:08:59 +11:00
tower.jpg	Major optimisations: SQLite index + faster disk glob + new exiftool streaming	2017-11-24 22:08:59 +11:00

Romain 24b2f9bd7c Major optimisations: SQLite index + faster disk glob + new exiftool streaming

1. Move from a JSON index to a SQLite database.
  - This allows the indexing to be interrupted & resumed
  - Updating the index consumes less RAM than loading / saving an entire JSON object
  - Loading the index consumes less RAM since it can be streamed, extracting only the properties we need each time (instead of loading all EXIF data in memory, only to discard most of it later)
  - These make a big difference when processing 10,000+ photos
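
A minimal sketch of the idea in point 1, written in Python with the standard `sqlite3` module rather than the project's own code. The table layout and the names `open_index`, `upsert` and `stream` are hypothetical, chosen only to illustrate per-file writes (so an interrupted run keeps its progress) and row-by-row reads that keep only the requested properties.

```python
import json
import sqlite3

def open_index(path):
    """Open (or create) the index: one row per file, metadata stored as JSON text."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " mtime REAL NOT NULL,"
        " metadata TEXT NOT NULL)"
    )
    return db

def upsert(db, path, mtime, metadata):
    """Insert or update a single file; each call commits on its own."""
    # Committing per file means an interrupted run keeps every row written so far.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO files (path, mtime, metadata) VALUES (?, ?, ?)",
            (path, mtime, json.dumps(metadata)),
        )

def stream(db, keys):
    """Yield (path, {key: value}) one row at a time, keeping only the requested keys."""
    for path, raw in db.execute("SELECT path, metadata FROM files ORDER BY path"):
        full = json.loads(raw)
        # Only the small projection is handed to the caller; the full record is dropped.
        yield path, {key: full.get(key) for key in keys}
```

With this shape, a consumer holds one decoded row at a time, for example `for path, meta in stream(db, ["DateTimeOriginal"]): ...`, instead of the whole index.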

2. Switch from <glob> to a manual <readdir>
  - Glob would take several hundred MB, or even GB, of RAM when asked to find several thousand files
  - Manual approach with <micromatch> library does the same thing in a fraction of the time / memory usage
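
The manual directory walk in point 2 can be sketched as follows. This is a Python illustration, not the project's code: `os.scandir` stands in for `<readdir>` and `fnmatch` stands in for `<micromatch>`, and the function name `find_files` is hypothetical. Patterns are assumed to be lowercase and are matched against the lowercased file name.

```python
import fnmatch
import os

def find_files(root, patterns):
    """Walk root iteratively and yield relative paths whose file name matches a pattern."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Descend later; only the pending directories are held in memory.
                    stack.append(entry.path)
                elif any(fnmatch.fnmatchcase(entry.name.lower(), p) for p in patterns):
                    yield os.path.relpath(entry.path, root)
```

Because the function is a generator, matches are produced one at a time as directories are read, for example `for path in find_files("photos", ["*.jpg", "*.png"]): ...`.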

3. Exiftool optimisations
  - Run 1 exiftool process per CPU, still in batch mode (divide all files to be read into 1 bucket per CPU)
  - Stream the exiftool output instead of buffering it in memory
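
The two ideas in point 3 can be sketched in Python as below, under stated assumptions: `split_into_buckets` and `stream_lines` are hypothetical helper names, and the command passed to `stream_lines` is a placeholder for whatever exiftool invocation is actually used (no exiftool flags are assumed here). The sketch shows the bucketing and the line-by-line reading only; launching one process per bucket concurrently is left out.

```python
import os
import subprocess

def split_into_buckets(files, bucket_count=None):
    """Divide files into at most one bucket per CPU, round-robin."""
    count = bucket_count or os.cpu_count() or 1
    # Never create more buckets than files, and always at least one.
    count = max(1, min(count, len(files)))
    return [files[i::count] for i in range(count)]

def stream_lines(command):
    """Run command and yield its stdout line by line instead of buffering it all."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
```

Each bucket would then be handed to its own process, with the output consumed as it arrives rather than collected into one large string.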

2017-11-24 22:08:59 +11:00
