- Red - Program settings. This area contains the included and excluded directories that the user may want to check. There is also a tab with allowed extensions, which lets users choose which types of files to check. The next category is Excluded items, which discards specific paths by using the `*` wildcard, so `/home/ra*` means that e.g. `/home/rafal/` will be ignored but `/home/czkawka/` will not. The last one is the settings tab, which allows saving the program's configuration, resetting it, and loading it when needed.
- Green - Selects which tool to use.
- Blue - Settings for the current tool that we want or need to configure.
- Pink - Window in which the search results are displayed.
- Yellow - Box with buttons like `Search` (starts searching with the currently selected tool), `Hide Text View` (hides the text box at the bottom with the white overlay), `Symlink` (creates a symlink to the selected file), `Select` (shows options to select specific rows), `Delete` (deletes selected files), `Save` (saves the search results to a file). Some buttons are only visible when at least one result is visible.
- Brown - Small informative field that shows information, e.g. the number of duplicate files found.
- Stop - Button in the progress dialog that stops the current task. Sometimes it may take a few seconds until all atomic operations end and the GUI becomes responsive again.
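
The `*` wildcard behaviour described for Excluded items can be illustrated with a minimal sketch. It uses Python's standard `fnmatch` module as a stand-in and is not Czkawka's own matching code; `is_excluded` is a hypothetical helper name. Note that `fnmatch` also gives special meaning to `?` and `[...]`, which the text above does not mention.

```python
from fnmatch import fnmatchcase


def is_excluded(path: str, excluded_patterns: list[str]) -> bool:
    """Return True if `path` matches any excluded pattern (hypothetical helper)."""
    # `*` matches any run of characters, including `/`.
    return any(fnmatchcase(path, pattern) for pattern in excluded_patterns)


patterns = ["/home/ra*"]
print(is_excluded("/home/rafal/", patterns))    # True
print(is_excluded("/home/czkawka/", patterns))  # False
```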
To open multiple files, select the desired files while holding the CTRL key and, still holding it, double-click the selected items with the left mouse button.
By default, all tools only print results to the console, but with specific arguments it is possible to delete some files or save the results to a file.
- `cache_similar_image_SIZE_HASH_FILTER.txt` - stores cache data and hashes that can be reused later without computing the image hash again. Editing this file manually is not recommended, but it is allowed. Each algorithm uses its own file, because the hashes are completely different in each.
- `cache_duplicates_Blake3.txt` - stores cache data of duplicated files. To avoid too big a performance hit when saving/loading the file, only already fully hashed files bigger than 5MB are stored. Similar files with `Blake3` replaced by e.g. `SHA256` may appear when support for new hashes is introduced in Czkawka.
You can manually edit the config file `czkawka_gui_config.txt` and add, remove, or change directories as you want. After setting the required values, the configuration must be loaded into Czkawka.
- **Slow checking of a small number of similar images**
If you previously checked a large number of images (several tens of thousands) and they are still present on the disk, then the required information about all of them is loaded from and saved to the cache, even if you are working with only a few image files. You can rename one of the cache files whose name starts with `cache_similar_image` (to be able to use it again) or delete it. The cache will then be regenerated with a smaller number of entries, so it should load and save a lot faster.
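
The renaming step can also be scripted. The following is a minimal sketch, assuming you already know where the cache files live (the directory is not stated here, so it is passed in as an argument). `set_aside_similar_image_caches` and the `.bak` suffix are hypothetical choices for illustration.

```python
from pathlib import Path


def set_aside_similar_image_caches(cache_dir: Path, suffix: str = ".bak") -> list[Path]:
    """Rename every `cache_similar_image*.txt` file so it can be restored later."""
    renamed = []
    # sorted() collects the matches first, so renaming does not affect the iteration.
    for cache_file in sorted(cache_dir.glob("cache_similar_image*.txt")):
        target = cache_file.with_name(cache_file.name + suffix)
        cache_file.rename(target)
        renamed.append(target)
    return renamed
```

Other cache files, such as `cache_duplicates_Blake3.txt`, are left untouched. To reuse an old cache, rename the file back to its original name.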
After finding them, each symlink is checked to see which element it points to; if that element does not exist, the symlink is added to the list of invalid symlinks as pointing to a non-existent path.
The second mode detects recursive symlinks. Unfortunately, this mode does not work at the moment and reports an error about a non-existent target element instead. It is implemented by counting the jumps of the symlink: after exceeding a certain number (e.g. 20), the given symlink is considered recursive.
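
The jump-counting idea can be sketched as follows. This is an illustration under assumptions, not Czkawka's implementation: `classify_symlink`, its return values, and the injectable `is_link`/`read_link`/`exists` parameters are hypothetical, and the limit of 20 is the example number from the text.

```python
import os
from typing import Callable

MAX_JUMPS = 20  # example limit taken from the text above


def classify_symlink(
    path: str,
    is_link: Callable[[str], bool] = os.path.islink,
    read_link: Callable[[str], str] = os.readlink,
    exists: Callable[[str], bool] = os.path.lexists,
    max_jumps: int = MAX_JUMPS,
) -> str:
    """Return "ok", "non_existent" or "recursive" for a symlink (hypothetical helper)."""
    current = path
    jumps = 0
    while is_link(current):
        jumps += 1
        if jumps > max_jumps:
            # Too many hops: treat the chain as recursive.
            return "recursive"
        target = read_link(current)
        # Relative targets are resolved against the directory of the link itself.
        current = os.path.normpath(os.path.join(os.path.dirname(current), target))
    return "ok" if exists(current) else "non_existent"
```

With the default arguments the function inspects the real filesystem; passing custom callables makes it possible to try the logic on an in-memory mapping of links.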
The tool first collects images with specific extensions that can be checked - `[".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".pnm", ".tga", ".ff", ".gif", ".jif", ".jfi", ".ico", ".webp", ".avif"]`.
Computed hash data is then put into a special tree that allows comparing hashes using [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance).
Finally, each hash is compared with the others, and if the distance between them is less than the maximum distance specified by the user, the images are considered similar and removed from the pool of images to be searched.
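
To make the comparison step concrete, here is a minimal sketch. It deliberately uses a brute-force pairwise comparison instead of the tree mentioned above, so it only illustrates the Hamming-distance rule, not the actual data structure. `hamming_distance` and `group_similar` are hypothetical names, and the strict `<` follows the wording of the text.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def group_similar(hashes: dict[str, bytes], max_distance: int) -> list[list[str]]:
    """Group images whose hashes are closer than `max_distance` to a base image."""
    remaining = list(hashes.items())
    groups = []
    while remaining:
        base_name, base_hash = remaining.pop(0)
        similar = [
            name for name, h in remaining
            if hamming_distance(base_hash, h) < max_distance
        ]
        if similar:
            groups.append([base_name] + similar)
            # Matched images leave the pool and are not compared again.
            remaining = [(n, h) for n, h in remaining if n not in similar]
    return groups


hashes = {"a.jpg": b"\x00\x00", "b.jpg": b"\x00\x01", "c.jpg": b"\xff\xff"}
print(group_similar(hashes, max_distance=2))  # [['a.jpg', 'b.jpg']]
```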
It is possible to choose one of 5 types of hashes - `Gradient`, `Mean`, `VertGradient`, `Blockhash`, `DoubleGradient`.
Before calculating hashes, images are usually resized with a specific algorithm (`Lanczos3`, `Gaussian`, `CatmullRom`, `Triangle`, `Nearest`) to e.g. an 8x8 or 16x16 image (allowed sizes - `4x4`, `8x8`, `16x16`), which simplifies later computations. Both the size and the filter can be adjusted in the application.
Each configuration saves results to a different cache file to protect users from invalid results.
Some images break hash functions and produce hashes full of `0` or `255`, so these images are silently excluded (proper error reporting should probably be provided).
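
As a rough illustration of why some hashes end up full of `0`, here is a sketch of a generic mean-style hash computed on an already resized grayscale grid. It is an assumption-laden example and not necessarily identical to Czkawka's `Mean` algorithm; `mean_hash` and `is_degenerate` are hypothetical names.

```python
def mean_hash(pixels: list[list[int]]) -> bytes:
    """Mean-style hash of a small grayscale grid (values 0-255), one bit per pixel."""
    flat = [value for row in pixels for value in row]
    if not flat or len(flat) % 8 != 0:
        raise ValueError("pixel count must be a non-zero multiple of 8")
    mean = sum(flat) / len(flat)
    # A bit is set when the pixel is brighter than the average.
    bits = [1 if value > mean else 0 for value in flat]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def is_degenerate(hash_bytes: bytes) -> bool:
    """True when every byte is 0 or every byte is 255, as described above."""
    return all(b == 0 for b in hash_bytes) or all(b == 255 for b in hash_bytes)


uniform = [[128] * 8 for _ in range(8)]  # a single flat colour
print(is_degenerate(mean_hash(uniform)))  # True
```

A uniformly coloured image has no pixel above its own average, so every bit is `0` and the hash carries no information, which is why excluding such images is reasonable.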
You can test each algorithm with the provided CLI tool: put a `test.jpg` file in a folder and run this command inside it: `czkawka_cli tester -i`
Some tidbits:
- A smaller hash size does not always mean that calculating it takes less time
- `Blockhash` is the only algorithm that doesn't resize images before hashing
- The `Nearest` resize algorithm can be even 5 times faster than any other available, but provides worse results
Only some file extensions are supported, because I rely on external crates. Also, some false positives may be shown (e.g. https://github.com/image-rs/jpeg-decoder/issues/130), so always open the file to check whether it is really broken.