* Spicing up the README
- Making it more readable
- Better English, easier to read
- Hiding links
- Fixing the absolutely hideous tables which were impossible to read in the raw README
* Fixed some things, not a lot though.
Precompiled binaries are available here - https://github.com/qarmin/czkawka/releases/.
Ready-to-go executables are available [**here**](https://github.com/qarmin/czkawka/releases/).
If the app does not run when you click the launcher, run it from a terminal.
You don't need any additional libraries for the CLI version of Czkawka.
#### GUI Requirements
### GUI Requirements
##### Linux
For Czkawka GUI you need to have at least GTK 3.22 and also Alsa installed(for finding broken music files).
It should be installed by default on all the most popular distros.
The Czkawka GUI requires at least `GTK 3.22` and `Alsa` (used for finding broken music files).
Both should be installed by default on most popular distros.
##### Windows
`czkawka_gui.exe` extracted from zip file `windows_czkawka_gui.zip` needs to have all files inside around, because use them.
If you want to move somewhere else exe binary and open it, you need to install GTK 3 runtime from site https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases
##### MacOS
For now you need to install manually GTK 3 libraries, because are dynamically loaded from OS(Help needed to use static linking).
To install it you need to type this commands in terminal
The `czkawka_gui.exe` extracted from the `windows_czkawka_gui.zip` file needs to stay in the same folder as the
rest of the extracted files, because it uses them. If you want to move the executable somewhere else and run it
from there, you need to install the `GTK 3` runtime from
[**here**](https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases).
##### macOS
Currently you need to install the `GTK 3` libraries manually, because they are dynamically loaded from the OS
(*help with static linking is welcome*). To install them, run these commands in the terminal:
Next you need to go to place where you downloaded app and add executable bit
After that, go to the location where you downloaded the app and add the `executable` permission:
```shell
chmod +x mac_czkawka_gui
```
At the end you can open this app
Execute in the same folder with:
```shell
./mac_czkawka_gui
```
### Appimage
Appimage files are available in release page - https://github.com/qarmin/czkawka/releases/
For now looks that there is a bug with this format, because it doesn't allow opening two images/files at once.
AppImage files are available on the [**GitHub releases**](https://github.com/qarmin/czkawka/releases/) page.
There is currently a problem with this format: it doesn't allow you to open two images/files at once.
### Cargo
The easiest method to install Czkawka is to use Cargo command, since it basically compile an entire app, you need to install required packages from `Compilation` section
The easiest method to install Czkawka is using the `cargo` command. Since this compiles the entire app, you first
need to install all the requirements from the [compilation section](#Compilation).
```
cargo install czkawka_gui
cargo install czkawka_cli
```
You can update package by typing same command.
You can update the package with the same command.
### Snap
Snap also are available, but there is no access to external drives.
Snaps are also available, but they have no access to external drives.
```
sudo snap install czkawka
```
Snap store entry - https://snapcraft.io/czkawka
The Snap store entry can be found [**here**](https://snapcraft.io/czkawka).
Edgy builds are build for every commit, but it may be a little unstable(very rarely, because I'm not pushing untested code).
Fresh builds are created for every commit, but they may be a little unstable, although that happens very rarely
because I don't push untested code.
<!-- Dunno if the flatpak section should be here, because it just takes valuable
space, but if you want to keep it you do you. I'm only fixing the english
here. -->
### Flatpak
Maybe someday
### Debian/Ubuntu repository and PPA
Tried to set up it, but for now I have problems described in this issue
Since Czkawka is written in Rust and aims to be a faster alternative to FSlint (written in Python), we need to compare the speed of these tools.
I tested it on SSD Disk 256 GB GoodRam and i7 4770 CPU.
I prepared a directory and performed a test without any folder exceptions(I removed all directories from FSlint and Czkawka from other tabs than Include Directory) which contained 229868 files which took 203,7 GB and 13708 duplicates files in 9117 groups which took 7.90 GB.
Since Czkawka is written in Rust and aims to be a faster alternative to FSlint (which is written in Python), we need
to compare the speed of these tools.
I tested it on a 256 GB SSD and an i7-4770 CPU.
I prepared a directory and ran the test without any folder exceptions (I removed all directories from FSlint and
Czkawka from tabs other than Include Directory). The directory contained 229 868 files taking 203.7 GB, including
13 708 duplicate files in 9117 groups taking 7.90 GB.
Minimum file size to check I set to 1 KB on all programs
I set the minimum file size to check to 1 KB in all programs.
| App| Executing Time |
|:----------:|:-------------:|
| FSlint 2.4.7 (Second Run)| 86s |
| Czkawka 1.4.0 (Second Run) | 12s |
| DupeGuru 4.0.4 (Second Run) | 28s |
| App                         | Executing Time |
|:---------------------------:|:--------------:|
| FSlint 2.4.7 (Second Run)   | 86s            |
| Czkawka 1.4.0 (Second Run)  | 12s            |
| DupeGuru 4.0.4 (Second Run) | 28s            |
I used Mprof for checking memory usage FSlint and Dupeguru, for Czkawka I used Heaptrack.
To not get Dupeguru crash I checked smaller directory with 217986 files and 41883 folders.
I used Mprof to check the memory usage of FSlint and DupeGuru; for Czkawka I used Heaptrack.
To keep DupeGuru from crashing, I checked a smaller directory with 217 986 files and 41 883 folders.
| App| Idle Ram | Max Operational Ram Usage | Stabilized after search |
- Feature proposals - proposal to change/add/delete some features
- Pull Requests - implementing a new feature yourself or fixing bugs, but you have to pay attention to code quality. If the change is bigger, then it's a good idea to open a new issue to discuss changes.
- Documentation - There is [instruction](instructions/Instruction.md) which you can improve.
- Pull Requests - implementing a new feature yourself or fixing bugs, but you have to pay attention to code quality.
If the change is bigger, then it's a good idea to open a new issue to discuss changes.
- Documentation - There is an [instruction](instructions/Instruction.md) which you can improve.
The code should be clean and well formatted (Clippy and fmt are required in each PR).
@@ -245,20 +286,22 @@ Czkawka is a Polish word which means _hiccup_.
I chose this name because I wanted to hear people speaking other languages pronounce it.
This name is not as bad as it seems, because I was also thinking about using words like _żółć_, _gżegżółka_ or _żołądź_, but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project.
This name is not as bad as it seems, because I was also thinking about using words like _żółć_, _gżegżółka_ or _żołądź_,
but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project.
At the beginning of the program creation, if the response concerning the name was unanimously negative, I prepared myself for a possible change of the name of the program, but the opinions were extremely mixed.
Early in the program's development I was prepared to change its name if the response to it was unanimously negative,
but the opinions were extremely mixed.
## License
Code is distributed under MIT license.
Icon is created by [jannuary](https://github.com/jannuary) and licensed CC-BY-4.0.
Windows dark theme is used from AdMin repo - https://github.com/nrhodes91/AdMin with MIT license
The Windows dark theme comes from the [AdMin repo](https://github.com/nrhodes91/AdMin), which is under the MIT license.
Program is completely free to use.
The program is completely free to use.
"Gratis to uczciwa cena" - "Free is a fair price"
## Donations
If you are using the app, I would appreciate a donation for its further development - https://github.com/sponsors/qarmin.
If you are using the app, I would appreciate a donation for its further development, which can be done [here](https://github.com/sponsors/qarmin).
Czkawka for now contains two independent frontends - Console and Graphical interface which share the core module which contains basic and common functions used by each frontend.
Czkawka for now contains two independent frontends - the terminal and graphical interface which share the core module.
Using Rust without unsafe code helps to create a safe and fast program with small resource requirements.
This code also has good support for multi-threading.
The code has very good support for multithreading, so performance should improve with a better processor or disk.
# Tools
## Tools - How do they work?
### Duplicate Finder
Duplicate Finder allows you to search for files and group them according to a predefined criterion:
- **By name** - Groups files by name e.g. `/home/rafal/plik.txt` will be treat like duplicate of file `/home/romb/plik.txt`. This is the fastest method, but it is very unreliable and should not be used unless you know what you are doing.
- **By size** - Groups files by its size(in bytes), which must be exactly the same. It is as fast as the previous mode and usually gives much more correct results with duplicates, but I also do not recommend using it if you do not know what you are doing.
- **By hash** - A mode containing a check of the hash (cryptographic hash) of a given file which determines with great probability whether the files are identical.
This is the slowest, but almost 100% sure way to check the files.
- **By name** - Groups files by name e.g. `/home/john/cats.txt` will be treated like a duplicate of a file named
`/home/lucy/cats.txt`. This is the fastest method, but it is very unreliable and should not be used unless you know
what you are doing.
- **By size** - Groups files by their size in bytes, which must match exactly. It is as fast as the previous mode and
usually gives better results with duplicates, but I also do not recommend using it if you do not know what you are doing.
Because the hash is only checked inside groups of files of the same size, it is practically impossible for two different files to be considered identical.
- **By hash** - Checks the (cryptographic) hash of each file, which determines with great probability whether the
files are identical.
This is the slowest, but almost 100% reliable way to check the files.
It consists of 3 parts:
- Grouping files of identical size - allows you to throw away files of unique size, which are already known to have no duplicates at this stage.
- PreHash check - Each group of files of identical size is placed in a queue using all processor threads (each action in the group is independent of the others).
In each such group a small fragment of each file (2KB) is loaded in turn and then hashed. All files whose partial hashes are unique within the group are removed from it. Using this step usually allowed me to reduce the time of searching for duplicates even by half.
- Checking the Hash - After leaving files that have the same beginning in groups, you should now check the whole contents of the file to make sure they are identical.
Because the hash is only checked inside groups of files of the same size, it is practically impossible for two different
files to be considered identical.
- **By hashmb** - Works the same way as via hash, only in the last phase it does not check the whole file but only its first Megabyte. It is perfect for quick search of possible duplicate files.
It consists of 3 steps:
- Grouping files of identical size - allows you to throw away files of unique size, which are already known to have no
duplicates at this stage.
- PreHash check - Each group of files of identical size is placed in a queue using all processor threads (each action in
the group is independent of the others). In each such group a small fragment of each file (2 KB) is loaded in turn and
then hashed. All files whose partial hashes are unique within the group are removed from it. This step usually
cuts the time of searching for duplicates by as much as half.
- Checking the hash - Once only files with the same beginning remain in the groups, the whole contents of each file
are checked to make sure they are identical.
- **By hashmb** - Works the same way as by hash, only in the last phase it does not check the whole file but only its first
megabyte. It is perfect for a quick search of possible duplicate files.
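
The three steps of the **By hash** mode can be sketched roughly as below. This is only an illustration, not Czkawka's
actual code: files are modelled as in-memory `(name, bytes)` pairs, the names `regroup` and `find_duplicates` are
made up for this example, and the standard library's `DefaultHasher` (which is **not** cryptographic) stands in for
the real hash. It also runs on a single thread.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A file modelled as (name, contents); an assumption made for this sketch.
type File<'a> = (&'a str, &'a [u8]);

/// Size of the fragment hashed in the PreHash step (2 KB).
const PREHASH_BYTES: usize = 2048;

/// Stand-in hash. NOTE: DefaultHasher is not cryptographic.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Splits every group by `key` and drops sub-groups with a single member,
/// because a file that is unique at this stage cannot have a duplicate.
fn regroup<'a, K, F>(groups: Vec<Vec<File<'a>>>, key: F) -> Vec<Vec<File<'a>>>
where
    K: Eq + Hash,
    F: Fn(&[u8]) -> K,
{
    let mut result = Vec::new();
    for group in groups {
        let mut buckets: HashMap<K, Vec<File<'a>>> = HashMap::new();
        for (name, data) in group {
            buckets.entry(key(data)).or_default().push((name, data));
        }
        result.extend(buckets.into_values().filter(|b| b.len() > 1));
    }
    result
}

/// Returns sorted groups of file names whose contents hash identically.
fn find_duplicates<'a>(files: &[File<'a>]) -> Vec<Vec<&'a str>> {
    // Step 1: group files of identical size.
    let by_size = regroup(vec![files.to_vec()], |d| d.len());
    // Step 2: PreHash check on the first 2 KB only.
    let by_prehash = regroup(by_size, |d| hash_bytes(&d[..d.len().min(PREHASH_BYTES)]));
    // Step 3: hash the whole contents of the remaining candidates.
    let by_hash = regroup(by_prehash, |d| hash_bytes(d));

    let mut groups: Vec<Vec<&str>> = by_hash
        .into_iter()
        .map(|group| {
            let mut names: Vec<&str> = group.into_iter().map(|(name, _)| name).collect();
            names.sort();
            names
        })
        .collect();
    groups.sort();
    groups
}
```

Because each step only looks inside the groups left by the previous one, the expensive full hash is computed for as
few files as possible.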
### Empty Files
Searching for empty files is rather easy, because we only need to read file metadata and check if its length is 0.
Searching for empty files is easy and fast, because we only need to read the file metadata and check whether its length is 0.
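
A minimal sketch of such a check using only the standard library (the function name `is_empty_file` is made up for
this example and is not taken from Czkawka's code):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if `path` is a regular file with a length of 0 bytes.
/// Only the metadata is read; the file contents are never opened.
fn is_empty_file(path: &Path) -> io::Result<bool> {
    let metadata = fs::metadata(path)?;
    Ok(metadata.is_file() && metadata.len() == 0)
}
```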
### Empty Directories
Empty directories are those that do not contain any other files, symbolic links, etc. unless they are other empty directories.
At the beginning, a special entry is created for each directory containing - the parent path (only if it is not a folder directly selected by the user) and a flag to indicate whether the given directory is empty(at the beginning each one is potentially empty).
At the beginning, a special entry is created for each directory. It contains the parent path (only if the directory is
not a folder directly selected by the user) and a flag indicating whether the directory is empty (initially each one is
treated as potentially empty).
First, user-defined folders are put into the pool of folders to be checked.
Each element is checked to see if it is:
- folder - this folder is added to the check queue as possibly empty - `FolderEmptiness::Maybe`
- anything else - the given folder is "poisoned" with the `FolderEmptiness::No` flag, indicating that the folder is no longer empty. Then each folder directly or indirectly containing the file is also poisoned with the `FolderEmptiness::No` flag.
- anything else - the given folder is "poisoned" with the `FolderEmptiness::No` flag, indicating that the folder is no longer
empty. Then each folder directly or indirectly containing the file is also poisoned with the `FolderEmptiness::No` flag.
e.g. There is 4 checked folder which may be empty `/krowa/`, `/krowa/ucho/`, `/krowa/ucho/stos/`, `/krowa/ucho/flaga/`.
Example: there are 4 checked folders which *may* be empty: `/cow/`, `/cow/ear/`, `/cow/ear/stack/`, `/cow/ear/flag/`.
In the last one is found a file, so that means that `/krowa/ucho/flaga/` is not empty and also all parents - `/krowa/ucho/` and `/krowa/`.
`/krowa/ucho/stos/` still may be empty.
The last folder contains a file, which means that `/cow/ear/flag/` is not empty, and neither are its parents `/cow/ear/`
and `/cow/`, but `/cow/ear/stack/` may still be empty.
Finally, all folders with the flag `FolderEmpriness::Maybe` are considered empty
Finally, all folders that still have the `FolderEmptiness::Maybe` flag are considered empty.
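
The "poisoning" described above can be sketched as follows. This is a simplified in-memory model, not the real
implementation: only the `FolderEmptiness` flags come from the description, while `FolderEntry`, `poison` and
`empty_folders` are names made up for this example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum FolderEmptiness {
    No,
    Maybe,
}

/// Hypothetical entry: the parent path (None for a folder selected directly
/// by the user) and the emptiness flag.
struct FolderEntry {
    parent: Option<String>,
    emptiness: FolderEmptiness,
}

/// Marks `folder` and all of its known ancestors as not empty.
fn poison(folders: &mut HashMap<String, FolderEntry>, folder: &str) {
    let mut current = Some(folder.to_string());
    while let Some(path) = current {
        match folders.get_mut(&path) {
            Some(entry) => {
                entry.emptiness = FolderEmptiness::No;
                current = entry.parent.clone();
            }
            // Unknown path: nothing more to poison.
            None => break,
        }
    }
}

/// Returns, in sorted order, every folder still flagged as `Maybe`.
fn empty_folders(folders: &HashMap<String, FolderEntry>) -> Vec<&str> {
    let mut result: Vec<&str> = folders
        .iter()
        .filter(|(_, entry)| entry.emptiness == FolderEmptiness::Maybe)
        .map(|(path, _)| path.as_str())
        .collect();
    result.sort();
    result
}
```

With the four folders from the example, calling `poison` on `/cow/ear/flag/` leaves only `/cow/ear/stack/` flagged
as `Maybe`.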
### Big Files
The size of each file inside the given path is read, the files are sorted by size, and the largest ones (e.g. the 50 largest) are displayed.
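
As a rough sketch, assuming the sizes were already collected into `(name, size)` pairs (the function name
`largest_files` is made up for this example):

```rust
/// Returns the `count` largest files, biggest first.
fn largest_files<'a>(files: &[(&'a str, u64)], count: usize) -> Vec<(&'a str, u64)> {
    let mut sorted = files.to_vec();
    // Sort by size in descending order.
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    // Keep only the requested number of entries.
    sorted.truncate(count);
    sorted
}
```

For example, `largest_files(&files, 50)` would return at most the 50 biggest entries.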