mirror of https://github.com/qarmin/czkawka synced 2024-10-31 21:20:19 +00:00

Czkawka

Czkawka is a simple, fast and easy to use alternative to FSlint, written in Rust.
This is my first ever project in Rust so many things might not be written in the most optimal way.

Why?

There are many tools on the Internet for finding duplicates, empty folders, temporary files etc., but most of them are available only as CLI programs, which many users find hard to use.

FSlint's GUI makes it easy to select files and folders, but it is based on the old and unsupported Python 2 and GTK 2.

Other tools are usually written in C/C++ for high performance, but they still need a lot of testing for memory leaks, invalid memory reads/writes and double frees.

But the most important thing for me was to learn Rust and create a program useful for the open source community.

Features

  • Written in memory safe Rust
  • Amazingly fast - thanks to more or less advanced algorithms
  • CLI frontend, very fast and powerful with rich help
  • GUI GTK frontend - uses modern GTK 3 and looks similar to FSlint
  • Light/Dark theme - matches the appearance of the system
  • Saving results to a file - allows reading entries found by the tool easily
  • Rich search options - allow setting absolute included and excluded directories, a set of allowed file extensions, or excluded items with the * wildcard
  • Clean Glade file in which UI can be easily modernized
  • Multiple tools to use:
    • Duplicates - Finds duplicates based on size (fast), hash (accurate), or hash of the first 1MB (moderate)
    • Empty Folders - Finds empty folders with the help of an advanced algorithm
    • Big Files - Finds the given number of the biggest files in a given location
    • Empty Files - Looks for empty files across disk
    • Temporary Files - Allows finding temporary files
    • Similar Files - Finds files which are similar but not exactly the same
    • Zeroed Files - Finds files which are filled with zeros (usually corrupted)
    • Same Music - Searches for music with the same artist, album etc.
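
The Duplicates modes listed above can be illustrated with a minimal sketch. It is not Czkawka's actual implementation: the function names `hash_file` and `find_duplicates` are hypothetical, and the standard library's `DefaultHasher` is used only to avoid external crates (it is not a cryptographic hash). Files are first grouped by size, then files of equal size are compared by hash. Passing `Some(1024 * 1024)` as the limit hashes only the first 1MB, and `None` hashes the whole file.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hash at most `limit` bytes of a file (`None` means the whole file).
/// For brevity the bytes are read into memory; a real tool would stream them.
fn hash_file(path: &Path, limit: Option<u64>) -> io::Result<u64> {
    let mut data = Vec::new();
    File::open(path)?
        .take(limit.unwrap_or(u64::MAX))
        .read_to_end(&mut data)?;
    // Assumption: DefaultHasher stands in for whatever hash the real tool uses.
    let mut hasher = DefaultHasher::new();
    hasher.write(&data);
    Ok(hasher.finish())
}

/// Return groups of paths that have the same size and the same hash.
fn find_duplicates(paths: &[PathBuf], limit: Option<u64>) -> io::Result<Vec<Vec<PathBuf>>> {
    // Step 1 (fast): group files by size; a file with a unique size has no duplicate.
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for path in paths {
        by_size
            .entry(fs::metadata(path)?.len())
            .or_default()
            .push(path.clone());
    }

    // Step 2 (accurate): hash only the files that share their size with another file.
    let mut groups = Vec::new();
    for same_size in by_size.into_values() {
        if same_size.len() < 2 {
            continue;
        }
        let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for path in same_size {
            by_hash.entry(hash_file(&path, limit)?).or_default().push(path);
        }
        groups.extend(by_hash.into_values().filter(|group| group.len() > 1));
    }
    Ok(groups)
}
```

Grouping by size first means most files are never read at all, which is why the size-only mode is the fastest and the full-hash mode the most accurate.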

Usage and requirements

Precompiled binaries

For the Linux version of the program, the only requirement is having GTK 3.22+ installed on the system.

Precompiled binaries are available here - https://github.com/qarmin/czkawka/releases/. If the app does not run when you click the launcher, run it from a terminal.

AppImage

AppImage files are available on the releases page, alongside the native binaries. The minimum required OS version is Ubuntu 18.04 - https://github.com/qarmin/czkawka/releases/

Cargo

An easier way to install Czkawka is to use Cargo (the GTK libraries must already be installed on the system):

cargo install czkawka_gui

You can update the package by running the same command.

Snap, Flatpak

Maybe someday

Debian/Ubuntu repository and PPA

I tried to set it up, but for now I am blocked by the problems described in this issue:

https://salsa.debian.org/rust-team/debcargo-conf/-/issues/21

AUR - Arch Linux Package (unofficial)

Czkawka is also available in Arch Linux's AUR from which it can be easily downloaded and installed on the system.

yay -Syu czkawka-git

This is an unofficial package, so new versions will not always be available.

Devel versions

Artifacts from each commit can also be downloaded here - https://github.com/qarmin/czkawka/actions

Compilation

Requirements

Rust 1.46 - lower versions probably also work fine (GTK needs at least 1.40)
GTK 3.22 - for GTK backend

For now only Linux (and maybe also macOS) is supported

  • Install requirements for GTK
apt install -y libgtk-3-dev

Compilation from source

  • Download the source
git clone https://github.com/qarmin/czkawka.git
cd czkawka
  • Run GTK GUI
cargo run --bin czkawka_gui

For Linux-to-Windows cross-building instructions, look at the CI.

  • Run Orbtk GUI
cargo run --bin czkawka_gui_orbtk

  • Run CLI(this will print help with a lot of examples)
cargo run --bin czkawka_cli

Benchmarks

Since Czkawka is written in Rust and aims to be a faster alternative to FSlint (written in Python), we need to compare the speed of these tools.

Currently, I'm working on multithreading support in Czkawka, so the benchmarks should be updated in versions 1.4.0+. Also, DupeGuru will probably have a new 4.0.5 release soon.

I prepared a directory containing 320004 files and 36902 folders, with 108844 duplicate files in 34475 groups taking 4.53 GB. I ran the test without any folder exclusions (in FSlint and Czkawka I removed all directories from the tabs other than Include Directory).

I set the minimum file size to check to 1 KB in all programs.

The first run reads every file entry and saves it to cache, so this step is limited mostly by disk performance. In the second run the cache helps, so searching is sometimes faster (with few duplicates even 10x faster).

After selecting files, DupeGuru froze at 45% for ~15 minutes, so I just killed it.

| App | Executing Time |
|---|---|
| FSlint 2.4.7 (First Run) | 255s |
| FSlint 2.4.7 (Second Run) | 126s |
| Czkawka 1.3.0 (First Run) | 150s |
| Czkawka 1.3.0 (Second Run) | 107s |
| DupeGuru 4.0.4 (First Run) | - |
| DupeGuru 4.0.4 (Second Run) | - |

I used Mprof to check the memory usage of FSlint and DupeGuru, and Heaptrack for Czkawka. To keep DupeGuru from crashing, I checked a smaller directory with 217986 files and 41883 folders.

| App | Idle RAM | Max Operational RAM Usage | Stabilized after search |
|---|---|---|---|
| FSlint 2.4.7 | 54 MB | 120 MB | 117 MB |
| Czkawka 1.3.0 | 8 MB | 42 MB | 41 MB |
| DupeGuru 4.0.4 | 110 MB | 637 MB | 602 MB |

Similar Images, checking 386 files which take 1.9 GB

| App | Scan time |
|---|---|
| Czkawka 1.3.0 | 267s |
| DupeGuru 4.0.4 | 75s |

Similar Images, checking 5018 files which take 389 MB

| App | Scan time |
|---|---|
| Czkawka 1.3.0 | 45s |
| DupeGuru 4.0.4 | 87s |

So there is still a lot of room for improvement.

Comparison with other tools

| | Czkawka | FSlint | DupeGuru |
|---|---|---|---|
| Language | Rust | Python | Python/Objective C |
| OS | Linux, Windows, Mac(only CLI) | Linux | Linux, Windows, Mac |
| Framework | GTK 3 (Gtk-rs) | GTK 2 (PyGTK) | Qt 5 (PyQt)/Cocoa |
| Ram Usage | Low | Medium | Very High |
| Duplicate finder | X | X | X |
| Empty files | X | X | |
| Empty folders | X | X | |
| Temporary files | X | X | |
| Big files | X | | |
| Similar images | X | | X |
| Zeroed Files | X | | |
| Music duplicates(EXIF) | X | | X |
| Installed packages | | X | |
| Invalid names | | X | |
| Names conflict | | X | |
| Invalid symlinks | | X | |
| Bad ID | | X | |
| Non stripped binaries | | X | |
| Redundant whitespace | | X | |
| Multiple languages(po) | | X | X |
| Project Activity | High | Very Low | High |

Contributions

Contributions to this repository are welcome.

You can help by creating:

  • Bug reports - memory leaks, unexpected behavior, crashes
  • Feature proposals - proposals to change/add/delete some features
  • Pull Requests - implementing a new feature yourself or fixing bugs, but you have to pay attention to code quality. If the change is bigger, it's a good idea to open a new issue first to discuss it.

The code should be clean and well formatted (Clippy and fmt are required in each PR).

The code should also be easy to read, so please use the simplest language possible without any magic numbers and variables with strange names. You should also try to write unit tests if possible.

Name

Czkawka is a Polish word which means hiccup.
I chose this name because I wanted to hear people speaking other languages pronounce it.
This name is not as bad as it seems, because I was also thinking about using words like żółć, gżegżółka or żołądź, but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project.

When I started working on the program, I was prepared to change its name if the response to it was unanimously negative, but the opinions were extremely mixed.

License

The code is distributed under the MIT license.

Program is completely free to use.

"Gratis to uczciwa cena" - "Free is a fair price"