Czkawka

Czkawka is a simple, fast and easy to use alternative to FSlint, written in Rust.
This is my first ever project in Rust, so a lot of things are probably not written in the most optimal way.

Features

  • Written in fast and memory safe Rust
  • CLI frontend - very fast and powerful, with rich help
  • GTK GUI frontend (still WIP) - uses modern GTK 3 and looks similar to FSlint
  • Orbtk GUI frontend (very early WIP) - an alternative GUI with reduced functionality
  • Saving results to a file - allows easily reading entries found by the tool
  • Rich search options - allow setting absolute included and excluded directories, a set of allowed file extensions, or excluded items with a * wildcard
  • Multiple tools to use:
    • Duplicates - finds duplicates based on file size (fast) or hash (accurate)
    • Empty Folders - finds empty folders with the help of an advanced algorithm
    • Big Files - finds the provided number of the biggest files in a given location
    • Empty Files - looks for empty files across the disk
    • Temporary Files - finds temporary files

TODO

  • Comments - a lot of things should be described
  • An external argument parser could probably be used in czkawka_cli
  • More unit tests
  • Debian package
  • Finding files with debug symbols
  • Maybe Windows support, but this will need some refactoring in the code
  • Translation support
  • GTK Gui
    • Selection of records (don't know how to do this yet)
    • Popups
    • Choosing directories(included, excluded)
    • Popup with type of deleted records
    • Run searching in another thread so it can be paused
  • Orbtk GUI
    • Basic selecting included and excluded folders
    • Text field to show information about the number of found folders/files
    • Simple buttons to delete

Usage and requirements

Rust 1.46 - lower versions probably also work fine
GTK 3.24 - for the GTK backend

Precompiled binaries are available here (they may not work in every Linux distro) - https://github.com/qarmin/czkawka/releases/

For now, only Linux (and maybe also macOS) is supported.

  • Install requirements for GTK
apt install -y libgtk-3-dev
  • Download source
git clone https://github.com/qarmin/czkawka.git
cd czkawka
  • Run the GTK GUI (still WIP)
cargo run --bin czkawka_gui

GUI GTK

  • Run the Orbtk GUI (very early WIP)
cargo run --bin czkawka_gui_orbtk

GUI Orbtk

  • Run the CLI (this will print help with a lot of examples)
cargo run --bin czkawka_cli

CLI

Speed

Since Czkawka is written in Rust and aims to be a faster alternative to FSlint (which is written in Python), we need to compare the speed of these two tools.

I checked my home directory without any folder exceptions (I removed all directories from the FSlint advanced tab), which contained 379359 files and 42445 folders, including 50301 duplicated files in 29723 groups taking 450.4 MB.

The first run reads file entries and saves them to the system cache, so this step is mostly limited by disk performance; on the second run the cache helps, so searching is a lot faster.

Duplicate Checker (version 0.1.0)

App                               Executing Time
FSlint (First Run)                140s
FSlint (Second Run)               23s
Czkawka CLI Release (First Run)   128s
Czkawka CLI Release (Second Run)  8s

App                               Idle RAM   Max Operational RAM Usage
FSlint
Czkawka CLI Release
Czkawka GTK GUI Release

Empty folder finder

App                               Executing Time
FSlint
Czkawka CLI Release
Czkawka GTK GUI Release

Differences should be more visible when using a slower processor or a faster disk.

How it works

Duplicate Finder

The only required parameter for checking duplicates is the list of included folders, -i. The provided folders are validated - each must be an absolute path (without ~ or other similar symbols at the beginning), must not contain a * (wildcard), must be a directory (not a file or symlink), and must exist. Later the same checks are applied to the excluded folders, -e.
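
A minimal sketch of this validation, built around a hypothetical is_valid_folder helper (not czkawka's actual code):

use std::path::Path;

// Hypothetical helper mirroring the checks described above.
fn is_valid_folder(text: &str) -> bool {
    let path = Path::new(text);
    path.is_absolute()         // absolute path, so no ~ at the beginning
        && !text.contains('*') // no wildcard
        && path.is_dir()       // an existing directory (is_dir implies existence)
    // note: is_dir() follows symlinks, so a full implementation would
    // additionally inspect fs::symlink_metadata to reject symlinks
}

fn main() {
    assert!(is_valid_folder("/etc"));
    assert!(!is_valid_folder("~/Documents")); // relative, starts with ~
    assert!(!is_valid_folder("/home/*"));     // contains a wildcard
}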

Next, these included and excluded folders are optimized, taking advantage of the tree structure of the filesystem (a sketch of the first rule is shown below):

  • Folders which contain other folders from the list are combined (separately for included and excluded) - /home/pulpet and /home/pulpet/a are combined into /home/pulpet
  • Included folders which are located inside excluded ones are deleted - the included folder /etc/tomcat/ is deleted because /etc/ is an excluded folder
  • Non-existent directories are removed
  • Excluded paths which are outside every included path are deleted - the excluded path /etc/ is removed if the only included path is /home/

If after this optimization no included folders remain, the program ends with a non-zero exit code (TODO: this should be handled by returning a value).
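
A minimal sketch, not czkawka's actual code, of the first optimization - dropping every folder already contained in another folder on the list:

use std::path::PathBuf;

fn combine_nested(mut dirs: Vec<PathBuf>) -> Vec<PathBuf> {
    // Component-wise sorting of paths places every folder directly
    // after its ancestors, so one pass with a single "last kept" check
    // is enough to remove all nested folders.
    dirs.sort();
    let mut result: Vec<PathBuf> = Vec::new();
    for dir in dirs {
        match result.last() {
            Some(last) if dir.starts_with(last) => {} // inside a kept folder, drop it
            _ => result.push(dir),
        }
    }
    result
}

fn main() {
    let dirs = vec![PathBuf::from("/home/pulpet/a"), PathBuf::from("/home/pulpet")];
    assert_eq!(combine_nested(dirs), vec![PathBuf::from("/home/pulpet")]);
}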

Next, using the user-provided minimum size of checked files -s, the program scans the included folders (recursively or not) and groups the files by size, putting files with the same size into the same group. Groups which contain only one element are then removed, because a lone file cannot be a duplicate. A sketch of this grouping is shown below.
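
A minimal sketch of the grouping step, assuming file sizes were already collected during the scan (the names here are illustrative, not czkawka's actual API):

use std::collections::HashMap;
use std::path::PathBuf;

fn group_by_size(files: Vec<(PathBuf, u64)>, min_size: u64) -> HashMap<u64, Vec<PathBuf>> {
    let mut groups: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for (path, size) in files {
        if size >= min_size {
            // all files of the same size land in the same group
            groups.entry(size).or_default().push(path);
        }
    }
    // a group with a single file cannot contain duplicates
    groups.retain(|_, paths| paths.len() > 1);
    groups
}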

Now, if the user selects it, the hash of each remaining file is checked, because it may happen that files have an equal size but differ in one or more bytes.

There are two available methods to check the hash (a sketch of both follows the list):

  • full (default) - checks the hash of the entire file, so this method is slow (especially with large files), but there is almost no chance that two different files will be treated as duplicates.
  • partial - checks the hash of at most the first 1 MB of the file, so it is a lot more accurate than only checking file sizes, but there is still a very small chance that files identified as duplicates are not.
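
A minimal sketch of both methods; the standard library's DefaultHasher stands in here for whatever hash algorithm czkawka actually uses, and the 1 MB limit matches the partial method described above:

use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{BufReader, Read};

// `limit` is None for the full method and Some(1024 * 1024) for partial.
fn hash_file(path: &str, limit: Option<u64>) -> std::io::Result<u64> {
    let file = BufReader::new(File::open(path)?);
    let mut reader: Box<dyn Read> = match limit {
        Some(bytes) => Box::new(file.take(bytes)), // partial: at most the first 1 MB
        None => Box::new(file),                    // full: the entire file
    };
    let mut hasher = DefaultHasher::new();
    let mut buffer = [0u8; 8192];
    loop {
        let read = reader.read(&mut buffer)?;
        if read == 0 {
            break;
        }
        hasher.write(&buffer[..read]);
    }
    Ok(hasher.finish())
}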

At the end, if the user used the -delete option, the specified files are removed according to the chosen mode - All Except Oldest/Newest or Only Oldest/Newest. A sketch of the first mode is shown below.
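
A minimal sketch, not czkawka's actual code, of the All Except Oldest mode applied to one group of duplicates - keep the file with the earliest modification time and delete the rest:

use std::fs;
use std::path::PathBuf;
use std::time::SystemTime;

fn delete_all_except_oldest(group: &[PathBuf]) -> std::io::Result<()> {
    let mut oldest: Option<(usize, SystemTime)> = None;
    // find the index of the file with the earliest modification time
    for (index, path) in group.iter().enumerate() {
        let modified = fs::metadata(path)?.modified()?;
        if oldest.map_or(true, |(_, time)| modified < time) {
            oldest = Some((index, modified));
        }
    }
    // delete everything else in the group
    if let Some((keep, _)) = oldest {
        for (index, path) in group.iter().enumerate() {
            if index != keep {
                fs::remove_file(path)?;
            }
        }
    }
    Ok(())
}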

License

Code is distributed under MIT license.