Archives/pixz - pixz - blob42 source forge

Dave Vasilevsky 0f21868543 tabs to spaces		14 years ago
.gitignore	Ignore pread binary	14 years ago
Makefile	Single driver	14 years ago
README	tabs to spaces	14 years ago
TODO	Doc update	14 years ago
common.c	Doc update	14 years ago
cpu.c	Dynamically determine the number of CPUs--now we're actually useful	15 years ago
endian.c	Linux support, woo	15 years ago
list.c	Doc update	14 years ago
pixz.h	Doc update	14 years ago
read.c	Single driver	14 years ago
test.sh	simpler block queuing; go back to dict_size blocks; run test in bash, some sh have no "time"	15 years ago
write.c	Single driver	14 years ago

README

Pixz (pronounced 'pixie') is a parallel, indexing version of XZ.


The existing XZ Utils ( http://tukaani.org/xz/ ) provide great compression in the .xz file format, but they have two significant problems:

* They are single-threaded, while most users nowadays have multi-core computers.
* The .xz files they produce are just one big block of compressed data, rather than a collection of smaller blocks. This makes random access to the original data impossible.
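
To illustrate why splitting the data into independently compressed pieces addresses both problems, here is a minimal Python sketch. It is not pixz's implementation: it uses Python's standard lzma module and writes each chunk as a separate .xz stream, whereas pixz uses blocks. The names compress_chunked, read_chunk and CHUNK_SIZE are hypothetical and exist only for this example.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 16  # hypothetical chunk size, chosen only for this sketch


def compress_chunked(data, chunk_size=CHUNK_SIZE):
    """Compress fixed-size chunks independently; return (blob, index)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Chunks do not depend on each other, so workers can compress them in parallel.
    with ThreadPoolExecutor() as pool:
        compressed = list(pool.map(lzma.compress, chunks))
    # Record where each compressed chunk starts and how long it is.
    index, offset = [], 0
    for comp in compressed:
        index.append((offset, len(comp)))
        offset += len(comp)
    return b"".join(compressed), index


def read_chunk(blob, index, n):
    """Decompress only chunk n, without touching the rest of the blob."""
    offset, length = index[n]
    return lzma.decompress(blob[offset:offset + length])
```

Because every chunk is a complete stream, the concatenated result is still readable by an ordinary decompressor, while the index allows a reader to jump straight to one chunk.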


With pixz, both these problems are solved. The most useful commands:

$ pixz foo.tar foo.tpxz         # Compress and index a tarball, multi-core
$ pixz -l foo.tpxz              # Very quickly list the contents of the compressed tarball
$ pixz -x dir/file < foo.tpxz | tar x   # Very quickly extract a file, multi-core.
                                        # Also verifies that contents match index.

$ pixz bar bar.xz           # Compress a non-tarball, multi-core
$ pixz -d bar.xz bar        # Decompress it, multi-core
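
As a rough picture of how a tarball index can make listing and extraction fast, the Python sketch below records the byte range each member occupies in the uncompressed tar and then works out which fixed-size chunks cover a given file. This is an assumption about the general technique, not a description of pixz's actual index format. The names build_file_index and chunks_for are hypothetical.

```python
import io
import tarfile

BLOCK = 512  # tar stores headers and data in 512-byte records


def build_file_index(tar_bytes):
    """Map each member name to the (start, end) byte range it occupies in the tar."""
    index = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            # Data is padded up to a whole number of 512-byte records.
            padded = (member.size + BLOCK - 1) // BLOCK * BLOCK
            index[member.name] = (member.offset, member.offset_data + padded)
    return index


def chunks_for(index, name, chunk_size):
    """Return the chunk numbers that must be decompressed to recover one member."""
    start, end = index[name]
    return range(start // chunk_size, (end - 1) // chunk_size + 1)
```

Listing the archive then only needs the index, and extracting one file only needs the chunks returned by chunks_for rather than the whole archive.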


Specifying input and output:

$ pixz < foo.tar > foo.tpxz     # Same as 'pixz foo.tar foo.tpxz'
$ pixz -i foo.tar -o foo.tpxz   # Ditto. These both work for -x, -d and -l too, e.g.:

$ pixz -x -i foo.tpxz -o foo.tar file1 file2 ... # Extract the files from foo.tpxz into foo.tar

$ pixz foo.tar                  # Compress it to foo.tpxz, removing the original
$ pixz -d foo.tpxz              # Extract it to foo.tar, removing the original


Other flags:

$ pixz -1 foo.tar           # Faster, worse compression
$ pixz -9 foo.tar           # Better, slower compression 

$ pixz -t foo.tar           # Compress but don't treat it as a tarball (don't index it)
$ pixz -d -t foo.tpxz       # Decompress foo.tpxz, don't check that contents match index
$ pixz -l -t foo.tpxz       # List the xz blocks instead of files
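
The speed/size trade-off behind -1 and -9 can be tried out with Python's standard lzma module, which accepts a preset from 0 to 9. That pixz's levels correspond to these presets is an assumption made for this sketch, and compare_levels is a hypothetical helper.

```python
import lzma
import time


def compare_levels(data, levels=(1, 9)):
    """Compress data at each preset; return {level: (compressed_size, seconds)}."""
    results = {}
    for level in levels:
        started = time.perf_counter()
        compressed = lzma.compress(data, preset=level)
        results[level] = (len(compressed), time.perf_counter() - started)
    return results
```

Calling compare_levels on a sample of real input gives the size and time for each level, which is a more reliable guide than the level numbers alone.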


Compare to:
    plzip
        * About equally complex and efficient
        * lzip format seems less-used
        * Version 1 is theoretically indexable...I think
    ChopZip
        * Python, much simpler
        * More flexible, supports arbitrary compression programs
        * Uses streams instead of blocks, not indexable
        * Splits input and then combines output, much higher disk usage 
    pxz
        * Simpler code
        * Uses OpenMP instead of pthreads
        * Uses streams instead of blocks, not indexable
        * Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage

Comparable tools for other compression algorithms:
    pbzip2
        * Not indexable
        * Appears slow
        * bzip2 algorithm is non-ideal
    pigz
        * Not indexable
    dictzip
        * Not parallel


Requirements:
    * libarchive 2.8 or later
    * liblzma 4.999.9-beta-212 or later (from the xz distribution)