Pixz (pronounced 'pixie') is a parallel, indexing version of XZ.


The existing XZ Utils ( http://tukaani.org/xz/ ) provide great compression in the .xz file format, but they have two significant problems:

* They are single-threaded, while most users nowadays have multi-core computers.
* The .xz files they produce are just one big block of compressed data, rather than a collection of smaller blocks. This makes random access to the original data impossible (see the example below).
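
A quick way to see the difference in block structure is xz's own --list command (the file names here are just placeholders):

$ xz --list foo.xz              # An archive made by plain xz normally contains a single block
$ xz --list foo.tpxz            # An archive made by pixz contains many smaller blocks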


With pixz, both these problems are solved. The most useful commands:

$ pixz foo.tar foo.tpxz         # Compress and index a tarball, multi-core
$ pixz -l foo.tpxz              # Very quickly list the contents of the compressed tarball
$ pixz -d foo.tpxz foo.tar      # Decompress it, multi-core
$ pixz -x dir/file < foo.tpxz | tar x   # Very quickly extract a file, multi-core.
                                        # Also verifies that contents match index.

$ tar -Ipixz -cf foo.tpxz foo           # Create a tarball using pixz for multi-core compression

$ pixz bar bar.xz           # Compress a non-tarball, multi-core
$ pixz -d bar.xz bar        # Decompress it, multi-core
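
Pixz also works as a tar filter for extraction (a sketch, assuming GNU tar's -I option):

$ tar -Ipixz -xf foo.tpxz               # Extract a tarball using pixz for multi-core decompression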


Specifying input and output:

$ pixz < foo.tar > foo.tpxz     # Same as 'pixz foo.tar foo.tpxz'
$ pixz -i foo.tar -o foo.tpxz   # Ditto. These both work for -x, -d and -l too, e.g.:

$ pixz -x -i foo.tpxz -o foo.tar file1 file2 ... # Extract the files from foo.tpxz into foo.tar

$ pixz foo.tar                  # Compress it to foo.tpxz, removing the original
$ pixz -d foo.tpxz              # Decompress it to foo.tar, removing the original


Other flags:

$ pixz -1 foo.tar           # Faster, worse compression
$ pixz -9 foo.tar           # Better, slower compression 

$ pixz -t foo.tar           # Compress but don't treat it as a tarball (don't index it)
$ pixz -d -t foo.tpxz       # Decompress it without checking that contents match the index
$ pixz -l -t foo.tpxz       # List the xz blocks instead of files

WARNING: Without the -t flag, pixz treats its input as a tarball whenever it looks even vaguely tarball-like. In particular, if the file starts with at least 1024 zero bytes, pixz will assume it is an empty tarball and truncate the output! If your input files are not tarballs, run with -t or risk data loss.
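
For example (zeros.bin is a hypothetical non-tarball file, shown only to illustrate the pitfall):

$ head -c 2048 /dev/zero > zeros.bin    # A file that starts with zero bytes, as an empty tarball would
$ pixz zeros.bin                        # Without -t: treated as an empty tarball, the data is lost!
$ pixz -t zeros.bin                     # With -t: compressed as plain data, nothing lost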


Compare to:
    plzip
        * About equally complex and efficient
        * The lzip format seems less widely used
        * Version 1 of the format appears to be indexable, at least in theory
    ChopZip
        * Python, much simpler
        * More flexible, supports arbitrary compression programs
        * Uses streams instead of blocks, not indexable
        * Splits input and then combines output, much higher disk usage 
    pxz
        * Simpler code
        * Uses OpenMP instead of pthreads
        * Uses streams instead of blocks, not indexable
        * Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage

Comparable tools for other compression algorithms:
    pbzip2
        * Not indexable
        * Appears slow
        * The bzip2 algorithm compresses less well than LZMA
    pigz
        * Not indexable
    dictzip
        * Not parallel


Requirements:
    * libarchive 2.8 or later
    * liblzma 4.999.9-beta-212 or later (from the xz distribution)
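
To build from source, a rough sketch (assuming the libraries above are installed where the bundled Makefile expects them):

$ make              # Build the pixz binary
$ ./test.sh         # Optionally run the test script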