You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
Go to file
Dave Vasilevsky 8c7333d2f7 Ignore pread binary 14 years ago
.gitignore Ignore pread binary 14 years ago
Makefile Allow optimization flags on make command line; all user to force no listing of tar files 14 years ago
README List requirements to build 14 years ago
TODO Safety todos 14 years ago
common.c Parallel extraction! 14 years ago
cpu.c Dynamically determine the number of CPUs--now we're actually useful 15 years ago
endian.c Linux support, woo 15 years ago
list.c Allow optimization flags on make command line; all user to force no listing of tar files 14 years ago
pixz.h Parallel extraction! 14 years ago
pread.c Parallel extraction! 14 years ago
read.c List to stdout; auto-decode index 14 years ago
test.sh simpler block queuing; go back to dict_size blocks; run test in bash, some sh have no "time" 15 years ago
write.c Parallel extraction! 14 years ago

README

Pixz (pronounced 'pixie') is a parallel, indexing version of XZ.


The existing XZ Utils ( http://tukaani.org/xz/ ) provide great compression in the .xz file format, but they have two significant problems:

* They are single-threaded, while most users nowadays have multi-core computers.
* The .xz files they produce are just one big block of compressed data, rather than a collection of smaller blocks. This makes random access to the original data impossible.


With pixz, both these problems can eventually be solved. Currently these pixz tools are available:

* write INPUT.tar OUTPUT.tpxz: Compresses an uncompressed tarball. The compression uses two cores. An index of all the files in the tarball is stored within the file, yet it remains compatible with standard xz and tar.

* read INPUT.tpxz PATH: Efficiently extracts a single file from a tarball compressed by 'write'.

* list [-t] INPUT.xz: Lists the xz blocks present within any .xz file. Optionally also lists a file index as stored by 'write'.



Compare to:
	plzip
		* About equally complex, efficient
		* lzip format seems less-used
		* Version 1 is theoretically indexable...I think
	ChopZip
		* Python, much simpler
		* More flexible, supports arbitrary compression programs
		* Uses streams instead of blocks, not indexable
		* Splits input and then combines output, much higher disk usage 
	pxz
		* Simpler code
		* Uses OpenMP instead of pthreads
		* Uses streams instead of blocks, not indexable
		* Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage

Comparable tools for other compression algorithms:
	pbzip2
		* Not indexable
		* Appears slow
		* bzip2 algorithm is non-ideal
	pigz
		* Not indexable
	dictzip
		* Not parallel


Requirements:
	* libarchive 2.8 or later
	* liblzma 4.999.9-beta-212 or later (from the xz distribution)