updates README

2024-11-18 15:26:46 +00:00 · 2015-08-07 12:15:22 +02:00 · 2015-08-07 12:15:22 +02:00 · 00bad4eed5
commit 00bad4eed5
parent 378ee786d6
2 changed files with 182 additions and 80 deletions
--- a/80
+++ b/80
@@ -1,80 +0,0 @@
 Pixz (pronounced 'pixie') is a parallel, indexing version of XZ
 Repository: https://github.com/vasi/pixz
 Downloads: https://sourceforge.net/projects/pixz/files/
 The existing XZ Utils ( http://tukaani.org/xz/ ) provide great compression in the .xz file format, but they have two significant problems:
 * They are single-threaded, while most users nowadays have multi-core computers.
 * The .xz files they produce are just one big block of compressed data, rather than a collection of smaller blocks. This makes random access to the original data impossible.
 With pixz, both these problems are solved. The most useful commands:
 $ pixz foo.tar foo.tpxz         # Compress and index a tarball, multi-core
 $ pixz -l foo.tpxz              # Very quickly list the contents of the compressed tarball
 $ pixz -d foo.tpxz foo.tar      # Decompress it, multi-core
 $ pixz -x dir/file < foo.tpxz | tar x   # Very quickly extract a file, multi-core.
                                        # Also verifies that contents match index.
 $ tar -Ipixz -cf foo.tpxz foo           # Create a tarball using pixz for multi-core compression
 $ pixz bar bar.xz           # Compress a non-tarball, multi-core
 $ pixz -d bar.xz bar        # Decompress it, multi-core
 Specifying input and output:
 $ pixz < foo.tar > foo.tpxz     # Same as 'pixz foo.tar foo.tpxz'
 $ pixz -i foo.tar -o foo.tpxz   # Ditto. These both work for -x, -d and -l too, e.g.:
 $ pixz -x -i foo.tpxz -o foo.tar file1 file2 ... # Extract the files from foo.tpxz into foo.tar
 $ pixz foo.tar                  # Compress it to foo.tpxz, removing the original
 $ pixz -d foo.tpxz              # Extract it to foo.tar, removing the original
 Other flags:
 $ pixz -1 foo.tar           # Faster, worse compression
 $ pixz -9 foo.tar           # Better, slower compression
 $ pixz -p 2 foo.tar         # Cap the number of threads at 2
 $ pixz -t foo.tar           # Compress but don't treat it as a tarball (don't index it)
 $ pixz -d -t foo.tpxz       # Decompress foo, don't check that contents match index
 $ pixz -l -t foo.tpxz       # List the xz blocks instead of files
 For even more tuning flags, check the manual page.
 Compare to:
    plzip
        * About equally complex, efficient
        * lzip format seems less-used
        * Version 1 is theoretically indexable...I think
    ChopZip
        * Python, much simpler
        * More flexible, supports arbitrary compression programs
        * Uses streams instead of blocks, not indexable
        * Splits input and then combines output, much higher disk usage 
    pxz
        * Simpler code
        * Uses OpenMP instead of pthreads
        * Uses streams instead of blocks, not indexable
        * Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage
 Comparable tools for other compression algorithms:
    pbzip2
        * Not indexable
        * Appears slow
        * bzip2 algorithm is non-ideal
    pigz
        * Not indexable
    dictzip, idzip
        * Not parallel
 Requirements:
    * libarchive 2.8 or later
    * liblzma 4.999.9-beta-212 or later (from the xz distribution)
--- a/README.md
+++ b/README.md
@@ -0,0 +1,182 @@
 pixz
 ====
 Pixz (pronounced *pixie*) is a parallel, indexing version of `xz`.
 Repository: https://github.com/vasi/pixz
 Downloads: https://github.com/vasi/pixz/releases
 pixz vs xz
 ----------
 The existing [XZ Utils](http://tukaani.org/xz/) provide great compression in the `.xz` file format,
 but they have two significant problems:
 -   they are single-threaded, while most users nowadays have multi-core computers
 -   the `.xz` files they produce are just one big block of compressed data, rather than a collection
    of smaller blocks, which makes random access to the original data impossible
 With pixz, both these problems are solved.
 Building pixz
 -------------
 Help with the configuration step of the build is available via:
 ```
 ./configure --help
 ```
 ### Dependencies
 -   pthreads
 -   liblzma 4.999.9-beta-212 or later (from the xz distribution)
 -   libarchive 2.8 or later
 -   AsciiDoc to generate the man page
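Before running `./configure`, the library dependencies can be checked with `pkg-config`. This sketch assumes `pkg-config` is installed and that liblzma and libarchive ship their usual `liblzma.pc` and `libarchive.pc` files; the liblzma minimum is approximated as `4.999.9` because pkg-config cannot name the exact beta snapshot listed above, and `check_lib` is a hypothetical helper.
```shell
#!/bin/sh
# Sketch: report whether the library dependencies listed above are available.
# Assumes pkg-config plus the liblzma.pc / libarchive.pc metadata files.
missing=0

check_lib() {
    # $1: pkg-config module name, $2: minimum acceptable version
    if pkg-config --atleast-version="$2" "$1"; then
        echo "ok: $1 $(pkg-config --modversion "$1")"
    else
        echo "missing or too old: $1 (need $2 or later)"
        missing=$((missing + 1))
    fi
}

check_lib libarchive 2.8
check_lib liblzma 4.999.9   # approximates 4.999.9-beta-212

# AsciiDoc is only needed for the man page.
if command -v asciidoc > /dev/null 2>&1; then
    echo "ok: asciidoc"
else
    echo "note: asciidoc not found (needed only for the man page)"
fi
```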
 ### Build from Release Tarball
 ```
 ./configure
 make
 make install
 ```
 You may need `sudo` to run `make install`.
 ### Build from GitHub
 ```
 git clone https://github.com/vasi/pixz.git
 cd pixz
 ./autogen.sh
 ./configure
 make
 make install
 ```
 You may need `sudo` to run `make install`.
 Usage
 -----
 ### Single Files
 Compress a single file (no tarball, just compression), multi-core:
    pixz bar bar.xz
 Decompress it, multi-core:
    pixz -d bar.xz bar
 ### Tarballs
 Compress and index a tarball, multi-core:
    pixz foo.tar foo.tpxz
 Very quickly list the contents of the compressed tarball:
    pixz -l foo.tpxz
 Decompress the tarball, multi-core:
    pixz -d foo.tpxz foo.tar
 Very quickly extract a single file, multi-core (this also verifies that the contents match the index):
    pixz -x dir/file < foo.tpxz | tar x
 Create a tarball using pixz for multi-core compression:
    tar -Ipixz -cf foo.tpxz foo/
 ### Specifying Input and Output
 These are equivalent (and also work with `-x`, `-d` and `-l`):
    pixz foo.tar foo.tpxz
    pixz < foo.tar > foo.tpxz
    pixz -i foo.tar -o foo.tpxz
 Extract the files from `foo.tpxz` into `foo.tar`:
    pixz -x -i foo.tpxz -o foo.tar file1 file2 ...
 Compress to `foo.tpxz`, removing the original:
    pixz foo.tar
 Extract to `foo.tar`, removing the original:
    pixz -d foo.tpxz
 ### Other Flags
 Faster, worse compression:
    pixz -1 foo.tar
 Better, slower compression:
    pixz -9 foo.tar
 Use exactly 2 threads:
    pixz -p 2 foo.tar
 Compress, but do not treat it as a tarball, i.e. do not index it:
    pixz -t foo.tar
 Decompress, but do not check that contents match index:
    pixz -d -t foo.tpxz
 List the xz blocks instead of files:
    pixz -l -t foo.tpxz
 For even more tuning flags, check the manual page:
    man pixz
 Comparison to Other Tools
 -------------------------
 ### plzip
 -   about equally complex and efficient
 -   lzip format seems less-used
 -   version 1 is theoretically indexable, I think
 ### ChopZip
 -   written in Python, much simpler
 -   more flexible, supports arbitrary compression programs
 -   uses streams instead of blocks, not indexable
 -   splits input and then combines output, much higher disk usage
 ### pxz
 -   simpler code
 -   uses OpenMP instead of pthreads
 -   uses streams instead of blocks, not indexable
 -   uses temporary files and does not combine them until the whole file is compressed, high disk and
    memory usage
 ### pbzip2
 -   not indexable
 -   appears slow
 -   bzip2 algorithm is non-ideal
 ### pigz
 -   not indexable
 ### dictzip, idzip
 -   not parallel