This is a partial fix for ANSI escapes in parts of log
messages, which could otherwise prevent regexes from
matching. There is still more work to do.
Related to #1057
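One common way to keep escape sequences from defeating a regex is to
strip them from the text before matching. The sketch below assumes only
CSI sequences (ESC '[', then parameter/intermediate bytes, then a final
byte, as in SGR color codes) need handling. `strip_ansi_csi` is a
hypothetical name. It is not lnav's code and not necessarily the
approach this change takes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: remove ANSI CSI sequences (ESC '[' ... final byte)
// so a regex sees only the visible text. Other escape types are kept.
std::string strip_ansi_csi(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\x1b' && i + 1 < in.size() && in[i + 1] == '[') {
            std::size_t j = i + 2;
            // Parameter (0x30-0x3F) and intermediate (0x20-0x2F) bytes,
            // accepted here in any order for simplicity.
            while (j < in.size() && in[j] >= 0x20 && in[j] <= 0x3f) {
                ++j;
            }
            // A final byte in 0x40-0x7E terminates the sequence.
            if (j < in.size() && in[j] >= 0x40 && in[j] <= 0x7e) {
                i = j;  // the loop's ++i then moves past the final byte
                continue;
            }
        }
        // Ordinary character, or an unterminated sequence: copy as-is.
        out.push_back(in[i]);
    }
    return out;
}
```

For example, `strip_ansi_csi("\x1b[1;31mERROR\x1b[0m: disk full")` yields
`"ERROR: disk full"`, which a regex anchored on `^ERROR:` can then match.
One cost of stripping is that match offsets no longer line up with the
original line, so anything that highlights matches would need to map
positions back.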
Instead of throwing an error when we unexpectedly reach a stream EOS, treat it as a regular
stream end. This allows for streams that might use different encodings for different sections.
Even though we don't recognize the encoding when we continue on with the data, at least we
don't fail when we reach this situation. This allows us to safely try to continue parsing
the next concatenated gz stream, knowing that if it fails, we will handle it gracefully.
Don't try to continue reading the next stream of a concatenated
gzip file. The next stream may be CRC noise or other garbage.
Maybe in the future we should look for a gzip header in the
following bytes of the stream and try to decode from there.
But it's not clear that anyone ever uses this supposed gzip
feature anyway.
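Looking for a gzip header in the following bytes could start with a scan
for the member header fields defined in RFC 1952: the magic bytes 0x1f
0x8b, compression method 8 (deflate), and a flags byte with its reserved
bits clear. This is a hypothetical sketch (`find_gzip_header` is not an
lnav or zlib function), and a match is only a candidate, since garbage
bytes can look like a header by chance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: offset of the first plausible gzip member header at
// or after `start`, or -1 if there is none. Checks ID1=0x1f, ID2=0x8b,
// CM=8 (deflate), and that the reserved FLG bits (5-7) are zero.
long find_gzip_header(const std::vector<std::uint8_t>& buf, std::size_t start)
{
    for (std::size_t i = start; i + 4 <= buf.size(); ++i) {
        if (buf[i] == 0x1f && buf[i + 1] == 0x8b && buf[i + 2] == 0x08
            && (buf[i + 3] & 0xe0) == 0) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

A caller would still have to try inflating from the returned offset and
fall back to ending the stream if that fails.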
Let's just end the stream when we reach EOS. Also, if the
stream fails to init, let's leave it closed instead of throwing
an error no one is likely to catch. Log the error msg from
zlib if one is provided.
The gzread function is slow for random access. Seeking to an earlier
location means decompressing the whole file again from the start up to
that position, and seeking forward means decompressing everything in
between. This causes massive lags when trying to do simple things in
lnav on a large .gz file.
Use the zlib inflate* functions instead, and periodically record the
dictionary (the last 32KB of decompressed output) while processing the
file the first time. Then, when seeking into the file again later, use
inflateSetDictionary to restore the dictionary at the nearest syncpoint
before the target and decompress forward from there.
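The seek side of this scheme starts by finding the last syncpoint at or
before the requested uncompressed offset. The sketch below shows only
that lookup. The `syncpoint` struct and `find_syncpoint` helper are
hypothetical, loosely modeled on the access points in zran.c. Like
zran.c, it keeps a bit offset, because a deflate block boundary need not
fall on a byte boundary (zran.c feeds those bits back with inflatePrime).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical syncpoint record, captured during the first forward pass.
struct syncpoint {
    std::uint64_t in_offset;   // compressed offset (total_in when recorded)
    std::uint64_t out_offset;  // uncompressed offset (total_out when recorded)
    int bits;                  // unused bits of the byte before in_offset
    std::vector<unsigned char> window;  // up to 32KB of preceding output
};

// Index of the last syncpoint whose out_offset is <= target, or -1 if the
// target precedes every syncpoint (then inflate from the file start).
// Assumes `points` is sorted by out_offset, which holds when they are
// appended in order during a single pass.
long find_syncpoint(const std::vector<syncpoint>& points, std::uint64_t target)
{
    auto it = std::upper_bound(
        points.begin(), points.end(), target,
        [](std::uint64_t t, const syncpoint& p) { return t < p.out_offset; });
    if (it == points.begin()) {
        return -1;
    }
    return static_cast<long>(std::distance(points.begin(), it)) - 1;
}
```

With the index in hand, a reader would position itself at `in_offset`,
restore `window` with inflateSetDictionary, and inflate forward, which
discards at most one period's worth of output instead of the whole prefix
of the file.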
Use a default period of 1MB of compressed data for syncpoints.
Each syncpoint uses 32KB. This is a ratio of about 3.1% (32KB per
1024KB). For example,
a 1GB .gz file (compressed size) will require us to keep 32MB
of index data in memory. A better method may be to use a fixed
number of syncpoints and divide the file appropriately. This
would keep the memory bounded at the cost of slower file
navigation on large .gz files.
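The arithmetic behind these numbers, and the fixed-count alternative, can
be written out directly. This is a sketch with hypothetical helper names.
It ignores small off-by-one differences such as whether a syncpoint is
recorded at offset 0.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t KB = 1024;
constexpr std::uint64_t MB = 1024 * KB;
constexpr std::uint64_t GB = 1024 * MB;
constexpr std::uint64_t WINDOW = 32 * KB;  // dictionary kept per syncpoint

// Fixed period (the current default): one syncpoint per `period` bytes of
// compressed input, so index memory grows linearly with the file size.
std::uint64_t index_bytes_fixed_period(std::uint64_t compressed_size,
                                       std::uint64_t period)
{
    assert(period > 0);
    return (compressed_size / period) * WINDOW;
}

// Fixed count (the alternative): spread `count` syncpoints over the file.
// Memory is bounded at count * WINDOW, but the period between syncpoints,
// and so the worst-case amount to re-inflate on a seek, grows with the file.
std::uint64_t period_for_fixed_count(std::uint64_t compressed_size,
                                     std::uint64_t count)
{
    assert(count > 0);
    return (compressed_size + count - 1) / count;  // round up
}
```

With a 1GB file and a 1MB period, `index_bytes_fixed_period` gives 32MB.
With a fixed count of 1024 syncpoints, memory stays at 32MB for any file
size, while the period grows to 4MB for a 4GB file.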
Use pread to read the data for the stream decompressor and remove
the lock_hack previously employed.
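pread takes the file offset as an argument and, per POSIX, does not
change the descriptor's current offset, so readers sharing one
descriptor do not race on an lseek() + read() pair and need no lock for
that. A minimal sketch, using a hypothetical `read_at` wrapper that also
retries short reads and EINTR:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper: read up to `len` bytes at absolute offset `off`
// without moving the fd's shared offset. Returns bytes read (less than
// `len` only at EOF), or -1 on error with errno set by pread().
ssize_t read_at(int fd, void* buf, std::size_t len, off_t off)
{
    std::size_t total = 0;
    auto* p = static_cast<char*>(buf);
    while (total < len) {
        ssize_t rc = pread(fd, p + total, len - total,
                           off + static_cast<off_t>(total));
        if (rc < 0) {
            if (errno == EINTR) {
                continue;  // interrupted before reading anything: retry
            }
            return -1;
        }
        if (rc == 0) {
            break;  // end of file
        }
        total += static_cast<std::size_t>(rc);
    }
    return static_cast<ssize_t>(total);
}
```

Because the offset travels with each call, the decompressor can fetch the
compressed bytes for any syncpoint without coordinating with other users
of the same descriptor.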
NB. The documentation on these zlib functions is sparse. I followed
the example in zlib/examples/zran.c, but I used the z_stream total_in
and total_out variables instead of keeping my own separately as zran.c
does. Maybe this is incompatible with some very old zlib versions.
I haven't looked.