Earlier I compared dealing with planet files compressed with bzip2 and gzip, and user Mungewell suggested trying out LZO (which I had originally misremembered as LZMA, which is actually a different compressor with the opposite goal: maximize compression ratio, regardless of processing time). LZO turns out to be even better than gzip because it compresses and, more importantly, decompresses even faster. This again comes at a loss in compression ratio, but the file size stays within the same order of magnitude as with gzip, bzip2 and lzma. Recent planet sizes look as follows:
raw 150 GB
lzop 14 GB
gzip 10.5 GB
bzip2 6.5 GB
...with lzop decompressing at least 2x as fast as gzip (which is already at least 15x faster than bzip2), so on an average (2009) hard drive and an average desktop CPU, processing a planet (reading off the HD + decompressing) is fastest with LZO. Compression + writing to HD is also fastest with LZO on my hardware. Unfortunately I can't give exact numbers because I'm now doing my processing on one of my university's machines, which have better specs.
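For anyone who wants rough numbers from their own machine, here is a minimal sketch of how such a comparison could be timed. It is not the setup I used: it only covers the compressors in the Python standard library (gzip, bz2, lzma), so LZO is left out and would need a third-party binding, and `make_sample` is a made-up helper that generates XML-like filler instead of reading a real planet file.

```python
import bz2
import gzip
import lzma
import time


def make_sample(n_lines=20000):
    """Build repetitive XML-like text as a stand-in for planet data (hypothetical)."""
    lines = [
        '<node id="%d" lat="%.7f" lon="%.7f" version="1"/>\n'
        % (i, 50 + i * 1e-5, 8 + i * 2e-5)
        for i in range(n_lines)
    ]
    return "".join(lines).encode("utf-8")


def benchmark(data):
    """Return {name: (compressed_bytes, compress_seconds, decompress_seconds)}."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        # Sanity check: the round trip must give back the original bytes.
        assert unpacked == data
        results[name] = (len(packed), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    sample = make_sample()
    for name, (size, c_s, d_s) in benchmark(sample).items():
        print("%-5s %8d bytes  compress %.3fs  decompress %.3fs" % (name, size, c_s, d_s))
```

This measures in-memory speed only, so it says nothing about the disk. Timings on a small synthetic sample will also differ from those on a real planet file.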
I suspect that with LZO we're close to the sweet spot, and with one of the slower HDs you might be better off using gzip, because the balance between CPU speed and HD access speed shifts in the direction where you want to save on IO. You definitely don't want to use the raw planet because IO becomes the bottleneck even on the fastest available hardware.
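The trade-off can be put into a back-of-the-envelope model. The sketch below assumes reading and decompressing run as a pipeline, so the slower of the two stages sets the total time. The file sizes are the ones listed above. The decompression rates in `DECOMPRESS_MB_S` and the disk speeds in the usage lines are invented placeholders, chosen only to respect the "2x" and "15x" ratios, not measurements.

```python
# Planet sizes in GB, taken from the list above.
SIZES_GB = {"raw": 150.0, "lzop": 14.0, "gzip": 10.5, "bzip2": 6.5}

# ASSUMPTION: decompression output rates in MB/s. These are made-up values
# that only keep the rough ratios (lzop = 2x gzip, gzip = 15x bzip2).
DECOMPRESS_MB_S = {"lzop": 300.0, "gzip": 150.0, "bzip2": 10.0}


def processing_seconds(fmt, disk_mb_s):
    """Estimate time to read and decompress one planet in the given format."""
    read_s = SIZES_GB[fmt] * 1024 / disk_mb_s
    if fmt == "raw":
        return read_s  # nothing to decompress, pure IO
    cpu_s = SIZES_GB["raw"] * 1024 / DECOMPRESS_MB_S[fmt]
    # Pipelined stages: the slower one is the bottleneck.
    return max(read_s, cpu_s)


def fastest_format(disk_mb_s):
    """Return the format with the lowest estimated processing time."""
    return min(SIZES_GB, key=lambda fmt: processing_seconds(fmt, disk_mb_s))


if __name__ == "__main__":
    for disk in (60.0, 10.0):  # ASSUMPTION: an average and a slow disk, in MB/s
        print(disk, "MB/s ->", fastest_format(disk))
```

With these placeholder rates the model picks lzop for the 60 MB/s disk and gzip for the 10 MB/s one, which is the shift described above. With measured rates the crossover point will land somewhere else.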
Discussion
Comment from RubenKelevra on 27 September 2009 at 22:50
But the download size would increase dramatically ... it would take about 34 hours for me to download it. The current bz2-compressed one takes 16. And the traffic on the servers would also rise to about 215% of what it is now.
Comment from balrog-kun on 29 September 2009 at 01:04
Yeah - the point is to find the perfect balance between decompression/compression time and IO time. For disk IO the balance is different than for network IO.
I was talking about disk IO here, i.e. when you have the snapshot on your disk. For network IO, using lzma would probably give the best balance, because it generates the smallest files.