  1. Lucene - Core
  2. LUCENE-4509

Make CompressingStoredFieldsFormat the new default StoredFieldsFormat impl

Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1
    • Component/s: core/store
    • Labels: None
    • Lucene Fields: New

    Description

      What would you think of making CompressingStoredFieldsFormat the new default StoredFieldsFormat?

      Stored fields compression has many benefits:

      • it makes the I/O cache work for us,
      • file-based index replication/backup becomes cheaper.

      Things to know:

      • even with incompressible data, there is less than 0.5% overhead with LZ4,
      • LZ4 compression requires ~16 kB of memory and LZ4 HC compression requires ~256 kB,
      • LZ4 decompression has almost no memory overhead,
      • on my low-end laptop, the LZ4 impl in Lucene decompresses at ~300 MB/s.
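
      To put the "less than 0.5% overhead" figure in context, here is a rough sketch of the worst-case arithmetic. It assumes the bound used by the reference LZ4 library (input length + input length / 255 + 16 bytes); the LZ4 implementation in Lucene may have a slightly different bound, and the class and method names are only illustrative.

```java
import java.util.Locale;

public class Lz4WorstCase {
    // Worst-case compressed size, following the reference LZ4 library's bound.
    // Assumption: Lucene's own LZ4 implementation behaves comparably.
    public static int maxCompressedLength(int inputLength) {
        return inputLength + inputLength / 255 + 16;
    }

    // Overhead of the worst case relative to the raw input, in percent.
    public static double overheadPercent(int inputLength) {
        return 100.0 * (maxCompressedLength(inputLength) - inputLength) / inputLength;
    }

    public static void main(String[] args) {
        int blockSize = 16 * 1024; // the 16 kB block size discussed in this issue
        System.out.printf(Locale.ROOT, "worst case: %d bytes (%.3f%% overhead)%n",
                maxCompressedLength(blockSize), overheadPercent(blockSize));
    }
}
```

      Under that assumption, a 16 kB block of incompressible data grows by 80 bytes at most, which stays below the 0.5% mentioned above.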

      I think we could use the same default parameters as in CompressingCodec:

      • LZ4 compression,
      • in-memory stored fields index that is very memory-efficient (less than 12 bytes per block of compressed docs) and uses binary search to locate documents in the fields data file,
      • 16 kB blocks (small enough that there is no major slowdown when the whole index would fit into the I/O cache anyway, and large enough to provide interesting compression ratios; for example, Robert got a 0.35 compression ratio with the geonames.org database).
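
      The binary-search lookup mentioned above can be sketched roughly as follows. This is not the actual index format used by CompressingStoredFieldsFormat; it is a minimal illustration that assumes the index keeps, for each block, the first doc ID it contains and the offset where the block starts in the fields data file. All names are hypothetical.

```java
import java.util.Arrays;

public class BlockIndexSketch {
    private final int[] firstDocIds;   // first doc ID of each block, strictly ascending
    private final long[] blockOffsets; // start offset of each block in the fields data file

    public BlockIndexSketch(int[] firstDocIds, long[] blockOffsets) {
        if (firstDocIds.length == 0 || firstDocIds.length != blockOffsets.length) {
            throw new IllegalArgumentException("need one offset per block, at least one block");
        }
        this.firstDocIds = firstDocIds;
        this.blockOffsets = blockOffsets;
    }

    // Returns the ordinal of the block that contains docId.
    public int blockOf(int docId) {
        if (docId < firstDocIds[0]) {
            throw new IllegalArgumentException("docId precedes the first block: " + docId);
        }
        int pos = Arrays.binarySearch(firstDocIds, docId);
        // On a miss, binarySearch returns -(insertionPoint) - 1;
        // the containing block is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Returns the file offset of the block to read and decompress for docId.
    public long offsetOf(int docId) {
        return blockOffsets[blockOf(docId)];
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch(
                new int[] {0, 120, 305}, new long[] {0L, 5800L, 11950L});
        System.out.println(index.offsetOf(200)); // doc 200 is in the block starting at doc 120
    }
}
```

      Once the block is located, only that block has to be read and decompressed to retrieve the document, which is why the block size trades random-access cost against compression ratio.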

      Any concerns?

      Attachments

        1. LUCENE-4509.patch
          12 kB
          Adrien Grand
        2. LUCENE-4509.patch
          12 kB
          Adrien Grand

          People

            Assignee: Adrien Grand (jpountz)
            Reporter: Adrien Grand (jpountz)
            Votes: 1
            Watchers: 3