  1. Lucene - Core
  2. LUCENE-4509

Make CompressingStoredFieldsFormat the new default StoredFieldsFormat impl

    Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      What would you think of making CompressingStoredFieldsFormat the new default StoredFieldsFormat?

      Stored fields compression has many benefits:

      • it makes the I/O cache work for us,
      • file-based index replication/backup becomes cheaper.

      Things to know:

      • even with incompressible data, there is less than 0.5% overhead with LZ4,
      • LZ4 compression requires ~ 16 kB of memory and LZ4 HC compression requires ~ 256 kB,
      • LZ4 decompression has almost no memory overhead,
      • on my low-end laptop, the LZ4 impl in Lucene decompresses at ~ 300 MB/s.
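
The "less than 0.5% overhead" figure can be sanity-checked with a little arithmetic. The sketch below uses the worst-case bound published by the reference LZ4 library (input length + input length / 255 + 16 bytes). It is an assumption that Lucene's own LZ4 implementation stays within a similar bound; the class and method names are hypothetical and are not part of Lucene.

```java
/** Worst-case LZ4 output size, based on the reference LZ4 library's bound formula. */
final class Lz4WorstCase {

    // Reference LZ4 bound: n + n/255 + 16 bytes.
    // Assumption: Lucene's LZ4 implementation has a comparable worst case.
    static long maxCompressedLength(long inputLength) {
        if (inputLength <= 0) {
            throw new IllegalArgumentException("inputLength must be positive");
        }
        return inputLength + inputLength / 255 + 16;
    }

    // Extra bytes in the worst case, as a percentage of the input size.
    static double overheadPercent(long inputLength) {
        long extra = maxCompressedLength(inputLength) - inputLength;
        return 100.0 * extra / inputLength;
    }

    public static void main(String[] args) {
        long blockSize = 16 * 1024; // the 16 kB block size proposed in this issue
        System.out.printf("worst case for %d bytes: %d bytes (%.3f%% overhead)%n",
                blockSize, maxCompressedLength(blockSize), overheadPercent(blockSize));
    }
}
```

For a 16 kB block the bound adds 80 bytes, which is just under 0.5%. For much smaller inputs the constant 16-byte term dominates and the relative overhead is higher.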

      I think we could use the same default parameters as in CompressingCodec:

      • LZ4 compression,
      • in-memory stored fields index that is very memory-efficient (less than 12 bytes per block of compressed docs) and uses binary search to locate documents in the fields data file,
      • 16 kB blocks (small enough that there is no major slowdown when the whole index would fit into the I/O cache anyway, and large enough to provide interesting compression ratios; for example, Robert got a 0.35 compression ratio with the geonames.org database).
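
To illustrate the in-memory index described above, here is a minimal sketch of a block index that uses binary search to map a document ID to the start of its compressed block. It is not Lucene's actual implementation: the class name, fields, and methods are hypothetical. It stores plain arrays (4 + 8 = 12 bytes per block), whereas the issue says the real index needs less than 12 bytes per block, so it presumably uses a more compact encoding.

```java
import java.util.Arrays;

/** Hypothetical block index: maps a doc ID to the file offset of its block. */
final class StoredFieldsBlockIndex {
    private final int[] firstDocIds;   // first doc ID of each block, strictly ascending
    private final long[] startOffsets; // offset of each block in the fields data file

    StoredFieldsBlockIndex(int[] firstDocIds, long[] startOffsets) {
        if (firstDocIds.length == 0 || firstDocIds.length != startOffsets.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        this.firstDocIds = firstDocIds.clone();
        this.startOffsets = startOffsets.clone();
    }

    /** Returns the ordinal of the block that contains docId. */
    int blockFor(int docId) {
        if (docId < firstDocIds[0]) {
            throw new IllegalArgumentException("docId precedes the first block: " + docId);
        }
        int pos = Arrays.binarySearch(firstDocIds, docId);
        // On a miss, binarySearch returns -(insertionPoint) - 1;
        // the containing block is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Returns the offset at which to start reading the block that contains docId. */
    long startOffsetFor(int docId) {
        return startOffsets[blockFor(docId)];
    }

    /** Rough index size, assuming data is split into fixed-size blocks. */
    static long estimatedIndexBytes(long dataBytes, int blockSize, int bytesPerBlock) {
        long blocks = (dataBytes + blockSize - 1) / blockSize; // round up
        return blocks * bytesPerBlock;
    }

    public static void main(String[] args) {
        StoredFieldsBlockIndex index = new StoredFieldsBlockIndex(
                new int[] {0, 40, 95}, new long[] {0L, 5200L, 11000L});
        System.out.println("doc 41 starts at offset " + index.startOffsetFor(41));
        System.out.println("index bytes for 1 GiB of data: "
                + estimatedIndexBytes(1L << 30, 16 * 1024, 12));
    }
}
```

Under these assumptions, 1 GiB of stored fields data split into 16 kB blocks yields 65,536 blocks, so even at the 12 bytes per block upper bound the index stays under 1 MB.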

      Any concerns?

        Attachments

        1. LUCENE-4509.patch
          12 kB
          Adrien Grand
        2. LUCENE-4509.patch
          12 kB
          Adrien Grand

            People

            • Assignee:
              jpountz Adrien Grand
              Reporter:
              jpountz Adrien Grand