HBase / HBASE-5891

Change Compression Based on Type of Compaction

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      We currently use LZO on our production systems because on-demand decompression with GZ is too slow. However, many of our major-compacted StoreFiles are read infrequently because of lazy seek optimizations, yet they occupy the majority of our disk space. One idea is to choose the compression type based on compaction characteristics (input size or the major compaction flag). This would let us GZ-compress our largest and least-read files and save space.
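
A minimal sketch of the selection rule proposed above, assuming a simple policy: use GZ when the compaction is major or its total input size crosses a threshold, and LZO otherwise. Every name here (`CompactionCompressionSketch`, `Codec`, `chooseCodec`, `LARGE_INPUT_BYTES`) and the 1 GiB threshold are hypothetical illustrations, not HBase APIs or settings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are HBase APIs.
final class CompactionCompressionSketch {

    // Stand-in for the compression algorithm choice.
    enum Codec { LZO, GZ }

    // Assumed threshold (1 GiB) above which a compaction counts as "large".
    static final long LARGE_INPUT_BYTES = 1L << 30;

    // Sum the sizes (in bytes) of the StoreFiles selected for a compaction.
    static long totalInputBytes(List<Long> storeFileSizes) {
        long total = 0L;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total;
    }

    // Pick GZ for major or large compactions (big, rarely read output),
    // and LZO for everything else (smaller, more frequently read output).
    static Codec chooseCodec(boolean isMajorCompaction, long inputBytes) {
        if (isMajorCompaction || inputBytes >= LARGE_INPUT_BYTES) {
            return Codec.GZ;
        }
        return Codec.LZO;
    }

    public static void main(String[] args) {
        List<Long> smallInputs = Arrays.asList(64L << 20, 128L << 20);  // 192 MiB total
        List<Long> largeInputs = Arrays.asList(768L << 20, 512L << 20); // 1280 MiB total

        System.out.println(chooseCodec(false, totalInputBytes(smallInputs)));
        System.out.println(chooseCodec(false, totalInputBytes(largeInputs)));
        System.out.println(chooseCodec(true, totalInputBytes(smallInputs)));
    }
}
```

With these inputs the small minor compaction keeps LZO, while the large minor compaction and the major compaction both select GZ. A real implementation would need to decide where the threshold lives (for example, per column family) and how readers handle StoreFiles with mixed codecs; both are left open here.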

        Activity

        Andrew Purtell added a comment -

        It used to be possible (circa 0.90) to use one compression algorithm for flushes and minor compactions and a different one for major compactions. I added this because we had a case under consideration where data would grow colder in proportion to the delta between current time and write time. It was simple and low impact to set flush compression to LZO and major compaction compression to BZIP2 (we flirted with LZMA, but that is simply too bandwidth constrained), and a script would trigger region-by-region major compaction daily. I don't know if this is maintained in the current code base. Compaction was significantly reworked from 0.90 to 0.92, and we didn't pick up the majority of these changes in our internal version.


          People

          • Assignee: Unassigned
          • Reporter: Nicolas Spiegelberg
          • Votes: 0
          • Watchers: 5

            Dates

            • Created:
              Updated:
