Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.90.1, 0.92.0
    • Fix Version/s: None
    • Component/s: regionserver, wal
    • Labels:
      None

      Description

      In cases where the load is not evenly distributed across the column families in a region, per-column-family flushes will reduce network IO by helping the compaction algorithm minimize its need for unconditional selection. This issue is about refactoring the flush algorithm to move from HRegion granularity to Store granularity.
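To make the difference in granularity concrete, here is a minimal sketch, not HBase code. The function names, the column family names, the 16 MB per-store threshold, and the memstore sizes (chosen to mirror a 1:1:20 split) are all assumptions made for illustration. It only shows which column families would write a file under each policy.

```python
# Hypothetical sketch: none of these names or thresholds come from HBase itself.
MB = 1024 * 1024


def region_flush(memstores):
    """Region-granularity flush: every column family writes a file,
    however little data its memstore holds."""
    return dict(memstores)


def per_store_flush(memstores, min_store_bytes):
    """Store-granularity flush: only column families whose memstore has
    reached min_store_bytes write a file. If none qualifies, flush the
    largest one so the flush still frees memory."""
    selected = {cf: size for cf, size in memstores.items()
                if size >= min_store_bytes}
    if not selected:
        largest = max(memstores, key=memstores.get)
        selected = {largest: memstores[largest]}
    return selected


# Assumed memstore sizes with a 1:1:20 distribution across three families.
memstores = {"cf_small_a": 3 * MB, "cf_small_b": 3 * MB, "cf_large": 60 * MB}

flushed_by_region = region_flush(memstores)
flushed_by_store = per_store_flush(memstores, min_store_bytes=16 * MB)
```

Under these assumed numbers, the region-level policy writes three files, two of them small, while the per-store policy writes only the file for `cf_large` and leaves the small memstores to keep accumulating. Fewer small files is the effect the description expects to reduce compaction work.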

          Activity

          Nicolas Spiegelberg added a comment -

          Duplicate of HBASE-3149

          Jean-Daniel Cryans added a comment -

          Should we close HBASE-3149 then?

          stack added a comment -

          Great stuff Nicolas. Bring it on.

          Nicolas Spiegelberg added a comment -

          Some interesting stats. We did some rough calculations internally to see what effect an uneven distribution of data across column families was having on our network IO. Our data distribution for 3 column families was 1:1:20. When we looked at the flush:minor-compaction ratio for each of the store files, the large column family had a 1:2 ratio, but the small CFs both had a 1:20 ratio! We are looking at roughly a 10% decrease in network IO if we can bring those other 2 CFs down to a 1:2 ratio as well.


            People

            • Assignee:
              Nicolas Spiegelberg
              Reporter:
              Nicolas Spiegelberg
            • Votes:
              0
              Watchers:
              4
