Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.99.0
    • Component/s: Compaction
    • Labels:
      None

      Description

      So I was thinking about having many regions as the way to make compactions more manageable, and writing the level db doc about how level db range overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication factor.

      And I suggest the following idea, let's call it stripe compactions. It's a mix between level db ideas and having many small regions.
      It allows us to have a subset of benefits of many regions (wrt reads and compactions) without many of the drawbacks (managing and current memstore/etc. limitation).
      It also doesn't break seqNum-based file sorting for any one key.
      It works like this.
      The region key space is separated into configurable number of fixed-boundary stripes (determined the first time we stripe the data, see below).
      All the data from memstores is written to normal files with all keys present (not striped), similar to L0 in LevelDb, or current files.
      Compaction policy does 3 types of compactions.
      First is L0 compaction, which takes all L0 files and breaks them down by stripe. It may be optimized by adding more small files from different stripes, but the main logical outcome is that there are no more L0 files and all data is striped.
      Second is exactly similar to current compaction, but compacting one single stripe. In future, nothing prevents us from applying compaction rules and compacting part of the stripe (e.g. similar to current policy with rations and stuff, tiers, whatever), but for the first cut I'd argue let it "major compact" the entire stripe. Or just have the ratio and no more complexity.
      Finally, the third addresses the concern of the fixed boundaries causing stripes to be very unbalanced.
      It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the results out with different boundaries.
      There's a tradeoff here - if we always take 2 adjacent stripes, compactions will be smaller but rebalancing will take ridiculous amount of I/O.
      If we take many stripes we are essentially getting into the epic-major-compaction problem again. Some heuristics will have to be in place.
      In general, if, before stripes are determined, we initially let L0 grow before determining the stripes, we will get better boundaries.
      Also, unless unbalancing is really large we don't need to rebalance really.
      Obviously this scheme (as well as level) is not applicable for all scenarios, e.g. if timestamp is your key it completely falls apart.

      The end result:

      • many small compactions that can be spread out in time.
      • reads still read from a small number of files (one stripe + L0).
      • region splits become marvelously simple (if we could move files between regions, no references would be needed).
        Main advantage over Level (for HBase) is that default store can still open the files and get correct results - there are no range overlap shenanigans.
        It also needs no metadata, although we may record some for convenience.
        It also would appear to not cause as much I/O.
      1. Stripe compactions.pdf
        114 kB
        Sergey Shelukhin
      2. Stripe compactions.pdf
        127 kB
        Sergey Shelukhin
      3. Stripe compaction perf evaluation.pdf
        206 kB
        Sergey Shelukhin
      4. Stripe compactions.pdf
        129 kB
        Sergey Shelukhin
      5. Stripe compaction perf evaluation.pdf
        214 kB
        Sergey Shelukhin
      6. Stripe compaction perf evaluation.pdf
        339 kB
        Sergey Shelukhin
      7. Using stripe compactions.pdf
        135 kB
        Sergey Shelukhin
      8. Using stripe compactions.pdf
        138 kB
        Sergey Shelukhin
      9. Using stripe compactions.pdf
        141 kB
        Sergey Shelukhin
      10. Stripe compactions.pdf
        129 kB
        Sergey Shelukhin
      11. stripe-cdf.pdf
        23 kB
        Sergey Shelukhin

        Issue Links

          Activity

          Sergey Shelukhin created issue -
          Sergey Shelukhin made changes -
          Field Original Value New Value
          Description So I was thinking about having many regions as the way to make compactions more manageable, and writing the level db doc about how level db range overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication factor.

          And I suggest the following idea, let's call it stripe compactions. It's a mix between level db ideas and having many small regions.
          It allows us to have a subset of benefits of many regions (wrt reads and compactions) without many of the drawbacks (managing and current memstore/etc. limitation).
          It also doesn't break seqNum-based file sorting for any one key.
          It works like this.
          The key space is separated into configurable number of fixed-boundary stripes.
          All the data from memstores is written to normal files with all keys present (not striped), similar to L0 in LevelDb, or current files.
          Compaction policy does 3 types of compactions.
          First is L0 compaction, which takes all L0 files and breaks them down by stripe. It may be optimized by adding more small files from different stripes, but the main logical outcome is that there are no more L0 files and all data is striped.
          Second is exactly similar to current compaction, but compacting the entire stripe. In future, nothing prevents us from applying compaction rules and compacting part of the stripe (e.g. similar to current policy with rations and stuff, tiers, whatever), but for the first cut I'd argue let it "major compact" the entire stripe. Or just have the ratio and no more complexity.
          Finally, the third addresses the concern of the fixed boundaries causing stripes to be very unbalanced.
          It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the results out with different boundary.
          There's a tradeoff here - if we always take 2 adjacent stripes, compactions will be smaller but rebalancing will take ridiculous amount of I/O.
          If we take many stripes we are essentially getting into the epic-major-compaction problem again. Some heuristics will have to be in place.
          In general, if, before stripes are determined, we initially let L0 grow before determining the stripes, we will get better boundaries.
          Also, unless unbalancing is really large we don't need to rebalance really.
          Obviously this scheme (as well as level) is not applicable for all scenarios, e.g. if timestamp is your key it completely falls apart.

          The end result:
          - many small compactions that can be spread out in time.
          - reads still read from a small number of files (one stripe + L0).
          - region splits become marvelously simple (if we could move files between regions, no references would be needed).
          Main advantage over Level (for HBase) is that default store can still open the files and get correct results - there are no range overlap shenanigans.
          It also needs no metadata, although we may record some for convenience.
          So I was thinking about having many regions as the way to make compactions more manageable, and writing the level db doc about how level db range overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication factor.

          And I suggest the following idea, let's call it stripe compactions. It's a mix between level db ideas and having many small regions.
          It allows us to have a subset of benefits of many regions (wrt reads and compactions) without many of the drawbacks (managing and current memstore/etc. limitation).
          It also doesn't break seqNum-based file sorting for any one key.
          It works like this.
          The region key space is separated into configurable number of fixed-boundary stripes (determined the first time we stripe the data, see below).
          All the data from memstores is written to normal files with all keys present (not striped), similar to L0 in LevelDb, or current files.
          Compaction policy does 3 types of compactions.
          First is L0 compaction, which takes all L0 files and breaks them down by stripe. It may be optimized by adding more small files from different stripes, but the main logical outcome is that there are no more L0 files and all data is striped.
          Second is exactly similar to current compaction, but compacting one single stripe. In future, nothing prevents us from applying compaction rules and compacting part of the stripe (e.g. similar to current policy with rations and stuff, tiers, whatever), but for the first cut I'd argue let it "major compact" the entire stripe. Or just have the ratio and no more complexity.
          Finally, the third addresses the concern of the fixed boundaries causing stripes to be very unbalanced.
          It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the results out with different boundaries.
          There's a tradeoff here - if we always take 2 adjacent stripes, compactions will be smaller but rebalancing will take ridiculous amount of I/O.
          If we take many stripes we are essentially getting into the epic-major-compaction problem again. Some heuristics will have to be in place.
          In general, if, before stripes are determined, we initially let L0 grow before determining the stripes, we will get better boundaries.
          Also, unless unbalancing is really large we don't need to rebalance really.
          Obviously this scheme (as well as level) is not applicable for all scenarios, e.g. if timestamp is your key it completely falls apart.

          The end result:
          - many small compactions that can be spread out in time.
          - reads still read from a small number of files (one stripe + L0).
          - region splits become marvelously simple (if we could move files between regions, no references would be needed).
          Main advantage over Level (for HBase) is that default store can still open the files and get correct results - there are no range overlap shenanigans.
          It also needs no metadata, although we may record some for convenience.
          Sergey Shelukhin made changes -
          Description So I was thinking about having many regions as the way to make compactions more manageable, and writing the level db doc about how level db range overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication factor.

          And I suggest the following idea, let's call it stripe compactions. It's a mix between level db ideas and having many small regions.
          It allows us to have a subset of benefits of many regions (wrt reads and compactions) without many of the drawbacks (managing and current memstore/etc. limitation).
          It also doesn't break seqNum-based file sorting for any one key.
          It works like this.
          The region key space is separated into configurable number of fixed-boundary stripes (determined the first time we stripe the data, see below).
          All the data from memstores is written to normal files with all keys present (not striped), similar to L0 in LevelDb, or current files.
          Compaction policy does 3 types of compactions.
          First is L0 compaction, which takes all L0 files and breaks them down by stripe. It may be optimized by adding more small files from different stripes, but the main logical outcome is that there are no more L0 files and all data is striped.
          Second is exactly similar to current compaction, but compacting one single stripe. In future, nothing prevents us from applying compaction rules and compacting part of the stripe (e.g. similar to current policy with rations and stuff, tiers, whatever), but for the first cut I'd argue let it "major compact" the entire stripe. Or just have the ratio and no more complexity.
          Finally, the third addresses the concern of the fixed boundaries causing stripes to be very unbalanced.
          It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the results out with different boundaries.
          There's a tradeoff here - if we always take 2 adjacent stripes, compactions will be smaller but rebalancing will take ridiculous amount of I/O.
          If we take many stripes we are essentially getting into the epic-major-compaction problem again. Some heuristics will have to be in place.
          In general, if, before stripes are determined, we initially let L0 grow before determining the stripes, we will get better boundaries.
          Also, unless unbalancing is really large we don't need to rebalance really.
          Obviously this scheme (as well as level) is not applicable for all scenarios, e.g. if timestamp is your key it completely falls apart.

          The end result:
          - many small compactions that can be spread out in time.
          - reads still read from a small number of files (one stripe + L0).
          - region splits become marvelously simple (if we could move files between regions, no references would be needed).
          Main advantage over Level (for HBase) is that default store can still open the files and get correct results - there are no range overlap shenanigans.
          It also needs no metadata, although we may record some for convenience.
          So I was thinking about having many regions as the way to make compactions more manageable, and writing the level db doc about how level db range overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication factor.

          And I suggest the following idea, let's call it stripe compactions. It's a mix between level db ideas and having many small regions.
          It allows us to have a subset of benefits of many regions (wrt reads and compactions) without many of the drawbacks (managing and current memstore/etc. limitation).
          It also doesn't break seqNum-based file sorting for any one key.
          It works like this.
          The region key space is separated into configurable number of fixed-boundary stripes (determined the first time we stripe the data, see below).
          All the data from memstores is written to normal files with all keys present (not striped), similar to L0 in LevelDb, or current files.
          Compaction policy does 3 types of compactions.
          First is L0 compaction, which takes all L0 files and breaks them down by stripe. It may be optimized by adding more small files from different stripes, but the main logical outcome is that there are no more L0 files and all data is striped.
          Second is exactly similar to current compaction, but compacting one single stripe. In future, nothing prevents us from applying compaction rules and compacting part of the stripe (e.g. similar to current policy with rations and stuff, tiers, whatever), but for the first cut I'd argue let it "major compact" the entire stripe. Or just have the ratio and no more complexity.
          Finally, the third addresses the concern of the fixed boundaries causing stripes to be very unbalanced.
          It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the results out with different boundaries.
          There's a tradeoff here - if we always take 2 adjacent stripes, compactions will be smaller but rebalancing will take ridiculous amount of I/O.
          If we take many stripes we are essentially getting into the epic-major-compaction problem again. Some heuristics will have to be in place.
          In general, if, before stripes are determined, we initially let L0 grow before determining the stripes, we will get better boundaries.
          Also, unless unbalancing is really large we don't need to rebalance really.
          Obviously this scheme (as well as level) is not applicable for all scenarios, e.g. if timestamp is your key it completely falls apart.

          The end result:
          - many small compactions that can be spread out in time.
          - reads still read from a small number of files (one stripe + L0).
          - region splits become marvelously simple (if we could move files between regions, no references would be needed).
          Main advantage over Level (for HBase) is that default store can still open the files and get correct results - there are no range overlap shenanigans.
          It also needs no metadata, although we may record some for convenience.
          It also would appear to not cause as much I/O.
          Sergey Shelukhin made changes -
          Link This issue relates to HBASE-7519 [ HBASE-7519 ]
          Sergey Shelukhin made changes -
          Link This issue is blocked by HBASE-7603 [ HBASE-7603 ]
          Sergey Shelukhin made changes -
          Component/s Compaction [ 12319905 ]
          Ted Yu made changes -
          Link This issue is related to HBASE-7857 [ HBASE-7857 ]
          Sergey Shelukhin made changes -
          Attachment Stripe compactions.pdf [ 12574946 ]
          Sergey Shelukhin made changes -
          Attachment Stripe compactions.pdf [ 12575449 ]
          Sergey Shelukhin made changes -
          Attachment Stripe compaction perf evaluation.pdf [ 12575634 ]
          Sergey Shelukhin made changes -
          Attachment Stripe compactions.pdf [ 12576005 ]
          Attachment Stripe compaction perf evaluation.pdf [ 12576006 ]
          Sergey Shelukhin made changes -
          Attachment Stripe compaction perf evaluation.pdf [ 12576697 ]
          Sergey Shelukhin made changes -
          Attachment Using stripe compactions.pdf [ 12580788 ]
          Sergey Shelukhin made changes -
          Attachment Using stripe compactions.pdf [ 12580789 ]
          Sergey Shelukhin made changes -
          Attachment Using stripe compactions.pdf [ 12580788 ]
          Sergey Shelukhin made changes -
          Attachment Using stripe compactions.pdf [ 12580791 ]
          Sergey Shelukhin made changes -
          Attachment Using stripe compactions.pdf [ 12581271 ]
          Attachment Stripe compactions.pdf [ 12581272 ]
          Sergey Shelukhin made changes -
          Attachment stripe-cdf.pdf [ 12582226 ]
          Sergey Shelukhin made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.98.0 [ 12323143 ]
          Fix Version/s 0.99.0 [ 12325675 ]
          Resolution Fixed [ 1 ]
          Enis Soztutar made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Sergey Shelukhin
              Reporter:
              Sergey Shelukhin
            • Votes:
              0 Vote for this issue
              Watchers:
              42 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development