Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-28043

Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.6.0, 3.0.0-beta-1
    • None
    • None
    • Hide
      Optimizes StoreFileScanner.seekToPreviousRow to use keep track of a hint which allows us to eliminate one seek per call, resulting in 40% or more throughput increase for reverse scans. External users (Phoenix) of this LimitedPrivate method should be aware of the change in semantics. It is expected that seekToPreviousRow is used for scanning backwards in the StoreFile. Calling with an originalKey greater than the previously passed key (i.e. forward in the StoreFile), the originalKey will not be honored. Instead use seek for this case.
      Show
      Optimizes StoreFileScanner.seekToPreviousRow to use keep track of a hint which allows us to eliminate one seek per call, resulting in 40% or more throughput increase for reverse scans. External users (Phoenix) of this LimitedPrivate method should be aware of the change in semantics. It is expected that seekToPreviousRow is used for scanning backwards in the StoreFile. Calling with an originalKey greater than the previously passed key (i.e. forward in the StoreFile), the originalKey will not be honored. Instead use seek for this case.

    Description

      Currently, for non-RIV1 DBE encodings, each call to StoreFileScanner.seekToPreviousRow (a common operation in reverse scans) results in two seeks: 

      1. Seek from the beginning of the block to before the given row to find the prior row
      2. Seek from the beginning of the block to the first cell of the prior row

      So if there are "N" rows in a block, a reverse scan through each row results in seeking past summation from i=1 to N (2(i-1)) rows.

       

      This is a particularly expensive operation for tall tables that have many rows in a block.

       

      By introducing a state variable "previousRow" to StoreFileScanner, I believe that we could modify the seeking algorithm to be:

      1. Seek from the beginning of the block to before the given row to find the prior row
      2. Seek from the beginning of the block to before the row that is before the row that was just seeked to (i.e. 2 rows back). Save this as a hint for where the prior row is in "previousRow"
      3. Reseek from "previousRow" (2 rows back from start) to 1 row back from start (to the actual previousRow)

      Then the rest of the calls where a "previousRow" is present, you just need to seek to the beginning of the block once instead of twice, i.e. 

      1. seek from the beginning of the block to right before the beginning of your "previousRow" marker. Save this as the new "previousRow" marker
      2. Reseek to the next row (i.e. your previous "previousRow" marker)

       

      If there are "N" rows in a block, a reverse scan from row N to row 0 results in seeking past approximately summation from i=1 to N (i-1) rows i.e. 50% less than the current behavior.

       

      See the attached diagrams for the current and proposed behavior. 

       

       

      Attachments

        1. Current_SeekToPreviousRowBehavior.png
          222 kB
          Becker Ewing
        2. Proposed_SeekToPreviousRowBehavior.png
          249 kB
          Becker Ewing

        Issue Links

          Activity

            People

              bewing Becker Ewing
              bewing Becker Ewing
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: