HBASE-4465

Lazy-seek optimization for StoreFile scanners

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Check the most recent file first before seeking all other files in a Store.

      Description

      Previously, if a region had several StoreFiles for a column family, we would seek in each of them and only then merge the results, even though the row/column we are looking for might be only in the most recent (and smallest) file. Now we prioritize reads so that the most recent file is checked first. This is done with a "lazy seek": instead of seeking right away, the scanner pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile). Because newer timestamps sort first, this fake KV is no later in the KV order than anything at or after the seek position that might actually occur in the file. If we don't find the result in the more recent files, the fake KV bubbles up to the top of the KV heap, and only then is a real seek done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).
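
      The lazy-seek idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HBase implementation: `ToyStoreFileScanner`, `kv_key`, and `get_latest` are hypothetical names, each StoreFile is an in-memory sorted list, and deletes, multiple versions, and Bloom filters are ignored. A KV key is modeled as (row, column, -timestamp), so newer timestamps sort first.

```python
import bisect
import heapq


def kv_key(row, col, ts):
    # Newer timestamps sort first within a row/column, so negate ts.
    return (row, col, -ts)


class ToyStoreFileScanner:
    """Hypothetical stand-in for a StoreFile scanner (not an HBase class)."""

    def __init__(self, kvs):
        # kvs: iterable of (row, column, timestamp, value) tuples.
        ordered = sorted(kvs, key=lambda kv: kv_key(*kv[:3]))
        self.keys = [kv_key(*kv[:3]) for kv in ordered]
        self.values = [kv[3] for kv in ordered]
        self.max_ts = max(kv[2] for kv in ordered)
        self.real_seeks = 0   # number of simulated disk seeks
        self.pending = None   # (row, col) of a deferred seek, if any
        self.pos = None       # index of the current KV after a real seek

    def lazy_seek(self, row, col):
        # No IO: just claim that the next KV is (row, col, max_ts).
        self.pending = (row, col)
        return kv_key(row, col, self.max_ts)

    def real_seek(self):
        # Perform the deferred seek; return the real key found, or None.
        row, col = self.pending
        self.pending = None
        self.real_seeks += 1
        self.pos = bisect.bisect_left(self.keys, (row, col, float("-inf")))
        return self.keys[self.pos] if self.pos < len(self.keys) else None


def get_latest(scanners, row, col):
    """Newest value for (row, col), or None. `scanners` is newest file first."""
    heap = []
    for order, s in enumerate(scanners):
        heapq.heappush(heap, (s.lazy_seek(row, col), order, s))
    while heap:
        key, order, s = heapq.heappop(heap)
        if s.pending is not None:
            # A fake KV reached the top of the heap: only now do the real seek.
            real_key = s.real_seek()
            if real_key is not None:
                heapq.heappush(heap, (real_key, order, s))
            continue
        # A real KV is on top; it is the answer only if it matches the request.
        return s.values[s.pos] if key[:2] == (row, col) else None
    return None


newest = ToyStoreFileScanner([("r1", "c1", 30, "new")])
older = ToyStoreFileScanner([("r1", "c1", 10, "old"), ("r2", "c1", 12, "x")])
oldest = ToyStoreFileScanner([("r0", "c1", 5, "y")])

result = get_latest([newest, older, oldest], "r1", "c1")
seeks = [s.real_seeks for s in (newest, older, oldest)]
# result == "new" and seeks == [1, 0, 0]: only the newest file was really sought.
```

      In this toy run the two older files are never touched, because the real KV from the newest file sorts ahead of their fake KVs. When the requested row/column is missing from the newer files, their fake KVs are popped one at a time and replaced by real seeks, so no result is lost.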

      This is joint work with Liyin Tang – huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.

          Activity

          Raymond Liu made changes -
          Link: This issue relates to HBASE-8001 [ HBASE-8001 ]
          Lars Hofhansl made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          stack made changes -
          Hadoop Flags: Reviewed [ 10343 ]
          Release Note: Check the most recent file first before seeking all other files in a Store.
          Mikhail Bautin made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
          Mikhail Bautin made changes -
          Liyin Tang made changes -
          Link: This issue is required by HBASE-4532 [ HBASE-4532 ]
          Mikhail Bautin made changes -
          Link: This issue requires HBASE-4534 [ HBASE-4534 ]
          Liyin Tang made changes -
          Link: This issue is required by HBASE-4469 [ HBASE-4469 ]
          Mikhail Bautin made changes -
          Fix Version/s: 0.92.0 [ 12314223 ]
          Mikhail Bautin created issue -

            People

            • Assignee:
              Mikhail Bautin
              Reporter:
              Mikhail Bautin
            • Votes:
              0
              Watchers:
              9
