HBASE-26273

TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use ReadType.STREAM for scanning HFiles

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.4.6
    • Fix Version/s: 2.5.0, 3.0.0-alpha-2, 2.3.7, 2.4.7
    • Component/s: mapreduce
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      As a result of this change, HBase's MapReduce API that operates over HBase snapshots now defaults to ReadType.STREAM instead of ReadType.DEFAULT (which is PREAD). HBase developers expect STREAM to perform significantly better for typical snapshot-based batch jobs. Users can restore the previous behavior (PREAD) either by explicitly setting `ReadType.PREAD` on the `Scan` object they provide to TableSnapshotInputFormat, or by setting the configuration property "hbase.TableSnapshotInputFormat.scanner.readtype" to "PREAD" in hbase-site.xml.
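
      The selection rule described in this release note can be modeled roughly as follows. This is a minimal, self-contained sketch and not the HBase implementation: the `ReadType` enum is a local stand-in for `Scan.ReadType`, the class and method names are hypothetical, and the precedence (explicit `Scan` setting first, then the configuration property, then STREAM) is an assumption drawn from the wording above.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

class SnapshotReadTypeSketch {
    // Local stand-in for org.apache.hadoop.hbase.client.Scan.ReadType (hypothetical copy).
    enum ReadType { DEFAULT, STREAM, PREAD }

    // Property name quoted from the release note.
    static final String READTYPE_KEY = "hbase.TableSnapshotInputFormat.scanner.readtype";

    // Assumed precedence: explicit Scan setting, then configuration, then STREAM.
    static ReadType resolve(ReadType scanReadType, Map<String, String> conf) {
        if (scanReadType != ReadType.DEFAULT) {
            return scanReadType; // the caller chose a read type explicitly
        }
        String configured = conf.get(READTYPE_KEY);
        if (configured == null) {
            return ReadType.STREAM; // new default for snapshot-based jobs
        }
        return ReadType.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> empty = Collections.emptyMap();
        Map<String, String> pread = Collections.singletonMap(READTYPE_KEY, "PREAD");
        System.out.println(resolve(ReadType.DEFAULT, empty)); // STREAM
        System.out.println(resolve(ReadType.PREAD, empty));   // PREAD
        System.out.println(resolve(ReadType.DEFAULT, pread)); // PREAD
    }
}
```

      In a real job the same two knobs would be the read type set on the `Scan` passed to TableSnapshotInputFormat and the property in hbase-site.xml; the sketch only illustrates how the two might interact.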

    Description

      After the change in HBASE-17917, which made PREAD (ReadType.DEFAULT) the read type for all user scans, the behavior of TableSnapshotInputFormat changed from STREAM to PREAD.

      TableSnapshotInputFormat is meant to be used with YARN/MR or another batch engine that reads the entire HFile in the container/executor. With the default always set to PREAD, we execute many more DFSInputStream#seek calls simply to read through the data block section of the HFile.

      The goal of this change is to make any downstream user of TableSnapshotInputFormat scan with STREAM by default.
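
      The two access patterns contrasted in this description can be illustrated with plain Java NIO on a local file. This is only an analogy, not HBase or HDFS code: `FileChannel.read(ByteBuffer, long)` stands in for a positional read (every call names an offset), and `FileChannel.read(ByteBuffer)` stands in for a streaming read (the channel advances its own position). The class name, block size, and file size are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadPatternsSketch {
    static final int BLOCK_SIZE = 64; // hypothetical block size for the demo

    // Positional pattern: each read supplies an explicit offset.
    // Returns {bytesRead, callsThatSuppliedAnOffset}.
    static long[] readPositional(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
            long offset = 0;
            long positionedCalls = 0;
            while (true) {
                buf.clear();
                int n = ch.read(buf, offset); // caller repositions on every call
                if (n <= 0) {
                    break;
                }
                positionedCalls++;
                offset += n;
            }
            return new long[] {offset, positionedCalls};
        }
    }

    // Streaming pattern: the channel keeps and advances its own position.
    // Returns {bytesRead, 0} because no call supplies an offset.
    static long[] readStreaming(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
            long total = 0;
            while (true) {
                buf.clear();
                int n = ch.read(buf); // continues from where the last read ended
                if (n <= 0) {
                    break;
                }
                total += n;
            }
            return new long[] {total, 0};
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blocks", ".bin");
        try {
            Files.write(file, new byte[10 * BLOCK_SIZE]);
            long[] p = readPositional(file);
            long[] s = readStreaming(file);
            System.out.println("positional: bytes=" + p[0] + " offsetCalls=" + p[1]);
            System.out.println("streaming:  bytes=" + s[0] + " offsetCalls=" + s[1]);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

      Both methods read the same bytes; the difference is that the positional loop repositions on every block, which is the kind of overhead the description attributes to PREAD when a job reads an HFile end to end.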

            People

              elserj Josh Elser
              taklwu Tak-Lon (Stephen) Wu
              Votes: 0
              Watchers: 5
