HBase / HBASE-26273

TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use ReadType.STREAM for scanning HFiles


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.4.6
    • Fix Version/s: 2.5.0, 3.0.0-alpha-2, 2.3.7, 2.4.7
    • Component/s: mapreduce
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      As a result of this change, HBase's MapReduce API for operating over HBase snapshots now defaults to ReadType.STREAM instead of ReadType.DEFAULT (which is PREAD). HBase developers expect STREAM to perform significantly better for typical snapshot-based batch jobs. Users can restore the previous behavior (PREAD) either by explicitly setting `ReadType.PREAD` on the `Scan` object they provide to TableSnapshotInputFormat, or by setting the configuration property "hbase.TableSnapshotInputFormat.scanner.readtype" to "PREAD" in hbase-site.xml.
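The release note names three ways the read type for a snapshot scan can be decided: an explicit read type on the `Scan`, the "hbase.TableSnapshotInputFormat.scanner.readtype" property, or the new STREAM default. The Java sketch below models that selection using only the JDK. It is not HBase source code: the `SnapshotReadTypeModel` class, its `ReadType` enum (a stand-in for HBase's `Scan.ReadType`), the `Map` standing in for a Hadoop `Configuration`, the case-insensitive parsing, and the rule that an explicit `Scan` setting takes precedence over the property are all assumptions made for illustration. In real job code, the first option corresponds to calling `scan.setReadType(Scan.ReadType.PREAD)` on the `Scan` handed to TableSnapshotInputFormat.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical model of read-type selection for snapshot scans after HBASE-26273.
 * Illustrative only: not the HBase implementation.
 */
public class SnapshotReadTypeModel {

    // Stand-in for HBase's Scan.ReadType (DEFAULT, STREAM, PREAD).
    enum ReadType { DEFAULT, STREAM, PREAD }

    // Property name taken from the release note.
    static final String READTYPE_KEY = "hbase.TableSnapshotInputFormat.scanner.readtype";

    // Snapshot-scan default after this change, per the release note.
    static final ReadType SNAPSHOT_DEFAULT = ReadType.STREAM;

    /**
     * Resolves the effective read type (precedence is an assumption):
     * 1. a non-DEFAULT read type set explicitly on the Scan wins;
     * 2. otherwise the configuration property, if present, is used;
     * 3. otherwise STREAM is used.
     */
    static ReadType resolve(ReadType scanReadType, Map<String, String> conf) {
        if (scanReadType != ReadType.DEFAULT) {
            return scanReadType; // user chose explicitly on the Scan
        }
        String configured = conf.get(READTYPE_KEY);
        if (configured == null) {
            return SNAPSHOT_DEFAULT; // nothing configured: new STREAM default
        }
        // Assumed case-insensitive: "pread" and "PREAD" both map to PREAD.
        // An unknown value throws IllegalArgumentException from valueOf.
        return ReadType.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolve(ReadType.DEFAULT, conf)); // STREAM
        conf.put(READTYPE_KEY, "PREAD");
        System.out.println(resolve(ReadType.DEFAULT, conf)); // PREAD
        System.out.println(resolve(ReadType.STREAM, conf));  // STREAM
    }
}
```

Modeling the property as a fallback rather than an override keeps a per-job `Scan` setting authoritative, while the hbase-site.xml value acts as a cluster-wide way to opt back into PREAD.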

      Description

      After HBASE-17917 changed all user scans to use PREAD (ReadType.DEFAULT), the behavior of TableSnapshotInputFormat changed from STREAM to PREAD.

      TableSnapshotInputFormat is meant to be used with YARN/MR or another batch engine that reads entire HFiles inside the container/executor. With the default always set to PREAD, we execute many more DFSInputStream#seek calls simply to read through the data block section of the HFile.

      The goal of this change is to make every downstream user of TableSnapshotInputFormat scan with STREAM by default.


              People

              • Assignee: Josh Elser (elserj)
              • Reporter: Tak-Lon (Stephen) Wu (taklwu)
              • Votes: 0
              • Watchers: 5
