HBase / HBASE-18090

Improve TableSnapshotInputFormat to allow more multiple mappers per region


Details

    • Reviewed
      In this task, we make it possible to run multiple mappers per region of a table snapshot. The following code is the standard way to initialize a table snapshot mapper job:

      TableMapReduceUtil.initTableSnapshotMapperJob(
                snapshotName, // The name of the snapshot (of a table) to read from
                scan, // Scan instance to control CF and attribute selection
                mapper, // mapper
                outputKeyClass, // mapper output key
                outputValueClass, // mapper output value
                job, // The current job to adjust
                true, // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars)
                restoreDir // a temporary directory to copy the snapshot files into
      );

      With that call, the job runs only one map task per region in the table snapshot. With this feature, a client can specify the desired number of mappers per region when initializing the table snapshot mapper job:

      TableMapReduceUtil.initTableSnapshotMapperJob(
                snapshotName, // The name of the snapshot (of a table) to read from
                scan, // Scan instance to control CF and attribute selection
                mapper, // mapper
                outputKeyClass, // mapper output key
                outputValueClass, // mapper output value
                job, // The current job to adjust
                true, // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars)
                restoreDir, // a temporary directory to copy the snapshot files into
                splitAlgorithm, // algorithm used to split each region's key range; currently supported: new RegionSplitter.UniformSplit() or new RegionSplitter.HexStringSplit()
                n // number of input splits (map tasks) to generate per region
      );
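      The extra splits only help if each sub-range carries a similar share of the region's data. As a rough, HBase-free illustration of what uniform splitting of a single region means, the hypothetical Java sketch below treats row keys as unsigned big-endian numbers (padded to an assumed fixed width, KEY_WIDTH = 8 bytes) and cuts a region's [startKey, endKey) range into n contiguous sub-ranges of roughly equal width, one per mapper. It is not HBase's implementation: UniformRegionSplitSketch, splitRegion, KEY_WIDTH, and the helpers are invented for this example, and only the general idea is meant to correspond to RegionSplitter.UniformSplit.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not HBase's TableSnapshotInputFormat code.
class UniformRegionSplitSketch {

    // Assumed key width for the arithmetic; longer keys are truncated, shorter ones zero-padded.
    static final int KEY_WIDTH = 8;

    // Interprets a row key as an unsigned big-endian number of KEY_WIDTH bytes.
    static BigInteger toNumber(byte[] key) {
        byte[] padded = Arrays.copyOf(key, KEY_WIDTH);
        return new BigInteger(1, padded);
    }

    // Inverse of toNumber: writes the value back as a fixed-width big-endian key.
    static byte[] toKey(BigInteger value) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[KEY_WIDTH];
        int copy = Math.min(raw.length, KEY_WIDTH);
        System.arraycopy(raw, raw.length - copy, out, KEY_WIDTH - copy, copy);
        return out;
    }

    // Splits [startKey, endKey) into n contiguous sub-ranges of roughly equal width.
    // An empty startKey means the lowest key and an empty endKey the highest,
    // matching how the first and last regions of an HBase table are bounded.
    static List<byte[][]> splitRegion(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger lo = toNumber(startKey);
        BigInteger hi = endKey.length == 0
                ? BigInteger.ONE.shiftLeft(8 * KEY_WIDTH) // one past the largest KEY_WIDTH-byte key
                : toNumber(endKey);
        BigInteger width = hi.subtract(lo);

        List<byte[][]> ranges = new ArrayList<>();
        byte[] prev = startKey;
        for (int i = 1; i < n; i++) {
            // Boundary i sits at lo + width * i / n (rounded down).
            BigInteger boundary = lo.add(
                    width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            byte[] b = toKey(boundary);
            ranges.add(new byte[][] {prev, b});
            prev = b;
        }
        ranges.add(new byte[][] {prev, endKey}); // last sub-range keeps the region's end key
        return ranges;
    }

    // Hex rendering for printing; the empty key is shown as <empty>.
    static String hex(byte[] key) {
        if (key.length == 0) {
            return "<empty>";
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A region spanning the whole key space (empty start and end keys), split 4 ways.
        // Prints:
        //   <empty> -> 4000000000000000
        //   4000000000000000 -> 8000000000000000
        //   8000000000000000 -> c000000000000000
        //   c000000000000000 -> <empty>
        for (byte[][] r : splitRegion(new byte[0], new byte[0], 4)) {
            System.out.println(hex(r[0]) + " -> " + hex(r[1]));
        }
    }
}
```

      Equal-width key ranges only turn into equal-sized map tasks when row keys are spread evenly over the byte space, which is why the description assumes a reasonably even key distribution; for tables whose row keys are hex strings, RegionSplitter.HexStringSplit is the analogous choice. In this sketch, a region narrower than n key values can also yield repeated boundaries, and therefore empty sub-ranges, because boundaries are rounded down.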

    Description

      TableSnapshotInputFormat runs one map task per region in the table snapshot. This places an unnecessary restriction on the original table: its region layout has to take into account the processing resources available to the MR job. Allowing multiple mappers per region (assuming a reasonably even key distribution) would let the job's parallelism be tuned independently of the table's region count.

      Attachments

        1. HBASE-18090.branch-1.3.001.patch
          42 kB
          Michael Stack
        2. HBASE-18090.branch-1.3.001.patch
          42 kB
          Michael Stack
        3. HBASE-18090.branch-1.patch
          42 kB
          xinxin fan
        4. HBASE-18090-branch-1.3-v1.patch
          38 kB
          Mikhail Antonov
        5. HBASE-18090-branch-1.3-v2.patch
          39 kB
          xinxin fan
        6. HBASE-18090-branch-1-v2.patch
          42 kB
          Michael Stack
        7. HBASE-18090-branch-1-v2.patch
          42 kB
          xinxin fan
        8. HBASE-18090-V3-master.patch
          41 kB
          xinxin fan
        9. HBASE-18090-V4-master.patch
          43 kB
          xinxin fan
        10. HBASE-18090-V5-master.patch
          43 kB
          xinxin fan


          People

            xinxin fan
            Mikhail Antonov (mantonov)
            Votes: 0
            Watchers: 15

