HBase / HBASE-3996

Support multiple tables and scanners as input to the mapper in map/reduce jobs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.5, 0.95.0
    • Component/s: mapreduce
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Adds MultiTableInputFormat.

      Usage example:

      {code}
      Scan scan1 = new Scan();
      scan1.setStartRow(start1);
      scan1.setStopRow(end1);
      Scan scan2 = new Scan();
      scan2.setStartRow(start2);
      scan2.setStopRow(end2);
      MultiTableInputCollection mtic = new MultiTableInputCollection();
      mtic.Add(tableName1, scan1);
      mtic.Add(tableName2, scan2);
      TableMapReduceUtil.initTableMapperJob(mtic, TestTableMapper.class, Text.class, IntWritable.class, job1);
      {code}

      Description

      It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs.
      I propose a new MultiTableInputFormat class that would allow doing this.
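To make the proposal concrete, here is a minimal sketch of the underlying idea: several (table, row range) scans are turned into one list of input splits, with one split for each region a scan overlaps. This is not the code from the attached patches and it uses no HBase API. `TableScan`, `Region`, `Split` and `computeSplits` are hypothetical stand-ins, row keys are modelled as plain strings, and an empty string is assumed to mean "unbounded".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiTableInputSketch {

    /** Hypothetical stand-in for a scan bound to one table: rows in [startRow, stopRow). */
    static final class TableScan {
        final String table, startRow, stopRow;
        TableScan(String table, String startRow, String stopRow) {
            this.table = table; this.startRow = startRow; this.stopRow = stopRow;
        }
    }

    /** Hypothetical stand-in for a region: keys in [startKey, endKey); "" means unbounded. */
    static final class Region {
        final String table, startKey, endKey;
        Region(String table, String startKey, String endKey) {
            this.table = table; this.startKey = startKey; this.endKey = endKey;
        }
    }

    /** One unit of mapper input: the part of a scan that falls inside one region. */
    static final class Split {
        final String table, startRow, stopRow;
        Split(String table, String startRow, String stopRow) {
            this.table = table; this.startRow = startRow; this.stopRow = stopRow;
        }
        @Override public String toString() {
            return table + "[" + startRow + "," + stopRow + ")";
        }
    }

    // Later of two start keys; "" sorts before every other string, so plain comparison works.
    static String laterStart(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Earlier of two stop keys, treating "" as "no upper bound".
    static String earlierStop(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Intersects every scan with every region of its table and emits one split per overlap. */
    static List<Split> computeSplits(List<TableScan> scans, List<Region> regions) {
        List<Split> splits = new ArrayList<>();
        for (TableScan scan : scans) {
            for (Region region : regions) {
                if (!region.table.equals(scan.table)) continue;
                String start = laterStart(scan.startRow, region.startKey);
                String stop = earlierStop(scan.stopRow, region.endKey);
                // An empty stop means the overlap runs to the end of the table.
                if (stop.isEmpty() || start.compareTo(stop) < 0) {
                    splits.add(new Split(scan.table, start, stop));
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("t1", "", "m"), new Region("t1", "m", ""), new Region("t2", "", ""));
        List<TableScan> scans = Arrays.asList(
            new TableScan("t1", "c", "p"), new TableScan("t2", "x", "z"));
        for (Split split : computeSplits(scans, regions)) {
            System.out.println(split);
        }
    }
}
```

In this sketch a single job receives splits from both tables, which is the saving the description refers to: one job setup and one mapper pass instead of one job per table or per scan.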

      1. HBase-3996.patch
        66 kB
        Eran Kutner
      2. 3996-v2.txt
        43 kB
        Ted Yu
      3. 3996-v3.txt
        42 kB
        Ted Yu
      4. 3996-v4.txt
        40 kB
        Ted Yu
      5. 3996-v5.txt
        40 kB
        Eran Kutner
      6. 3996-v6.txt
        40 kB
        Ted Yu
      7. 3996-v7.txt
        42 kB
        Ted Yu
      8. 3996-v8.txt
        32 kB
        Bryan Baugher
      9. 3996-v9.txt
        32 kB
        Bryan Baugher
      10. 3996-v10.txt
        33 kB
        Bryan Baugher
      11. 3996-v11.txt
        33 kB
        Bryan Baugher
      12. 3996-v12.txt
        33 kB
        Bryan Baugher
      13. 3996-v13.txt
        33 kB
        Bryan Baugher
      14. 3996-v14.txt
        33 kB
        Bryan Baugher
      15. 3996-0.94.txt
        33 kB
        Lars Hofhansl
      16. 3996-v15.txt
        34 kB
        Lars Hofhansl

          Activity

          Eran Kutner created issue -
          Eran Kutner made changes -
          Field Original Value New Value
          Attachment MultiTableInputCollection.java [ 12482779 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.java [ 12482780 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormatBase.java [ 12482781 ]
          Eran Kutner made changes -
          Attachment TableMapReduceUtil.java [ 12482782 ]
          Eran Kutner made changes -
          Attachment TableSplit.java [ 12482783 ]
          Eran Kutner made changes -
          Attachment MultiTableInputCollection.java [ 12482779 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.java [ 12482780 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormatBase.java [ 12482781 ]
          Eran Kutner made changes -
          Attachment TableMapReduceUtil.java [ 12482782 ]
          Eran Kutner made changes -
          Attachment TableSplit.java [ 12482783 ]
          Eran Kutner made changes -
          Attachment TestMultiTableInputFormat.java.patch [ 12483323 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483324 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483324 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483329 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483329 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483549 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483549 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483552 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12483552 ]
          Eran Kutner made changes -
          Attachment TestMultiTableInputFormat.java.patch [ 12483323 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12486360 ]
          Attachment TestMultiTableInputFormat.java.patch [ 12486361 ]
          stack made changes -
          Fix Version/s 0.94.0 [ 12316419 ]
          Fix Version/s 0.90.4 [ 12316406 ]
          Eran Kutner made changes -
          Attachment HBase-3996.patch [ 12514793 ]
          Eran Kutner made changes -
          Attachment MultiTableInputFormat.patch [ 12486360 ]
          Eran Kutner made changes -
          Attachment TestMultiTableInputFormat.java.patch [ 12486361 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ted Yu made changes -
          Attachment 3996-v2.txt [ 12518862 ]
          Ted Yu made changes -
          Assignee Eran Kutner [ erank ]
          Ted Yu made changes -
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Fix Version/s 0.94.0 [ 12316419 ]
          Ted Yu made changes -
          Fix Version/s 0.94.0 [ 12316419 ]
          Ted Yu made changes -
          Attachment 3996-v3.txt [ 12518983 ]
          Ted Yu made changes -
          Attachment 3996-v3.txt [ 12518983 ]
          Ted Yu made changes -
          Attachment 3996-v3.txt [ 12518989 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ted Yu made changes -
          Attachment 3996-v4.txt [ 12519016 ]
          Ted Yu made changes -
          Attachment 3996-v4.txt [ 12519016 ]
          Ted Yu made changes -
          Attachment 3996-v4.txt [ 12519023 ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.0 [ 12316419 ]
          Eran Kutner made changes -
          Attachment 3996-v5.txt [ 12519316 ]
          Ted Yu made changes -
          Attachment 3996-v6.txt [ 12520290 ]
          Ted Yu made changes -
          Attachment 3996-v7.txt [ 12520294 ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.2 [ 12321884 ]
          Lars Hofhansl made changes -
          Comment [ Looking at it again and reviewing the comments and the latest version of RB, this looks good. Not sure why it got stuck.

          A remaining question is 0.94 or not. The changes to TableSplit would not allow a new version of it to be deserialized by an old server. Is that OK for an M/R job?
          Also, regarding my comment about the extra table.close in TableRecordReaderImpl.java: if that is a bug, I would prefer to handle it in a separate jira (unless other changes here necessitate this close, but I do not think so).

          @Stack: Could you make sure that your comments are addressed?
          ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.3 [ 12323144 ]
          Fix Version/s 0.94.2 [ 12321884 ]
          Lars Hofhansl made changes -
          Assignee Eran Kutner [ erank ] Lars Hofhansl [ lhofhansl ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.4 [ 12323367 ]
          Fix Version/s 0.94.3 [ 12323144 ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.5 [ 12323874 ]
          Fix Version/s 0.94.4 [ 12323367 ]
          Bryan Baugher made changes -
          Attachment 3996-v8.txt [ 12563355 ]
          Bryan Baugher made changes -
          Attachment 3996-v9.txt [ 12563359 ]
          Bryan Baugher made changes -
          Attachment 3996-v10.txt [ 12563366 ]
          Bryan Baugher made changes -
          Attachment 3996-v11.txt [ 12563581 ]
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          stack made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          Lars Hofhansl made changes -
          Assignee Lars Hofhansl [ lhofhansl ] Bryan Baugher [ bbaugher ]
          Bryan Baugher made changes -
          Attachment 3996-v12.txt [ 12565806 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Bryan Baugher made changes -
          Attachment 3996-v13.txt [ 12565984 ]
          Bryan Baugher made changes -
          Attachment 3996-v14.txt [ 12566013 ]
          Lars Hofhansl made changes -
          Attachment 3996-0.94.txt [ 12567718 ]
          Attachment 3996-v15.txt [ 12567719 ]
          Lars Hofhansl made changes -
          Attachment 3996-0.94.txt [ 12567718 ]
          Lars Hofhansl made changes -
          Attachment 3996-v15.txt [ 12567719 ]
          Lars Hofhansl made changes -
          Attachment 3996-0.94.txt [ 12567720 ]
          Attachment 3996-v15.txt [ 12567721 ]
          Lars Hofhansl made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Lars Hofhansl made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          stack made changes -
          Fix Version/s 0.95.0 [ 12324094 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Fix Version/s 0.94.5 [ 12323874 ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.0 [ 12316419 ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.5 [ 12323874 ]
          Fix Version/s 0.94.0 [ 12316419 ]
          Ron Buckley made changes -
          Summary Support multiple tables and scanners as input to the mapper in map/reduce jobs d
          Ron Buckley made changes -
          Summary d  Support multiple tables and scanners as input to the mapper in map/reduce jobs
          stack made changes -
          Release Note Adds MultiTableInputFormat.

          Usage example:

          {code}
          Scan scan1 = new Scan();
          scan1.setStartRow(start1);
          scan1.setStopRow(end1);
          Scan scan2 = new Scan();
          scan2.setStartRow(start2);
          scan2.setStopRow(end2);
          MultiTableInputCollection mtic = new MultiTableInputCollection();
          mtic.Add(tableName1, scan1);
          mtic.Add(tableName2, scan2);
          TableMapReduceUtil.initTableMapperJob(mtic, TestTableMapper.class, Text.class, IntWritable.class, job1);
          {code}
          Harsh J made changes -
          Link This issue is related to HIVE-4515 [ HIVE-4515 ]
          Harsh J made changes -
          Link This issue is related to HIVE-4520 [ HIVE-4520 ]

            People

            • Assignee:
              Bryan Baugher
              Reporter:
              Eran Kutner
            • Votes:
              10
              Watchers:
              16
