Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.20.0, 0.19.3
    • Fix Version/s: 0.20.0
    • Component/s: master, regionserver
    • Labels:
      None
    • Environment:

      DN, TT and RS running on the same nodes.

Description

      The number of data local map tasks while scanning a table is only about 10% of the total map tasks...
      My table had 280 regions and 13M records... The number of map tasks in the scan job was equal to the number of regions (280). Only 25 of them were data-local tasks.

Issue Links

Activity

          amansk Amandeep Khurana added a comment -

          I had this issue in 0.19. Not facing the problem in 0.20 though.

          streamy Jonathan Gray added a comment -

          Thank you for researching, stack.

          Next week we'll have a ton of MR running on trunk so will report if we find anything strange.

          stack stack added a comment -

          I ran a rowcounter job against a 100 region table of ~20M rows. Cluster was small (4 regionservers). Tasktrackers ran beside the RS. Every task was scheduled on the TT that was local to the RS ("Input Split Locations" always had same value as "Machine" in the taskdetails page).

          stack stack added a comment -

So, what is the indicator in the MR UI measuring? TT+DN locality, or TT+RS? If the latter, and only 10% of the time is the map task running on the TT local to the region-hosting server, then our TT+RS locality would seem to be broken, or at least ineffective (either would be good to know).

          jdcryans Jean-Daniel Cryans added a comment -

          We already do this inside TableInputFormatBase:

          String regionLocation = table.getRegionLocation(startKeys[startPos])
              .getServerAddress().getHostname();
          splits[i] = new TableSplit(this.table.getTableName(),
              startKeys[startPos], ((i + 1) < realNumSplits) ? startKeys[lastPos] :
              HConstants.EMPTY_START_ROW, regionLocation);
          LOG.info("split: " + i + "->" + splits[i]);

          I don't know if we can do anything more than that. One difference between HBase and plain MapReduce over HDFS is that a region lives on only one node, not on three as with the default HDFS replication factor. So getting the right map task onto the right RS at the right moment may be difficult for the JobTracker.
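
          To illustrate the point above, here is a minimal, self-contained sketch (not actual HBase/Hadoop code; the class and method names are hypothetical) of why a table split gives the scheduler only one locality candidate, whereas an HDFS block split usually reports the hosts of all three replicas:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a TableSplit: it carries exactly one
// location hint, the hostname of the single RS hosting the region.
class TableSplitSketch {
    private final String regionLocation;

    TableSplitSketch(String regionLocation) {
        this.regionLocation = regionLocation;
    }

    // Analogous to InputSplit.getLocations(): only ONE candidate host,
    // unlike an HDFS block split, which would typically return the
    // hostnames of all 3 replicas.
    List<String> getLocations() {
        return Arrays.asList(regionLocation);
    }
}

public class LocalityDemo {
    // A scheduler can only achieve node-locality if the free
    // tasktracker's host appears in the split's location list; with a
    // single candidate, any other free TT produces a non-local task.
    static boolean isLocal(String trackerHost, List<String> splitLocations) {
        return splitLocations.contains(trackerHost);
    }

    public static void main(String[] args) {
        TableSplitSketch split = new TableSplitSketch("rs1.example.com");
        System.out.println(isLocal("rs1.example.com", split.getLocations())); // true
        System.out.println(isLocal("rs2.example.com", split.getLocations())); // false
    }
}
```

          With three replica hosts per split, the odds of some free TT matching a location are much higher; with one, the scheduler either waits for that one node or gives up locality.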

          streamy Jonathan Gray added a comment -

          Bringing in to 0.20.0 so someone can verify whether this works in trunk or not. I can do it later this week if no one else does.

          streamy Jonathan Gray added a comment -

          This needs to be tested on trunk; I thought we had fixed this.


  People

    • Assignee:
      Unassigned
    • Reporter:
      amansk Amandeep Khurana
    • Votes:
      0
    • Watchers:
      1
