Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-90

incrementing counters should not be used for triggering record skipping

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The following code is really problematic:

      public void incrCounter(String group, String counter, long amount) {
        if (counters != null) {
          counters.incrCounter(group, counter, amount);
        }
        if(skipping && SkipBadRecords.COUNTER_GROUP.equals(group) && (
           SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS.equals(counter) ||
           SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS.equals(counter))) {
           //if application reports the processed records, move the 
           //currentRecStartIndex to the next.
           //currentRecStartIndex is the start index which has not yet been 
           //finished and is still in task's stomach.
           for(int i=0;i<amount;i++) {
              currentRecStartIndex = currentRecIndexIterator.next();
           }
         ...
      }
      

      In particular, if the user updates a counter with the wrong name, bad things will presumably happen...

        Activity

        Owen O'Malley created issue -
        Hide
        Sharad Agarwal added a comment -

        Skipping bad records feature need a way to get a callback for the number of processed records from streaming process. To support this, counters were chosen as that is supported by both pipes and streaming ->https://issues.apache.org/jira/browse/HADOOP-153?focusedCommentId=12610897#action_12610897 (last point)

        In particular, if the user updates a counter with the wrong name, bad things will presumably happen...

        I see this can only happen if user defines its own counter with the same name. Or is there any other problem which can happen? would it be ok for now to document the framework reserve counter names and perhaps log in the above loop that framework counter is being updated ?

        Other alternative if we don't want to use counter for this at all, would be to add a mechanism in streaming and pipes protocol. Streaming can write to stderr something like processedRecords, which would be parsed by the framework. Similarly need to be added to Pipes protocol as well.

        Show
        Sharad Agarwal added a comment - Skipping bad records feature need a way to get a callback for the number of processed records from streaming process. To support this, counters were chosen as that is supported by both pipes and streaming -> https://issues.apache.org/jira/browse/HADOOP-153?focusedCommentId=12610897#action_12610897 (last point) In particular, if the user updates a counter with the wrong name, bad things will presumably happen... I see this can only happen if user defines its own counter with the same name. Or is there any other problem which can happen? would it be ok for now to document the framework reserve counter names and perhaps log in the above loop that framework counter is being updated ? Other alternative if we don't want to use counter for this at all, would be to add a mechanism in streaming and pipes protocol. Streaming can write to stderr something like processedRecords, which would be parsed by the framework. Similarly need to be added to Pipes protocol as well.
        Owen O'Malley made changes -
        Field Original Value New Value
        Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
        Key HADOOP-4718 MAPREDUCE-90
        Component/s mapred [ 12310690 ]
        Hide
        Harsh J added a comment -

        I think the resolution that was taken here was that we no longer support/recommend the skip bad records in the framework. Users should implement their own tech to handle bad records in their apps.

        Given this, I am resolving this as Won't Fix.

        Show
        Harsh J added a comment - I think the resolution that was taken here was that we no longer support/recommend the skip bad records in the framework. Users should implement their own tech to handle bad records in their apps. Given this, I am resolving this as Won't Fix.
        Harsh J made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Won't Fix [ 2 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Owen O'Malley
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development