Hadoop Common / HADOOP-3970

Counters written to the job history cannot be recovered back

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added getEscapedCompactString() and fromEscapedCompactString() to Counters.java to represent counters as Strings and to reconstruct the counters from the Strings.

      Description

      Counters that are written to the JobHistory are stringified using Counters.makeCompactString(). The format in which this api converts the counter into a string is groupname.countername:value. The problem is that groupname and countername can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

      1. HADOOP-3970-v4.1.patch
        18 kB
        Amar Kamat
      2. HADOOP-3970-v4.patch
        20 kB
        Amar Kamat
      3. HADOOP-3970-v3.patch
        20 kB
        Amar Kamat
      4. HADOOP-3970-v2.patch
        19 kB
        Amar Kamat
      5. HADOOP-3970-v1.patch
        4 kB
        Amar Kamat

        Issue Links

          Activity

          Amar Kamat added a comment -

          Currently, the way counters are converted to a string (in JobHistory) is as follows:
          groupname.countername:value
          So for a counter which has the following contents

          groupname countername value
          g1 c1 v1
          g1 c2 v2
          g1 c3 v3
          g2 c4 v4
          g2 c5 v5

          We get
          g1.c1:v1, g1.c2:v2, g1.c3:v3, g2.c4:v4, g2.c5:v5

          One way to overcome the problem stated above is to use the lengths of the names present in the counter.
          So the above counter might look like

          [|g1|] g1 { [|c1|] c1:v1, [|c2|] c2:v2, [|c3|] c3:v3 }, [|g2|] g2 { [|c4|] c4:v4, [|c5|] c5:v5 }

          Here |s| means the length of s.
          Hence the length of the name in the counter helps to correctly identify the name and helps in counter recovery.


          Thoughts?
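          A minimal sketch of the length-prefixed encoding proposed above (the class and helper names here are hypothetical, not from any patch):

```java
// Hypothetical sketch of the length-prefixed scheme: a name "g1" is
// written as "[2]g1", so a parser reads exactly 2 characters for the
// name even if the name itself contains '.' or ':'.
public class LengthPrefixSketch {
  // Encode a single name with its length prefix, e.g. "g1" -> "[2]g1".
  static String encode(String name) {
    return "[" + name.length() + "]" + name;
  }

  // Decode one length-prefixed name starting at 'offset'.
  static String decode(String s, int offset) {
    int close = s.indexOf(']', offset);
    int len = Integer.parseInt(s.substring(offset + 1, close));
    return s.substring(close + 1, close + 1 + len);
  }

  public static void main(String[] args) {
    String enc = encode("a.b:c");       // a name containing '.' and ':'
    System.out.println(enc);            // [5]a.b:c
    System.out.println(decode(enc, 0)); // a.b:c
  }
}
```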

          Amar Kamat added a comment -

          Attaching a patch that implements the above-mentioned technique. Tested it with HADOOP-3245 and it works fine. I am yet to test it separately.

          Hide
          Amar Kamat added a comment -

          It seems that the jobhistory uses the display names instead of actual names in the stringified representation.

          One such example of actual name and display name is
          actual-name = HDFS_WRITE and display-name = HDFS bytes written.

          Note that the mapping from the actual name to the display name is mentioned in the .properties file.
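          For illustration, such a .properties mapping might look like the following (a hypothetical bundle for the file-system counters; the exact file name and keys depend on the resource-bundle convention Hadoop uses):

```properties
# Hypothetical counter display-name bundle, e.g.
# Task$FileSystemCounter.properties: the key is the enum field's
# actual name, the value is the human-readable display name.
CounterGroupName=File Systems
HDFS_READ.name=HDFS bytes read
HDFS_WRITE.name=HDFS bytes written
```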

          Looks like it's difficult to get the (actual) name from the display name, for the following reasons:
          1) Display names can be same across counters.
          2) There are two ways to add/increment a counter

          • pass the group, counter and the value
          • pass the (enum) key and the value. For the (enum) key, the name is the classname for that (enum) key.
            The display name is obtained from the property file. In case there is no property file corresponding to the enum, the actual name is used. In either case, figuring out the actual name seems difficult.

          So, I suggest we also add the actual name to jobhistory along with the display name and provide api to set the display-name of the counter/group.

          Hide
          Owen O'Malley added a comment -

          Let's just keep the display names. The internal names are mostly for mapping to the counters. Yes, in theory the display names could collide, but in practice they won't, because counters are always displayed to the user by the display name.

          I think that putting in the lengths would make it very hard to read. I think it is better to stick with the current scheme:

          g1.c1: v1, g1.c2:v2, ...
          

          As you point out, that means we need to quote the characters . : , and probably " and \.
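          A minimal sketch of that quoting, assuming simple backslash escaping of the characters listed above (the class and method names here are illustrative, not from the patch):

```java
// Hypothetical sketch: backslash-escape the delimiter characters
// '.', ':', ',', '"' and '\' so names containing them can still be
// parsed back unambiguously.
public class EscapeSketch {
  static final String SPECIALS = ".:,\"\\";

  static String escape(String s) {
    StringBuilder out = new StringBuilder();
    for (char c : s.toCharArray()) {
      if (SPECIALS.indexOf(c) >= 0) out.append('\\');
      out.append(c);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(escape("org.apache:x")); // org\.apache\:x
  }
}
```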

          Hide
          Owen O'Malley added a comment -

          After an out-of-band discussion with Amar, I agree that we need both the internal and display names to be saved. If we are breaking the format, we should probably pull out the group names and put in both. Maybe we should use a Lisp-like format:

          {g1 {ci1 cd1 v1} {ci2 cd2 v2} {ci3 cd3 v3}} {g2 {ci4 cd4 v4}}
          

          Thoughts?

          Hide
          Hemanth Yamijala added a comment -

          Amar and I thought about this a bit. Groups also have internal and display names, apparently. Since space is a character that may be more common in display names (we have examples in the current list of counters), it may be better to do something like:

          {(gi1)(gd1){(ci1)(cd1)(v1)}}
          


          Real example:

          {(org.apache.hadoop.mapred.Task$FileSystemCounter)(File Systems){(HDFS_READ)(HDFS bytes read)(1020634)}}
          

          as opposed to

          {org.apache.hadoop.mapred.Task$FileSystemCounter File\ Systems {HDFS_READ HDFS\ bytes\ read 1020634}}
          

          IMHO, the first is a bit more readable, and clear where the tokens are ending. Comments ?
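          As a sketch, the real example above can be assembled from its parts like this (the class and helper names are illustrative, not from the patch):

```java
// Hypothetical sketch of the parenthesis-delimited format above:
// each token is wrapped in "(...)" so spaces inside display names
// need no backslash escaping.
public class ParenFormatSketch {
  static String tok(String s) { return "(" + s + ")"; }

  // Build "{(gName)(gDisplay){(cName)(cDisplay)(value)}}" for a
  // group with a single counter, as in the example above.
  static String group(String gName, String gDisplay,
                      String cName, String cDisplay, long value) {
    return "{" + tok(gName) + tok(gDisplay)
         + "{" + tok(cName) + tok(cDisplay) + tok(Long.toString(value))
         + "}}";
  }

  public static void main(String[] args) {
    System.out.println(group(
        "org.apache.hadoop.mapred.Task$FileSystemCounter", "File Systems",
        "HDFS_READ", "HDFS bytes read", 1020634L));
  }
}
```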

          Hide
          Amar Kamat added a comment -

          Attaching a patch that adds the facility to recover the counters from an (encoded) string. The format in which the counters are represented is

          {(gn1) (gd1) [(cn1) (cd1) (cv1)] [] [] []} {}
          Where {} are used for groups, [] are used for counter and () for basic values
          

          The change w.r.t. Hemanth's comment is in the encapsulation for counters, from '{ }' to '[ ]', which makes the parsing and recovery code simpler. Added test cases that test counters. Comments? Thoughts?

          Hide
          Amar Kamat added a comment -

          Uploading a new patch that fixes the findbugs warnings.

          Hide
          Hemanth Yamijala added a comment -

          StringUtils

          • I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape. So, do we really need to deprecate this ?
          • charHasPreEscape is not needed. It is always going to be the escape char. This is a bug in the previous code too.

          Counters

          • The token delimiters can be declared final.

          Counter

          • equals and hashCode implementation: Since the counter's value can change over time, I am thinking that these methods should not depend on the value of the counter. Needs to be thought out a bit more.
          • makeEscapedCompactString - use StringBuffer, instead of appending spaces.
          • makeEscapedCompactString - do we need the spaces ? I think the expression is compact without them. If we need them, then a space is missing when the counter closes.

          Group

          • equals: Code is incorrectly returning true if classes don't match.
          • hashCode: Are there standard ways in which we can combine hashCodes of various implementations. Also, the hashCode should not use the toString method to get the hashCode.
            makeCompactEscapedString - the name differs from the similar API in Counter. Can we make them uniform? Also, you can use StringBuffer without any String concatenation at all. Same comment on blank spaces.

          Counters:

          • getBlock: We expect a start and end. If there's no start or no end, should we fail fast ? I guess today if the end does not come as expected, it returns the remaining String as the value.
          • Same comments here as with the Group's implementation for overriding equals and hashcode.

          TestCounters:

          • testCounter need not check the mainCounter and copyCounter's equality before getting them from the strings. In fact, why do we need copyCounter at all ?
          Hide
          Amar Kamat added a comment -

          Attaching a patch that incorporates Hemanth's comments.

          I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape. So, do we really need to deprecate this ?

          The reason for deprecating the API is that I have introduced a new/enhanced API that essentially does the same thing but is more generic. I personally don't have any strong opinion about this.

          Hide
          Amar Kamat added a comment -

          equals and hashCode implementation: Since the counter's value can change over time, I am thinking that these methods should not depend on the value of the counter. ....

          The code to check content equality of counters is moved to the test case as only the test case requires it for now.

          getBlock: We expect a start and end. If there's no start or no end, should we fail fast ?

          Now getBlock() throws ParseException if the string is malformed and returns null if no block is found.
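          A simplified sketch of that behavior (not the actual implementation, which also has to deal with nested blocks and escaped delimiters):

```java
import java.text.ParseException;

// Hypothetical sketch of a getBlock()-style helper as described above:
// returns the text between the first open/close delimiter pair, null
// when no block starts, and throws ParseException on a missing close.
public class GetBlockSketch {
  static String getBlock(String s, char open, char close, int from)
      throws ParseException {
    int start = s.indexOf(open, from);
    if (start < 0) return null;               // no block found
    int end = s.indexOf(close, start);
    if (end < 0) throw new ParseException("Unexpected end of block", start);
    return s.substring(start + 1, end);       // contents without delimiters
  }

  public static void main(String[] args) throws ParseException {
    System.out.println(getBlock("(HDFS_READ)(HDFS bytes read)", '(', ')', 0));
    // HDFS_READ
  }
}
```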

          Hide
          Amareshwari Sriramadasu added a comment -

          I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape.

          +1 I agree with Hemanth that we should not deprecate the API escapeString(String str, char escapeChar, char charToEscape).

          Hide
          Amar Kamat added a comment -

          Attaching a patch that removes the deprecated tag. As rightly pointed out by Hemanth, I have not optimized the code for performance. Most of the changes in StringUtils.java are due to refactoring. The APIs added to Counters are kept simple. Will check whether it can be easily optimized.

          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389722/HADOOP-3970-v4.1.patch
          against trunk revision 693048.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/console

          This message is automatically generated.

          Hide
          Amar Kamat added a comment -

          The core test failed on TestDatanodeDeath. Not sure if it's related to this patch or Counters. Ran the same test on my machine and it passed.

          Hide
          Devaraj Das added a comment -

          I just committed this. Thanks, Amar!

          Hudson added a comment -

          Integrated in Hadoop-trunk #600 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/600/)
          Suhas Gogate added a comment -

          Amar, the patch applied through this JIRA provides the function "Counters.fromEscapedCompactString(String compactString)". I was planning to use it to parse the counters printed in the job history log file, but it looks like the job history log file does NOT have the counters' string representation in an escaped compact string format. Is it going to be changed? Which JIRA would address this issue (HADOOP-3200?)? Please comment.

          Hide
          Amar Kamat added a comment -

          Suhas,
          Here we have introduced two new APIs, namely Counters.makeEscapedCompactString() and Counters.fromEscapedCompactString(). What history uses is Counters.makeCompactString(), and there is no mechanism to recover it into an object. If you look closely, the counter string in the job history is ambiguous and hence recovery becomes problematic. Counters.makeEscapedCompactString() represents the counters in an unambiguous way and will be used by HADOOP-3245.

          Hide
          Matei Zaharia added a comment -

          Will this be a standardized format for counters in future Hadoop releases, or are there other issues you guys think will need to be fixed? I've been working on some scripts for parsing job history logs to compute statistics about how a Hadoop cluster is being used (see https://issues.apache.org/jira/browse/HADOOP-3708) and I've seen small things being different in different versions of Hadoop. I think it would be beneficial to choose a format and make it standard for Hadoop 1.0, because I imagine others are also interested in being able to parse out info from the job history logs.


            People

            • Assignee:
              Amar Kamat
              Reporter:
              Amar Kamat
            • Votes:
              0
              Watchers:
              2
