Issue Details (XML | Word | Printable)

Key: HADOOP-3970
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Amar Kamat
Reporter: Amar Kamat
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Counters written to the job history cannot be recovered back

Created: 19/Aug/08 09:43 AM   Updated: 08/Jul/09 04:52 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.19.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works HADOOP-3970-v1.patch 2008-08-19 08:28 PM Amar Kamat 4 kB
Text File Licensed for inclusion in ASF works HADOOP-3970-v2.patch 2008-08-30 04:01 PM Amar Kamat 19 kB
Text File Licensed for inclusion in ASF works HADOOP-3970-v3.patch 2008-09-01 03:21 PM Amar Kamat 20 kB
Text File Licensed for inclusion in ASF works HADOOP-3970-v4.1.patch 2008-09-09 03:39 AM Amar Kamat 18 kB
Text File Licensed for inclusion in ASF works HADOOP-3970-v4.patch 2008-09-05 05:14 AM Amar Kamat 20 kB
Issue Links:
Dependants
 

Hadoop Flags: Reviewed
Release Note: Added getEscapedCompactString() and fromEscapedCompactString() to Counters.java to represent counters as Strings and to reconstruct the counters from the Strings.
Resolution Date: 09/Sep/08 07:04 AM


 Description  « Hide
Counters that are written to the JobHistory are stringified using Counters.makeCompactString(). The format in which this api converts the counter into a string is groupname.countername:value. The problem is that groupname and countername can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Amar Kamat added a comment - 19/Aug/08 08:02 PM
Currently the way counters are converted to string (in jobhistory) is as follows
groupname.countername:value
So for a counter which has the following contents
groupname countername value
g1 c1 v1
g1 c2 v2
g1 c3 v3
g2 c4 v4
g2 c5 v5

We get
g1.c1:v1, g1.c2:v2, g1.c3:v3, g2.c4:v4, g2.c5:v5

One way to overcome the problem stated above is to use the length of the names present in the counter
So the above counter might look like
[|g1|] g1 {[|c1|] c1:v1, [|c2|] c2:v2, [|c3|] c3:v3 }, [|g2|] g2 {[|c4|] c4:v4, [|c5|] c5:v5 }
Here |s| means length of s.
Hence the length of the name in the counter helps to correctly identify the name and helps in counter recovery.


Thoughts?


Amar Kamat added a comment - 19/Aug/08 08:28 PM
Attaching a patch the implements the above mentioned technique. Tested it with HADOOP-3245 and it works fine. I am yet to test it separately.

Amar Kamat added a comment - 22/Aug/08 09:39 AM
It seems that the jobhistory uses the display names instead of actual names in the stringified representation.

One such example of actual name and display name is
actual-name = HDFS_WRITE and display-name = HDFS bytes written.

Note that the mapping from the actual name to the display name is mentioned in the .properties file.

Looks like its difficult to get the (actual) name from the display name because of the following reasons
1) Display names can be same across counters.
2) There are two ways to add/increment a counter

  • pass the group, counter and the value
  • pass the (enum) key and the value. For the (enum) key, the name is the classname for that (enum) key.
    The display name is obtained from the property file. In case there is no property file corresponding to the enum, the actual name is used. In either case, figuring out the actual name seems difficult.

So, I suggest we also add the actual name to jobhistory along with the display name and provide api to set the display-name of the counter/group.


Owen O'Malley added a comment - 26/Aug/08 04:26 PM
Let's just keep the display names. The internal names are for mostly for mapping to the counters. Yes, in theory the display names could collide, but in practice they won't, because they are always displayed to the user by the display name.

I think that putting in the lengths would make it very hard to read. I think it is better to stay with in the current scheme and use the current:

g1.c1: v1, g1.c2:v2, ...

As you point out, that means we need to quote the characters . : , and probably " and \.


Owen O'Malley added a comment - 29/Aug/08 05:35 AM
After an out-of-band discussion with Amar, I agree that we need both the internal and display names to be saved. If we are breaking the format, we should probably pull out the group names and put in both. Maybe we should use a lisp like format:
{g1 {ci1 cd1 v1} {ci2 cd2 v2} {ci3 cd3 v3}} {g2 {ci4 cd4 v4}}

Thoughts?


Hemanth Yamijala added a comment - 29/Aug/08 06:43 AM
Amar and I thought about this a bit. Groups also have internal and display names, apparently. Since space is a character that may be more common in display names (we have examples in the current list of counters), it may be better to do something like:
{(gi1)(gd1){(ci1)(cd1)(v1)}}
.

Real example:

{(org.apache.hadoop.mapred.Task$FileSystemCounter)(File Systems){(HDFS_READ)(HDFS bytes read)(1020634)}}

as opposed to

{org.apache.hadoop.mapred.Task$FileSystemCounter File\ Systems {HDFS_READ HDFS\ bytes\ read 1020634}}

IMHO, the first is a bit more readable, and clear where the tokens are ending. Comments ?


Amar Kamat added a comment - 30/Aug/08 04:01 PM
Attaching a patch that adds the facility to recover the counter from (encoded)string. The format in which the counters are represented is
{(gn1) (gd1) [(cn1) (cd1) (cv1)] [] [] []} {}
Where {} are used for groups, [] are used for counter and () for basic values

Th change w.r.t Hemanth's comment is in the encapsulation for counter from '{ }' to '[ ]' which makes parsing and the recovery code simple. Added testcases that test counters. Comments? Thoughts?


Amar Kamat added a comment - 01/Sep/08 03:21 PM
Uploading a new patch the fixes the findbugs warnings.

Hemanth Yamijala added a comment - 04/Sep/08 07:39 PM
StringUtils
  • I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape. So, do we really need to deprecate this ?
  • charHasPreEscape is not needed. It is always going to be the escape char. This is a bug in the previous code too.

Counters

  • The token delimiters can be declared final.

Counter

  • equals and hashCode implementation: Since the counter's value can change over time, I am thinking that these methods should not depend on the value of the counter. Needs to be thought out a bit more.
  • makeEscapedCompactString - use StringBuffer, instead of appending spaces.
  • makeEscapedCompactString - do we need the spaces ? I think the expression is compact without them. If we need them, then a space is missing when the counter closes.

Group

  • equals: Code is incorrectly returning true if classes don't match.
  • hashCode: Are there standard ways in which we can combine hashCodes of various implementations. Also, the hashCode should not use the toString method to get the hashCode.
    makeCompactEscapedString - names are different between similar API in Counter. Can we have them uniform ? Also, you can use StringBuffer without any String concatanation at all. Same comment on blank spaces.

Counters:

  • getBlock: We expect a start and end. If there's no start or no end, should we fail fast ? I guess today if the end does not come as expected, it returns the remaining String as the value.
  • Same comments here as with the Group's implementation for overriding equals and hashcode.

TestCounters:

  • testCounter need not check the mainCounter and copyCounter's equality before getting them from the strings. In fact, why do we need copyCounter at all ?

Amar Kamat added a comment - 05/Sep/08 05:14 AM
Attaching a patch the incorporates Hemanth's comments.

I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape. So, do we really need to deprecate this ?

The reason for deprecating the api is that I have introduced a new/enhanced api that essentially does the same thing but is more generic. I personally dont have any strong opinion about this.


Amar Kamat added a comment - 05/Sep/08 05:49 AM

equals and hashCode implementation: Since the counter's value can change over time, I am thinking that these methods should not depend on the value of the counter. ....

The code to check content equality of counters is moved to the test case as only the test case requires it for now.

getBlock: We expect a start and end. If there's no start or no end, should we fail fast ?

Now getBlock() throws ParseException if the string is malformed and returns null is no block found


Amareshwari Sriramadasu added a comment - 08/Sep/08 10:30 AM

I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape.

+1 I agree with Hemant to not deprecate the api escapeString(String str, char escapeChar, char charToEscape).


Amar Kamat added a comment - 09/Sep/08 03:39 AM
Attaching a patch that removes the deprecated tag. As rightly pointed out by Hemanth, I have not optimized the code for performance. Most of the changes in StringUtils.java are due to refactoring. The apis added to Counters are kept simple. Will check if it can be easily optimized.

Hadoop QA added a comment - 09/Sep/08 05:29 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389722/HADOOP-3970-v4.1.patch
against trunk revision 693048.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/console

This message is automatically generated.


Amar Kamat added a comment - 09/Sep/08 05:39 AM
Core test failed on TestDatanodeDeath. Not sure if its related to this patch or Counters. Ran the same test on my machine and it passed.

Devaraj Das added a comment - 09/Sep/08 07:04 AM
I just committed this. Thanks, Amar!

Hudson added a comment - 11/Sep/08 01:04 PM

Suhas Gogate added a comment - 12/Sep/08 11:04 PM
Amar, patch applied through JIRA provides the function "Counters.fromEscapedCompactString(String compactString)". I was planing to use it to parse the counters printed in job history log file, but looks like job history logfile does NOT have the counters string representation in a escaped compact string format. Is it going to be changed? Which JIRA would address this issue (HADOOP-3200 ?). Pl. comment.

Amar Kamat added a comment - 15/Sep/08 05:21 AM
Suhas,
Here we have introduced 2 new api's namely Counters.makeEscapedCompactString() and Counters.fromEscapedCompactString(). What history uses is Counters.makeCompactString() and there is no mechanism to recover it into an object. If you look closely, the counter string in the job history is ambiguous and hence recovery becomes problematic. Counters.makeEscapedCompactString() represents the counters in an unambiguous way and will be used by HADOOP-3245.

Matei Zaharia added a comment - 03/Nov/08 03:01 AM
Will this be a standardized format for counters in future Hadoop releases, or are there other issues you guys think will need to be fixed? I've been working on some scripts for parsing job history logs to compute statistics about how a Hadoop cluster is being used (see https://issues.apache.org/jira/browse/HADOOP-3708) and I've seen small things being different in different versions of Hadoop. I think it would be beneficial to choose a format and make it standard for Hadoop 1.0, because I imagine others are also interested in being able to parse out info from the job history logs.