Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note:
      Changes the Job History file format to use JSON.
      Simplifies the Job History parsing logic.
      Removes duplication of code between HistoryViewer and the JSP files.
      History files are now named JobID_user.
      Introduces a new cluster-level configuration, "mapreduce.cluster.jobhistory.maxage", for configuring how long history files are kept before being cleaned up.
      The configuration "hadoop.job.history.user.location" is no longer supported.

      Description

      Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky.

      Attachments

      1. mapred-157-10Sep.patch
        455 kB
        Jothi Padmanabhan
      2. mapred-157-15Sep.patch
        468 kB
        Jothi Padmanabhan
      3. mapred-157-15Sep-v1.patch
        469 kB
        Jothi Padmanabhan
      4. mapred-157-16Sep.patch
        469 kB
        Jothi Padmanabhan
      5. mapred-157-16Sep-v1.patch
        470 kB
        Jothi Padmanabhan
      6. mapred-157-4Sep.patch
        451 kB
        Jothi Padmanabhan
      7. mapred-157-7Sep.patch
        454 kB
        Jothi Padmanabhan
      8. mapred-157-7Sep-v1.patch
        454 kB
        Jothi Padmanabhan
      9. mapred-157-prelim.patch
        101 kB
        Jothi Padmanabhan
      10. MAPREDUCE-157-avro.patch
        10 kB
        Doug Cutting
      11. MAPREDUCE-157-avro.patch
        4 kB
        Doug Cutting

        Issue Links

          Activity

          Owen O'Malley added a comment -

          The current format quotes the character set ["\.]. The format is

          key kind1="val1" kind2="val2"\n

          with newlines in the values converted to '\n' and quotes converted to '\"'. We should also consider how to represent counters better. The current format is pretty painful.

          Amar Kamat added a comment -

          I think for now we can do something like this:

          # encoding step
          1. escape all ';' in values
          2. replace '\n' with ';'
          
          # decoding step
          1. replace all unescaped ';' with \n
          2. unescape all ';'
          

          For now this should work. Externally, the error messages would look like ';'-separated values, and one line in the job history will contain exactly one event. Thoughts?
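          As a rough illustration (a sketch of the steps above, not code from any patch; using backslash as the escape character is an assumption), the encoding and decoding could look like this in Java:

          static String encode(String value) {
            return value.replace("\\", "\\\\")   // protect the escape character itself
                        .replace(";", "\\;")     // step 1: escape literal semicolons
                        .replace("\n", ";");     // step 2: newlines become unescaped semicolons
          }

          static String decode(String encoded) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < encoded.length(); i++) {
              char c = encoded.charAt(i);
              if (c == '\\' && i + 1 < encoded.length()) {
                out.append(encoded.charAt(++i)); // unescape '\;' and '\\'
              } else if (c == ';') {
                out.append('\n');                // an unescaped ';' was a newline
              } else {
                out.append(c);
              }
            }
            return out.toString();
          }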

          Owen O'Malley added a comment -

          I think we should completely redesign the format. I'd propose using JSON so that it is trivial to parse in Python, Perl, and Java. If we only put newlines between records, all of the needs are met using a standard layout. Furthermore, we can encode counters simply and directly rather than with complicated nested encoding schemes.

          Owen O'Malley added a comment -

          Doug suggests using the Jackson library for JSON parsing and generation. Its URL is http://jackson.codehaus.org/.

          Owen O'Malley added a comment - edited

          It would look like:

          {"KIND":"MapAttempt",
           "TASK_TYPE":"MAP",
           "TASKID":"task_200904210931_0001_m_180280",
           "TASK_ATTEMPT_ID":"attempt_200904210931_0001_m_180280_0",
           "TASK_STATUS":"SUCCESS",
           "FINISH_TIME":"1240321545820",
           "HOSTNAME":"/rack1/node1.purple.ygrid.yahoo.com",
           "STATE_STRING":"",
           "COUNTERS":[{"ID":"org.apache.hadoop.mapred.Task$Counter", "NAME":"Map-Reduce Framework"
                                     "LIST":[{"ID":"COMBINE_OUTPUT_RECORDS", "NAME":"Combine output records", "VALUE":0},
                                              {"ID":"MAP_INPUT_RECORDS","NAME":"Map input records", "VALUE":3363235}]}]
          }
          

          To support grep, we shouldn't add any newlines.
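          As a quick illustration of what the one-record-per-line layout buys, here is a minimal Java sketch (assuming Jackson 1.x; the file argument and field names are only illustrative) that scans a history file line by line and filters records, much as grep would:

          import java.io.BufferedReader;
          import java.io.FileReader;
          import org.codehaus.jackson.JsonNode;
          import org.codehaus.jackson.map.ObjectMapper;

          public class HistoryGrep {
            public static void main(String[] args) throws Exception {
              ObjectMapper mapper = new ObjectMapper();
              BufferedReader in = new BufferedReader(new FileReader(args[0]));
              String line;
              while ((line = in.readLine()) != null) {
                JsonNode rec = mapper.readTree(line); // each line is one complete record
                if ("MapAttempt".equals(rec.path("KIND").getTextValue()))
                  System.out.println(rec.path("TASK_ATTEMPT_ID").getTextValue());
              }
              in.close();
            }
          }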

          Philip Zeyliger added a comment -

          I'm +1 on JSON. It might make sense to introduce a hook to have multiple sinks for this sort of log data. I'd like to siphon off some of the data to a database, and I could also see some folks wishing to compress these logs.

          Todd Lipcon added a comment -

          Along the lines of what Philip said, I'd like to propose using HADOOP-5640 to expose the hooks for this job-history logging system. This would allow people to log in whichever format they prefer (e.g. JDBC to SQL, JSON, Thrift to something custom, etc.) from an external JAR (without modifying mapred source).

          Owen O'Malley added a comment -

          Would adding a log4j logger that is used only for these events work for you? That is certainly something I'd be interested in seeing. Because these event logs are used for JobTracker restart, I don't think making it pluggable is a good plan.

          Philip Zeyliger added a comment -

          What do you mean by "event logs are used for JobTracker restart"?

          I'd like to get at these objects in a structured way; i.e., I don't want to parse them from a String. Will a log4j logger let me do that?

          – Philip

          Philip Zeyliger added a comment -

          Ah, right, JobTracker.RecoveryManager seems to use the filenames for the logs to restart jobs.

          – Philip

          Jothi Padmanabhan added a comment -

          Here is a proposal for the change to write JobHistory events in JSON format.

          1. We will decouple the generation of events from the actual writing/reading of events. The JobHistory module will generate events and pass them on to event writers, which do the actual writing of events to the underlying stream. Similarly, on the reading front, event readers will read data from the underlying stream and generate events, which are then passed on to the callers (history viewers, other external log aggregators).
          2. In addition, there would be a provision to stream events directly to external listeners as and when they are generated (See HistoryListener Interface in the code snippet below).
          3. The Framework's event writer would write the events to a local file in JSON format. We will use http://jackson.codehaus.org/
          4. For modularity, we will have abstract classes for HistoryEvents, HistoryEventWriters and HistoryEventReaders. Events will have a kind and a type. Examples of kinds include Job, Task, and TaskAttempt. Each kind can support multiple types. Example types for Job include Submitted, Inited, and Finished (among others).
          5. While writing JSON data, each record will be on a separate line by itself. There will not be any newlines within a record.
          6. Each event class would support a toJSON() method that would serialize the event into a JsonNode. Event writers can use this method to write this event in the JSON format to the underlying stream. If the event writers want to write to a different format, they could choose either to parse this JsonNode object or query the Event itself after ascertaining its kind and type.
          7. Similarly, each Event class would support a constructor that takes a JsonNode Object to create an event instance by the event readers while reading from the underlying stream.
          8. Currently, the JobConf object is stored as a separate file, independent of the actual JobHistoryFile. We could possibly store the conf contents as a part of the history file itself. We could wrap the conf object as a special event that is logged during the job submission time.

          Here are some illustrative code snippets

          
          public abstract class HistoryEvent {
          
            protected String type;
            protected HistoryEventKind kind;
            
            public static enum HistoryEventKind {JOB, TASK, TASKATTEMPT, ...}
          
            public String getEventType( ) { return type; }
            public HistoryEventKind getEventKind() { return kind; }
            
            public abstract JsonNode toJSON(JsonNodeFactory jnf);
            
            public HistoryEvent(JsonNode node) { }
          
            public HistoryEvent() {}
          }
          
          public abstract class JobHistoryEvent extends HistoryEvent {
            public JobHistoryEvent() { kind = HistoryEventKind.JOB; }
            public JobHistoryEvent(JsonNode node) { kind = HistoryEventKind.JOB;}
          }
          
          // An example implementation of the JobSubmittedEvent
          
          public class JobSubmittedEvent extends JobHistoryEvent {
          
            private JobID jobid;
            private  String jobName;
            private  String userName;
            private  long submitTime;
            private  Path jobConfPath;
            
            public JobSubmittedEvent(JobID id, String jobName, String userName,
                long submitTime, Path jobConfPath) {
              super();
              this.jobid = id;
              this.jobName = jobName;
              this.userName = userName;
              this.submitTime = submitTime;
              this.jobConfPath = jobConfPath;
              type = "SUBMITTED";
            }
            
            public JobID getJobid() { return jobid; }
            public String getJobName() { return jobName; }
            public String getUserName() { return userName; }
            // other getters
            
            public JobSubmittedEvent(JsonNode node) {
              
            // Code to generate event from JsonNode
              
            }
          
            @Override
            public JsonNode toJSON(JsonNodeFactory jnf) {
              ObjectNode node = new ObjectNode(jnf);
              node.put("EVENT_KIND", kind.toString());
              node.put("EVENT_TYPE", type);
              node.put("JOB_ID", jobid.toString());
              node.put("JOB_NAME", jobName);
              node.put("USER_NAME", userName);
              node.put("SUBMIT_TIME", submitTime);
              node.put("JOB_CONF_PATH", jobConfPath.toString());
              return node;
            }
          
          }
          
          public abstract class HistoryEventWriter {
          
            public abstract void open(String name);
          
            public abstract void write(HistoryEvent event) throws IOException;
          
            public abstract void flush() throws IOException;
          
            public abstract void close() throws IOException;
          }
          
          
          public abstract class HistoryEventReader {
          
            public abstract void open(String name) throws IOException;
          
            public abstract Iterator<HistoryEvent> iterator();
          
            public abstract void close() throws IOException;
          
          }
          
          public interface HistoryListener {
            void handleHistoryEvent(HistoryEvent event);
          }
          
          
          Jothi Padmanabhan added a comment -

          Based on an offline discussion with Owen, Sharad and Devaraj, it does not appear that we have really strong use cases to support multiple formats for the JobHistory file. As a result, we will tie the format strongly to JSON and focus on reducing the number of objects created, writing information directly to the underlying stream wherever possible. While we will retain the event framework, we will simplify the interface compared to the previous design.

          One change is to write the event type preceding the actual event object, so that event readers can read the event type and then create the correct event class for the object that follows. We will, however, still have only one record per line. A line in the history file will now look like this:

          {"EVENT_TYPE":"JOB_SUBMITTED"} {"EVENT_KIND":"JOB","JOB_ID":"job_test_0000","JOB_NAME":"TEST-JOB-SUBMITTED","USER_NAME":"Jothi","SUBMIT_TIME":1249887005100,"JOB_CONF_PATH":"/tmp"}
          

          Events will now implement writeFields(JsonGenerator) and readFields(JsonParser) methods.

          The JobHistory module would create one event writer per jobId; event writers would translate this into one history file. The event writer will also internally create a JsonGenerator based on this file and would use this for writing the actual event (by calling event.writeFields).

          Similarly, the job history reading module would create one event reader per jobid/file. This would internally create one JsonParser that would be passed to the individual events' readFields method.

          
          interface HistoryEvent {
            void writeFields (JsonGenerator gen) throws IOException;
            void readFields(JsonParser parser) throws IOException;
          }
          
          class JobHistory {
          ...
             // Generate a history file based on jobId, then create a new EventWriter
             JsonEventWriter eventWriter = new JsonEventWriter(conf, historyFile);
             eventWriter.write(jobSubmittedEvent);
             eventWriter.write(jobFinishedEvent);
             ...
             eventWriter.close();
          
          }
          
          class SomeHistoryEventUser {
              JsonEventReader eventReader = new JsonEventReader(conf, historyFile);
              while ((ev = eventReader.getNextEvent()) != null) {
                //process ev
              }
             eventReader.close();
          }
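
          To make the flow concrete, here is a minimal sketch (assuming Jackson 1.x; newEventForType is a hypothetical factory, and the layout is the EVENT_TYPE header followed by the event object, as described above) of what the writer and reader could do internally:

          import java.io.IOException;
          import org.codehaus.jackson.JsonGenerator;
          import org.codehaus.jackson.JsonParser;

          class JsonEventSketch {
            // Writer side: emit the type header, then let the event write its own
            // object, keeping the whole record on a single line.
            static void writeEvent(JsonGenerator gen, HistoryEvent event, String type)
                throws IOException {
              gen.writeStartObject();
              gen.writeStringField("EVENT_TYPE", type);
              gen.writeEndObject();
              event.writeFields(gen);  // the event emits its own JSON object
              gen.writeRaw('\n');      // one record per line, no embedded newlines
            }

            // Reader side: read the type header, construct the matching event class,
            // then let it consume its own fields.
            static HistoryEvent readEvent(JsonParser parser) throws IOException {
              if (parser.nextToken() == null) return null; // end of stream
              parser.nextToken();                          // FIELD_NAME "EVENT_TYPE"
              parser.nextToken();                          // the type value
              String type = parser.getText();
              parser.nextToken();                          // END_OBJECT of the header
              HistoryEvent event = newEventForType(type);  // hypothetical factory
              event.readFields(parser);                    // consumes the event object
              return event;
            }

            static HistoryEvent newEventForType(String type) {
              // Dispatch on the type string to the right event class (omitted here).
              return null;
            }
          }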
          
          
          Philip Zeyliger added a comment -

          Would this be a good place to try the Avro serialization format in Hadoop proper? If text-formatting is desired, AVRO-50 has a text format for Avro, which is JSON already. So you'd basically be implementing the same thing, but with the extra context of an Avro schema.

          – Philip

          Philip Zeyliger added a comment -

          done


          Philip Zeyliger added a comment -

          Ack, ignore that "done". Was in the wrong browser tab.

          Jothi Padmanabhan added a comment -

          Would this be a good place to try the Avro serialization format in Hadoop proper?

          Possibly. But I think we could get the plain JSON implementation done for the first cut and look at Avro in the future, OK?

          Jeff Hammerbacher added a comment -

          I'm with Philip. I think a commitment to Avro now would be prudent. What's the advantage of cutting a corner now?

          Owen O'Malley added a comment - edited

          I'm confused what the goal of using Avro here would be.

          Let's review the goals:
          1. Get an easily parseable text format.
          2. Not require excessive amounts of time for logging
          2a. Not require excessive object allocations.

          It seems like to use Avro, we'd need to create the Avro objects and then write them out. I'd rather just use a JsonWriter to write the events out to the stream. Of course, reading is the reverse. It would be like writing XML files by generating the necessary DOM objects. You can do it (and in fact Configuration is written that way; sigh), but it costs a lot of time.

          Not having seen the Avro text format, I can't evaluate how much overhead it adds. None of the features of Avro seem compelling in this case, and they could easily lead to unfortunate choices.

          Furthermore, I don't know if there are any guarantees about the Avro text format's stability. We need stability in this format.

          Philip Zeyliger added a comment -

          Avro would force you into a schema, and I think having a schema is the only way to get stability in the format. Yes, there's probably overhead, but if we're using Avro for other things (i.e., all RPCs), we may as well fix those overheads when we get to them. (It may also be a net win to store the data in binary Avro format and write an "avrocat" to deserialize into text before pushing to tools like awk, but I do understand the desire for a text format.)

          All that said, you have specific needs in mind here, and I'm mostly waxing poetical, so I'll certainly defer.

          – Philip

          Jeff Hammerbacher added a comment -

          Hey,

          As Philip notes, you guys clearly are working to get work done, so I don't want to stand in the way. For future consideration, here are some reasons I thought using Avro here might be a win for the project as a whole:

          • There will certainly be more performance-critical uses of Avro to come inside of Hadoop, so if we punt on using it here because of performance considerations (2. and 2a.), I'd be concerned with regards to the feasibility of future uses (e.g. data block transfer). I saw this application as a fairly safe place to start getting comfortable with Avro in production use.
          • Similarly, for concern 1, having an Avro schema would allow users to generate intelligent parsers in many languages automagically, and having a useful Avro schema to parse inside of Hadoop could potentially encourage the development of new Avro client libraries.
          • Parsing JSON may be easy in many languages but enforcing structure using Avro's schemas will be valuable for communicating the format of the logs and encouraging the format's stability.
          • Also, it's likely that many of the tools in the Hadoop ecosystem that manipulate structured data (Pig, Hive, etc.) will have utilities for Avro that they may lack for JSON or other formats not central to the Hadoop ecosystem like Avro.

          Either way, I'm happy to see progress being made on a stable and sane format for these logs!

          Later,
          Jeff

          Doug Cutting added a comment -

          Owen> Of course reading is the reverse. It would be like writing xml files by generating the necessary DOM objects.

          Not sure what you mean. Jackson has an event-based JSON reading API.

          http://jackson.codehaus.org/1.2.0/javadoc/org/codehaus/jackson/JsonParser.html

          So, to efficiently read things back into structs you might use an enum of field names, e.g.:

          class Foo { int a; String b; }
          enum FooFields { A, B }

          void readFoo(JsonParser parser, Foo foo) throws IOException {
            if (parser.nextToken() != JsonToken.START_OBJECT)
              throw new IOException("expected start of object");
            while (parser.nextToken() != JsonToken.END_OBJECT) {
              String field = parser.getCurrentName();
              parser.nextToken(); // advance to the value token
              switch (Enum.valueOf(FooFields.class, field.toUpperCase())) {
              case A: foo.a = parser.getIntValue(); break;
              case B: foo.b = parser.getText(); break;
              }
            }
          }
          

          FWIW, Avro supports SAX-like streaming, without object creation. A significant change if we used Avro would be that we'd need to store the schema with the data. We could, for example, make the first line of log files the schema, or write a side file, but there's not much point to Avro data without storing a schema.

          Is the implicit schema proposed here Map<String,String>? For example, would integer values be written as JSON strings, with quotes, or as JSON integers, without quotes? If the schema is Map<String,String> and will be for all time, then there's less point to using Avro. But if fields are typed it might be nice to record the types in a schema.

          Jothi Padmanabhan added a comment -

          AVRO-50 has been committed to 1.0.1. We presumably cannot use that unless there is another Avro release before the Hadoop 0.21 release, no?

          Sharad Agarwal added a comment -

          Is the implicit schema proposed here Map<String,String>? For example, would integer values be written as JSON strings, with quotes, or as JSON integers, without quotes? If the schema is Map<String,String> and will be for all time, then there's less point to using Avro. But if fields are typed it might be nice to record the types in a schema.

          I think that is a reasonable statement. Also, apart from types, we would like to have nested records, not just key-values (counter info, etc.). So Avro looks like a good fit to me.

          We could, for example, make the first line of log files the schema, or write a side file, but there's not much point to Avro data without storing a schema.

          I think a side file would be better, as it won't bloat each file with the same info. We can have a union schema comprising all history events. Perhaps the first line could just be the Hadoop version number, since the schema file would correspond to it.

          Jothi Padmanabhan added a comment -

          Is the implicit schema proposed here Map<String,String>?

          No, I think it would just be more efficient to preserve the field types.

          think side file would be better as it won't bloat each file with the same info

          +1. Since we would have one history file per job and we might need to store the schema for several events, I think it makes sense to have a separate file instead of duplicating the schemas in each file. The downside is that we might have to store a version number as a header in each history file and read that out of band before deciding which version of the schema to choose.

          eric baldeschwieler added a comment -

          I think using Avro is interesting down the road. It seems too close to release, and too early in Avro's life, to do this now.

          Can we move forward with this as planned and then file another bug for the Avro conversion? I think that will take some more discussion.

          A union schema seems kind of heavy. What we would want here is a schema for each event type, so a parser could throw off a sequence of objects. Is a sequence of Avro records of mixed types something that is easy to express in Avro? One could clearly do it by having a sequence of "<type> : avro record\n" lines.

          Jothi Padmanabhan added a comment -

          Regarding the interface for readers, we could support two kinds of users:

          1. Users who want fine-grained control and would handle the individual events themselves.
          2. Users who want coarser-grained, summary-level information.

          Users of type 1, who want finer-grained information, could use event readers to iterate through events and do the necessary processing.

          For users of type 2, we could provide summary information through a JobHistoryParser class. This class would internally build the Job-Task-Attempt hierarchy by consuming all events using an event reader, and make the summary information available for users to access. Users could do something like:

          
          parser.init(history file or stream)
          
          JobInfo jobInfo = parser.getJobInfo();
          
          // use the getters to get jobinfo (example: start time, finish time, counters, id, user name, conf, total maps, total reds, among others)
          
          List<TaskInfo> taskInfoList = jobInfo.getAllTasks();
          
          // Iterate through the list and do necessary processing. Getters for taskinfo would include taskid, task type, status, splits, counters, etc
          
          List<TaskAttemptInfo> attemptsList = taskInfo.getAllAttempts();
          
          // Attempt info would have getters for attempt id, errors, status, state, start time, finish time, tracker name, port etc.
          
          

          Comments/Suggestions/Thoughts?

          Doug Cutting added a comment -

          Jothi, if you have an early version of this patch, please post it. That way we can better evaluate converting it to use Avro. Thanks!

          eric baldeschwieler added a comment -

          Re: AVRO conversion

          Doug and I chatted. My concern is that we are working through a job history refactor with a bunch of moving parts. I want to get those all into 21 and stable. Until that is done I don't want to consider AVRO since it might put that rework at risk (since our team has already invested in JSON). That said, I'm not against binary AVRO here. It could have advantages. If someone else can put in the time to demonstrate that this will work, I think that might be a better approach. I just hope we can do that as a distinct patch that follows this one.

          (Or collaborate to make one patch, we just don't have the resources before 21 freeze)

          (For the wider context of the refactor, see MAPREDUCE-863)

          Jothi Padmanabhan added a comment -

          Preliminary patch, as requested by Doug. I got this to produce a history file in JSON format (but only after commenting out lots of code), so it is more useful for illustration than anything else.

          My guess is that it should be fairly straightforward to port this to use Avro.

          Doug Cutting added a comment -

          Here's a patch that illustrates how Avro can be used to generate the classes equivalent to those in your patch.

          To write them to an output stream, you'd use something like:

          Encoder encoder = new BinaryEncoder(outputStream);
          DatumWriter writer = new SpecificDatumWriter(Events.Event._SCHEMA);
          ...
          Events.JobSubmitted submitted = new Events.JobSubmitted();
          submitted.kind = Events.Kind.SUBMITTED;
          submitted.jobId = ... ;
          submitted.jobName = ...;
          ...
          writer.write(submitted, encoder);
          ...
          writer.close();
          

          To read them, use something like:

          Decoder decoder = new BinaryDecoder(inputStream);
          DatumReader reader = new SpecificDatumReader(Events.Event._SCHEMA);
          ...
          Events.Event event = reader.read(decoder);
          switch (event.kind) {
          case JOB_SUBMITTED:
            Events.JobSubmitted submission = (Events.JobSubmitted)event.event;
            ...
            break;
          ...
          }
          
          Doug Cutting added a comment -

          A hint, run 'ant schemata' then look at build/src/org/apache/hadoop/mapreduce/jobhistory/Events.java to see the Java API that's generated.

          Jothi Padmanabhan added a comment -

          Patch for changing History to use JSON format. Some notes about the patch:

          All history information is logged using events.
          A version event is prepended to all history files.
          History Viewer and History Parser have been cleaned up, and the duplication of code between the JSP files and HistoryViewer has been removed.
          History files are named JobID_username. Filters on the UI page will now be based only on JobID and user name.
          History Viewer now takes a history file as an argument instead of an output directory.
          All events are made up of new API objects, including counters. As a result, I had to open up a couple of constructors in Counters to public.

          Hadoop-Vaidya has been changed to use the new History Viewer, but has not been tested with it.
          A temporary fix has been put in for Rumen to get it to compile; it still works only with the old history format, not the new one.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12418622/mapred-157-4Sep.patch
          against trunk revision 811134.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 27 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/41/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/41/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/41/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/41/console

          This message is automatically generated.

          Jothi Padmanabhan added a comment -

          Patch that includes ivy.xml changes to the contrib modules so that they have access to the Jackson library.
          Also modified TestCopyFiles so that it checks the map count correctly.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12418781/mapred-157-7Sep.patch
          against trunk revision 812002.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/9/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/9/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/9/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/9/console

          This message is automatically generated.

          Jothi Padmanabhan added a comment -

          Resubmitting the patch so that Hudson picks it up again – unable to explain why Hudson failed the patch

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12418781/mapred-157-7Sep.patch
          against trunk revision 812109.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/11/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/11/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/11/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/11/console

          This message is automatically generated.

          Jothi Padmanabhan added a comment -

          Now, sqoop's ivy.xml needs to be updated too!

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12418824/mapred-157-7Sep-v1.patch
          against trunk revision 812209.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/console

          This message is automatically generated.

          Jothi Padmanabhan added a comment -

          Iyappan identified a few bugs. This patch fixes them.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419138/mapred-157-10Sep.patch
          against trunk revision 813140.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/23/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/23/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/23/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/23/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          Some comments:

          • Responsibilities of JobHistory and JobHistoryWriter are not clear. I think these should be merged into a single class named, say, JobHistoryManager
          • Base interface HistoryEvent should be public
          • Getters in event classes should be public
          • HISTORY_VERSION should be 1.0 instead of '0.21'
          • JobHistory#initDone is swallowing the exception
          • Code for localizing the job conf should be moved out of history
          • JobHistory has two logEvents methods. That is confusing. Better would be to have just one logEvent method, and no state check in logEvents. JobInProgress will explicitly create/close job history writers
          • The HistoryCleaner thread should check lastRan != 0, not lastRan == 0. Also it should use doneDirFs
          • Using:
            if (jobName == null || jobName.length() == 0) { jobName = "NA"; }

            is not nice. This was perhaps needed for some reason in the old format, but it is not required in the new one; JSON can represent an empty string directly (see the example after this list)

          • File name encoding is not required. The JobHistoryUtil class can go away
          • TaskInProgress and JobInProgress do a jobHistory != null check everywhere. This is not elegant.
          • A test case is missing from TestJobHistory#testDoneFolderOnHDFS
          • EventReader: readOneGroup should use a better message: throw new IOException("Internal error");
          • EventReader: getHistoryEventType need not be static; it can use the instance parser handle
          • The History Viewer command line has changed. It takes a file name instead of an output folder. JobClient#displayUsage and the commands manual documentation should be corrected.
          • All public APIs, including constructors, must be documented
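
          As a concrete illustration of the empty-job-name point above: in the JSON format a blank name can be written directly, so no "NA" sentinel is needed. The field names below are illustrative assumptions, not the actual schema:

              {"eventType":"JOB_SUBMITTED","jobid":"job_200909160000_0001","jobName":""}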
          Amar Kamat added a comment -

          Note: this JIRA should take care of MAPREDUCE-926 and MAPREDUCE-881.

          Philip Zeyliger added a comment -

          Some comments:

          • EventWriter class needs javadoc.
          • You can use Java 1.5 enum fanciness (http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html) to associate the event types with their enums.
            So, instead of storing this map in a separate file, where it will get out of date if someone not intimately familiar with the code wishes to modify it, you store the mapping inside the Enum object itself. So, instead of:
                classMap = 
                  new HashMap<HistoryEvent.EventType, Class<? extends HistoryEvent>>();
                classMap.put(HistoryEvent.EventType.JOB_SUBMITTED,
                    JobSubmittedEvent.class);
                classMap.put(HistoryEvent.EventType.JOB_INITED,
                    JobInitedEvent.class);
                ...
            

            you might do something like this example here:

            enum Collection {
              MAP(java.util.Map.class),
              LIST(java.util.List.class),
              QUEUE(java.util.Queue.class);

              private final Class<?> klass;

              Collection(Class<?> klass) {
                this.klass = klass;
              }

              Class<?> getKlass() {
                return this.klass;
              }

              // Reverse mapping, built once from the enum's own values.
              private static final Map<Class<?>, Collection> klasses =
                  new HashMap<Class<?>, Collection>();

              static {
                for (Collection f : Collection.values()) {
                  klasses.put(f.getKlass(), f);
                }
              }

              public static Collection getEnumForClass(Class<?> klass) {
                return klasses.get(klass);
              }
            }
            

            This puts the 1-1 mapping in the same place. It's impossible to add an enum value without specifying its associated value.

          • Honestly, the dozen or so *Event.java classes are begging for a framework. Each of those classes merely specifies a schema (a set of fields, each with a type) and implements a simple struct. Code generation is a typical solution for this sort of problem. You could also specify very little in each class and implement readFields and writeFields generically.

          – Philip

          Doug Cutting added a comment -

          > Honestly, the dozen or so *Event.java classes are begging to have a framework.

          I am working to convert this patch to use Avro. I have a schema that generates code for all of the event classes. I am in the process of converting each of the event classes to be a wrapper around the generated class, providing constructors and accessor methods to the generated class. The serialization will switch to using Avro binary. If Avro 1.1 is released this week as expected, then this could be trivially changed to generate JSON instead.
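
          For concreteness, here is a minimal sketch of that wrapper pattern, assuming Avro generates one plain record class per event; every name below (GeneratedJobInited, its fields) is a hypothetical stand-in rather than the actual generated API:

              // Hypothetical stand-in for an Avro-generated record class.
              class GeneratedJobInited {
                long jobid;
                long launchTime;
              }

              // Hand-written wrapper: constructor and typed accessors around
              // the generated class, as described above.
              class JobInitedEvent {
                private final GeneratedJobInited datum = new GeneratedJobInited();

                JobInitedEvent(long jobid, long launchTime) {
                  datum.jobid = jobid;
                  datum.launchTime = launchTime;
                }

                long getJobId() { return datum.jobid; }
                long getLaunchTime() { return datum.launchTime; }
              }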

          Jothi Padmanabhan added a comment -

          Patch that incorporates most of the review comments, including the use of enum fanciness to associate types with their enums.

          Honestly, the dozen or so *Event.java classes are begging to have a framework.

          Yes, I agree; I realized it too. But given that we are moving to Avro, which will do the code generation automatically, I left it simple for now rather than developing another framework.

          I am working to convert this patch to use Avro

          Doug, thanks. If possible, could you use this patch as your base patch?

          Jothi Padmanabhan added a comment -

          Attaching the correct patch

          Jothi Padmanabhan added a comment -

          I created MAPREDUCE-980 to track the Avro port

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419605/mapred-157-15Sep.patch
          against trunk revision 814749.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 42 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/31/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/31/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/31/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/31/console

          This message is automatically generated.

          Jothi Padmanabhan added a comment -

          Incorporated some offline comments from Sharad

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419622/mapred-157-15Sep-v1.patch
          against trunk revision 815249.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 42 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/32/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/32/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/32/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/32/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          Any reason the HistoryEvent interface doesn't have the method public EventCategory getEventCategory()? Otherwise there is no point in defining the category enum in the HistoryEvent interface, no? Also I think getEventType() and the enum EventCategory should be public, as event objects are expected to be consumed by clients in a different package. readFields and writeFields can stay non-public, as these would only ever be used by EventReader/EventWriter; they should be kept non-public in the implementations as well (see the sketch below).
          Otherwise patch looks good to me.
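
          A minimal sketch of the interface shape being suggested here; the enum values are assumptions for illustration, not the actual constants in the patch:

              // Sketch only; the categories and types shown are hypothetical examples.
              public interface HistoryEvent {
                enum EventCategory { JOB, TASK, TASK_ATTEMPT }
                enum EventType { JOB_SUBMITTED, JOB_INITED, TASK_STARTED }
                EventType getEventType();
                EventCategory getEventCategory();
              }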

          Sharad Agarwal added a comment -

          Oops. Realized HistoryEvent is an interface, not an abstract class. Please ignore my last comment about making getEventType public.

          eric baldeschwieler added a comment -

          Doug, so your patch will store binary Avro? If we can convert, we should probably convert all the way to native Avro. That will be more tested and more common than text Avro, and presumably require less storage. Presumably Avro will have a full toolset for dumping and exploring files like this, so the binary format should not be a problem?

          eric baldeschwieler added a comment -

          PS is this going to make code freeze?

          Hemanth Yamijala added a comment -

          Eric, we have done quite a bit of testing on Jothi's patch and are quite close to committing it. Given the code freeze date, I would request we stay the course as decided for this JIRA and use MAPREDUCE-980 to track the Avro port.

          eric baldeschwieler added a comment -

          Yup.


          E14 - via iPhone

          Doug Cutting added a comment -

          I will move the Avro port of this to MAPREDUCE-980. I do not want to impede progress here.

          > Doug, so your patch will store binary AVRO?

          Yes. Avro 1.0 (the current release) only supports binary.

          Avro 1.1 (out today if it can get one more PMC +1, please) adds JSON i/o.
          Printing the log as JSON will be easy if I can upgrade Hadoop to Avro 1.1 in MAPREDUCE-980.

          Jothi Padmanabhan added a comment -

          Patch updated to trunk.
          Removed the version event; version information is now written in the EventWriter constructor and read in the EventReader constructor (see the sketch below)
          Cleaned up the history cleaner time-to-run logic
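
          A hedged sketch of that constructor handshake; the class shapes, stream types, and version string are assumptions for illustration, not the patch's actual code:

              import java.io.DataInputStream;
              import java.io.DataOutputStream;
              import java.io.IOException;

              class EventWriter {
                static final String VERSION = "JobHistory-1.0"; // hypothetical version string

                EventWriter(DataOutputStream out) throws IOException {
                  out.writeUTF(VERSION); // write the version header once, at the start of the file
                }
              }

              class EventReader {
                EventReader(DataInputStream in) throws IOException {
                  String version = in.readUTF(); // read the header before any events
                  if (!EventWriter.VERSION.equals(version)) {
                    throw new IOException("Incompatible history file version: " + version);
                  }
                }
              }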

          Iyappan Srinivasan added a comment -

          +1 for QA

          1) Tried multiple restarts. Jobs are resubmitted properly and the links work correctly.
          2) Looked at the TT logs to verify that the numbers shown on the web page are correct.
          3) Checked all the links, including the "Analyse" logs page, for mistakes.
          4) Got more than 100 jobs into the history page to verify that pagination works.
          5) Searched by user and by job to verify that search works.
          6) Tried deleting some job.xml and job files in the done folder to see how the job history parser reacts.
          7) Tried corrupting some job files and then restarting the JT. They are blocked from viewing.
          8) Tried killing some jobs and some attempt ids using kill -9 to see if it is handled properly.
          9) Tried killing some jobs and some attempt ids using bin/hadoop job -kill to see if it is handled properly.
          10) With mapreduce.cluster.jobhistory.maxage set to 10*60*60 (10 mins), the job files are cleaned up after 10 minutes (see the example below).
          11) Ran some jobs and did a JT restart. After the restart, visited those jobs in job history and verified they exist.
          12) Set hadoop.job.history.location to an unreachable location and verified that the JT does not come up, throwing a proper error.
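
          For reference, the property can also be set programmatically as below; this is a minimal sketch that assumes the value is interpreted in milliseconds (the unit is an assumption, so check the cluster defaults):

              import org.apache.hadoop.conf.Configuration;

              public class MaxAgeExample {
                public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // 10 minutes, assuming the property is read in milliseconds
                  conf.setLong("mapreduce.cluster.jobhistory.maxage", 10 * 60 * 1000L);
                }
              }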

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419758/mapred-157-16Sep.patch
          against trunk revision 815628.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 42 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/37/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/37/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/37/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/37/console

          This message is automatically generated.

          Jothi Padmanabhan added a comment -

          Retrying Hudson

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419758/mapred-157-16Sep.patch
          against trunk revision 815628.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 42 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/89/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/89/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/89/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/89/console

          This message is automatically generated.

          Jothi Padmanabhan added a comment -

          Missed the ivy.xml changes to include the Jackson jars in the newly added contrib/gridmix

          Sharad Agarwal added a comment -

          +1. Once Hudson passes, this can be committed.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419790/mapred-157-16Sep-v1.patch
          against trunk revision 815628.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 45 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/90/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/90/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/90/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/90/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          I just committed this. Thanks Jothi.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #43 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/43/)
          MAPREDUCE-157. Refactor job history APIs and change the history format to JSON. Contributed by Jothi Padmanabhan.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #85 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/85/)
          MAPREDUCE-157. Refactor job history APIs and change the history format to JSON. Contributed by Jothi Padmanabhan.


            People

            • Assignee: Jothi Padmanabhan
            • Reporter: Owen O'Malley
            • Votes: 1
            • Watchers: 22
