Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: tools/rumen
    • Labels: None
    • Hadoop Flags: Reviewed

Description

Add Forrest documentation to the Rumen tool.

Attachments

    1. rumen.pdf (27 kB) - Amar Kamat
    2. mapreduce-1918-v1.3.patch (18 kB) - Amar Kamat
    3. rumen.pdf (28 kB) - Amar Kamat
    4. mapreduce-1918-v1.4.patch (19 kB) - Amar Kamat
    5. mapreduce-1918-v1.7.patch (43 kB) - Amar Kamat
    6. mapreduce-1918-v1.8.patch (41 kB) - Amar Kamat
    7. mapreduce-1918-v1.10.patch (42 kB) - Amar Kamat

Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

        Amareshwari Sriramadasu added a comment -

        I just committed this. Thanks Amar !

        Amar Kamat added a comment -

        Attaching a new patch that fixes the javadoc warning. TaskAttemptInfo.java wasn't modified in the earlier patch but still resulted in a javadoc warning (not caught by test-patch).
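        (For context, a javadoc "Tag @link: reference not found" warning of this kind is usually cleared by importing the referenced class or fully qualifying the reference. The sketch below is a hypothetical illustration of that pattern, not necessarily the exact change in the attached patch.)

          // Hypothetical illustration only; the actual fix in the patch may differ.
          // Writing {@link TaskStatus.State} fails when TaskStatus is not imported
          // in the file; fully qualifying the reference (assuming the target is
          // org.apache.hadoop.mapred.TaskStatus.State) always resolves:

          /**
           * Example javadoc with a fully qualified reference:
           * {@link org.apache.hadoop.mapred.TaskStatus.State}.
           */
          public class LinkTagFixExample {
          }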

        Amareshwari Sriramadasu added a comment -

        There is a javadoc warning with the patch. Can you fix it?

          [javadoc] /home/amarsri/workspace/mapreduce/src/tools/org/apache/hadoop/tools/rumen/TaskAttemptInfo.java:45: warning - Tag @link: reference not found: TaskStatus.State
        
        Amar Kamat added a comment -

        Attaching a new patch incorporating Hong's comments.

        Hong Tang added a comment -

        A few minor nits:

        • "Incase" => "in case"
        • For TraceBuilder, does it descend recursively into the input folder, or do we need to specify the immediate parent directory that contains the files?
        • Can we add a bit more detail on "demuxer"? How about the following? (A rough sketch of a custom demuxer also appears after this list.)

          Demuxer decides how the input file maps to jobhistory file(s). [insert]Job history logs and job conf files are typically small files, and can be more effectively stored if we embed them in some container file format like SequenceFile or TFile. To support such usage cases, one can specify a customized Demuxer class that can extract individual job history logs and job conf files from source files. [/insert]

        • There is no need to do canParse() check if you know which parser to use (hence no need to use ris). The parser will (or should) simply abort if the source is not of the expected version.
        • VersionDetector seems rather internal, getParser() is probably what users should care about.
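        (A rough sketch of the custom-demuxer idea, for illustration only: the Demuxer interface below is an assumption made for this example, not Rumen's actual API, and the real class names and signatures should be taken from the Rumen source. It shows how individual job history logs and job conf files could be pulled out of a SequenceFile container whose records are (Text fileName, BytesWritable fileContents) pairs.)

          import java.io.ByteArrayInputStream;
          import java.io.IOException;
          import java.io.InputStream;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.BytesWritable;
          import org.apache.hadoop.io.SequenceFile;
          import org.apache.hadoop.io.Text;

          /** Assumed contract: bind to a source file, then iterate (name, stream) pairs. */
          interface Demuxer {
            void bindTo(Path path, Configuration conf) throws IOException;
            NamedStream getNext() throws IOException; // null when the source is exhausted
            void close() throws IOException;
          }

          /** Simple value holder for a logical file name and its content stream. */
          class NamedStream {
            final String name;
            final InputStream in;
            NamedStream(String name, InputStream in) { this.name = name; this.in = in; }
          }

          /** Demuxes a SequenceFile whose records embed job history/conf files. */
          class SequenceFileDemuxer implements Demuxer {
            private SequenceFile.Reader reader;

            public void bindTo(Path path, Configuration conf) throws IOException {
              reader = new SequenceFile.Reader(path.getFileSystem(conf), path, conf);
            }

            public NamedStream getNext() throws IOException {
              Text name = new Text();
              BytesWritable body = new BytesWritable();
              if (reader != null && reader.next(name, body)) {
                return new NamedStream(name.toString(),
                    new ByteArrayInputStream(body.getBytes(), 0, body.getLength()));
              }
              return null;
            }

            public void close() throws IOException {
              if (reader != null) {
                reader.close();
              }
            }
          }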
        Amar Kamat added a comment -

        Attaching a patch that adds user and API documentation to Rumen. test-patch passed.

        Ranjit Mathew added a comment -

        I would suggest keeping the API information in the package-level JavaDoc documentation and the user-guide information in the document being worked on in this ticket.
        A user looking to run Rumen, for example to feed its output to GridMix3, would look at the Forrest documentation, while a developer looking to integrate directly or indirectly with Rumen would look at the JavaDoc documentation. We should definitely not mirror the information in both places, as that would add to the maintenance burden and lead to stale documentation.

        Hong Tang added a comment -

        I think we should also describe (1) that the JSON objects are created through Jackson's ObjectMapper from the LoggedXXX classes, and (2) the API for building LoggedXXX objects and how to read them back.

        The basic API flow for creating parsed Rumen objects is as follows (it is the user's responsibility to create input streams from the job conf XML and the job history logs):

        • JobConfigurationParser: parser that parses the job conf XML. One instance can be reused to parse many job conf XML files.
          	JobConfigurationParser jcp = new JobConfigurationParser(interestedProperties); // interestedProperties is a list of keys to be extracted from the job conf XML file.
          	Properties parsedProperties = jcp.parse(inputStream); // inputStream is the file input stream for the job conf XML file.
          
        • JobHistoryParser: parser that parses job history files. It is an interface, and the actual implementations are defined as enums in JobHistoryParserFactory. One can directly use the implementation matching the version of the job history logs, or use the "canParse()" method to detect which parser is suitable (following the pattern in TraceBuilder). Create one instance to parse a job history log and close it after use.
          	JobHistoryParser parser = new Hadoop20JHParser(inputStream); // inputStream is the file input stream for the job history file.
          	// JobHistoryParser APIs will be used later when being fed into JobBuilder (below).
          	parser.close();
          
        • JobBuilder: builder for LoggedJobs. Create one instance per job to process the paired job history log and job conf. The order in which the conf file and the job history file are processed does not matter.
          	JobBuilder jb = new JobBuilder(jobID); // extract the job ID from the file name: <jobtracker>_job_<timestamp>_<sequence>
          	jb.process(jcp.parse(jobConfInputStream));
          	JobHistoryParser parser = new Hadoop20JHParser(jobHistoryInputStream);
          	try {
          		HistoryEvent e;
          		while ((e = parser.nextEvent()) != null) {
          			jb.process(e); // feed events to the same JobBuilder instance
          		}
          	} finally {
          		parser.close();
          	}
          	LoggedJob job = jb.build();

        On the reading side, the output produced by TraceBuilder or Folder can be read through JobTraceReader or ClusterTopologyReader. One can also use Jackson's ObjectMapper to parse the JSON-formatted data into LoggedJob or LoggedTopology objects.
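        (A minimal reading-side sketch follows. Hedged assumptions: JobTraceReader takes a (Path, Configuration) constructor and exposes a getNext() method that returns LoggedJob instances and null at end of trace, and it can be closed after use; check the class itself for the exact signatures.)

          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.tools.rumen.JobTraceReader;
          import org.apache.hadoop.tools.rumen.LoggedJob;

          public class TraceReadExample {
            public static void main(String[] args) throws IOException {
              Configuration conf = new Configuration();
              // args[0] is the JSON trace produced by TraceBuilder (or Folder).
              JobTraceReader reader = new JobTraceReader(new Path(args[0]), conf);
              try {
                int count = 0;
                while (reader.getNext() != null) { // each call yields one LoggedJob, null at EOF (assumed)
                  count++;
                }
                System.out.println("Jobs in trace: " + count);
              } finally {
                reader.close();
              }
            }
          }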

        Amar Kamat added a comment -

        Attaching a patch for the same. test-patch passed on my box.

        Amar Kamat added a comment -

        Attaching a modified document incorporating changes from Dick.

        Amar Kamat added a comment -

        Attaching a patch for review. test-patch passed on my box.


People

    • Assignee: Amar Kamat
    • Reporter: Amar Kamat
    • Votes: 0
    • Watchers: 3
