Hadoop Map/Reduce
MAPREDUCE-6415

Create a tool to combine aggregated logs into HAR files

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated-logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the "too many files" and "too many blocks" problems. See the design document for details.

      See YARN-2942 for more context.

      1. HAR-ableAggregatedLogs_v1.pdf
        111 kB
        Robert Kanter
      2. MAPREDUCE-6415_branch-2_prelim_001.patch
        25 kB
        Robert Kanter
      3. MAPREDUCE-6415_branch-2_prelim_002.patch
        32 kB
        Robert Kanter
      4. MAPREDUCE-6415_branch-2.001.patch
        48 kB
        Robert Kanter
      5. MAPREDUCE-6415_branch-2.002.patch
        50 kB
        Robert Kanter
      6. MAPREDUCE-6415_branch-2.003.patch
        50 kB
        Robert Kanter
      7. MAPREDUCE-6415_prelim_001.patch
        25 kB
        Robert Kanter
      8. MAPREDUCE-6415_prelim_002.patch
        32 kB
        Robert Kanter
      9. MAPREDUCE-6415.001.patch
        48 kB
        Robert Kanter
      10. MAPREDUCE-6415.002.patch
        50 kB
        Robert Kanter
      11. MAPREDUCE-6415.002.patch
        50 kB
        Robert Kanter
      12. MAPREDUCE-6415.003.patch
        49 kB
        Robert Kanter

        Issue Links

          Activity

          brahmareddy Brahma Reddy Battula added a comment -

          Robert Kanter, I went through the attached proposal; we are planning to implement the same thing as part of MAPREDUCE-6283. Can you please have a look at that JIRA and let me know if you have any suggestions?

          varun_saxena Varun Saxena added a comment -

          Robert Kanter, that's correct. We actually have a private implementation which combines aggregated files into HAR files. It runs as a service in the JHS and combines aggregated logs periodically. From Vinod Kumar Vavilapalli's comment it seemed that the community did not need it (because of YARN-2942), hence we did not push it.

          rkanter Robert Kanter added a comment -

          Yahoo! also has their own private implementation. So, it seems like there's a need for something like this, and it would be great if everyone could use and contribute to the same version of it.

          YARN-2942 is being put on hold for the moment because of concerns about HDFS-3689. We can still do it eventually.

          As for MAPREDUCE-6283, I agree with Vinod's comment there that it seems to be a duplicate of YARN-2942 for the logs part, and a duplicate of the ATSv2 work for the jhist part.

          rkanter Robert Kanter added a comment -

          I've uploaded a preliminary patch. It adds a command that looks for eligible apps to process, generates a script that runs the 'hadoop archive' command, and runs the script via the DistributedShell. It also modifies the 'yarn logs' command and the JHS to be able to read the HAR files, all as described in the design document.

          I still have to write some unit tests and split up the patch into MAPREDUCE and YARN (and HADOOP?) JIRAs.

          We can also discuss if we have the right criteria for eligibility. I implemented the ones mentioned in the design document, but it shouldn't be too hard to change them.

          Here's the CLI usage:

          >> bin/mapred archive-logs -help
          usage: yarn archive-logs
           -help                       Prints this message
           -maxEligibleApps <n>        The maximum number of eligible apps to
                                       process (default: -1 (all))
           -maxTotalLogsSize <bytes>   The maximum total logs size required to be
                                       eligible (default: 1GB)
           -memory <megabytes>         The amount of memory for each container
                                       (default: 1024)
           -minNumberLogFiles <n>      The minimum number of log files required to
                                       be eligible (default: 20)
          

          I know it's a bit hard to tell from the Java code what the shell script looks like, so here's an example of one:

          #!/bin/bash
          set -e
          set -x
          CONTAINER_ID_NUM=`echo $CONTAINER_ID | cut -d "_" -f 5`
          if [ "$CONTAINER_ID_NUM" == "000002" ]; then
                  appId="application_1437514991365_0004"
                  user="rkanter"
          elif [ "$CONTAINER_ID_NUM" == "000003" ]; then
                  appId="application_1437514991365_0005"
                  user="rkanter"
          elif [ "$CONTAINER_ID_NUM" == "000004" ]; then
                  appId="application_1437514991365_0003"
                  user="rkanter"
          elif [ "$CONTAINER_ID_NUM" == "000005" ]; then
                  appId="application_1437514991365_0007"
                  user="rkanter"
          elif [ "$CONTAINER_ID_NUM" == "000006" ]; then
                  appId="application_1437514991365_0006"
                  user="rkanter"
          else
                  echo "Unknown Mapping!"
                  exit -1
          fi
          export HADOOP_CLIENT_OPTS="-Xmx1024m"
          $HADOOP_HOME/bin/hadoop archive -Dmapreduce.framework.name=local -archiveName $appId.har -p /tmp/logs/$user/logs/$appId \* /tmp/logs/archive-logs-work
          $HADOOP_HOME/bin/hadoop fs -mv /tmp/logs/archive-logs-work/$appId.har /tmp/logs/$user/logs/$appId/$appId.har
          originalLogs=`$HADOOP_HOME/bin/hadoop fs -ls /tmp/logs/$user/logs/$appId | grep "^-" | awk '{print $8}'`
          if [ ! -z "$originalLogs" ]; then
                  $HADOOP_HOME/bin/hadoop fs -rm $originalLogs
          fi
          
          jlowe Jason Lowe added a comment -

          Note that container IDs are not guaranteed to be consecutive nor are they guaranteed to start at 1 for the AM. Due to how reservations are processed and other race conditions, a container ID may not actually correspond to a physically launched container. For example, on our busy clusters it is not rare for the AM container to have an ID greater than 000001. So the danger here is that if the RM ends up skipping one or more container IDs when handing out containers to the application then we will skip one or more applications to aggregate. We'll get another crack at it on the next pass, but again on a busy cluster we could fairly consistently fail to hit a number of them and we could have indefinite postponement on the aggregation of some applications (especially the first few in the list).

          A more robust approach would be to have the distributed shell explicitly set something in the container's environment that is a sequence number from the distributed shell's point of view. In other words, regardless of what container ID is allocated, the distributed shell can set a monotonically increasing number in each new container's env that the script can leverage to do instance-specific behavior. This is akin to the task ID in MapReduce which again is disconnected from YARN's container ID.

          rkanter Robert Kanter added a comment -

          I didn't realize that could happen. In that case, having a monotonically increasing number in each container's env, independent of the CONTAINER_ID, sounds like a good solution. Plus, I won't have to do any parsing to get the unique number.

          I'll double check, but I think each shell has the same env (other than the CONTAINER_ID), and there's no way to set different ones per shell. If that's the case, it should be fairly easy to add a "SHELL_ID" env var to the DistributedShell AM that behaves how we want, as a separate JIRA.

          rkanter Robert Kanter added a comment -

          I've created YARN-3950 to add the SHELL_ID and put up a patch there.

          aw Allen Wittenauer added a comment -

          Maybe I'm missing it, but why is this being written in bash instead of as an actual yarn application? The JVM startup costs are going to be massive. Also, is there something that is guaranteeing that HADOOP_HOME is set?

          aw Allen Wittenauer added a comment -

          Here's what shellcheck had to say about the generated bash:

          In /tmp/1 line 4:
          CONTAINER_ID_NUM=`echo $CONTAINER_ID | cut -d "_" -f 5`
                           ^-- SC2006: Use $(..) instead of legacy `..`.
                                 ^-- SC2086: Double quote to prevent globbing and word splitting.
          
          
          In /tmp/1 line 25:
          $HADOOP_HOME/bin/hadoop archive -Dmapreduce.framework.name=local -archiveName $appId.har -p /tmp/logs/$user/logs/$appId \* /tmp/logs/archive-logs-work
          ^-- SC2086: Double quote to prevent globbing and word splitting.
          
          
          In /tmp/1 line 26:
          $HADOOP_HOME/bin/hadoop fs -mv /tmp/logs/archive-logs-work/$appId.har /tmp/logs/$user/logs/$appId/$appId.har
          ^-- SC2086: Double quote to prevent globbing and word splitting.
          
          
          In /tmp/1 line 27:
          originalLogs=`$HADOOP_HOME/bin/hadoop fs -ls /tmp/logs/$user/logs/$appId | grep "^-" | awk '{print $8}'`
                       ^-- SC2006: Use $(..) instead of legacy `..`.
                        ^-- SC2086: Double quote to prevent globbing and word splitting.
          
          
          In /tmp/1 line 29:
                  $HADOOP_HOME/bin/hadoop fs -rm $originalLogs
                  ^-- SC2086: Double quote to prevent globbing and word splitting.
                                                 ^-- SC2086: Double quote to prevent globbing and word splitting.
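
          Applied to the generated script, these fixes amount to switching backticks to $(...) and quoting every expansion. The snippet below is a runnable illustration (the CONTAINER_ID value is a placeholder, not taken from a real container) of the corrected form, plus a small demonstration of why SC2086 matters:

```shell
#!/bin/bash
# Placeholder value standing in for the real $CONTAINER_ID env var.
CONTAINER_ID="container_1437514991365_0004_01_000002"

# SC2006 fix: $(...) instead of legacy backticks.
# SC2086 fix: quote the expansion so it cannot glob or word-split.
CONTAINER_ID_NUM=$(echo "$CONTAINER_ID" | cut -d "_" -f 5)
echo "$CONTAINER_ID_NUM"

# Why quoting matters: an unquoted expansion splits on whitespace.
value="two words"
set -- $value        # unquoted: splits into two arguments
unquoted=$#
set -- "$value"      # quoted: stays a single argument
quoted=$#
echo "$unquoted $quoted"
```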
          
          rkanter Robert Kanter added a comment -

          Maybe I'm missing it, but why is this being written in bash instead of as an actual yarn application? The JVM startup costs are going to be massive.

          The 'hadoop archive' command starts up a JVM. I don't see how we can get around that unless we call it programmatically from an existing JVM and also do it serially, which is going to take a lot longer overall.
          I figured it would be simpler to use the DistributedShell because it already exists and does most of what we need, than to write a whole new AM that creates containers to run 'hadoop archive'.

          Also, is there something that is guaranteeing that HADOOP_HOME is set?

          The shell inherits the env of the NodeManager as a base. HADOOP_HOME should be defined for the NM, so it ends up in the env of the shell.

          I wasn't aware of shellcheck before, but that looks like a really useful tool. I'll fix those.

          aw Allen Wittenauer added a comment (edited) -

          The shell inherits the env of the NodeManager as a base. HADOOP_HOME should be defined for the NM, so it ends up in env of the shell.

          a) This is only true for Windows. Unix has been using HADOOP_PREFIX since 0.21. If it's being defined, it's not by the bash code that starts the NM that ships with Apache Hadoop.

          b) I'm unsure if LCE actually inherits all of the shell environment or only specific variables.

          The 'hadoop archive' command starts up a JVM. I don't see how we can get around that unless we call it programmatically from an existing JVM and also do it serially, which is going to take a lot longer overall.

          There are several hadoop commands in the generated shell code. That's many, many JVM startup costs. Granted, there has been a lot of work in trunk to minimize those costs (classpath dedupe, etc.), but it's still very expensive.

          rkanter Robert Kanter added a comment -

          In that case, I suppose I could write a Java program that calls the 'hadoop archive' command programmatically, and then performs the equivalent 'hadoop fs' operations with the Java API. This would only require the one JVM startup.

          aw Allen Wittenauer added a comment -

          I forgot that HADOOP_HOME got exported in trunk for all platforms as part of HADOOP-11464.

          rkanter Robert Kanter added a comment -

          The prelim_002 patch:

          • Uses YARN_SHELL_ID from YARN-3950 instead of parsing CONTAINER_ID
          • Runs 'hadoop archive' and the FileSystem commands from a Java program, so we can limit the JVM startup cost
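
          With YARN_SHELL_ID available, the generated script's dispatch can key off that dense 1..N sequence instead of parsing CONTAINER_ID. A rough sketch of the new shape (application IDs reused from the earlier example; the default assignment is just for illustration outside a container):

```shell
#!/bin/bash
set -e
# YARN_SHELL_ID is a monotonically increasing sequence number set by the
# DistributedShell AM (YARN-3950); no CONTAINER_ID parsing needed.
YARN_SHELL_ID="${YARN_SHELL_ID:-1}"
case "$YARN_SHELL_ID" in
  1) appId="application_1437514991365_0004"; user="rkanter" ;;
  2) appId="application_1437514991365_0005"; user="rkanter" ;;
  *) echo "Unknown Mapping!"; exit 1 ;;
esac
echo "$appId $user"
```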
          asuresh Arun Suresh added a comment -

          Robert Kanter, The patch looks good to me. You might want to clean up the TODOs and add some javaDocs though.
          +1 pending that.

          rkanter Robert Kanter added a comment -

          Thanks for the review Arun Suresh. This is just the preliminary patch. I still have to write unit tests, javadocs, and split out the yarn changes into a YARN JIRA. But it sounds like you're good with the approach.

          Allen Wittenauer, any other comments?
          How about you Jason Lowe?

          kasha Karthik Kambatla added a comment -

          Skimmed through the patch. Looks generally good. Can do a more thorough review on the non-prelim patch(es).

          Maybe we should avoid logging to System.out and System.err and use the LOG instead? It is possible users invoke this through other programs in a non-interactive mode.

          rkanter Robert Kanter added a comment -

          Ok, I'll change the logging, start adding unit tests, and clean up some things.

          rkanter Robert Kanter added a comment -

          MAPREDUCE-6415.001.patch and MAPREDUCE-6415_branch-2.001.patch contain the MapReduce changes, though most of it's actually under hadoop-tools. This includes all of the code to find and process the aggregated log files into HAR files. It's mostly the same as the prelim patch, with some minor changes and unit tests. I've uploaded the YARN changes to YARN-4086. The patches for this and YARN-4086 can be applied independently.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 1s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12752559/MAPREDUCE-6415_branch-2.001.patch
          Optional Tests javadoc javac unit shellcheck findbugs checkstyle
          git revision trunk / a4d9acc
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5955/console

          This message was automatically generated.

          rkanter Robert Kanter added a comment -

          Jason Lowe, can you take a look at this?

          jlowe Jason Lowe added a comment -

          Have been very busy, but I will try to get a look at this next week.

          jlowe Jason Lowe added a comment -

          mvn dependency:analyze says there's a number of things that should be cleaned up in the new pom:

          [INFO] --- maven-dependency-plugin:2.2:analyze (default-cli) @ hadoop-archive-logs ---
          [WARNING] Used undeclared dependencies found:
          [WARNING]    org.apache.hadoop:hadoop-yarn-common:jar:2.8.0-SNAPSHOT:provided
          [WARNING]    com.google.guava:guava:jar:11.0.2:provided
          [WARNING]    commons-io:commons-io:jar:2.4:compile
          [WARNING]    commons-logging:commons-logging:jar:1.1.3:provided
          [WARNING]    org.apache.hadoop:hadoop-yarn-client:jar:2.8.0-SNAPSHOT:provided
          [WARNING]    org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:2.8.0-SNAPSHOT:test
          [WARNING]    org.apache.hadoop:hadoop-yarn-api:jar:2.8.0-SNAPSHOT:provided
          [WARNING]    commons-cli:commons-cli:jar:1.2:provided
          [WARNING] Unused declared dependencies found:
          [WARNING]    org.apache.hadoop:hadoop-annotations:jar:2.8.0-SNAPSHOT:provided
          [WARNING]    org.apache.hadoop:hadoop-mapreduce-client-hs:jar:2.8.0-SNAPSHOT:test
          [WARNING]    org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.8.0-SNAPSHOT:provided
          [WARNING]    org.apache.hadoop:hadoop-mapreduce-client-jobclient:test-jar:tests:2.8.0-SNAPSHOT:test
          [WARNING]    org.apache.hadoop:hadoop-hdfs:jar:2.8.0-SNAPSHOT:provided
          [WARNING]    org.apache.hadoop:hadoop-common:test-jar:tests:2.8.0-SNAPSHOT:test
          

          It would be nice if the usage output used the actual values in the code rather than hardcoded strings. For example, we now have to keep minNumLogFiles and the usage string manually in sync. If the usage output leveraged the minNumLogFiles value directly then updating it would automatically correct the usage message. On a related note the usage currently mentions values like "1GB", but I don't believe the code supports memory units.
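
          The single-source-of-truth idea can be sketched in a few lines (variable and option names here are hypothetical; the real tool builds its usage string in Java): the default lives in one place and the usage text interpolates it, so changing a default can never desynchronize the help output.

```shell
#!/bin/bash
# Hypothetical constants mirroring the tool's defaults.
MIN_NUM_LOG_FILES=20
MAX_TOTAL_LOGS_SIZE_MB=1024

usage() {
  # Interpolate the actual values instead of hardcoding them in the text.
  printf ' -maxTotalLogsSize <mb>   maximum total logs size to be eligible (default: %d)\n' \
    "$MAX_TOTAL_LOGS_SIZE_MB"
  printf ' -minNumberLogFiles <n>   minimum number of log files to be eligible (default: %d)\n' \
    "$MIN_NUM_LOG_FILES"
}
usage
```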

          Do we only want to consider aggregating logs that have totally succeeded? What about the FAILED case or other terminal states? Seems like any terminal state where we know there aren't going to be any more logs arriving should be eligible.

          Nit: it's wasteful for checkFiles to continue iterating the files once it finds an excluding condition. We can also eliminate the need to track file counts explicitly and simply check files.length directly before we even start looping.

          Is there a reason to support maxEligible being zero? Wondering if that should be equivalent to a negative value and just cover everything.

          Should the working directory contain something unique like the application ID in it somewhere? This has the benefit of making it easier to cleanup after a run and not worry about affecting other, possibly simultaneous runs.

          rkanter Robert Kanter added a comment -

          Thanks for the review Jason Lowe!

          The 002 patch addresses most of the issues Jason brought up:

          • fixes dependencies, though I had to keep some of the ones that maven didn't think it needed
          • fixes usage output to use variables for the defaults. I also changed the units for the max total logs size to megabytes instead of bytes to be easier to use.
          • now SUCCEEDED and FAILED log aggregation statuses are considered.
          • improves checkFiles to be more efficient
          • if maxEligible is 0, it will now print out a message and exit right away. I think having 0 be equivalent to all might be confusing? I'm fine either way; let me know if you think it's better to treat it as equivalent to a negative value.
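The checkFiles improvement mentioned above — checking the file count before looping and short-circuiting on the first excluding condition — might look roughly like this. The method signature is simplified (plain sizes instead of FileStatus objects) and the names are hypothetical:

```java
// Simplified sketch of an early-exit eligibility check: verify the file
// count up front via the array length, then stop iterating as soon as an
// excluding condition (total size too large) is found.
public class CheckFilesSketch {
  static boolean eligible(long[] fileSizes, int minNumLogFiles,
      long maxTotalLogsSizeBytes) {
    // Cheap count check before looping at all.
    if (fileSizes.length < minNumLogFiles) {
      return false;
    }
    long totalSize = 0;
    for (long size : fileSizes) {
      totalSize += size;
      if (totalSize > maxTotalLogsSizeBytes) {
        return false;  // short-circuit: remaining files need not be examined
      }
    }
    return true;
  }
}
```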

          I don't think we should add a unique ID to the working directory. The tool won't work correctly with simultaneous runs anyway because it doesn't acquire any sort of "lock" that would stop another instance from trying to process the same application's logs. As it is now, by using a non-unique directory, anything left over will get cleaned up when you run the tool again (presumably, you're running it at some interval).

          On that last point, it would be good if we could prevent two instances of the tool from running at the same time. I think the best way to do this (without using a lock) is for the tool to check for a RUNNING job named "ArchiveLogs" in the RM, though this won't protect against all situations and will report a false positive if the user has another job named "ArchiveLogs".
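A rough sketch of the RM check described here, using the YarnClient API. The guard class and method names are hypothetical, and this is only a heuristic, not a real lock — it races with an instance starting at the same moment and false-positives on unrelated jobs using the same name:

```java
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ArchiveLogsGuard {

  // Returns true if a RUNNING application named "ArchiveLogs" is found in
  // the RM. Heuristic only: not a lock, and prone to false positives if a
  // user runs an unrelated job with the same name.
  static boolean anotherInstanceAppearsRunning(Configuration conf)
      throws IOException, YarnException {
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      for (ApplicationReport app :
          client.getApplications(EnumSet.of(YarnApplicationState.RUNNING))) {
        if ("ArchiveLogs".equals(app.getName())) {
          return true;
        }
      }
      return false;
    } finally {
      client.stop();
    }
  }
}
```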

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12753723/MAPREDUCE-6415_branch-2.002.patch
          Optional Tests javadoc javac unit shellcheck findbugs checkstyle
          git revision trunk / 095ab9a
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5965/console

          This message was automatically generated.

          rkanter Robert Kanter added a comment -

          Reuploading trunk patch; Jenkins tried to run the branch-2 patch.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 15m 49s Findbugs (version 3.0.0) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 8m 2s There were no new javac warning messages.
          +1 javadoc 10m 23s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 22s There were no new checkstyle issues.
          +1 shellcheck 0m 6s There were no new shellcheck (v0.3.3) issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 29s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          -1 findbugs 0m 13s Post-patch findbugs hadoop-assemblies compilation is broken.
          -1 findbugs 0m 27s Post-patch findbugs hadoop-tools/hadoop-tools-dist compilation is broken.
          +1 findbugs 0m 27s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 assemblies tests 0m 11s Tests passed in hadoop-assemblies.
          +1 tools/hadoop tests 0m 13s Tests passed in hadoop-tools-dist.
              38m 0s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12753801/MAPREDUCE-6415.002.patch
          Optional Tests javadoc javac unit shellcheck findbugs checkstyle
          git revision trunk / 7d6687f
          hadoop-assemblies test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5966/artifact/patchprocess/testrun_hadoop-assemblies.txt
          hadoop-tools-dist test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5966/artifact/patchprocess/testrun_hadoop-tools-dist.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5966/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5966/console

          This message was automatically generated.

          asuresh Arun Suresh added a comment -

          The latest patch looks good,
          +1, as long as Jason Lowe / Karthik Kambatla have no other issues.

          Thanks Robert Kanter

          kasha Karthik Kambatla added a comment -

          The patch looks mostly good to me, but for the following nits:

          1. HadoopArchiveLogs constructor doesn't need type on HashSet in Java 7
          2. HadoopArchiveLogs#run returns -1. Could we return a positive value, say 1, instead?
          3. HadoopArchiveLogs#checkFiles has an unused variable

          Once the nits are fixed, I think we should get this in. Let us work on avoiding concurrent runs and any other bugs we find in a follow-up JIRA?

          rkanter Robert Kanter added a comment -

          The 003 patch addresses the issues Karthik pointed out. I agree that we can follow up with those other things in new JIRAs.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 15m 34s Findbugs (version 3.0.0) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 44s There were no new javac warning messages.
          +1 javadoc 10m 6s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 38s There were no new checkstyle issues.
          +1 shellcheck 0m 5s There were no new shellcheck (v0.3.3) issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 30s mvn install still works.
          +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse.
          -1 findbugs 0m 12s Post-patch findbugs hadoop-assemblies compilation is broken.
          -1 findbugs 0m 24s Post-patch findbugs hadoop-tools/hadoop-tools-dist compilation is broken.
          +1 findbugs 0m 24s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 assemblies tests 0m 10s Tests passed in hadoop-assemblies.
          +1 tools/hadoop tests 0m 13s Tests passed in hadoop-tools-dist.
              37m 25s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12754769/MAPREDUCE-6415.003.patch
          Optional Tests javadoc javac unit shellcheck findbugs checkstyle
          git revision trunk / d9c1fab
          hadoop-assemblies test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5974/artifact/patchprocess/testrun_hadoop-assemblies.txt
          hadoop-tools-dist test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5974/artifact/patchprocess/testrun_hadoop-tools-dist.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5974/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5974/console

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          +1 latest patch looks good to me. Will commit this later today if there are no objections.

          kasha Karthik Kambatla added a comment -

          LGTM too. Checked with Robert on the findbugs, looks like it was broken before the patch as well.

          +1

          kasha Karthik Kambatla added a comment -

          Checking this in. (Jason, sorry for jumping the gun here.)

          kasha Karthik Kambatla added a comment -

          Committed to trunk and branch-2. Thanks Robert Kanter for this handy tool, and Jason Lowe for your reviews.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8424 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8424/)
          MAPREDUCE-6415. Create a tool to combine aggregated logs into HAR files. (Robert Kanter via kasha) (kasha: rev 119cc75e7ebd723790f6326498383304aba384a2)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogs.java
          • hadoop-mapreduce-project/bin/mapred
          • MAPREDUCE-6415.003.patch
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
          • hadoop-tools/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogsRunner.java
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • hadoop-tools/hadoop-tools-dist/pom.xml
          • hadoop-project/pom.xml
          • hadoop-tools/hadoop-archive-logs/pom.xml
          rkanter Robert Kanter added a comment -

          Thanks everyone! I'm glad we finally have a workable solution to this issue checked in now.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #364 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/364/)
          MAPREDUCE-6415. Create a tool to combine aggregated logs into HAR files. (Robert Kanter via kasha) (kasha: rev 119cc75e7ebd723790f6326498383304aba384a2)

          • hadoop-tools/hadoop-tools-dist/pom.xml
          • hadoop-tools/hadoop-archive-logs/pom.xml
          • hadoop-project/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
          • hadoop-mapreduce-project/CHANGES.txt
          • MAPREDUCE-6415.003.patch
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogs.java
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • hadoop-tools/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogsRunner.java
          • hadoop-mapreduce-project/bin/mapred
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogsRunner.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8426 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8426/)
          removing accidental file in MAPREDUCE-6415 (rkanter: rev f15371062f1bbcbb79bf44fd67ec647020d56c69)

          • MAPREDUCE-6415.003.patch
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1102 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1102/)
          MAPREDUCE-6415. Create a tool to combine aggregated logs into HAR files. (Robert Kanter via kasha) (kasha: rev 119cc75e7ebd723790f6326498383304aba384a2)

          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogs.java
          • MAPREDUCE-6415.003.patch
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • hadoop-mapreduce-project/bin/mapred
          • hadoop-tools/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-archive-logs/pom.xml
          • hadoop-project/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
          • hadoop-tools/hadoop-tools-dist/pom.xml
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #371 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/371/)
          MAPREDUCE-6415. Create a tool to combine aggregated logs into HAR files. (Robert Kanter via kasha) (kasha: rev 119cc75e7ebd723790f6326498383304aba384a2)

          • hadoop-tools/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-project/pom.xml
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • MAPREDUCE-6415.003.patch
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogs.java
          • hadoop-tools/hadoop-tools-dist/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogsRunner.java
          • hadoop-mapreduce-project/bin/mapred
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-archive-logs/pom.xml
            removing accidental file in MAPREDUCE-6415 (rkanter: rev f15371062f1bbcbb79bf44fd67ec647020d56c69)
          • MAPREDUCE-6415.003.patch
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #1103 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1103/)
          removing accidental file in MAPREDUCE-6415 (rkanter: rev f15371062f1bbcbb79bf44fd67ec647020d56c69)

          • MAPREDUCE-6415.003.patch
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2313 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2313/)
          MAPREDUCE-6415. Create a tool to combine aggregated logs into HAR files. (Robert Kanter via kasha) (kasha: rev 119cc75e7ebd723790f6326498383304aba384a2)

          • hadoop-tools/hadoop-tools-dist/pom.xml
          • hadoop-tools/hadoop-archive-logs/pom.xml
          • hadoop-project/pom.xml
          • MAPREDUCE-6415.003.patch
          • hadoop-tools/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
          • hadoop-mapreduce-project/bin/mapred
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogs.java
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogsRunner.java
            removing accidental file in MAPREDUCE-6415 (rkanter: rev f15371062f1bbcbb79bf44fd67ec647020d56c69)
          • MAPREDUCE-6415.003.patch
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #365 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/365/)
          removing accidental file in MAPREDUCE-6415 (rkanter: rev f15371062f1bbcbb79bf44fd67ec647020d56c69)

          • MAPREDUCE-6415.003.patch
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2290 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2290/)
          MAPREDUCE-6415. Create a tool to combine aggregated logs into HAR files. (Robert Kanter via kasha) (kasha: rev 119cc75e7ebd723790f6326498383304aba384a2)

          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-tools-dist/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-archive-logs/pom.xml
          • hadoop-mapreduce-project/bin/mapred
          • MAPREDUCE-6415.003.patch
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
          • hadoop-tools/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogs.java
          • hadoop-project/pom.xml
          • hadoop-mapreduce-project/CHANGES.txt
            removing accidental file in MAPREDUCE-6415 (rkanter: rev f15371062f1bbcbb79bf44fd67ec647020d56c69)
          • MAPREDUCE-6415.003.patch
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #351 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/351/)
          MAPREDUCE-6415. Create a tool to combine aggregated logs into HAR files. (Robert Kanter via kasha) (kasha: rev 119cc75e7ebd723790f6326498383304aba384a2)

          • hadoop-tools/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogsRunner.java
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogsRunner.java
          • hadoop-mapreduce-project/bin/mapred
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-project/pom.xml
          • hadoop-tools/hadoop-archive-logs/src/test/java/org/apache/hadoop/tools/TestHadoopArchiveLogs.java
          • hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • hadoop-tools/hadoop-tools-dist/pom.xml
          • MAPREDUCE-6415.003.patch
          • hadoop-tools/hadoop-archive-logs/pom.xml
            removing accidental file in MAPREDUCE-6415 (rkanter: rev f15371062f1bbcbb79bf44fd67ec647020d56c69)
          • MAPREDUCE-6415.003.patch

            People

            • Assignee:
              rkanter Robert Kanter
            • Reporter:
              rkanter Robert Kanter
            • Votes: 1
            • Watchers: 16