Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: contrib/raid, test
    • Labels:
      None

      Description

      After MAPREDUCE-3868 re-enabled raid, TestRaidNode has been failing in Jenkins builds.

          Activity

          weiyan Weiyan Wang added a comment -

          I did run the test on my local machine before submitting the patch. I don't know why it fails after being committed. I will look at it as soon as possible.

          weiyan Weiyan Wang added a comment -

          After hours of debugging, I found that the failure occurs because JobMonitor gets the following exception when it checks whether one of the jobs is complete:
          2012-06-23 19:52:18,196 ERROR [org.apache.hadoop.raid.JobMonitor@3515f1d3] raid.JobMonitor (JobMonitor.java:doMonitor(116)) - JobMonitor exception
          java.io.IOException: Job status not available
          at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:315)
          at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:592)
          at org.apache.hadoop.raid.DistRaid.checkComplete(DistRaid.java:282)
          at org.apache.hadoop.raid.JobMonitor.doMonitor(JobMonitor.java:106)
          at org.apache.hadoop.raid.JobMonitor.run(JobMonitor.java:61)
          at java.lang.Thread.run(Thread.java:619)

          But I don't know what causes it. I suspect a MapReduce bug, because the exception occurs only intermittently: sometimes I get it and sometimes I don't. Does anyone have an idea? Thanks!

          revans2 Robert Joseph Evans added a comment -

          It looks like there is no history server up and running. In YARN there is a race in the client. If the client asks for status while the AM is still up and running, it talks to the AM. If the AM has already exited, which it tends to do once the MR job has completed, the client falls over to the history server. It looks like when you run against the minicluster there is no corresponding history server to fulfill the request.
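          The fallback described above can be sketched as a small, self-contained Java model. The class JobStatusClient and its constructor parameters are hypothetical illustrations, not the actual org.apache.hadoop.mapreduce.Job or YARN client code; a null history-server lookup stands in for a minicluster that has no history server.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical model of the status lookup described above. These are NOT the
// real Hadoop client classes; they only illustrate the AM -> history server fallback.
final class JobStatusClient {
    private final BooleanSupplier amRunning;      // whether the ApplicationMaster is still up
    private final Supplier<String> amStatus;      // status as reported by the AM
    private final Supplier<String> historyServer; // null models "no history server running"

    JobStatusClient(BooleanSupplier amRunning, Supplier<String> amStatus,
                    Supplier<String> historyServer) {
        this.amRunning = amRunning;
        this.amStatus = amStatus;
        this.historyServer = historyServer;
    }

    String getStatus() throws IOException {
        // Race: which branch is taken depends on whether the AM has already exited
        // at the moment the client asks, so the outcome can vary from run to run.
        if (amRunning.getAsBoolean()) {
            return amStatus.get();
        }
        // AM is gone (typically because the job finished): fall over to the history server.
        if (historyServer == null) {
            // No history server available, so the status lookup fails.
            throw new IOException("Job status not available");
        }
        return historyServer.get();
    }
}
```

          In this model, a client built with amRunning returning false and no history-server lookup throws the same "Job status not available" message seen in the JobMonitor log, while supplying a history-server lookup lets the late query succeed. That mirrors why the failure is intermittent: it only shows up when the status check lands after the AM has exited.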

          weiyan Weiyan Wang added a comment -

          Do you mean I should use MiniMRYarnCluster instead of MiniMRCluster? Is there any example I could follow to start a job history server?

          andrew.wang Andrew Wang added a comment -

          Resolving since HDFS-RAID has been removed from Hadoop.

            People

            • Assignee:
              weiyan Weiyan Wang
              Reporter:
              jlowe Jason Lowe
            • Votes:
              0
              Watchers:
              8
