Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

Description

      TestCheckpoint started intermittently failing last night:

      http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/55/testReport/org.apache.hadoop.dfs/TestCheckpoint/testCheckpoint/

      This is probably caused by one of the changes introduced yesterday:

      http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/55/

Activity

          Doug Cutting added a comment -

          For posterity, since Hudson builds are only kept for 30 days, the stack trace is:

          org.apache.hadoop.ipc.RemoteException: java.io.IOException: Failed to create file /user/hudson/checkpointxx.dat on client 127.0.0.1 because there were not enough datanodes available. Found 0 datanodes but MIN_REPLICATION for the cluster is configured to be 1.
          at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:813)
          at org.apache.hadoop.dfs.NameNode.create(NameNode.java:294)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:339)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:573)

          at org.apache.hadoop.ipc.Client.call(Client.java:471)
          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
          at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1141)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1079)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1305)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1258)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.write(DFSClient.java:1240)
          at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
          at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
          at java.io.DataOutputStream.write(DataOutputStream.java:90)
          at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:395)
          at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
          at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
          at java.io.DataOutputStream.write(DataOutputStream.java:90)
          at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
          at org.apache.hadoop.dfs.TestCheckpoint.writeFile(TestCheckpoint.java:47)
          at org.apache.hadoop.dfs.TestCheckpoint.testSecondaryNamenodeError1(TestCheckpoint.java:153)
          at org.apache.hadoop.dfs.TestCheckpoint.testCheckpoint(TestCheckpoint.java:323)

          And the probable causes are:

          1. HADOOP-1001. Check the type of keys and values generated by the mapper against the types specified in JobConf. Contributed by Tahir Hashmi. (detail)
          2. HADOOP-971. Improve DFS Scalability: Improve name node performance by adding a hostname to datanodes map. Contributed by Hairong Kuang. (detail)
          3. HADOOP-1189. Fix 'No space left on device' exceptions on datanodes. Contributed by Raghu Angadi. (detail)
          4. HADOOP-819. Change LineRecordWriter to not insert a tab between key and value when either is null, and to print nothing when both are null. Contributed by Runping Qi. (detail)
          5. HADOOP-1204. Rename InputFormatBase to be FileInputFormat. (detail)
          6. HADOOP-1213. Improve logging of errors by IPC server. (detail)
          7. HADOOP-1114. Permit user to specify additional CLASSPATH elements with a HADOOP_CLASSPATH environment variable. (detail)
          8. HADOOP-1238. Fix metrics reporting by TaskTracker to correctly track maps_running and reduces_running. Contributed by Michael Bieniosek. (detail)

          Nigel Daley added a comment -

          TestCheckpoint has one case where it creates a MiniDFSCluster but doesn't wait for it to be active. I'll file a patch for this.

          I wonder if this started showing up because of a change in NameNode and/or DataNode startup timing, perhaps introduced by HADOOP-971 or HADOOP-1189...
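
          The race described above (the test proceeding before any datanode has registered, hence "Found 0 datanodes but MIN_REPLICATION ... is 1") is typically avoided by polling the cluster state before writing. A minimal self-contained sketch of that wait-for-active pattern in plain Java follows; the names `waitActive` and `clusterUp` are illustrative stand-ins, not the actual MiniDFSCluster API:

          ```java
          import java.util.function.BooleanSupplier;

          public class WaitForActive {

              // Poll the condition until it holds or the timeout elapses.
              // In the real test the condition would be "enough datanodes
              // have registered with the namenode".
              static boolean waitActive(BooleanSupplier isUp, long timeoutMillis)
                      throws InterruptedException {
                  long deadline = System.currentTimeMillis() + timeoutMillis;
                  while (System.currentTimeMillis() < deadline) {
                      if (isUp.getAsBoolean()) {
                          return true;
                      }
                      Thread.sleep(50); // brief back-off between checks
                  }
                  return isUp.getAsBoolean(); // one last check at the deadline
              }

              public static void main(String[] args) throws InterruptedException {
                  // Simulate a cluster that becomes "active" after ~200 ms.
                  long start = System.currentTimeMillis();
                  BooleanSupplier clusterUp =
                          () -> System.currentTimeMillis() - start > 200;

                  boolean up = waitActive(clusterUp, 5000);
                  System.out.println(up ? "active" : "timed out");
              }
          }
          ```

          With such a wait in place, a transient slowdown in DataNode startup only delays the test rather than failing it.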

          Nigel Daley added a comment -

          This issue is a duplicate of HADOOP-1248 and HADOOP-1256. It is fixed by the patch for HADOOP-1256.


People

    • Assignee: Nigel Daley
    • Reporter: Doug Cutting
    • Votes: 0
    • Watchers: 0
