Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-646

Cannot run Wikipedia example on Amazon Elastic MapReduce (EMR)

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.5
    • Component/s: Classification
    • Labels:
      None

      Description

      When I tried to run the Wikipedia example on EMR with all the categories existing in the Wikipedia dump, I got this error :

      org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /yatter.tagger/wikipedia/input/temporary/_attempt_0000_r_000000_0/part-r-00000 for DFSClient_attempt_201103292134_0010_r_000000_0 on client 10.240.10.157 because current leaseholder is trying to recreate file.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1045)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
      at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)

      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

      at org.apache.hadoop.ipc.Client.call(Client.java:740)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
      at $Proxy1.create(Unknown Source)

      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      at $Proxy1.create(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
      at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
      at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
      at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
      at org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
      at org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
      at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
      at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
      at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)

      org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to create file /yatter.tagger/wikipedia/input/temporary/_attempt_0000_r_000000_0/part-r-00000 on client 10.240.10.157 either because the filename is invalid or the file exists
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1092)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
      at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)

      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

      at org.apache.hadoop.ipc.Client.call(Client.java:740)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
      at $Proxy1.create(Unknown Source)

      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      at $Proxy1.create(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
      at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
      at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
      at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
      at org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
      at org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
      at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
      at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
      at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)

      4 more :
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to create file /yatter.tagger/wikipedia/input/temporary/_attempt_0000_r_000000_0/part-r-00000 on client 10.240.10.157 either because the filename is invalid or the file exists

        Attachments

          Activity

            People

            • Assignee:
              srowen Sean R. Owen
              Reporter:
              vivrass Martin Provencher
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: