Hadoop Common / HADOOP-5268

Using MultipleOutputFormat and setting reducers to 0 causes org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException and job to fail

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.1
    • Component/s: None
    • Labels: None

    Description

      Hi,

      I'm trying to skip the sort step by running only the map phase (setting the number of reducers to 0), but the job then fails.
      The job runs fine when the reduce phase is enabled.

      I'm using MultipleInputFormat and MultipleOutputFormat. Here is my output format class, followed by the exception.

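      import java.io.IOException;
      import java.util.Random;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.io.Writable;
      import org.apache.hadoop.io.WritableComparable;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.RecordWriter;
      import org.apache.hadoop.mapred.SequenceFileOutputFormat;
      import org.apache.hadoop.mapred.lib.MultipleOutputFormat;
      import org.apache.hadoop.util.Progressable;

      public class MultipleSequenceFileOutputFormat<K extends WritableComparable, V extends Writable>
          extends MultipleOutputFormat<K, V> {

        private SequenceFileOutputFormat<K, V> sequencefileoutputformat = null;
        private String uniqueprefix = "";
        private boolean set = false;
        private static Random r = new Random();

        // Lazily creates the underlying SequenceFileOutputFormat and delegates to it.
        @Override
        protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3) throws IOException {
          if (sequencefileoutputformat == null) {
            sequencefileoutputformat = new SequenceFileOutputFormat<K, V>();
          }
          return sequencefileoutputformat.getRecordWriter(fs, job, name, arg3);
        }

        // Prepends a prefix, generated on first use, to the leaf file name.
        @Override
        protected String generateFileNameForKeyValue(K key, V value, String name) {
          if (!set) {
            synchronized (r) {
              uniqueprefix = Long.toString(System.currentTimeMillis()) + "_" + r.nextInt();
              set = true;
            }
          }
          return "prefix....." + uniqueprefix + "_" + name;
        }

        // Intentionally empty: skips the output specification check.
        @Override
        public void checkOutputSpecs(FileSystem fs, JobConf conf) {
        }
      }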
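
A side note on the class above, not a diagnosis of the exception: `generateFileNameForKeyValue` tests the `set` flag outside the lock, so two threads calling it at the same moment could each generate a prefix. The following is a minimal stand-alone sketch of a once-only initialization, assuming the intent is one prefix per instance. The class name `UniquePrefix` and its `get` method are hypothetical and not part of Hadoop.

```java
import java.util.Random;

// Hypothetical helper: computes a prefix once and returns the same value afterwards.
public class UniquePrefix {
    private static final Random RANDOM = new Random();

    // volatile so a prefix published by one thread is visible to the others
    private volatile String prefix;

    public String get() {
        String p = prefix;
        if (p == null) {
            synchronized (this) {
                p = prefix;            // re-check under the lock
                if (p == null) {
                    p = System.currentTimeMillis() + "_" + RANDOM.nextInt();
                    prefix = p;
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        UniquePrefix u = new UniquePrefix();
        // Both calls return the same string.
        System.out.println(u.get().equals(u.get())); // prints true
    }
}
```

The second check inside the `synchronized` block is what prevents a second thread from overwriting a prefix that has already been handed out.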

      org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file ......1234809836818_-1723031414_part-00000 for DFSClient_attempt_200902111714_0492_m_000000_0 on client 192.168.0.6 because current leaseholder is trying to recreate file.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1052)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:995)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
      at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

      at org.apache.hadoop.ipc.Client.call(Client.java:696)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
      at $Proxy1.create(Unknown Source)
      at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      at $Proxy1.create(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2587)
      at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
      at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:169)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
      at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
      at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
      at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
      at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:427)
      at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:57)
      at MultipleSequenceFileOutputFormat.getBaseRecordWriter(MultipleSequenceFileOutputFormat.java:33)
      at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99)
      at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:385)
      at ...
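
The exception message says that the client which already holds the lease on the file tried to create that file again. The toy model below illustrates only that condition. It is not HDFS code: the class `LeaseModel`, its methods, and the holder name used in `main` are hypothetical, and it makes no claim about why the map-only job issued the second create.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of per-file write leases (not the HDFS implementation).
public class LeaseModel {
    // path -> name of the client currently writing that path
    private final Map<String, String> leases = new HashMap<>();

    // Rejects a create for a path that is still open for writing.
    public void create(String path, String holder) {
        String current = leases.get(path);
        if (current != null) {
            String reason = current.equals(holder)
                ? "current leaseholder is trying to recreate file."
                : "it is already being created by " + current;
            throw new IllegalStateException(
                "failed to create file " + path + " for " + holder + " because " + reason);
        }
        leases.put(path, holder);
    }

    // Closing the file releases the lease, so the path can be created again.
    public void close(String path) {
        leases.remove(path);
    }

    public static void main(String[] args) {
        LeaseModel namenode = new LeaseModel();
        namenode.create("part-00000", "task_m_000000_0");
        try {
            // Second create from the same holder while the file is still open.
            namenode.create("part-00000", "task_m_000000_0");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a writer that is opened once per output name and reused for later records never triggers the second `create`.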

          People

            Assignee: Unassigned
            Reporter: Thibaut (bluelu)