Avro / AVRO-1215

AvroMultipleOutputs not working when specifying baseOutputPath

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.2
    • Fix Version/s: 1.7.4
    • Component/s: java
    • Labels:
    • Tags:
      avro

      Description

      I'm calling the write() method of AvroMultipleOutputs that takes a baseOutputPath. The reducer appears to hang as soon as it writes to a baseOutputPath value it has not already encountered. It then fails with:

      org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file ... because current leaseholder is trying to recreate file.

      I think the problem has to do with this line in AvroMultipleOutputs:

      // get the record writer from context output format
      //FileOutputFormat.setOutputName(taskContext, baseFileName);
      

      This line is not commented out in the corresponding code in Hadoop, so I think the baseOutputPath is ignored. As a result, every record writer is created with the same path, which leads to the exception.
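To make the suspected failure mode concrete, here is a toy model in plain Java. It is not Avro or Hadoop code: the class `WriterCollisionSketch`, the `honorBasePath` flag, and the `part` / `-r-00000` naming are all hypothetical, and a set of open file names stands in for HDFS leases. It only illustrates how a writer cache keyed by baseOutputPath can collide when the path is ignored during file naming.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (not Avro or Hadoop code) of a writer cache keyed by baseOutputPath. */
final class WriterCollisionSketch {
    // Files that currently have an open writer; stands in for HDFS leases.
    private final Set<String> openFiles = new HashSet<>();
    // One cached writer, represented by its file name, per baseOutputPath.
    private final Map<String, String> writersByBasePath = new HashMap<>();
    // When false, the base path is ignored while naming the file, as the report suspects.
    private final boolean honorBasePath;

    WriterCollisionSketch(boolean honorBasePath) {
        this.honorBasePath = honorBasePath;
    }

    /** Returns the file a record for baseOutputPath goes to, opening it on first use. */
    String write(String baseOutputPath) {
        String cached = writersByBasePath.get(baseOutputPath);
        if (cached != null) {
            return cached; // a previously seen base path reuses its writer
        }
        // Hypothetical naming scheme; "part" and "-r-00000" are illustrative only.
        String file = (honorBasePath ? baseOutputPath : "part") + "-r-00000";
        if (!openFiles.add(file)) {
            // Same file requested twice by the same task: models the lease conflict.
            throw new IllegalStateException("file already being created: " + file);
        }
        writersByBasePath.put(baseOutputPath, file);
        return file;
    }
}
```

In this model, `new WriterCollisionSketch(false)` accepts repeated writes to one base path but throws on the first new base path, which matches the symptom described: the failure appears only when a value not already encountered is written.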

      Uncommenting this line does not work because of the method's visibility. All the method does, however, is set "mapreduce.output.basename", but setting that property directly doesn't work either.

      After digging through the Avro code, I found that AvroOutputFormatBase uses "avro.mo.config.namedOutput" to create the path. If I replace the commented-out line with the following, it seems to work:

      taskContext.getConfiguration().set("avro.mo.config.namedOutput", baseFileName);  
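
The effect of that one-line change can be sketched without Hadoop on the classpath. In the model below, a `Map<String, String>` stands in for the task `Configuration`; the class `NamedOutputConfigSketch` and both of its methods are hypothetical, and copying the job settings per task is an assumption of this sketch rather than something the report states. Only the key "avro.mo.config.namedOutput" comes from the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Avro code) of naming an output file from a configuration key. */
final class NamedOutputConfigSketch {
    // Key named in the report; everything else in this class is illustrative.
    static final String NAMED_OUTPUT_KEY = "avro.mo.config.namedOutput";

    /** Stand-in for the output format's naming step: prefers the configured name. */
    static String resolveFileName(Map<String, String> conf, String defaultName) {
        return conf.getOrDefault(NAMED_OUTPUT_KEY, defaultName);
    }

    /** Stand-in for the workaround: set the key on a per-writer copy of the settings. */
    static Map<String, String> withBaseFileName(Map<String, String> jobConf, String baseFileName) {
        // Assumption: each writer gets its own copy, so the job-level settings stay untouched.
        Map<String, String> taskConf = new HashMap<>(jobConf);
        taskConf.put(NAMED_OUTPUT_KEY, baseFileName);
        return taskConf;
    }
}
```

With the key set per writer, two different base file names resolve to two different files, so the naming step no longer hands every writer the same path.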
      
      Attachments

      1. AVRO-1215-v3.patch (7 kB, Ashish Nagavaram)
      2. AVRO-1215.patch (29 kB, Ashish Nagavaram)
      3. AVRO-1215.patch (10 kB, Ashish Nagavaram)
      4. AVRO-1215.patch (9 kB, Ashish Nagavaram)
      5. AVRO-1215_final.patch (9 kB, Ashish Nagavaram)

          Activity

          Matthew Hayes created issue -
          Matthew Hayes made changes -
          Link This issue blocks AVRO-1106 [ AVRO-1106 ]
          Matthew Hayes made changes -
          Labels avro mapreduce
          Matthew Hayes made changes -
          Tags avro
          Matthew Hayes made changes -
          Link This issue blocks AVRO-1106 [ AVRO-1106 ]
          Matthew Hayes made changes -
          Link This issue is related to AVRO-1106 [ AVRO-1106 ]
          Matthew Hayes made changes -
          Link This issue duplicates AVRO-1179 [ AVRO-1179 ]
          Ashish Nagavaram made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ashish Nagavaram made changes -
          Attachment avro-1215.patch [ 12559907 ]
          Ashish Nagavaram made changes -
          Assignee Ashish Nagavaram [ nagav.ashish ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215.patch [ 12565903 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215-v2.patch [ 12565997 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215-v3.patch [ 12566040 ]
          Ashish Nagavaram made changes -
          Link This issue duplicates AVRO-1236 [ AVRO-1236 ]
          Johannes Schulte made changes -
          Link This issue is duplicated by AVRO-1239 [ AVRO-1239 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215.patch [ 12567959 ]
          Ashish Nagavaram made changes -
          Attachment avro-1215.patch [ 12559907 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215.patch [ 12565903 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215-v2.patch [ 12565997 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215.patch [ 12568308 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215.patch [ 12568310 ]
          Ashish Nagavaram made changes -
          Attachment AVRO-1215_final.patch [ 12569736 ]
          Doug Cutting made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 1.7.4 [ 12323742 ]
          Resolution Fixed [ 1 ]
          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Ashish Nagavaram made changes -
          Link This issue relates AVRO-1266 [ AVRO-1266 ]
          Gavin made changes -
          Link This issue relates to AVRO-1266 [ AVRO-1266 ]
          Gavin made changes -
          Link This issue relates to AVRO-1266 [ AVRO-1266 ]

            People

            • Assignee: Ashish Nagavaram
            • Reporter: Matthew Hayes
            • Votes: 0
            • Watchers: 8

              Dates

              • Created:
                Updated:
                Resolved: