Uploaded image for project: 'Apache Avro'
  1. Apache Avro
  2. AVRO-1418

AvroMultipleOutputs should support sync-able writers

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.6
    • Fix Version/s: 1.7.6
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Hide
      1. Modified org.apache.avro.mapreduce.AvroMultipleOutputs to support sync() on a given named output.
      2. org.apache.avro.mapreduce.AvroKeyValueRecordWriter and org.apache.avro.mapreduce.AvroKeyRecordWriter implements org.apache.avro.mapreduce.Syncable
      This allows sync to be invoked on writer instance from AvroMultipleOutputs.sync() as the writer is stored as org.apache.hadoop.mapreduce.RecordWriter (Hence could be anything apart from Sycnable).

      Test Cases
      1) TestAvroKeyValueRecordWriter and TestAvroKeyRecordWriter modified to include a test case each for sync() tests.
      2) TestAvroMultipleOutputsSyncable new class to test synch for AvroMultipleOutputs
      Show
      1. Modified org.apache.avro.mapreduce.AvroMultipleOutputs to support sync() on a given named output. 2. org.apache.avro.mapreduce.AvroKeyValueRecordWriter and org.apache.avro.mapreduce.AvroKeyRecordWriter implements org.apache.avro.mapreduce.Syncable This allows sync to be invoked on writer instance from AvroMultipleOutputs.sync() as the writer is stored as org.apache.hadoop.mapreduce.RecordWriter (Hence could be anything apart from Sycnable). Test Cases 1) TestAvroKeyValueRecordWriter and TestAvroKeyRecordWriter modified to include a test case each for sync() tests. 2) TestAvroMultipleOutputsSyncable new class to test synch for AvroMultipleOutputs

      Description

      DataFileWriter supports APIs like sync() (that allows to emit synchronization markers) so that DataFileReader could later use sync() or seek() to move to a particular synchronization point.

      AvroMultipleOutputs does not support or provide a way to invoke sync on its individual writers. One could extend its behavior, however its design is closed for extension. (All states are private and getRecordWriter() are private). Hence AvroMultipleOutputs must first be modified so as to support extension and additional classes must be provided to support a synch able MutilpleOutputFormats.

      Solution
      ======

      I) MarkableAvroMultipleOutputs : Allows users to set synchronization points before/after writing Key-Value pairs with AvroMultipleOutputs.write()
      A public api to invoke sync on a named output.
      Ex: public void sync(String namedOutput, String baseOutputPath) throws IOException, InterruptedException {}

      To achieve above AvroMultipleOutputs should be modified so as to allow support for additional behavior. The following must be marked as protected instead of private
      1) private static void checkBaseOutputPath(String outputPath) {} from private.
      2) private static void checkNamedOutputName(JobContext job, String namedOutput, boolean alreadyDefined) {} from private.
      3) private TaskInputOutputContext<?, ?, ?, ?> context;
      4) private Set<String> namedOutputs;
      5) private synchronized RecordWriter getRecordWriter(TaskAttemptContext taskContext, String baseFileName)

      II) AvroKeyValueRecordWriter that is used by AvroMultipleOutputs as writers for individual writers is again closed for extension. It must allow to invoke sync() on writer.

      To achieve that the following private members must be marked protected.
      1) private final DataFileWriter<GenericRecord> mAvroFileWriter;

      A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API to invoke sync on its writer.
      public void sync() throws IOException {}

      III) A MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat and uses MarkableAvroKeyValueRecordWriter.

      Include similar support for AvroKeyOutputFormat & AvroKeyRecordWriter.

        Attachments

        1. AVRO-1418.patch
          33 kB
          Deepak Kumar V

          Activity

            People

            • Assignee:
              deepujain Deepak Kumar V
              Reporter:
              deepujain Deepak Kumar V
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: