Pig / PIG-2195

AvroStorage fails to STORE when LOADing via PigStorage

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      AvroStorage support for using a schema from an Avro schema file.

      Description

      Reading data via PigStorage and writing it via AvroStorage fails with an exception like this:

      java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord

      The Pig script in this section of the documentation shows an example of this kind that fails:

      http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data#AvroStorage-PigsupportforAvrodata-A.Howtostoredataindifferentways.

      A workaround currently exists to produce Avro from TSVs like this:

      avro = LOAD 'inputPath/' AS (foo);
      STORE avro INTO 'outputPath/' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
        '{"data":"data_file.avro",
          "same":"data_file.avro", "field0":"def:bar"}');


      This is redundant, though: the "data" and "same" keys appear to specify the same thing. The approach also requires an Avro data file to exist already. This patch makes the following alternative constructor syntaxes work as well.

      1. Read schema from an existing data file:
          '{"data":"data_file.avro", "field0":"def:bar"}');
        
      2. Read schema from an existing schema file:
          '{"schema_file":"data_file.avsc", "field0":"def:bar"}');
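
      As a rough sketch, the two forms above might appear in complete STORE statements as follows. The relation name and the output paths are placeholders introduced here for illustration. The fully qualified class name assumes the piggybank AvroStorage with piggybank and its Avro dependencies already registered, and the exact behaviour of "def:bar" is taken from the fragments above, not verified:

      -- Load tab-separated text with PigStorage (Pig's default loader).
      tsv = LOAD 'inputPath/' USING PigStorage() AS (foo);

      -- Form 1: take the output schema from an existing Avro data file.
      STORE tsv INTO 'outputPath1/' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
        '{"data":"data_file.avro", "field0":"def:bar"}');

      -- Form 2: take the output schema from an Avro schema (.avsc) file.
      STORE tsv INTO 'outputPath2/' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
        '{"schema_file":"data_file.avsc", "field0":"def:bar"}');

      Form 2 is the one that removes the need for an existing Avro data file, since only a schema file has to be present.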
        
      Attachments

      1. expected_testRecordSplitFromText1.avro (0.2 kB, Bill Graham)
      2. expected_testRecordSplitFromText2.avro (0.3 kB, Bill Graham)
      3. PIG-2195_1.patch (20 kB, Bill Graham)

        Activity

        Bill Graham created issue
        Bill Graham made changes
          Field: Original Value -> New Value
          Attachment: (none) -> PIG-2195_1.patch [ 12489137 ]
          Attachment: (none) -> expected_testRecordSplitFromText1.avro [ 12489138 ]
          Attachment: (none) -> expected_testRecordSplitFromText2.avro [ 12489139 ]
        Bill Graham made changes
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Release Note: (none) -> AvroStorage support for using a schema from an Avro schema file.
        Alan Gates made changes
          Resolution: (none) -> Fixed [ 1 ]
          Fix Version/s: (none) -> 0.10 [ 12316246 ]
          Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Daniel Dai made changes
          Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee:
            Bill Graham
            Reporter:
            Bill Graham
          • Votes:
            0
            Watchers:
            9
