PIG-3661

Piggybank AvroStorage fails if used in more than one load or store statement

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 0.12.1, 0.13.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      To reproduce:
      A = load '/tmp/data' as (a1:int, a2:int, a3:int);
      B = load '/tmp/data1' as (b1:chararray, b2:chararray, b3:chararray);
      store A into '/tmp/out/a' using org.apache.pig.piggybank.storage.avro.AvroStorage();
      store B into '/tmp/out2/b' using org.apache.pig.piggybank.storage.avro.AvroStorage();

      The script either fails in the map job when the two schemas are incompatible, or B is stored with a merge of the schemas of A and B, which produces incorrect results.

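      The cause is that the output schema is stored in, and read back from, a UDFContext property that is looked up by class alone, without a context signature, so every AvroStorage statement in the script shares the same property.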

      // Properties are looked up by class only, so all AvroStorage instances share them.
      UDFContext context = UDFContext.getUDFContext();
      Properties property = context.getUDFProperties(ResourceSchema.class);
      String prevSchemaStr = property.getProperty(AVRO_OUTPUT_SCHEMA_PROPERTY);
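
To illustrate why a class-only lookup collides, here is a minimal, self-contained sketch. It is a toy model, not Pig's UDFContext: the class `SignatureKeyedProperties`, the property name `avro_output_schema`, and the signatures `sigA` and `sigB` are all hypothetical. The toy shows a plain overwrite rather than the schema merge described above, but the sharing mechanism is the same. In Pig itself, `UDFContext.getUDFProperties(Class, String[])` together with the signature passed to `StoreFunc.setStoreFuncUDFContextSignature(String)` gives this kind of per-statement separation; the attached patches may implement the fix differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Toy model of a shared property registry. Hypothetical: this is NOT Pig's UDFContext.
public class SignatureKeyedProperties {
    static final String SCHEMA_KEY = "avro_output_schema"; // hypothetical property name

    private final Map<String, Properties> store = new HashMap<>();

    // Keyed by class only: all callers passing the same class share one Properties.
    public Properties get(Class<?> c) {
        return store.computeIfAbsent(c.getName(), k -> new Properties());
    }

    // Keyed by class plus arguments, e.g. a per-statement signature.
    public Properties get(Class<?> c, String[] args) {
        return store.computeIfAbsent(c.getName() + Arrays.toString(args), k -> new Properties());
    }

    public static void main(String[] unused) {
        SignatureKeyedProperties registry = new SignatureKeyedProperties();

        // Two store statements write their schema under the same class-only key.
        registry.get(Object.class).setProperty(SCHEMA_KEY, "schema of A");
        registry.get(Object.class).setProperty(SCHEMA_KEY, "schema of B");
        // The first statement now reads back the second statement's schema.
        System.out.println("class only, A sees: "
            + registry.get(Object.class).getProperty(SCHEMA_KEY));

        // With a distinct signature per statement, each keeps its own schema.
        registry.get(Object.class, new String[] {"sigA"}).setProperty(SCHEMA_KEY, "schema of A");
        registry.get(Object.class, new String[] {"sigB"}).setProperty(SCHEMA_KEY, "schema of B");
        System.out.println("with signature, A sees: "
            + registry.get(Object.class, new String[] {"sigA"}).getProperty(SCHEMA_KEY));
    }
}
```

The design point is only that the lookup key must include something unique to each load or store statement; any per-statement identifier would serve in this sketch.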

        Attachments

        1. PIG-3661-1.patch (28 kB, Rohini Palaniswamy)
        2. PIG-3661-2.patch (28 kB, Rohini Palaniswamy)

              People

              • Assignee: Rohini Palaniswamy
              • Reporter: Rohini Palaniswamy
              • Votes: 0
              • Watchers: 3