PIG-3661

Piggybank AvroStorage fails if used in more than one load or store statement

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 0.12.1, 0.13.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      To reproduce:
      A = load '/tmp/data' as (a1:int, a2:int, a3:int);
      B = load '/tmp/data1' as (b1:chararray, b2:chararray, b3:chararray);
      store A into '/tmp/out/a' using org.apache.pig.piggybank.storage.avro.AvroStorage();
      store B into '/tmp/out2/b' using org.apache.pig.piggybank.storage.avro.AvroStorage();

      The job either fails in the map phase when the two schemas are incompatible, or B is written with the merged schemas of A and B, which leads to incorrect results.

      The cause is that the schema is stored in and read from a UDFContext property without a context signature, so every AvroStorage instance in the script shares the same property.

      // The lookup is keyed by class only, so all AvroStorage instances share these properties.
      UDFContext context = UDFContext.getUDFContext();
      Properties property = context.getUDFProperties(ResourceSchema.class);
      String prevSchemaStr = property.getProperty(AVRO_OUTPUT_SCHEMA_PROPERTY);
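
      The collision can be illustrated with a small stand-alone model. This is only a sketch, not the attached patch: the class SignatureScopedProperties, its two lookup methods, the property name, and the signature strings are all hypothetical stand-ins. In Pig itself, the per-instance signature is passed to a store function through setStoreFuncUDFContextSignature(String), and UDFContext.getUDFProperties(Class, String[]) accepts it as an extra key; how the actual fix uses them may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for UDFContext: hands out one Properties bag per key.
public class SignatureScopedProperties {
    private final Map<String, Properties> store = new HashMap<>();

    // Models a lookup keyed by class only: all instances share one bag.
    public Properties forClass(Class<?> c) {
        return store.computeIfAbsent(c.getName(), k -> new Properties());
    }

    // Models a lookup keyed by class plus signature: one bag per statement.
    public Properties forClassAndSignature(Class<?> c, String signature) {
        return store.computeIfAbsent(c.getName() + "#" + signature, k -> new Properties());
    }

    public static void main(String[] args) {
        SignatureScopedProperties ctx = new SignatureScopedProperties();
        String key = "avro_output_schema"; // hypothetical property name

        // Class-only key: the store for A saves its schema, the store for B reads it back.
        ctx.forClass(Object.class).setProperty(key, "schema-of-A");
        String seenByB = ctx.forClass(Object.class).getProperty(key);
        System.out.println("shared lookup, B sees: " + seenByB);

        // Signature-scoped key: the store for B starts with no previous schema.
        ctx.forClassAndSignature(Object.class, "store-A").setProperty(key, "schema-of-A");
        String scopedSeenByB = ctx.forClassAndSignature(Object.class, "store-B").getProperty(key);
        System.out.println("scoped lookup, B sees: " + scopedSeenByB);
    }
}
```

      With the class-only key, the second store statement picks up the schema left by the first, which matches the merge or failure described above. With a distinct signature per statement, each store sees only its own schema.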

      1. PIG-3661-1.patch
        28 kB
        Rohini Palaniswamy
      2. PIG-3661-2.patch
        28 kB
        Rohini Palaniswamy

            People

            • Assignee: Rohini Palaniswamy
            • Reporter: Rohini Palaniswamy
            • Votes: 0
            • Watchers: 2
