PIG-3661

Piggybank AvroStorage fails if used in more than one load or store statement

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 0.12.1, 0.13.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      To reproduce:
      A = load '/tmp/data' as (a1:int, a2:int, a3:int);
      B = load '/tmp/data1' as (b1:chararray, b2:chararray, b3:chararray);
      store A into '/tmp/out/a' using org.apache.pig.piggybank.storage.avro.AvroStorage();
      store B into '/tmp/out2/b' using org.apache.pig.piggybank.storage.avro.AvroStorage();

      The job either fails in the map phase if the schemas are incompatible, or B is stored with a merged schema of A and B, leading to incorrect results.

      The reason is that the schema is stored in and read from a UDFContext property without using a context signature, so both store statements share the same property:

      UDFContext context = UDFContext.getUDFContext();
      Properties property = context.getUDFProperties(ResourceSchema.class);
      String prevSchemaStr = property.getProperty(AVRO_OUTPUT_SCHEMA_PROPERTY);
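
      The collision can be illustrated without Pig on the classpath. The sketch below is a hypothetical stand-in for the property store, not Pig code and not the patch itself: `byClass` mimics a lookup keyed only by class, and `bySignature` mimics a lookup that also includes a per-statement signature. The class name, method names, signatures "store-A"/"store-B", and the "schema" key are all assumptions made for illustration. In Pig itself, a store function receives its signature through `setStoreFuncUDFContextSignature(String)`, and `UDFContext.getUDFProperties` has an overload that takes a `String[]` alongside the class; whether the committed fix uses exactly that combination is not stated here.

      ```java
      import java.util.HashMap;
      import java.util.Map;
      import java.util.Properties;

      public class SignatureScopedProperties {
          // Hypothetical stand-in for the UDFContext property store.
          private final Map<String, Properties> store = new HashMap<>();

          // Keyed by class only: every statement using the class shares one Properties.
          public Properties byClass(Class<?> c) {
              return store.computeIfAbsent(c.getName(), k -> new Properties());
          }

          // Keyed by class plus signature: each statement gets its own Properties.
          public Properties bySignature(Class<?> c, String signature) {
              return store.computeIfAbsent(c.getName() + "#" + signature, k -> new Properties());
          }

          public static void main(String[] args) {
              SignatureScopedProperties ctx = new SignatureScopedProperties();

              // Two store statements write their schema under the same class-only key.
              ctx.byClass(Object.class).setProperty("schema", "schema-of-A");
              ctx.byClass(Object.class).setProperty("schema", "schema-of-B");
              // The first statement now reads the second statement's schema.
              System.out.println(ctx.byClass(Object.class).getProperty("schema"));

              // With a signature in the key, the two schemas stay separate.
              ctx.bySignature(Object.class, "store-A").setProperty("schema", "schema-of-A");
              ctx.bySignature(Object.class, "store-B").setProperty("schema", "schema-of-B");
              System.out.println(ctx.bySignature(Object.class, "store-A").getProperty("schema"));
          }
      }
      ```

      The first lookup style reproduces the overwrite described above; the second keeps one schema per statement.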

      1. PIG-3661-1.patch
        28 kB
        Rohini Palaniswamy
      2. PIG-3661-2.patch
        28 kB
        Rohini Palaniswamy

          Activity

          Cheolsoo Park added a comment -

          We discovered piggybank TestAvroStorage fails in branch 0.12 w/o this patch due to .svn files. Backported to 0.12.1.

          Rohini Palaniswamy added a comment -

          Committed to trunk. Thanks Cheolsoo for the review.

          Rohini Palaniswamy added a comment -

          Besides fixing multiple load and store statements, this patch also fixes other issues with AvroStorage:

          • Hidden files were not excluded (PIG-3717)
          • mapred.input.dir was populated with every input file instead of the top-level directory, making the conf very big
          • The default value was not set for a Union

          https://reviews.apache.org/r/17266/
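
          As a rough illustration of the hidden-file point, the sketch below filters a directory listing using the common Hadoop convention that names starting with "_" or "." are hidden. It is not taken from the patch; the class name, method names, and sample file names are assumptions, and the filter actually used by AvroStorage may differ.

          ```java
          import java.util.Arrays;
          import java.util.List;
          import java.util.stream.Collectors;

          public class HiddenFileFilterSketch {
              // Assumed convention: names starting with '_' or '.' are hidden.
              static boolean isVisible(String fileName) {
                  return !fileName.startsWith("_") && !fileName.startsWith(".");
              }

              // Keep only the visible entries of a listing, preserving order.
              static List<String> visible(List<String> names) {
                  return names.stream()
                          .filter(HiddenFileFilterSketch::isVisible)
                          .collect(Collectors.toList());
              }

              public static void main(String[] args) {
                  // Hypothetical listing that includes .svn metadata and job markers.
                  List<String> listing = Arrays.asList("part-m-00000.avro", ".svn", "_SUCCESS", "_logs");
                  System.out.println(visible(listing)); // prints [part-m-00000.avro]
              }
          }
          ```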


            People

            • Assignee:
              Rohini Palaniswamy
            • Reporter:
              Rohini Palaniswamy
            • Votes:
              0
            • Watchers:
              3
