Pig / PIG-3318

AVRO: 'default value' not honored when merging schemas on load with AvroStorage

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.2
    • Fix Version/s: 0.12.0
    • Component/s: piggybank
    • Labels:
    • Tags: AvroStorage

      Description

      Piggybank AvroStorage: when multiple schemas that specify default values are merged on load, AvroStorage fills the fields missing from a record with nulls instead of the defaults declared in the Avro schema.

      ==> Employee3.avro <==
      {
        "type" : "record",
        "name" : "employee",
        "fields" : [
          {"name" : "name", "type" : "string", "default" : "NU"},
          {"name" : "age", "type" : "int", "default" : 0},
          {"name" : "dept", "type" : "string", "default" : "DU"}
        ]
      }

      ==> Employee4.avro <==
      {
        "type" : "record",
        "name" : "employee",
        "fields" : [
          {"name" : "name", "type" : "string", "default" : "NU"},
          {"name" : "age", "type" : "int", "default" : 0},
          {"name" : "dept", "type" : "string", "default" : "DU"},
          {"name" : "office", "type" : "string", "default" : "OU"}
        ]
      }

      ==> Employee6.avro <==
      {
        "type" : "record",
        "name" : "employee",
        "fields" : [
          {"name" : "name", "type" : "string", "default" : "NU"},
          {"name" : "lastname", "type" : "string", "default" : "LNU"},
          {"name" : "age", "type" : "int", "default" : 0},
          {"name" : "salary", "type" : "int", "default" : 0},
          {"name" : "dept", "type" : "string", "default" : "DU"},
          {"name" : "office", "type" : "string", "default" : "OU"}
        ]
      }

      The pig script:
      employee = load 'employee{3,4,6}.ser' using org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
      describe employee;
      dump employee;

      Output Schema:
      employee: {name: chararray,age: int,dept: chararray,lastname: chararray,salary: int,office: chararray}

      (Milo,30,DH,,,)
      (Asmya,34,PQ,,,)
      (Baljit,23,RS,,,)
      (Pune,60,Astrophysics,Warriors,5466,UTA)
      (Rajsathan,20,Biochemistry,Royals,1378,Stanford)
      (Chennai,50,Microbiology,Superkings,7338,Hopkins)
      (Mumbai,20,Applied Math,Indians,4468,UAH)
      (Praj,54,RMX,,,Champaign)
      (Buba,767,HD,,,Sunnyvale)
      (Manku,375,MS,,,New York)

      Regards
      Viraj
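
For illustration, the behavior the report expects can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not AvroStorage's implementation: the helpers `field`, `merge_fields`, and `fill_defaults` are hypothetical names, the merged field order (first seen wins) is an assumption and differs from the order AvroStorage prints, and the schemas are reduced to plain field lists copied from the three files above.

```python
# Illustrative only: hypothetical helpers, not AvroStorage's actual code.

def field(name, avro_type, default):
    """Build one Avro-style field declaration as a plain dict."""
    return {"name": name, "type": avro_type, "default": default}

# Field lists copied from the three schemas in this report.
EMPLOYEE3 = [field("name", "string", "NU"), field("age", "int", 0),
             field("dept", "string", "DU")]
EMPLOYEE4 = EMPLOYEE3 + [field("office", "string", "OU")]
EMPLOYEE6 = [field("name", "string", "NU"), field("lastname", "string", "LNU"),
             field("age", "int", 0), field("salary", "int", 0),
             field("dept", "string", "DU"), field("office", "string", "OU")]

def merge_fields(*schemas):
    """Union the fields of several schemas, keeping first-seen order (an assumption)."""
    merged = {}
    for fields in schemas:
        for f in fields:
            merged.setdefault(f["name"], f)
    return list(merged.values())

def fill_defaults(record, merged_fields):
    """Return the record with each missing field set to its declared default."""
    return {f["name"]: record.get(f["name"], f["default"]) for f in merged_fields}

merged = merge_fields(EMPLOYEE3, EMPLOYEE4, EMPLOYEE6)
# A record written with the Employee3 schema has no lastname, salary, or office.
filled = fill_defaults({"name": "Milo", "age": 30, "dept": "DH"}, merged)
```

Under these assumptions, the first row of the output would read (Milo,30,DH,LNU,0,OU) in the column order of the output schema above, rather than (Milo,30,DH,,,).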

      1. Employee3.avro
        0.3 kB
        Viraj Bhat
      2. Employee4.avro
        0.3 kB
        Viraj Bhat
      3. Employee6.avro
        0.5 kB
        Viraj Bhat
      4. PIG-3318_5.patch
        19 kB
        Viraj Bhat

        Activity

        Rohini Palaniswamy added a comment -

        Committed to trunk(0.12). Thanks Viraj.

        Viraj Bhat added a comment -

        Sorry for attaching the wrong patch, which makes the test case write to an Avro file. I have modified the test to use mock.Storage() and will reattach the correct patch.
        Viraj

        Rohini Palaniswamy added a comment -

        Ran TestAvroStorage before committing. Encountered

        Testcase: testMultipleSchemasWithDefaultValue took 3.543 sec
        Caused an ERROR
        Not a data file.
        java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
        at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.verifyResults(TestAvroStorage.java:1292)
        at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.verifyResults(TestAvroStorage.java:1262)
        at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testMultipleSchemasWithDefaultValue(TestAvroStorage.java:704)

        The problem is that the test case stores its output using PigStorage, so the output is not an Avro file. I tried changing it to AvroStorage, but it still failed because the output did not match expected_testMultipleSchemasWithDefaultValue.avro. Can you fix the test case? Also, can you rename Employee*.ser to Employee*.avro to be consistent with naming?
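
As a rough illustration of why PigStorage output triggers "Not a data file": Avro object container files begin with the four magic bytes 'O', 'b', 'j', 0x01, and a reader can reject anything that does not start with them. The sketch below is an assumption about the kind of check involved, not the code in DataFileStream; the function name `looks_like_avro_data_file` and the sample byte strings are hypothetical.

```python
# Avro object container files start with these four bytes: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"

def looks_like_avro_data_file(header: bytes) -> bool:
    """Return True if the given bytes start with the Avro container magic."""
    return header[:4] == AVRO_MAGIC

# Hypothetical samples: tab-delimited text such as PigStorage writes,
# and a byte string that merely begins with the Avro magic.
pigstorage_output = b"Milo\t30\tDH\tLNU\t0\tOU\n"
avro_like_output = AVRO_MAGIC + b"rest-of-header"

text_is_avro = looks_like_avro_data_file(pigstorage_output)
container_is_avro = looks_like_avro_data_file(avro_like_output)
```

A check like this only inspects the header; it does not validate the schema or the data blocks that follow.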

        Rohini Palaniswamy added a comment -

        Review board request from Virag for the patch - https://reviews.apache.org/r/11135/

        Viraj Bhat added a comment -

        Patch for adding default values for merged schemas.

        Viraj Bhat added a comment -

        Expected resulting avro file

        Viraj Bhat added a comment -

        Avro test file

        Viraj Bhat added a comment -

        avro test file

        Viraj Bhat added a comment -

        Avro file

        Viraj Bhat added a comment -

        Patch for branch 0.11.2

        Viraj Bhat added a comment -

        Trunk patch which contains the fix

        Viraj Bhat added a comment -

        Submitting patches for both Pig 0.11 and Pig 0.12/trunk that fix this issue, along with the relevant test cases and files.


          People

          • Assignee: Viraj Bhat
          • Reporter: Viraj Bhat
          • Votes: 0
          • Watchers: 3
