Pig / PIG-2330

Problem in org.apache.pig.piggybank.storage.avro.AvroStorage when storing a record with a single field.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: piggybank
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Running the following script yields a RuntimeException. If the schema is changed to contain two fields, then A can be stored successfully.

      REGISTER 'piggybank.jar'
      REGISTER 'avro-1.5.4.jar'
      REGISTER 'json-simple-1.1.jar'
      
      A = load 'input.txt' AS (name1:chararray, name2:chararray);
      B = foreach A generate $0;
      store B into './output' using org.apache.pig.piggybank.storage.avro.AvroStorage(
      '{"schema": {"type": "record", "name": "main", "fields": [{"name": "name", "type": ["null", "string"]}]}}');
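
For comparison, the two-field case mentioned above can be sketched as follows. This is only an illustration: the Avro field names name1 and name2 and the output path './output2' are assumptions, not taken from the report, and the same jars are assumed to be available.

```pig
REGISTER 'piggybank.jar'
REGISTER 'avro-1.5.4.jar'
REGISTER 'json-simple-1.1.jar'

-- Same input as above, but both columns are kept, so each tuple has two fields.
A = load 'input.txt' AS (name1:chararray, name2:chararray);

-- The record schema declares two fields (names are assumed here),
-- matching the two-field tuples in A one to one.
store A into './output2' using org.apache.pig.piggybank.storage.avro.AvroStorage(
'{"schema": {"type": "record", "name": "main", "fields": [{"name": "name1", "type": ["null", "string"]}, {"name": "name2", "type": ["null", "string"]}]}}');
```

According to the report, only the single-field variant triggers the RuntimeException; a script of this shape is expected to store successfully.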
      
      Attachments

      1. input.txt (0.0 kB, Stan Rosenberg)
      2. AvroStorage.patch (0.5 kB, Stan Rosenberg)

          Activity

          Stan Rosenberg added a comment -

          The problem seems to be localized to only one line of code.

          Viraj Bhat added a comment -

          Hi,
          This issue is not related to PIG-3322. The one-line fix should solve the problem described above.

          Consider changing the script to wrap the field with TOTUPLE. The following script works:

          A = load 'input.txt' AS (name1:chararray, name2:chararray);
          B = foreach A generate TOTUPLE($0);
          dump B;
          store B into 'singlefieldoutput' using
          org.apache.pig.piggybank.storage.avro.AvroStorage('{"schema": {"type":
          "record", "name": "main", "fields": [{"name": "name", "type": ["null",
          "string"]}]}}')
          

          Output

          ((Viraj))
          ((Roh))
          ((Govind))
          

          The table at https://cwiki.apache.org/PIG/avrostorage.html shows that a Pig tuple can be converted to an Avro record, since both are ordered sets of fields. A "chararray", however, cannot be converted to a "record". In Pig you cannot generate a bare chararray; it is always wrapped in a tuple.

          Now try loading the output generated by the script above:

          A = load 'singlefieldoutput' using org.apache.pig.piggybank.storage.avro.AvroStorage();
          describe A;
          dump A;
          

          Now we see the following:

          (Viraj)
          (Roh)
          (Govind)
          

          This differs from the output of "dump B", where each value was wrapped in an extra tuple.

          Viraj


            People

            • Assignee:
              Unassigned
              Reporter:
              Stan Rosenberg
            • Votes:
              0
              Watchers:
              1
