Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.10.1
    • Component/s: impl
    • Labels:
      None

      Description

      Pig reports each Avro record twice when a file is loaded with the piggybank AvroStorage.

      To Reproduce:

      • Place the attached files on HDFS
      • Run Pig:
        > register lib/piggybank.jar
        > register lib/avro-1.7.4.jar
        > register lib/json-simple-1.1.jar
        > register lib/jackson-mapper-asl-1.6.0.jar
        > register lib/jackson-core-asl-1.6.0.jar
        > user_data= LOAD 'twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
        > dump user_data;

      Result:
      (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
      (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
      (Test1,One Tweet,1366154490)
      (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
      (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
      (Test1,One Tweet,1366154490)
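On a larger dump the repetition is harder to spot by eye. The following is a minimal sketch, not part of the original report, that counts repeated records in the text of a `dump`. The function name `duplicated_records` is hypothetical, and it assumes the dump output has been captured as one record per line.

```python
from collections import Counter


def duplicated_records(dump_lines):
    """Return {record: count} for every record that occurs more than once.

    Hypothetical helper: assumes one dumped tuple per line of text.
    """
    counts = Counter(line.strip() for line in dump_lines if line.strip())
    return {record: n for record, n in counts.items() if n > 1}


if __name__ == "__main__":
    # The six lines reported in the Result section of this issue.
    sample = [
        "(miguno,Rock: Nerf paper, scissors is fine.,1366150681)",
        "(BlizzardCS,Works as intended. Terran is IMBA.,1366154481)",
        "(Test1,One Tweet,1366154490)",
        "(miguno,Rock: Nerf paper, scissors is fine.,1366150681)",
        "(BlizzardCS,Works as intended. Terran is IMBA.,1366154481)",
        "(Test1,One Tweet,1366154490)",
    ]
    for record, n in duplicated_records(sample).items():
        print(n, record)
```

An empty result would suggest the loader returned each record once, as seen with the newer piggybank.jar.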

      1. twitter.avro
        0.5 kB
        Hans Uhlig
      2. twitter.avsc
        0.5 kB
        Hans Uhlig
      3. twitter.json
        0.2 kB
        Hans Uhlig

        Activity

        Hans Uhlig created issue -
        Hans Uhlig made changes -
        Field Original Value New Value
        Attachment twitter.avro [ 12581761 ]
        Attachment twitter.avsc [ 12581762 ]
        Attachment twitter.json [ 12581763 ]
        Hans Uhlig made changes -
        Affects Version/s 0.8.1 [ 12316393 ]
        Affects Version/s 0.11.1 [ 12324080 ]
        Viraj Bhat added a comment -

        Hi Hans,
        Could you try upgrading only piggybank.jar, which contains the AvroStorage-related classes, from Pig 0.8.1 to Pig 0.10.1? I did not see this problem in Pig 0.10.1 and later.

        user_data= LOAD 'twitter_files/twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
        describe user_data;
        dump user_data;

        Results in:
        (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
        (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
        (Test1,One Tweet,1366154490)

        Note, however, that you cannot read twitter.json using AvroStorage; it is not an Avro data file:

        Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:218)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:169)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:145)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:293)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
        ... 18 more
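The "Not a data file" error is raised when the input does not begin with the Avro object container header, which is the four bytes `Obj` followed by the byte 1. The following is a minimal sketch for checking that before pointing AvroStorage at a file. The helper name `looks_like_avro_container` is hypothetical, and it assumes the file has first been copied from HDFS to a local path.

```python
# First four bytes of an Avro object container file: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(path):
    """Return True if the local file at `path` starts with the Avro magic bytes.

    Hypothetical helper: it only inspects the header and does not validate
    the schema or the data blocks that follow.
    """
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC
```

A JSON file such as twitter.json starts with plain text rather than these bytes, so the check returns False for it.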

        Viraj

        Hans Uhlig added a comment -

        Corrected in a newer version.

        Hans Uhlig made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.10.1 [ 12320547 ]
        Resolution Won't Fix [ 2 ]
        Transition: Open -> Resolved
        Time In Source Status: 34d 9h 17m
        Execution Times: 1
        Last Executer: Hans Uhlig
        Last Execution Date: 07/Jun/13 09:23

          People

          • Assignee:
            Unassigned
            Reporter:
            Hans Uhlig
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
