Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.10.1
    • Component/s: impl
    • Labels:
      None

      Description

      Pig returns every Avro record twice when a file is loaded with AvroStorage.

      To Reproduce:

      • Place the attached files on HDFS
      • Run Pig
        > register lib/piggybank.jar
        > register lib/avro-1.7.4.jar
        > register lib/json-simple-1.1.jar
        > register lib/jackson-mapper-asl-1.6.0.jar
        > register lib/jackson-core-asl-1.6.0.jar
        > user_data= LOAD 'twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
        > dump user_data;

      Result:
      (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
      (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
      (Test1,One Tweet,1366154490)
      (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
      (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
      (Test1,One Tweet,1366154490)
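
The duplication in the result above can be confirmed mechanically rather than by eye. The following is a minimal sketch in Python, assuming the dump output has been captured as text; the helper name `count_records` is hypothetical and not part of Pig or piggybank.

```python
from collections import Counter

# Output copied from the `dump user_data;` result in this report.
DUMP_OUTPUT = """\
(miguno,Rock: Nerf paper, scissors is fine.,1366150681)
(BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
(Test1,One Tweet,1366154490)
(miguno,Rock: Nerf paper, scissors is fine.,1366150681)
(BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
(Test1,One Tweet,1366154490)
"""


def count_records(dump_text):
    """Count how often each non-empty tuple line occurs in the dump text."""
    lines = [line.strip() for line in dump_text.splitlines() if line.strip()]
    return Counter(lines)


counts = count_records(DUMP_OUTPUT)
# Keep only the records that were emitted more than once.
duplicated = {record: n for record, n in counts.items() if n > 1}
for record, n in sorted(duplicated.items()):
    print(f"{n}x {record}")
```

Comparing whole lines avoids having to split the tuples on commas, which would be ambiguous here because the tweet text itself contains commas.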

      1. twitter.avro
        0.5 kB
        Hans Uhlig
      2. twitter.avsc
        0.5 kB
        Hans Uhlig
      3. twitter.json
        0.2 kB
        Hans Uhlig

        Activity

        Viraj Bhat added a comment -

        Hi Hans,
        Could you try upgrading only piggybank.jar, which contains the AvroStorage-related classes, from Pig 0.8.1 to Pig 0.10.1? I did not see this problem in Pig 0.10.1 or later.

        user_data= LOAD 'twitter_files/twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
        describe user_data;
        dump user_data;

        Results in:
        (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
        (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
        (Test1,One Tweet,1366154490)

        Note, however, that you cannot read twitter.json using AvroStorage, because it is not an Avro data file:

        Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:218)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:169)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:145)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:293)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
        ... 18 more

        Viraj
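
For context on the "Not a data file" error above: Avro object container files begin with the four bytes `Obj` followed by 0x01, and a plain JSON file does not. The following is a rough Python sketch of that header check, not the actual code in `DataFileStream`; the function name `looks_like_avro_data_file` and the sample file contents are assumptions made for illustration, since the attachments' contents are not shown in this ticket.

```python
import os
import tempfile

# Avro object container files start with ASCII "Obj" followed by the byte 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_data_file(path):
    """Return True if the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


# Demonstration with two throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    avro_like = os.path.join(tmp, "sample.avro")
    json_like = os.path.join(tmp, "sample.json")
    with open(avro_like, "wb") as f:
        # Only the header is realistic; the rest is placeholder bytes.
        f.write(AVRO_MAGIC + b"placeholder bytes, not a real Avro file")
    with open(json_like, "wb") as f:
        f.write(b'{"username": "example"}')
    avro_ok = looks_like_avro_data_file(avro_like)
    json_ok = looks_like_avro_data_file(json_like)

print("sample.avro:", avro_ok)
print("sample.json:", json_ok)
```

This check only inspects the header, so it can tell a JSON text file from an Avro container but does not validate the rest of the file.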

        Hans Uhlig added a comment -

        Corrected in a newer version.


          People

          • Assignee:
            Unassigned
            Reporter:
            Hans Uhlig
          • Votes:
            0
            Watchers:
            1
