Pig / PIG-3297

Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 0.12.0
    • Component/s: piggybank
    • Labels:
      None
    • Release Note:
      Read Avro files that have string fields that were written with avro.java.string = String

      Description

      When an Avro file is created, the "String Type" can be set to a class other than the default Utf8.
      A very common choice is to set the "String Type" to the standard Java String class (java.lang.String).

      Reading such an Avro file in Pig with the AvroStorage LoadFunc from the included piggybank throws the following exception:

      Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
      at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
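
The failure mode can be sketched without Avro on the classpath. The block below is only an illustration under stated assumptions: `Utf8Stub` is a hypothetical stand-in for org.apache.avro.util.Utf8, and `readStringOld` merely mimics an unconditional cast of the kind the stack trace points at. It is not the actual piggybank source.

```java
public class CastFailureSketch {
    /** Hypothetical stand-in for org.apache.avro.util.Utf8 (not the real class). */
    static final class Utf8Stub implements CharSequence {
        private final String value;

        Utf8Stub(String value) { this.value = value; }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() { return value; }
    }

    /** Mimics a reader that assumes every decoded string is a Utf8-like object. */
    static String readStringOld(Object decoded) {
        // Throws ClassCastException when decoded is a java.lang.String.
        return ((Utf8Stub) decoded).toString();
    }

    /** Returns true when the unconditional cast fails for the given datum. */
    static boolean castFails(Object decoded) {
        try {
            readStringOld(decoded);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Datum decoded as the Utf8-like type: the cast succeeds.
        System.out.println(castFails(new Utf8Stub("pig")));
        // Datum decoded as java.lang.String: the cast fails.
        System.out.println(castFails("pig"));
    }
}
```

In this sketch the cast succeeds only when the decoder hands back the Utf8-like type. A plain java.lang.String triggers the same kind of ClassCastException as in the trace above.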

      1. PIG-3297-1.patch
        5 kB
        Niels Basjes
      2. test_record.avro
        0.2 kB
        Niels Basjes
      3. PIG-3297-v2-20130521-The-fix.patch
        0.8 kB
        Niels Basjes
      4. PIG-3297-v2-20130521-Unittest.patch
        11 kB
        Niels Basjes

        Issue Links

          Activity

          Niels Basjes added a comment -

          I have a working fix that I'll submit shortly.

          Niels Basjes added a comment -

          The patch I created.

          Niels Basjes added a comment -

          I found that although the fix is correct, the test I created is not right.

          Niels Basjes added a comment -

          The simplest test file that demonstrates the bug.

          Michael Moss added a comment -

          Niels, I've run into this as well (along with a similar issue in Hive), and it seems that it might be caused not by the code you patched, but by the avro-1.x.y.jar files themselves.

          We are serializing strings as avro.java.string and everything was working fine on our HDP1.2 (Hortonworks) cluster, but when I upgraded the avro jar that pig uses from avro-1.5.3 to avro-1.7.4, I started getting this exception.

          I'm also having this issue on the latest version of CDH4.2 (with Impala1.0) in both pig and hive, and the culprit there seems to be the avro-1.7.x.jar that they use.

          I'm just starting to dig into finding out why, but was hoping you or someone here might have some insight.

          Thanks.

          Niels Basjes added a comment -

          In response to the question Michael Moss posted to the Avro mailing list, the reply was that the proposed fix is the correct way to solve the issue.
          See: http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3CCDB6AEC8.EE942%25scott@richrelevance.com%3E

          Quote:

          The change in the Pig loader in PIG-3297 seems correct - they must use
          CharSequence, not Utf8.
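
A minimal sketch of the CharSequence approach described in the quote, using only the JDK. The class and method names are hypothetical and this is not the committed patch. `StringBuilder` stands in for any non-String CharSequence implementation, such as Avro's Utf8.

```java
public class CharSequenceReadSketch {
    /**
     * Converts a decoded string datum to java.lang.String without assuming
     * a concrete class: any CharSequence implementation is accepted.
     */
    static String readString(Object decoded) {
        if (decoded == null) {
            return null;
        }
        if (decoded instanceof CharSequence) {
            return decoded.toString();
        }
        throw new IllegalArgumentException(
                "Expected a CharSequence but got " + decoded.getClass().getName());
    }

    public static void main(String[] args) {
        // A java.lang.String datum (stringType = String) is handled.
        System.out.println(readString("pig"));
        // So is another CharSequence implementation (stand-in for Utf8).
        System.out.println(readString(new StringBuilder("pig")));
    }
}
```

Because the reader depends only on the CharSequence interface, it does not matter which string class the writer's schema requested.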

          Niels Basjes added a comment -

          The only way to have a unit test for this bug is to bump the version of Avro used when running the JUnit tests to version 1.7.4.

          Niels Basjes added a comment -

          The fix for this bug is very simple.
          However, a unit test that actually fails without this fix requires upgrading the Avro dependency to version 1.7.4.

          I leave it to the committers to decide whether to "just do the fix" or to "do the fix, upgrade Avro, and add the unit test".

          Niels Basjes added a comment -

          Is blocked iff we want to use the provided unit test.

          Niels Basjes added a comment -

          The fix for this bug is very simple.
          However, a unit test that actually fails without this fix requires upgrading the Avro dependency to version 1.7.4.
          I leave it to the committers to decide whether to "just do the fix" or to "do the fix, upgrade Avro, and add the unit test".
          For this reason the fix and the unit test are placed in separate files.

          Cheolsoo Park added a comment -

          +1.

          Cheolsoo Park added a comment -

          I only committed the fix to trunk. Please let me know if you'd like to get the unit test committed as well.

          Thanks Niels!


            People

            • Assignee:
              Niels Basjes
              Reporter:
              Niels Basjes
            • Votes:
              0
              Watchers:
              4
