Avro / AVRO-1493

Avoid the "Turkish Locale Problem"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.6
    • Fix Version/s: 1.8.1
    • Component/s: java
    • Labels: None
    • Environment:

      Hadoop trunk build error on Mac OS with the Turkish locale.

      Description

      The locale-dependent String.toUpperCase() and String.toLowerCase() cause unexpected behavior when the default locale is Turkish.
      I am not sure whether String.equalsIgnoreCase(..) is affected.
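A minimal sketch of the behavior described above (the class and method names are hypothetical, for illustration only). The no-argument toUpperCase()/toLowerCase() use Locale.getDefault(), so passing an explicit Locale.ROOT makes the result locale-neutral. On the equalsIgnoreCase(..) question: its Javadoc says it compares characters using Character-level case mapping and does not take locale into account, so it should not be affected in this particular way.

```java
import java.util.Locale;

final class LocaleCasingDemo {
    // Turkish distinguishes dotted i/U+0130 from dotless U+0131/I.
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Uppercase with an explicit locale instead of Locale.getDefault().
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    // Lowercase with an explicit locale instead of Locale.getDefault().
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        // Compare against escapes so the output does not depend on console encoding.
        System.out.println(upper("int", TURKISH).equals("\u0130NT"));  // true: dotted capital I
        System.out.println(upper("int", Locale.ROOT).equals("INT"));   // true
        System.out.println(lower("INT", TURKISH).equals("\u0131nt"));  // true: dotless small i
        System.out.println(lower("INT", Locale.ROOT).equals("int"));   // true
        // equalsIgnoreCase uses Character-level case mapping, independent of locale.
        System.out.println("int".equalsIgnoreCase("INT"));             // true
    }
}
```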

      Here is the error:

      [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-common: Compilation failure
      [ERROR] /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244] unmappable character for encoding UTF-8
      [ERROR] -> [Help 1]
      org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-common: Compilation failure
      /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244] unmappable character for encoding UTF-8

      Checking the generated code, I found the cause of the error:

      public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"AvroRecord\",\"namespace\":\"org.apache.hadoop.io.serializer.avro\",\"fields\":[{\"name\":\"intField\",\"type\":\"İnt\"}]}");

      In the code generated from the schema, locale-dependent capitalization turns the letter "i" into "İ" (dotted capital I); likewise, "I" lowercases to "ı" (dotless small i).
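The compiler error above is consistent with the generated source containing a non-ASCII "İ" stored in a non-UTF-8 encoding. As an illustration only, assuming the generator wrote the file in a single-byte Turkish charset such as ISO-8859-9 (an assumption; the report does not say which encoding was used), this sketch shows that "İnt" then starts with the byte 0xDD, which a strict UTF-8 decoder rejects. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // Assumption: the generated source was written in ISO-8859-9 rather than UTF-8.
    static byte[] encodeTurkishLatin(String s) {
        return s.getBytes(Charset.forName("ISO-8859-9"));
    }

    // Decode strictly as UTF-8, as a compiler run with -encoding UTF-8 expects.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encodeTurkishLatin("\u0130nt"); // "İnt"
        System.out.printf("first byte: 0x%02X%n", bytes[0] & 0xFF); // first byte: 0xDD
        System.out.println(isValidUtf8(bytes));                      // false
        System.out.println(isValidUtf8("int".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

The plain ASCII "int" is valid in both encodings, which is why the problem only surfaces once locale-dependent casing introduces a non-ASCII letter.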

      The same bug exists in OPENEJB-1071, OAK-260, and IBATIS-218.

        Activity

        James Aley added a comment -

        We ship Avro to Android devices and use it for data serialisation. We have a lot of Turkish users, and have consequently run into a couple of issues relating to this. I thought perhaps it would be a good idea to mention the two ways we've seen this manifest itself so far:

        • Schema fingerprint generation produces different values for the same schemata when the locale is set to Turkish.
        • The "order": "ignore" annotation causes schema parsing to fail when the enum is loaded dynamically at runtime using Enum.valueOf() in generated Java code, because uppercasing the word "ignore" produces the Turkish dotted capital I.
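To illustrate the first point, here is a sketch that derives a type name by lowercasing an enum constant name (the pattern Kevin Schultz quotes for Schema.Type), then fingerprints the Parsing Canonical Form of a primitive int schema with the CRC-64-AVRO algorithm given in the Avro specification. The class and helper names are hypothetical; the point is only that a locale-dependent type name changes the fingerprint.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class FingerprintLocaleDemo {
    // CRC-64-AVRO ("Rabin") fingerprint, following the reference code in the Avro spec.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    static final long[] FP_TABLE = new long[256];

    static {
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    // Hypothetical stand-in for Schema.Type: the JSON type name is the
    // constant name lowercased with the given locale.
    static String typeName(String constantName, Locale locale) {
        return constantName.toLowerCase(locale);
    }

    // The Parsing Canonical Form of a primitive schema is its quoted type name.
    static long fingerprintOfPrimitive(String constantName, Locale locale) {
        String canonical = "\"" + typeName(constantName, locale) + "\"";
        return fingerprint64(canonical.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        long neutral = fingerprintOfPrimitive("INT", Locale.ROOT);
        long turkish = fingerprintOfPrimitive("INT", Locale.forLanguageTag("tr-TR"));
        System.out.println(Long.toHexString(neutral));
        System.out.println(neutral == turkish); // false: "int" vs "\u0131nt"
    }
}
```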
        Kevin Schultz added a comment -

        We are seeing the same error using Avro on Android. I dug around a bit, and at least for the fingerprint generation problem, my guess is that it is caused by a few places where the Locale is not set explicitly:

        org.apache.avro.generic.GenericData.java: builder.append(hex.toUpperCase());
        org.apache.avro.Schema.java: order = Field.Order.valueOf(orderNode.getTextValue().toUpperCase());
        org.apache.avro.Schema.java: private Type() { this.name = this.name().toLowerCase(); }
        org.apache.avro.Schema.java: private Order() { this.name = this.name().toLowerCase(); }
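The usual fix for call sites like these is to pass an explicit locale. A sketch using a hypothetical OrderParsingDemo.Order enum that mirrors the shape of Avro's Field.Order (it is not Avro's class). Locale.ROOT is an assumption here; Locale.ENGLISH behaves the same for these ASCII names, and the committed patch may use either.

```java
import java.util.Locale;

final class OrderParsingDemo {
    // Hypothetical mirror of Field.Order; the lowercase JSON name is computed
    // with a pinned locale instead of the JVM default.
    enum Order {
        ASCENDING, DESCENDING, IGNORE;

        private final String lowerName = name().toLowerCase(Locale.ROOT);

        String jsonName() {
            return lowerName;
        }
    }

    // Locale-dependent parsing: equivalent to the quoted pattern when
    // `locale` is the JVM default.
    static Order parseWith(String text, Locale locale) {
        return Order.valueOf(text.toUpperCase(locale));
    }

    // Locale-neutral parsing: always uppercase with Locale.ROOT.
    static Order parse(String text) {
        return parseWith(text, Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(parse("ignore"));         // IGNORE
        System.out.println(Order.IGNORE.jsonName()); // ignore
        try {
            // Under Turkish casing "ignore" becomes "\u0130GNORE", which matches no constant.
            parseWith("ignore", Locale.forLanguageTag("tr-TR"));
        } catch (IllegalArgumentException e) {
            System.out.println("Turkish locale: valueOf failed");
        }
    }
}
```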
        Kevin Schultz added a comment -

        I have a test case that illustrates the issue and a patch to fix it; I just need to find my way through the Avro contribution process (and remember how to use SVN).
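A regression test for this kind of bug typically switches the JVM default locale around the code under test. The sketch below is not Kevin Schultz's test; it is a hypothetical helper (withDefaultLocale) that runs an action under a temporary default locale and restores the previous default afterwards.

```java
import java.util.Locale;
import java.util.function.Supplier;

final class DefaultLocaleTestSketch {
    // Runs the action with the JVM default locale temporarily switched,
    // restoring the previous default even if the action throws.
    static <T> T withDefaultLocale(Locale locale, Supplier<T> action) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            return action.get();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Buggy pattern: relies on the default locale.
        String buggy = withDefaultLocale(turkish, () -> "INT".toLowerCase());
        // Fixed pattern: pins the locale explicitly.
        String fixed = withDefaultLocale(turkish, () -> "INT".toLowerCase(Locale.ROOT));
        System.out.println(buggy.equals("int")); // false
        System.out.println(fixed.equals("int")); // true
    }
}
```

One caveat: enum constants that compute their names in a constructor (like the Type and Order constructors quoted above) are initialized once, when the class is first loaded, so switching the default afterwards will not reproduce the bug. For those, the locale has to be Turkish before class loading, for example by starting the JVM with -Duser.language=tr.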

        https://github.com/krschultz/avro/commit/39d6b9db492b0f5f4d0ed1a32f0cb5c7be0fa11f

        Ryan Blue added a comment -

        Thanks, Kevin Schultz! Feel free to open a pull request against the Avro GitHub project (and post a link) if that's easier. Otherwise, svn diff is your friend. I haven't used SVN in years; I do all my development in git and sync through git-svn.

        ASF GitHub Bot added a comment -

        GitHub user krschultz opened a pull request:

        https://github.com/apache/avro/pull/63

        AVRO-1493: Java: Schema fingerprint vary by locale

        fixes https://issues.apache.org/jira/browse/AVRO-1493

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/krschultz/avro AVRO-1493

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/avro/pull/63.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #63


        commit 39d6b9db492b0f5f4d0ed1a32f0cb5c7be0fa11f
        Author: Kevin Schultz <kschultz@gilt.com>
        Date: 2015-12-09T21:01:59Z

        AVRO-1493: Java: Schema fingerprint vary by locale


        ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/avro/pull/63

        ASF GitHub Bot added a comment -

        GitHub user krschultz opened a pull request:

        https://github.com/apache/avro/pull/69

        AVRO-1493: Java: Schema fingerprint vary by locale

        fixes https://issues.apache.org/jira/browse/AVRO-1493

        was originally https://github.com/apache/avro/pull/63 before the SVN -> git migration

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/krschultz/avro AVRO-1493

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/avro/pull/69.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #69


        commit e7de3f850f66afbb72813e542c21e45d3bf670e6
        Author: Kevin Schultz <kschultz@gilt.com>
        Date: 2015-12-09T21:01:59Z

        AVRO-1493: Java: Schema fingerprint vary by locale


        ASF subversion and git services added a comment -

        Commit 5e6ffb8d444c0ed3fb6d0180718a9a7c131f2ce6 in avro's branch refs/heads/master from Kevin Schultz
        [ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=5e6ffb8 ]

        AVRO-1493: Java: Schema fingerprint vary by locale

        Ryan Blue added a comment -

        I committed the fix. Thanks Kevin Schultz! And thank you Serkan Taş for reporting the problem!


          People

          • Assignee: Kevin Schultz
          • Reporter: Serkan Taş
          • Votes: 1
          • Watchers: 6
