AVRO-1493: Avoid the "Turkish Locale Problem"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.6
    • Fix Version/s: 1.8.1
    • Component/s: java
    • Labels:
      None
    • Environment:

      Hadoop trunk build fails on Mac OS with the Turkish locale.

      Description

      The locale-dependent String.toUpperCase() and String.toLowerCase() cause unexpected behavior if the locale is Turkish.
      Not sure about String.equalsIgnoreCase(..).
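The behavior described above can be reproduced with a minimal sketch like the following. It assumes a standard JDK; the class name TurkishCaseDemo is made up for illustration. The Turkish locale is passed explicitly, which is equivalent to what the no-argument toUpperCase() and toLowerCase() do when Turkish is the JVM default locale. Non-ASCII letters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.util.Locale;

final class TurkishCaseDemo {
    // Turkish locale, passed explicitly instead of changing the JVM default.
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    public static void main(String[] args) {
        // Under Turkish rules 'i' upper-cases to U+0130 (capital I with dot above).
        System.out.println("int".toUpperCase(TURKISH).equals("\u0130NT")); // true

        // Likewise 'I' lower-cases to U+0131 (dotless small i).
        System.out.println("INT".toLowerCase(TURKISH).equals("\u0131nt")); // true

        // Locale.ROOT gives the locale-neutral mapping on every machine.
        System.out.println("INT".toLowerCase(Locale.ROOT)); // int

        // String.equalsIgnoreCase is documented as not taking the locale into account.
        System.out.println("INT".equalsIgnoreCase("int")); // true
    }
}
```

As for the open question about String.equalsIgnoreCase(..): its Javadoc states that it does not take locale into account, so it should not be affected by this problem.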

      Here is the error :

      [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-common: Compilation failure
      [ERROR] /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244] unmappable character for encoding UTF-8
      [ERROR] -> [Help 1]
      org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-common: Compilation failure
      /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244] unmappable character for encoding UTF-8

      When I checked the generated code, I discovered the reason for the error:

      public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"AvroRecord\",\"namespace\":\"org.apache.hadoop.io.serializer.avro\",\"fields\":[{\"name\":\"intField\",\"type\":\"Ýnt\"}]}");

      In the code generated from the schema, locale-dependent case conversion turns the letter "i" into "Ý"; the same presumably happens for "I", which becomes "ı".

      The same bug exists in OPENEJB-1071, OAK-260, and IBATIS-218.

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/avro/pull/69

          rdblue Ryan Blue added a comment -

          I committed the fix. Thanks Kevin Schultz! And thank you Serkan Taş for reporting the problem!

          jira-bot ASF subversion and git services added a comment -

          Commit 5e6ffb8d444c0ed3fb6d0180718a9a7c131f2ce6 in avro's branch refs/heads/master from Kevin Schultz
          [ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=5e6ffb8 ]

          AVRO-1493: Java: Schema fingerprint vary by locale

          githubbot ASF GitHub Bot added a comment -

          GitHub user krschultz opened a pull request:

          https://github.com/apache/avro/pull/69

          AVRO-1493: Java: Schema fingerprint vary by locale

          fixes https://issues.apache.org/jira/browse/AVRO-1493

          was originally https://github.com/apache/avro/pull/63 before the SVN -> git migration

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/krschultz/avro AVRO-1493

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/avro/pull/69.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #69


          commit e7de3f850f66afbb72813e542c21e45d3bf670e6
          Author: Kevin Schultz <kschultz@gilt.com>
          Date: 2015-12-09T21:01:59Z

          AVRO-1493: Java: Schema fingerprint vary by locale


          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/avro/pull/63

          githubbot ASF GitHub Bot added a comment -

          GitHub user krschultz opened a pull request:

          https://github.com/apache/avro/pull/63

          AVRO-1493: Java: Schema fingerprint vary by locale

          fixes https://issues.apache.org/jira/browse/AVRO-1493

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/krschultz/avro AVRO-1493

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/avro/pull/63.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #63


          commit 39d6b9db492b0f5f4d0ed1a32f0cb5c7be0fa11f
          Author: Kevin Schultz <kschultz@gilt.com>
          Date: 2015-12-09T21:01:59Z

          AVRO-1493: Java: Schema fingerprint vary by locale


          rdblue Ryan Blue added a comment -

          Thanks, Kevin Schultz! Feel free to open a pull request against the Avro github project (and post a link) if that's easier. Otherwise, svn diff is your friend. I haven't used SVN in years – I do everything through git-svn and development in git.

          krschultz Kevin Schultz added a comment -

          I have a test case that illustrates the issue and a patch to fix it; I just need to find my way through the Avro contribution process (and remember how to use SVN).

          https://github.com/krschultz/avro/commit/39d6b9db492b0f5f4d0ed1a32f0cb5c7be0fa11f

          krschultz Kevin Schultz added a comment -

          We are seeing the same error using Avro on Android. I dug around a bit and at least for the fingerprint generation problem my guess is that the problem is caused by a few places where the Locale is not explicitly set.

          org.apache.avro.generic.GenericData.java: builder.append(hex.toUpperCase());
          org.apache.avro.Schema.java: order = Field.Order.valueOf(orderNode.getTextValue().toUpperCase());
          org.apache.avro.Schema.java: private Type() { this.name = this.name().toLowerCase(); }
          org.apache.avro.Schema.java: private Order() { this.name = this.name().toLowerCase(); }

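One common way to make call sites like these independent of the default locale is to pass an explicit locale such as Locale.ROOT. The sketch below shows that pattern under stated assumptions: LocaleSafeNames, its nested Type enum, and toHex are hypothetical stand-ins written for this illustration, not Avro's classes, and this is not necessarily what the committed patch does.

```java
import java.util.Locale;

final class LocaleSafeNames {
    // Hypothetical stand-in for an enum that derives a lower-case name
    // from its constant name, as in the constructors listed above.
    enum Type {
        RECORD, INT, STRING;

        private final String name;

        Type() {
            // Locale.ROOT keeps "INT" -> "int" even when the default locale is Turkish.
            this.name = name().toLowerCase(Locale.ROOT);
        }

        String getName() {
            return name;
        }
    }

    // Hypothetical helper: upper-case hex encoding with a pinned locale.
    // Hex digits contain no 'i', so Turkish casing would not change them,
    // but pinning the locale costs nothing and removes the question.
    static String toHex(byte[] bytes) {
        StringBuilder builder = new StringBuilder();
        for (byte b : bytes) {
            String hex = Integer.toHexString(b & 0xFF).toUpperCase(Locale.ROOT);
            if (hex.length() == 1) {
                builder.append('0'); // left-pad single digits
            }
            builder.append(hex);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(Locale.forLanguageTag("tr-TR"));
        try {
            System.out.println(Type.INT.getName());                    // int
            System.out.println(toHex(new byte[] {0x0A, (byte) 0xFF})); // 0AFF
        } finally {
            Locale.setDefault(previous); // restore the original default
        }
    }
}
```

Locale.ENGLISH would behave the same way for these ASCII names; the point is only that the locale is stated rather than inherited from the device or build machine.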
          jaley James Aley added a comment -

          We ship Avro to Android devices and use it for data serialisation. We have a lot of Turkish users, and have consequently run into a couple of issues relating to this. I thought perhaps it would be a good idea to mention the two ways we've seen this manifest itself so far:

          • Schema fingerprint generation behaves differently (generates different values for the same schemata) if the locale is set to Turkish.
          • The "order: ignore" annotation causes schema parsing to fail: upper-casing the word "ignore" under the Turkish locale produces the dotted capital İ, so the dynamic Enum.valueOf() lookup performed at runtime in generated Java code finds no matching constant.
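Both symptoms in the list above can be illustrated with a small sketch. It assumes a standard JDK; OrderLookupDemo and its Order enum are hypothetical stand-ins, not Avro's own types, and the byte comparison is only a toy illustration of why a hash over schema text can differ, not Avro's fingerprint algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

final class OrderLookupDemo {
    // Hypothetical stand-in for a sort-order enum with an IGNORE constant.
    enum Order { ASCENDING, DESCENDING, IGNORE }

    // Upper-cases the text with the given locale, then looks up the constant.
    static Order parse(String text, Locale locale) {
        return Order.valueOf(text.toUpperCase(locale));
    }

    // True if lower-casing the name with the given locale yields the same
    // UTF-8 bytes as the plain ASCII lower-case form.
    static boolean sameBytes(String name, String asciiLower, Locale locale) {
        byte[] actual = name.toLowerCase(locale).getBytes(StandardCharsets.UTF_8);
        byte[] expected = asciiLower.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(actual, expected);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        System.out.println(parse("ignore", Locale.ROOT)); // IGNORE
        try {
            parse("ignore", turkish); // looks up a name starting with U+0130
        } catch (IllegalArgumentException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }

        // Different bytes in, different hash out: any digest computed over
        // text that went through a Turkish case mapping will not match.
        System.out.println(sameBytes("INT", "int", Locale.ROOT)); // true
        System.out.println(sameBytes("INT", "int", turkish));     // false
    }
}
```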

            People

            • Assignee:
              krschultz Kevin Schultz
              Reporter:
              serkan_tas Serkan Taş