Solr
  1. Solr
  2. SOLR-700

NumberFormatTransformer should have configurable locales

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Labels:
      None

      Description

      NumberFormatTransformer uses a NumberFormatter which relies on the system locale. This makes it impossible to use NumberFormatTransformer with data whose locale does not match the system locale.

      TestNumberFormatTransformer fails on some locales for similar reasons because the grouping symbol differs in different locales.

      This issue adds a locale attribute for NumberFormatTransformer which allows the user to specify the locale which should be used for formatting. The locale must be specified as land-country e.g. en-US

      <field column="myNumber" formatStyle="number" locale="de-DE" />
      
      1. SOLR-700.patch
        6 kB
        Shalin Shekhar Mangar
      2. handler.dataimport.NumberFormatTransformer-locale.patch..txt
        3 kB
        Stefan Oestreicher
      3. handler.dataimport.NumberFormatTransformer-locale.patch..txt
        3 kB
        Stefan Oestreicher
      4. handler.dataimport.NumberFormatTransformer-locale.patch..txt
        3 kB
        Stefan Oestreicher
      5. handler.dataimport.NumberFormatTransformer-locale.patch..txt
        2 kB
        Stefan Oestreicher
      6. schema.DateField-locale.patch.txt
        1 kB
        Stefan Oestreicher
      7. handler.dataimport.TestNumberFormatTransformer-locale.patch.txt
        2 kB
        Stefan Oestreicher

        Activity

        Hide
        Stefan Oestreicher added a comment -

        based on Shalins suggestion to add an attribute for the locale I prepared another patch implementing those changes.

        Show
        Stefan Oestreicher added a comment - based on Shalins suggestion to add an attribute for the locale I prepared another patch implementing those changes.
        Hide
        Shalin Shekhar Mangar added a comment -

        Thanks Stefan.

        I was thinking about having a locale attribute on each field element in data-config.xml. We should modify the NumberFormatTransformer#transformRow method to read each field's locale attribute. Every field's attributes are exposed as a map, so we should have:

        // Each locale must be of format lang:country e.g. en-US
        String locale = fld.get("locale");
        // Create a locale object based on  language and country and pass it to the process method
        

        That way locale will be configurable on a per-field basis and no EntityProcessor would need to be modified.

        Show
        Shalin Shekhar Mangar added a comment - Thanks Stefan. I was thinking about having a locale attribute on each field element in data-config.xml. We should modify the NumberFormatTransformer#transformRow method to read each field's locale attribute. Every field's attributes are exposed as a map, so we should have: // Each locale must be of format lang:country e.g. en-US String locale = fld.get( "locale" ); // Create a locale object based on language and country and pass it to the process method That way locale will be configurable on a per-field basis and no EntityProcessor would need to be modified.
        Hide
        Stefan Oestreicher added a comment -

        I see, that's certainly more flexible. I modified the patch.

        Show
        Stefan Oestreicher added a comment - I see, that's certainly more flexible. I modified the patch.
        Hide
        Stefan Oestreicher added a comment -

        minor update using the LOCALE constant instead of literal String.

        Show
        Stefan Oestreicher added a comment - minor update using the LOCALE constant instead of literal String.
        Hide
        Stefan Oestreicher added a comment -

        setting the locale to the default locale just to have it changed afterwards if the parameter exists is not very nice I guess, so I changed that as well.

        Furthermore the user should probably be notified if he supplies an invalid locale, but I'm unsure how to proceed in this case, so I leave this for now as it is.

        Show
        Stefan Oestreicher added a comment - setting the locale to the default locale just to have it changed afterwards if the parameter exists is not very nice I guess, so I changed that as well. Furthermore the user should probably be notified if he supplies an invalid locale, but I'm unsure how to proceed in this case, so I leave this for now as it is.
        Hide
        Shalin Shekhar Mangar added a comment -

        This is great, thanks Stefan!

        IMO, one JIRA issue should deal with one task. Do you mind opening another issue for the DateField and LegacyDateField bug and attach it's patch there? I shall commit your NumberFormatTransformer enhancement patch, edit this issue's description and update the wiki docs.

        Show
        Shalin Shekhar Mangar added a comment - This is great, thanks Stefan! IMO, one JIRA issue should deal with one task. Do you mind opening another issue for the DateField and LegacyDateField bug and attach it's patch there? I shall commit your NumberFormatTransformer enhancement patch, edit this issue's description and update the wiki docs.
        Hide
        Stefan Oestreicher added a comment -

        done: SOLR-701.

        Show
        Stefan Oestreicher added a comment - done: SOLR-701 .
        Hide
        Shalin Shekhar Mangar added a comment -

        A new patch (SOLR-700.patch) based on Stefan's work with the following changes:

        1. Changed the locale parsing code to use pre-compiled regex.
        2. Throw an exception if an invalid value (which does not match the regex) is specified. Interestingly, the java.util.Locale class's docs do not say anything about invalid language or countries.
        3. Modified TestNumberFormatTransformer to demonstrate the fix and enhancement.

        I will commit this shortly.

        Show
        Shalin Shekhar Mangar added a comment - A new patch ( SOLR-700 .patch) based on Stefan's work with the following changes: Changed the locale parsing code to use pre-compiled regex. Throw an exception if an invalid value (which does not match the regex) is specified. Interestingly, the java.util.Locale class's docs do not say anything about invalid language or countries. Modified TestNumberFormatTransformer to demonstrate the fix and enhancement. I will commit this shortly.
        Hide
        Stefan Oestreicher added a comment -

        Nice. I also stumbled upon the fact that the java docs don't say anything about invalid locales. I quickly tested that and interestingly the DateFormatSymbols instance returned "." as decimal and "," as grouping separator. I would've expected the default locale (which is de-AT in my case) to be used but obviously it did not. In any case the getISO3Country method of the Locale class (among others) throws an exception if it's invoked on an invalid locale. Maybe it would be best to check all available locales explicitly instead of relying on the regex, possibly by constructing a static HashMap of them?!

        Show
        Stefan Oestreicher added a comment - Nice. I also stumbled upon the fact that the java docs don't say anything about invalid locales. I quickly tested that and interestingly the DateFormatSymbols instance returned "." as decimal and "," as grouping separator. I would've expected the default locale (which is de-AT in my case) to be used but obviously it did not. In any case the getISO3Country method of the Locale class (among others) throws an exception if it's invoked on an invalid locale. Maybe it would be best to check all available locales explicitly instead of relying on the regex, possibly by constructing a static HashMap of them?!
        Hide
        Shalin Shekhar Mangar added a comment -

        In any case the getISO3Country method of the Locale class (among others) throws an exception if it's invoked on an invalid locale. Maybe it would be best to check all available locales explicitly instead of relying on the regex, possibly by constructing a static HashMap of them?!

        We don't need to be paranoid about this because if such a Locale is used for parsing, the NumberFormat#getInstance will throw a MissingResourceException for the lack of corresponding resource bundle.

        Reading through the docs, another area I'm becoming concerned about is partial parsing of the string. Pehaps we should use ParsePosition and verify that the whole string was actually used. See the section on validation at http://www.ibm.com/developerworks/java/library/j-numberformat/

        Show
        Shalin Shekhar Mangar added a comment - In any case the getISO3Country method of the Locale class (among others) throws an exception if it's invoked on an invalid locale. Maybe it would be best to check all available locales explicitly instead of relying on the regex, possibly by constructing a static HashMap of them?! We don't need to be paranoid about this because if such a Locale is used for parsing, the NumberFormat#getInstance will throw a MissingResourceException for the lack of corresponding resource bundle. Reading through the docs, another area I'm becoming concerned about is partial parsing of the string. Pehaps we should use ParsePosition and verify that the whole string was actually used. See the section on validation at http://www.ibm.com/developerworks/java/library/j-numberformat/
        Hide
        Stefan Oestreicher added a comment -

        We don't need to be paranoid about this [...]

        Ok.

        Reading through the docs, another area I'm becoming concerned about is partial parsing of the string

        I see. Interestingly this behaviour of accepting invalid strings seems to only apply to the generic formatter. I wrote some tests and one of them uses a percent formatter (I also tested currency) and that throws a ParseException.
        Fixing that seems to be even easier than shown in the example I already have a patch ready but I guess I should open another issue for this (seems pretty seperate to me)?

        Show
        Stefan Oestreicher added a comment - We don't need to be paranoid about this [...] Ok. Reading through the docs, another area I'm becoming concerned about is partial parsing of the string I see. Interestingly this behaviour of accepting invalid strings seems to only apply to the generic formatter. I wrote some tests and one of them uses a percent formatter (I also tested currency) and that throws a ParseException. Fixing that seems to be even easier than shown in the example I already have a patch ready but I guess I should open another issue for this (seems pretty seperate to me)?
        Hide
        Shalin Shekhar Mangar added a comment -

        Fixing that seems to be even easier than shown in the example I already have a patch ready but I guess I should open another issue for this (seems pretty seperate to me)?

        By all means, go ahead

        I will commit this one now.

        Show
        Shalin Shekhar Mangar added a comment - Fixing that seems to be even easier than shown in the example I already have a patch ready but I guess I should open another issue for this (seems pretty seperate to me)? By all means, go ahead I will commit this one now.
        Hide
        Shalin Shekhar Mangar added a comment -

        Committed revision 686176.

        Thanks Stefan!

        Show
        Shalin Shekhar Mangar added a comment - Committed revision 686176. Thanks Stefan!

          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Stefan Oestreicher
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development