Lucene - Core
LUCENE-6978

Make LuceneTestCase use language tags instead of parsing locales by hand

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: modules/test-framework
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Since we are on Java 7, the JDK supports standardized language tags as identifiers for Locales. Previous versions of the JDK had no way to convert the output of Locale#toString() back into a Locale, so we had our own parser, which broke several times when the JDK changed its Locale internals.

      This patch will do the following:

      • When printing the reproduce line, it will use Locale#toLanguageTag(), so you can identify the locale in standardized form. The most notable change (next to more flexibility around Asian languages) is the move away from underscores. So it prints "en-US", not "en_US".
      • The code that parses a locale uses Locale's Builder and sets the language tag. This will fail if the tag is invalid! A trap is Locale#forLanguageTag, because this one silently returns the root locale if the tag is unparseable...
      • The random locale is chosen from all language tags, which are extracted from the JDK as a String[] array.

      I would also like to place Locale#forLanguageTag on the forbidden list and disallow directly calling Locale#toString(); the latter is legacy API (according to the Java 7 Javadocs). This would fail code that calls toString() directly, but not implicit conversions such as "my Locale: " + locale. Of course we cannot catch all bad uses.
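The difference between the strict and the lenient parser described above can be sketched as follows. This is a minimal illustration, not code from the patch: the class name `LocaleTagDemo` and the helper `parseStrict` are made up for this example.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class LocaleTagDemo {
  /** Strict parse: throws IllformedLocaleException for a malformed tag. */
  static Locale parseStrict(String languageTag) {
    return new Locale.Builder().setLanguageTag(languageTag).build();
  }

  public static void main(String[] args) {
    Locale us = parseStrict("en-US");
    System.out.println(us.toLanguageTag()); // en-US (standardized form, hyphen)
    System.out.println(us.toString());      // en_US (legacy form, underscore)

    // Lenient parse: the legacy underscore form is not a valid tag,
    // yet no error is raised and the root locale comes back.
    System.out.println(Locale.forLanguageTag("en_US").equals(Locale.ROOT)); // true

    try {
      parseStrict("en_US");
    } catch (IllformedLocaleException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The strict variant surfaces a bad `-Dtests.locale` value immediately, whereas the lenient one would quietly run the test under the root locale.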

      1. LUCENE-6978.patch
        19 kB
        Uwe Schindler
      2. LUCENE-6978.patch
        14 kB
        Uwe Schindler
      3. LUCENE-6978.patch
        5 kB
        Uwe Schindler
      4. LUCENE-6978-5x.patch
        23 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment -

        See LUCENE-6973 for why Locale.forLanguageTag() is broken and should be forbidden.

        Uwe Schindler added a comment - - edited

        First patch with LTC changes only. I will now work on the forbidden stuff (where we may need to fix some other Lucene code parts which try to parse locales or print them).

        This patch requires Java 8. We can backport (which I suggest), but the streams API needs to be replaced by a more verbose list-sort-addSet-toArray sequence in a static block.
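A Java 7 compatible variant might look roughly like the sketch below. It is an assumption about the shape of such a backport, not the committed code: the class name `AvailableLanguageTags` and the field `TAGS` are hypothetical, and a `TreeSet` is swapped in for the list-sort-addSet-toArray sequence because it sorts and de-duplicates in one step.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class AvailableLanguageTags {
  /** Sorted, duplicate-free language tags of all locales the running JVM reports. */
  static final String[] TAGS;

  static {
    // No streams or lambdas here, so this compiles with a Java 7 source level.
    Set<String> tags = new TreeSet<String>();
    for (Locale locale : Locale.getAvailableLocales()) {
      tags.add(locale.toLanguageTag());
    }
    TAGS = tags.toArray(new String[tags.size()]);
  }
}
```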

        Uwe Schindler added a comment - - edited

        I discussed that with Robert Muir already; the sort and distinct are done for these reasons:

        • sorting is done to have a reproducible ordered list (random index)
        • duplicate language tags should be filtered. The reason why we choose the locale from the language tag list is one limitation: the to/forLanguageTag will not round-trip (as the tags are normalized while parsing). So it's better to have a list of tags instead of instances.
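The two reasons above can be sketched with the Java 8 streams API. This is only an illustration under assumptions: `RandomLocalePicker`, `availableLanguageTags` and `pick` are hypothetical names, and a plain `java.util.Random` stands in for the seeded random source of the test framework.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Random;

final class RandomLocalePicker {
  /** Sorted so that a given random index always maps to the same tag; distinct to drop duplicates. */
  static String[] availableLanguageTags() {
    return Arrays.stream(Locale.getAvailableLocales())
        .map(Locale::toLanguageTag)
        .sorted()
        .distinct()
        .toArray(String[]::new);
  }

  /** Picks a tag by index and only ever converts String -> Locale, never the other way round. */
  static Locale pick(Random random) {
    String[] tags = availableLanguageTags();
    String tag = tags[random.nextInt(tags.length)];
    return new Locale.Builder().setLanguageTag(tag).build();
  }
}
```

Because the list is ordered, two runs with the same seed on the same JVM select the same tag, which is what makes a reproduce line meaningful.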
        Uwe Schindler added a comment -

        The list of language tags in Java 8 is: [ar, ar-AE, ar-BH, ar-DZ, ar-EG, ar-IQ, ar-JO, ar-KW, ar-LB, ar-LY, ar-MA, ar-OM, ar-QA, ar-SA, ar-SD, ar-SY, ar-TN, ar-YE, be, be-BY, bg, bg-BG, ca, ca-ES, cs, cs-CZ, da, da-DK, de, de-AT, de-CH, de-DE, de-GR, de-LU, el, el-CY, el-GR, en, en-AU, en-CA, en-GB, en-IE, en-IN, en-MT, en-NZ, en-PH, en-SG, en-US, en-ZA, es, es-AR, es-BO, es-CL, es-CO, es-CR, es-CU, es-DO, es-EC, es-ES, es-GT, es-HN, es-MX, es-NI, es-PA, es-PE, es-PR, es-PY, es-SV, es-US, es-UY, es-VE, et, et-EE, fi, fi-FI, fr, fr-BE, fr-CA, fr-CH, fr-FR, fr-LU, ga, ga-IE, he, he-IL, hi, hi-IN, hr, hr-HR, hu, hu-HU, id, id-ID, is, is-IS, it, it-CH, it-IT, ja, ja-JP, ja-JP-u-ca-japanese-x-lvariant-JP, ko, ko-KR, lt, lt-LT, lv, lv-LV, mk, mk-MK, ms, ms-MY, mt, mt-MT, nl, nl-BE, nl-NL, nn-NO, no, no-NO, pl, pl-PL, pt, pt-BR, pt-PT, ro, ro-RO, ru, ru-RU, sk, sk-SK, sl, sl-SI, sq, sq-AL, sr, sr-BA, sr-CS, sr-Latn, sr-Latn-BA, sr-Latn-ME, sr-Latn-RS, sr-ME, sr-RS, sv, sv-SE, th, th-TH, th-TH-u-nu-thai-x-lvariant-TH, tr, tr-TR, uk, uk-UA, und, vi, vi-VN, zh, zh-CN, zh-HK, zh-SG, zh-TW]

        Dawid Weiss added a comment -

        This is extracted from one particular JVM – is this going to be a problem if some of these are not available everywhere?

        Shai Erera added a comment -

        Patch looks good. And +1 for adding Locale.forLanguageTag() and Locale.toString() to the forbidden APIs.

        Dawid Weiss added a comment -

        Never mind, I looked at the patch and it makes sense.

        Uwe Schindler added a comment -

        I just extracted the list for reference. Of course the patch takes whatever the runtime supports.

        I also did some round-trip tests. With the current patch we are always safe, because we only rely on a list of language tags and choose one from those strings, which are distinct and unique. So we only rely on the direction String -> LocaleFromLanguageTag, never vice versa.

        Robert Muir added a comment -

        +1

        Uwe Schindler added a comment - - edited

        One that does not round-trip is a "deprecated locale": no_NO_NY, which is replaced by nn_NO. The old one is identical but does not compare equal. We are still fine. With the new code we really test all available ones, with no risks.
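The failed round trip can be reproduced with a few lines. This is a small sketch; the class and method names are invented for the example.

```java
import java.util.Locale;

final class NynorskRoundTrip {
  /** Legacy representation of Norwegian Nynorsk: language "no", country "NO", variant "NY". */
  @SuppressWarnings("deprecation") // the Locale constructors are deprecated in recent JDKs
  static Locale legacy() {
    return new Locale("no", "NO", "NY");
  }

  /** Converts a locale to its language tag and parses that tag again. */
  static Locale roundTrip(Locale locale) {
    return new Locale.Builder().setLanguageTag(locale.toLanguageTag()).build();
  }

  public static void main(String[] args) {
    Locale legacy = legacy();
    Locale parsed = roundTrip(legacy);
    System.out.println(legacy.toLanguageTag()); // nn-NO
    System.out.println(parsed.toLanguageTag()); // nn-NO
    System.out.println(legacy.equals(parsed));  // false
  }
}
```

Both locales print the same tag, yet they differ in language and variant, which is why a list of tag strings is a safer source of truth than a list of Locale instances.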

        Uwe Schindler added a comment -

        Patch adding forbiddenapis and fixes. In Solr I replaced 2 locale parsing parts to accept both (old and new BCP47) locale names. I added a SuppressForbidden for it.

        Unfortunately forbidding Locale#toString() does not always work, because if you concat strings like "string" + locale, it internally transforms that to new StringBuilder("string").append(locale), which calls toString on Object. So it can never be detected by forbidden.

        I will review other Locale usage and parsing later.
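The concatenation problem from this comment can be shown with a short sketch. The class `ConcatDemo` and its methods are hypothetical; the remarks about what a signature-based checker can see restate the comment above and were not verified against a particular forbiddenapis version.

```java
import java.util.Locale;

final class ConcatDemo {
  /** Direct call: the class file contains an explicit reference to Locale#toString(). */
  static String explicit(Locale locale) {
    return "my Locale: " + locale.toString();
  }

  /** Implicit conversion: the compiler generates the string conversion, so no direct call appears. */
  static String implicit(Locale locale) {
    return "my Locale: " + locale;
  }

  /** Preferred form: prints the standardized language tag. */
  static String preferred(Locale locale) {
    return "my Locale: " + locale.toLanguageTag();
  }

  public static void main(String[] args) {
    System.out.println(explicit(Locale.US));  // my Locale: en_US
    System.out.println(implicit(Locale.US));  // my Locale: en_US
    System.out.println(preferred(Locale.US)); // my Locale: en-US
  }
}
```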

        Uwe Schindler added a comment -

        All tests pass. I think we can commit that for now to let it bake on trunk.

        Uwe Schindler added a comment -

        I found some more locale parsing by searching for calls to Locale constructors (using Eclipse's search for constructors).

        I fixed those. Tests are passing, I will commit later!

        ASF subversion and git services added a comment -

        Commit 1724891 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1724891 ]

        LUCENE-6978: Refactor several code places that lookup locales by string name to use BCP47 locale tag instead

        Uwe Schindler added a comment -

        Java 7 compliant patch. No other changes.

        ASF subversion and git services added a comment -

        Commit 1724893 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1724893 ]

        Merged revision(s) 1724891 from lucene/dev/trunk:
        LUCENE-6978: Refactor several code places that lookup locales by string name to use BCP47 locale tag instead

        ASF subversion and git services added a comment -

        Commit 1726118 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1726118 ]

        LUCENE-6978: Fix Morphlines locale parsing with empty string / null: use ROOT

        ASF subversion and git services added a comment -

        Commit 1726119 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1726119 ]

        Merged revision(s) 1726118 from lucene/dev/trunk:
        LUCENE-6978: Fix Morphlines locale parsing with empty string / null: use ROOT

        Uwe Schindler added a comment -

        Actually, the bug happened on Java 7 only: The empty root locale is not listed in Locale#getAvailableLocales in Java 7, but it is listed in Java 8 - this is why I did not hit that issue and why I removed the code. The commit works around that (like it did before the changes here), by reverting the if-statement.
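The kind of guard this refers to might look like the sketch below. It is not the Morphlines code: `RootAwareLookup` and `lookup` are hypothetical names, and matching on the legacy `toString()` form is an assumption made only to show why the root locale needs its own branch.

```java
import java.util.Locale;

final class RootAwareLookup {
  /** Resolves a legacy locale name against the locales the JVM reports as available. */
  static Locale lookup(String name) {
    // The root locale is handled up front, because (as noted above) Java 7
    // does not list it among the available locales.
    if (name == null || name.isEmpty()) {
      return Locale.ROOT;
    }
    for (Locale candidate : Locale.getAvailableLocales()) {
      if (candidate.toString().equals(name)) {
        return candidate;
      }
    }
    throw new IllegalArgumentException("Unknown locale: " + name);
  }
}
```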

        Steve Rowe added a comment -

        My Jenkins found a reproducible locale-triggered DIH failure on branch_5x:

          [junit4] Suite: org.apache.solr.handler.dataimport.TestVariableResolverEndToEnd
        [...]
          [junit4]   2> 24391 T75 C8 oasc.SolrException.log ERROR Exception while processing: FIRST document : SolrInputDocument(fields: [select_keyword_s=SELECT, id=1]):org.apache.solr.handler.dataimport.DataImportHandlerException: Malformed / non-existent locale: nn_NO Processing Document # 1
        [...]
           [junit4]   2> 24392 T75 C8 oasup.LogUpdateProcessorFactory$LogUpdateProcessor.finish [collection1]  webapp=null path=null params={synchronous=true&command=full-import&wt=xml&indent=true&clean=true&dataConfig=<dataConfig>+%0a<dataSource+name%3D"hsqldb"+driver%3D"${dataimporter.request.dots.in.hsqldb.driver}"+url%3D"jdbc:hsqldb:mem:."+/>+%0a<document+name%3D"TestEvaluators">+%0a<entity+name%3D"FIRST"+processor%3D"SqlEntityProcessor"+dataSource%3D"hsqldb"++query%3D"select++1+as+id,++'SELECT'+as+SELECT_KEYWORD,++CURRENT_TIMESTAMP+as+FIRST_TS+from+DUAL+"+>%0a++<field+column%3D"SELECT_KEYWORD"+name%3D"select_keyword_s"+/>+%0a++<entity+name%3D"SECOND"+processor%3D"SqlEntityProcessor"+dataSource%3D"hsqldb"+transformer%3D"TemplateTransformer"++++query%3D"${dataimporter.functions.encodeUrl(FIRST.SELECT_KEYWORD)}++1+as+SORT,++CURRENT_TIMESTAMP+as+SECOND_TS,++'${dataimporter.functions.formatDate(FIRST.FIRST_TS,+'yyyy',+'nn_NO')}'+as+SECOND1_S,+++'PORK'+AS+MEAT,++'GRILL'+AS+METHOD,++'ROUND'+AS+CUTS,++'BEEF_CUTS'+AS+WHATKIND+from+DUAL+WHERE+1%3D${FIRST.ID}+UNION+${dataimporter.functions.encodeUrl(FIRST.SELECT_KEYWORD)}++2+as+SORT,++CURRENT_TIMESTAMP+as+SECOND_TS,++'${dataimporter.functions.formatDate(FIRST.FIRST_TS,+'yyyy',+'nn_NO')}'+as+SECOND1_S,+++'FISH'+AS+MEAT,++'FRY'+AS+METHOD,++'SIRLOIN'+AS+CUTS,++'BEEF_CUTS'+AS+WHATKIND+from+DUAL+WHERE+1%3D${FIRST.ID}+ORDER+BY+SORT+">%0a+++<field+column%3D"SECOND_S"+name%3D"second_s"+/>+%0a+++<field+column%3D"SECOND1_S"+name%3D"second1_s"+/>+%0a+++<field+column%3D"second2_s"+template%3D"${dataimporter.functions.formatDate(SECOND.SECOND_TS,+'yyyy',+'nn_NO')}"+/>+%0a+++<field+column%3D"second3_s"+template%3D"${dih.functions.formatDate(SECOND.SECOND_TS,+'yyyy',+'nn_NO')}"+/>+%0a+++<field+column%3D"METHOD"+name%3D"${SECOND.MEAT}_s"/>%0a+++<field+column%3D"CUTS"+name%3D"${SECOND.WHATKIND}_mult_s"/>%0a++</entity>%0a</entity>%0a</document>+%0a</dataConfig>+%0a&commit=true}{deleteByQuery=*:*} 0 9
           [junit4]   2> 24393 T75 C8 oasc.SolrException.log ERROR Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Malformed / non-existent locale: nn_NO Processing Document # 1
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:417)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:200)
           [junit4]   2> 		at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
           [junit4]   2> 		at org.apache.solr.core.SolrCore.execute(SolrCore.java:2083)
           [junit4]   2> 		at org.apache.solr.util.TestHarness.query(TestHarness.java:311)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.TestVariableResolverEndToEnd.test(TestVariableResolverEndToEnd.java:40)
        [...]
           [junit4]   2> 	Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Malformed / non-existent locale: nn_NO Processing Document # 1
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
           [junit4]   2> 		... 46 more
           [junit4]   2> 	Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Malformed / non-existent locale: nn_NO Processing Document # 1
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DateFormatEvaluator.evaluate(DateFormatEvaluator.java:100)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.VariableResolver.resolveEvaluator(VariableResolver.java:136)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.VariableResolver.resolve(VariableResolver.java:100)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.VariableResolver.replaceTokens(VariableResolver.java:155)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.ContextImpl.replaceTokens(ContextImpl.java:257)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:515)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
           [junit4]   2> 		... 48 more
           [junit4]   2> 	Caused by: java.util.IllformedLocaleException: Invalid subtag: nn_NO [at index 0]
           [junit4]   2> 		at java.util.Locale$Builder.setLanguageTag(Locale.java:2311)
           [junit4]   2> 		at org.apache.solr.handler.dataimport.DateFormatEvaluator.evaluate(DateFormatEvaluator.java:98)
           [junit4]   2> 		... 57 more
           [junit4]   2> 	
        [...]
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestVariableResolverEndToEnd -Dtests.method=test -Dtests.seed=E764DCBE41663305 -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=nn-NO -Dtests.timezone=America/Mendoza -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] FAILURE 0.04s J2 | TestVariableResolverEndToEnd.test <<<
           [junit4]    > Throwable #1: junit.framework.AssertionFailedError
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([E764DCBE41663305:6F30E364EF9A5EFD]:0)
           [junit4]    > 	at junit.framework.Assert.fail(Assert.java:48)
           [junit4]    > 	at junit.framework.Assert.assertTrue(Assert.java:20)
           [junit4]    > 	at junit.framework.Assert.assertTrue(Assert.java:27)
           [junit4]    > 	at org.apache.solr.handler.dataimport.TestVariableResolverEndToEnd.test(TestVariableResolverEndToEnd.java:47)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
        [...]  
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene54): {}, docValues:{}, sim=DefaultSimilarity, locale=nn-NO, timezone=America/Mendoza
           [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.7.0_79 (64-bit)/cpus=16,threads=1,free=493820680,total=514326528
           [junit4]   2> NOTE: All tests run in this JVM: [TestVariableResolver, TestSimplePropertiesWriter, TestURLDataSource, TestDocBuilder2, TestSqlEntityProcessorDelta, TestDateFormatTransformer, TestRegexTransformer, TestClobTransformer, TestDataConfig, TestNestedChildren, TestXPathEntityProcessor, TestFileListWithLineEntityProcessor, TestVariableResolverEndToEnd]
           [junit4] Completed [25/38 (1!)] on J2 in 0.60s, 1 test, 1 failure <<< FAILURES!
        
        Uwe Schindler added a comment -

        I know why this happens. Will fix later. It is only partly related to this Issue.


        ASF subversion and git services added a comment -

        Commit 1726311 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1726311 ]

        LUCENE-6978: Fix usage of Locale#toString in DIH

        ASF subversion and git services added a comment -

        Commit 1726313 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1726313 ]

        Merged revision(s) 1726311 from lucene/dev/trunk:
        LUCENE-6978: Fix usage of Locale#toString in DIH

        Uwe Schindler added a comment -

        Issue should be fixed now. The problem is that forbidding Locale.toString() does not always work. The DIH test cases used string concat with Locale.getDefault(). This should work in most cases, but "nn-NO" is a special case. This Locale only exists in the list of available locales with a different (outdated) name, so the backwards-compatibility layer does not catch it.

        The general problem: we should remove support for the "old" Locale syntax, which Java no longer supports, in DIH and morphlines. But this would be a separate issue for 6.0 only.

        Thanks Steve Rowe for reporting this!
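A backwards-compatible parser of the kind described in this comment could be sketched as follows. This is an assumption about its general shape, not the DIH implementation: `LenientLocaleParser` and `parse` are hypothetical names, and only the error message text is borrowed from the log above.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class LenientLocaleParser {
  /** Accepts a BCP47 tag ("en-US") and, as a fallback, a legacy name ("en_US"). */
  static Locale parse(String name) {
    try {
      return new Locale.Builder().setLanguageTag(name).build();
    } catch (IllformedLocaleException e) {
      // Fallback: compare against the legacy toString() form of every available locale.
      for (Locale candidate : Locale.getAvailableLocales()) {
        if (candidate.toString().equals(name)) {
          return candidate;
        }
      }
      throw new IllegalArgumentException("Malformed / non-existent locale: " + name, e);
    }
  }
}
```

With such a fallback, a name like "nn_NO" is only accepted if some available locale prints exactly that legacy string. On a JVM that lists Norwegian Nynorsk only as no_NO_NY, neither branch matches, which is consistent with the failure in the log.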


          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Uwe Schindler
          • Votes:
            0
            Watchers:
            4
