Solr / SOLR-2088

contrib/extraction fails on a Turkish computer

    Details

      Description

      reproduce with: ant test -Dtests.locale=tr_TR
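For a quick check outside the ant build, the default-locale dependence can be sketched in plain Java. This is only an illustrative sketch: `DefaultLocaleRepro` and `withDefaultLocale` are hypothetical names, not part of Solr or its test framework, and the sketch assumes the failure comes from a locale-sensitive call such as the no-argument `String.toLowerCase()`.

```java
import java.util.Locale;
import java.util.function.Supplier;

class DefaultLocaleRepro {
    // Hypothetical helper: runs the body with the given JVM default locale
    // and always restores the previous default afterwards.
    static <T> T withDefaultLocale(Locale locale, Supplier<T> body) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            return body.get();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        // The no-argument toLowerCase() uses the JVM default locale.
        String lowered = withDefaultLocale(Locale.forLanguageTag("tr-TR"),
                () -> "TITLE".toLowerCase());
        // Under the Turkish locale the result contains a dotless i, so this prints false.
        System.out.println("title".equals(lowered));
    }
}
```

Restoring the previous default in a `finally` block keeps the locale change from leaking into other code that runs in the same JVM.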

      test:
          [junit] Running org.apache.solr.handler.ExtractingRequestHandlerTest
          [junit]  xml response was: <?xml version="1.0" encoding="UTF-8"?>
          [junit] <response>
          [junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst><result name="response" numFound="0" start="0"/>
          [junit] </response>
          [junit]
          [junit]  request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2)
          [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 3.968 sec
          [junit] Test org.apache.solr.handler.ExtractingRequestHandlerTest FAILED
      
      BUILD FAILED
      

          Activity

          Mark Miller added a comment -

          I'm running into this on my Hudson box. More info:

          Stacktrace

          junit.framework.AssertionFailedError: query failed XPath: //*[@numFound='1']
          xml response was: <?xml version="1.0" encoding="UTF-8"?>
          <response>
          <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst><result name="response" numFound="0" start="0"/>
          </response>

          request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2
          at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:320)
          at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:310)
          at org.apache.solr.handler.ExtractingRequestHandlerTest.testExtraction(ExtractingRequestHandlerTest.java:83)
          Standard Output

          NOTE: random codec of testcase 'testExtraction' was: MockSep
          NOTE: random locale of testcase 'testExtraction' was: tr
          NOTE: random timezone of testcase 'testExtraction' was: Africa/Dar_es_Salaam
          Standard Error

          25.Ağu.2010 08:51:38 org.apache.solr.common.SolrException log
          SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'a'
          at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:321)
          at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
          at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120)
          at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:125)
          at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
          at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
          at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:334)
          at org.apache.solr.handler.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:361)
          at org.apache.solr.handler.ExtractingRequestHandlerTest.testDefaultField(ExtractingRequestHandlerTest.java:149)

          Robert Muir added a comment -

          Looks like the same problem (in this case you got the random locale 'tr').

          The bug is likely a toLowerCase() call that should be toLowerCase(Locale.ENGLISH).

          All tests used to pass with this locale, definitely as of revision 945343. See LUCENE-2466.

          Was Tika upgraded since then? Perhaps the problem is in a dependency.
          I did a few quick reviews of the Solr code and nothing stood out.
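To illustrate the kind of bug suspected here, the following minimal sketch compares lower-casing under Turkish and English rules. The class name `TurkishLowerCaseDemo`, the helper `lower`, and the sample string "TITLE" are chosen for illustration only; the actual offending call has not been located in this comment.

```java
import java.util.Locale;

class TurkishLowerCaseDemo {
    // Lower-cases with an explicit locale, so the result never depends on the JVM default.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String tag = "TITLE";

        // Turkish rules map 'I' (U+0049) to dotless 'ı' (U+0131).
        String trLower = lower(tag, turkish);
        // English rules map 'I' to plain 'i'.
        String enLower = lower(tag, Locale.ENGLISH);

        System.out.println("title".equals(trLower)); // false
        System.out.println("title".equals(enLower)); // true
    }
}
```

Any code that lower-cases an identifier and then compares it against an ASCII constant will therefore miss the match when the default locale is Turkish, unless it passes a fixed locale.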

          Mark Miller added a comment -

          Yes, I think Tika was upgraded fairly recently, to a 0.8 snapshot I think.

          Robert Muir added a comment -

          OK, I'll look at Tika with this locale. Perhaps one of its own tests will be triggered.

          Robert Muir added a comment -

          Well, I found one problem in HTML parsing (TIKA-498) that causes Tika's own tests to fail.

          But I haven't tested yet with rebuilt jars to see whether this is also the problem causing this issue.

          Grant Ingersoll added a comment -

          I will fix this as part of the upgrade to Tika 0.8

          Grant Ingersoll added a comment -

          Should be resolved via SOLR-2241.

          Robert Muir added a comment -

          With Tika 0.8, this is no longer a problem. HTML/PDF extraction seems to work fine (the tests pass).

          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release


            People

            • Assignee: Grant Ingersoll
            • Reporter: Robert Muir