Lucene - Core
  LUCENE-4589

Upgrade benchmark module's nekohtml and remove Turkish HTML element lowercasing workaround!

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: modules/benchmark
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      LUCENE-4220 added nekohtml as a new parser for HTML files in the benchmark module. Unfortunately, the nekohtml parser had the well-known lowercase dotless-i bug when running under the Turkish locale.

      Version 1.9.17 of nekohtml fixes this bug and was released a few days ago (http://nekohtml.sourceforge.net/changes.html). This issue will update it and remove the workaround.
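      The dotless-i bug comes from locale-sensitive case conversion: under the Turkish locale, Java lowercases the ASCII capital I (U+0049) to the dotless ı (U+0131), so an element name such as TITLE becomes "tıtle" and no longer matches "title". The minimal sketch below reproduces the effect with plain String.toLowerCase and shows the usual locale-independent alternative, Locale.ROOT. The class and method names are hypothetical; this is an illustration of the bug class, not nekohtml's actual fix or Lucene's removed workaround.

```java
import java.util.Locale;

public class TurkishLowercaseDemo {

    // Locale-sensitive: uses the JVM default locale, like String.toLowerCase() with no argument.
    static String lowerDefault(String s) {
        return s.toLowerCase();
    }

    // Locale-independent: the usual fix for identifiers such as HTML element names.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a JVM whose default locale is Turkish.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));

            String element = "TITLE";
            String buggy = lowerDefault(element); // 'I' becomes dotless i (U+0131), not 'i'
            String fixed = lowerRoot(element);    // plain "title"

            System.out.println("default locale: matches \"title\"? " + buggy.equals("title"));
            System.out.println("Locale.ROOT:    matches \"title\"? " + fixed.equals("title"));
        } finally {
            // Restore the original default so the rest of the JVM is unaffected.
            Locale.setDefault(saved);
        }
    }
}
```

      Running main prints false for the default-locale variant and true for the Locale.ROOT variant, which is why element-name handling in an HTML parser should not depend on the JVM default locale.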

      1. LUCENE-4589.patch
        7 kB
        Uwe Schindler

          Activity

          Uwe Schindler added a comment -

          Patch that upgrades nekohtml and removes the workaround. Without the workaround, the old nekohtml version failed the testTurkish() test, but with 1.9.17 it passes as expected.

          I will commit soon.

          Uwe Schindler added a comment -

          Committed trunk revision 1417694, 4.x revision 1417696.

          Robert Muir added a comment -

          Thanks Uwe!

          Commit Tag Bot added a comment -

          [branch_4x commit] Uwe Schindler
          http://svn.apache.org/viewvc?view=revision&revision=1417904

          LUCENE-4589: Fix maven pom

          Commit Tag Bot added a comment -

          [branch_4x commit] Uwe Schindler
          http://svn.apache.org/viewvc?view=revision&revision=1417696

          LUCENE-4589: Upgraded benchmark module's Nekohtml dependency to version 1.9.17, removing the workaround in Lucene's HTML parser for the Turkish locale

          Commit Tag Bot added a comment -

          [trunk commit] Uwe Schindler
          http://svn.apache.org/viewvc?view=revision&revision=1417901

          LUCENE-4589: Fix maven pom

          Commit Tag Bot added a comment -

          [trunk commit] Uwe Schindler
          http://svn.apache.org/viewvc?view=revision&revision=1417694

          LUCENE-4589: Upgraded benchmark module's Nekohtml dependency to version 1.9.17, removing the workaround in Lucene's HTML parser for the Turkish locale

          Commit Tag Bot added a comment -

          [branch_4x commit] Uwe Schindler
          http://svn.apache.org/viewvc?view=revision&revision=1417904

          Merged revision(s) 1417901 from lucene/dev/trunk:
          LUCENE-4589: Fix maven pom

          Commit Tag Bot added a comment -

          [branch_4x commit] Uwe Schindler
          http://svn.apache.org/viewvc?view=revision&revision=1417696

          Merged revision(s) 1417694 from lucene/dev/trunk:
          LUCENE-4589: Upgraded benchmark module's Nekohtml dependency to version 1.9.17, removing the workaround in Lucene's HTML parser for the Turkish locale

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:
