Lucene - Core / LUCENE-1623

Back-compat break with non-ascii field names

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4, 2.4.1
    • Fix Version/s: 2.9
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      If a field name contains non-ascii characters in a 2.3.x index, then
      on upgrade to 2.4.x unexpected problems are hit. It's possible to hit
      a "read past EOF" IOException; it's also possible to not hit an
      exception but get an incorrect field name.

      This was caused by LUCENE-510, because the FieldInfos (*.fnm) file is
      not properly versioned.
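
To illustrate how an unversioned file can produce both symptoms, here is a minimal sketch in plain Java. It does not use Lucene's actual classes or file format. It assumes, as LUCENE-510 is generally described, that the string length prefix changed from a count of Java chars to a count of UTF-8 bytes; the class name, the method names, and the two-byte length prefix are hypothetical and chosen only for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene.
class UnversionedStringDemo {

    // "Old" writer (assumption): the length prefix counts Java chars.
    static byte[] writeCharCounted(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String s : names) {
            out.writeShort(s.length());                     // chars, not bytes
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // "New" reader (assumption): the length prefix is taken as a byte count.
    static String readByteCounted(DataInputStream in) throws IOException {
        int len = in.readUnsignedShort();
        byte[] buf = new byte[len];
        in.readFully(buf);                                  // EOFException if too few bytes remain
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Three non-ASCII chars occupy six UTF-8 bytes, but the prefix says 3.
        byte[] data = writeCharCounted("\u00e9\u00e9\u00e9", "id");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));

        // The first read stops halfway through the name: an incorrect field name.
        System.out.println("first name read: " + readByteCounted(in));

        // The stream is now misaligned, so the next "length" is made of leftover bytes.
        try {
            System.out.println("second name read: " + readByteCounted(in));
        } catch (EOFException e) {
            System.out.println("read past EOF on second name");
        }
    }
}
```

With pure-ASCII names the char count and the byte count coincide, which is why only non-ascii field names are affected. Without a version marker in the file, the reader cannot tell which interpretation of the prefix applies.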

      Spinoff from http://www.nabble.com/Read-past-EOF-td23276171.html

      1. LUCENE-1623.patch
        55 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Attached patch. I plan to commit in a day or two, and back-port to
        2.4.x branch.

        I updated the back compat test to show the failure, and also
        separately added 2.4 cases to the back-compat test.

        Michael McCandless added a comment -

        Committed to trunk & 2.4 branch.

        Uwe Schindler added a comment -

        Hi Mike,

        A little late, but there is a small control-flow error in the handling of the IOException in the FieldInfos ctor:

        } catch (IOException ioe) {
          if (format == FORMAT_PRE) {
             ...
          }
        }
        

        The problem: if the IOException occurs and the format is not FORMAT_PRE, the exception is silently swallowed; it should be re-thrown.

        And here is a suggestion:

        byNumber = new ArrayList();
        byName = new HashMap();
        

        I would simply clear() the two collections instead of allocating new ones.
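
Both points can be sketched together as follows. This is not the committed Lucene code: the class `FieldTableSketch`, the `Decoder` interface, the constant values, and the idea that the elided FORMAT_PRE branch retries with a legacy decoder are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a field table loader; not Lucene's FieldInfos.
class FieldTableSketch {
    static final int FORMAT_PRE = -1;      // illustrative value: file written without a version
    static final int FORMAT_CURRENT = -2;  // illustrative value: file written with a version

    // Hypothetical decoder abstraction standing in for the real read logic.
    interface Decoder {
        void decodeInto(List<String> byNumber, Map<String, Integer> byName) throws IOException;
    }

    private final List<String> byNumber = new ArrayList<>();
    private final Map<String, Integer> byName = new HashMap<>();

    void load(int format, Decoder current, Decoder legacy) throws IOException {
        try {
            current.decodeInto(byNumber, byName);
        } catch (IOException ioe) {
            if (format == FORMAT_PRE) {
                // Reuse the existing collections: drop any partial entries, then retry.
                byNumber.clear();
                byName.clear();
                legacy.decodeInto(byNumber, byName);
            } else {
                // Any other format: the failure is real and must not be swallowed.
                throw ioe;
            }
        }
    }

    List<String> names() {
        return byNumber;
    }
}
```

Clearing in place also lets the two fields stay `final`, whereas reassigning them in the catch block would not.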

        Michael McCandless added a comment -

        Great catches Uwe, I'll fold them in – thanks!


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            3
