Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-2001

wordnet parsing bug

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 2.9
    • 2.9.1, 3.0
    • modules/other
    • None
    • New, Patch Available

    Description

      A user reported that wordnet parses the prolog file incorrectly.

      Also need to check the wordnet parser in the memory contrib for this problem.

      If this is a false alarm, i'm not worried, because the test will be the first unit test wordnet package ever had.

      For example, looking up the synsets for the
      word "king", we get:
      
      java SynLookup wnindex king
      baron
      magnate
      mogul
      power
      queen
      rex
      scrofula
      struma
      tycoon
      
      Here, "scrofula" and "struma" are extraneous. This happens because, the line
      parser code in Syns2Index.java interpretes the two consecutive single quotes
      in entry s(114144247,3,'king''s evil',n,1,1) in  wn_s.pl file, as
      termination
      of the string and separates into "king". This entry concerns
      synset of words "scrofula" and "struma", and thus they get inserted in the
      synset of "king". *There 1382 such entries, in wn_s.pl* and more in other
      WordNet
      Prolog data-base files, where such use of two consecutive single quotes
      appears.
      
      We have resolved this by adding a statement in the line parsing portion of
      Syns2Index.java, as follows:
      
                 // parse line
                 line = line.substring(2);
                * line = line.replaceAll("\'\'", "`"); // added statement*
                 int comma = line.indexOf(',');
                 String num = line.substring(0, comma);  ... ... etc.
      In short we replace "''" by "`" (a back-quote). Then on recreating the
      index, we get:
      
      java SynLookup zwnindex king
      baron
      magnate
      mogul
      power
      queen
      rex
      tycoon
      

      Attachments

        1. LUCENE-2001.patch
          7 kB
          Robert Muir
        2. LUCENE-2001_branch.patch
          7 kB
          Robert Muir
        3. LUCENE-2001_branch.patch
          7 kB
          Robert Muir

        Activity

          People

            gsingers Grant Ingersoll
            rcmuir Robert Muir
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: