Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
2.9
-
None
-
New, Patch Available
Description
A user reported that wordnet parses the prolog file incorrectly.
Also need to check the wordnet parser in the memory contrib for this problem.
If this is a false alarm, i'm not worried, because the test will be the first unit test wordnet package ever had.
For example, looking up the synsets for the word "king", we get: java SynLookup wnindex king baron magnate mogul power queen rex scrofula struma tycoon Here, "scrofula" and "struma" are extraneous. This happens because, the line parser code in Syns2Index.java interpretes the two consecutive single quotes in entry s(114144247,3,'king''s evil',n,1,1) in wn_s.pl file, as termination of the string and separates into "king". This entry concerns synset of words "scrofula" and "struma", and thus they get inserted in the synset of "king". *There 1382 such entries, in wn_s.pl* and more in other WordNet Prolog data-base files, where such use of two consecutive single quotes appears. We have resolved this by adding a statement in the line parsing portion of Syns2Index.java, as follows: // parse line line = line.substring(2); * line = line.replaceAll("\'\'", "`"); // added statement* int comma = line.indexOf(','); String num = line.substring(0, comma); ... ... etc. In short we replace "''" by "`" (a back-quote). Then on recreating the index, we get: java SynLookup zwnindex king baron magnate mogul power queen rex tycoon