  1. UIMA
  2. UIMA-5775

Performance problem with MARKTABLE when matching case-insensitively


    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.1ruta
    • Fix Version/s: 2.7.0ruta
    • Component/s: Ruta
    • Labels:



      We encounter a performance issue (or possibly an infinite loop) when we use the MARKTABLE action with case-insensitive value lists.

      The call in our script is:

      MARKTABLE(LawName, 1, 'nl_law_names.ignorecase.csv', true, 0, "", 0, "lawIdentifier" = 2);

      Using the following input fragment will result in a timeout exception after 1 minute.

      Groenboek COM(2006) 105 definitief een Europese strategie voor duurzame, concurrerende en continu geleverde energie voor Europa {SEC(2006)317}

      This complete name is a Dutch law name and is also an entry in the nl_law_names.csv file.

      When we try to match it with the ignoreCase flag set to false, matching is fast and there is no problem. If we set the flag to true (case is ignored), matching is very slow or even hangs in an infinite loop.

      I debugged the code, which pointed me to the TreeWordList class. The recursive method recursiveContains has a potential bug.

      I think the problem occurs when the item contains a special character that is identical in upper and lower case. The recursive method then looks up (forks into) the same tree node twice.

      I made a fix that checks whether the uppercase character is the same as the lowercase character and, in that case, makes the recursive call only once. That solved the (performance) issue, but I'm not sure whether this is really the root cause or whether the current fix is the best one.
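
      To illustrate the suspected behaviour, here is a minimal, self-contained sketch. It is not the actual TreeWordList code: the class CaseInsensitiveTrieSketch, its Node type, and the methods containsNaive and containsFixed are hypothetical names made up for this example, and the trie is a simplified stand-in. It assumes the lookup forks into a lowercase and an uppercase branch for every character, so each character without distinct case variants (digits, spaces, brackets) doubles the work on a failing match.

```java
import java.util.HashMap;
import java.util.Map;

class CaseInsensitiveTrieSketch {

    // Hypothetical minimal trie node; not the actual Ruta data structure.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean wordEnd;
    }

    final Node root = new Node();
    long calls; // number of recursive invocations, for illustration only

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.wordEnd = true;
    }

    // Assumed problematic variant: always forks into both case branches,
    // even when both branches lead to the same child node.
    boolean containsNaive(Node node, String s, int i) {
        calls++;
        if (node == null) return false;
        if (i == s.length()) return node.wordEnd;
        char c = s.charAt(i);
        return containsNaive(node.children.get(Character.toLowerCase(c)), s, i + 1)
            || containsNaive(node.children.get(Character.toUpperCase(c)), s, i + 1);
    }

    // Variant with the described check: fork only if the case variants differ.
    boolean containsFixed(Node node, String s, int i) {
        calls++;
        if (node == null) return false;
        if (i == s.length()) return node.wordEnd;
        char lower = Character.toLowerCase(s.charAt(i));
        char upper = Character.toUpperCase(s.charAt(i));
        if (lower == upper) {
            return containsFixed(node.children.get(lower), s, i + 1);
        }
        return containsFixed(node.children.get(lower), s, i + 1)
            || containsFixed(node.children.get(upper), s, i + 1);
    }

    public static void main(String[] args) {
        CaseInsensitiveTrieSketch trie = new CaseInsensitiveTrieSketch();
        trie.add("law 1234567890 (2006)x");
        String miss = "LAW 1234567890 (2006)y"; // differs only in the last character

        trie.calls = 0;
        boolean naive = trie.containsNaive(trie.root, miss, 0);
        long naiveCalls = trie.calls;

        trie.calls = 0;
        boolean fixed = trie.containsFixed(trie.root, miss, 0);
        long fixedCalls = trie.calls;

        System.out.println("naive: " + naive + ", calls = " + naiveCalls);
        System.out.println("fixed: " + fixed + ", calls = " + fixedCalls);
    }
}
```

      In this sketch both variants return the same result, but the naive one needs a number of calls that grows exponentially with the count of digits, spaces and punctuation in the input, which would be consistent with the timeout seen for the long law name above.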




            • Assignee:
              pkluegl Peter Klügl
            • Reporter:
              feaster83 Jasper Huzen
            • Votes:
              1
            • Watchers:
              3


              • Created: