UIMA-6232

Reduce overhead of createTypeSystemDescription() and friends



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0uimaFIT, 3.2.0uimaFIT
    • Component/s: uimaFIT
    • Labels: None


      uimaFIT offers a range of factory methods which use classpath scanning to locate type system descriptions, type priority definitions and index definitions.

      The present implementation scans for each type of object once and then stores the locations in which the descriptors were found in a global static variable. The user can call a method to clear this variable and force a re-scan.

      Whenever client code calls a method such as createTypeSystemDescription(), the descriptor files at the cached locations are read and parsed, and a corresponding Java descriptor object is created and returned.

      This issue is about two problems with this approach:

      1) The descriptor locations are discovered using only the ClassLoader that is active the first time scanning takes place. If createTypeSystemDescription() is later called in the context of a ClassLoader with access to a different set of descriptions, this is not taken into account.
      2) Parsing the XML files every time a method such as createTypeSystemDescription() is called slows uimaFIT down overall. These methods are potentially called very often, in particular every time createEngineDescription() or a similar method is called. Depending on the context, the parse overhead can have a significant impact on the overall execution time.
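
To make the two problems concrete, here is a minimal model of the behaviour described above, using only the JDK. The class NaiveDescriptorFactory, its counters, and the descriptor paths are hypothetical stand-ins and are not uimaFIT code; the "scan" and "parse" steps are simulated.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the behaviour described above; not uimaFIT source code.
final class NaiveDescriptorFactory {
    // One global cache of locations, regardless of which ClassLoader is active.
    private static List<String> cachedLocations;
    static int scanCount;
    static int parseCount;

    // Stand-in for classpath scanning: runs once, then the result is reused.
    static synchronized List<String> scanLocations() {
        if (cachedLocations == null) {
            scanCount++;
            cachedLocations = Arrays.asList("desc/types-a.xml", "desc/types-b.xml");
        }
        return cachedLocations;
    }

    // Clears the global cache so that the next call scans again.
    static synchronized void forceRescan() {
        cachedLocations = null;
    }

    // Stand-in for createTypeSystemDescription(): every call re-parses every location.
    static synchronized String createTypeSystemDescription() {
        StringBuilder merged = new StringBuilder();
        for (String location : scanLocations()) {
            parseCount++; // simulated XML parse
            merged.append("[parsed ").append(location).append("]");
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            createTypeSystemDescription();
        }
        System.out.println("scans=" + scanCount + ", parses=" + parseCount);
    }
}
```

In this model, three calls trigger one scan but six parses (two locations, parsed on every call), and the scan result is shared no matter which ClassLoader the caller runs under.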

      As a solution for 1), we could adopt an approach similar to the one used for JCas wrapper classes in JCasImpl: the locations are stored in a WeakHashMap that maps the current ClassLoader to the discovered locations. The "current" ClassLoader is obtained via Spring's ClassUtils.getDefaultClassLoader(), which is also (indirectly) used in many other places in uimaFIT. In particular, this method uses the thread context classloader if one is available.
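
The per-ClassLoader location cache proposed for 1) could look roughly like the following sketch. It uses only the JDK: DescriptorLocationCache, currentClassLoader(), and the scanner function are hypothetical names, and currentClassLoader() is a simplified stand-in for Spring's ClassUtils.getDefaultClassLoader(), not a copy of it.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch of a per-ClassLoader location cache; not uimaFIT source code.
final class DescriptorLocationCache {
    // Weak keys: an entry can be collected once its ClassLoader is no longer in use.
    // The cached values must not hold strong references back to the ClassLoader.
    private final Map<ClassLoader, List<String>> locationsByLoader =
            Collections.synchronizedMap(new WeakHashMap<>());

    // Stand-in for the classpath scan performed for a given ClassLoader.
    private final Function<ClassLoader, List<String>> scanner;

    DescriptorLocationCache(Function<ClassLoader, List<String>> scanner) {
        this.scanner = scanner;
    }

    // Simplified assumption: prefer the thread context ClassLoader, else fall back
    // to the loader of this class.
    static ClassLoader currentClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        return cl != null ? cl : DescriptorLocationCache.class.getClassLoader();
    }

    // Scans at most once per ClassLoader; later calls reuse the cached locations.
    List<String> locations() {
        return locationsByLoader.computeIfAbsent(currentClassLoader(), scanner);
    }

    public static void main(String[] args) {
        DescriptorLocationCache cache = new DescriptorLocationCache(cl -> {
            System.out.println("scanning for " + cl);
            return Collections.singletonList("desc/types.xml");
        });
        cache.locations(); // triggers a scan
        cache.locations(); // served from the cache
    }
}
```

Because the map is keyed by ClassLoader, a call made under a different context ClassLoader triggers its own scan instead of reusing locations found for another loader.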

      As a solution for 2), we keep a WeakHashMap cache not only for the locations but also for the parsed and aggregated XML files. When a method such as createTypeSystemDescription() is called and the cache already contains the corresponding descriptor, a deep clone of it is returned. A similar approach (cloning a descriptor) was also recently introduced into UIMA Core to avoid repeatedly loading and parsing default flow controller definitions.
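
The clone-on-read cache proposed for 2) can be sketched as follows, again with JDK classes only. Descriptor is a hypothetical stand-in for a parsed descriptor object, and ParsedDescriptorCache and the parser supplier are illustrative names, not uimaFIT or UIMA Core APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a parsed, mutable descriptor object.
final class Descriptor {
    final List<String> typeNames;

    Descriptor(List<String> typeNames) {
        this.typeNames = new ArrayList<>(typeNames); // defensive copy
    }

    // Deep copy: strings are immutable, so copying the list is sufficient here.
    Descriptor deepCopy() {
        return new Descriptor(typeNames);
    }
}

// Hypothetical sketch of a per-ClassLoader cache of parsed descriptors.
final class ParsedDescriptorCache {
    // Weak keys; the cached descriptors must not reference the ClassLoader strongly.
    private final Map<ClassLoader, Descriptor> parsedByLoader =
            Collections.synchronizedMap(new WeakHashMap<>());

    // Stand-in for reading, parsing, and merging the XML descriptor files.
    private final Supplier<Descriptor> parser;

    ParsedDescriptorCache(Supplier<Descriptor> parser) {
        this.parser = parser;
    }

    // Parses at most once per ClassLoader and hands out a fresh copy on every call,
    // so callers can modify the result without corrupting the cached master.
    Descriptor create() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        Descriptor master = parsedByLoader.computeIfAbsent(cl, key -> parser.get());
        return master.deepCopy();
    }

    public static void main(String[] args) {
        ParsedDescriptorCache cache = new ParsedDescriptorCache(() -> {
            System.out.println("parsing descriptors");
            return new Descriptor(Arrays.asList("Token", "Sentence"));
        });
        Descriptor first = cache.create();
        first.typeNames.add("AddedByCaller");
        Descriptor second = cache.create();
        System.out.println(second.typeNames.size()); // cached master is unaffected
    }
}
```

Returning a copy rather than the cached instance matters because descriptor objects are mutable: handing out the shared instance would let one caller's modifications leak into every later result.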

      *Benchmarking UIMA-




            Assignee: rec Richard Eckart de Castilho
            Reporter: rec Richard Eckart de Castilho