  1. Tika
  2. TIKA-3650

Removal of duplicate javax/* classes from tika-app jar


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Environment: Java 8

    Description

      The class javax.xml.parsers.DocumentBuilderFactory is present both in rt.jar (from the JDK) and in tika-app.jar.
      We use a child-first classloader to isolate the tika-app jar from the rest of the classpath during file parsing, so the child-first classloader loads the DocumentBuilderFactory class from the tika-app jar.

      If tika-app.jar did not contain the DocumentBuilderFactory class, the class would be loaded from rt.jar.
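
      To confirm which copy of a class is actually in play, it can help to print the defining classloader and the code source of the Class object. The following is a minimal diagnostic sketch; the class name ClassOriginProbe and the method describe are hypothetical helpers, not part of Tika or the JDK.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

class ClassOriginProbe {

    /** Describes which loader defined the class and where its bytes came from. */
    static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();   // null means the bootstrap loader
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String where = (source == null || source.getLocation() == null)
                ? "JDK runtime"
                : source.getLocation().toString();
        return type.getName()
                + " loader=" + (loader == null ? "bootstrap" : loader.toString())
                + " source=" + where;
    }

    public static void main(String[] args) {
        // Run this from code loaded by the classloader under investigation.
        System.out.println(describe(DocumentBuilderFactory.class));
        System.out.println(describe(ClassOriginProbe.class));
    }
}
```

      Called from inside the child-first classloader, the first line would be expected to point at the tika-app jar when the duplicate class is present, and at the JDK runtime otherwise.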

      Inside the service loader, a check validates that the implementation class is assignable to the factory type. This check fails for us, because the implementation's supertype is the copy of DocumentBuilderFactory loaded from the tika-app jar.

       

      public static DocumentBuilderFactory newInstance() {
          return FactoryFinder.find(
                  /* The default property name according to the JAXP spec */
                  DocumentBuilderFactory.class, // "javax.xml.parsers.DocumentBuilderFactory"
                  /* The fallback implementation class name */
                  "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
      }

       

      DocumentBuilderFactory.class: a class literal is resolved through the defining classloader of the class that contains it, regardless of which classloader is the current context classloader. In the JDK's own newInstance(), it therefore always returns the Class object from the default classloader.

      During Tika parsing, however, the active classloader is the child-first classloader rather than the default one, and it loads both DocumentBuilderFactory and its implementation from the tika-app jar.

      Because DocumentBuilderFactory.class comes from the default classloader, while the implementation class org.apache.xerces.jaxp.DocumentBuilderFactoryImpl (together with its own copy of DocumentBuilderFactory) comes from the child-first classloader, the two are not assignable to each other.
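
      The effect can be reproduced without Tika or JAXP. The sketch below is a self-contained illustration under stated assumptions: Factory is a hypothetical stand-in for DocumentBuilderFactory, and ChildFirstLoader is a deliberately minimal child-first loader that re-defines only that one class from the bytes visible to its parent. It is not the classloader used in our setup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class LoaderIdentityDemo {

    /** Hypothetical stand-in for a factory type such as DocumentBuilderFactory. */
    public static class Factory { }

    /** Minimal child-first loader: defines one named class itself, delegates the rest. */
    static class ChildFirstLoader extends ClassLoader {
        private final String isolatedName;

        ChildFirstLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve);   // usual parent-first path
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    byte[] bytes = readBytes(name);
                    loaded = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }

        /** Reads the class file through the parent so both loaders see the same bytes. */
        private byte[] readBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Factory through a second loader and checks assignability against the first copy. */
    static boolean assignableAcrossLoaders() {
        Class<?> parentCopy = Factory.class;
        ClassLoader child = new ChildFirstLoader(parentCopy.getClassLoader(), parentCopy.getName());
        try {
            Class<?> childCopy = Class.forName(parentCopy.getName(), false, child);
            return parentCopy.isAssignableFrom(childCopy);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("assignable across loaders: " + assignableAcrossLoaders());
    }
}
```

      Both Class objects carry the same fully qualified name and the same bytes, yet the JVM treats them as unrelated types because their defining classloaders differ. This is the same situation the service loader check runs into.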

      In the normal scenario (I assume most of us use a parent-first classloader), javax.xml.parsers.DocumentBuilderFactory is always loaded from rt.jar, which Java 8 ships. The copy of javax.xml.parsers.DocumentBuilderFactory inside the tika-app jar is therefore redundant.
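
      One way to see which javax/* entries a jar bundles is to list its entries by prefix. The following is a rough sketch using only java.util.jar; JarPrefixScan and findEntries are hypothetical names, and the jar path is whatever the caller passes in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarPrefixScan {

    /** Returns the names of all non-directory entries in the jar that start with the prefix. */
    static List<String> findEntries(Path jar, String prefix) {
        List<String> matches = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith(prefix)) {
                    matches.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Usage: java JarPrefixScan <path-to-jar> [prefix]
        if (args.length == 0) {
            System.err.println("usage: JarPrefixScan <path-to-jar> [prefix]");
            return;
        }
        String prefix = args.length > 1 ? args[1] : "javax/";
        for (String name : findEntries(Paths.get(args[0]), prefix)) {
            System.out.println(name);
        }
    }
}
```

      Any javax/xml/parsers/ entries reported for the tika-app jar are candidates for the duplication described above.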

            People

              Assignee: Unassigned
              Reporter: Aravinth (imaravin)