TIKA-3639

NullPointerException thrown when parsing zip file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.2.1
    • Fix Version/s: 2.3.0
    • Component/s: parser
    • Labels: None

    Description

      Tika always throws a NullPointerException when detecting the type of a zip file. It can be reproduced with the following steps.

      1. Create a zip file containing an index.xml. The XML is simple:
        <?xml version='1.0' encoding='UTF-8' ?>
        <index>
        </index> 

         

      2. Add the dependencies to pom.xml. The key dependency is tika-parser-apple-module:
        <dependencies>
            <dependency>
                <groupId>org.apache.tika</groupId>
                <artifactId>tika-core</artifactId>
                <version>2.2.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.tika</groupId>
                <artifactId>tika-parsers</artifactId>
                <type>pom</type>
                <version>2.2.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.tika</groupId>
                <artifactId>tika-parser-apple-module</artifactId>
                <version>2.2.1</version>
            </dependency>
        </dependencies>
      3. Call tika.detect on the zip file. It throws an NPE:
        String filePath = "123.zip";
        Tika tika = new Tika();
        String type = tika.detect(new FileInputStream(new File(filePath)));

         Note that tika.detect(String name) works normally and returns "application/zip". The NPE occurs only with tika.detect(InputStream stream).
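      For anyone who prefers to generate the test archive rather than build it by hand, step 1 can be sketched with the JDK's java.util.zip classes. This is only an illustration: the ReproZip class and its method names are hypothetical and not part of Tika, and the file name 123.zip is taken from the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not part of Tika): builds the zip described in step 1.
class ReproZip {
    // The index.xml content quoted in the report.
    static final String INDEX_XML =
            "<?xml version='1.0' encoding='UTF-8' ?>\n<index>\n</index>\n";

    // Returns the bytes of a zip archive whose only entry is index.xml.
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("index.xml"));
            zip.write(INDEX_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads back the name of the first entry to confirm the archive is well-formed.
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        // Writes 123.zip to the working directory, the file name used in the report.
        Files.write(Paths.get("123.zip"), buildZip());
    }
}
```

      With the dependencies from step 2 on the classpath, the bytes from buildZip() could presumably also be wrapped in a ByteArrayInputStream and passed straight to tika.detect(InputStream), avoiding the temporary file.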

       

      It seems that when Tika processes a zip file through IWorkPackageParser, it parses index.xml and compares its root element against the KEYNOTE, NUMBERS, PAGES, and ENCRYPTED types defined in the enum below. When none of KEYNOTE, NUMBERS, or PAGES matches, the for-loop reaches ENCRYPTED, whose namespace is null, and calling equals on it throws the NPE.

      The relevant source code is below:

      KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation",
              MediaType.application("vnd.apple.keynote")),
      NUMBERS("http://developer.apple.com/namespaces/ls", "document",
              MediaType.application("vnd.apple.numbers")),
      PAGES("http://developer.apple.com/namespaces/sl", "document",
              MediaType.application("vnd.apple.pages")),
      ENCRYPTED(null, null, MediaType.application("x-tika-iworks-protected"));

      public static IWORKDocumentType detectType(InputStream stream) {
          QName qname = new XmlRootExtractor().extractRootElement(stream);
          if (qname != null) {
              String uri = qname.getNamespaceURI();
              String local = qname.getLocalPart();
              for (IWORKDocumentType type : values()) {
                  if (type.getNamespace().equals(uri) && type.getPart().equals(local)) {
                      return type;
                  }
              }
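      The failure mode can be demonstrated without Tika on the classpath. The sketch below is a simplified stand-in, not the Tika source and not necessarily the fix that shipped in 2.3.0: DocType, RootElementDemo, matchUnsafe, matchSafe, and rootOf are hypothetical names, media types are omitted, and the root element is read with the JDK's StAX API instead of XmlRootExtractor. It shows that the root of the sample index.xml has an empty namespace, that the comparison order used above throws on the ENCRYPTED constant, and that calling equals on the QName values (which are never null) avoids the exception.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Simplified stand-in for Tika's IWORKDocumentType (assumption: media types omitted).
enum DocType {
    KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation"),
    NUMBERS("http://developer.apple.com/namespaces/ls", "document"),
    PAGES("http://developer.apple.com/namespaces/sl", "document"),
    ENCRYPTED(null, null);

    final String namespace;
    final String part;

    DocType(String namespace, String part) {
        this.namespace = namespace;
        this.part = part;
    }

    // Same comparison order as the excerpt: dereferences the null namespace of ENCRYPTED.
    static DocType matchUnsafe(QName qname) {
        for (DocType type : values()) {
            if (type.namespace.equals(qname.getNamespaceURI())
                    && type.part.equals(qname.getLocalPart())) {
                return type;
            }
        }
        return null;
    }

    // Null-safe variant: equals is called on the QName values, which are never null.
    static DocType matchSafe(QName qname) {
        for (DocType type : values()) {
            if (qname.getNamespaceURI().equals(type.namespace)
                    && qname.getLocalPart().equals(type.part)) {
                return type;
            }
        }
        return null;
    }
}

class RootElementDemo {
    // Stand-in for XmlRootExtractor: returns the QName of the first start element.
    static QName rootOf(InputStream xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] xml = "<?xml version='1.0' encoding='UTF-8' ?>\n<index>\n</index>"
                .getBytes(StandardCharsets.UTF_8);
        QName root = rootOf(new ByteArrayInputStream(xml));
        try {
            DocType.matchUnsafe(root);
        } catch (NullPointerException e) {
            System.out.println("unsafe comparison threw NPE");
        }
        System.out.println("safe comparison returned " + DocType.matchSafe(root));
    }
}
```

      An explicit null check that skips ENCRYPTED inside the loop would work equally well; reversing the equals receiver is simply the smallest change to the comparison.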
      

       

      Attachments

        1. exception.png
          49 kB
          Kaka Lee
        2. detectype.png
          37 kB
          Kaka Lee
        3. IWORKDocumentType.png
          52 kB
          Kaka Lee
        4. 123.zip
          0.2 kB
          Kaka Lee

          People

            Assignee: Tim Allison (tallison)
            Reporter: Kaka Lee (Hermes_Lee)
            Votes: 0
            Watchers: 3
