Tika / TIKA-826

TikaException / OfficeXmlFileException with .xlsb files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.1
    • Component/s: parser
    • Labels: None
    • Environment: Windows 7

    Description

      The file testEXCEL.xlsb in the tika-parsers test-documents folder causes a POI OfficeXmlFileException when opened with TikaGUI or TikaCLI from a recent build. The cause is that Tika is configured to open it with the OfficeParser class rather than the OOXMLParser class; it is an Office 2007 file and should be handled by OOXMLParser. Neither ExcelParserTest nor OOXMLParserTest covers .xlsb files. Once the two parsers are changed so that OOXMLParser is used (I'll submit a patch for this shortly), the OfficeXmlFileException goes away, but a new POI exception arises in its place: an IllegalArgumentException in the ExtractorFactory class. It is somewhat related to unresolved POI bug 51921, whose reporter mentions a .xlsb file among others. This exception appears to occur because POI does not seem to handle .xlsb files at all: a cursory search of the POI source for "xlsb" or its MIME type yields nothing relevant, and the POI project has no .xlsb test files that I can see.
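
The parser mismatch described above comes down to container format: legacy Office files are OLE2 compound documents, while Office 2007 files, including .xlsb, are ZIP-based packages. The following is a minimal standalone sketch of telling the two apart by their leading bytes. It is not Tika's or POI's actual detection code; the names ContainerSniffer, Container, and sniff are hypothetical and exist only for illustration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika or POI.
class ContainerSniffer {
    // Signature at the start of every OLE2 compound document (.xls, .doc, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature ("PK\3\4"), used by Office 2007 packages.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    enum Container { OLE2, ZIP_PACKAGE, UNKNOWN }

    // Classifies a file by the first bytes of its content.
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP_PACKAGE;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ContainerSniffer.java <file>");
            return;
        }
        byte[] header = new byte[8];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Read up to 8 bytes; shorter files leave trailing zeros.
            in.readNBytes(header, 0, header.length);
        }
        System.out.println(args[0] + ": " + sniff(header));
    }
}
```

Run against a well-formed .xlsb file, a check like this should report a ZIP package rather than OLE2, which is consistent with an OLE2-oriented parser rejecting the file. It only identifies the container; it says nothing about whether the binary workbook parts inside the package can be read.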

      Attachments

        1. TIKA-826.patch
          2 kB
          John Mastarone

          People

            Assignee: Unassigned
            Reporter: John Mastarone (jfm.apache)