TIKA-903

NPE thrown with password-protected Pages file

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0
    • Fix Version/s: None
    • Component/s: parser
    • Environment:

      Windows 7

      Description

      When trying to view a password-protected Pages file in the Tika GUI, parsing fails with a NullPointerException (NPE):

      org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.iwork.IWorkPackageParser@30583058
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
      at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
      at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
      at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
      at javax.swing.TransferHandler.importData(TransferHandler.java:756)
      at javax.swing.TransferHandler$DropHandler.drop(TransferHandler.java:1479)
      at java.awt.dnd.DropTarget.drop(DropTarget.java:445)
      at javax.swing.TransferHandler$SwingDropTarget.drop(TransferHandler.java:1204)
      at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(SunDropTargetContextPeer.java:531)
      at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(SunDropTargetContextPeer.java:844)
      at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(SunDropTargetContextPeer.java:768)
      at sun.awt.dnd.SunDropTargetEvent.dispatch(SunDropTargetEvent.java:42)
      at java.awt.Component.dispatchEventImpl(Component.java:4498)
      at java.awt.Container.dispatchEventImpl(Container.java:2110)
      at java.awt.Component.dispatchEvent(Component.java:4471)
      at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4588)
      at java.awt.LightweightDispatcher.processDropTargetEvent(Container.java:4323)
      at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4174)
      at java.awt.Container.dispatchEventImpl(Container.java:2096)
      at java.awt.Window.dispatchEventImpl(Window.java:2490)
      at java.awt.Component.dispatchEvent(Component.java:4471)
      at java.awt.EventQueue.dispatchEvent(EventQueue.java:610)
      at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:280)
      at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:195)
      at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:185)
      at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:180)
      at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:172)
      at java.awt.EventDispatchThread.run(EventDispatchThread.java:133)
      Caused by: java.lang.NullPointerException
      at org.apache.tika.parser.iwork.IWorkPackageParser$IWORKDocumentType.detectType(IWorkPackageParser.java:125)
      at org.apache.tika.parser.iwork.IWorkPackageParser$IWORKDocumentType.access$000(IWorkPackageParser.java:71)
      at org.apache.tika.parser.iwork.IWorkPackageParser.parse(IWorkPackageParser.java:166)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      ... 30 more

      I tried viewing the contents in 7-Zip, but it reports that it can't understand the compression format.

        Activity

        Gabriel Valencia added a comment -

        Password for this file is: tika

        Nick Burch added a comment -

        This file certainly isn't protected using regular zip passwords or similar; something very odd is going on instead. It's enough to break the unzip listing:

        $ unzip -l testPagesVariousPwdProtected.pages
        Archive: testPagesVariousPwdProtected.pages
        Length Date Time Name
        --------- ---------- ----- ----
        *** buffer overflow detected ***: unzip terminated
        ======= Backtrace: =========
        /lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x50)[0xb76ef2d0]
        /lib/tls/i686/cmov/libc.so.6(+0xe120a)[0xb76ee20a]
        /lib/tls/i686/cmov/libc.so.6(+0xe0948)[0xb76ed948]
        /lib/tls/i686/cmov/libc.so.6(_IO_default_xsputn+0x9e)[0xb76766ce]
        /lib/tls/i686/cmov/libc.so.6(_IO_vfprintf+0xf3e)[0xb764ab4e]
        /lib/tls/i686/cmov/libc.so.6(__vsprintf_chk+0xad)[0xb76ed9fd]
        /lib/tls/i686/cmov/libc.so.6(__sprintf_chk+0x2d)[0xb76ed93d]
        unzip[0x8056c4a]
        unzip[0x805a313]
        unzip[0x805a4b7]
        unzip[0x804b0cf]
        unzip[0x804b3d0]
        /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xb7623bd6]
        unzip[0x8049671]

        $ unzip testPagesVariousPwdProtected.pages
        Archive: testPagesVariousPwdProtected.pages
        skipping: buildVersionHistory.plist unsupported compression method 25451
        skipping: index.xml unsupported compression method 25452
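
The method numbers reported by unzip (25451 and 25452) can be read directly from the ZIP local file header, where the compression method is a 2-byte little-endian field at offset 8. The following is a minimal sketch, not Tika code: the class name ZipMethodProbe and its methods are hypothetical, and it assumes the file begins with a local file header, which is usual for ZIP archives but not guaranteed. It only checks the first entry against stored (0) and deflated (8), the two methods java.util.zip can read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Tika.
final class ZipMethodProbe {
    // Signature of a ZIP local file header ("PK\3\4"), read as a little-endian int.
    private static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

    /**
     * Returns the compression method of the first entry, or -1 if the data
     * is too short or does not start with a local file header.
     */
    static int firstEntryMethod(byte[] data) {
        if (data == null || data.length < 10) {
            return -1;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            return -1;
        }
        // Offset 8: compression method, unsigned 16-bit.
        return buf.getShort(8) & 0xFFFF;
    }

    /** True for stored (0) and deflated (8). */
    static boolean isStandardMethod(int method) {
        return method == 0 || method == 8;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZipMethodProbe <file>");
            return;
        }
        // Reads the whole file for simplicity; only the first 10 bytes are used.
        int method = firstEntryMethod(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("method=" + method + " standard=" + isStandardMethod(method));
    }
}
```

A check along these lines could let a caller notice an unreadable archive before attempting to list its entries, although whether every protected Pages file looks like this one is an open question.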

        Nick Burch added a comment -

        As of r1331503 these should no longer break. The iWork parser will return an empty document for them and set the content type to application/x-tika-iworks-protected. Until we know how the encryption works, though, we can't do anything else.
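
Since the parser returns an empty document for these files, a caller may want to distinguish "empty because protected" from "genuinely empty" by looking at the reported content type. The sketch below is illustrative only: ProtectedIWorkCheck is a hypothetical class, and a plain Map stands in for Tika's Metadata object so the example has no dependencies. The key "Content-Type" matches the name Tika uses for that metadata field.

```java
import java.util.Map;

// Hypothetical helper for illustration; a Map stands in for Tika's Metadata.
final class ProtectedIWorkCheck {
    // Content type named in the comment above for protected iWork files.
    static final String PROTECTED_TYPE = "application/x-tika-iworks-protected";

    /** True if the metadata reports the protected iWork content type. */
    static boolean isProtected(Map<String, String> metadata) {
        // Constant-first equals avoids an NPE when the key is absent.
        return PROTECTED_TYPE.equals(metadata.get("Content-Type"));
    }
}
```

With this in place, a caller could report "password-protected, contents unavailable" instead of silently presenting an empty document.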

        Nick Burch added a comment -

        On a related note, it's not looking hopeful for getting any sort of documentation on the file formats - https://discussions.apple.com/message/18239551#18239551

        Gabriel Valencia added a comment -

        Yes, TIKA-402 has a URL to a page that used to have the file specification for some of the apps, but that page was taken down at some point.

        Nick Burch added a comment -

        Ah, good spot. That page has gone, but if you view it in the Internet Archive you can get the text of the first few pages. Plug that into Google, and you find an archive of the PDF version: http://www.filibeto.org/unix/macos/lib/dev/documentation/AppleApplications/Conceptual/iWork2-0_XML/iWork2-0_XML.pdf

        Based on the dates in the file, I don't think it covers the most recent version. It doesn't cover password protection, for example, but it may be handy for some of the very basics.


          People

          • Assignee: Unassigned
          • Reporter: Gabriel Valencia