Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-1548

System property added while catching exception on parsing PDF encrypted doc

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.7
    • 1.8
    • parser
    • None
    • Mac OS 10.10.2
      java version "1.7.0_60"

    Description

      I'm using Tika 1.7. I'm parsing an encrypted PDF document which raise an exception. So far, so good.

      My concern is that after that I have a new System property set sun.font.CFontManager.

      Code to reproduce the error:

      @Test
      public void testSystem() {
          Properties props = System.getProperties();
          assertThat(props.get("sun.font.fontmanager"), nullValue());
          try {
              tika().parseToString(new URL("https://github.com/elasticsearch/elasticsearch-mapper-attachments/raw/master/src/test/resources/org/elasticsearch/index/mapper/xcontent/encrypted.pdf"));
          } catch (Throwable e) {
          }
          assertThat(props.get("sun.font.fontmanager"), nullValue());
      }
      

      With Tika 1.7:

      [2015-02-11 16:43:36,166][INFO ][org.apache.pdfbox.pdfparser.PDFParser] Document is encrypted
      [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,839][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      [2015-02-11 16:43:36,842][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
      
      java.lang.AssertionError: 
      Expected: null
           but: was "sun.font.CFontManager"
       <Click to see difference>
      	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
      	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
      	at org.elasticsearch.plugin.mapper.attachments.test.TikaSystemTest.testSystem(TikaSystemTest.java:41)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
      	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
      	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
      	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
      

      With Tika 1.6:

      [2015-02-11 16:38:42,922][INFO ][org.apache.pdfbox.pdfparser.PDFParser] Document is encrypted
      

      Note also that it logs a lot of errors which was not the case in Tika 1.6.

      Attachments

        Activity

          People

            Unassigned Unassigned
            dadoonet David Pilato
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: