TIKA-2013: Upgrade to POI 3.15 when available

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.14
    • Component/s: None
    • Labels:
      None

        Activity

        tallison@mitre.org Tim Allison added a comment -

        POI bug 60044 was fixed before the release of 3.15, so it should not be a problem with the upgrade.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build tika-2.x #145 (See https://builds.apache.org/job/tika-2.x/145/)
        TIKA-2013 – upgrade to POI 3.15 – don't forget to close new NPOIFS and (tallison: rev 12b1d435bbdc5df9d5e396285c83ddeda44240ae)

        • (edit) tika-parser-bundles/tika-parser-office-bundle/pom.xml
        • (edit) tika-parser-modules/pom.xml
        • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
        • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java
        • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
        • (edit) tika-bundle/pom.xml
        • (edit) CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Tika-trunk #1103 (See https://builds.apache.org/job/Tika-trunk/1103/)
        TIKA-2013 – upgrade to POI 3.15-final, make sure to add new close() (tallison: rev cc6f6dcc8fed2826ae8093b7a4aed0ddee74dc40)

        • (edit) CHANGES.txt
        • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
        • (edit) tika-parsers/pom.xml
        • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
        • (edit) tika-bundle/pom.xml
        • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Jenkins build tika-2.x-windows #49 (See https://builds.apache.org/job/tika-2.x-windows/49/)
        TIKA-2013 – upgrade to POI 3.15 – don't forget to close new NPOIFS and (tallison: rev 12b1d435bbdc5df9d5e396285c83ddeda44240ae)

        • (edit) tika-parser-bundles/tika-parser-office-bundle/pom.xml
        • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
        • (edit) CHANGES.txt
        • (edit) tika-parser-modules/pom.xml
        • (edit) tika-bundle/pom.xml
        • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java
        • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
        tallison@mitre.org Tim Allison added a comment -

        We should check whether we need to invoke the new close() methods, at least for MAPIMessage. Thank you, Nick Burch, for pointing that out on TIKA-2058.
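
A minimal sketch of the close-on-exit pattern under discussion, in case it helps the check. It does not use the real POI classes: `TrackedResource` is a hypothetical stand-in for objects such as the NPOIFS filesystem and MAPIMessage, and it assumes those objects expose close() through java.io.Closeable. With try-with-resources, resources are closed in reverse order of declaration, and they are closed even when parsing throws.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; these are NOT the real POI classes.
class CloseOrderSketch {

    /** A closeable resource that records when it is closed. */
    static final class TrackedResource implements Closeable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("closed " + name);
        }
    }

    /** Opens two nested resources, "parses", and returns the event log. */
    static List<String> parseAndClose(boolean failDuringParse) {
        List<String> log = new ArrayList<>();
        // Declared outer-first; try-with-resources closes them inner-first.
        try (TrackedResource fs = new TrackedResource("filesystem", log);
             TrackedResource msg = new TrackedResource("message", log)) {
            if (failDuringParse) {
                throw new IllegalStateException("simulated parse failure");
            }
            log.add("parsed");
        } catch (IllegalStateException e) {
            // Both resources have already been closed by the time we get here.
            log.add("failed");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(parseAndClose(false));
        System.out.println(parseAndClose(true));
    }
}
```

The same shape would apply wherever a parser opens a filesystem and then wraps it in a document object: declare both in the try header so neither is leaked on an exception.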

        tallison@mitre.org Tim Allison added a comment -

        Thank you, Javen O'Neal. I opened https://bz.apache.org/bugzilla/show_bug.cgi?id=60044 to track the regression in currency formatting in some XLS files.

        onealj Javen O'Neal added a comment - - edited

        > 1) some footers in PPT are not being extracted ("Prague" doesn't appear in -beta3)
        This is being investigated on the Apache POI bugzilla: https://bz.apache.org/bugzilla/show_bug.cgi?id=60003

        tallison@mitre.org Tim Allison added a comment - - edited

        I compared Tika with poi-3.15-beta1 vs the pre-release poi-3.15-beta3.

        A number of exceptions were fixed. There was only one new exception.

        There may be two small regressions in content:
        1) some footers in PPT are not being extracted ("Prague" doesn't appear in -beta3)
        2) some numbers in XLS are being corrupted

        NOTE: these may be the fault of something we're doing at the Tika level. However, the upgrade from beta1 to the pre-release beta3 required no code changes.

        More investigation is required.

        The full batch of reports is available on GitHub.

        To download the original files, prepend: http://162.242.228.174/docs/
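
For anyone wanting to spot-check a content regression like the missing "Prague" footer, here is a rough sketch of one way to compare two extractions of the same file. It is not necessarily how the reports above were produced. The class name, method names, and sample strings are made up for illustration. It lists the tokens that appear in the text extracted with the old version but not in the text extracted with the new one.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; names and sample inputs are illustrative only.
class ExtractDiffSketch {

    /** Splits text into lower-cased word/number tokens, keeping first-seen order. */
    static Set<String> tokens(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Returns tokens present in the old extraction but absent from the new one. */
    static Set<String> missingInNew(String oldText, String newText) {
        Set<String> missing = tokens(oldText);
        missing.removeAll(tokens(newText));
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample extractions of one slide deck.
        String oldText = "Slide title Prague 2016";
        String newText = "Slide title 2016";
        System.out.println(missingInNew(oldText, newText));
    }
}
```

A set difference ignores word counts and ordering, so it flags dropped content such as a footer but would not catch a number that changed into a different, already-present number.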


          People

          • Assignee:
            tallison@mitre.org Tim Allison
            Reporter:
            tallison@mitre.org Tim Allison
          • Votes:
            0
            Watchers:
            3
