Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.14
    • Component/s: None
    • Labels:
      None

      Description

      Now that we detect applefile (a regular file with an extra Apple header), we are losing content that used to be extracted. For example, PDF files that carry this extra header are now detected as applefile and are no longer parsed. I found a spec and will commit on Monday.
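One way to recover the lost content is to read the Apple header, find the data fork, and hand those bytes to the normal parser. The sketch below assumes the spec in question is the AppleSingle layout from RFC 1740; it is not the code from the commits that follow. In that layout, all fields are big-endian: a 4-byte magic number (0x00051600), a 4-byte version (0x00020000 for version 2), 16 filler bytes, a 2-byte entry count, and then one 12-byte descriptor per entry (entry ID, offset, length). Entry ID 1 is the data fork. `findDataFork` and `wrap` are hypothetical helpers written for this note.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AppleSingleHeaderSketch {
    // Values from the AppleSingle version 2 layout in RFC 1740 (assumed spec).
    static final int APPLESINGLE_MAGIC = 0x00051600;
    static final long DATA_FORK_ID = 1;

    /** Returns {offset, length} of the data fork entry, or null if there is none. */
    static long[] findDataFork(byte[] file) throws IOException {
        // DataInputStream reads big-endian, which is the byte order AppleSingle uses.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        int magic = in.readInt();
        if (magic != APPLESINGLE_MAGIC) {
            throw new IOException("not AppleSingle, magic=0x" + Integer.toHexString(magic));
        }
        in.readInt();                        // version number, e.g. 0x00020000
        in.readFully(new byte[16]);          // 16 filler bytes; throws EOFException if truncated
        int numEntries = in.readUnsignedShort();
        for (int i = 0; i < numEntries; i++) {
            // Each 12-byte descriptor holds three unsigned 32-bit fields.
            long id = in.readInt() & 0xFFFFFFFFL;
            long offset = in.readInt() & 0xFFFFFFFFL;
            long length = in.readInt() & 0xFFFFFFFFL;
            if (id == DATA_FORK_ID) {
                return new long[] {offset, length};
            }
        }
        return null;
    }

    /** Builds a tiny synthetic AppleSingle file whose only entry is a data fork. */
    static byte[] wrap(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(APPLESINGLE_MAGIC);
        out.writeInt(0x00020000);
        out.write(new byte[16]);
        out.writeShort(1);                      // one entry
        int headerSize = 4 + 4 + 16 + 2 + 12;   // 38 bytes
        out.writeInt((int) DATA_FORK_ID);
        out.writeInt(headerSize);               // data fork starts right after the header
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = wrap("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        long[] fork = findDataFork(file);
        byte[] data = Arrays.copyOfRange(file, (int) fork[0], (int) (fork[0] + fork[1]));
        System.out.println(fork[0] + " " + fork[1] + " " + new String(data, StandardCharsets.US_ASCII));
    }
}
```

Running `main` prints `38 8 %PDF-1.4`: the data fork starts right after the 38-byte header and contains the wrapped PDF, which a real parser would then pass on for ordinary PDF extraction.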

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tika-trunk #1070 (See https://builds.apache.org/job/Tika-trunk/1070/)
        TIKA-2022 – add applefile parser (tallison: rev 47221b90624eb1bba990a1930cb4163489883d8b)

        • CHANGES.txt

        TIKA-2022 – add applefile parser (tallison: rev 0f3b0bdb5b78177e9f0fca88f889e7919823c177)

        • tika-parsers/src/test/resources/test-documents/testAppleSingleFile.pdf
        • tika-parsers/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
        • tika-parsers/src/test/java/org/apache/tika/parser/apple/AppleSingleFileParserTest.java
        • tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tika-trunk #1071 (See https://builds.apache.org/job/Tika-trunk/1071/)
        TIKA-2022 – clean up test, change dependency on CloseShieldInputStream (tallison: rev e6c2839c0a77db90c52e7cf5c3841d09e8cce3b3)

        • tika-parsers/src/test/java/org/apache/tika/parser/apple/AppleSingleFileParserTest.java
        • tika-parsers/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
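The #1071 commit changes the parser's dependency on `CloseShieldInputStream`, but the commit message does not say what the new dependency is. The sketch below only illustrates the general pattern a close shield supports, assuming the parser hands the data fork to some downstream consumer. The shield stops that consumer from closing the caller's stream, and a length bound stops it from reading past the entry. `CloseShield`, `Bounded`, `TrackingStream`, and `readAll` are hypothetical classes written for this note, not the Tika or Commons IO classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ShieldedForkStreamSketch {
    /** Ignores close(), so a downstream consumer cannot close the caller's stream. */
    static final class CloseShield extends FilterInputStream {
        CloseShield(InputStream in) { super(in); }
        @Override public void close() { /* deliberately leaves `in` open */ }
    }

    /** Exposes at most `limit` bytes of the underlying stream, e.g. one entry's length. */
    static final class Bounded extends FilterInputStream {
        private long remaining;
        Bounded(InputStream in, long limit) { super(in); remaining = limit; }

        @Override public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = super.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = super.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override public long skip(long n) throws IOException {
            long skipped = super.skip(Math.min(n, remaining));
            remaining -= skipped;
            return skipped;
        }

        @Override public int available() throws IOException {
            return (int) Math.min(super.available(), remaining);
        }

        @Override public boolean markSupported() { return false; }  // keeps `remaining` consistent
    }

    /** Test double that records whether close() was called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;
        TrackingStream(byte[] buf) { super(buf); }
        @Override public void close() { closed = true; }
    }

    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4];   // small buffer so the limit is reached across several reads
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        TrackingStream outer = new TrackingStream("HDRDATAFORKrest".getBytes(StandardCharsets.US_ASCII));
        outer.skip(3);                                   // pretend "HDR" is the header
        InputStream fork = new Bounded(new CloseShield(outer), 8);
        String data = readAll(fork);
        fork.close();                                    // a consumer closing its stream...
        System.out.println(data + " " + outer.closed + " " + readAll(outer));  // ...leaves outer usable
    }
}
```

Running `main` prints `DATAFORK false rest`: the consumer sees exactly the 8-byte entry, and closing its stream leaves the outer stream open and positioned after the entry.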
        hudson Hudson added a comment -

        FAILURE: Integrated in tika-2.x-windows #18 (See https://builds.apache.org/job/tika-2.x-windows/18/)
        TIKA-2022 – add parser for applefile (tallison: rev b14b47e76a4cba829b17d5180ebd591e641ad683)

        • tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
        • tika-parser-modules/tika-parser-office-module/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
        • CHANGES.txt
        • tika-test-resources/src/test/resources/test-documents/testAppleSingleFile.pdf
        • tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
        • tika-parser-bundles/tika-parser-office-bundle/src/test/java/org/apache/tika/module/office/BundleIT.java
        • tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/apple/AppleSingleFileParserTest.java
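The 2.x commit also touches `tika-mimetypes.xml`, where Tika declares its detection rules; the actual rule is not shown here. As a hedged illustration of magic-number detection, the sketch below assumes the RFC 1740 signature: AppleSingle files begin with 0x00051600, while AppleDouble header files use 0x00051607 and are deliberately not matched. Because a wrapped PDF's own `%PDF` signature then sits after the header rather than at offset 0, a detector that checks only the leading bytes sees only the Apple signature, which is consistent with the behavior described in this issue. `detect` is a hypothetical helper; it peeks with mark/reset so whatever runs next still sees the stream from byte 0.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AppleFileDetectSketch {
    // AppleSingle signature from RFC 1740 (assumed spec). AppleDouble uses 0x00051607.
    static final int APPLESINGLE_MAGIC = 0x00051600;

    /**
     * Peeks at the first four bytes and resets the stream, so whatever runs
     * next (a parser, or another detector) still sees the file from byte 0.
     */
    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(4);
        try {
            int magic = 0;
            for (int i = 0; i < 4; i++) {
                int b = in.read();
                if (b < 0) {
                    return "application/octet-stream";   // too short to tell
                }
                magic = (magic << 8) | b;
            }
            // AppleDouble header files (0x00051607) deliberately fall through.
            return magic == APPLESINGLE_MAGIC ? "application/applefile" : "application/octet-stream";
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0x00, 0x05, 0x16, 0x00, '%', 'P', 'D', 'F'});
        System.out.println(detect(in) + ", next byte still " + in.read());
    }
}
```

Running `main` prints `application/applefile, next byte still 0`. The PDF bytes are present but invisible to a leading-bytes check, which is why a dedicated parser for the wrapper is needed to reach them.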
        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-2.x #114 (See https://builds.apache.org/job/tika-2.x/114/)
        TIKA-2022 – add parser for applefile (tallison: rev b14b47e76a4cba829b17d5180ebd591e641ad683)

        • tika-parser-modules/tika-parser-office-module/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
        • tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
        • tika-test-resources/src/test/resources/test-documents/testAppleSingleFile.pdf
        • tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
        • tika-parser-bundles/tika-parser-office-bundle/src/test/java/org/apache/tika/module/office/BundleIT.java
        • tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/apple/AppleSingleFileParserTest.java
        • CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tika-trunk #1073 (See https://builds.apache.org/job/Tika-trunk/1073/)
        TIKA-2022 – clean up AppleSingleFileParser to use EndianUtils, shorten (tallison: rev 2c4670e534bcc3535601de8a069e7708961ae269)

        • tika-core/src/test/java/org/apache/tika/io/EndianUtilsTest.java
        • tika-core/src/main/java/org/apache/tika/io/EndianUtils.java
        • tika-parsers/src/test/resources/test-documents/testAppleSingleFile.pdf
        • tika-parsers/src/test/java/org/apache/tika/parser/apple/AppleSingleFileParserTest.java
        • tika-parsers/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
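This cleanup moves the parser's byte-order handling into `EndianUtils`. The helpers below are hypothetical stand-ins written for this note, not the `EndianUtils` method signatures. They show the two details such helpers must get right for AppleSingle headers: fields are big-endian, and 32-bit fields are unsigned, so they are widened to `long` to keep values such as 0xFFFFFFFF positive. A truncated header raises `EOFException` instead of silently yielding -1.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class BigEndianSketch {
    /** Reads one byte, failing loudly on truncated input instead of returning -1. */
    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("truncated input");
        }
        return b;
    }

    /** Unsigned 16-bit big-endian value, e.g. the AppleSingle entry count. */
    static int readUShortBE(InputStream in) throws IOException {
        int hi = readByte(in);
        int lo = readByte(in);
        return (hi << 8) | lo;
    }

    /** Unsigned 32-bit big-endian value, widened to long so 0xFFFFFFFF stays positive. */
    static long readUIntBE(InputStream in) throws IOException {
        long hi = readUShortBE(in);
        long lo = readUShortBE(in);
        return (hi << 16) | lo;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {
            0x00, 0x05, 0x16, 0x00,                              // AppleSingle magic
            0x00, 0x01,                                          // entry count
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF   // largest unsigned 32-bit value
        });
        System.out.println(Long.toHexString(readUIntBE(in)) + " "
                + readUShortBE(in) + " " + readUIntBE(in));
    }
}
```

Running `main` prints `51600 1 4294967295`.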
        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-2.x #116 (See https://builds.apache.org/job/tika-2.x/116/)
        TIKA-2022 - clean up – make entries private, move more into EndianUtils (tallison: rev c84855f6757c714a9fdcec55ca14b628a107642e)

        • tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
        • tika-core/src/main/java/org/apache/tika/io/EndianUtils.java
        • tika-core/src/test/java/org/apache/tika/io/EndianUtilsTest.java
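The last cleanup makes the entry objects private; the commit message gives no more detail than that. As a hedged sketch of why the entry descriptors matter to a streaming parser: an input stream can only move forward, so a parser that does not assume the descriptors are listed in offset order can sort them by offset and skip the gap before each one. The private `Entry` class and `readPlan` are hypothetical; entry IDs 1 (data fork), 2 (resource fork), and 3 (real name) follow RFC 1740, which is assumed to be the spec in question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EntryReadPlanSketch {
    /** Hypothetical descriptor type; private, since only the parser needs it. */
    private static final class Entry {
        final long id;
        final long offset;
        final long length;

        Entry(long id, long offset, long length) {
            this.id = id;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * A forward-only stream cannot seek back, so visit entries in offset order
     * and report how many bytes to skip before each one, as "id:skip" strings.
     */
    static List<String> readPlan(long[][] descriptors, long headerEnd) {
        List<Entry> entries = new ArrayList<>();
        for (long[] d : descriptors) {
            entries.add(new Entry(d[0], d[1], d[2]));   // {id, offset, length}
        }
        entries.sort(Comparator.comparingLong(e -> e.offset));
        List<String> plan = new ArrayList<>();
        long position = headerEnd;
        for (Entry e : entries) {
            if (e.offset < position) {
                throw new IllegalArgumentException("entry " + e.id + " overlaps earlier data");
            }
            plan.add(e.id + ":" + (e.offset - position));
            position = e.offset + e.length;
        }
        return plan;
    }

    public static void main(String[] args) {
        // Three entries: 26-byte fixed header + 3 * 12-byte descriptors = 62 bytes.
        long[][] descriptors = {
            {2, 200, 50},   // resource fork, listed first but stored last
            {1, 70, 100},   // data fork (e.g. the wrapped PDF)
            {3, 62, 8},     // real name
        };
        System.out.println(readPlan(descriptors, 62));
    }
}
```

Running `main` prints `[3:0, 1:0, 2:30]`: the real name and data fork follow each other directly, and 30 bytes of padding are skipped before the resource fork.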

          People

          • Assignee:
            tallison@mitre.org Tim Allison
            Reporter:
            tallison@mitre.org Tim Allison
          • Votes:
            0
            Watchers:
            2

