Tika / TIKA-2228

WordPerfect parser update to support 5.x

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.15
    • Component/s: parser
    • Labels: None
    • Environment: Any

    Description

      I refactored the WordPerfect parser classes to support WP 5.x files and will open a pull request for it. This builds on TIKA-1946.
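      Both WP 5.x and WP 6.x files begin with a 16-byte prefix area starting with the signature 0xFF 'W' 'P' 'C', and the major-version byte in that prefix is what separates the two families. The Java sketch below reads those fields. It is a hypothetical illustration based on publicly documented WordPerfect format notes (field offsets, major version 0 for 5.x and 2 for 6.x and later), not Tika's actual WPPrefixArea API; the class, field, and method names are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch (not Tika's WPPrefixArea API): reads the 16-byte
 * prefix area shared by WordPerfect 5.x and 6.x files.
 */
public class WPPrefixSketch {

    /** Parsed prefix fields; offsets and names are illustrative assumptions. */
    public static final class Prefix {
        public final long docAreaPointer; // bytes 4-7: offset of the document area
        public final int productType;     // byte 8: 1 = WordPerfect
        public final int fileType;        // byte 9: 10 (0x0A) = document
        public final int majorVersion;    // byte 10: 0 = WP 5.x, 2 = WP 6.x and later
        public final int minorVersion;    // byte 11: e.g. 0 = 5.0, 1 = 5.1
        public final int encryptionKey;   // bytes 12-13: 0 when not encrypted

        Prefix(long docAreaPointer, int productType, int fileType,
               int majorVersion, int minorVersion, int encryptionKey) {
            this.docAreaPointer = docAreaPointer;
            this.productType = productType;
            this.fileType = fileType;
            this.majorVersion = majorVersion;
            this.minorVersion = minorVersion;
            this.encryptionKey = encryptionKey;
        }

        /** Maps the major-version byte to a format family. */
        public String family() {
            switch (majorVersion) {
                case 0: return "5.x";
                case 2: return "6.x+";
                default: return "unknown";
            }
        }
    }

    /** Parses the first 16 bytes of a file; multi-byte fields are little-endian. */
    public static Prefix parse(byte[] head) {
        if (head.length < 16) {
            throw new IllegalArgumentException("prefix area needs 16 bytes");
        }
        // Signature: 0xFF followed by ASCII "WPC"
        if ((head[0] & 0xFF) != 0xFF || head[1] != 'W' || head[2] != 'P' || head[3] != 'C') {
            throw new IllegalArgumentException("missing WPC signature");
        }
        ByteBuffer buf = ByteBuffer.wrap(head, 0, 16).order(ByteOrder.LITTLE_ENDIAN);
        return new Prefix(
                buf.getInt(4) & 0xFFFFFFFFL, // unsigned 32-bit pointer
                head[8] & 0xFF,
                head[9] & 0xFF,
                head[10] & 0xFF,
                head[11] & 0xFF,
                buf.getShort(12) & 0xFFFF);  // unsigned 16-bit key
    }

    public static void main(String[] args) {
        // Synthetic WP 5.1 prefix: document area at offset 0x0110 (272)
        byte[] head = {(byte) 0xFF, 'W', 'P', 'C', 0x10, 0x01, 0, 0, 1, 10, 0, 1, 0, 0, 0, 0};
        Prefix p = parse(head);
        System.out.println(p.family() + " minor=" + p.minorVersion + " docArea=" + p.docAreaPointer);
    }
}
```

      For the synthetic 5.1 header in main, the sketch prints "5.x minor=1 docArea=272". A real parser would then seek to the document-area offset and hand off to a version-specific text extractor.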


          Activity

          ASF GitHub Bot (githubbot) added a comment:

          GitHub user essiembre opened a pull request:

          https://github.com/apache/tika/pull/142

          Update to WordPerfect parser to support 5.x for TIKA-2228 contributed by pascal.essiembre

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/essiembre/tika TIKA-2228

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tika/pull/142.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #142


          commit c01451e027f107de6b7a306657bb292280ed4fff
          Author: Pascal Essiembre <pascal.essiembre@norconex.com>
          Date: 2016-12-23T19:59:20Z

          New WP5.x parser for TIKA-2228 contributed by pascal.essiembre


          Tim Allison (tallison@mitre.org) added a comment:

          Will take a look today. Sorry for the delay. Thank you for the contribution!

          ASF GitHub Bot (githubbot) added a comment:

          GitHub user asfgit closed the pull request at:

          https://github.com/apache/tika/pull/142

          Tim Allison (tallison@mitre.org) added a comment (edited):

          Thank you, Pascal Essiembre!

          Looks like we're now getting more reliable text out of 5.0 than is viewable in LibreOffice, at least on our one 5.0 test file.

          Hudson (hudson) added a comment:

          SUCCESS: Integrated in Jenkins build tika-2.x #190 (See https://builds.apache.org/job/tika-2.x/190/)
          TIKA-2228 from Pascal Essiembre and TIKA-2230. (tallison: rev aaa661e25bd73a8ffa9f5eb96fba42d945e34c27)

          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/QPWTextExtractor.java
          • (edit) tika-parser-modules/tika-parser-office-module/pom.xml
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WPPrefixArea.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WordPerfectParser.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WPInputStream.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/wordperfect/WPInputStreamTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/wordperfect/QuattroProTest.java
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WP5DocumentAreaExtractor.java
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WP6DocumentAreaExtractor.java
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WP5Charsets.java
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WP6Charsets.java
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WPDocumentAreaExtractor.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/wordperfect/WordPerfectTest.java
          • (edit) tika-app/pom.xml
          • (edit) tika-parent/pom.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/QuattroProParser.java
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/wordperfect/WPPrefixAreaExtractor.java
          Hudson (hudson) added a comment:

          SUCCESS: Integrated in Jenkins build Tika-trunk #1170 (See https://builds.apache.org/job/Tika-trunk/1170/)
          TIKA-2228 - WordPerfect parser update to handle 5.x from Pascal (tallison: rev 6dc442da50dbb22bbf2b73076d5080f32af03067)

          • (add) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP5DocumentAreaExtractor.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/QuattroProParser.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/wordperfect/QuattroProTest.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/QPWTextExtractor.java
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WPPrefixArea.java
          • (delete) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP6Constants.java
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP5Charsets.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WordPerfectParser.java
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP6Charsets.java
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP6DocumentAreaExtractor.java
          • (delete) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP6FileHeader.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WPInputStream.java
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WPDocumentAreaExtractor.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/wordperfect/WordPerfectTest.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/wordperfect/WPInputStreamTest.java
          • (delete) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP6TextExtractor.java
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WPPrefixAreaExtractor.java

            People

            • Assignee: Unassigned
            • Reporter: Pascal Essiembre (pascal.essiembre)
            • Votes: 0
            • Watchers: 4
