Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: fetcher
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Due to API changes, the RTF parser (which is not compiled by default because of a licensing problem) no longer compiles.

      The build.xml script no longer works either, because http://www.cobase.cs.ucla.edu/pub/javacc/rtf_parser_src.jar no longer exists (404). I fixed only the compilation issues and left build.xml unchanged, as I don't know where we want to get the jar file from.

      Regards,


      Guillaume

      1. RTFParseFactory.java-compilation_issues.diff
        2 kB
        Guillaume Smet
      2. NUTCH-644_v3.patch
        171 kB
        Dmitry Lihachev
      3. NUTCH-644_v2.patch
        168 kB
        Dmitry Lihachev

          Activity

          dogacan Doğacan Güney added a comment -

          I am going to commit this but

          1) parse-rtf unit test is not updated
          2) Does anyone know where the rtf_parser_src.jar is?

          dmitry.lihachev Dmitry Lihachev added a comment -

          I found the sources of RTFParser.jj (ASF) and RTFParserDelegate.java (LGPL) at https://atleap.dev.java.net/source/browse/atleap/application/src/common/com/blandware/atleap/common/parsers/rtf/.

          dmitry.lihachev Dmitry Lihachev added a comment -

          This parser handles non-ASCII input incorrectly (when the system encoding is UTF-8),
          so I created a new issue (NUTCH-705) with a new parser, and I think it can be released with nutch-1.0.

          jnioche Julien Nioche added a comment -

          RTF parsing is now handled by the TikaPlugin (NUTCH-766), which solves the licensing issue.

          markus17 Markus Jelsma added a comment -

          Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open_legacy_issues_in_jira

            People

            • Assignee:
              Unassigned
              Reporter:
              gsmet Guillaume Smet
            • Votes:
              0
              Watchers:
              0
