Nutch: NUTCH-644

RTF parser doesn't compile anymore

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

      Description

      Due to API changes, the RTF parser (which is not compiled by default because of licensing problems) no longer compiles.

      The build.xml script no longer works either, because http://www.cobase.cs.ucla.edu/pub/javacc/rtf_parser_src.jar no longer exists (404). I fixed only the compilation issues, not build.xml, because I don't know where we want to get the jar file from.

      Regards,

      Guillaume

      1. RTFParseFactory.java-compilation_issues.diff
        2 kB
        Guillaume Smet
      2. NUTCH-644_v3.patch
        171 kB
        Dmitry Lihachev
      3. NUTCH-644_v2.patch
        168 kB
        Dmitry Lihachev

        Issue Links

          Activity

          Guillaume Smet created issue -
          Guillaume Smet made changes -
          Attachment (new value): RTFParseFactory.java-compilation_issues.diff [ 12387819 ]
          Doğacan Güney added a comment -

          I am going to commit this, but:

          1) the parse-rtf unit test has not been updated;
          2) does anyone know where rtf_parser_src.jar is?

          Dmitry Lihachev added a comment -

          I found the sources of RTFParser.jj (ASF) and RTFParserDelegate.java (LGPL) at https://atleap.dev.java.net/source/browse/atleap/application/src/common/com/blandware/atleap/common/parsers/rtf/.

          Dmitry Lihachev made changes -
          Attachment NUTCH-644_v2.patch [ 12400846 ]
          Dmitry Lihachev made changes -
          Attachment NUTCH-644_v3.patch [ 12400848 ]
          Dmitry Lihachev added a comment -

          This parser handles non-ASCII input incorrectly (when the system encoding is UTF-8), so I created a new issue (NUTCH-705) with a new parser. I think it can be released with Nutch 1.0.

          Dmitry Lihachev made changes -
          Link This issue is duplicated by NUTCH-705 [ NUTCH-705 ]
          Julien Nioche added a comment -

          RTF parsing is now handled by the TikaPlugin (NUTCH-766), which solves the licensing issue.

          Julien Nioche made changes -
          Status: Open → Resolved
          Resolution: Fixed
          Markus Jelsma added a comment - Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open_legacy_issues_in_jira
          Markus Jelsma made changes -
          Status: Resolved → Closed

          Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Open → Resolved | 558d 20h 23m | 1 | Julien Nioche | 18/Feb/10 10:49
          Resolved → Closed | 407d 4h 17m | 1 | Markus Jelsma | 01/Apr/11 16:07

            People

            • Assignee: Unassigned
            • Reporter: Guillaume Smet
            • Votes: 0
            • Watchers: 0
