TIKA-337: SWF parser

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      Here is an initial implementation of an SWF parser. It uses JavaSWF and is adapted from A. Bialecki's implementation for Nutch.
      The main differences from the Nutch implementation are that we use the latest version of JavaSWF and do not try to extract text from the actions or structured URLs. As usual, URLs can be obtained from the extracted text using ParserPostProcessor.
      JavaSWF has changed quite a bit since the Nutch integration, and I wanted to keep this initial port nice and simple. It should be possible to extract the URLs from the actions using JavaSWF's API; I think this is what they did in Heritrix.
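
To illustrate the idea of recovering URLs from the extracted text rather than from the SWF actions, here is a minimal, self-contained sketch. It is not the code in the attached patch and it does not use Tika's ParserPostProcessor or JavaSWF; the class name `UrlFromText`, the method `extractUrls`, and the deliberately simple regular expression are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: scans already-extracted text for URLs.
class UrlFromText {

    // Assumed pattern: an http or https scheme followed by any run of
    // characters that are not whitespace, quotes, or angle brackets.
    // Real-world URL detection usually needs a stricter pattern.
    private static final Pattern URL =
            Pattern.compile("https?://[^\\s\"'<>]+");

    // Returns every substring of the text that matches the pattern,
    // in order of appearance.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Stand-in for text that a SWF parser might have extracted.
        String extracted =
                "Click here http://example.com/a.swf or visit https://example.org/";
        for (String url : extractUrls(extracted)) {
            System.out.println(url);
        }
    }
}
```

The trade-off of this approach is the one the description points at: it only sees URLs that appear as plain text in the movie, so links that exist only inside actions would still need to be pulled out through JavaSWF's API.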

      Attachments

        1. test.swf
          50 kB
          Julien Nioche
        2. TIKA-337.patch
          12 kB
          Julien Nioche

            People

              Assignee: Jukka Zitting (jukkaz)
              Reporter: Julien Nioche (jnioche)