Tika / TIKA-824

Extract rel attr with LinkContentHandler


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0, 1.1
    • Fix Version/s: 1.1
    • Component/s: parser
    • Labels: None

    Description

      For Nutch we need to extract URLs, but we also need each link's rel attribute so that we can check for the nofollow value. I've patched the code to return this information in the Link object. It has been tested, and I can now read the rel value in Nutch.

      Thoughts?
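
      To illustrate the idea, here is a minimal sketch that collects each anchor's href together with its rel attribute and then checks for nofollow. It is not the attached patch and not Tika's LinkContentHandler: the names RelLinkSketch, FoundLink, Collector and extract are hypothetical, and it assumes well-formed XHTML input so that the JDK's own SAX parser can be used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration only; not Tika's API.
class RelLinkSketch {

    // Holds what the crawler needs: the target URI and the raw rel value.
    static final class FoundLink {
        final String uri;
        final String rel; // empty string when the attribute is absent

        FoundLink(String uri, String rel) {
            this.uri = uri;
            this.rel = rel;
        }

        // rel is a space-separated token list, so compare token by token.
        boolean isNoFollow() {
            for (String token : rel.trim().split("\\s+")) {
                if (token.equalsIgnoreCase("nofollow")) {
                    return true;
                }
            }
            return false;
        }
    }

    // SAX handler that records href and rel for every <a> element.
    static final class Collector extends DefaultHandler {
        final List<FoundLink> links = new ArrayList<>();

        @Override
        public void startElement(String nsUri, String localName,
                                 String qName, Attributes atts) {
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    String rel = atts.getValue("rel");
                    links.add(new FoundLink(href, rel == null ? "" : rel));
                }
            }
        }
    }

    // Assumes well-formed XHTML; real-world HTML would need a lenient parser.
    static List<FoundLink> extract(String xhtml) throws Exception {
        Collector collector = new Collector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), collector);
        return collector.links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<a href=\"http://example.com/a\" rel=\"nofollow\">A</a>"
                + "<a href=\"http://example.com/b\">B</a>"
                + "</body></html>";
        for (FoundLink link : extract(page)) {
            System.out.println(link.uri + " nofollow=" + link.isNoFollow());
        }
    }
}
```

      A crawler such as Nutch could then skip any link for which isNoFollow() returns true. Keeping the raw rel string, rather than a boolean, leaves other rel values available to the caller.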

      Attachments

        1. TIKA-824-trunk-1.patch
          3 kB
          Markus Jelsma

            People

              Assignee: Chris A. Mattmann (chrismattmann)
              Reporter: Markus Jelsma (markus17)