Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:

      Description

      Currently the XHTML output doesn't contain links, although PDFBox parses them. I'm new to Tika and haven't written Java for 6 years, but someone more experienced could probably implement this in a few hours.

      The PDF2XHTML class already loops through each page's annotations.

      See line 136:

      for (Object o : page.getAnnotations()) {
      

      I found some code for dealing with links in annotations:
      http://stackoverflow.com/questions/7174709/pdfbox-not-recognizing-a-link

      It involves checking the annotation's class with instanceof.

      if (annotation instanceof PDAnnotationLink) {
          PDAnnotationLink link = (PDAnnotationLink) annotation;
          // ... handle the link here
      }
      

      I hope this helps someone.
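
      The annotation check described above could be sketched roughly as follows. This is only an illustration, not the code from the attached patch: `Annotation` and `LinkAnnotation` are hypothetical stand-ins for PDFBox's `PDAnnotation` and `PDAnnotationLink`, and `renderLinks` and `escapeAttribute` are made-up helper names. In real PDFBox code the target URI would most likely be reached through `link.getAction()` and, when that action is a `PDActionURI`, its `getURI()` method; the stand-in here simply stores the URI directly so the sketch compiles without PDFBox on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkAnnotationSketch {

    // Hypothetical stand-in for PDFBox's PDAnnotation base class.
    static class Annotation {}

    // Hypothetical stand-in for PDAnnotationLink; it carries only the target URI.
    static class LinkAnnotation extends Annotation {
        final String uri;
        LinkAnnotation(String uri) { this.uri = uri; }
    }

    // Escape characters that are unsafe inside an XHTML attribute value.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                    .replace("\"", "&quot;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Walk the annotations the way the existing loop does, keep only the
    // link annotations, and render each one as an XHTML anchor element.
    static List<String> renderLinks(List<?> annotations) {
        List<String> anchors = new ArrayList<String>();
        for (Object o : annotations) {
            if (o instanceof LinkAnnotation) {
                LinkAnnotation link = (LinkAnnotation) o;
                if (link.uri != null) { // skip links that carry no URI
                    String href = escapeAttribute(link.uri);
                    anchors.add("<a href=\"" + href + "\">" + href + "</a>");
                }
            }
        }
        return anchors;
    }

    public static void main(String[] args) {
        List<Object> annotations = Arrays.<Object>asList(
                new Annotation(),
                new LinkAnnotation("http://tika.apache.org/?a=1&b=2"));
        for (String anchor : renderLinks(annotations)) {
            System.out.println(anchor);
        }
    }
}
```

      Non-link annotations and links without a URI are skipped, so the output contains one anchor per usable link. Inside Tika the anchors would presumably be written through the XHTML content handler rather than built as strings; string building is used here only to keep the sketch short.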

        Attachments

        1. TIKA-861.patch
          2 kB
          Ryan Quam
        2. TIKA-861-test.patch
          0.8 kB
          Ryan Quam

            People

            • Assignee: Unassigned
            • Reporter: zoomby Sasha Goodman
            • Votes: 2
            • Watchers: 2

                Time Tracking

                • Original Estimate: 4h
                • Remaining Estimate: 4h
                • Time Spent: Not Specified