Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:

      Description

      Currently the XHTML output doesn't contain links, although PDFBox parses them. I'm new to Tika and haven't written Java for 6 years, but someone more experienced could probably do this in a few hours.

      The PDF2XHTML class already loops through each page's annotations.

      See:

      for (Object o : page.getAnnotations()) {   // PDF2XHTML, line 136
      

      I found some code for dealing with links in annotations:
      http://stackoverflow.com/questions/7174709/pdfbox-not-recognizing-a-link

      It involves checking the class of each annotation and casting it:

      if (annotation instanceof PDAnnotationLink) {
          PDAnnotationLink link = (PDAnnotationLink) annotation;
          // ... read the link target from `link` here
      }
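
The two fragments above (the annotation loop and the `instanceof` check) could be combined roughly as follows. This is only a minimal sketch, not the attached patch: `Annotation` and `LinkAnnotation` are hypothetical stand-ins for PDFBox's `PDAnnotation` and `PDAnnotationLink` so that the example runs without PDFBox on the classpath, and the `uri` field is an assumption. In PDFBox itself the target is normally reached through the link's action object instead. The anchor markup is likewise illustrative and may differ from what Tika emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkAnnotationSketch {

    // Hypothetical stand-in for PDFBox's PDAnnotation base class.
    public static class Annotation { }

    // Hypothetical stand-in for PDAnnotationLink. The plain `uri` field is
    // an assumption made for this sketch only.
    public static class LinkAnnotation extends Annotation {
        final String uri;
        public LinkAnnotation(String uri) { this.uri = uri; }
    }

    // Escape the characters that are unsafe in XHTML text and attribute values.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("\"", "&quot;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Walk the annotations, keep only links that carry a URI,
    // and render each one as an XHTML anchor.
    public static List<String> linksToXhtml(List<?> annotations) {
        List<String> anchors = new ArrayList<>();
        for (Object o : annotations) {
            if (o instanceof LinkAnnotation) {
                LinkAnnotation link = (LinkAnnotation) o;
                if (link.uri != null) {
                    String safe = escape(link.uri);
                    anchors.add("<a href=\"" + safe + "\">" + safe + "</a>");
                }
            }
        }
        return anchors;
    }

    public static void main(String[] args) {
        List<Object> annotations = Arrays.<Object>asList(
            new Annotation(),                               // not a link, skipped
            new LinkAnnotation("http://example.com/?a=1&b=2"),
            new LinkAnnotation(null));                      // link without a URI, skipped
        for (String anchor : linksToXhtml(annotations)) {
            System.out.println(anchor);
        }
    }
}
```

Non-link annotations and links without a URI are skipped rather than reported, which keeps the output limited to usable `href` values. A real implementation would write the anchor through the XHTML content handler instead of building strings.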
      

      I hope this helps someone.

      1. TIKA-861-test.patch
        0.8 kB
        Ryan Quam
      2. TIKA-861.patch
        2 kB
        Ryan Quam

        Activity

        No work has yet been logged on this issue.

          People

          • Assignee: Unassigned
          • Reporter: Sasha Goodman
          • Votes: 2
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 4h
              Remaining: 4h
              Logged: Not Specified
