Tika / TIKA-1232

Add PDF version to PDFParser output

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: parser
    • Labels: None
    • Environment: JDK6

    Description

      I'd like to identify the PDF version of files. The PDFParser does not currently report it, although the information is available via PDFBox. I have attached a patch that adds the format version to the Metadata object.

      However, I am not familiar enough with the Tika source to know whether an existing metadata key should be used instead, or whether this new one should be added.

      Comments welcome.
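
      The idea described above, recording a file's PDF version under a metadata key, can be sketched roughly as follows. This is not the attached patch and it uses neither PDFBox nor Tika: it reads the `%PDF-M.m` header directly from the file bytes and stores the result in a plain `Map` that stands in for Tika's Metadata object. The class name, the method names, and the key `pdf:PDFVersion` are assumptions made for illustration only.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: a Map stands in for Tika's Metadata object,
// and the header is parsed by hand instead of asking PDFBox.
class PdfVersionSketch {
    // Hypothetical key name, chosen for this example.
    static final String PDF_VERSION_KEY = "pdf:PDFVersion";

    // A PDF file begins with a header comment of the form "%PDF-M.m".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d+\\.\\d+)");

    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    /** Returns the header version, or null if no header appears in the first 1024 bytes. */
    static String readPdfVersion(byte[] data) {
        int len = Math.min(data.length, 1024);
        String head = new String(data, 0, len, LATIN1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Adds the version to the metadata map when a header is found. */
    static void addVersion(Map<String, String> metadata, byte[] data) {
        // Note: a /Version entry in the document catalog can override the
        // header version; this sketch does not look at the catalog.
        String version = readPdfVersion(data);
        if (version != null) {
            metadata.put(PDF_VERSION_KEY, version);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(LATIN1);
        Map<String, String> metadata = new LinkedHashMap<String, String>();
        addVersion(metadata, sample);
        System.out.println(metadata); // prints {pdf:PDFVersion=1.4}
    }
}
```

      In a real parser the version would more likely come from the PDF library that has already parsed the document, as the description suggests, rather than from re-reading the header.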

      Attachments

        1. pdfversion.patch
          0.8 kB
          William Palmer
        2. TIKA-1232v1.patch
          8 kB
          Tim Allison
        3. TIKA-1232v2.patch
          9 kB
          Tim Allison
        4. Sample 4.x.pdf
          10 kB
          Alexandre Madurell
        5. Sample 5.x.pdf
          6 kB
          Alexandre Madurell
        6. Sample 6.x.pdf
          6 kB
          Alexandre Madurell
        7. Sample 7.x.pdf
          6 kB
          Alexandre Madurell
        8. Sample 8.x.pdf
          6 kB
          Alexandre Madurell
        9. Sample 9.x.pdf
          6 kB
          Alexandre Madurell
        10. Sample 10.x.pdf
          6 kB
          Alexandre Madurell
        11. Sample 11.x PDFA-1b.pdf
          23 kB
          Alexandre Madurell
        12. testComment.pdf
          67 kB
          Tyler Bui-Palsulich

          People

            Assignee: Tim Allison (tallison)
            Reporter: William Palmer (willp-bl)
            Votes: 2
            Watchers: 7
