TIKA-1232: Add PDF version to PDFParser output

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: parser
    • Labels: None
    • Environment: JDK6

    Description

      I'd like to identify the PDF version of files. The PDFParser does not currently report it, although the information is available via PDFBox. I have attached a patch that adds the format version to the Metadata object.

      However, I am not familiar enough with the Tika source to know whether an existing metadata key should be used instead, or whether this new one should be added.

      Comments welcome.
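
      For readers who want to see the kind of value being requested, here is a minimal standalone sketch. It is not the attached patch: the patch obtains the version from PDFBox, whereas this sketch reads the `%PDF-x.y` file header directly so that it needs no libraries. The class name, the method names, and the `pdf:version` key are hypothetical, and a plain `Map` stands in for Tika's Metadata object.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PdfVersionSketch {
    // Hypothetical key name, chosen for illustration only.
    static final String VERSION_KEY = "pdf:version";

    // A PDF header looks like "%PDF-1.4"; capture the "major.minor" part.
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d+\\.\\d+)");

    /** Returns the header version such as "1.4", or null if no header is found. */
    static String headerVersion(byte[] leadingBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary content decodes safely.
        String head = new String(leadingBytes, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Stores the version under VERSION_KEY; leaves the map untouched if none is found. */
    static void addVersion(Map<String, String> metadata, byte[] leadingBytes) {
        String version = headerVersion(leadingBytes);
        if (version != null) {
            metadata.put(VERSION_KEY, version);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the first bytes of a PDF file.
        byte[] head = "%PDF-1.4\n1 0 obj\n".getBytes(StandardCharsets.ISO_8859_1);
        Map<String, String> metadata = new LinkedHashMap<String, String>();
        addVersion(metadata, head);
        System.out.println(metadata);
    }
}
```

      Note that the header is only one source of the version: a PDF's document catalog may carry a `/Version` entry that overrides it, which is one reason to prefer the value a full parser such as PDFBox reports.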

      Attachments

        1. pdfversion.patch
          0.8 kB
          William Palmer
        2. Sample 10.x.pdf
          6 kB
          Alexandre Madurell
        3. Sample 11.x PDFA-1b.pdf
          23 kB
          Alexandre Madurell
        4. Sample 4.x.pdf
          10 kB
          Alexandre Madurell
        5. Sample 5.x.pdf
          6 kB
          Alexandre Madurell
        6. Sample 6.x.pdf
          6 kB
          Alexandre Madurell
        7. Sample 7.x.pdf
          6 kB
          Alexandre Madurell
        8. Sample 8.x.pdf
          6 kB
          Alexandre Madurell
        9. Sample 9.x.pdf
          6 kB
          Alexandre Madurell
        10. testComment.pdf
          67 kB
          Tyler Bui-Palsulich
        11. TIKA-1232v1.patch
          8 kB
          Tim Allison
        12. TIKA-1232v2.patch
          9 kB
          Tim Allison

          People

            Assignee: tallison Tim Allison
            Reporter: willp-bl William Palmer
            Votes: 2
            Watchers: 7
