TIKA-1232: Add PDF version to PDFParser output

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: parser
    • Labels: None
    • Environment: JDK6

    Description

      I'd like to identify the PDF version of files. This is not currently reported by the PDFParser, although the information is available via PDFBox. I have attached a patch that adds the format version to the Metadata object.

      However, I am not familiar enough with the Tika source to know whether an alternative metadata key should be used or this new one added.

      Comments welcome.
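
For readers who want to see the idea without opening the patch, here is a minimal sketch of what "adding the format version to the metadata" could look like. It is not the attached patch: the patch reads the version through PDFBox, whereas this sketch only inspects the `%PDF-x.y` header at the start of the file, using the standard library alone. The class name `PdfHeaderVersion`, the key name `pdf:PDFVersion`, and the plain `Map` standing in for Tika's `Metadata` object are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or PDFBox.
class PdfHeaderVersion {

    // Assumed key name; the issue leaves the choice of metadata key open.
    static final String PDF_VERSION_KEY = "pdf:PDFVersion";

    // A PDF file starts with a header of the form "%PDF-1.4".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Returns the header version (for example "1.4"), or null if no header
    // is found in the first 1024 bytes.
    static String readVersion(byte[] data) {
        int len = Math.min(data.length, 1024);
        String head = new String(data, 0, len, LATIN1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // Stand-in for populating a Tika Metadata object: the version is only
    // added when a header was actually found.
    static Map<String, String> describe(byte[] data) {
        Map<String, String> metadata = new LinkedHashMap<String, String>();
        String version = readVersion(data);
        if (version != null) {
            metadata.put(PDF_VERSION_KEY, version);
        }
        return metadata;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(LATIN1);
        System.out.println(describe(sample));
    }
}
```

Note that the header is only one source of the version: a document catalog may carry a `/Version` entry that overrides it, which is one reason to prefer a full parser such as PDFBox over header sniffing.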

      Attachments

        1. TIKA-1232v2.patch (9 kB, Tim Allison)
        2. TIKA-1232v1.patch (8 kB, Tim Allison)
        3. testComment.pdf (67 kB, Tyler Bui-Palsulich)
        4. Sample 9.x.pdf (6 kB, Alexandre Madurell)
        5. Sample 8.x.pdf (6 kB, Alexandre Madurell)
        6. Sample 7.x.pdf (6 kB, Alexandre Madurell)
        7. Sample 6.x.pdf (6 kB, Alexandre Madurell)
        8. Sample 5.x.pdf (6 kB, Alexandre Madurell)
        9. Sample 4.x.pdf (10 kB, Alexandre Madurell)
        10. Sample 11.x PDFA-1b.pdf (23 kB, Alexandre Madurell)
        11. Sample 10.x.pdf (6 kB, Alexandre Madurell)
        12. pdfversion.patch (0.8 kB, William Palmer)

          People

            tallison Tim Allison
            willp-bl William Palmer
            Votes: 2
            Watchers: 7