Tika / TIKA-4126

PDF XMP ModifyDate extracted without TimeZone info

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.7.0, 2.8.0, 2.9.0
    • Fix Version/s: 2.9.1
    • Component/s: parser
    • Labels: None
    • Flags: Patch, Important

    Description

      I've run:
      [root@localhost Downloads]# java -jar tika-app-2.9.0.jar sobreavisoEditado3.pdf | grep xmp

      which returned the time marked as UTC (trailing Z):

      WARN [main] 07:42:34,238 org.apache.pdfbox.pdmodel.font.PDType1Font Using fallback font LiberationSans for base font Symbol
      WARN [main] 07:42:34,241 org.apache.pdfbox.pdmodel.font.PDType1Font Using fallback font LiberationSans for base font ZapfDingbats
      <meta name="xmp:ModifyDate" content="2023-09-06T13:35:38Z"/>
      <meta name="xmp:MetadataDate" content="2023-09-06T13:35:38Z"/>
      <meta name="xmpTPg:NPages" content="11"/>

       

       

      Whereas running:

      java -jar pdfbox-app-2.0.29.jar ExtractXMP -console sobreavisoEditado3.pdf

      returned the correct value, including the timezone offset (-04:00):

      <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description rdf:about="" xmp:ModifyDate="2023-09-06T13:35:38-04:00" xmlns:xmp="http://ns.adobe.com/xap/1.0/"><xmp:MetadataDate>2023-09-06T13:35:38-04:00</xmp:MetadataDate></rdf:Description></rdf:RDF></x:xmpmeta><?xpacket end="w"?>

       

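      So Tika stripped the timezone offset from the metadata string and labelled the value as UTC without shifting the hour of day. The correct UTC value for 2023-09-06T13:35:38-04:00 would be 2023-09-06T17:35:38Z, not 2023-09-06T13:35:38Z.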
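
The difference between the two results can be sketched with java.time. This is only an illustration of the arithmetic, not Tika's or PDFBox's actual code and not the fix that was applied; the class name XmpDateDemo and both method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class XmpDateDemo {

    // Expected behaviour: keep the same instant and express it in UTC,
    // which shifts the hour of day by the original offset.
    public static String toUtc(String xmpDate) {
        OffsetDateTime parsed = OffsetDateTime.parse(xmpDate);
        return parsed.withOffsetSameInstant(ZoneOffset.UTC)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Reported symptom (reproduced here for comparison only): keep the
    // local date and time fields and merely relabel the offset as UTC.
    public static String dropOffset(String xmpDate) {
        OffsetDateTime parsed = OffsetDateTime.parse(xmpDate);
        return parsed.withOffsetSameLocal(ZoneOffset.UTC)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        String xmp = "2023-09-06T13:35:38-04:00"; // value from the XMP packet
        System.out.println(toUtc(xmp));      // 2023-09-06T17:35:38Z
        System.out.println(dropOffset(xmp)); // 2023-09-06T13:35:38Z
    }
}
```

The second line of output matches what tika-app 2.9.0 printed for xmp:ModifyDate, which is four hours earlier than the instant recorded in the PDF.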

            People

              Assignee: tallison Tim Allison
              Reporter: patrickdalla Patrick Dalla Bernardina
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 168h
                  Remaining: 168h
                  Logged: Not Specified