  1. Tika
  2. TIKA-2722

Don't call Date.toString (Possible issue with JDK 11)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".

    Description

      I'm troubleshooting a test failure in the Apache Lucene/Solr "extracting" contrib that occurs in JDK 11 with locale "ar-EG". JDK 8 & 9 pass; I don't know about JDK 10. It has to do with extracting date metadata from a PDF, particularly the created date but perhaps others too.

      I stepped through the code into Tika and I think I've found where the troublesome code is. First, note PDFParser line 271: addMetadata(metadata, "created", info.getCreationDate());. That addMetadata overload calls toString on a Date. IMO that's asking for trouble, since the output is locale-dependent. That's okay to show to a user but not for machine-to-machine information exchange. In the case of the test, it yielded this odd-looking date string:

      Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008

      I pasted that in, and it looks consistent with what I see in IntelliJ and in the Jenkins logs; hopefully it will post correctly to JIRA, though I won't be certain until after I click "Create". The odd part is the hour and minutes of the offset relative to GMT.

      Perhaps this problem is also indicative of a JDK 11 bug? Nevertheless I think Tika should avoid calling Date.toString().
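
      As a rough illustration of a locale-independent alternative (a sketch only, not Tika's actual code or API): the hypothetical helper toIso8601 below converts the Date to an Instant and uses Instant.toString(), which is documented to produce an ISO-8601 representation in UTC. The class name, the helper name, and the sample instant are assumptions made up for this example.

```java
import java.time.Instant;
import java.util.Date;
import java.util.Locale;

public class IsoDateExample {

    // Hypothetical helper (not part of Tika): renders a Date as an
    // ISO-8601 UTC timestamp instead of relying on Date.toString().
    public static String toIso8601(Date date) {
        // Instant.toString() uses the ISO-8601 instant format, so the
        // result does not depend on the default locale or time zone.
        return date.toInstant().toString();
    }

    public static void main(String[] args) {
        // Simulate the environment from the failing test.
        Locale.setDefault(Locale.forLanguageTag("ar-EG"));

        // Illustrative instant only; not taken from the actual PDF.
        Date created = Date.from(Instant.parse("2008-11-13T13:35:51Z"));

        System.out.println(toIso8601(created));
    }
}
```

      A string like this can be parsed back with Instant.parse on the receiving side, which seems a better fit for machine-to-machine exchange than the output of Date.toString().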

            People

              Assignee: Unassigned
              Reporter: David Smiley (dsmiley)
              Votes: 0
              Watchers: 5