  1. Tika
  2. TIKA-759

Better handling of content type metadata

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: metadata, mime
    • Labels: None

    Description

      Currently we use the "Content-Type" metadata key for storing (and looking up) the media type of a document. This is simple and works well, especially with HTTP, but it does not align well with XMP or other metadata standards such as Dublin Core. As an improvement, I propose the following:

      • Switch to "dc:format" as the standard metadata key for the content type
      • Keep the existing "Content-Type" key for backwards compatibility with existing clients
      • Make the Metadata class aware of such aliases
      • Add getFormat() and setFormat() utility methods to Metadata to simplify client code and to make the exact metadata key more of an implementation detail in Tika
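
      The proposal above could be sketched roughly as follows. This is only an illustration and not the actual Tika Metadata class: the class name AliasAwareMetadata, the ALIASES map, and the canonical() helper are hypothetical, and the real Metadata class supports multi-valued keys, which this sketch ignores. Only the key names "dc:format" and "Content-Type" and the getFormat()/setFormat() method names come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an alias-aware metadata container.
// Not the real org.apache.tika.metadata.Metadata implementation.
class AliasAwareMetadata {

    // Proposed standard key for the content type.
    static final String FORMAT = "dc:format";

    // Legacy key, kept so that existing clients continue to work.
    static final String CONTENT_TYPE = "Content-Type";

    // Maps each alias to its canonical key (assumed design, one alias here).
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put(CONTENT_TYPE, FORMAT);
    }

    // Values are stored under canonical keys only.
    private final Map<String, String> values = new HashMap<>();

    // Resolves an alias to its canonical key; other keys pass through unchanged.
    private static String canonical(String key) {
        return ALIASES.getOrDefault(key, key);
    }

    public String get(String key) {
        return values.get(canonical(key));
    }

    public void set(String key, String value) {
        values.put(canonical(key), value);
    }

    // Utility methods that hide the exact key from client code.
    public String getFormat() {
        return get(FORMAT);
    }

    public void setFormat(String mediaType) {
        set(FORMAT, mediaType);
    }
}
```

      With this sketch, a value written through setFormat("application/pdf") is also visible through get("Content-Type"), and a value written under "Content-Type" is returned by getFormat(). Storing values only under the canonical key is one possible design; keeping both keys in sync on every write would be another.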

            People

              Assignee: Unassigned
              Reporter: Jukka Zitting (jukkaz)
              Votes: 1