Tika / TIKA-2458

Unify number of pages metadata key?


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      On TIKA-2451, we're adding a metadata value for the number of images in a tiff. This raises the broader (admittedly minor) question of how we want to handle "number of pages".

      I'm opening this issue for discussion and feedback.

      Unfortunately, Dublin Core doesn't have a "number of pages" element, as far as I can tell.

      Do we want a single "number of pages" key in TikaCoreProperties that would be used for:

      1. number of pages in a PDF
      2. number of pages that a .docx alleges it has
      3. the number of slides in a PPT
      4. the number of sheets in an XLS
      5. the number of tiffs in a multi-image tiff

      Others?

      Or, do we want to have different keys: MSOffice.PageCount, PagedText.N_PAGES, TIFF.NUM_TIFFS?

      Or, thanks to the beauty of composite keys, do we want to have both a unified key and the above individual keys?
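
To make the "both" option concrete, here is a minimal sketch of a write that lands under a format-specific key and is mirrored to a unified key. It deliberately does not use Tika's own Metadata or Property classes: the map-based store, the helper setPageCount, and the unified key name "unified:n-pages" are all hypothetical and exist only for illustration. The format-specific names are the ones listed above, treated as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata store; this is NOT Tika's Metadata class.
final class PageCountKeys {

    // Hypothetical unified key name, assumed for illustration only.
    static final String UNIFIED_KEY = "unified:n-pages";

    // Format-specific key names from the discussion, used here as plain strings.
    static final String PDF_PAGES = "PagedText.N_PAGES";
    static final String OFFICE_PAGES = "MSOffice.PageCount";
    static final String TIFF_IMAGES = "TIFF.NUM_TIFFS";

    private PageCountKeys() {
    }

    /** Stores the count under the format-specific key and mirrors it to the unified key. */
    static void setPageCount(Map<String, String> metadata, String specificKey, int count) {
        String value = Integer.toString(count);
        metadata.put(specificKey, value);
        metadata.put(UNIFIED_KEY, value);
    }

    public static void main(String[] args) {
        // A multi-image TIFF with three images: both keys end up carrying "3".
        Map<String, String> tiff = new LinkedHashMap<>();
        setPageCount(tiff, TIFF_IMAGES, 3);
        System.out.println(tiff);
    }
}
```

With this shape, a consumer that only cares about "how many pages" reads the unified key, while one that needs to know whether the count means slides, sheets, or TIFF images can still read the specific key.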

      *I would propose using PagedText's N_PAGES as the unifying key, but its definition seems to be strictly within XMP-land, and according to our javadocs it should be the sum of the pages in the container document and all embedded documents.

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)