Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
On TIKA-2451, we're adding a metadata value for the number of images in a TIFF. This raises the broader (admittedly minor) question of how we want to handle "number of pages".
I'm opening this issue for discussion and feedback.
Unfortunately, Dublin Core doesn't have a "number of pages" element as far as I can tell.
Do we want to have a single key in TikaCoreProperties that is "number of pages" that would be used for:
- number of pages in a PDF
- number of pages that a .docx alleges it has
- the number of slides in a PPT
- the number of sheets in an XLS
- the number of images in a multi-image TIFF
Others?
Or do we want to have different keys: MSOffice.PageCount, PagedText.N_PAGES, TIFF.NUM_TIFFS?
Or, thanks to the beauty of composite keys, do we want to have both a unified key and the above individual keys?
I would propose using PagedText's N_PAGES as the unifying key, but its definition seems to be strictly within XMP-land, and according to our javadocs it should be the sum of the pages in the container document and all embedded documents.
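
To make the "both a unified key and the individual keys" option concrete, here is a minimal plain-Java sketch of the behaviour I have in mind: a parser sets its format-specific count once, and the same value is mirrored to a unified key. This is only a model using a `Map`, not Tika code. The class name, the helper method, and the unified key name `unified:pageCount` are all hypothetical; the format-specific names are just the ones listed above. In Tika itself I would expect this to be expressed with a composite `Property` rather than a helper like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Plain-Java model of the "unified key plus individual keys" option.
 * Not Tika code: all names here are illustrative assumptions.
 */
class PageCountSketch {
    // Hypothetical name for the unified key; not an existing Tika property.
    static final String UNIFIED_KEY = "unified:pageCount";

    // Format-specific key names as written in this issue.
    static final String MSOFFICE_PAGE_COUNT = "MSOffice.PageCount";
    static final String TIFF_NUM_TIFFS = "TIFF.NUM_TIFFS";

    /** Stores the count under the format-specific key and mirrors it to the unified key. */
    static void setPageCount(Map<String, String> metadata, String specificKey, int count) {
        String value = Integer.toString(count);
        metadata.put(specificKey, value);
        metadata.put(UNIFIED_KEY, value);
    }

    public static void main(String[] args) {
        // A multi-image TIFF with three images.
        Map<String, String> tiff = new LinkedHashMap<>();
        setPageCount(tiff, TIFF_NUM_TIFFS, 3);
        System.out.println(tiff);
    }
}
```

With something like this, a consumer that only cares about "how many pages" reads the unified key, while a consumer that needs to know whether the number came from a TIFF or an Office document can still read the specific key.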