I am currently working on implementing more comprehensive phone number extraction and enhancing Tika's support for phone number metadata modeling.
Right now we use the multi-valued String support available within Tika to persist phone numbers as
I would like to propose that we extend multi-valued support beyond the String paradigm by implementing a more abstract Collection of Objects, so that we could implement the phone number use case as follows:
Where Object could be a Collection<HashMap<String/Property, HashMap<String/Property, String/Int/Long>>>, e.g.
There are obvious backwards compatibility issues with this approach; additionally, it is a fundamental change to the core Metadata API. However, I hope that the <String, Object> mapping is flexible enough to let me model Tika metadata the way I want.
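To make the idea more concrete, here is a minimal, stdlib-only sketch of what a multi-valued <String, Object> mapping could look like. It is not the Tika API: the class names `ObjectMetadata` and `PhoneNumberMetadataDemo`, the `phoneNumbers` key and the attribute names `countryCode` and `type` are all hypothetical. Each phone number entry follows the HashMap<String, HashMap<String, Object>> shape described above (number -> attributes), and a String-returning `get` is kept so that existing callers could still read a value, which is one possible way to soften the backwards compatibility problem.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT part of Tika: metadata as a <String, Object>
// mapping where each name may hold several arbitrary values.
class ObjectMetadata {
    private final Map<String, List<Object>> values = new LinkedHashMap<>();

    // Append a value; a name becomes multi-valued once it holds more than one.
    void add(String name, Object value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Object-level accessor for the proposed API (empty list if absent).
    List<Object> getValues(String name) {
        return Collections.unmodifiableList(
                values.getOrDefault(name, Collections.emptyList()));
    }

    // Backwards-compatible String accessor: first value rendered via toString().
    String get(String name) {
        List<Object> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : String.valueOf(list.get(0));
    }

    boolean isMultiValued(String name) {
        List<Object> list = values.get(name);
        return list != null && list.size() > 1;
    }
}

public class PhoneNumberMetadataDemo {
    // Builds one entry shaped like HashMap<String, HashMap<String, Object>>:
    // phone number -> attributes. Attribute names are hypothetical.
    static HashMap<String, HashMap<String, Object>> phoneEntry(
            String number, int countryCode, String type) {
        HashMap<String, Object> attrs = new HashMap<>();
        attrs.put("countryCode", countryCode); // autoboxed to Integer
        attrs.put("type", type);
        HashMap<String, HashMap<String, Object>> entry = new HashMap<>();
        entry.put(number, attrs);
        return entry;
    }

    public static void main(String[] args) {
        ObjectMetadata md = new ObjectMetadata();
        md.add("phoneNumbers", phoneEntry("+1-555-0100", 1, "mobile"));
        md.add("phoneNumbers", phoneEntry("+44-20-7946-0000", 44, "landline"));
        System.out.println(md.isMultiValued("phoneNumbers"));
        System.out.println(md.getValues("phoneNumbers").size());
    }
}
```

Keeping values as plain Objects means typed attributes (e.g. an Integer country code) survive without being flattened to Strings, at the cost of callers having to cast what they get back from `getValues`.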
Any comments, folks? Thanks