I have a possible implementation for this Jira issue. I created a class, SolrFileInputDocument, that extends SolrInputDocument. The main difference is that it adds these methods:
public void addFile(InputStream file)
public void addFile(InputStream file, Metadata metadata)
These two methods use Tika to extract the content and then create fields on the parent class SolrInputDocument (via this.addField(...)). SolrFileInputDocument accepts a Map instance that maps the extracted metadata to Solr fields, something like this:
Map<String, String> map = new HashMap<String, String>();
SolrFileInputDocument document = new SolrFileInputDocument(map);
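To make the mapping idea concrete, here is a minimal sketch of how such a map could be applied. It uses only plain Java collections, not Tika or SolrJ, and everything in it is an assumption for illustration: the class name MetadataFieldMapper, the method mapToSolrFields, the sample metadata names and field names, the choice of metadata name as key and Solr field name as value, and the decision to drop metadata that has no mapping. The actual patch may behave differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ or Tika: shows one way a
// metadata-name -> Solr-field-name map could be applied.
public class MetadataFieldMapper {

    // Returns a new map keyed by Solr field name. Extracted metadata
    // with no entry in fieldMap is dropped (an assumption of this sketch).
    static Map<String, String> mapToSolrFields(Map<String, String> extracted,
                                               Map<String, String> fieldMap) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String solrField = fieldMap.get(entry.getKey());
            if (solrField != null) {
                fields.put(solrField, entry.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for what an extractor might return for one file.
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("Author", "Jane Doe");
        extracted.put("Content-Type", "application/pdf");

        // Only "Author" is mapped; "Content-Type" is ignored.
        Map<String, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put("Author", "author_s");

        System.out.println(mapToSolrFields(extracted, fieldMap));
    }
}
```

In the real class, each resulting entry would presumably be passed to this.addField(name, value) instead of being collected into a map.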
I added the classes to a separate "contrib" directory. I don't know if this is the right way to do it; I just didn't want to add a dependency on Tika that might not always be needed. A client application using this code would need the SolrJ jar plus the "clientextraction" jar.
I still haven't done anything to keep the "prefix" feature of the ExtractingRequestHandler (which I don't think will be difficult), and I don't yet handle non-text fields such as dates, but I could do both if you think this is a good approach.
Do you think this could work? I can upload the code tomorrow.