Description
Currently the parse-zip plugin doesn't extract the language from a zip document, so the lang field is empty in Solr or Elasticsearch. If the package (.zip) contains several documents, the lang field could be multivalued to hold the language of each one. A simple change to the parse-zip plugin could fix this problem. I will use the LanguageIdentifier class from Tika to analyze each document inside the archive.
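A minimal sketch of the idea, not the actual patch: walk the zip entries, run a language detector on each entry's text, and collect the distinct results as the values of a multivalued lang field. The class name `ZipLanguageSketch`, the methods `detectLanguages`, `toyDetector` and `buildZip`, and the assumption that every entry is UTF-8 plain text are hypothetical and only for illustration. The detector is passed in as a function so the sketch runs without Tika; in the plugin it would presumably be backed by Tika's LanguageIdentifier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipLanguageSketch {

    // Collects the distinct languages detected across all file entries of a zip.
    // The resulting set would feed a multivalued "lang" field.
    public static Set<String> detectLanguages(InputStream zipData,
                                              Function<String, String> detector)
            throws IOException {
        Set<String> langs = new LinkedHashSet<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Assumption: entries are UTF-8 text. A real plugin would parse
                // each entry with the matching parser first.
                String text = new String(zis.readAllBytes(), StandardCharsets.UTF_8);
                String lang = detector.apply(text);
                if (lang != null && !lang.isEmpty()) {
                    langs.add(lang);
                }
            }
        }
        return langs;
    }

    // Hypothetical stand-in detector so the sketch has no external dependency.
    // With Tika 1.x on the classpath this could instead be something like:
    //   text -> new org.apache.tika.language.LanguageIdentifier(text).getLanguage()
    public static String toyDetector(String text) {
        String padded = " " + text.toLowerCase() + " ";
        if (padded.contains(" the ")) {
            return "en";
        }
        if (padded.contains(" el ") || padded.contains(" la ")) {
            return "es";
        }
        return null; // unknown language: contributes nothing to the field
    }

    // Builds an in-memory zip from {name, content} pairs, for demonstration only.
    public static byte[] buildZip(String[][] files) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (String[] f : files) {
                zos.putNextEntry(new ZipEntry(f[0]));
                zos.write(f[1].getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip(new String[][] {
            {"a.txt", "this is the first document"},
            {"b.txt", "este es el segundo documento"},
            {"c.txt", "the third document is here"}
        });
        Set<String> langs = detectLanguages(new ByteArrayInputStream(zip),
                                            ZipLanguageSketch::toyDetector);
        System.out.println(langs);
    }
}
```

Using a set keeps each language only once even when several documents in the archive share it, which matches how a multivalued field would normally be filled.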