Solr / SOLR-7229

Allow DIH to handle attachments as separate documents


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Tika 1.7's RecursiveParserWrapper makes it possible to keep the metadata of each individual attachment or embedded document. By default, Tika keeps only the metadata of the container document and concatenates the contents of all embedded files. SOLR-7189 added support for this legacy behavior.

      It might be handy, for example, to be able to send an MSG file through DIH and treat the container email and each of its attachments as separate (child?) documents, or to send a zip of JPEG files and correctly index the geolocation of each image file.
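A minimal sketch of the difference between the two behaviors, assuming each parsed file has already been reduced to an id, a metadata map, and extracted text. This is not DIH's or Tika's API: `ParsedFile`, `concatenate`, `separate`, `to_solr_doc`, the `parent_id` field, and the `geo:lat`/`geo:long` keys are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedFile:
    # Hypothetical stand-in for one parsed file: an id, its metadata, and its extracted text.
    doc_id: str
    metadata: dict = field(default_factory=dict)
    content: str = ""


def concatenate(container, attachments):
    # Legacy behavior: keep only the container's metadata and join all extracted text.
    doc = dict(container.metadata)
    doc["id"] = container.doc_id
    doc["content"] = "\n".join([container.content] + [a.content for a in attachments])
    return doc


def to_solr_doc(parsed, parent_id=None):
    # Copy one file's own metadata and text into a flat field map.
    doc = dict(parsed.metadata)
    doc["id"] = parsed.doc_id
    doc["content"] = parsed.content
    if parent_id is not None:
        doc["parent_id"] = parent_id  # hypothetical field linking an attachment to its container
    return doc


def separate(container, attachments):
    # Proposed behavior: one document per file, so each attachment keeps its own metadata.
    return [to_solr_doc(container)] + [
        to_solr_doc(a, parent_id=container.doc_id) for a in attachments
    ]


# Usage: a zip of two JPEGs whose geolocation lives only in the embedded files' metadata.
archive = ParsedFile("photos.zip", {"Content-Type": "application/zip"})
images = [
    ParsedFile("photos.zip/a.jpg", {"geo:lat": "48.85", "geo:long": "2.35"}),
    ParsedFile("photos.zip/b.jpg", {"geo:lat": "51.50", "geo:long": "-0.12"}),
]

legacy = concatenate(archive, images)
per_file = separate(archive, images)

print("geo:lat" in legacy)                   # False: the images' coordinates were dropped
print([d.get("geo:lat") for d in per_file])  # [None, '48.85', '51.50']
```

The issue leaves open whether attachments should become Solr child (nested) documents or independent documents; the flat `parent_id` link used here is only one possible choice.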


              People

              • Assignee: Alexandre Rafalovitch (arafalov)
              • Reporter: Tim Allison (tallison)
              • Votes: 2
              • Watchers: 7
