Solr / SOLR-7189

Allow DIH to extract content from embedded documents via Tika


Details

    Description

      The DataImportHandler (DIH) TikaEntityProcessor does not currently extract content from documents embedded in a file, such as attachments. It would be useful to let users configure whether content from embedded documents is extracted.
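As a rough illustration of the requested switch, the sketch below models a container document whose attachments can themselves contain attachments, and extracts text either from the container alone or recursively from everything embedded in it. It is a toy model in plain Java, not Tika or DIH code: the `Doc` class and the `extract(doc, extractEmbedded)` helper are hypothetical. In Tika 1.x, the comparable switch is typically registering a parser in the ParseContext (`context.set(Parser.class, parser)`); without it, embedded documents are not parsed. How the DIH option itself would be named or wired is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Tika or DIH code): a document with its own text plus
 * embedded child documents, which may themselves contain attachments.
 */
public class EmbeddedExtractionSketch {

    static final class Doc {
        final String text;
        final List<Doc> embedded;

        Doc(String text, List<Doc> embedded) {
            this.text = text;
            this.embedded = embedded;
        }
    }

    /** Hypothetical helper: collects text, recursing into attachments only if asked. */
    static String extract(Doc doc, boolean extractEmbedded) {
        List<String> parts = new ArrayList<>();
        collect(doc, extractEmbedded, parts);
        return String.join("\n", parts);
    }

    private static void collect(Doc doc, boolean extractEmbedded, List<String> out) {
        out.add(doc.text);
        if (!extractEmbedded) {
            return; // container text only, like the behaviour described in this issue
        }
        for (Doc child : doc.embedded) {
            // Depth-first recursion, so attachments nested inside attachments are included too.
            collect(child, extractEmbedded, out);
        }
    }

    public static void main(String[] args) {
        Doc inner = new Doc("inner attachment", List.of());
        Doc outer = new Doc("outer attachment", List.of(inner));
        Doc container = new Doc("container body", List.of(outer));

        System.out.println(extract(container, false)); // container text only
        System.out.println(extract(container, true));  // container plus all nested attachments
    }
}
```

Keeping the flag as a single boolean passed down the recursion means one setting controls every nesting level, which matches the "include or not" choice described in the issue.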

      Attachments

        1. SOLR-7189.patch (5 kB, Tim Allison)
        2. test_recursive_embedded.docx (26 kB, Tim Allison)

            People

              Shalin Shekhar Mangar
              Tim Allison
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved: