SOLR-2101

TikaEntityProcessor does not extract files: does not pick parser correctly


    Details

      Description

      The TikaEntityProcessor does not choose a parser on its own and does not extract any data. The attached DIH config file works only if the Tika parser is specified explicitly with:

      parser="org.apache.tika.parser.html.HtmlParser"

      If that line is removed, Tika contributes nothing to the document.
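
      For readers without the attachment, here is a minimal sketch of where the parser attribute sits in a DIH config. It is not the attached htmllist-data-config.xml: the entity name, the file path, and the target field name are hypothetical, and the layout assumes the usual TikaEntityProcessor attributes (url, format, parser) together with a BinFileDataSource.

```xml
<dataConfig>
  <!-- BinFileDataSource hands the entity a raw byte stream from a local file. -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Hypothetical entity name and file path, for illustration only. -->
    <entity name="html-page"
            processor="TikaEntityProcessor"
            url="/path/to/example.html"
            format="text"
            parser="org.apache.tika.parser.html.HtmlParser">
      <!-- The "text" column carries the extracted body; the target field
           name "text" is an assumption about the schema. -->
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```

      With the parser attribute present, this is the form the report describes as working; dropping the attribute is the case that reportedly yields no extracted content.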

        Attachments

        1. htmllist-data-config.xml
          1.0 kB
          Lance Norskog
        2. htmllist.xml
          0.2 kB
          Lance Norskog


              People

              • Assignee:
                Unassigned
                Reporter:
                lancenorskog Lance Norskog
