  1. Solr
  2. SOLR-2101

TikaEntityProcessor does not extract files: does not pick parser correctly

Details

    Description

      The TikaEntityProcessor does not select a parser on its own and extracts no data. The attached DIH config file works only if the Tika parser is specified explicitly:

      parser="org.apache.tika.parser.html.HtmlParser"

      If that attribute is removed, Tika contributes nothing to the document.
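
      For illustration, a minimal sketch of a DIH config that uses the workaround might look like the following. This is not the attached htmllist-data-config.xml; the data source name, entity name, file path, and field name are hypothetical placeholders, and only the parser attribute value is taken from this report.

```xml
<dataConfig>
  <!-- Assumption: a binary file data source; the attached config may use a different one -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Hypothetical entity; the url value is a placeholder path -->
    <entity name="tika" processor="TikaEntityProcessor"
            dataSource="bin"
            url="/path/to/sample.html"
            format="text"
            parser="org.apache.tika.parser.html.HtmlParser">
      <!-- "text" is the column TikaEntityProcessor fills with the extracted body -->
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```

      According to this report, deleting the parser attribute from such an entity leaves the text column empty.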

      Attachments

        1. htmllist-data-config.xml
          1.0 kB
          Lance Norskog
        2. htmllist.xml
          0.2 kB
          Lance Norskog

            People

              Assignee: Unassigned
              Reporter: Lance Norskog
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: