ManifoldCF / CONNECTORS-1079

Parsing in TikaExtractor always returns an empty result

Details

    Description

      When I use the latest trunk source (2.0) to try the Tika content extractor, it does not return the expected results.
      Using a debugger, I found that the parser of the Tika content extractor does not return any data.
      I then tried moving lib/tika-core-1.6.jar into connector-lib/,
      and the Tika content extractor returned data as expected.

      My configuration is as follows:
      ==
      Transformation:
        Type: Tika content extractor
      Output:
        Type: Solr (Use extract update handler=false)
      Repository:
        Type: Web
      Job:
        1. Type: repository
        2. Type: transformation
        3. Type: output
      ==

      This may be related to CONNECTORS-1074:
      the location of tika-core-1.6.jar appears to affect the result of TikaExtractor.
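
      If the cause is class loading (an assumption, not something confirmed in this report), one way to check is to print which jar and which classloader each Tika class comes from. The sketch below is a hypothetical diagnostic, not part of ManifoldCF; the class name ClassOrigin and its methods are made up for illustration. It uses only JDK calls, and the two Tika class names are resolved by name, so it also runs (reporting "not visible") when Tika is not on the classpath.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class was loaded from. */
public class ClassOrigin {

    /** Returns the jar or directory a class came from, if the JVM records one. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** Returns a printable name for the classloader that defined the class. */
    public static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "(bootstrap)" : loader.toString();
    }

    /** Tries to resolve className through the given loader and describes the result. */
    public static String describe(String className, ClassLoader from) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class<?> type = Class.forName(className, false, from);
            return className + " -> " + locationOf(type) + " via " + loaderOf(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not visible from " + from;
        }
    }

    public static void main(String[] args) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        // AutoDetectParser ships in tika-core, PDFParser in tika-parsers.
        System.out.println(describe("org.apache.tika.parser.AutoDetectParser", context));
        System.out.println(describe("org.apache.tika.parser.pdf.PDFParser", context));
    }
}
```

      If the two classes report different classloaders, with tika-core coming from lib/ and the parsers from connector-lib/, that would be consistent with the behavior described above. Calling describe(...) from inside the connector code, rather than from a standalone main, would show what the extractor itself can see.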

          People

            kwright@metacarta.com Karl Wright
            mingchun.zhao Mingchun Zhao
