Tika / TIKA-2351

Getting error while parsing documents

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.14
    • Fix Version/s: None
    • Component/s: general
    • Environment: Red Hat Enterprise Linux Server release 7.3
      ElasticSearch 5.2.1
      ingest-attachment 5.2.1

    • any docs other than .txt

    Description

      Hi Everyone,

      I am using ingest-attachment to index documents. Plain-text documents (.txt files) parse without problems, but when I try to parse .doc or .pdf files I get this error:

      FILE = /elastic/files/englishAnalyzer.doc
      ID = 6

      "error" : {
        "root_cause" : [
          {
            "type" : "exception",
            "reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
            "header" : {
              "processor_type" : "attachment"
            }
          }
        ],
        "type" : "exception",
        "reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
          "caused_by" : {
            "type" : "parse_exception",
            "reason" : "Error parsing document in field [data]",
            "caused_by" : {
              "type" : "tika_exception",
              "reason" : "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
              "caused_by" : {
                "type" : "array_index_out_of_bounds_exception",
                "reason" : "-1"
              }
            }
          }
        },
        "header" : {
          "processor_type" : "attachment"
        }
      },
      "status" : 500
      }
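
The response above wraps the real failure in several layers of "caused_by". As a reading aid, here is a minimal Python sketch that follows that chain down to the innermost entry. The helper name root_cause is hypothetical, and the embedded JSON is a trimmed copy of the response above that keeps only the keys the walk needs.

```python
import json


def root_cause(error):
    """Follow nested "caused_by" objects and return the innermost one."""
    node = error
    while isinstance(node.get("caused_by"), dict):
        node = node["caused_by"]
    return node


# Trimmed copy of the error response from this report.
response = json.loads("""
{
  "error": {
    "type": "exception",
    "caused_by": {
      "type": "illegal_argument_exception",
      "caused_by": {
        "type": "parse_exception",
        "reason": "Error parsing document in field [data]",
        "caused_by": {
          "type": "tika_exception",
          "reason": "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
          "caused_by": {
            "type": "array_index_out_of_bounds_exception",
            "reason": "-1"
          }
        }
      }
    }
  },
  "status": 500
}
""")

cause = root_cause(response["error"])
print(cause["type"], cause["reason"])
```

For this response the innermost entry is the array_index_out_of_bounds_exception with reason "-1", raised inside Tika's OfficeParser. The Elasticsearch exception types in the outer layers only wrap it.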

      Please help me resolve this issue.
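
Since .txt files work and binary formats fail, one thing that may be worth ruling out is how the file bytes reach the [data] field: the attachment processor expects that field to hold the base64-encoded binary content. This is not confirmed as the cause here. The sketch below shows one way to build such a body in Python. The function name build_attachment_body and the stand-in bytes are assumptions for illustration; only the field name data comes from the error message above.

```python
import base64
import json


def build_attachment_body(raw_bytes):
    """Return a JSON body whose "data" field holds the bytes as single-line base64."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({"data": encoded})


# Stand-in bytes that begin with the OLE2 signature used by legacy Office files.
# A real run would read the document in binary mode: open(path, "rb").read()
sample = bytes.fromhex("d0cf11e0a1b11ae1") + b"\x00" * 16
body = build_attachment_body(sample)

# Round-trip check: decoding the field must give back the exact original bytes.
decoded = base64.b64decode(json.loads(body)["data"])
print(decoded == sample)
```

Reading the file in text mode, or using an encoder that inserts line breaks into its output, would change the bytes that Tika eventually sees, so a round-trip check like the one above is a cheap way to exclude that class of problem.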

      Attachments

        1. 01 - Templete.txt
          4 kB
          VENU
        2. 02 - Pipeline.txt
          0.4 kB
          VENU
        3. 03 - Json_creat_code.txt
          1 kB
          VENU
        4. 04 - stackTrace.txt
          5 kB
          VENU
        5. englishAnalyzer.doc
          26 kB
          VENU

          People

            Assignee: Unassigned
            Reporter: VENU (venuambati)
            Votes: 0
            Watchers: 2