Tika / TIKA-2734

Tika adds extra characters at the end of the text when extracting from an Excel file

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.18
    • Fix Version/s: None
    • Component/s: handler
    • Labels: None

    Description

      When extracting text from some relatively large Excel files (9,000 rows or so), I found that the extra string "&A PAGE &P" is appended to the end of the resulting text when Tika.parseToString is called. Is this a known issue? Is there any configuration that lets me opt out of outputting these extra characters?

      I did not find a good answer on Google.

      The input Excel spreadsheet is attached.
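
      In Excel's header/footer format codes, &A stands for the sheet name and &P for the page number, which suggests the extra string is header/footer text picked up during extraction. Until a parser-side option is confirmed, one possible stopgap is to strip the marker from the end of the extracted text after the fact. The sketch below is only an illustration: the class name, method name, and sample text are hypothetical and not part of Tika, and it assumes the marker appears exactly once at the very end of the output.

```java
// Hypothetical post-processing helper; not part of the Tika API.
final class TrailingMarkerStripper {

    // The literal string reported in this issue.
    static final String MARKER = "&A PAGE &P";

    // Returns s without trailing whitespace.
    private static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    // Removes the marker only when it is the last non-whitespace content.
    // If the marker is absent (or only occurs earlier), the input is
    // returned unchanged.
    static String stripTrailingMarker(String text, String marker) {
        String trimmed = rtrim(text);
        if (marker.isEmpty() || !trimmed.endsWith(marker)) {
            return text;
        }
        return rtrim(trimmed.substring(0, trimmed.length() - marker.length()));
    }

    public static void main(String[] args) {
        // Made-up stand-in for the string returned by Tika.parseToString.
        String extracted = "row 1\nrow 2\n&A PAGE &P\n";
        System.out.println(stripTrailingMarker(extracted, MARKER));
    }
}
```

      This only hides the symptom in the caller; it does not change what the parser emits, and it would also remove the marker if a spreadsheet legitimately ended with that text.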

      Attachments

        1. AIRPORTSOK.xls
          2.36 MB
          feng ye
        2. extra_A_Page_P.png
          8 kB
          feng ye

          People

            Assignee: Unassigned
            Reporter: feng ye (fyemaple)
            Votes: 0
            Watchers: 3
