Tika / TIKA-1730

Excel to HTML filtering seems to produce some font setting gibberish in output

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Noticed while upgrading from Tika 1.8 to 1.10: an .xls file (linked below) that used to filter normally now produces the following:

      <div class="outside">&amp;C&amp;"Arial,Bold"&amp;11&amp;F</div>
      

      This appears at the end of the first sheet's output when the file is filtered with java -jar tika-app-1.10.jar funnelback-claim-form-with-expense-codes.xls.

      It looks like styling information that should not be emitted as text here.

      Would be nice if that could be fixed in some future version.
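
Once the HTML escaping (&amp;) is undone, the string reads &C&"Arial,Bold"&11&F, which matches the syntax Excel uses for header/footer format strings: &C selects the center section, &"Arial,Bold" sets the font, &11 sets the size, and &F is the file-name field. The following is a minimal sketch of how such codes could be stripped so that only literal text remains. The class and method names are hypothetical, the code list is not exhaustive, and this is not the Tika or POI implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class HeaderFooterCodes {

    // Matches, in order: &"Font,Style", &<digits> (font size),
    // and & followed by a single letter (section, field, or style code).
    private static final Pattern CODE =
            Pattern.compile("&\"[^\"]*\"|&[0-9]+|&[A-Za-z]");

    // Private-use character used to protect escaped ampersands (&&).
    private static final String PLACEHOLDER = "\uE000";

    // Undo the HTML escaping seen in the reported output.
    static String unescapeAmp(String html) {
        return html.replace("&amp;", "&");
    }

    // Remove format codes and return whatever literal text is left.
    static String stripCodes(String raw) {
        String s = raw.replace("&&", PLACEHOLDER); // && is a literal ampersand
        s = CODE.matcher(s).replaceAll("");
        return s.replace(PLACEHOLDER, "&");
    }
}
```

Under this reading, HeaderFooterCodes.stripCodes(HeaderFooterCodes.unescapeAmp("&amp;C&amp;\"Arial,Bold\"&amp;11&amp;F")) returns an empty string, because the reported value consists entirely of control codes and contains no literal text.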

              People

              • Assignee:
                tallison Tim Allison
                Reporter:
                mattsheppard Matt Sheppard
              • Votes:
                0
                Watchers:
                3
