Tika / TIKA-1067

Tika extracts non-existent asterisks (*) from .ppt files

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      I created a new blank presentation, put in title + subtitle, saved it as .ppt, and then ran TikaCLI -t:

      <body><div class="slideShow"><div class="slide"><p class="slide-master-content">*<br/>
      *<br/>
      </p>
      <p class="slide-content">Testing<br/>
      testing<br/>
      </p>
      </div>
      </div>
      <div class="slideNotes"/>
      

      The two extra *'s seem to be coming from the master slide, but I'm not sure which text runs they are and how to stop them ...
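
      Until the source of the asterisks is pinned down, one possible workaround is to post-process the XHTML and drop the master-slide paragraphs. The sketch below is not part of Tika. It assumes the output marks master text with class="slide-master-content" exactly as shown above, and the names SAMPLE, strip_master_content and visible_text are hypothetical. SAMPLE is the fragment from this report with the body element closed so that it parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the fragment from the report, with </body> added
# so that it is well-formed XML.
SAMPLE = (
    '<body><div class="slideShow"><div class="slide">'
    '<p class="slide-master-content">*<br/>\n*<br/>\n</p>\n'
    '<p class="slide-content">Testing<br/>\ntesting<br/>\n</p>\n'
    '</div>\n</div>\n<div class="slideNotes"/>\n</body>'
)


def strip_master_content(xhtml):
    """Return the XHTML with every <p class="slide-master-content"> removed."""
    root = ET.fromstring(xhtml)
    # ElementTree elements have no parent pointer, so walk every potential
    # parent and remove matching children from it.
    for parent in list(root.iter()):
        for child in list(parent):
            # Drop a namespace prefix such as {http://www.w3.org/1999/xhtml}
            # in case the full document declares one.
            local_name = child.tag.rsplit("}", 1)[-1]
            if local_name == "p" and child.get("class") == "slide-master-content":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def visible_text(xhtml):
    """Return the non-blank text chunks of the document, in order."""
    root = ET.fromstring(xhtml)
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


if __name__ == "__main__":
    print(visible_text(SAMPLE))
    print(visible_text(strip_master_content(SAMPLE)))
```

      This only hides the symptom on the consumer side. It also discards any legitimate text that lives on the master slide, so it is a stopgap rather than a fix for the parser.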

          People

            Assignee: Unassigned
            Reporter: Michael McCandless (mikemccand)
            Votes: 0
            Watchers: 2
