  1. Tika
  2. TIKA-1840

No way to link slide notes to slide in PPT output.


Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11
    • Fix Version/s: 1.17, 2.0.0-BETA, 2.1.0
    • Component/s: parser
    • Labels: None

    Description

      I'm integrating Apache Tika into my project, and I want to extract text from PowerPoint slides in both PPT and PPTX formats.

      I've noticed that with the PPT format, the slide notes are all aggregated at the end of the XML output, so there is no way to identify which note belongs to which slide.

      I began looking at the code and found the following:

      // TODO Find the Notes for this slide and extract inline
      

      in HSLFExtractor.java on line 140

      I would like to implement this part and contribute it.
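
      To illustrate the difference between the aggregated output and the inline output requested here, below is a minimal, self-contained sketch. It does not use the Tika or POI APIs: the Slide class, the aggregated and inline methods, and the "slide" / "notes" class names are all hypothetical stand-ins chosen for illustration, and the text is not XML-escaped.

```java
import java.util.Arrays;
import java.util.List;

public class InlineNotesSketch {

    /** Hypothetical stand-in for a parsed slide; not a Tika or POI type. */
    static final class Slide {
        final String text;
        final String notes; // null when the slide has no notes

        Slide(String text, String notes) {
            this.text = text;
            this.notes = notes;
        }
    }

    /** Behaviour described in this issue: all slide text first, then all notes at the end. */
    static String aggregated(List<Slide> slides) {
        StringBuilder sb = new StringBuilder();
        for (Slide s : slides) {
            sb.append("<div class=\"slide\">").append(s.text).append("</div>\n");
        }
        for (Slide s : slides) {
            if (s.notes != null) {
                sb.append("<div class=\"notes\">").append(s.notes).append("</div>\n");
            }
        }
        return sb.toString();
    }

    /** Requested behaviour: each note is emitted inside the slide it belongs to. */
    static String inline(List<Slide> slides) {
        StringBuilder sb = new StringBuilder();
        for (Slide s : slides) {
            sb.append("<div class=\"slide\">").append(s.text);
            if (s.notes != null) {
                sb.append("<div class=\"notes\">").append(s.notes).append("</div>");
            }
            sb.append("</div>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Slide> slides = Arrays.asList(
                new Slide("A", "note A"),
                new Slide("B", null),
                new Slide("C", "note C"));
        System.out.print(aggregated(slides));
        System.out.println("---");
        System.out.print(inline(slides));
    }
}
```

      With the aggregated form, a consumer sees two notes for three slides and cannot tell which slide each note came from; with the inline form, the nesting carries that link.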


          People

            chrismattmann Chris A. Mattmann
            zetisam Sam H

