Tika / TIKA-1288

Epub's content extracted partially

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None
    • Environment: win 8, jre 1.5

    Description

      The EPUB document contains 11 parts (XHTML files), but Tika extracted content from only one of them. I unpacked all files from the EPUB's zip archive, and every one of them is a valid XHTML file.
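
      The manual check described above (unpacking the archive and counting its XHTML parts) could be sketched roughly as follows. This is a minimal sketch, assuming Java 7 or later and only the standard library; it does not call any Tika API. The class name `EpubParts` and the method `listXhtmlParts` are hypothetical names introduced for illustration, and matching parts by file extension is an assumption (an EPUB's real reading order is defined by its package document, which this sketch does not read).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: lists the (X)HTML entries inside an EPUB (zip) archive.
public class EpubParts {

    // Returns the names of all non-directory entries whose name ends in
    // .xhtml, .html or .htm, in the order they appear in the archive.
    public static List<String> listXhtmlParts(InputStream epub) throws IOException {
        List<String> parts = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(epub)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName().toLowerCase(Locale.ROOT);
                boolean isMarkup = name.endsWith(".xhtml")
                        || name.endsWith(".html")
                        || name.endsWith(".htm");
                if (!entry.isDirectory() && isMarkup) {
                    parts.add(entry.getName());
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java EpubParts <file.epub>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> parts = listXhtmlParts(in);
            System.out.println(parts.size() + " XHTML part(s) found:");
            for (String part : parts) {
                System.out.println("  " + part);
            }
        }
    }
}
```

      Running this against an attached file prints the number of XHTML parts in the archive, which can then be compared with the amount of text Tika returns for the same file.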

      Attachments

        1. bad-parsed.epub
          485 kB
          Dmitry Sokolov
        2. epub30-spec-20121128.epub
          223 kB
          Alex Andrushchak

          People

            Assignee: Unassigned
            Reporter: Alex Andrushchak
            Votes: 2
            Watchers: 7

            Dates

              Created:
              Updated: