Tika: TIKA-906

Headers, footers, and footnotes not extracted from Pages documents

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:
    • Environment: Windows 7

      Description

      Tika does not extract any text from the header or footer areas of a Pages document, and it does not extract footnotes either.

        Activity

        Gabriel Valencia created issue -
        Gabriel Valencia added a comment -

        Contains header text, footer text (including automatic page numbering), and some footnotes.

        Gabriel Valencia made changes -
        Field Original Value New Value
        Attachment testPagesHeadersFootersFootnotesJIRA.pages [ 12524890 ]
        Gabriel Valencia made changes -
        Issue Type Bug [ 1 ] Improvement [ 4 ]
        Gabriel Valencia made changes -
        Labels iwork iWork
        Nick Burch added a comment -

        Support added in r1331618. We can now get headers, footers and footnotes, assuming a file has only one set of each, with the default names. (If a file has multiple styles with different ones, the code will likely just end up with the last one.)

        Note that we are rapidly approaching the point where the current model for the parser won't cope. At that point, we'll need to start holding things like styles, headers and footers properly, track more state as we process the file (a single state level isn't really enough), and be aware of things like the styles applied to text.
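To illustrate the limitation described in this comment, here is a minimal sketch of a SAX handler that keeps a single level of state. It is not the code from r1331618. The class name, the `Region` enum, and the element names `sf:header` and `sf:footnote` are assumptions made for illustration; only `sf:footer` is mentioned in this thread. Because each region has one slot, a second header simply overwrites the first.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: single-level state, one slot per region.
class SingleStateSketch {
    enum Region { HEADER, FOOTER, FOOTNOTE }

    static class Handler extends DefaultHandler {
        final Map<Region, String> text = new EnumMap<>(Region.class);
        private Region current;                       // the only state we track
        private final StringBuilder buf = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            Region r = regionFor(qName);
            if (r != null) {                          // entering a region: reset the buffer
                current = r;
                buf.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                buf.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (current != null && regionFor(qName) == current) {
                // put() replaces any earlier value, so the last region of a kind wins
                text.put(current, buf.toString().trim());
                current = null;
            }
        }

        // Element names are assumed; the parser is not namespace-aware,
        // so qName carries the prefixed name as written in the file.
        private static Region regionFor(String qName) {
            switch (qName) {
                case "sf:header":   return Region.HEADER;
                case "sf:footer":   return Region.FOOTER;
                case "sf:footnote": return Region.FOOTNOTE;
                default:            return null;
            }
        }
    }

    static Map<Region, String> extract(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.text;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling `SingleStateSketch.extract(...)` on a document with two `sf:header` elements returns only the text of the second one. Holding every header, or linking each to the style that uses it, would need a richer model than one slot per region.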

        Nick Burch made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 1.2 [ 12320169 ]
        Resolution Fixed [ 1 ]
        Gabriel Valencia added a comment -

        This document also had automatic page numbering in the footer, but that doesn't get parsed. It's contained in the sf in the sf:footer as an sf:page-number. However, it only has one of them even though there are 2 pages. I guess the rest are automatically added by Pages.

        Gabriel Valencia added a comment -

        Going to reopen in light of the automatic page number issue.

        Gabriel Valencia made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Chris A. Mattmann added a comment -
        • push to 1.3
        Chris A. Mattmann made changes -
        Fix Version/s 1.3 [ 12321647 ]
        Fix Version/s 1.2 [ 12320169 ]
        Dave Meikle added a comment -

        Support for AutoPageNumbers added in r1358856.
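The thread does not show what r1358856 does. As a hypothetical sketch of the idea raised earlier (the footer stores a single page-number placeholder and the application fills it in per page), the code below expands one footer template into one footer per page. The class name, the `{page}` placeholder standing in for the `sf:page-number` element, and the optional Roman-numeral format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expand one footer template into per-page footers.
class AutoPageNumberSketch {
    // Stand-in for the single sf:page-number element found in the footer.
    static final String PLACEHOLDER = "{page}";

    // Convert 1..3999 to upper-case Roman numerals (greedy subtraction).
    static String toRoman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("Out of range: " + n);
        }
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    // Produce one footer string per page, numbering pages from 1.
    static List<String> expandFooter(String template, int pageCount, boolean roman) {
        List<String> footers = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            String number = roman ? toRoman(page) : Integer.toString(page);
            footers.add(template.replace(PLACEHOLDER, number));
        }
        return footers;
    }
}
```

For the two-page test document discussed here, `expandFooter("Page {page}", 2, false)` would yield one footer for each page, even though the file itself holds only one placeholder.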

        Dave Meikle made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Dave Meikle made changes -
        Fix Version/s 1.2 [ 12320169 ]
        Fix Version/s 1.3 [ 12321647 ]
        Ray Gauss II added a comment -

        AutoPageNumberUtilsTest.java is missing a license header and causing RAT to fail.

        Shall I add the header?

        Michael McCandless added a comment -

        "Shall I add the header?"

        +1

        Dave Meikle added a comment -

        Sorry - I missed the header the first time. Added it now in r1367301.

        Thanks for spotting it, Ray.


          People

          • Assignee: Unassigned
          • Reporter: Gabriel Valencia
          • Votes: 0
          • Watchers: 5
