Tika / TIKA-907

Comments embedded in Pages documents not supported

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Environment: Windows 7

      Description

      Comments added to a Pages document are not extracted. This also applies to documents annotated on iWork.com.

      Attachments

      1. testPagesShareiWorkJIRA.pages (153 kB, attached by Gabriel Valencia)
      2. testPagesCommentsJIRA.pages (151 kB, attached by Gabriel Valencia)

        Activity

        Gabriel Valencia added a comment -

        Pages documents with a few comments. The ShareiWork document was commented on iWork.com by two people and then saved as a Pages document.

        Nick Burch added a comment -

        Support added in r1331640. We now collect the annotations (id -> text) when they occur earlier in the file. While handling the main text, whenever we reach an annotation reference, we output the annotation text for it. The annotation currently comes before the text it annotates, due to the order of the elements, but that could be fixed in future if needed (when we have a better document model).
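
        The collect-then-emit approach described above can be sketched as a single streaming pass. This is a minimal Python illustration, not Tika's actual Java parser code: the element names `annotation` and `annotation-ref` and the attributes `id`/`ref` are hypothetical placeholders, not the real Pages XML schema, and the `[annotation: ...]` output format is invented for the example. As in the comment, the annotation text is emitted at the reference point, i.e. before the text it annotates.

```python
import xml.sax


class AnnotationAwareHandler(xml.sax.ContentHandler):
    """Collects annotations (id -> text) and emits each one at its reference.

    Element/attribute names ("annotation", "annotation-ref", "id", "ref")
    are hypothetical placeholders, not the real Pages XML schema.
    """

    def __init__(self):
        super().__init__()
        self.annotations = {}       # annotation id -> annotation text
        self.in_annotation = False  # True while reading an annotation body
        self.current_id = None
        self.buffer = []            # text chunks of the current annotation
        self.output = []            # extracted main-text chunks

    def startElement(self, name, attrs):
        if name == "annotation":
            # Annotations occur earlier in the file: start remembering one.
            self.in_annotation = True
            self.current_id = attrs.get("id")
            self.buffer = []
        elif name == "annotation-ref":
            # Reached a reference in the main text: output the stored text.
            text = self.annotations.get(attrs.get("ref"))
            if text is not None:
                self.output.append(f"[annotation: {text}] ")

    def endElement(self, name):
        if name == "annotation":
            self.annotations[self.current_id] = "".join(self.buffer).strip()
            self.in_annotation = False

    def characters(self, content):
        # Annotation text is buffered; all other character data is main text.
        if self.in_annotation:
            self.buffer.append(content)
        else:
            self.output.append(content)


def extract_with_annotations(xml_text):
    """Parse an XML string and return main text with annotations inlined."""
    handler = AnnotationAwareHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.output)
```

        References to unknown ids are silently skipped. Whitespace between elements is treated as main text, which is a simplification a real parser would need to handle.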

        Gabriel Valencia added a comment -

        Looks like the fix doesn't work for iWork.com-annotated files. I think the comments are put in an entirely separate .json file in the archive.

        Since iWork.com is set to be discontinued, this is lower priority.
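
        One way to check the suspicion above is to list the archive's entries and look for .json files. This is a hedged sketch that assumes the .pages file is a ZIP archive; it only lists candidate entries and does not parse or interpret them, and the function name is hypothetical.

```python
import zipfile


def list_json_entries(source):
    """Return sorted names of entries ending in .json (case-insensitive).

    `source` may be a path or a binary file-like object. Assumes the .pages
    file is a ZIP archive; entries are only listed, not parsed.
    """
    with zipfile.ZipFile(source) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.lower().endswith(".json")
        )
```

        For example, `list_json_entries("testPagesShareiWorkJIRA.pages")` would show whether the attached iWork.com sample contains any such entries.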


          People

          • Assignee: Unassigned
          • Reporter: Gabriel Valencia
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
            • Updated:
            • Resolved:
