Tika / TIKA-905

Embedded text boxes and shapes with text not supported

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:
    • Environment:

      Windows 7

      Description

      This is similar to TIKA-904 but for normal word processing documents. In those, text contained in text boxes and shapes is not extracted.

        Activity

        Transition: Open -> Resolved
        Time In Source Status: 20d 17h 42m
        Execution Times: 1
        Last Executer: Michael McCandless
        Last Execution Date: 18/May/12 11:40
        Michael McCandless made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Fix Version/s: 1.2 [ 12320169 ]
        Resolution: Duplicate [ 3 ]
        Michael McCandless added a comment -

        Looks like this was fixed with TIKA-904.

        Gabriel Valencia added a comment -

        Check out my comment in TIKA-904. They are all contained in sl:document -> sl:drawables -> sl:page-group (1 or more) -> sf:drawable-shape (1 or more) -> sf:text -> sf:text-storage -> sf:text-body -> sf.

        You get one sf:drawable-shape for each text box.
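        The element path in this comment can be walked with a plain DOM parser. What follows is only a rough sketch, not how Tika's parser does it: the class name TextBoxSketch and the method extractTextBoxes are hypothetical, the sample XML is invented for illustration, and the prefixed names are matched literally (namespace awareness off) because the namespace URIs are not given here. It also does not check the outer sl:document -> sl:drawables -> sl:page-group ancestors; it simply collects whatever text sits under sf:text-body inside each sf:drawable-shape.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: one string per sf:drawable-shape.
class TextBoxSketch {

    static List<String> extractTextBoxes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: treat "sf:..." and "sl:..." as literal tag names,
            // since the namespace URIs are not stated in the issue.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> boxes = new ArrayList<>();
            // One sf:drawable-shape per text box, as described in the comment.
            NodeList shapes = doc.getElementsByTagName("sf:drawable-shape");
            for (int i = 0; i < shapes.getLength(); i++) {
                Element shape = (Element) shapes.item(i);
                // Descendant search covers sf:text -> sf:text-storage -> sf:text-body.
                NodeList bodies = shape.getElementsByTagName("sf:text-body");
                StringBuilder text = new StringBuilder();
                for (int j = 0; j < bodies.getLength(); j++) {
                    // getTextContent gathers text from any child elements too.
                    text.append(bodies.item(j).getTextContent());
                }
                boxes.add(text.toString().trim());
            }
            return boxes;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample input that mirrors the path from the comment.
        String xml = "<sl:document><sl:drawables><sl:page-group>"
                + "<sf:drawable-shape><sf:text><sf:text-storage>"
                + "<sf:text-body>Box one</sf:text-body>"
                + "</sf:text-storage></sf:text></sf:drawable-shape>"
                + "</sl:page-group></sl:drawables></sl:document>";
        for (String box : extractTextBoxes(xml)) {
            System.out.println(box);
        }
    }
}
```

        Each returned string corresponds to one text box, so a caller could emit them after the main body text or try to place them inline once the link back to the main text is understood.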

        Nick Burch added a comment -

        Are you able to identify where in the file these text boxes occur, and what sort of tags hold the text? If the text boxes don't occur in the main text area, can you identify how to link back from the main text to the text box? (You might find it helpful to review how annotations work, which we now support as of r1331640, for an idea of how this might work)

        Gabriel Valencia made changes -
        Labels: iwork -> iWork
        Gabriel Valencia made changes -
        Issue Type: Bug [ 1 ] -> Improvement [ 4 ]
        Gabriel Valencia added a comment -

        I'm new to JIRA, so please change if I'm wrong. I figure this should be an improvement, not a bug.

        Gabriel Valencia made changes -
        Labels iwork
        Gabriel Valencia made changes -
        Field Original Value New Value
        Attachment testPagesEmbeddedJIRA.pages [ 12524887 ]
        Gabriel Valencia added a comment -

        Contains various embedded objects including text boxes and shapes with text

        Gabriel Valencia created issue -

          People

          • Assignee: Unassigned
          • Reporter: Gabriel Valencia
          • Votes: 0
          • Watchers: 1
