Tika / TIKA-910

Text contained in text boxes or shapes in Keynote docs runs together

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:
    • Environment: Windows 7

      Description

      Tika grabs the text in the various boxes/shapes and combines it into one word.

      Attachments

      1. testKeynoteTemplateShapes.key, 214 kB, Gabriel Valencia
      2. TIKA-910.patch, 3 kB, Michael McCandless
      3. testTextBoxes.key, 204 kB, Michael McCandless

        Activity

        Gabriel Valencia added a comment -

        This presentation contains a slide that has one text box containing the text 'TextBox 2', a shape containing the text 'invisible', and another shape containing the text 'ooooohhhhhdang'. The result of parsing is 'TextBox 2invisibleoooohhhhhdang'.

        Michael McCandless added a comment -

        I think we just have to start/end p element when we see sf in the doc...
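
The idea in the comment above can be illustrated with a minimal sketch. It is not the TIKA-910 patch: it assumes a plain JDK SAX handler, and the element name `shape-text` is a hypothetical stand-in for whichever text-bearing element the Keynote XML actually uses. With no paragraph boundaries the character data of adjacent elements runs together; opening and closing a p element around each one keeps the pieces separate.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParagraphBoundarySketch {

    // Hypothetical element name; the real Keynote element is not given in this issue.
    static final String TEXT_CONTAINER = "shape-text";

    // Collects character data from the XML. When emitParagraphs is true,
    // each TEXT_CONTAINER element is wrapped in <p>...</p> in the output.
    static String extract(String xml, boolean emitParagraphs) throws Exception {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (emitParagraphs && TEXT_CONTAINER.equals(qName)) {
                    out.append("<p>");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (emitParagraphs && TEXT_CONTAINER.equals(qName)) {
                    out.append("</p>");
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        // The default factory is not namespace-aware, so qName carries the element name.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<slide><shape-text>TextBox 2</shape-text>"
                + "<shape-text>invisible</shape-text></slide>";
        System.out.println(extract(xml, false)); // text runs together
        System.out.println(extract(xml, true));  // one paragraph per element
    }
}
```

In this sketch the only difference between the two modes is whether element boundaries are reported to the output, which is the behavior the reported symptom suggests was missing.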

        Michael McCandless added a comment -

        Patch w/ test case & fix.

        Erik Peterson added a comment -

        Hi Michael,

        We are also seeing this behavior with multiple bullet points running together. I didn't want to open up another ticket for this, so I'm merely commenting on that issue here.

        Michael McCandless added a comment -

        Thanks Erik, I confirmed that Tika trunk does that and that this patch fixes it; I'll add another test case for it....


          People

          • Assignee: Michael McCandless
          • Reporter: Gabriel Valencia
          • Votes: 0
          • Watchers: 2
