  1. Tika
  2. TIKA-1880

Attribute number-columns-repeated not correctly used in ODS documents

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.12
    • Fix Version/s: None
    • Component/s: parser

      Description

      When the ODS parser was first written, it assumed that the `number-columns-repeated` attribute would only be used on blank cells. This is not the case for documents created by (at least) LibreOffice 4.4.7.2. The current approach to repeated cells is to use the HTML concept of column spanning, which is not suitable for repeated content: a spanned cell emits its content once, whereas a repeated cell should emit its content once per repeated column.

      The note in the Tika source (OpenDocumentContentParser.java#L459):

      TODO: The following is not correct, the cell should be repeated not spanned!
      Code generates a HTML cell, spanning all repeated columns, to make the cell look correct. Problems may occur when both spanning and repeating is given, which is not allowed by spec. Cell spanning instead of repeating is not a problem, because OpenOffice uses it only for empty cells.
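
      A minimal sketch of the repeat-instead-of-span behaviour this issue asks for. It is not Tika's implementation: the class `RepeatedCellSketch` and the method `expandRow` are hypothetical names, and the sketch uses the JDK's DOM parser on a single row rather than Tika's SAX-based handler. It only illustrates that a cell carrying `number-columns-repeated="N"` should yield N cells with the same content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration, not part of Tika.
public class RepeatedCellSketch {
    // OpenDocument namespaces for table and text elements.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Expands one table:table-row into cell texts, one list entry per column. */
    static List<String> expandRow(String rowXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element row = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rowXml)))
                .getDocumentElement();

        List<String> cells = new ArrayList<>();
        NodeList cellNodes = row.getElementsByTagNameNS(TABLE_NS, "table-cell");
        for (int i = 0; i < cellNodes.getLength(); i++) {
            Element cell = (Element) cellNodes.item(i);
            // getAttributeNS returns "" when the attribute is absent.
            String repeatAttr = cell.getAttributeNS(TABLE_NS, "number-columns-repeated");
            int repeat = repeatAttr.isEmpty() ? 1 : Integer.parseInt(repeatAttr);
            String text = cell.getTextContent().trim();
            // Repeat the cell content instead of emitting one spanning cell.
            for (int r = 0; r < repeat; r++) {
                cells.add(text);
            }
        }
        return cells;
    }

    public static void main(String[] args) throws Exception {
        String row =
            "<table:table-row xmlns:table=\"" + TABLE_NS + "\" xmlns:text=\"" + TEXT_NS + "\">"
            + "<table:table-cell><text:p>A</text:p></table:table-cell>"
            + "<table:table-cell table:number-columns-repeated=\"3\"><text:p>x</text:p></table:table-cell>"
            + "<table:table-cell/>"
            + "</table:table-row>";
        System.out.println(expandRow(row));
    }
}
```

      A real fix would likely also need to handle rows padded with a trailing empty cell that has a very large repeat count, for example by keeping the existing behaviour for empty cells and repeating only cells with content.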

        Attachments

        1. example.ods
          7 kB
          Ryan Desmond

            People

            • Assignee:
              Unassigned
              Reporter:
              rddesmond@gmail.com Ryan Desmond