Tika / TIKA-2618

LabelRecord and LabelSSTRecord text can be overwritten in xls

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.18, 2.0.0
    • Component/s: None
    • Labels:
      None

      Description

      In our regression tests, we've lost small amounts of text from quite a few xls files (standalone, but especially embedded). This is partly caused by removing the listenForAllRecords=true setting that I accidentally left in while debugging something a while ago. When that is true, we don't cache the records in currentSheet, so they are added to the extraTextCells list. When that is false, which is now the default, LabelRecord and LabelSSTRecord text is sometimes overwritten because multiple cells can have the same x/y coordinates in the currentSheet map.

      When listenForAllRecords=false, we're trying to listen for labels, but we're often overwriting them because the map keeps only one entry per coordinate.

      Let's add labels to extraTextCells so that at least the text is processed.

      As one example: "africa" in govdocs1/199/199294.ppt
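
      A minimal sketch of the overwrite described above, in plain Java without POI or Tika on the classpath. The class LabelOverwriteSketch, the Label type, and both collect methods are hypothetical stand-ins, not Tika's actual code; only the names currentSheet and extraTextCells come from this issue, and the "asia" and "europe" values are made-up sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LabelOverwriteSketch {

    // Hypothetical stand-in for a label record: a cell position plus its text.
    static final class Label {
        final int row;
        final int col;
        final String text;

        Label(int row, int col, String text) {
            this.row = row;
            this.col = col;
            this.text = text;
        }
    }

    // Map-only collection: one slot per coordinate, so a later label
    // at the same row/column replaces the earlier one.
    static List<String> collectWithMapOnly(List<Label> labels) {
        Map<String, String> currentSheet = new LinkedHashMap<>();
        for (Label label : labels) {
            currentSheet.put(label.row + "," + label.col, label.text);
        }
        return new ArrayList<>(currentSheet.values());
    }

    // List collection: every label is appended, so no text is dropped
    // even when coordinates collide.
    static List<String> collectWithExtraTextCells(List<Label> labels) {
        List<String> extraTextCells = new ArrayList<>();
        for (Label label : labels) {
            extraTextCells.add(label.text);
        }
        return extraTextCells;
    }

    public static void main(String[] args) {
        List<Label> labels = new ArrayList<>();
        labels.add(new Label(0, 0, "africa"));
        labels.add(new Label(0, 0, "asia"));   // same coordinates as "africa"
        labels.add(new Label(1, 0, "europe"));

        System.out.println(collectWithMapOnly(labels));        // [asia, europe]
        System.out.println(collectWithExtraTextCells(labels)); // [africa, asia, europe]
    }
}
```

      With the map alone, "africa" disappears as soon as a second label arrives at the same coordinates. Appending labels to a list gives up the positional layout for those cells but keeps all of the text, which is the trade-off proposed here.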

            People

            • Assignee:
              Unassigned
              Reporter:
              tallison Tim Allison
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: