Description
When parsing an Excel 2010 table, if a worksheet row has a missing (empty) cell value, that cell is not reported to the SAX handler. As a result, the remaining values in the row can no longer be matched to their columns.
For example, given the table:
A  B  C
1  2  3
4     6
7  8  9
the SAX handler is sent the following elements:
<tr><td>A</td><td>B</td><td>C</td></tr>
<tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>4</td><td>6</td></tr>
<tr><td>7</td><td>8</td><td>9</td></tr>
As a result, the handler can detect that the third row is missing a cell value, but it cannot tell which column the missing value belongs to.
A possible fix: the Excel 2010 XML data contains the cell reference for each cell, which could be passed to the SAX handler as an attribute.
*** XSSFExcelExtractorDecorator.java	2012-11-08 10:51:55.881207100 +0000
--- XSSFExcelExtractorDecorator.java.1	2012-11-08 10:59:02.972223700 +0000
***************
*** 200,206 ****
  public void cell(String cellRef, String formattedValue) {
      try {
!         xhtml.startElement("td");
          // Main cell contents
          xhtml.characters(formattedValue);
--- 200,208 ----
  public void cell(String cellRef, String formattedValue) {
      try {
!         AttributesImpl attributes = new AttributesImpl();
!         attributes.addAttribute(null, "cellRef", "cellRef", null, cellRef) ;
!         xhtml.startElement("td",attributes);
          // Main cell contents
          xhtml.characters(formattedValue);
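To illustrate how a consumer might use the proposed cellRef attribute, here is a minimal sketch of a SAX handler that puts each cell back into its column. It is not part of the patch: the class name CellRefTableHandler, the columnIndex helper, and the fixed column count are hypothetical. It also assumes that references look like "C3" and that the parser is not namespace-aware, so element names arrive in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical consumer: rebuilds table rows from <td cellRef="..."> elements.
public class CellRefTableHandler extends DefaultHandler {
    private final int columnCount;
    private final List<String[]> rows = new ArrayList<>();
    private String[] currentRow;
    private int currentColumn = -1;
    private StringBuilder text;

    public CellRefTableHandler(int columnCount) {
        this.columnCount = columnCount;
    }

    // Converts the column letters of a reference such as "C3" to a
    // zero-based index (A = 0, B = 1, ..., AA = 26).
    public static int columnIndex(String cellRef) {
        int index = 0;
        for (int i = 0; i < cellRef.length(); i++) {
            char c = Character.toUpperCase(cellRef.charAt(i));
            if (c < 'A' || c > 'Z') {
                break; // reached the row digits
            }
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("tr".equals(qName)) {
            currentRow = new String[columnCount]; // unreported cells stay null
        } else if ("td".equals(qName) && currentRow != null) {
            String ref = atts.getValue("cellRef");
            currentColumn = (ref != null) ? columnIndex(ref) : -1;
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("td".equals(qName) && text != null) {
            if (currentColumn >= 0 && currentColumn < columnCount) {
                currentRow[currentColumn] = text.toString();
            }
            text = null;
        } else if ("tr".equals(qName) && currentRow != null) {
            rows.add(currentRow);
            currentRow = null;
        }
    }

    public List<String[]> getRows() {
        return rows;
    }

    // Parses an XHTML fragment and returns one String[] per table row.
    public static List<String[]> parse(String xhtml, int columnCount) throws Exception {
        CellRefTableHandler handler = new CellRefTableHandler(columnCount);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getRows();
    }

    public static void main(String[] args) throws Exception {
        // The incomplete third row from the example, with the proposed attribute.
        String xhtml = "<table><tr>"
                + "<td cellRef=\"A3\">4</td><td cellRef=\"C3\">6</td>"
                + "</tr></table>";
        for (String[] row : parse(xhtml, 3)) {
            System.out.println(Arrays.toString(row)); // prints [4, null, 6]
        }
    }
}
```

With the attribute present, the gap in the third row lands in column B instead of shifting the value 6 to the left.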
Issue Links
- is related to TIKA-2479 Handle empty cells in tables uniformly (Resolved)