Tika / TIKA-1490

Basic parser for old Excel files (eg Excel 4)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6
    • Fix Version/s: 1.7
    • Component/s: parser
    • Labels: None

    Description

      In TIKA-1487, we added mime magic for the pre-OLE2 Excel file formats. Based on a reading of the OpenOffice.org documentation of the Excel file format, it looks like it should be possible to write a basic parser that extracts key bits of information (e.g. strings) from these older file formats.
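      A minimal sketch of the kind of check such magic performs is shown here. It is not Tika's actual magic definition. It assumes that a stream-based BIFF2 to BIFF4 file starts directly with a BOF record, whose little-endian 16-bit id is 0x0009 (BIFF2), 0x0209 (BIFF3) or 0x0409 (BIFF4), as listed in the OpenOffice.org format notes. The class and method names are hypothetical.

```java
import java.util.Optional;

/** Hypothetical sniffer for pre-OLE2 Excel streams (illustration only). */
public class OldExcelSniffer {
    /**
     * Returns the BIFF version implied by the first record id, if it is a
     * BIFF2-BIFF4 BOF record. BIFF5 and later are normally wrapped in an
     * OLE2 container, so they are deliberately not matched here.
     */
    public static Optional<String> detectBiffVersion(byte[] header) {
        if (header == null || header.length < 4) {
            return Optional.empty();
        }
        // Record id is a little-endian unsigned 16-bit value
        int sid = (header[0] & 0xFF) | ((header[1] & 0xFF) << 8);
        switch (sid) {
            case 0x0009: return Optional.of("BIFF2");
            case 0x0209: return Optional.of("BIFF3");
            case 0x0409: return Optional.of("BIFF4");
            default:     return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // BOF id 0x0409 followed by a record length of 6
        byte[] biff4 = {0x09, 0x04, 0x06, 0x00};
        System.out.println(detectBiffVersion(biff4).orElse("unknown")); // prints BIFF4
    }
}
```

      Real detection would probably also check the BOF length and payload to reduce false positives. Only the record id is checked here to keep the idea visible.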

      This would most likely be done with a custom record iterator for the older formats, which would pass the handful of "interesting" records to POI's record classes (perhaps with some tweaks for the older formats) to have the binary data parsed, with the results then returned by the parser.
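      A custom record iterator for the older formats could look roughly like the sketch below. It assumes the flat BIFF record layout (a little-endian 16-bit record id, a 16-bit payload length, then the payload) and yields raw records. It then picks out one "interesting" record type, the BIFF3/BIFF4 LABEL record (id 0x0204, assumed layout: row, column, XF index, 16-bit string length, 8-bit characters). The class, method and constant names are hypothetical, and decoding the characters as ISO-8859-1 is an assumption standing in for the workbook's real codepage. In the approach described above, the payload would be handed to POI's record classes; the small inline decoder only stands in for that step.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical iterator over the flat record stream of a pre-OLE2 (BIFF2-BIFF4) file. */
public class OldBiffRecordIterator implements Iterator<OldBiffRecordIterator.Record> {
    /** One raw record: its 16-bit id plus the payload bytes. */
    public static final class Record {
        public final int sid;
        public final byte[] data;
        Record(int sid, byte[] data) { this.sid = sid; this.data = data; }
    }

    // Assumed BIFF3/BIFF4 LABEL id (BIFF2 uses 0x0004 with a different layout)
    public static final int LABEL_BIFF3_4 = 0x0204;

    private final DataInputStream in;
    private Record next;

    public OldBiffRecordIterator(InputStream stream) {
        this.in = new DataInputStream(stream);
        this.next = readRecord();
    }

    /** Reads a little-endian unsigned 16-bit value. */
    private int readUShortLE() throws IOException {
        int lo = in.read();
        int hi = in.read();
        if (lo < 0 || hi < 0) throw new EOFException();
        return lo | (hi << 8);
    }

    /** Reads header + payload; returns null at end of stream (a truncated record is treated the same way). */
    private Record readRecord() {
        try {
            int sid = readUShortLE();
            int len = readUShortLE();
            byte[] data = new byte[len];
            in.readFully(data);
            return new Record(sid, data);
        } catch (EOFException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public Record next() {
        if (next == null) throw new NoSuchElementException();
        Record current = next;
        next = readRecord();
        return current;
    }

    /** Collects the text of BIFF3/BIFF4 LABEL records, skipping every other record. */
    public static List<String> extractLabels(InputStream stream) {
        List<String> out = new ArrayList<>();
        OldBiffRecordIterator it = new OldBiffRecordIterator(stream);
        while (it.hasNext()) {
            Record r = it.next();
            if (r.sid != LABEL_BIFF3_4 || r.data.length < 8) continue;
            // Assumed layout: row(2) col(2) xf(2) length(2), then the characters
            int len = (r.data[6] & 0xFF) | ((r.data[7] & 0xFF) << 8);
            len = Math.min(len, r.data.length - 8);
            out.add(new String(r.data, 8, len, StandardCharsets.ISO_8859_1));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = {
            0x09, 0x04, 0x06, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00,      // BOF (BIFF4)
            0x04, 0x02, 0x0A, 0x00, 0, 0, 0, 0, 0, 0, 0x02, 0x00, 'H', 'i', // LABEL "Hi"
            0x0A, 0x00, 0x00, 0x00                                           // EOF
        };
        System.out.println(extractLabels(new ByteArrayInputStream(file))); // prints [Hi]
    }
}
```

      Keeping the iterator generic (raw id plus bytes) and filtering afterwards mirrors the split in the description: one piece walks the old container format, and separate decoders handle only the few record types worth extracting.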


          People

            Assignee: Nick Burch
            Reporter: Nick Burch
            Votes: 0
            Watchers: 2
