[TIKA-105] Excel parser implementation based on POI's Event API

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2
    • Component/s: parser
    • Labels: None

    Description

      Tika's existing ExcelParser implementation uses POI's HSSFWorkbook to extract text from an Excel file. POI also provides an alternative "Event API" [1] for processing Excel files. Its advantage is a much smaller memory footprint, at the cost of a slightly more complex API.

      I have written an alternative Excel parser implementation based on the Event API. If it's of interest to the Tika project, I'll write a test case for it.

      [1] http://poi.apache.org/hssf/how-to.html#event_api
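
      To illustrate the trade-off described above, here is a minimal, stdlib-only sketch of the event (push) style of parsing. It is not the attached ExcelEventParser and does not use POI: CellEvent, CellListener, processEvents and TextCollector are hypothetical names invented for this example. In POI itself the corresponding roles are played, as far as I know, by an HSSFListener whose processRecord(Record) method is called by HSSFEventFactory. The point of the sketch is only that the listener sees one record at a time and keeps just the extracted text, which is where the smaller memory footprint comes from.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Simplified model of event-style parsing; none of these types are POI classes. */
class EventParsingSketch {

    /** Hypothetical stand-in for one cell-level record read from a workbook stream. */
    static final class CellEvent {
        final String sheet;
        final String text;

        CellEvent(String sheet, String text) {
            this.sheet = sheet;
            this.text = text;
        }
    }

    /** Hypothetical callback interface: the reader pushes each record to it in turn. */
    interface CellListener {
        void onCell(CellEvent event);
    }

    /** Drives the listener one record at a time; this loop itself stores no records. */
    static void processEvents(Iterator<CellEvent> source, CellListener listener) {
        while (source.hasNext()) {
            listener.onCell(source.next());
        }
    }

    /** Listener that retains only the extracted text, never the records themselves. */
    static final class TextCollector implements CellListener {
        private final StringBuilder out = new StringBuilder();
        private String currentSheet;

        @Override
        public void onCell(CellEvent event) {
            // Emit the sheet name once, when the stream moves on to a new sheet.
            if (!event.sheet.equals(currentSheet)) {
                currentSheet = event.sheet;
                out.append(currentSheet).append('\n');
            }
            // Skip empty cells so they do not produce blank lines.
            if (!event.text.isEmpty()) {
                out.append(event.text).append('\n');
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Convenience wrapper: runs the event loop over a record source and returns the text. */
    static String extractText(Iterator<CellEvent> source) {
        TextCollector collector = new TextCollector();
        processEvents(source, collector);
        return collector.text();
    }

    public static void main(String[] args) {
        List<CellEvent> cells = Arrays.asList(
                new CellEvent("Sheet1", "Name"),
                new CellEvent("Sheet1", ""),
                new CellEvent("Sheet2", "Total"));
        System.out.print(extractText(cells.iterator()));
    }
}
```

      The extra complexity mentioned in the description shows up in TextCollector: because records arrive as a flat stream, the listener has to track state such as the current sheet itself, whereas a workbook-style model would expose sheets and rows as navigable objects.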

      Attachments

        1. ExcelEventParser.java
          20 kB
          Niall Pemberton

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Niall Pemberton (niallp)

            Dates

              Created:
              Updated:
              Resolved: