  1. Tika
  2. TIKA-2766

Be able to extract raw values from Excel, not formatted values


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      By default, Tika extracts Excel cell values as they are formatted in the sheet. That is a fine default.

      However, I am often asked to extract the raw values instead, because the formatting applied for human readers loses precision.
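
      To make the precision loss concrete, here is a minimal sketch. It uses plain `java.text.DecimalFormat` as a stand-in for a cell's display format; it is not Tika or POI code, and the class name, method names, and the `#,##0.00` pattern are assumptions chosen for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class FormattedVsRaw {

    // Renders a value the way a sheet's display format might (stand-in only).
    static String formatted(double value, String pattern) {
        DecimalFormat fmt =
                new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return fmt.format(value);
    }

    // Renders the stored value without applying any display format.
    static String raw(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        String pattern = "#,##0.00"; // assumed display format, not taken from the issue
        double a = 1234.561;
        double b = 1234.559;
        // Two different stored values produce the same formatted text.
        System.out.println(formatted(a, pattern) + " <- " + raw(a));
        System.out.println(formatted(b, pattern) + " <- " + raw(b));
    }
}
```

      Both values print as `1,234.56` once formatted, so a consumer that only sees the formatted text cannot tell them apart.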

       

      In local instances I have cloned the Tika classes to do this, but it is messy because of how the code is layered: I wind up extending or copying 3-4 classes because of the chain of class construction.

      I believe that adding a config option to the open office config class would let me implement the same behavior much more cleanly.
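
      A rough sketch of what such an option could look like, assuming a single boolean flag on a config object that the cell-rendering code consults. Everything here is hypothetical: `SpreadsheetConfig`, `NumericCell`, `extractRawValues`, and `cellText` are illustrative names, not Tika's or POI's actual API, and `DecimalFormat` stands in for the real cell formatter.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class RawValueConfigSketch {

    // Hypothetical config object; the class and flag names are assumptions.
    static final class SpreadsheetConfig {
        private boolean extractRawValues = false; // default keeps formatted output

        boolean isExtractRawValues() {
            return extractRawValues;
        }

        void setExtractRawValues(boolean extractRawValues) {
            this.extractRawValues = extractRawValues;
        }
    }

    // Stand-in for a numeric cell: the stored value plus its display pattern.
    static final class NumericCell {
        final double value;
        final String pattern;

        NumericCell(double value, String pattern) {
            this.value = value;
            this.pattern = pattern;
        }
    }

    // The one place that decides between raw and formatted text.
    static String cellText(NumericCell cell, SpreadsheetConfig config) {
        if (config.isExtractRawValues()) {
            return Double.toString(cell.value);
        }
        DecimalFormat fmt =
                new DecimalFormat(cell.pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return fmt.format(cell.value);
    }

    public static void main(String[] args) {
        NumericCell cell = new NumericCell(0.123456, "0.00%");
        SpreadsheetConfig config = new SpreadsheetConfig();
        System.out.println(cellText(cell, config)); // formatted (default)
        config.setExtractRawValues(true);
        System.out.println(cellText(cell, config)); // raw
    }
}
```

      Because the flag travels with the config object, the decision is made in one method and no classes in the construction chain need to be extended or copied.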

       

      I plan to submit a pull request in a few weeks (I am contributing this on the side, based on professional use).

            People

              Assignee: Unassigned
              Reporter: JTB Development (jtbdevelopment)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated: