  1. Tika
  2. TIKA-2641

Unit test for consistency between tabular/columnar formats

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.18, 2.0.0
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      We now have a number of parsers that handle file formats which are either wholly or optionally "table-based", with a consistent data type held in a given column. These include multi-table formats like sqlite, single-table formats like sas7bdat, and anything-goes table formats like csv or xlsx.

      First, we should try to create a simple-ish, small but rich test file for each of these formats, similar to what we do for archive formats with the test-documents archives. Then we should add unit tests that verify that, as far as the formats permit, you get basically the same XHTML out for the "same" input. Oh, and fix up any obvious inconsistencies...
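
One possible shape for such a check, as a rough sketch only: reduce each parser's XHTML to a plain list of tables, rows and cell strings, then compare those lists across formats. The `TableXhtml` class and its `extractTables` method are hypothetical names invented for this illustration, not existing Tika code. The sketch also assumes the XHTML is well-formed XML, that cells appear as `td` or `th` children of `tr`, and that tables are not nested. In a real test the input strings would come from running each parser over its test file (for example by capturing the output with a `ToXMLContentHandler`).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: reduces XHTML to plain cell text
// so that output from different tabular parsers can be compared directly.
final class TableXhtml {

    /** Returns one entry per table; each table is a list of rows of trimmed cell text. */
    static List<List<List<String>>> extractTables(String xhtml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness lets us match on local names, with or
            // without the XHTML namespace declaration on the root element.
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XHTML is not well-formed", e);
        }

        List<List<List<String>>> tables = new ArrayList<>();
        NodeList tableNodes = doc.getElementsByTagNameNS("*", "table");
        for (int t = 0; t < tableNodes.getLength(); t++) {
            Element table = (Element) tableNodes.item(t);
            List<List<String>> rows = new ArrayList<>();
            NodeList rowNodes = table.getElementsByTagNameNS("*", "tr");
            for (int r = 0; r < rowNodes.getLength(); r++) {
                rows.add(cellsOf((Element) rowNodes.item(r)));
            }
            tables.add(rows);
        }
        return tables;
    }

    // Collects the text of the direct td/th children of one row.
    private static List<String> cellsOf(Element row) {
        List<String> cells = new ArrayList<>();
        NodeList children = row.getChildNodes();
        for (int c = 0; c < children.getLength(); c++) {
            if (children.item(c) instanceof Element) {
                Element cell = (Element) children.item(c);
                String name = cell.getLocalName();
                if ("td".equals(name) || "th".equals(name)) {
                    cells.add(cell.getTextContent().trim());
                }
            }
        }
        return cells;
    }
}
```

A consistency test could then assert that `TableXhtml.extractTables(csvXhtml)` equals `TableXhtml.extractTables(xlsxXhtml)` for test files holding the same data. Treating `th` and `td` alike and trimming whitespace is a deliberate loosening for this sketch; whether header cells or wrapper elements should count as inconsistencies is a decision for the actual tests.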

          People

            Assignee: Unassigned
            Reporter: Nick Burch
            Votes: 0
            Watchers: 3
