Tika / TIKA-2852

Add reports for missing/unaligned files in tika-eval Compare mode

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.21
    • Component/s: None
    • Labels:
      None

      Description

      We currently include reports for differences in attachment counts. It would also be useful to report on the number of "unaligned" files by mime type.

      Consider two extracts of the same file, which has attachments, produced by different versions of Tika.

      ExtractA                  ExtractB
      msword (container)        msword
         /emf                      /zip
         /emf                         /txt
         /zip
            /txt
      

      We know from the current reports that msword files are missing attachments in ExtractB. It would be useful to know that 2 emfs went missing in ExtractB, or, more generally, to sum the mime types of the attachments missing from the B run and from the A run.
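
      The counting described above could be sketched roughly as follows. This is only an illustration, not tika-eval's actual implementation: it assumes each extract can be flattened to a list of embedded-resource paths whose segments are mime names, and that attachments are aligned by matching those paths. The function name count_unaligned and the path strings are hypothetical.

```python
from collections import Counter


def count_unaligned(extract_a, extract_b):
    """Return (missing_in_b, missing_in_a), each a Counter keyed by leaf mime.

    Assumption: each extract is a list of embedded paths such as "/zip/txt",
    and two attachments are "aligned" when their paths are equal.
    """
    a = Counter(extract_a)
    b = Counter(extract_b)

    # Multiset difference: paths seen more often in one run than the other.
    only_a = a - b  # present in A, no counterpart in B
    only_b = b - a  # present in B, no counterpart in A

    def by_mime(path_counts):
        totals = Counter()
        for path, n in path_counts.items():
            # The last path segment stands in for the attachment's mime type.
            totals[path.rsplit("/", 1)[-1]] += n
        return totals

    return by_mime(only_a), by_mime(only_b)


# The example from the description, flattened to paths (hypothetical format).
extract_a = ["/emf", "/emf", "/zip", "/zip/txt"]
extract_b = ["/zip", "/zip/txt"]

missing_in_b, missing_in_a = count_unaligned(extract_a, extract_b)
print(dict(missing_in_b), dict(missing_in_a))
```

      Using a multiset difference rather than a set difference keeps duplicate attachments distinct, so the two emf files are counted separately.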

            People

            • Assignee:
              Unassigned
              Reporter:
              tallison Tim Allison
            • Votes:
              0
              Watchers:
              3
