Tika / TIKA-2853

Consider applying NaiveBayes or similar simple ML to streaming zip detector

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Whether we use actual ML or build rules from patterns we see in the data, it would be useful to gather features (field names, directory names, etc.) from the zip-based file types in our regression corpus to (potentially) improve the efficiency of MIME detection.
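
      As a rough illustration of the idea, here is a minimal Python sketch of a multinomial Naive Bayes classifier over zip entry names. Everything in it is an assumption made for illustration: the names `entry_features`, `NaiveBayesZipClassifier` and `make_zip` are hypothetical, the labels are shorthand rather than real MIME types, and the training data is a toy stand-in, not the regression corpus. It also reads names via Python's `zipfile` (which uses the central directory), whereas a streaming detector would see entries one at a time in file order.

```python
import io
import math
import zipfile
from collections import Counter, defaultdict


def entry_features(zip_bytes):
    """Extract coarse features from entry names: top-level directory and file name."""
    features = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            parts = [p for p in name.split("/") if p]
            if not parts:
                continue
            if len(parts) > 1:
                features.append("dir:" + parts[0])
            features.append("name:" + parts[-1])
    return features


class NaiveBayesZipClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.doc_counts = Counter()                 # label -> number of training files
        self.vocabulary = set()

    def train(self, label, features):
        self.doc_counts[label] += 1
        self.feature_counts[label].update(features)
        self.vocabulary.update(features)

    def predict(self, features):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for f in features:
                score += math.log(
                    (counts[f] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def make_zip(names):
    """Build an in-memory zip with empty entries (toy data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


# Toy training set; a real experiment would draw these from the regression corpus.
clf = NaiveBayesZipClassifier()
clf.train("docx", entry_features(make_zip(
    ["[Content_Types].xml", "word/document.xml", "word/styles.xml"])))
clf.train("xlsx", entry_features(make_zip(
    ["[Content_Types].xml", "xl/workbook.xml", "xl/worksheets/sheet1.xml"])))
clf.train("jar", entry_features(make_zip(
    ["META-INF/MANIFEST.MF", "com/example/Main.class"])))

unseen = make_zip(["[Content_Types].xml", "word/document.xml", "word/settings.xml"])
print(clf.predict(entry_features(unseen)))
```

      The same feature counts could equally be inspected by hand to derive plain rules, which is the non-ML option the description mentions.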

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
