Tika / TIKA-2099

Tar files without magic bytes are sporadically detected as text

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.15
    • Component/s: None
    • Labels: None

    Description

      When a tar file is created with 7-Zip 9.20, the magic bytes "ustar" are not added. Everything seems to work fine if the tar contains Microsoft Office files, but when it contains only text files, Tika sporadically recognizes it as text/plain. Detection also seems to depend on the size of the first file in the tar: the misdetection only appears when that file is several KB in size.
      The problem was found in version 1.11 and also exists in the latest 1.14-SNAPSHOT.
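
      To illustrate why such archives are hard to detect, here is a minimal sketch in plain Java that checks a 512-byte tar header in two ways: by looking for the "ustar" magic at offset 257, and by validating the header checksum stored at offset 148. The checksum test still works when the magic is missing. The class name TarHeaderCheck and all of its methods are hypothetical and written only for this example; this is not Tika's detector code and not necessarily how the issue was fixed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika.
class TarHeaderCheck {
    static final int HEADER_SIZE = 512;
    static final int CHECKSUM_OFFSET = 148; // 8-byte octal checksum field
    static final int CHECKSUM_LENGTH = 8;
    static final int MAGIC_OFFSET = 257;    // "ustar" in POSIX-style headers

    // True only if the header carries the "ustar" magic bytes.
    static boolean hasUstarMagic(byte[] header) {
        if (header.length < MAGIC_OFFSET + 5) {
            return false;
        }
        String magic = new String(header, MAGIC_OFFSET, 5, StandardCharsets.US_ASCII);
        return magic.equals("ustar");
    }

    // True if the stored checksum equals the sum of all header bytes,
    // with the checksum field itself counted as eight spaces.
    static boolean hasValidChecksum(byte[] header) {
        if (header.length < HEADER_SIZE) {
            return false;
        }
        long stored = parseOctal(header, CHECKSUM_OFFSET, CHECKSUM_LENGTH);
        if (stored < 0) {
            return false;
        }
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inChecksum = i >= CHECKSUM_OFFSET && i < CHECKSUM_OFFSET + CHECKSUM_LENGTH;
            sum += inChecksum ? ' ' : (header[i] & 0xFF);
        }
        return sum == stored;
    }

    // Parses an octal field padded with spaces or NULs; returns -1 if malformed.
    static long parseOctal(byte[] buf, int offset, int length) {
        long value = 0;
        boolean sawDigit = false;
        for (int i = offset; i < offset + length; i++) {
            byte b = buf[i];
            if (b == 0 || b == ' ') {
                if (sawDigit) {
                    break;      // terminator after the digits
                }
                continue;       // leading padding
            }
            if (b < '0' || b > '7') {
                return -1;
            }
            value = value * 8 + (b - '0');
            sawDigit = true;
        }
        return sawDigit ? value : -1;
    }

    // Builds a bare-bones header for demonstration: name, optional magic, checksum.
    static byte[] buildHeader(String name, boolean withMagic) {
        byte[] header = new byte[HEADER_SIZE];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, Math.min(nameBytes.length, 100));
        if (withMagic) {
            byte[] magic = "ustar".getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(magic, 0, header, MAGIC_OFFSET, magic.length);
        }
        // The checksum is computed with its own field filled with spaces.
        Arrays.fill(header, CHECKSUM_OFFSET, CHECKSUM_OFFSET + CHECKSUM_LENGTH, (byte) ' ');
        long sum = 0;
        for (byte b : header) {
            sum += b & 0xFF;
        }
        byte[] digits = String.format("%06o", sum).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(digits, 0, header, CHECKSUM_OFFSET, digits.length);
        header[CHECKSUM_OFFSET + 6] = 0; // six digits, NUL, then the remaining space
        return header;
    }
}
```

      For a header built with buildHeader("notes.txt", false), hasUstarMagic returns false while hasValidChecksum returns true, which mirrors the situation described above: the archive is structurally a tar, but a magic-only check cannot see it, and a file name followed by text content can look like plain text to a content-based detector.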

            People

              Assignee: Tim Allison (tallison)
              Reporter: Robin Schimpf (rschimpf)