Tika / TIKA-516

Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel"


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.8
    • Component/s: parser
    • Labels: None

    Description

      When the AutoDetectParser parses an Excel 5 file, the detected content type is inconsistent: the same file is sometimes reported as "application/msword" and sometimes as "application/vnd.ms-excel".

      See the following code:

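      import java.io.File;
      import java.io.FileInputStream;
      import java.io.InputStream;

      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.AutoDetectParser;
      import org.xml.sax.helpers.DefaultHandler;

      public class Test {
          public static void main(String[] args) throws Exception {
              File file = new File("excel5.xls");
              // Parse the same file ten times; a stable detector prints the same type every time
              for (int i = 0; i < 10; i++) {
                  // Open a fresh stream per run and always close it
                  // (the original loop closed only the stream from the last iteration)
                  InputStream stream = new FileInputStream(file);
                  try {
                      AutoDetectParser parser = new AutoDetectParser();
                      Metadata metadata = new Metadata();
                      metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
                      parser.parse(stream, new DefaultHandler(), metadata);
                      System.out.println(metadata.get(Metadata.CONTENT_TYPE));
                  } finally {
                      stream.close();
                  }
              }
          }
      }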

      An example output is:
      application/vnd.ms-excel
      application/msword
      application/msword
      application/vnd.ms-excel
      application/vnd.ms-excel
      application/vnd.ms-excel
      application/vnd.ms-excel
      application/msword
      application/vnd.ms-excel
      application/msword
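      To make the inconsistency easy to check mechanically, the ten example results above can be tallied per content type; a consistent detector would produce exactly one key. The following is a minimal, self-contained sketch; the class name DetectionTally and its methods are hypothetical helpers for illustration and are not part of the Tika API.

      ```java
      import java.util.Arrays;
      import java.util.List;
      import java.util.Map;
      import java.util.TreeMap;

      // Hypothetical helper: not part of Tika, only tallies reported content types
      public class DetectionTally {
          // Counts how often each content type was reported across repeated runs
          public static Map<String, Integer> tally(List<String> types) {
              Map<String, Integer> counts = new TreeMap<>();
              for (String t : types) {
                  counts.merge(t, 1, Integer::sum);
              }
              return counts;
          }

          // Detection is consistent only if every run reported the same type
          public static boolean isConsistent(List<String> types) {
              return tally(types).size() <= 1;
          }

          public static void main(String[] args) {
              // The ten lines of example output from this report, in order
              List<String> observed = Arrays.asList(
                  "application/vnd.ms-excel", "application/msword",
                  "application/msword", "application/vnd.ms-excel",
                  "application/vnd.ms-excel", "application/vnd.ms-excel",
                  "application/vnd.ms-excel", "application/msword",
                  "application/vnd.ms-excel", "application/msword");
              System.out.println(tally(observed));
              System.out.println("consistent: " + isConsistent(observed));
          }
      }
      ```

      Running it prints {application/msword=4, application/vnd.ms-excel=6} followed by consistent: false, i.e. the same file was classified as Word in four of ten runs.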

      The Excel 5 file I used is attached to this issue.

      Attachments

        1. Test.java
          1 kB
          Andrey Sidorenko
        2. excel5.xls
          6 kB
          Victor Kazakov


          People

            Assignee: Unassigned
            Reporter: Victor Kazakov (kazvictor)
            Votes: 0
            Watchers: 1
