TIKA-877

Embedded document not extracted (regression)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:

      Description

      While testing the 1.1 release candidate, I believe I found a regression, hence the Blocker priority.

      dbonniot-t520 /tmp/1.0 java -jar ../tika-app-1.0.jar -z ../coffee.xls 
      Extracting 'file0.wmf' (application/x-msmetafile)
      Extracting 'file1.wmf' (application/x-msmetafile)
      Extracting 'file2.wmf' (application/x-msmetafile)
      Extracting 'file3.wmf' (application/x-msmetafile)
      Extracting 'file4.png' (image/png)
      Extracting 'MBD002B040A.wps' (application/vnd.ms-works)
      Extracting 'file5.bin' (application/octet-stream)
      Extracting 'MBD00262FE3.unknown' (application/x-tika-msoffice)
      
      dbonniot-t520 /tmp/1.0 cd ../1.1
      dbonniot-t520 /tmp/1.1 java -jar ../tika-app-1.1.jar -z ../coffee.xls 
      Extracting 'file0.emf' (application/x-emf)
      Extracting 'file1.emf' (application/x-emf)
      Extracting 'file2.emf' (application/x-emf)
      Extracting 'file3.emf' (application/x-emf)
      Extracting 'file4.png' (image/png)
      Extracting 'MBD002B040A.wps' (application/vnd.ms-works)
      Extracting 'file5' (application/x-tika-msoffice-embedded)
      Extracting 'MBD00262FE3.unknown' (application/x-tika-msoffice)
      
      dbonniot-t520 /tmp/1.1 ls -l ../1.0/file5.bin ../1.1/file5 
      -rw-r--r-- 1 dbonniot dbonniot 2519 2012-03-18 21:51 ../1.0/file5.bin
      -rw-r--r-- 1 dbonniot dbonniot    0 2012-03-18 21:51 ../1.1/file5
      

      Notice how 1.0 extracted 2519 bytes of data for file5 (as file5.bin), whereas 1.1 creates an empty, 0-byte file5 instead.
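
      To spot this symptom without reading the `ls -l` output by hand, a small stand-alone check could list the zero-byte files in an extraction directory. This is only a sketch: `EmptyExtractCheck` and `findEmptyFiles` are hypothetical names, the code uses just the JDK's `java.nio.file` API, and it does not call Tika itself. It assumes the directory already contains the files written by `tika-app` with `-z`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tika: reports zero-byte files in a directory.
class EmptyExtractCheck {

    /** Returns the names of regular files directly under dir whose size is 0 bytes, sorted by path. */
    static List<String> findEmptyFiles(Path dir) throws IOException {
        List<Path> entries;
        // Files.list holds a directory handle, so close the stream once it is collected.
        try (Stream<Path> stream = Files.list(dir)) {
            entries = stream.sorted().collect(Collectors.toList());
        }
        List<String> empty = new ArrayList<>();
        for (Path p : entries) {
            if (Files.isRegularFile(p) && Files.size(p) == 0) {
                empty.add(p.getFileName().toString());
            }
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory, e.g. /tmp/1.1 after the extraction run.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> empty = findEmptyFiles(dir);
        for (String name : empty) {
            System.out.println("EMPTY: " + name);
        }
        // A non-zero exit status lets a script treat empty extractions as a failure.
        System.exit(empty.isEmpty() ? 0 : 1);
    }
}
```

      Run against the two directories above, such a check would be expected to flag nothing under 1.0 and only `file5` under 1.1, given the sizes shown in the listing.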

      By the way, I do see improvements in 1.1 as well, congrats on that!

      Attachment: coffee.xls (113 kB), Daniel Bonniot de Ruisselet

          People

          • Assignee: Maxim Valyanskiy
          • Reporter: Daniel Bonniot de Ruisselet