  1. Tika
  2. TIKA-1634

Detecting problem with Matlab source code


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.8
    • Fix Version/s: 1.9
    • Component/s: mime
    • Labels:

      Description

      Matlab and Objective-C source files share the same suffix, .m, so the two cannot be told apart by file name alone. For that reason, the Matlab entry in tika-mimetypes.xml relies on a magic match value rather than a glob pattern.

      In tika-mimetypes.xml Matlab is defined as:

      <mime-type type="text/x-matlab">
        <_comment>Matlab source code</_comment>
        <magic priority="50">
          <match value="function [" type="string" offset="0"/>
        </magic>
        <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
        <sub-class-of type="text/plain"/>
      </mime-type>

      However, Matlab source files do not always start with "function [". As a result, some Matlab files are detected as text/x-objcsrc. Based on source code collected from NOAA Paleoclimatology Software Resources, many Matlab files start with "function" (without the bracket) or with a "%" comment, which suggests match values like these (problematic files are attached as examples):

      <mime-type type="text/x-matlab">
        <_comment>Matlab source code</_comment>
        <magic priority="50">
          <match value="function" type="string" offset="0"/>
          <match value="%" type="string" offset="0"/>
        </magic>
        <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
        <sub-class-of type="text/plain"/>
      </mime-type>
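
      To make the difference between the two definitions concrete, here is a minimal Python sketch of an offset-0 string match. It is a simplified model, not Tika's implementation: it assumes that sibling match elements inside one magic block act as alternatives, and it ignores priority and competing types. The function name and the sample file openings are hypothetical and invented for illustration.

```python
from typing import Sequence

# Match values from the current and the proposed definitions above.
CURRENT_RULES = [b"function ["]
PROPOSED_RULES = [b"function", b"%"]


def matches_magic(data: bytes, rules: Sequence[bytes]) -> bool:
    """Return True if data starts (offset 0) with any of the rule values.

    Assumption: sibling <match> elements are treated as alternatives.
    """
    return any(data.startswith(value) for value in rules)


# Hypothetical file openings, invented for illustration only.
samples = {
    "multi_output.m": b"function [a, b] = f(x)\n",
    "single_output.m": b"function y = f(x)\n",
    "comment_first.m": b"% Computes initial values\nx = 1;\n",
    "objc_like.m": b"#import <Foundation/Foundation.h>\n",
}

for name, data in samples.items():
    current = matches_magic(data, CURRENT_RULES)
    proposed = matches_magic(data, PROPOSED_RULES)
    print(f"{name}: current={current} proposed={proposed}")
```

      In this model only the first sample satisfies the current rule, while the proposed rules also accept a file that opens with a single-output function or a "%" comment. The sketch says nothing about how other types that also begin with "%" would be ranked.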

      I ran several detection tests on different Matlab packages obtained from NOAA Paleoclimatology Software Resources, with and without custom-mimetypes.xml. Results are attached. In total, 103 Matlab files are detected correctly with custom-mimetypes.xml, while only 42 are detected as Matlab without it (that is, with only the current match value). However, these match values may be common only within the Paleoclimatology community.

        Attachments

        1. tika-mimetypes.xml
          217 kB
          Ji-Hyun Oh
        2. Initial_Vals_Maker.m
          4 kB
          Ji-Hyun Oh
        3. custom-mimetypes.xml
          217 kB
          Ji-Hyun Oh
        4. wtsgaus.m
          1 kB
          Ji-Hyun Oh
        5. BARCAST_MainCode.m
          18 kB
          Ji-Hyun Oh

            People

            • Assignee:
              chrismattmann Chris A. Mattmann
              Reporter:
              Ji-Hyun.Oh@jpl.nasa.gov Ji-Hyun Oh
            • Votes:
              0
              Watchers:
              5
