  1. Tika
  2. TIKA-1634

Detecting problem with Matlab source code


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.8
    • Fix Version/s: 1.9
    • Component/s: mime

    Description

      Both Matlab and Objective-C source files use the .m suffix, so the glob pattern alone cannot distinguish them. For that reason, the Matlab entry in tika-mimetypes.xml relies on a magic match value instead.

      In tika-mimetypes.xml Matlab is defined as:

      <mime-type type="text/x-matlab">
        <_comment>Matlab source code</_comment>
        <magic priority="50">
          <match value="function [" type="string" offset="0"/>
        </magic>
        <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
        <sub-class-of type="text/plain"/>
      </mime-type>

      However, Matlab files do not always start with "function [". As a result, some Matlab files are detected as text/x-objcsrc. Based on source code collected from the NOAA Paleoclimatology Software Resources, many Matlab files start with "function" (without a bracket) or with a "%" comment, which suggests match values like these (problematic files are attached as examples):

      <mime-type type="text/x-matlab">
        <_comment>Matlab source code</_comment>
        <magic priority="50">
          <match value="function" type="string" offset="0"/>
          <match value="%" type="string" offset="0"/>
        </magic>
        <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
        <sub-class-of type="text/plain"/>
      </mime-type>
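
To make the difference between the two rule sets concrete, here is a minimal Python sketch of an offset-0 string match. It is a simplified model for illustration, not Tika's detector: it ignores magic priority, globs, and every other type in tika-mimetypes.xml, and it assumes that sibling match elements act as alternatives. The sample file names and contents are hypothetical.

```python
from typing import Sequence

# Match values taken from the two definitions in this issue.
CURRENT_MAGIC = [b"function ["]
PROPOSED_MAGIC = [b"function", b"%"]


def matches_magic(head: bytes, patterns: Sequence[bytes]) -> bool:
    """Return True if the leading bytes start with any pattern (offset 0).

    Simplified model: sibling <match> elements are treated as alternatives.
    """
    return any(head.startswith(p) for p in patterns)


# Hypothetical first lines of .m files, for illustration only.
samples = {
    "multi_output.m": b"function [a, b] = f(x)\n",
    "single_output.m": b"function y = f(x)\n",
    "comment_first.m": b"% Main script\nx = 1;\n",
    "objc.m": b"#import <Foundation/Foundation.h>\n",
}

for name, head in samples.items():
    print(name,
          "current:", matches_magic(head, CURRENT_MAGIC),
          "proposed:", matches_magic(head, PROPOSED_MAGIC))
```

In this model only the file that begins with "function [" satisfies the current rule, while the proposed rule also accepts the single-output function and the comment-first script and still rejects the Objective-C sample.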

      I ran several detection tests on different Matlab packages obtained from the NOAA Paleoclimatology Software Resources, with and without custom-mimetypes.xml. The results are attached. In total, 103 Matlab files were detected correctly with custom-mimetypes.xml, while only 42 were detected as Matlab without it (that is, with only the current match value). However, these match values may be common only in the paleoclimatology community.
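
A comparison of this kind could be tallied with a short script along the following lines. This is a hedged sketch, not the procedure used for the attached results: it applies the same simplified offset-0 prefix check rather than running Tika, and the function name `count_matches`, the `head_size` parameter, and the directory layout are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Mapping, Sequence

# Match values from the current and proposed definitions in this issue.
RULE_SETS: Dict[str, Sequence[bytes]] = {
    "current": [b"function ["],
    "proposed": [b"function", b"%"],
}


def count_matches(root: Path,
                  rule_sets: Mapping[str, Sequence[bytes]] = RULE_SETS,
                  head_size: int = 64) -> Dict[str, int]:
    """Count .m files under root whose first bytes match each rule set."""
    counts = {name: 0 for name in rule_sets}
    for path in sorted(root.rglob("*.m")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(head_size)  # only the leading bytes matter here
        for name, patterns in rule_sets.items():
            if any(head.startswith(p) for p in patterns):
                counts[name] += 1
    return counts
```

Calling `count_matches(Path("some/matlab/package"))` on a local copy of a package (the path is a placeholder) returns one count per rule set, which can then be compared against what Tika itself reports.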

      Attachments

        1. BARCAST_MainCode.m
          18 kB
          Ji-Hyun Oh
        2. custom-mimetypes.xml
          217 kB
          Ji-Hyun Oh
        3. Initial_Vals_Maker.m
          4 kB
          Ji-Hyun Oh
        4. tika-mimetypes.xml
          217 kB
          Ji-Hyun Oh
        5. wtsgaus.m
          1 kB
          Ji-Hyun Oh

          People

            chrismattmann Chris A. Mattmann
            Ji-Hyun.Oh@jpl.nasa.gov Ji-Hyun Oh
            Votes: 0
            Watchers: 5