  1. Hadoop Map/Reduce
  2. MAPREDUCE-6155

MapFiles are not always correctly detected by SequenceFileInputFormat

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      MapFiles are not correctly detected by SequenceFileInputFormat.

      This is because the listStatus method only detects a MapFile if the path it checks is a directory; it then replaces that path with the path of the data file.

      This is likely to fail in two cases: when the input path is a directory that does not belong to a MapFile, so the expected data file does not exist, and when recursion is turned on and the input format comes across a file (not a directory) that is in fact part of a MapFile.

      The listStatus method should be changed to detect these cases correctly:

      • If the current candidate is a file named "index" or "data", check that its counterpart file exists, that the key types of both files match, and that the value type of the index file is LongWritable.
      • If the current candidate is a directory, treat it as a MapFile only if it contains both an index file and a data file, both are SequenceFiles, their key types match, and the index value type is LongWritable.
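
The two checks above can be sketched in plain Java as follows. This is only an illustration of the decision logic, not the attached patch: the Fs interface, the Header class, and the InMemoryFs helper are hypothetical stand-ins for Hadoop's FileSystem and for reading a SequenceFile header, and the "index" and "data" constants are assumed to mirror MapFile.INDEX_FILE_NAME and MapFile.DATA_FILE_NAME.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: models the proposed checks without depending on Hadoop.
final class MapFileDetection {
    static final String INDEX = "index"; // assumed to mirror MapFile.INDEX_FILE_NAME
    static final String DATA = "data";   // assumed to mirror MapFile.DATA_FILE_NAME
    static final String LONG_WRITABLE = "org.apache.hadoop.io.LongWritable";

    /** Key and value class names as stored in a SequenceFile header. */
    static final class Header {
        final String keyClass;
        final String valueClass;

        Header(String keyClass, String valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    /** Hypothetical view of the file system; not a Hadoop interface. */
    interface Fs {
        boolean isDirectory(String path);

        /** Returns the header, or null if the path is missing or not a SequenceFile. */
        Header sequenceFileHeader(String path);
    }

    /** Directory case: index and data must both be SequenceFiles with matching keys. */
    static boolean isMapFileDir(Fs fs, String dir) {
        if (!fs.isDirectory(dir)) {
            return false;
        }
        Header index = fs.sequenceFileHeader(dir + "/" + INDEX);
        Header data = fs.sequenceFileHeader(dir + "/" + DATA);
        return index != null && data != null
            && index.keyClass.equals(data.keyClass)
            && LONG_WRITABLE.equals(index.valueClass);
    }

    /** File case: only "index" or "data" qualify, and the enclosing directory must pass. */
    static boolean isPartOfMapFile(Fs fs, String file) {
        int slash = file.lastIndexOf('/');
        if (slash <= 0 || fs.isDirectory(file)) {
            return false; // no parent directory to inspect, or not a file at all
        }
        String name = file.substring(slash + 1);
        if (!name.equals(INDEX) && !name.equals(DATA)) {
            return false;
        }
        return isMapFileDir(fs, file.substring(0, slash));
    }

    /** Tiny in-memory Fs for trying the checks; a null header marks a non-SequenceFile. */
    static final class InMemoryFs implements Fs {
        private final Map<String, Header> files = new HashMap<>();

        InMemoryFs add(String path, Header header) {
            files.put(path, header);
            return this;
        }

        @Override
        public boolean isDirectory(String path) {
            // A path counts as a directory if any stored file lives beneath it.
            for (String f : files.keySet()) {
                if (f.startsWith(path + "/")) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public Header sequenceFileHeader(String path) {
            return files.get(path);
        }
    }
}
```

In a real implementation the header lookup would presumably open each file with SequenceFile.Reader and compare the key and value class names it reports; that part is left out here so the sketch runs without Hadoop on the classpath.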

      Attachments

        1. MAPREDUCE-6155.002.patch
          13 kB
          Jens Rabe
        2. MAPREDUCE-6155.001.patch
          13 kB
          Jens Rabe

          People

            Assignee: Unassigned
            Reporter: Jens Rabe (rabejens)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified