Hadoop Map/Reduce: MAPREDUCE-1176

FixedLengthInputFormat and FixedLengthRecordReader

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels:
      None
    • Environment:
      Any

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Adds FixedLengthInputFormat and FixedLengthRecordReader to the org.apache.hadoop.mapreduce.lib.input package. These two classes read data from files containing fixed-length (fixed-width) records. Such files have no CR/LF (or any combination thereof) and no delimiters; every record has the same length, and any unused space in a record is padded with spaces, so the data is effectively one gigantic line within the file. A job that specifies this input format must set the "mapreduce.input.fixedlengthinputformat.record.length" property, as follows: myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);

      Please see the Javadoc for more details.
    • Tags:
      fixed length, fixed width, recordreader, inputformat

      Description

      Hello,
      I would like to contribute the following two classes for incorporation into the mapreduce.lib.input package. They can be used to read data from files containing fixed-length (fixed-width) records. Such files have no CR/LF (or any combination thereof) and no delimiters; every record has the same length, and any unused space in a record is padded with spaces. The data is one gigantic line within a file.
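
      To make the file layout concrete, here is a minimal sketch in plain Java (no Hadoop dependency) that cuts a byte array into fixed-length records and strips the space padding. The class and method names (FixedWidthDemo, splitRecords, stripPadding) are hypothetical and chosen for illustration only; they are not part of the contributed classes, and ASCII content is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the proposed patch or of Hadoop.
class FixedWidthDemo {

    /** Cuts data into consecutive records of exactly recordLength bytes. */
    static List<String> splitRecords(byte[] data, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        // With no delimiters, a trailing partial record cannot be interpreted.
        if (data.length % recordLength != 0) {
            throw new IllegalArgumentException("data ends with a partial record");
        }
        List<String> records = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += recordLength) {
            // Assumption: the records hold ASCII text.
            records.add(new String(data, offset, recordLength, StandardCharsets.US_ASCII));
        }
        return records;
    }

    /** Removes the trailing space padding from one record. */
    static String stripPadding(String record) {
        int end = record.length();
        while (end > 0 && record.charAt(end - 1) == ' ') {
            end--;
        }
        return record.substring(0, end);
    }

    public static void main(String[] args) {
        // Three 4-byte records stored back to back, with no line breaks.
        byte[] data = "AB  CDE F   ".getBytes(StandardCharsets.US_ASCII);
        for (String record : splitRecords(data, 4)) {
            System.out.println("[" + record + "] -> [" + stripPadding(record) + "]");
        }
    }
}
```

      The record boundaries here come only from the offset arithmetic, which is why the record length has to be supplied to the job up front.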

      Two classes are provided: FixedLengthInputFormat and its corresponding FixedLengthRecordReader. When creating a job that specifies this input format, the job must have the "mapreduce.input.fixedlengthinputformat.record.length" property set, either by property name or through the equivalent constant, as follows:

      myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);

      OR

      myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);

      This input format overrides computeSplitSize() to ensure that InputSplits never contain partial records: with fixed-length records, there is no way to determine where a record begins if a split starts in the middle of one. Each InputSplit passed to the FixedLengthRecordReader starts at the beginning of a record, and the last byte in the InputSplit is the last byte of a record. The override of computeSplitSize() delegates to FileInputFormat's compute method and then rounds the returned split size down to a multiple of the record length: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)
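
      The split-size adjustment described above can be sketched in plain Java as follows. This is an illustration of the arithmetic only, not the code from the attached patches: the class and method names (SplitSizeSketch, alignSplitSize, splitStartOffsets) are hypothetical, and long integer division stands in for Math.floor, which gives the same result for non-negative values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the proposed patch or of Hadoop.
class SplitSizeSketch {

    /** Rounds a computed split size down to a whole number of records. */
    static long alignSplitSize(long computedSplitSize, long recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        if (computedSplitSize < 0) {
            throw new IllegalArgumentException("computedSplitSize must not be negative");
        }
        // Integer division truncates, so this equals floor(size / len) * len.
        // Note: the result is 0 when the computed size is smaller than one record.
        return (computedSplitSize / recordLength) * recordLength;
    }

    /** Lists the byte offsets at which consecutive splits of splitSize begin. */
    static List<Long> splitStartOffsets(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> starts = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long recordLength = 30;
        long aligned = alignSplitSize(128, recordLength);
        System.out.println("aligned split size: " + aligned);
        // Every start offset is a multiple of the record length.
        System.out.println("split starts: " + splitStartOffsets(300, aligned));
    }
}
```

      Because every split size is a multiple of the record length, each split start offset is also a multiple of it, so no split begins or ends inside a record. The sketch leaves open what should happen when the computed split size is smaller than a single record; it simply returns 0 in that case.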

      This suite of fixed-length input format classes does not support compressed files.

      1. mapreduce-1176_v1.patch
        55 kB
        Mariappan Asokan
      2. mapreduce-1176_v2.patch
        55 kB
        Mariappan Asokan
      3. mapreduce-1176_v3.patch
        54 kB
        Mariappan Asokan
      4. MAPREDUCE-1176-v1.patch
        24 kB
        BitsOfInfo
      5. MAPREDUCE-1176-v2.patch
        23 kB
        BitsOfInfo
      6. MAPREDUCE-1176-v3.patch
        25 kB
        BitsOfInfo
      7. MAPREDUCE-1176-v4.patch
        42 kB
        BitsOfInfo


            People

            • Assignee:
              Mariappan Asokan
              Reporter:
              BitsOfInfo
            • Votes:
              4
              Watchers:
              13
