Hadoop Map/Reduce: MAPREDUCE-5635

FileInputFormat does not specify how the file is split

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Does not matter.

      Description

      Here is what the TextInputFormat javadoc says:

      An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text..

      The FileInputFormat javadoc should say the same.

        Activity

        Jason Lowe added a comment -

        FileInputFormat does not require that the file is a plain text file broken into lines with carriage-return or linefeed used as line delimiters. That's what TextInputFormat is for.

        FileInputFormat is an abstract class that makes no assumptions about how the data in the file is formatted. Concrete implementations that derive from FileInputFormat must implement the getRecordReader method, which dictates how the records are read from the file and therefore what the file format must be for that particular input format.
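
        The division of responsibility described above can be illustrated with a minimal sketch in plain Java. This is not Hadoop's API: every name here (SimpleRecordReader, SimpleFileInputFormat, createReader, and the two subclasses) is hypothetical and chosen only to show how an abstract base class can stay format-agnostic while each concrete subclass decides what a record is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record reader: turns raw content into records.
interface SimpleRecordReader {
    List<String> readRecords(String content);
}

// Hypothetical analogue of a format-agnostic base class.
// It knows nothing about how records are delimited.
abstract class SimpleFileInputFormat {
    // Each subclass defines the record format by supplying its own reader.
    abstract SimpleRecordReader createReader();

    List<String> read(String content) {
        return createReader().readRecords(content);
    }
}

// Line-oriented subclass: a record ends at a linefeed or carriage return.
class SimpleTextInputFormat extends SimpleFileInputFormat {
    @Override
    SimpleRecordReader createReader() {
        return content -> {
            List<String> lines = new ArrayList<>();
            for (String line : content.split("\r\n|\r|\n")) {
                lines.add(line);
            }
            return lines;
        };
    }
}

// Fixed-width subclass: every record is `width` characters (the last may be shorter).
class SimpleFixedWidthInputFormat extends SimpleFileInputFormat {
    private final int width;

    SimpleFixedWidthInputFormat(int width) {
        this.width = width;
    }

    @Override
    SimpleRecordReader createReader() {
        return content -> {
            List<String> records = new ArrayList<>();
            for (int i = 0; i < content.length(); i += width) {
                int end = Math.min(content.length(), i + width);
                records.add(content.substring(i, end));
            }
            return records;
        };
    }
}

class InputFormatSketch {
    public static void main(String[] args) {
        // The same base-class entry point yields different records per subclass.
        System.out.println(new SimpleTextInputFormat().read("a\nb\r\nc"));
        System.out.println(new SimpleFixedWidthInputFormat(2).read("abcde"));
    }
}
```

        In this sketch the base class could not document a line-based format without being wrong for the fixed-width subclass, which mirrors the reasoning given for leaving format details out of the abstract class.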

        Jason Lowe added a comment -

        Closing this, as FileInputFormat is not supposed to specify the details of the file format, per the previous comment.

        Pranay Varma added a comment -

        Please add your comments to the javadoc.
        That will help people understand the class better.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Pranay Varma
          • Votes:
            0
          • Watchers:
            2
