  1. Hadoop Common
  2. HADOOP-7096

Allow setting of end-of-record delimiter for TextInputFormat

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0, 0.23.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:

      It would be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r', or '\r\n' as the only possible record delimiters. This is a problem when data fields contain embedded newlines, which is common. It also affects other tools that use TextInputFormat (see, for example, https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
      I have written a patch to address this issue. The patch allows users to specify a custom end-of-record delimiter through a newly added configuration property. For backward compatibility, if the property is absent, the previous delimiters are used unchanged (i.e., '\n', '\r', or '\r\n').
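      The behaviour described above can be illustrated with a small standalone sketch. It is not the code from the attached patches and does not use any Hadoop API; the class name RecordSplitter and its methods are hypothetical, chosen only to show the intended semantics: a custom delimiter when one is supplied, and the '\n', '\r', or '\r\n' fallback when it is absent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the LineReader implementation.
public class RecordSplitter {

    /**
     * Splits text into records. A null or empty delimiter stands in for
     * "configuration property absent" and selects the default behaviour.
     */
    public static List<String> split(String text, String delimiter) {
        if (delimiter == null || delimiter.isEmpty()) {
            return splitDefault(text);
        }
        return splitCustom(text, delimiter);
    }

    // Default: a record ends at "\n", "\r", or "\r\n".
    private static List<String> splitDefault(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                records.add(current.toString());
                current.setLength(0);
                // Treat "\r\n" as a single delimiter.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
            i++;
        }
        // Emit a trailing record that has no terminating delimiter.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    // Custom: a record ends only at the user-supplied delimiter,
    // so embedded newlines stay inside the record.
    private static List<String> splitCustom(String text, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(delimiter, start)) >= 0) {
            records.add(text.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < text.length()) {
            records.add(text.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "id=1,note=line one\nline two|#|id=2,note=single line|#|";
        System.out.println(split(data, "|#|").size()); // records split on "|#|"
        System.out.println(split(data, null).size());  // records split on newlines
    }
}
```

      With the delimiter "|#|", the first record keeps its embedded newline; with no delimiter, the same input is broken at the newline, which is the problem the issue describes.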

      Attachments

        1. HADOOP-7096_r2.patch
          9 kB
          Ahmed Radwan
        2. HADOOP-7096_r3.patch
          12 kB
          Ahmed Radwan
        3. hadoop-7096_r4.patch
          6 kB
          Todd Lipcon
        4. hadoop-7096.branch-1.patch
          7 kB
          Suresh Srinivas
        5. HADOOP-7096.patch
          0.8 kB
          Ahmed Radwan

          People

            Assignee: Ahmed Radwan (ahmed.radwan)
            Reporter: Ahmed Radwan (ahmed.radwan)
            Votes: 0
            Watchers: 11
