  1. Hadoop Common
  2. HADOOP-819

LineRecordWriter should not always insert tab char between key and value

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      With the current implementation of LineRecordWriter in TextOutputFormat, the client cannot pass a null key or a null value to the write function, and a tab character is always inserted between the key and the value. This works fine most of the time. However, in some cases one does not want the extra tab character. A common example: if I need to implement a utility similar to the Unix sort, with some fields of each line as the sort key, I can have my map extract the sort key from each line and pass the whole line as the value. The reducer just outputs the values and ignores the keys. However, if I use TextOutputFormat, every line of my output will contain an extra tab character, which is annoying.

      A simple solution is to let the write function of LineRecordWriter accept a null key argument and, when the key is null, write out only the value.
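
The proposed behavior can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the class name NullKeyLineWriter is hypothetical, it writes to a plain java.io.Writer instead of using Hadoop types, and it assumes the value is never null.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical stand-in for LineRecordWriter; not the actual Hadoop class. */
public class NullKeyLineWriter {
    private final Writer out;

    public NullKeyLineWriter(Writer out) {
        this.out = out;
    }

    /**
     * Writes "key TAB value NEWLINE" when the key is non-null,
     * and only "value NEWLINE" when the key is null.
     * Assumption: value is non-null (null values are not handled here).
     */
    public void write(Object key, Object value) throws IOException {
        if (key != null) {
            out.write(key.toString());
            out.write('\t');
        }
        out.write(value.toString());
        out.write('\n');
    }

    public static void main(String[] args) throws IOException {
        StringWriter buffer = new StringWriter();
        NullKeyLineWriter writer = new NullKeyLineWriter(buffer);
        writer.write("k1", "line one");  // key and value separated by a tab
        writer.write(null, "line two");  // value only, no leading tab
        System.out.print(buffer);
    }
}
```

With a writer like this, the reducer in the sort example could pass null as the key and each output line would contain only the original input line.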

      Attachments

        1. patch-819.txt
          10 kB
          Runping Qi

          People

            Assignee: Runping Qi (runping)
            Reporter: Runping Qi (runping)