  1. Hadoop Common
  2. HADOOP-7760

BytesWritable / SequenceFile yields dummy linefeed at end as soon as content has one or more linefeeds.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: record
    • Labels:
      None
    • Environment:
    • Tags:
      sequencefile byteswritable linefeed newline

      Description

      I create SequenceFiles that have BytesWritable values.
      If I store content that contains no linefeeds ("\n") or exactly one linefeed in the value, the value is read back from the SequenceFile correctly.
      However, as soon as I store input that contains two or more linefeeds (which is almost always the case), writing to the SequenceFile and reading the data back yields one extra linefeed at the end of the value, a linefeed that did not exist in the input.
      This effectively corrupts my data, although I could write a hacky workaround for it.
      I have written a program that demonstrates the behavior by writing two SequenceFiles:
      one that has a record whose value contains one linefeed;
      another that has a record whose value contains two linefeeds.
      Upon reading, the latter value contains three linefeeds.

      Test file: http://pastie.org/2728797
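
When checking a round trip like the one described above, one thing worth ruling out is reading more bytes than were stored. BytesWritable exposes both getBytes() and getLength(), and the array returned by getBytes() is only valid up to getLength(). This report does not confirm that as the cause here. The sketch below is plain Java with no Hadoop dependency: ReusableBytes is a hypothetical stand-in written for this illustration, and its 1.5x growth factor is an assumption. It only shows how counting linefeeds over a whole backing array can differ from counting over the valid prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackingBufferDemo {

    /** Hypothetical stand-in (not a Hadoop class): a reusable backing array plus a valid length. */
    public static final class ReusableBytes {
        private byte[] buffer = new byte[0];
        private int length = 0;

        /** Copies data in; grows the array with spare capacity (the 1.5x factor is an assumption). */
        public void set(byte[] data) {
            if (data.length > buffer.length) {
                buffer = new byte[data.length * 3 / 2];
            }
            System.arraycopy(data, 0, buffer, 0, data.length);
            length = data.length;
        }

        /** Returns the whole backing array, which may be longer than the stored data. */
        public byte[] getBytes() {
            return buffer;
        }

        /** Returns the number of valid bytes at the start of the backing array. */
        public int getLength() {
            return length;
        }
    }

    /** Counts linefeed bytes in the first {@code length} bytes of {@code bytes}. */
    public static int countLinefeeds(byte[] bytes, int length) {
        int count = 0;
        for (int i = 0; i < length; i++) {
            if (bytes[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    /** Returns a copy that holds exactly the valid bytes and nothing from the spare capacity. */
    public static byte[] validBytes(ReusableBytes value) {
        return Arrays.copyOf(value.getBytes(), value.getLength());
    }

    public static void main(String[] args) {
        ReusableBytes value = new ReusableBytes();
        value.set("one\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8));
        value.set("a\nb\n".getBytes(StandardCharsets.UTF_8)); // reuse the same holder

        // Counting within the valid length sees only the two stored linefeeds.
        System.out.println("valid prefix: " + countLinefeeds(value.getBytes(), value.getLength()));
        // Counting over the whole array also sees stale bytes left by the earlier, longer value.
        System.out.println("whole buffer: " + countLinefeeds(value.getBytes(), value.getBytes().length));
        System.out.println("copied size:  " + validBytes(value).length);
    }
}
```

Applied to the test program, the equivalent check would be to compare the linefeed count over the first getLength() bytes of the value against the count in the original input, before decoding the bytes to a string.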

            People

            • Assignee:
              Unassigned
            • Reporter:
              Dieter Plaetinck (dieter_be)
            • Votes:
              0
            • Watchers:
              2

                Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified