Hadoop Common / HADOOP-106

Data blocks should be record-oriented.

    Details

    • Type: Wish
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.2.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      If data blocks started and ended on data record boundaries, rather than at arbitrary places within a file, it would bring some important advantages:

      • it would no longer be necessary to "fish" for the beginning of the first record in a split (see SequenceFile.Reader.sync()).
      • recovering from DFS errors would be easier and much more often successful: in most cases missing blocks could simply be skipped and the remaining parts combined.
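
      To illustrate the "fishing" step mentioned above, here is a minimal sketch on a toy format. It is not the SequenceFile layout or Hadoop's implementation: the 16-byte marker `SYNC`, the 4-byte length prefix, and the helper names `write_records`, `fish_for_sync` and `read_record` are all assumptions made for this example. A reader that starts at an arbitrary split offset has to scan forward to the next marker before it can trust a record boundary.

      ```python
      import struct

      # Hypothetical 16-byte sync marker (an assumption for this sketch).
      SYNC = b"\xde\xad\xbe\xef" * 4


      def write_records(records, sync_every=2):
          """Serialize records as 4-byte length + payload, with a marker before every `sync_every` records."""
          out = bytearray()
          for i, rec in enumerate(records):
              if i % sync_every == 0:
                  out += SYNC
              out += struct.pack(">I", len(rec)) + rec
          return bytes(out)


      def fish_for_sync(data, split_start):
          """Return the offset of the first record following the first marker at or after split_start, or None."""
          pos = data.find(SYNC, split_start)
          return None if pos < 0 else pos + len(SYNC)


      def read_record(data, pos):
          """Read one length-prefixed record at pos; return (payload, next_pos)."""
          (n,) = struct.unpack_from(">I", data, pos)
          start = pos + 4
          return data[start:start + n], start + n


      data = write_records([b"alpha", b"beta", b"gamma", b"delta"])
      # Offset 20 falls in the middle of the first record, so the reader must scan ahead.
      first = fish_for_sync(data, 20)
      payload, _ = read_record(data, first)
      ```

      If blocks started on record boundaries, the scan in `fish_for_sync` would be unnecessary for block-aligned splits, since the split offset itself would already be a valid record start.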

          Activity

          Owen O'Malley added a comment -

          I agree with Eric. This is backwards. You want the sequence file to pad out to block boundaries.

          eric baldeschwieler added a comment -

          My intuition is that it makes more sense to do this the other way around and have records aligned to blocks. This keeps the FS implementation trivial: just pad near the end of a block. This way you keep a good separation of APIs too. It is fairly straightforward to change the record model to do that. The only issues are with huge records. You have a couple of options there. The simplest is to disallow them...
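
          The padding idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing Hadoop API: the 4-byte length prefix, the use of a zero length to mark padding, and the names `pack_into_blocks` and `read_block` are hypothetical. Oversized records are rejected, which corresponds to the simplest option mentioned above.

          ```python
          import struct

          HEADER = 4  # bytes in the hypothetical length prefix


          def pack_into_blocks(records, block_size):
              """Lay out length-prefixed records so that none straddles a block boundary."""
              out = bytearray()
              for rec in records:
                  need = HEADER + len(rec)
                  # A zero length is reserved for padding, so empty records are rejected too.
                  if not rec or need > block_size:
                      raise ValueError("empty or oversized record")
                  room = block_size - len(out) % block_size
                  if need > room:
                      out += b"\x00" * room  # pad out to the block boundary
                  out += struct.pack(">I", len(rec)) + rec
              return bytes(out)


          def read_block(data, block_index, block_size):
              """Read all records in one block without looking at any other block."""
              start = block_index * block_size
              end = min(start + block_size, len(data))
              pos, records = start, []
              while pos + HEADER <= end:
                  (n,) = struct.unpack_from(">I", data, pos)
                  if n == 0:  # zero length means the rest of the block is padding
                      break
                  records.append(data[pos + HEADER:pos + HEADER + n])
                  pos += HEADER + n
              return records


          data = pack_into_blocks([b"aaaaaa", b"bbbb", b"cc"], block_size=16)
          block0 = read_block(data, 0, 16)
          block1 = read_block(data, 1, 16)
          ```

          Because every block begins with a record header, `read_block` can decode block 1 even if block 0 is missing, and the file system itself never needs to know about records.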


            People

            • Assignee: Sameer Paranjpye
            • Reporter: Andrzej Bialecki
            • Votes: 2
            • Watchers: 1
