Hadoop Common / HADOOP-106

Data blocks should be record-oriented.

    Details

    • Type: Wish
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.2.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      If data blocks started and ended on data record boundaries, rather than at arbitrary offsets within a file, it would give some important advantages:

      • it would be possible to avoid "fishing" for the beginning of the first record in a split (see SequenceFile.Reader.sync()).
      • it would make recovering from DFS errors easier and much more likely to succeed: in most cases missing blocks could simply be skipped and the remaining parts combined.
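
      To illustrate what "fishing" means here, the following is a minimal sketch in Python of a reader that is handed an arbitrary split offset and must scan forward for a sync marker before it can trust a record boundary. The file layout (a 16-byte marker followed by length-prefixed records), the marker bytes, and the names `SYNC`, `write_records`, and `find_first_sync` are all assumptions made for illustration; this is not the actual SequenceFile format or the SequenceFile.Reader.sync() implementation.

      ```python
      # Hypothetical 16-byte sync marker; a real file would use its own value.
      SYNC = b"\xde\xad\xbe\xef" * 4


      def write_records(records, sync=SYNC, sync_every=2):
          """Serialize records as 4-byte length + payload, with a sync
          marker written before every `sync_every`-th record (toy layout)."""
          out = bytearray()
          for i, rec in enumerate(records):
              if i % sync_every == 0:
                  out += sync
              out += len(rec).to_bytes(4, "big") + rec
          return bytes(out)


      def find_first_sync(data, split_start, sync=SYNC):
          """Scan forward from an arbitrary split offset and return the
          offset of the next sync marker, or -1 if none follows."""
          return data.find(sync, split_start)


      data = write_records([b"aaa", b"bb", b"cccc", b"d"])
      # A split starting mid-record has to search for the next marker.
      offset = find_first_sync(data, 10)
      ```

      If blocks always began on a record boundary, as the description proposes, a reader assigned one block per split could start parsing at the first byte of the block and skip this scan.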

        Issue Links

          Activity

          Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Open -> Resolved | 576d 17h 52m | 1 | Owen O'Malley | 23/Oct/07 23:13
          Resolved -> Closed | 23h 28m | 1 | Doug Cutting | 24/Oct/07 22:42
          Harsh J made changes -
          Link This issue relates to HADOOP-7404 [ HADOOP-7404 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Owen O'Malley added a comment -

          I agree with Eric. This is backwards. You want the sequence file to pad out to block boundaries.

          Doug Cutting made changes -
          Assignee Sameer Paranjpye [ sameerp ]
          Doug Cutting made changes -
          Workflow no-reopen-closed [ 12373238 ] no-reopen-closed, patch-avail [ 12377439 ]
          Doug Cutting made changes -
          Workflow no reopen closed [ 12372906 ] no-reopen-closed [ 12373238 ]
          Doug Cutting made changes -
          Field Original Value New Value
          Workflow jira [ 12352458 ] no reopen closed [ 12372906 ]
          eric baldeschwieler added a comment -

          My intuition is it makes more sense to do this the other way around and have records aligned to blocks. This keeps the FS implementation trivial. Just pad near the end of a block. This way you keep a good separation of APIs too. Fairly straightforward to change the record model to do that. Only issues are with huge records. You have a couple of options there. The simplest is to disallow them...
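
          The padding approach described in this comment can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the record framing (4-byte length prefix), zero-byte padding, the rejection of empty records (so that a zero length can unambiguously mean padding), and the names `pack_records` and `read_block` are all hypothetical and are not taken from Hadoop.

          ```python
          def pack_records(records, block_size):
              """Frame each record as 4-byte length + payload and pad with
              zero bytes so that no record straddles a block boundary."""
              out = bytearray()
              for rec in records:
                  if not rec:
                      raise ValueError("empty records are not supported in this sketch")
                  framed = len(rec).to_bytes(4, "big") + rec
                  if len(framed) > block_size:
                      # Simplest option from the comment: disallow huge records.
                      raise ValueError("record larger than a block")
                  remaining = block_size - len(out) % block_size
                  if len(framed) > remaining:
                      out += b"\x00" * remaining  # pad to the next block start
                  out += framed
              return bytes(out)


          def read_block(data, block_index, block_size):
              """Read the records of one block without looking at any other block."""
              pos = block_index * block_size
              end = min(pos + block_size, len(data))
              records = []
              while end - pos >= 4:
                  n = int.from_bytes(data[pos:pos + 4], "big")
                  if n == 0:
                      break  # zero length marks padding up to the block end
                  records.append(data[pos + 4:pos + 4 + n])
                  pos += 4 + n
              return records


          packed = pack_records([b"aaaaa", b"bbbbb", b"cc", b"dddddd"], 16)
          ```

          Because every block starts with a complete record, `read_block` can decode any block independently, and the file system itself needs no knowledge of records; the cost is the space lost to padding.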

          Andrzej Bialecki created issue -

            People

            • Assignee:
              Sameer Paranjpye
              Reporter:
              Andrzej Bialecki
            • Votes:
              2
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:
