Hadoop Common > HADOOP-1838

Files created with a pre-0.15 release get a blocksize of zero, causing performance degradation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels: None

      Description

      HADOOP-1656 introduced support for storing the block size persistently as inode metadata. Previously, if a file had only one block, it was not possible to accurately determine the blocksize that the application had requested at file-creation time.

      The upgrade of an older layout to the new layout kept the blocksize as zero for single-block files, to indicate that DFS really does not know the "true" blocksize of such a file. This caused map-reduce to determine that a split is 1 byte in length!
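
      The 1-byte split can be seen from the classic split-size formula used by Hadoop's FileInputFormat, splitSize = max(minSize, min(goalSize, blockSize)). The sketch below is illustrative only (simplified names, not the actual Hadoop code) and shows how a stored blocksize of 0 collapses the split to the 1-byte minimum:

      ```java
      public class SplitSizeSketch {
          // Simplified form of FileInputFormat's split-size computation:
          // splitSize = max(minSize, min(goalSize, blockSize)).
          static long computeSplitSize(long goalSize, long minSize, long blockSize) {
              return Math.max(minSize, Math.min(goalSize, blockSize));
          }

          public static void main(String[] args) {
              long goalSize = 64L * 1024 * 1024; // desired bytes per split
              long minSize = 1;                  // historical default minimum

              // A single-block file upgraded from a pre-0.15 layout reports
              // blockSize == 0, so min(goalSize, 0) == 0 and the split
              // degenerates to the 1-byte minimum.
              System.out.println(computeSplitSize(goalSize, minSize, 0)); // 1

              // With a real block size, the split is sensible again.
              System.out.println(computeSplitSize(goalSize, minSize, 64L * 1024 * 1024));
          }
      }
      ```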

      1. blockSizeZero.patch
        5 kB
        dhruba borthakur

          Activity

          dhruba borthakur added a comment -

          I just committed this.
          Hadoop QA added a comment -

          +1 http://issues.apache.org/jira/secure/attachment/12365210/blockSizeZero.patch applied and successfully tested against trunk revision r573383.
          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/698/testReport/
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/698/console
          Owen O'Malley added a comment -

          +1
          dhruba borthakur added a comment -

          Use the default block size as the block size for a file that has only one block.
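
          A minimal sketch of the upgrade-time fix described above (illustrative names and signature, not the committed patch): when the layout upgrade encounters a single-block file whose stored block size is the special value 0, it writes out the configured default block size instead.

          ```java
          // Illustrative sketch only, not the actual HDFS upgrade code.
          class UpgradeBlockSizeSketch {
              // During layout upgrade, replace the 0 "unknown" marker for
              // single-block files with the configured default block size.
              static long upgradedBlockSize(long storedBlockSize, int numBlocks,
                                            long defaultBlockSize) {
                  if (storedBlockSize == 0 && numBlocks <= 1) {
                      return defaultBlockSize; // drop the special value at upgrade time
                  }
                  return storedBlockSize;      // multi-block files already know their size
              }
          }
          ```

          Fixing the value once at upgrade time, rather than interpreting 0 on every read, is exactly the trade-off Owen argues for below.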
          Owen O'Malley added a comment -

          I'd much rather have the upgrade set the blocksize to the default block size in the case of single-block files, rather than leave 0 as a special value. The problem with special values is that they need to be tested for in every single use of the field, and are thus much harder to maintain.
          Owen O'Malley added a comment -

          I'm upgrading this to blocker since it makes it almost impossible to run map/reduce jobs.
          dhruba borthakur added a comment -

          If the blocksize is zero then return the size of the first block as the "blocksize".
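
          The comment above describes the earlier, read-time alternative: keep 0 in the metadata and substitute the length of the file's first block whenever the blocksize is reported. A hypothetical sketch (names and signature invented for illustration):

          ```java
          // Hypothetical sketch of the read-time fallback proposal, not real HDFS code.
          class FirstBlockFallbackSketch {
              // Report the stored block size, falling back to the first block's
              // length for single-block files upgraded with blockSize == 0.
              static long effectiveBlockSize(long storedBlockSize, long[] blockLengths,
                                             long defaultBlockSize) {
                  if (storedBlockSize != 0) {
                      return storedBlockSize;
                  }
                  // 0 only occurs for single-block files from a pre-0.15 layout.
                  return blockLengths.length > 0 ? blockLengths[0] : defaultBlockSize;
              }
          }
          ```

          This approach keeps 0 as a special value that every reader must handle, which is why the thread instead converged on fixing the metadata during upgrade.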

            People

            • Assignee:
              dhruba borthakur
            • Reporter:
              dhruba borthakur
            • Votes: 0
            • Watchers: 1
