  1. Hadoop HDFS
  2. HDFS-16272

Int overflow in computing safe length during EC block recovery

Details

    Description

      StripedBlockUtil#getSafeLength has an int overflow that can produce a negative or zero length:
      1. A negative length fails the later >= 0 check and crashes the BlockRecoveryWorker thread, so the lease recovery operation cannot finish.
      2. A zero length passes the check and truncates the block to size zero, which leads to data loss.

      If you are using any of the default EC policies (3-2, 6-3 or 10-4) and the default HDFS block size of 128MB, then you will not be impacted by this issue.

      To be impacted, the EC dataNumber * blockSize has to be larger than the Java max int of 2,147,483,647.

      For example, 10-4 with 128MB blocks is 10 * 134,217,728 = 1,342,177,280, which is OK.

      However, 10-4 with 256MB blocks is 10 * 268,435,456 = 2,684,354,560, which overflows an int and causes the problem.
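
      The arithmetic above can be reproduced with a minimal Java sketch. This is not the code from StripedBlockUtil; the class and method names are hypothetical and only illustrate how a product of two int values wraps around, while widening one operand to long keeps the correct value. The 16-unit case is not a default EC policy and is included only to show how a zero result can arise.

```java
public class SafeLengthOverflowDemo {
    // Hypothetical helper: multiplies in 32-bit int arithmetic,
    // so the result silently wraps around on overflow.
    static int productAsInt(int dataUnits, int blockSize) {
        return dataUnits * blockSize;
    }

    // Hypothetical helper: widens one operand to long first,
    // so the multiplication is carried out in 64 bits.
    static long productAsLong(int dataUnits, int blockSize) {
        return (long) dataUnits * blockSize;
    }

    public static void main(String[] args) {
        int mb = 1024 * 1024;

        // 10 data units, 128MB blocks: fits in an int.
        System.out.println(productAsInt(10, 128 * mb));   // 1342177280

        // 10 data units, 256MB blocks: wraps to a negative value.
        System.out.println(productAsInt(10, 256 * mb));   // -1610612736

        // Same inputs computed as long: the intended value.
        System.out.println(productAsLong(10, 256 * mb));  // 2684354560

        // Assumed non-default case: 16 * 2^28 = 2^32 wraps to exactly zero.
        System.out.println(productAsInt(16, 256 * mb));   // 0
    }
}
```

      Where wrapping must never go unnoticed, Math.multiplyExact throws an ArithmeticException on overflow instead of returning a wrapped value.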

          Activity

            sodonnell (Stephen O'Donnell) added a comment: Committed down the active branches. Thanks for the contribution, cndaimin.

            People

              cndaimin daimin
              cndaimin daimin
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h