Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.24.0
    • Component/s: namenode
    • Labels:
      None

      Description

      This is a proposal to replace random generation of block ids with a sequential generator in order to avoid block id reuse in the future.

      1. blockid.tex
        3 kB
        Tsz Wo Nicholas Sze
      2. blockid20100122.pdf
        87 kB
        Tsz Wo Nicholas Sze
      3. FreeBlockIds.pdf
        6 kB
        Konstantin Shvachko
      4. DuplicateBlockIds.patch
        8 kB
        Konstantin Shvachko
      5. HighBitProjection.pdf
        35 kB
        Konstantin Shvachko

        Issue Links

          Activity

          Konstantin Shvachko added a comment -

          Problem

          Currently HDFS generates ids for new blocks by randomly selecting a 64-bit number and verifying that the id is not yet in the system. If the id is already assigned to a block, the procedure repeats.
          This was discussed in several issues before: HADOOP-2656, HADOOP-146, HADOOP-158.
          The problem with this approach is that data-nodes that were offline for a long time may, upon getting back online, bring in blocks that had already been released by the system, whose ids were then regenerated and reused for different blocks in different files. These are the so-called prehistoric blocks, see here. Bringing up a data-node with a prehistoric replica can potentially corrupt the block.

          Motivation

          1. The prehistoric block problem is still relevant, since the name-node's block-map keys blocks by their ids, see HDFS-512. Although now there is less chance to corrupt a block with a stale replica, because stale replicas will have older generation stamps.
          2. Non-reusable block ids may let us convert generation stamps to 32-bit numbers. Currently the generation stamp is a 64-bit global file system variable. If we are sure block ids are not reusable, then we can implement per-file generation stamps, and 32 bits will be enough for that.
          3. A forward-looking reason for switching to sequential ids is that if we ever introduce multiple or distributed name-nodes, it will not be feasible to verify the existence of a particular block id across these name-nodes.

          Solution

          This simple solution was born some time ago in a discussion with Nicholas and Rob.
          Suppose that you have a cluster with 64 million blocks (N = 2^26). Block ids are 64-bit numbers, so there are 2^64 of them. Order the existing block ids b_1, ..., b_N, where b_i < b_{i+1}. Each i then defines a contiguous segment (b_i, b_{i+1}) of numbers which contains no other block ids. It is easy to see that at least one segment will be of size at least 2^38 = 2^64 / 2^26.
          So the proposal is to find such a segment and use it for generating block ids, starting from b_i + 1 and incrementing the previously assigned id until it reaches b_{i+1}.
          Currently we increment the generation stamp every time a file is created or a block write fails. Suppose we do 200 new-generation-stamps per second. This is rather optimistic: I think in practice the number is lower. At that pace we can keep generating new stamps for 43 years. Then we will find another large gap in what remains of those 43-year-old blocks and start using this new segment. I don't think anybody in this community will care what may happen after the third segment is consumed, but HDFS will keep looking for new segments, and the expectation is that very few blocks will outlive their creators.
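          The gap-finding idea can be sketched in a few lines. This is a hypothetical illustration, not the HDFS implementation; the function names are made up:

```python
# Illustration of the proposal (not the actual HDFS code): find the largest
# id-free segment among the existing block ids, then estimate how long
# sequential allocation inside it lasts at a given rate.

def largest_free_segment(block_ids):
    """Return (lo, hi): the widest gap between consecutive existing ids."""
    ids = sorted(block_ids)
    best = (ids[0], ids[1])
    for lo, hi in zip(ids, ids[1:]):
        if hi - lo > best[1] - best[0]:
            best = (lo, hi)
    return best

def lifetime_years(segment_size, ids_per_second=200):
    """Years the segment lasts at the assumed allocation rate."""
    return segment_size / ids_per_second / (3600 * 24 * 365)

# The guaranteed minimum segment of 2^38 ids lasts about 43 years at
# 200 allocations per second, matching the estimate above:
print(int(lifetime_years(2 ** 38)))  # 43
```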

          As the next step I'll look at some real HDFS images and verify that practice confirms the theory.

          dhruba borthakur added a comment -

          >The prehistoric block problem is still relevant, since the name-node's block-map keys blocks by their ids,
          > see HDFS-512. Although now there is less chance to corrupt a block with a stale replica,
          > because stale replicas will have older generation stamps.

          Can you please explain why this is so? I thought that the list of blocks that hang off an Inode still has the generation stamp in it. If a datanode sends a block report with a pre-historic blockId, it can be detected at the NN because the genstamp of that block will not match the generation stamp of the blocks that hang off the inode.

          Konstantin Shvachko added a comment -

          You are absolutely right: if everything goes well, generation stamps help recognize prehistoric replicas of newly created blocks.
          When I say the problem is relevant I mean to emphasize two cases:

          1. In the case of corrupt blocks, when all generation stamps of a block's replicas are less than the block's gen. stamp on the NN, we have no means of recognizing whether a replica is just a stale replica of this particular block or of some other (prehistoric) block. This happens when you don't have up-to-date replicas of the block and a data-node starts up with a prehistoric replica of a previously removed block with the same id.
          2. The really old blocks, created before the introduction of generation stamps, can still cause the problem.

          I agree this is not exactly the problem as you defined it, but they are still related problems.

          dhruba borthakur added a comment -

          > 1. In case of corrupt blocks, when all generation stamps of its replicas are less than the blocks' gen stamp

          I think the Namenode can handle stale replicas and prehistoric blocks the same way. It will assume that the block does not belong to any file in the namespace and can safely be deleted?

          > 2. The really old blocks, created before introduction of generation stamps, can still cause the problem
          Those datanodes will go through the upgrade process that will convert all those old blocks to have a generation stamp of 0, isn't it?

          Konstantin Shvachko added a comment -

          (1) I have a block (id = 1001, gs = 7777), which belongs to an existing file, replication factor = 3. I have two replicas of this block, (id = 1001, gs = 7775) and (id = 1001, gs = 7770). This makes the block corrupt, and we don't remove replicas of a corrupt block. Then a third data-node starts, which was down for a long time; it has replica (id = 1001, gs = 7), which at some point belonged to a different file that has since been removed from the system. So I have three replicas, and the name-node does not have the info to determine which replica is stale and which one belonged to the removed file.
          Does that make sense?

          (2) Yes, all old blocks have gs = 0. So if I have an old block with three valid replicas, and a DN with a prehistoric replica comes up, then you will not be able to distinguish between them. I mean that the introduction of generation stamps does not solve the prehistoric block problem for blocks that existed before blocks had generation stamps.

          dhruba borthakur added a comment -

          Thanks Konstantin for the explanation. I understand the scenario much better now.

          My view is that we do not need to solve either of the problems (1) or (2) that you have listed below. Is it not? Is there a use case for (1)?

          Konstantin Shvachko added a comment -

          My main motivation is (3) - multiple name-nodes should have disjoint segments of ids. It would be good to do (2) - reduce the size of generation stamp. Many people will be happy to know some weird scenarios (1.1) like I described do not apply to HDFS. We cannot do anything about pre-generation-stamp blocks, I am just saying it exists.
          In a sense current generation stamps play the role of sequential ids, wouldn't it be good to finally turn it right?

          Konstantin Shvachko added a comment -

          I ran some experiments on some large and small images as a proof of concept. Here is the table.

          • The first line is the number of blocks in the file system. The largest image I had has 40 million blocks.
          • The second line is the largest hole free of block ids.
          • The third line is the minimum segment size we expect to find, calculated as the ratio 2^64 / num_blocks.

          Number of blocks     |        40,509,569 |         31,959,139 |             241,777 |               178,278 |               148,035
          Largest segment size | 8,623,203,281,141 | 10,662,709,581,709 | 889,137,135,725,504 | 1,324,814,576,358,595 | 1,849,602,429,191,491
          Expected minimum     |   455,367,560,644 |    577,197,761,694 |  76,296,205,914,968 |   103,471,211,268,346 |   124,609,852,155,620

          We see that the selected segments are larger than the expected minimums and larger than 2^38 = 274,877,906,944.
          This speaks to the quality of the random generator, but it also projects a life span longer than 43 years for the first segment we choose.
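          The expected-minimum row is just the pigeonhole bound 2^64 / num_blocks, and can be reproduced directly (a quick sanity check, not part of the tool):

```python
# Pigeonhole bound from the table above: among N sorted 64-bit ids,
# the largest gap is at least 2^64 / N.

def expected_minimum(num_blocks):
    return 2 ** 64 // num_blocks

for n in (40_509_569, 31_959_139, 241_777, 178_278, 148_035):
    print(f"{n:>12,}  {expected_minimum(n):>22,}")
```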

          Konstantin Shvachko added a comment -

          Approach #2

          Here is another idea initiated by Rob Chansler, which will potentially let us free all positive longs for future block id generation.
          Let's set 8 or fewer high bits of existing block ids to 1. There is a high probability that there will be no id collisions in the resulting set.
          More formally: we have a collection of blocks {b_i}, and we build a new collection {c_i = Mask | b_i}, where Mask = ff00 0000 0000 0000. Collection {b_i} does not have repetitions, and there is a very good chance that collection {c_i} does not have them either.
          The intuition here is that if you have randomly distributed points in n-dimensional space and you project them into an (n-1)-dimensional subspace, say along one of the axes, the probability that two points will be projected onto the same image is low.

          I am attaching a document which derives a formula for the probability. The formula was independently confirmed by Nicholas. The bottom line is this:

          • The probability of getting a collision when setting the first 8 bits to 1 is < 0.03065
          • The probability of getting a collision when setting only one high bit to 1 is < 0.0001221

          From a practical point of view it is enough to set just the highest bit. Then we'll free the entire range of positive longs for new block id generation.
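          The projection can be sketched as follows. This is a hypothetical illustration, not the analysis tool itself; `project` is a made-up helper, and the mask is the 8-bit one described above:

```python
# Sketch of the high-bit projection: OR a mask of high bits into every
# existing id, and accept the projection only if no two ids collide.

MASK = 0xFF00000000000000  # the 8 high bits from the comment

def project(block_ids, mask=MASK):
    """Return the projected id set, or None if the projection collides."""
    projected = {b | mask for b in block_ids}
    return projected if len(projected) == len(set(block_ids)) else None

# Two ids that differ only in their high bits collide under the mask:
assert project([0x0100000000000001, 0x0200000000000001]) is None
# Ids differing in low bits survive the projection:
assert project([1, 2, 3]) == {MASK | 1, MASK | 2, MASK | 3}
```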

          I implemented a block id analyzing routine, which combines the two approaches described in this issue. The routine reads the image using OIV and first tries to reset bits, starting from the high 8 and going down to the single highest bit whenever resetting more bits leads to collisions. If the routine fails to reset any bits collision-free, it falls back to the initial approach, that is, it just finds the biggest free gap in the existing block ids.

          If the bit reset approach succeeds, then block replicas need to be renamed on the data-nodes. This will be done via an upgrade. During the upgrade data-nodes hardlink replicas into the new directory (handled automatically by the upgrade framework), and then rename the newly created links to reflect the new block ids (the part specific to this upgrade).
          In order to avoid any mishaps I also propose to assign a new generation stamp to all renamed blocks, so that data-nodes that missed the upgrade will not report old blocks when they wake up. The latter is impossible, because data-nodes cannot join the cluster until they upgrade, but we don't want to take any chances.

          I tested the routine on 6 images so far, containing from 150,000 to 40,000,000 blocks. All of them successfully tolerated the reset of 8 bits without ANY collisions.

          I would like to ask the community for some help. If people could run the tool on their images and post some stats or just send the results to me I'll take care of summarizing them.

          Konstantin Shvachko added a comment -

          This is the analyzing tool. In order to run it you need to apply the patch, compile it, and then run oiv:

          bin/hdfs oiv -p BlockIdVisitor -i fsimage -o output.txt
          
          Todd Lipcon added a comment -

          I get slightly different figures than you guys... I am looking at this as identical to the well-known Birthday Problem: http://en.wikipedia.org/wiki/Birthday_problem

          In our case, we have d = 2^(64-b) "days" and n = 2^26 "people". Following the formula on Wikipedia:

          In [20]: import operator
          In [21]: n = 2^26
          In [22]: d = 2^(64-8)
          In [23]: reduce(operator.mul, [(1 - float(i)/d)  for i in xrange(0, n)])
          Out[23]: 0.0037908372356959502
          

          whereas you've calculated 0.03065 for this case.

          The python above agrees with Wikipedia for the birthday example, so I think the code is correct:

          In [25]: d = 365
          In [26]: n = 23
          In [27]: reduce(operator.mul, [(1 - float(i)/d)  for i in xrange(0, n)])
          Out[27]: 0.49270276567601451
          

          Wary of floating point math, I also checked using int math to calculate numerator and denominator, then int division to make them smaller, then float division to get a fraction:

          In [70]: num,denom = (reduce(operator.mul, [d - i for i in xrange(0, n)])), (d**(n))
          In [71]: float(num/100000000000000000000)/float(denom/100000000000000000000)
          Out[71]: 0.0037908372356959502
          

          So where are our numbers diverging?

          Todd Lipcon added a comment -

          Wait, I just realized my mistake... I had used ^ instead of **. Please ignore!

          Tsz Wo Nicholas Sze added a comment -

          This is not a birthday problem. There is no limitation on the number of people having the same birthday but there is a limitation on block ids.

          For 365 days and 2^26 people, all 2^26 people could have the same birthday.

          However, for 2^(64-b) id space and B=2^26 blocks, at most two blocks could share the same block id after removing 1 bit.

          Todd Lipcon added a comment -

          You're completely right, now that I think more about it. However, it's a good sanity check since we know the probability of duplicates to be relatively small. Using the correct numbers, the birthday problem approximation is 0.0307 which lines up with yours very closely. Thanks, and apologies for the junk on the jira.
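          For the record, the standard birthday approximation with the correct exponentiation reproduces that figure (a quick check using the usual 1 - exp(-n(n-1)/2d) estimate, which slightly overstates the probability for this problem since here at most two ids can share a slot):

```python
# Sanity check of the corrected numbers: with d = 2^(64-8) bins and
# n = 2^26 ids, the birthday approximation gives about 0.031,
# consistent with the ~0.0307 figure quoted above.
import math

n = 2 ** 26
d = 2 ** (64 - 8)
p_collision = 1 - math.exp(-n * (n - 1) / (2 * d))
print(p_collision)
```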

          Konstantin Shvachko added a comment -

          Yes, the birthday problem is similar, but not the same. In the BDP you project yyyy/mm/dd to yyyy.
          But in the BDP years are arbitrary, while in our case the yyyy-s are limited to a certain number, as Nicholas mentioned.
          Also birthdays do not uniquely identify a person, while in our case all blocks have unique ids.

          Konstantin Shvachko added a comment -

          Here is what I got running the block id freeing tool on some cluster images.
          All 10 images tolerated the reset of 8 bits not causing any collisions among projected ids.

          Dmytro Molkov added a comment -

          I just ran the tool on an FB cluster. Here is the output:

          Bit map applied: ff00000000000000
          Number of collisions = 0
          =====================
          Number of blocks = 57909756
          Number of negtive ids = 57909756
          Number of positive ids = 0
          Largest segment = (-277768208, 9223372036854775807)
          Segment size = 9.223372037132544E18
          Expected max = 318542942464

          Konstantin Shvachko added a comment -

          Great! Yet another cluster would have had its block ids converted collision-free using the 8-bit projection. Thanks Dmytro.

          dhruba borthakur added a comment -

          I think we should do sequential ids only for newly formatted clusters and not require that existing clusters upgrade their disk formats.

          Tsz Wo Nicholas Sze added a comment -

          Let n be the number of bits used for storing a block ID. Let m be the number of existing blocks such that their IDs are distinct. Now remove the first b bits from the block IDs. What is the probability, p(n; m; b), that all the block IDs remain distinct?

          See blockid20100122.pdf.


  People

    • Assignee: Unassigned
    • Reporter: Konstantin Shvachko
    • Votes: 0
    • Watchers: 12