ACCUMULO-683

Accumulo ignores HDFS max replication configuration

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 1.4.2, 1.5.0
    • Component/s: tserver
    • Labels:
      None

      Description

      I set up a new 1.4.1 instance running on top of a Hadoop installation that had the maximum block replication set to 3, and the following error showed up on the monitor page.

      java.io.IOException: failed to create file /accumulo/tables/!0/table_info/F0000001.rf_tmp on client 127.0.0.1.
      Requested replication 5 exceeds maximum 3
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1220)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1123)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:551)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)

      Tablet server error is:

      10 10:56:25,408 [tabletserver.MinorCompactor] WARN : MinC failed (java.io.IOException: failed to create file /accumulo/tables/!0/table_info/F0000001.rf_tmp on client 127.0.0.1.
      Requested replication 5 exceeds maximum 3
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1220)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1123)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:551)
      at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
      ) to create /accumulo/tables/!0/table_info/F0000001.rf_tmp retrying ...

        Activity

        Jim Klucar added a comment -

        Here's the hdfs-site.xml property entry

        <property>
        <name>dfs.replication.max</name>
        <value>3</value>
        <description>Maximal block replication.
        </description>
        </property>

        Jim Klucar added a comment -

        I didn't have table.file.replication set, and the documentation says a value of 0 will use HDFS defaults. I will explicitly set this property to zero and try again.

        However, it looks like the offending line could be here:

        ./server/src/main/java/org/apache/accumulo/server/util/Initialize.java: initialMetadataConf.put(Property.TABLE_FILE_REPLICATION.getKey(), "5");
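
        For illustration only, here is a minimal sketch (assuming auto-clamping is even wanted – later comments debate this) of how init code could cap that hard-coded "5" at the cluster's dfs.replication.max. The class and method names are made up; only Property.TABLE_FILE_REPLICATION and the HDFS key come from this ticket.

        import org.apache.hadoop.conf.Configuration;

        // Hypothetical helper, not the actual Initialize.java code: pick a !METADATA
        // replication that does not exceed the cluster's dfs.replication.max.
        public class MetadataReplication {
          public static String pick(Configuration hadoopConf) {
            int desired = 5;                                          // Accumulo's default for !METADATA
            int max = hadoopConf.getInt("dfs.replication.max", 512);  // HDFS ships with a max of 512
            return Integer.toString(Math.min(desired, max));
          }
        }

        // Usage (sketch): initialMetadataConf.put(Property.TABLE_FILE_REPLICATION.getKey(),
        //                                         MetadataReplication.pick(conf));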

        Billie Rinaldi added a comment - edited

        That looks like something we should figure out how to handle better. The !METADATA table has a larger number of replicas (5 by default) because it is really important not to lose those files. To make 1.4.1 run with a max replication of 3, you could manually change the parameter table.file.replication for the !METADATA table to equal the max replication of HDFS. However, I would strongly urge against making this lower than 5.

        We should decide if we want Accumulo to automatically set the !METADATA replication to the HDFS max if it is less than five, or if we want it to throw an informative error saying that the user should lower the !METADATA replication at a greater risk of losing critical data.

        Adam Fuchs added a comment -

        A workaround should be to set table.file.replication to 3 for the !METADATA table. This can be done in the shell via
        $ config -t !METADATA -s table.file.replication=3

        There's another debate about whether we should fix this automatically. This is a prickly issue – on the one hand we could just ignore table.file.replication if it is set to more than the Hadoop dfs.replication.max, and on the other hand we could enforce table.file.replication and make an administrator resolve the conflict. I would argue the latter is better because of the following:

        1. Automatically defaulting to the lower replication setting would constitute a subtle decrease in durability, which may not be obvious to the administrator when they are modifying a dependent system. We should make the administrator fully aware of the consequences by forcing them to resolve the conflict.
        2. There is already an administrator involved in creating the conflict (by setting the HDFS parameter), so adding another human-in-the-loop step here should not be overly costly.
        3. Accumulo shouldn't have to keep track of all the HDFS configuration parameters that might affect it. These parameters change relatively frequently, so we could introduce future incompatibilities by trying to avoid current incompatibilities.

        That said, I think we should catch this exception and suggest the fix in the error message that we log.
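
        As a rough sketch of that catch-and-suggest idea (the class name, logger wiring, and message wording are assumptions; only the shell workaround above comes from this thread):

        import java.io.IOException;
        import org.apache.log4j.Logger;

        // Hypothetical: detect the HDFS replication rejection and point the operator
        // at the workaround instead of just retrying silently.
        public class ReplicationErrorHint {
          private static final Logger log = Logger.getLogger(ReplicationErrorHint.class);

          public static void warnIfReplicationConflict(IOException e) {
            if (e.getMessage() != null && e.getMessage().contains("exceeds maximum")) {
              log.warn("Requested file replication exceeds dfs.replication.max; in the Accumulo shell run"
                  + " 'config -t !METADATA -s table.file.replication=<max>' or raise dfs.replication.max"
                  + " in hdfs-site.xml", e);
            }
          }
        }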

        jv added a comment -

        Perhaps there's a middle ground where we can handle this. I think when we init we should do a check and alert the user if they have an issue. We can prompt them to set the replication there, so if they want to keep the 5 and mess with their HDFS configuration they can. Or they can set their !METADATA replication to 3 and be done with it.

        jv added a comment -

        Actually, this points out a bug: our initial root tablet doesn't have its replication set to 5; it uses the system default.
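
        For context, the stock Hadoop FileSystem API does allow passing an explicit replication factor when a file is created; a hedged sketch follows (the class name and the buffer/block-size handling are placeholders, not Accumulo's actual tablet code):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        // Illustrative only: create the initial file with an explicit replication factor
        // instead of inheriting the HDFS default (dfs.replication).
        public class ExplicitReplicationWrite {
          public static void create(Configuration conf, Path file, short replication) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);
            long blockSize = fs.getDefaultBlockSize();
            FSDataOutputStream out = fs.create(file, true, bufferSize, replication, blockSize);
            try {
              // ... write the file contents here ...
            } finally {
              out.close();
            }
          }
        }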

        Keith Turner added a comment -

        One concern I have about issues like this is for indirect users of Accumulo. An indirect user of Accumulo is someone who is using something like OpenTSDB[1] or Jaccson[2]. These users want to know as little about Accumulo as possible; they just want OpenTSDB or Jaccson to work. For these users, the assumption that setting a config option on the metadata table is a trivial operation does not hold. It could take them a while to figure out what they need to do to fix this issue.

        This issue is somewhat similar to the Java policy file issue that could keep Accumulo from running and make users spend a lot of time figuring it out. The issue differs in that it's much less likely and increases the probability of losing critical data. It changes probabilities, but it does not make data loss certain.

        Also I think if a user sets max replication, they mean it. I think Accumulo should just work (automatically take the min setting) and log a warning.

        Basically I am in favor of Accumulo just working in as many situations as possible.

        Also, setting the metadata table replication to 5 instead of 3 was something I did without much thought when adding the per-table replication setting. The basic premise was that 5 is better than 3. I certainly did not analyze the probabilities of how it affects your odds under different datanode-loss situations. For example, instead of being 5, should it be some function of the cluster size? I don't know.

        We will probably also run into a similar issue if someone sets the HDFS min replication > 5. We should handle that issue in this ticket also.

        [1] : https://github.com/ericnewton/opentsdb
        [2] : https://github.com/acordova/jaccson

        John, I like your suggestion about checking when the user runs init and prompting. The prompt should suggest a default that will work.

        jv added a comment -

        Unless there are any objections, I'm going to implement this as an initialization-time prompt that will check both the max and the min and ask what setting the user wants iff the default of 5 is not within bounds.
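
        A minimal sketch of what such an init-time bounds check might look like (the class name, prompt wording, and the min-replication key, whose name varies across Hadoop versions, are assumptions):

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import org.apache.hadoop.conf.Configuration;

        // Hypothetical: only prompt when the default of 5 falls outside the HDFS
        // min/max replication bounds, per the comment above.
        public class InitReplicationPrompt {
          public static String chooseMetadataReplication(Configuration conf) throws Exception {
            int def = 5;
            int max = conf.getInt("dfs.replication.max", 512);
            int min = conf.getInt("dfs.replication.min", 1);   // key name differs in newer Hadoop
            if (def >= min && def <= max)
              return Integer.toString(def);                    // default is in bounds, no prompt needed
            System.out.printf("!METADATA replication %d is outside HDFS bounds [%d, %d]; enter a value: ", def, min, max);
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line = in.readLine();
            // Fall back to a clamped default if the user just hits enter.
            return (line == null || line.trim().isEmpty()) ? Integer.toString(Math.min(max, Math.max(min, def))) : line.trim();
          }
        }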

        Charles Ott added a comment -

        I am still seeing the replication factor for the !METADATA table as having an override value of 5 on a fresh installation of 1.4.4.

        It took a while to find this thread and resolve the issue. This should either be in the install guide or should be something I am prompted to choose during the init process.

        John Vines added a comment -

        Did you have the max replication set below 5?

        Charles Ott added a comment -

        Yes, using Cloud Manager 4.7, the hdfs-site.xml defaults to a replication factor of 3 for a cdh3u6 installation.

        John Vines added a comment -

        Not default replication, max replication.

        Charles Ott added a comment -

        Ah, "max" replication is not specified. In HDFS the property looks like this:

        <property>
        <name>dfs.replication</name>
        <value>3</value>
        </property>

        Looks like the Hadoop default max replication is 512 for cdh3u6.
        So I guess the replication factor of 5 will almost always be used by Accumulo when deployed on cdh3u6.

        Sorry about the confusion.

        John Vines added a comment -

        Correct, you can override this by setting the table replication for the !METADATA table after you init. However, it is not advised because you really want that extra reliability for the !METADATA table.


          People

          • Assignee: Unassigned
          • Reporter: Jim Klucar
          • Votes: 0
          • Watchers: 7
