Accumulo / ACCUMULO-1946

Include dfs.datanode.synconclose in HDFS configuration documentation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.5, 1.5.1, 1.6.0
    • Component/s: docs
    • Labels: None

      Description

      We should be including some writeup about dfs.datanode.synconclose in our documentation surrounding the HDFS configuration, as it helps ensure that data is not lost in the face of a hard shutdown (power loss) of the datanode process.
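
      A minimal sketch of what enabling it looks like in hdfs-site.xml on the datanodes (per the discussion below, the property does not exist before Apache Hadoop 1.1.1):

        <!-- hdfs-site.xml: fsync block files to disk when a block is closed -->
        <property>
          <name>dfs.datanode.synconclose</name>
          <value>true</value>
        </property>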

          Activity

          elserj Josh Elser added a comment -

           Pulled the changes back to 1.4 and noted that this property is not available before Apache Hadoop 1.1.1.

          jira-bot ASF subversion and git services added a comment -

          Commit 47403ba4f5d552cabb94bc710e050dc114bcc922 in branch refs/heads/master from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=47403ba ]

          ACCUMULO-1946 Note in the README about dfs.datanode.synconclose, with a slight amendment to hadoop version application.

          jira-bot ASF subversion and git services added a comment -

          Commit 47403ba4f5d552cabb94bc710e050dc114bcc922 in branch refs/heads/1.6.0-SNAPSHOT from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=47403ba ]

          ACCUMULO-1946 Note in the README about dfs.datanode.synconclose, with a slight amendment to hadoop version application.

          jira-bot ASF subversion and git services added a comment -

          Commit 47403ba4f5d552cabb94bc710e050dc114bcc922 in branch refs/heads/1.5.1-SNAPSHOT from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=47403ba ]

          ACCUMULO-1946 Note in the README about dfs.datanode.synconclose, with a slight amendment to hadoop version application.

          jira-bot ASF subversion and git services added a comment -

          Commit 47403ba4f5d552cabb94bc710e050dc114bcc922 in branch refs/heads/1.4.5-SNAPSHOT from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=47403ba ]

          ACCUMULO-1946 Note in the README about dfs.datanode.synconclose, with a slight amendment to hadoop version application.

          kturner Keith Turner added a comment -

          however substantially less probable than the walog issue.

           I am not sure it's less probable; I think it depends on how frequently compactions are running. If a system is constantly compacting, then it will have compacted data in memory that's not flushed to disk for a significant percentage of any given time period. If lots of nodes are running compactions, then there may always be unflushed data somewhere. If the power goes out on a system running lots of compactions, then data loss seems highly likely.

          elserj Josh Elser added a comment -

          Reopen for addition to 1.4 series

          kturner Keith Turner added a comment -

           Found the following link discussing this property; according to it, the flag seems relevant to compactions. It also mentions setting dfs.datanode.sync.behind.writes. When I click through to the ticket for that config, it says the config is experimental.

          http://hadoop-hbase.blogspot.com/2013/07/protected-hbase-against-data-center.html
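
           For reference, that second property is also a datanode-side setting in hdfs-site.xml (a sketch only; as noted above, the config is experimental):

             <property>
               <name>dfs.datanode.sync.behind.writes</name>
               <value>true</value>
             </property>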

          vines John Vines added a comment -

           After reading a bit more about how long natural syncs can take (up to 30 seconds), I think it is feasible for this to occur. So yeah, it does apply to 1.4.5+, though it is substantially less probable than the walog issue.

          kturner Keith Turner added a comment -

           What about the following situation; is it relevant?

           1. A compaction occurs, writes to file F1, and closes file F1
           2. Portions of file F1 on 3 datanodes are in memory (the OS page cache, maybe) and not flushed to disk
           3. All 3 datanodes F1 was written to are powered off
          elserj Josh Elser added a comment -

           Keith Turner, I was talking to John Vines, who said this was only a problem with HDFS WALs (not with the closing of files during compaction). That's what influenced the fixVersion.

          kturner Keith Turner added a comment -

           Josh Elser, why not document this in 1.4 also, now that it has Hadoop 2 support? Was this config introduced in Hadoop 2?

          kturner Keith Turner added a comment -

          Eventually, we may want to have these as table settings, or as mentioned in HBASE-5954, with each update.

           Eric Newton, should we open a ticket? As you mentioned, having these for the ROOT and METADATA tables sounds important.

          kturner Keith Turner added a comment -

           Never mind, I was confused about the nature of this DFS config. I think ACCUMULO-1905 could be reopened for config changes in 1.6.0 and another ticket opened for 1.7.0.

          ecn Eric Newton added a comment -

          Let's keep the two issues separate:

          1. sync the WAL

          Writes to the WAL are sent out to the data nodes, and they (in 2.0) attempt to push it to persistent media. This happens for every group commit, but each group can be sync'd in parallel. By increasing tserver.mutation.queue.max we can get good performance at the cost of ~50ms of latency.

          If you really don't care about persistence, you can just turn off the WAL for your table.
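
           For illustration, in the 1.x line that is a per-table setting you could flip from the shell via the table.walog.enabled property (a sketch; the table name is made up):

             root@instance> config -t mytable -s table.walog.enabled=false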

          2. sync-on-close

          I'm sure there's some penalty for sync-on-close, but I'm less concerned because we close files a lot less frequently than we flush the WALog.

          As long as these are global settings, we need them set because the METADATA table (and ROOT table) need these to keep everything working in the face of an HDFS restart or power loss event. Eventually, we may want to have these as table settings, or as mentioned in HBASE-5954, with each update.

          elserj Josh Elser added a comment -

          For 1.7.0 we may want to consider moving away from this fixed size buffer, I will open a ticket for that.

          Sounds good.

          I feel comfortable tweaking these defaults for 1.6.0, but I am not sure about 1.5 and 1.4. Changing the defaults in a bug fix release could break an existing system.

          Good point.

          I'll go ahead and close out this ticket then.

          kturner Keith Turner added a comment -

          Sadly, afaik, this isn't something we can "turn on" for users.

           Right, but we can assume they will enable it and bump the default for tserver.mutation.queue.max to 512K or maybe 1M. This will give better out-of-the-box performance. We can also document in the README that users could further increase it for more performance gains (and/or we can update the docs for the tserver.mutation.queue.max property). For 1.7.0 we may want to consider moving away from this fixed-size buffer; I will open a ticket for that.

          I feel comfortable tweaking these defaults for 1.6.0, but I am not sure about 1.5 and 1.4. Changing the defaults in a bug fix release could break an existing system.

          elserj Josh Elser added a comment -

          Oh, so you aren't against recommending to users that they turn this on, but more concerned that we revisit the default for tserver.mutation.queue.max?

          Sadly, afaik, this isn't something we can "turn on" for users. They have to enable it in hdfs-site.xml themselves. We can only complain (or kill ourselves like we do for durable-sync/append) if we don't see it enabled.
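
           A hypothetical sketch of such a complaint-on-startup check (not Accumulo's actual code; it can only inspect the copy of hdfs-site.xml visible on its own classpath, which is exactly the limitation described above):

             import org.apache.hadoop.conf.Configuration;

             public class SynconcloseCheck {
               public static void main(String[] args) {
                 Configuration conf = new Configuration();
                 // hdfs-site.xml is not a default resource for a plain Configuration,
                 // so pull it in explicitly from the classpath if present.
                 conf.addResource("hdfs-site.xml");
                 if (!conf.getBoolean("dfs.datanode.synconclose", false)) {
                   System.err.println("WARN: dfs.datanode.synconclose is not enabled; "
                       + "unflushed data may be lost if datanodes lose power");
                 }
               }
             }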

          kturner Keith Turner added a comment -

           I think the dfs sync option should be turned on and that Accumulo should be tuned to compensate some. I didn't mention tserver.mutation.queue.max, which defaults to 256K and results in frequent syncs. Making the default 1M would reduce the number of flushes 4x; setting it to 4M would decrease the frequency 16x. At 4M, 100 concurrent writers could use 400M of memory to buffer writes for the walog. We need to change how this behaves: when you have a few concurrent writers you want bigger buffers per writer, while with lots of concurrent writers small buffers are ok and you make up for it with group commit.
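
           For illustration, upping that buffer is a one-property change in accumulo-site.xml (a sketch, using the 1M figure from above):

             <!-- accumulo-site.xml: a bigger per-writer walog buffer means less frequent syncs -->
             <property>
               <name>tserver.mutation.queue.max</name>
               <value>1M</value>
             </property>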

          elserj Josh Elser added a comment -

           Yes, I would assume that it does slow down writes to some extent. I thought briefly about adding a note about this potentially being unnecessary if you're running with UPSes or redundant power supplies, but decided against it.

           Do Keith Turner or Eric Newton have an idea of the size of the penalty incurred? Looking at HBASE-5954, Lars reported a little under a 50% penalty (if I'm reading that correctly, and those numbers didn't change in the end). Personally, I'd much rather warn that they should turn it on, and let someone turn it off if they're comfortable. What do you think about making a note that it can be turned off when you have hardware that is more robust against power failures? I don't necessarily want to say "we recommend this, but it's going to make it slower". It doesn't seem like the norm to accept data corruption for a relatively small improvement in performance, especially when we think in terms of commodity hardware.

          Thoughts?

          kturner Keith Turner added a comment -

           I think encouraging people to set dfs.datanode.synconclose may slow down writes. Eric Newton did some experiments with this. The slowdown can be mitigated by increasing tserver.mutation.queue.max, which will decrease the frequency of wal writes. Enabling dfs.datanode.synconclose adds a fixed cost to each write. I am thinking we should add something to the readme, or up that default a bit. It's a per-writer buffer, so it can't be too big.

          jira-bot ASF subversion and git services added a comment -

          Commit 0fd190b602ab88458e9932b8467a17c8e8201d09 in branch refs/heads/master from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0fd190b ]

          ACCUMULO-1946 Note in the README about dfs.datanode.synconclose.

          jira-bot ASF subversion and git services added a comment -

          Commit 0fd190b602ab88458e9932b8467a17c8e8201d09 in branch refs/heads/1.6.0-SNAPSHOT from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0fd190b ]

          ACCUMULO-1946 Note in the README about dfs.datanode.synconclose.

          jira-bot ASF subversion and git services added a comment -

          Commit 0fd190b602ab88458e9932b8467a17c8e8201d09 in branch refs/heads/1.5.1-SNAPSHOT from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0fd190b ]

          ACCUMULO-1946 Note in the README about dfs.datanode.synconclose.


            People

            • Assignee: elserj Josh Elser
            • Reporter: elserj Josh Elser
            • Votes: 0
            • Watchers: 4
