Hadoop HDFS / HDFS-3025

Automatic log sync shouldn't happen inside logEdit path

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.3
    • Fix Version/s: None
    • Component/s: namenode, performance
    • Labels: None

      Description

      HDFS-3020 fixes the "automatic log sync" functionality so that, when logEdit is called without a following logSync, it eventually triggers a sync. That sync happens inline, though, which means the FSN (FSNamesystem) lock is usually held during it. As a result, many threads pile up waiting for the lock.

      Instead, we should have it just set a "syncNeeded" flag and trigger a sync from another thread which isn't holding the lock (or from the same thread using a "logSyncIfNeeded" call).
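
      The proposal above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the HDFS-3025 patch: the class names, the byte-count threshold, the made-up record size, and the use of a ReentrantReadWriteLock to stand in for the FSNamesystem lock are all hypothetical. It shows the "same thread" variant: logEdit only sets syncNeeded while the lock is held, and the caller invokes logSyncIfNeeded() after releasing the lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: set a flag under the lock, sync after releasing it. */
public class DeferredSyncSketch {
    // Hypothetical auto-sync threshold, in bytes of unsynced edits.
    static final int AUTO_SYNC_THRESHOLD_BYTES = 512;

    static class EditLog {
        private final ReentrantReadWriteLock fsLock;
        private int unsyncedBytes = 0;
        private boolean syncNeeded = false;
        private int syncCount = 0;
        private int syncsUnderFsLock = 0;

        EditLog(ReentrantReadWriteLock fsLock) {
            this.fsLock = fsLock;
        }

        // Called with the namesystem lock held: buffer the edit and, past the
        // threshold, only record that a sync is due instead of syncing inline.
        synchronized void logEdit(int editBytes) {
            unsyncedBytes += editBytes;
            if (unsyncedBytes >= AUTO_SYNC_THRESHOLD_BYTES) {
                syncNeeded = true;
            }
        }

        // Called after the namesystem lock has been released.
        void logSyncIfNeeded() {
            boolean doSync;
            synchronized (this) {
                doSync = syncNeeded;
                syncNeeded = false;
            }
            if (doSync) {
                logSync();
            }
        }

        void logSync() {
            // Record whether the caller still holds the namesystem lock;
            // with the deferred scheme this should never happen.
            boolean underLock = fsLock.isWriteLockedByCurrentThread()
                    || fsLock.getReadHoldCount() > 0;
            synchronized (this) {
                if (underLock) {
                    syncsUnderFsLock++;
                }
                unsyncedBytes = 0;
                syncCount++;
            }
            // A real edit log would flush and fsync its buffered edits here,
            // outside both the namesystem lock and this monitor.
        }

        synchronized int getSyncCount() { return syncCount; }
        synchronized int getSyncsUnderFsLock() { return syncsUnderFsLock; }
    }

    static class Namesystem {
        final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
        final EditLog editLog = new EditLog(fsLock);

        // Hypothetical mutation: log the edit under the lock, sync outside it.
        void mkdir(String path) {
            fsLock.writeLock().lock();
            try {
                editLog.logEdit(path.length() + 16); // made-up record size
            } finally {
                fsLock.writeLock().unlock();
            }
            editLog.logSyncIfNeeded();
        }
    }

    public static void main(String[] args) {
        Namesystem ns = new Namesystem();
        for (int i = 0; i < 100; i++) {
            ns.mkdir("/dir" + i);
        }
        System.out.println("syncs=" + ns.editLog.getSyncCount()
                + " underLock=" + ns.editLog.getSyncsUnderFsLock());
    }
}
```

      Because logEdit never performs I/O here, the time spent holding the namesystem lock no longer includes the sync. The trade-off is that every mutation path must remember to call logSyncIfNeeded() after unlocking, or a separate thread must watch the flag.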

      (credit to the FB branch for this idea)

      Attachments

      1. hdfs-3025.txt (2 kB, Todd Lipcon)
      2. hdfs-3025.txt (3 kB, Todd Lipcon)
      3. hdfs-3025.txt (4 kB, Todd Lipcon)

        Activity

        Todd Lipcon added a comment -

        Strange that this is failing on Hudson when it passes locally for me. I want to look at this again with a fresh mind next week.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12516628/hdfs-3025.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hdfs.server.namenode.TestEditLog

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1932//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1932//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1932//console

        This message is automatically generated.

        Todd Lipcon added a comment -

        Address findbugs issue

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12516620/hdfs-3025.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1931//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1931//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1931//console

        This message is automatically generated.

        Eli Collins added a comment -

        +1 looks good

        Todd Lipcon added a comment -

        Slightly modified patch. The issue in the previous patch was the following possible interleaving:

        • thread A calls logEdit, pushing it over the size threshold for an automatic sync
        • thread A exits the synchronized section in logEdit and context-switches out
        • thread B calls logSync(), which sets the isAutoSyncScheduled flag to false
        • thread A continues into the code that schedules the automatic sync, and fails the assertion.
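
        The interleaving described in this comment can be replayed deterministically in a single thread. The sketch below is hypothetical: the names and the 3-edit threshold are made up, and it assumes the failing assertion checked that isAutoSyncScheduled was still true; the comment does not say exactly what was asserted or how the revised patch fixed it. It contrasts re-reading the shared flag after leaving the synchronized section with acting on a decision captured inside it.

```java
/** Hypothetical replay of the race; not the actual FSEditLog code. */
public class AutoSyncRaceSketch {
    // Hypothetical: schedule an automatic sync once 3 edits are unsynced.
    static final int THRESHOLD = 3;

    private int unsyncedEdits = 0;
    private boolean isAutoSyncScheduled = false;

    // The synchronized part of logEdit. Whether *this* call scheduled the
    // auto-sync is decided under the lock and returned to the caller.
    synchronized boolean appendEdit() {
        unsyncedEdits++;
        if (unsyncedEdits >= THRESHOLD && !isAutoSyncScheduled) {
            isAutoSyncScheduled = true;
            return true;
        }
        return false;
    }

    // A sync, possibly from another thread, clears the scheduled flag.
    synchronized void logSync() {
        unsyncedEdits = 0;
        isAutoSyncScheduled = false;
    }

    // Racy continuation: the read itself is synchronized, but it happens in a
    // separate critical section from the decision, so another thread's
    // logSync() may already have cleared the flag.
    synchronized boolean racyExpectationHolds() {
        return isAutoSyncScheduled;
    }

    // Safer continuation: act only on the decision captured under the lock,
    // tolerating another thread having synced in the meantime.
    void safeContinuation(boolean scheduledByThisCall) {
        if (scheduledByThisCall) {
            logSync(); // redundant if thread B already synced, but harmless here
        }
    }

    synchronized int getUnsyncedEdits() {
        return unsyncedEdits;
    }

    public static void main(String[] args) {
        AutoSyncRaceSketch log = new AutoSyncRaceSketch();
        log.appendEdit();
        log.appendEdit();
        // Thread A: this edit crosses the threshold inside the synchronized section.
        boolean scheduledByA = log.appendEdit();
        // Thread A is switched out; thread B calls logSync(), clearing the flag.
        log.logSync();
        // Thread A resumes after the synchronized section.
        System.out.println("scheduledByA = " + scheduledByA);                         // true
        System.out.println("racy expectation holds = " + log.racyExpectationHolds()); // false
        log.safeContinuation(scheduledByA);
        System.out.println("unsynced after safe path = " + log.getUnsyncedEdits());   // 0
    }
}
```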

        I've tested this new patch using testMultiThreadedEditLog with the number of edits increased by a factor of 10. I'll also run it on the cluster sometime in the next few days, but I think it should be OK to commit in the meantime.

        Todd Lipcon added a comment -

        Looks like the test failure is legit. I'll look into this and post an updated patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12516468/hdfs-3025.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hdfs.server.namenode.TestEditLog

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1924//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1924//console

        This message is automatically generated.

        Eli Collins added a comment -

        +1 pending jenkins

        Todd Lipcon added a comment -

        Fairly simple patch.

        I tested this on a 100-node cluster running the HA branch with HA turned off, under a workload that causes a lot of non-synced log edits. Using jstack, I could see the log syncs happening on the correct thread, and I verified that performance improved, with fewer threads blocked.
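
        The jstack check described above can also be approximated from inside the JVM. The sketch below is an illustration, not what was used on the cluster: it counts threads in the BLOCKED state via java.lang.management.ThreadMXBean, and its demoPileUp method is a hypothetical scenario in which one thread holds a monitor while "syncing" so the others pile up behind it. BLOCKED only covers threads waiting to enter a synchronized monitor; threads waiting on a java.util.concurrent lock such as ReentrantReadWriteLock are parked and report WAITING, so a probe for that kind of lock would also need to inspect ThreadInfo.getLockInfo().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadProbe {
    /** Counts live threads in BLOCKED state, i.e. waiting to enter a monitor. */
    static int countBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int blocked = 0;
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    /**
     * Hypothetical demo: the calling thread holds a monitor while "syncing",
     * as an inline sync under the lock would, and the waiters pile up behind it.
     * Returns the largest BLOCKED count observed while the monitor is held.
     */
    static int demoPileUp(int waiters) throws InterruptedException {
        final Object fsnLock = new Object();
        Thread[] threads = new Thread[waiters];
        int maxBlocked = 0;
        synchronized (fsnLock) {
            for (int i = 0; i < waiters; i++) {
                threads[i] = new Thread(() -> {
                    synchronized (fsnLock) {
                        // a real handler would log an edit here
                    }
                });
                threads[i].start();
            }
            // Poll for up to about 2 seconds while still holding the lock.
            for (int i = 0; i < 200 && maxBlocked < waiters; i++) {
                maxBlocked = Math.max(maxBlocked, countBlocked());
                Thread.sleep(10);
            }
        }
        for (Thread t : threads) {
            t.join();
        }
        return maxBlocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max blocked while lock held: " + demoPileUp(4));
    }
}
```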


          People

          • Assignee: Todd Lipcon
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 3
