Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1497

Write pipeline sequence numbers should be sequential with no skips or duplicates

    Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20-append, 0.22.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      In HDFS-895 we discovered that multiple hflush() calls in a row without intervening writes could cause a skip in sequence number. This doesn't seem to have any direct consequences, but we should maintain and assert the invariant that sequence numbers have no gaps or duplicates.

      1. hdfs-1497.txt
        11 kB
        Todd Lipcon
      2. hdfs-1497.txt
        11 kB
        Todd Lipcon
      3. hdfs-1497.txt
        11 kB
        Todd Lipcon
      4. hdfs-1497.txt
        6 kB
        Todd Lipcon
      5. hdfs-1497.txt
        7 kB
        Todd Lipcon

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Patch adds a check for sequential seqnos, and also a unit test that shows that we were failing to maintain this property before with empty hflushes.

          Running unit test suite now.

          Show
          Todd Lipcon added a comment - Patch adds a check for sequential seqnos, and also a unit test that shows that we were failing to maintain this property before with empty hflushes. Running unit test suite now.
          Hide
          Todd Lipcon added a comment -

          Rebased patch against trunk

          Show
          Todd Lipcon added a comment - Rebased patch against trunk
          Hide
          Todd Lipcon added a comment -

          Ran unit tests, only got known failures, this should be ready for commit.

          Show
          Todd Lipcon added a comment - Ran unit tests, only got known failures, this should be ready for commit.
          Hide
          Hairong Kuang added a comment -

          I like the idea but I am still concerned about this patch. Creating a packet and then decrement the sequence number under a different condition makes me nervous.

          How about we taking the following approach?
          In the beginning of hflush, check if bytesCurBlock + buffered data size <= lastFlushOffset. If yes, return.

          This would avoid creating an unnecessary patch when there is no data to flush.

          Show
          Hairong Kuang added a comment - I like the idea but I am still concerned about this patch. Creating a packet and then decrement the sequence number under a different condition makes me nervous. How about we taking the following approach? In the beginning of hflush, check if bytesCurBlock + buffered data size <= lastFlushOffset. If yes, return. This would avoid creating an unnecessary patch when there is no data to flush.
          Hide
          Todd Lipcon added a comment -

          We can do that, but we'll have to do a common patch too. Currently there's no way to get at the FSOutputSummer buffer/count members. mumbles something about project split

          Show
          Todd Lipcon added a comment - We can do that, but we'll have to do a common patch too. Currently there's no way to get at the FSOutputSummer buffer/count members. mumbles something about project split
          Hide
          Todd Lipcon added a comment -

          What do you think about something like this, Hairong? (depends on linked common JIRA)

          Show
          Todd Lipcon added a comment - What do you think about something like this, Hairong? (depends on linked common JIRA)
          Hide
          Hairong Kuang added a comment -

          Todd, thanks for addressing my concern. The patch looks good but I hope that you can clean it a little bit. Something like this in the beginning of hflush should be good enough. I feel that you are in the mode of "assertion".

          if (bytesCurBlock + buffered data size == lastFlushedOffset) {
            // no new data
            return;
          }
          
          Show
          Hairong Kuang added a comment - Todd, thanks for addressing my concern. The patch looks good but I hope that you can clean it a little bit. Something like this in the beginning of hflush should be good enough. I feel that you are in the mode of "assertion". if (bytesCurBlock + buffered data size == lastFlushedOffset) { // no new data return ; }
          Hide
          Todd Lipcon added a comment -

          Unfortunately it's not quite that simple, since you have to consider the multi-threaded case. For example, if you have two threads call hflush() at the same time, the first will set lastFlushedOffset in its synchronized section, after sending the data into the pipeline. It then drops the lock to wait for the ack. If a second thread then calls hflush() it also needs to wait for the same ack - it can't just return, otherwise it's acting like data has been acknowledged that is still somewhere in the pipeline.

          Show
          Todd Lipcon added a comment - Unfortunately it's not quite that simple, since you have to consider the multi-threaded case. For example, if you have two threads call hflush() at the same time, the first will set lastFlushedOffset in its synchronized section, after sending the data into the pipeline. It then drops the lock to wait for the ack. If a second thread then calls hflush() it also needs to wait for the same ack - it can't just return, otherwise it's acting like data has been acknowledged that is still somewhere in the pipeline.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12465656/hdfs-1497.txt
          against trunk revision 1051669.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/17//testReport/
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/17//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12465656/hdfs-1497.txt against trunk revision 1051669. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/17//testReport/ Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/17//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          resubmit for QA

          Show
          Todd Lipcon added a comment - resubmit for QA
          Hide
          Todd Lipcon added a comment -

          Reupload to re-trigger Hudson

          Show
          Todd Lipcon added a comment - Reupload to re-trigger Hudson
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467659/hdfs-1497.txt
          against trunk revision 1056009.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/87//testReport/
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/87//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467659/hdfs-1497.txt against trunk revision 1056009. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/87//testReport/ Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/87//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          This patch compiles against trunk

          Show
          Todd Lipcon added a comment - This patch compiles against trunk
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468045/hdfs-1497.txt
          against trunk revision 1057414.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/97//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/97//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/97//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12468045/hdfs-1497.txt against trunk revision 1057414. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/97//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/97//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/97//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468045/hdfs-1497.txt
          against trunk revision 1072023.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/183//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12468045/hdfs-1497.txt against trunk revision 1072023. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/183//console This message is automatically generated.

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:

                Development