Hadoop Common
  1. Hadoop Common
  2. HADOOP-4045

Increment checkpoint if we see failures in rollEdits

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In FSEditLog::rollEdits, if we encounter an error during opening edits.new, we remove the store directory associated with it. At this point we should also increment checkpoint on all other directories.

      1. HADOOP-4045-3.patch
        26 kB
        Boris Shkolnik
      2. HADOOP-4045-1.patch
        24 kB
        Boris Shkolnik
      3. HADOOP-4045.patch
        24 kB
        Boris Shkolnik

        Issue Links

          Activity

          Hide
          Lohit Vijayarenu added a comment -

          It would be good to see if we miss this any other place as well.

          Show
          Lohit Vijayarenu added a comment - It would be good to see if we miss this any other place as well.
          Hide
          Konstantin Shvachko added a comment -

          I see also that incrementCheckpointTime() is not called on other occasions, like

          • EditLog.logEdit()
          • EditLog.close()
          • FSImage.rollFSImage()

          If we don't increment fsTime for remaining directories, which is done by incrementCheckpointTime(), we risk loosing data on name-node restart. I think this is critical enough to be a blocker.

          Show
          Konstantin Shvachko added a comment - I see also that incrementCheckpointTime() is not called on other occasions, like EditLog.logEdit() EditLog.close() FSImage.rollFSImage() If we don't increment fsTime for remaining directories, which is done by incrementCheckpointTime() , we risk loosing data on name-node restart. I think this is critical enough to be a blocker.
          Hide
          Konstantin Shvachko added a comment -

          fstime and its relation to processIOError() was discussed in HADOOP-1188.

          Show
          Konstantin Shvachko added a comment - fstime and its relation to processIOError() was discussed in HADOOP-1188 .
          Hide
          Konstantin Shvachko added a comment -

          This should include some cleanup. We have at least for different methods called processIOError(). It would be nice to reduce the number just to 2

          • FSImage.processIOError() and
          • FSEdits.processIOError()
          Show
          Konstantin Shvachko added a comment - This should include some cleanup. We have at least for different methods called processIOError() . It would be nice to reduce the number just to 2 FSImage.processIOError() and FSEdits.processIOError()
          Hide
          Konstantin Shvachko added a comment - - edited

          Some more requirements for processIOError()

          1. FSEdits.processIOError() should always unlock the storage being removed from service. This is necessary because if only the edits file causes the problem the directory will still be locked when we try to reuse it later as proposed in HADOOP-4885.
          2. It should always call incrementCheckpointTime(). Otherwise the abandoned directories may mistakenly be used for loading the latest image/edits.
          3. incrementCheckpointTime() should not recursively call processIOError(). But we should be able to handle failures of multiple edits directories.
          4. It should close the EditsOutputStream.
          5. All the logic with adding lost edits dirs to removedStorageDirs should be in processIOError().
          6. FSEdits.processIOError(int) should be eliminated completely.
          7. It seems to me that the most appropriate prototype for processIOError would be
            FSEdits.processIOError(Collection<EditLogOutputStream> errorStreams)
            

            Every method which works with streams should accumulate failed streams and then call processIOError(Collection) for the accumulated collection like in FSEdits.logSync().
            This might even let us have only 1 such method rather than two.

          [ edited the last item ]

          Show
          Konstantin Shvachko added a comment - - edited Some more requirements for processIOError() FSEdits.processIOError() should always unlock the storage being removed from service. This is necessary because if only the edits file causes the problem the directory will still be locked when we try to reuse it later as proposed in HADOOP-4885 . It should always call incrementCheckpointTime() . Otherwise the abandoned directories may mistakenly be used for loading the latest image/edits. incrementCheckpointTime() should not recursively call processIOError() . But we should be able to handle failures of multiple edits directories. It should close the EditsOutputStream . All the logic with adding lost edits dirs to removedStorageDirs should be in processIOError() . FSEdits.processIOError(int) should be eliminated completely. It seems to me that the most appropriate prototype for processIOError would be FSEdits.processIOError(Collection<EditLogOutputStream> errorStreams) Every method which works with streams should accumulate failed streams and then call processIOError(Collection) for the accumulated collection like in FSEdits.logSync() . This might even let us have only 1 such method rather than two. [ edited the last item ]
          Hide
          Nigel Daley added a comment -

          As discussed on core-dev@ (http://www.nabble.com/Hadoop-0.19.1-td21739202.html) we will disable append in 0.19.1. Moving these append related issues to 0.19.2.

          Show
          Nigel Daley added a comment - As discussed on core-dev@ ( http://www.nabble.com/Hadoop-0.19.1-td21739202.html ) we will disable append in 0.19.1. Moving these append related issues to 0.19.2.
          Hide
          Konstantin Shvachko added a comment -

          This is not append related, but it is ok to move it out of 0.19.1.

          Show
          Konstantin Shvachko added a comment - This is not append related, but it is ok to move it out of 0.19.1.
          Hide
          Robert Chansler added a comment -

          Demoted unless there is an argument that this is a serious regression from earlier.

          Show
          Robert Chansler added a comment - Demoted unless there is an argument that this is a serious regression from earlier.
          Hide
          Boris Shkolnik added a comment -

          Couple of words about the design.
          There are StorageDirectories(SDs) of different types: IMAGE or EDITS or both. Which means some do not have any EditLogStreams associated with them. On the other hand there are some EditLogStreams which are attached to SDs and some that are not (BACKUP node streaming). Thus we should be able to take care of IOErrors from both sides. So the processIOError could be called for an SD or for a EditLogStream (eStream). If it is called for a SD and this SD has associated eStream we need to call processIOError for the stream too and vice-versa. So I left one processIOError function in each class with an optional flag to specify if the error should be propagated to the corresponding SD or eStream.
          All functions accept arrayList as an argument.

          Show
          Boris Shkolnik added a comment - Couple of words about the design. There are StorageDirectories(SDs) of different types: IMAGE or EDITS or both. Which means some do not have any EditLogStreams associated with them. On the other hand there are some EditLogStreams which are attached to SDs and some that are not (BACKUP node streaming). Thus we should be able to take care of IOErrors from both sides. So the processIOError could be called for an SD or for a EditLogStream (eStream). If it is called for a SD and this SD has associated eStream we need to call processIOError for the stream too and vice-versa. So I left one processIOError function in each class with an optional flag to specify if the error should be propagated to the corresponding SD or eStream. All functions accept arrayList as an argument.
          Hide
          Boris Shkolnik added a comment -

          Manual testing done:
          1. Mount two directories (one for Edits and Image, one for Edits only).
          2. create some files
          3. unmount one of them and wait for checkpoint (or create a file) , verify that failed dir is removed
          4. unmount another one (optional) - more verifications
          5. mount one back - (checkpoint or new files), verify that checkpointtime is updated and files have the same size and MD5
          6. mount the other one (optional) - more verifications
          7. repeat 3 and 5
          8. check WebUI all the time.

          Show
          Boris Shkolnik added a comment - Manual testing done: 1. Mount two directories (one for Edits and Image, one for Edits only). 2. create some files 3. unmount one of them and wait for checkpoint (or create a file) , verify that failed dir is removed 4. unmount another one (optional) - more verifications 5. mount one back - (checkpoint or new files), verify that checkpointtime is updated and files have the same size and MD5 6. mount the other one (optional) - more verifications 7. repeat 3 and 5 8. check WebUI all the time.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12403739/HADOOP-4045-1.patch
          against trunk revision 758593.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12403739/HADOOP-4045-1.patch against trunk revision 758593. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/144/console This message is automatically generated.
          Hide
          Konstantin Shvachko added a comment -
          1. FSImage.setCheckpointTime() variable al is not used.
          2. processIOError(ArrayList<StorageDirectory> sds) may be eliminated.
          3. I would also get rid of processIOError(ArrayList<EditLogOutputStream> errorStreams).
            The point is that it is better to have only one processIOError in each class, otherwise it can get
            as bad as it is now with all different variants of it.
            If you think it is a lot of changes, then lets at least make both of them private.
          4. Do we want to make removedStorageDirs a map in order to avoid adding the same directory
            twice into it or does it never happen?
          5. Same with Storage.storageDirs. If we search in a collection then we might want to use
            searchable collections. This may be done in a separate issue.
          6. It's somewhat confusing: FSImage.processIOError() calls editLog.processIOError() and
            then FSEditLog.processIOError() calls fsimage.processIOError(). Is it going to converge
            at some point?
          7. setCheckpointTime() ignores io errors. Just mentioning this, I don't see how to avoid it.
            Failed streams/directories will be remove next time flushAndSync() called.
          Show
          Konstantin Shvachko added a comment - FSImage.setCheckpointTime() variable al is not used. processIOError(ArrayList<StorageDirectory> sds) may be eliminated. I would also get rid of processIOError(ArrayList<EditLogOutputStream> errorStreams) . The point is that it is better to have only one processIOError in each class, otherwise it can get as bad as it is now with all different variants of it. If you think it is a lot of changes, then lets at least make both of them private. Do we want to make removedStorageDirs a map in order to avoid adding the same directory twice into it or does it never happen? Same with Storage.storageDirs . If we search in a collection then we might want to use searchable collections. This may be done in a separate issue. It's somewhat confusing: FSImage.processIOError() calls editLog.processIOError() and then FSEditLog.processIOError() calls fsimage.processIOError() . Is it going to converge at some point? setCheckpointTime() ignores io errors. Just mentioning this, I don't see how to avoid it. Failed streams/directories will be remove next time flushAndSync() called.
          Hide
          Boris Shkolnik added a comment -

          1. FSImage.setCheckpointTime() variable al is not used.

          fixed

          2. processIOError(ArrayList<StorageDirectory> sds) may be eliminated.

          This will force using two-argument version of the function everywhere, in most cases with "true" value for the second argument.

          3. I would also get rid of processIOError(ArrayList<EditLogOutputStream> errorStreams). The point is that it is better to have only one processIOError in each class, otherwise it can get as bad as it is now with all different variants of it. If you think it is a lot of changes, then lets at least make both of them private.

          see 2.

          4. Do we want to make removedStorageDirs a map in order to avoid adding the same directory twice into it or does it never happen?

          good idea. will need a separate JIRA for it

          5. Same with Storage.storageDirs. If we search in a collection then we might want to use searchable collections. This may be done in a separate issue.

          same as 4.

          6. It's somewhat confusing: FSImage.processIOError() calls editLog.processIOError() and then FSEditLog.processIOError() calls fsimage.processIOError(). Is it going to converge at some point?

          it should. every time processIOError calles its counterpart in the other class it passes false as second (propagate) argument to make sure it will not call the original function.

          7. setCheckpointTime() ignores io errors. Just mentioning this, I don't see how to avoid it. Failed streams/directories will be remove next time flushAndSync() called.

          Yes, it should be cought elsewhere.

          Show
          Boris Shkolnik added a comment - 1. FSImage.setCheckpointTime() variable al is not used. fixed 2. processIOError(ArrayList<StorageDirectory> sds) may be eliminated. This will force using two-argument version of the function everywhere, in most cases with "true" value for the second argument. 3. I would also get rid of processIOError(ArrayList<EditLogOutputStream> errorStreams). The point is that it is better to have only one processIOError in each class, otherwise it can get as bad as it is now with all different variants of it. If you think it is a lot of changes, then lets at least make both of them private. see 2. 4. Do we want to make removedStorageDirs a map in order to avoid adding the same directory twice into it or does it never happen? good idea. will need a separate JIRA for it 5. Same with Storage.storageDirs. If we search in a collection then we might want to use searchable collections. This may be done in a separate issue. same as 4. 6. It's somewhat confusing: FSImage.processIOError() calls editLog.processIOError() and then FSEditLog.processIOError() calls fsimage.processIOError(). Is it going to converge at some point? it should. every time processIOError calles its counterpart in the other class it passes false as second (propagate) argument to make sure it will not call the original function. 7. setCheckpointTime() ignores io errors. Just mentioning this, I don't see how to avoid it. Failed streams/directories will be remove next time flushAndSync() called. Yes, it should be cought elsewhere.
          Hide
          Boris Shkolnik added a comment -

          implemented Konstantin's comments

          Show
          Boris Shkolnik added a comment - implemented Konstantin's comments
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12404404/HADOOP-4045-3.patch
          against trunk revision 761082.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12404404/HADOOP-4045-3.patch against trunk revision 761082. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/97/console This message is automatically generated.
          Hide
          Konstantin Shvachko added a comment -

          I just committed this.
          Thank you Boris.

          Show
          Konstantin Shvachko added a comment - I just committed this. Thank you Boris.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk #796 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/796/)
          . Fix processing of IO errors in EditsLog. Contributed by Boris Shkolnik.

          Show
          Hudson added a comment - Integrated in Hadoop-trunk #796 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/796/ ) . Fix processing of IO errors in EditsLog. Contributed by Boris Shkolnik.

            People

            • Assignee:
              Boris Shkolnik
              Reporter:
              Lohit Vijayarenu
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development