Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: QuorumJournalManager (HDFS-3077)
    • Component/s: ha
    • Labels:
      None

      Description

      After a round of fault testing, I noticed that the JNs had left temporary files in their journal directories that were already past the retention period. For example, if a JN crashes in the middle of recovery, it can leave behind a file like edits_inprogress_123.epoch=10. These files are handy for forensics/debugging while they are still within the retention period, but we should not keep them forever. The normal purging policy should apply to them.
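
A minimal sketch of the idea, not the attached patch: it assumes the retention boundary is expressed as a minimum transaction ID to keep, and that leftover files look like an in-progress segment name followed by a dotted suffix, as in the example above. The class name, the regular expression, and the method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the JournalNode code.
class StaleJournalFilePurger {
    // Assumed naming: "edits_inprogress_<startTxId>.<suffix>", e.g. ".epoch=10".
    private static final Pattern STALE_INPROGRESS =
        Pattern.compile("edits_inprogress_(\\d+)\\..+");

    /** Returns the start txid if the name looks like a leftover temp file, else -1. */
    static long staleStartTxId(String name) {
        Matcher m = STALE_INPROGRESS.matcher(name);
        return m.matches() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Selects leftover temp files whose start txid is below the retention boundary. */
    static List<String> selectForPurge(List<String> names, long minTxIdToKeep) {
        List<String> toPurge = new ArrayList<>();
        for (String name : names) {
            long txId = staleStartTxId(name);
            if (txId >= 0 && txId < minTxIdToKeep) {
                toPurge.add(name);
            }
        }
        return toPurge;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "edits_inprogress_123.epoch=10",  // leftover, below the boundary
            "edits_inprogress_900.epoch=12",  // leftover, still retained
            "edits_inprogress_950");          // live in-progress segment, never matched
        System.out.println(selectForPurge(names, 500L));
    }
}
```

With a boundary of 500, only edits_inprogress_123.epoch=10 is selected. The newer leftover stays available for debugging, and the live in-progress segment is never considered because it has no suffix.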

      1. hdfs-3956.txt
        8 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        Attached patch fixes the issue.

        Testing:

        • I added some new files to the existing purging test
        • I fixed a bug whereby the random fault test wasn't actually purging the files before – since it was calling purgeLogsOlderThan before it called recoverUnclosedSegments, the request was just getting rejected. Now it properly purges them, and I verified the purging behavior by running watch 'find ./build/test/data/dfs/journalnode-2 | sort' during the test run.
        • I ran 5000 instances of the random fault test and it passed with no AssertionErrors

        This applies on top of HDFS-3950 and HDFS-3955

        Eli Collins added a comment -

        +1 looks great

        Todd Lipcon added a comment -

        Committed to branch, thanks Eli.


          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            2
