Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-3077 Quorum-based protocol for reading and writing edit logs
  3. HDFS-3894

QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: QuorumJournalManager (HDFS-3077)
    • Component/s: test
    • Labels:
      None

      Description

      TestQJMWithFaults.testRecoverAfterDoubleFailures fails really occasionally. Looking into it, the issue seems to be that it's possible by random chance for an IPC server port to be reused between two different iterations of the test loop. The client will then pick up and re-use the existing IPC connection to the old server. However, the old server was shut down and restarted, so the old IPC connection is stale (ie disconnected). This causes the new client to get an EOF when it sends the "format()" call.

        Attachments

        1. hdfs-3894.txt
          3 kB
          Todd Lipcon

          Activity

            People

            • Assignee:
              tlipcon Todd Lipcon
              Reporter:
              tlipcon Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: