  1. Hadoop HDFS
  2. HDFS-3077 Quorum-based protocol for reading and writing edit logs
  3. HDFS-3726

QJM: if a logger misses an RPC, don't retry that logger until next segment

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • QuorumJournalManager (HDFS-3077)
    • ha
    • None

    Description

      Currently, if a logger misses an RPC in the middle of a log segment, or misses the startLogSegment RPC (e.g. because it was down or the network was disconnected at the time), it throws an exception on every subsequent journal() call in that segment, because it knows it has missed some edits.

      We should change this exception to a specific IOException subclass, and have the client side of QJM detect the situation and stop sending RPCs to that logger until the next startLogSegment call.

      This isn't critical for correctness but will help reduce log spew on both sides.
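
      A minimal Java sketch of the proposed behavior, for illustration only. It is not taken from the attached patches: the names OutOfSyncException, FakeJournal, and LoggerChannel are hypothetical, and the in-process FakeJournal stands in for a remote logger. It assumes the logger detects a gap by comparing the first txid of each batch with the next txid it expects.

```java
import java.io.IOException;

class OutOfSyncSketch {
    /** Hypothetical IOException subclass thrown when the logger detects missed edits. */
    static class OutOfSyncException extends IOException {
        OutOfSyncException(String msg) { super(msg); }
    }

    /** In-process stand-in for a remote logger; counts the RPCs that actually reach it. */
    static class FakeJournal {
        boolean reachable = true;   // toggled to simulate an outage
        int rpcsReceived = 0;
        private long nextTxId = -1; // -1 means no segment has been started

        void startLogSegment(long txid) throws IOException {
            if (!reachable) throw new IOException("logger unreachable");
            rpcsReceived++;
            nextTxId = txid;
        }

        void journal(long firstTxId, int numTxns) throws IOException {
            if (!reachable) throw new IOException("logger unreachable");
            rpcsReceived++;
            if (firstTxId != nextTxId) {
                throw new OutOfSyncException(
                    "expected txid " + nextTxId + " but got " + firstTxId);
            }
            nextTxId += numTxns;
        }
    }

    /** Client-side channel: once out of sync, sends nothing until the next segment. */
    static class LoggerChannel {
        private final FakeJournal journal;
        private boolean outOfSync = false;

        LoggerChannel(FakeJournal journal) { this.journal = journal; }

        boolean isOutOfSync() { return outOfSync; }

        boolean startLogSegment(long txid) {
            outOfSync = false; // a new segment gives the logger a fresh start
            try {
                journal.startLogSegment(txid);
                return true;
            } catch (IOException e) {
                return false;
            }
        }

        boolean sendEdits(long firstTxId, int numTxns) {
            if (outOfSync) {
                return false; // skip the RPC entirely: it would only fail again
            }
            try {
                journal.journal(firstTxId, numTxns);
                return true;
            } catch (OutOfSyncException e) {
                outOfSync = true; // remember the gap for the rest of this segment
                return false;
            } catch (IOException e) {
                return false; // transient failure; the next call reveals the gap
            }
        }
    }

    public static void main(String[] args) {
        FakeJournal journal = new FakeJournal();
        LoggerChannel channel = new LoggerChannel(journal);
        channel.startLogSegment(1);
        channel.sendEdits(1, 1);
        journal.reachable = false;
        channel.sendEdits(2, 1);  // missed while the logger is down
        journal.reachable = true;
        channel.sendEdits(3, 1);  // logger reports the gap
        channel.sendEdits(4, 1);  // suppressed on the client side
        System.out.println("outOfSync=" + channel.isOutOfSync()
            + " rpcsReceived=" + journal.rpcsReceived);
    }
}
```

      In this sketch the out-of-sync flag is cleared before the startLogSegment RPC is sent, so a logger that also misses that RPC is simply flagged again on its first journal() call in the new segment.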

      Attachments

        1. amend.txt
          2 kB
          Todd Lipcon
        2. hdfs-3726.txt
          13 kB
          Todd Lipcon
        3. hdfs-3726.txt
          15 kB
          Todd Lipcon

          People

            tlipcon Todd Lipcon