Hadoop HDFS / HDFS-4828

Make QJM epoch-related errors more understandable

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0-beta, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: qjm
    • Labels:
      None

      Description

      Since we started running QJM on production clusters, we've found that users are very confused by some of the error messages it produces. For example, when a failover occurs and the old NN gets fenced out, it sees errors about its epoch being out of date. We should amend these errors to add text like "This is likely because another NameNode took over as Active." Potentially we could even include the other NN's hostname, the time at which it became active, etc.
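
      A rough sketch of what the amended message could look like. This is only an illustration: the class EpochErrorMessages, the method staleEpochMessage, and the exact wording of the base message are hypothetical and are not taken from the Hadoop code base. The sketch assumes the journal side knows the epoch in the request and the last promised epoch, and that the new active NN's hostname and activation time may or may not be available.

```java
import java.time.Instant;

/**
 * Hypothetical helper for building a more descriptive stale-epoch error.
 * Not part of Hadoop; names and wording are assumptions for illustration.
 */
public final class EpochErrorMessages {
    private EpochErrorMessages() {}

    /**
     * @param requestEpoch  epoch carried by the rejected request
     * @param promisedEpoch newest epoch the journal has promised
     * @param newActiveHost hostname of the NN holding the newer epoch, or null if unknown
     * @param activeSince   time that NN became active, or null if unknown
     */
    public static String staleEpochMessage(long requestEpoch, long promisedEpoch,
                                           String newActiveHost, Instant activeSince) {
        StringBuilder sb = new StringBuilder();
        // Base message: the raw epoch comparison that users currently find confusing.
        sb.append("Request epoch ").append(requestEpoch)
          .append(" is less than the last promised epoch ").append(promisedEpoch)
          .append(". ");
        // Explanation proposed in the description.
        sb.append("This is likely because another NameNode took over as Active.");
        // Optional details, appended only when they are actually known.
        if (newActiveHost != null) {
            sb.append(" New active NameNode: ").append(newActiveHost);
            if (activeSince != null) {
                sb.append(" (active since ").append(activeSince).append(")");
            }
            sb.append(".");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example values only; the hostname is a placeholder.
        System.out.println(staleEpochMessage(3, 4, "nn2.example.com", Instant.now()));
        System.out.println(staleEpochMessage(3, 4, null, null));
    }
}
```

      Keeping the hostname and timestamp optional means the message degrades to the plain explanation when that information is not available on the side that raises the error.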

            People

            • Assignee:
              Unassigned
              Reporter:
              tlipcon Todd Lipcon
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated: