  1. Hadoop HDFS
  2. HDFS-10214

Checkpoint should not be done by the Standby NameNode, because checkpointing may cause DataNode blockReport, blockReceivedAndDeleted, and heartbeat RPC timeouts when the object count is > 100000000.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.5.0, 2.6.4
    • Fix Version/s: None
    • Component/s: ha, namenode
    • Labels:
      None
    • Environment:

      500 DataNodes.

      137407265 files and directories, 129614074 blocks = 267021339 total filesystem object(s)

      Description

      Current cluster status:
      137407265 files and directories, 129614074 blocks = 267021339 total filesystem object(s).

      Saving the namespace during a checkpoint takes more than 5 minutes.

      DataNode RPCs time out.

      The Standby NameNode skips the DataNode RPC requests (because the DataNode RPC timed out, the DataNode closed the socket channel).

      Many files are reported as corrupt after a failover.

      So the checkpoint should perhaps be done by another component, not the Standby NameNode.
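
As a quick sanity check on the figures quoted above, the following minimal sketch recomputes the total object count and compares it with the 100000000 threshold named in the issue title. The function and constant names are hypothetical and are not part of Hadoop; the threshold is the reporter's figure, not a documented HDFS limit.

```python
# Object counts quoted in the report.
FILES_AND_DIRS = 137_407_265
BLOCKS = 129_614_074

# Threshold taken from the issue title ("object num > 100000000").
# This is the reporter's figure, not a documented HDFS limit.
OBJECT_THRESHOLD = 100_000_000


def total_objects(files_and_dirs: int, blocks: int) -> int:
    """Total filesystem objects: files and directories plus blocks."""
    return files_and_dirs + blocks


def exceeds_threshold(files_and_dirs: int, blocks: int,
                      threshold: int = OBJECT_THRESHOLD) -> bool:
    """True when the namespace is strictly larger than the given threshold."""
    return total_objects(files_and_dirs, blocks) > threshold


if __name__ == "__main__":
    total = total_objects(FILES_AND_DIRS, BLOCKS)
    print(f"{total} total filesystem objects")  # 267021339
    print(f"over threshold: {exceeds_threshold(FILES_AND_DIRS, BLOCKS)}")  # True
```

With the reported numbers, the namespace is well over twice the size at which the reporter observed RPC timeouts during checkpointing.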

              People

              • Assignee:
                Unassigned
              • Reporter:
                chenfolin ChenFolin
              • Votes:
                0
              • Watchers:
                6

                  Time Tracking

                  Estimated: 672h
                  Remaining: 672h
                  Logged: Not Specified