Apache Ozone / HDDS-5363

Datanode shutdown due to too many bad volumes in CI

Details

    Description

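      The acceptance (secure) check frequently fails, usually in the S3 tests. The root cause is that datanodes shut down because too many volumes are marked "bad". In the datanode log, the periodic volume check times out after 600000 ms, the volume /data/hdds/hdds is moved to the failed volumes, and the datanode state machine shuts down.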

      S3 Gateway log
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
      
      SCM log
      Pipeline creation failed due to no sufficient healthy datanodes. Required 3. Found 0.
      
      Datanode log
      datanode_2  | 2021-06-19 13:26:08,010 [Periodic HDDS volume checker] INFO volume.StorageVolumeChecker: Scheduled health check for volume /data/hdds/hdds
      datanode_2  | 2021-06-19 13:36:08,013 [Periodic HDDS volume checker] WARN volume.StorageVolumeChecker: checkAllVolumes timed out after 600000 ms
      datanode_2  | 2021-06-19 13:36:08,014 [Periodic HDDS volume checker] WARN volume.MutableVolumeSet: checkAllVolumes got 1 failed volumes - [/data/hdds/hdds]
      datanode_2  | 2021-06-19 13:36:08,016 [Periodic HDDS volume checker] INFO volume.MutableVolumeSet: Moving Volume : /data/hdds/hdds to failed Volumes
      datanode_2  | 2021-06-19 13:36:08,016 [Periodic HDDS volume checker] ERROR statemachine.DatanodeStateMachine: DatanodeStateMachine Shutdown due to too many bad volumes, check hdds.datanode.failed.data.volumes.tolerated and hdds.datanode.failed.metadata.volumes.tolerated
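
The timing in the datanode log above can be checked with a small script. This is only a sketch for reading the excerpt, not part of Ozone: the `summarize` helper and the keys of its result are hypothetical names introduced here, and the shutdown line is trimmed for brevity. It compares the gap between the scheduled health check and the timeout warning against the 600000 ms reported in the log, and measures how soon the shutdown follows.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the datanode log in this issue (shutdown line trimmed).
LOG = """\
datanode_2  | 2021-06-19 13:26:08,010 [Periodic HDDS volume checker] INFO volume.StorageVolumeChecker: Scheduled health check for volume /data/hdds/hdds
datanode_2  | 2021-06-19 13:36:08,013 [Periodic HDDS volume checker] WARN volume.StorageVolumeChecker: checkAllVolumes timed out after 600000 ms
datanode_2  | 2021-06-19 13:36:08,016 [Periodic HDDS volume checker] ERROR statemachine.DatanodeStateMachine: DatanodeStateMachine Shutdown due to too many bad volumes
"""

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
ONE_MS = timedelta(milliseconds=1)


def timestamp(line):
    """Parse the log4j-style timestamp found in a log line."""
    return datetime.strptime(TIMESTAMP.search(line).group(0), "%Y-%m-%d %H:%M:%S,%f")


def summarize(log_text):
    """Hypothetical helper: extract the timing of the volume check failure."""
    scheduled = timed_out = shutdown = timeout_ms = None
    for line in log_text.splitlines():
        if "Scheduled health check for volume" in line:
            scheduled = timestamp(line)
        elif "checkAllVolumes timed out after" in line:
            timed_out = timestamp(line)
            timeout_ms = int(re.search(r"timed out after (\d+) ms", line).group(1))
        elif "Shutdown due to too many bad volumes" in line:
            shutdown = timestamp(line)
    return {
        # Timeout value reported by the volume checker itself.
        "timeout_ms": timeout_ms,
        # Wall-clock gap between scheduling the check and the timeout warning.
        "elapsed_ms": (timed_out - scheduled) // ONE_MS,
        # Delay between the timeout warning and the shutdown message.
        "shutdown_after_timeout_ms": (shutdown - timed_out) // ONE_MS,
    }


result = summarize(LOG)
print(result)
```

For this excerpt the elapsed time matches the reported timeout to within a few milliseconds, which suggests the check never completed rather than reporting an actual disk error.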
      

            People

              adoroszlai Attila Doroszlai