Apache Ozone / HDDS-6465

Measure and expose DU metrics



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Ozone Datanode
    • Labels: None


      We need metrics about du runs, such as the following:

      # HELP du_started_count Total count of du runs started per data directory
      du_started_count{path="/ozone/data/storage1",node="node1.example.com"} 234
      # HELP du_finished_count Total count of du runs finished per data directory
      du_finished_count{path="/ozone/data/storage1",node="node1.example.com"} 233
      # HELP du_latency_time Total du latency, in (milli)seconds
      du_latency_time{path="/ozone/data/storage1",node="node1.example.com"} 123423e+10
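
      A minimal Java sketch of how a datanode could keep these three counters per data directory and render them in the Prometheus text format above. The class and method names (`DuMetrics`, `duStarted`, `duFinished`, `render`) are hypothetical and not part of Ozone's metrics API; latency is assumed to be recorded in milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

// Hypothetical per-data-directory du counters; names are illustrative only.
class DuMetrics {
    // Counters kept for a single data directory.
    static final class PathStats {
        final AtomicLong started = new AtomicLong();
        final AtomicLong finished = new AtomicLong();
        final AtomicLong latencyMillis = new AtomicLong();
    }

    private final String node;
    // TreeMap gives a stable output order; access is guarded by synchronized methods.
    private final Map<String, PathStats> stats = new TreeMap<>();

    DuMetrics(String node) {
        this.node = node;
    }

    private synchronized PathStats statsFor(String path) {
        return stats.computeIfAbsent(path, p -> new PathStats());
    }

    void duStarted(String path) {
        statsFor(path).started.incrementAndGet();
    }

    void duFinished(String path, long elapsedMillis) {
        PathStats s = statsFor(path);
        s.finished.incrementAndGet();
        s.latencyMillis.addAndGet(elapsedMillis);
    }

    // Renders every counter in the Prometheus text exposition format.
    synchronized String render() {
        StringBuilder sb = new StringBuilder();
        family(sb, "du_started_count",
            "Total count of du runs started per data directory", s -> s.started.get());
        family(sb, "du_finished_count",
            "Total count of du runs finished per data directory", s -> s.finished.get());
        family(sb, "du_latency_time",
            "Total du latency in milliseconds per data directory", s -> s.latencyMillis.get());
        return sb.toString();
    }

    private void family(StringBuilder sb, String name, String help,
                        ToLongFunction<PathStats> value) {
        sb.append("# HELP ").append(name).append(' ').append(help).append('\n');
        sb.append("# TYPE ").append(name).append(" counter\n");
        for (Map.Entry<String, PathStats> e : stats.entrySet()) {
            // Label values are not escaped; fine for plain paths and host names.
            sb.append(name)
              .append("{path=\"").append(e.getKey())
              .append("\",node=\"").append(node).append("\"} ")
              .append(value.applyAsLong(e.getValue())).append('\n');
        }
    }
}
```

      The gap between the started and finished counts for a path shows how many du runs are still in flight (or never completed), which is exactly what is hard to see today.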

      Datanodes run the du command to measure the disk usage of block files. However, because du traverses directories recursively, it can put a fairly heavy load on the disk device, especially when block files are relatively small (e.g. the small-file problem in local file systems). du alone is not a heavy load, but when it overlaps with container scan tasks it is hard to tell that du is adding load to the disk. (The default container metadata scan interval is 3h and the default du interval is 1h; I have already changed both in our environment.)
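
      To illustrate why the recursive traversal gets expensive with many small block files, here is a rough Java stand-in for du. The `DuWalk` class is hypothetical: it walks a directory with `Files.walkFileTree` and sums apparent file sizes, whereas real du reports allocated disk blocks by default, so the byte totals differ. The point is only that the work is one metadata lookup per entry, so it grows with the number of files rather than with their total size.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical, approximate du: counts files and sums apparent sizes.
class DuWalk {
    static final class Usage {
        long files;
        long bytes;
    }

    static Usage walk(Path root) throws IOException {
        Usage u = new Usage();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Each file needs its own metadata lookup: a million small
                // block files mean a million lookups, one large file means one.
                u.files++;
                u.bytes += attrs.size();
                return FileVisitResult.CONTINUE;
            }
        });
        return u;
    }
}
```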

      Currently we cannot easily observe du load unless we log in to the datanode and run "top" or similar, or set the log level to DEBUG. IMO, du runs should be logged at INFO level.
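
      A minimal sketch of instrumenting each run so its cost is visible without logging in to the datanode. The hypothetical `InstrumentedDu` wrapper times an arbitrary measurement (a `Callable` standing in for the actual du invocation), updates the started/finished/latency counters and logs the result at INFO with `java.util.logging`. The wrapper, its fields and the log message format are assumptions, not Ozone code; one instance per data directory is assumed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical wrapper: one instance per data directory (assumption).
class InstrumentedDu {
    private static final Logger LOG = Logger.getLogger(InstrumentedDu.class.getName());

    final AtomicLong started = new AtomicLong();
    final AtomicLong finished = new AtomicLong();
    final AtomicLong latencyMillis = new AtomicLong();

    // Runs one measurement; if it throws, "finished" is not incremented.
    long run(String path, Callable<Long> measure) throws Exception {
        started.incrementAndGet();
        long t0 = System.nanoTime();
        long usedBytes = measure.call();
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        finished.incrementAndGet();
        latencyMillis.addAndGet(elapsedMs);
        // Logged at INFO so every run shows up without switching to DEBUG.
        LOG.info(() -> String.format("du on %s finished in %d ms, used=%d bytes",
            path, elapsedMs, usedBytes));
        return usedBytes;
    }
}
```

      A run that throws increments `started` but not `finished`, so a growing gap between the two counters points at failing or hung du runs, while the INFO line gives the per-run duration directly in the datanode log.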




    • Assignee: Unassigned
    • Reporter: UENISHI Kota (kuenishi)