Hadoop HDFS / HDFS-6292

Display HDFS per user and per group usage on the webUI

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      It would be nice to show HDFS usage per user and per group on the web UI.

      1. HDFS-6292.png
        99 kB
        Ravi Prakash
      2. HDFS-6292.patch
        356 kB
        Ravi Prakash
      3. HDFS-6292.01.patch
        363 kB
        Ravi Prakash

        Activity

        Ravi Prakash added a comment -

        Since we probably don't want to burden an already busy NN with calculating this, we can compute it from the fsimage on the SNN. Of course, this has the downside that the numbers are refreshed only when a checkpoint happens.
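
        For illustration only (not taken from the attached patches): a minimal sketch of the batch approach, assuming the per-file owner, group, length and replication have already been read out of a checkpointed fsimage. `FileEntry` and `SnapshotUsage` are hypothetical names, and counting length times replication as "usage" is an assumption too.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Batch aggregation over a snapshot of file metadata (all names hypothetical). */
final class SnapshotUsage {

    /** Raw bytes consumed (length x replication), summed per owner. */
    static Map<String, Long> byOwner(List<FileEntry> files) {
        Map<String, Long> usage = new TreeMap<>();
        for (FileEntry f : files) {
            usage.merge(f.owner, f.length * f.replication, Long::sum);
        }
        return usage;
    }

    /** Raw bytes consumed (length x replication), summed per group. */
    static Map<String, Long> byGroup(List<FileEntry> files) {
        Map<String, Long> usage = new TreeMap<>();
        for (FileEntry f : files) {
            usage.merge(f.group, f.length * f.replication, Long::sum);
        }
        return usage;
    }

    public static void main(String[] args) {
        List<FileEntry> files = Arrays.asList(
            new FileEntry("alice", "eng", 100L, 3),
            new FileEntry("bob", "eng", 50L, 2),
            new FileEntry("alice", "ops", 10L, 1));
        System.out.println(byOwner(files)); // {alice=310, bob=100}
        System.out.println(byGroup(files)); // {eng=400, ops=10}
    }
}

/** Hypothetical stand-in for one file record read from a checkpointed fsimage. */
final class FileEntry {
    final String owner;
    final String group;
    final long length;      // file length in bytes, before replication
    final int replication;  // replication factor

    FileEntry(String owner, String group, long length, int replication) {
        this.owner = owner;
        this.group = group;
        this.length = length;
        this.replication = replication;
    }
}
```

        Because the whole snapshot is rescanned each time, the result is only as fresh as the last checkpoint, which is the downside mentioned above.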

        Vinayakumar B added a comment -

        Good one Ravi.

        I think calculating this on the Secondary NN side is OK. But I feel that making users navigate to the SNN page just to get these statistics is not a good idea.
        How about tracking these in the NameNode from startup and updating them (like other metrics) on every operation that modifies them? That avoids recalculating the whole set of statistics later, which would mean holding the namesystem lock for longer.
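
        For illustration only: a minimal sketch of incremental bookkeeping, where each modifying operation adjusts running totals instead of triggering a full recount. `IncrementalUsage` and its method names are hypothetical, not an existing HDFS API, and the sketch assumes every call is made while the caller already holds the namesystem lock, so it does no locking of its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Running per-user and per-group byte totals (hypothetical, not thread-safe). */
final class IncrementalUsage {
    private final Map<String, Long> byUser = new HashMap<>();
    private final Map<String, Long> byGroup = new HashMap<>();

    /** Called when a file of the given size is created. */
    void onCreate(String user, String group, long bytes) {
        add(byUser, user, bytes);
        add(byGroup, group, bytes);
    }

    /** Called when a file of the given size is deleted. */
    void onDelete(String user, String group, long bytes) {
        add(byUser, user, -bytes);
        add(byGroup, group, -bytes);
    }

    /** A chown moves the file's bytes from the old owner/group to the new one. */
    void onChown(String oldUser, String oldGroup,
                 String newUser, String newGroup, long bytes) {
        onDelete(oldUser, oldGroup, bytes);
        onCreate(newUser, newGroup, bytes);
    }

    long userBytes(String user) {
        return byUser.getOrDefault(user, 0L);
    }

    long groupBytes(String group) {
        return byGroup.getOrDefault(group, 0L);
    }

    private static void add(Map<String, Long> totals, String key, long delta) {
        long updated = totals.getOrDefault(key, 0L) + delta;
        if (updated == 0L) {
            totals.remove(key); // drop empty entries so the maps stay small
        } else {
            totals.put(key, updated);
        }
    }

    public static void main(String[] args) {
        IncrementalUsage u = new IncrementalUsage();
        u.onCreate("alice", "eng", 100L);
        u.onCreate("bob", "eng", 40L);
        u.onChown("alice", "eng", "bob", "ops", 100L);
        System.out.println(u.userBytes("alice")); // 0
        System.out.println(u.userBytes("bob"));   // 140
        System.out.println(u.groupBytes("eng"));  // 40
    }
}
```

        Each update is a couple of map operations, so the cost per modifying op is small, but any operation that is not hooked leaves the totals wrong until the next full recount.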

        Ravi Prakash added a comment -

        Hi Vinayakumar!
        Thanks for your feedback! I considered that option, and I wondered what the overhead might be (during startup + every modifying op). I guess we won't really know unless we have a working prototype/solution.
        I did this as a side hack, so I can try to continue hacking on this at a very slow pace, or if you / someone wants to take it over and get it done sooner, please feel free to assign it to yourself.

        Ravi Prakash added a comment -

        Ok! Here's the skeleton code that has come out of my attempt to add this functionality to the NameNode. DISCLAIMER: This patch is not ready and I'm uploading it only so that you folks can see what I'm thinking so far.

        I would request feedback on the following (and whatever else you think of):
        1. Should HdfsUsageMetricsSource be thread safe? Should I just assume the FSN write lock is always held when calling into here?
        2. I understand that we need to plug into a LOT of places to update the stats correctly. So far I have plugged into only 2-3 places (create / delete / chown on files and dirs), so the usage will obviously be incorrect for any other op, and even these have wrinkles I need to smooth out. I propose we handle the rest in a separate sub-task after the framework gets committed.
        3. I still need to figure out how best to make this configurable for any of the HDFS daemons: NameNode/Standby/SecondaryNameNode.
        4. I also need a way to enable and disable this feature dynamically.
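
        For illustration only, on points 1 and 4: one option that does not rely on the FSN write lock being held is to keep the counters in concurrent structures and gate updates behind a runtime flag. `ToggleableUsage` is a hypothetical class, not the `HdfsUsageMetricsSource` from the patch, and clearing the counters when the feature is switched off is an assumption of this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Thread-safe per-user byte counters with a runtime on/off switch (hypothetical). */
final class ToggleableUsage {
    private final ConcurrentHashMap<String, LongAdder> byUser = new ConcurrentHashMap<>();
    private volatile boolean enabled = true;

    /** Turns tracking on or off; counts are dropped when it is turned off. */
    void setEnabled(boolean on) {
        enabled = on;
        if (!on) {
            // Updates missed while disabled would make old totals stale.
            // An update racing with this call may still slip in afterwards.
            byUser.clear();
        }
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Adds (or, with a negative delta, subtracts) bytes for a user. */
    void addBytes(String user, long delta) {
        if (!enabled) {
            return;
        }
        byUser.computeIfAbsent(user, k -> new LongAdder()).add(delta);
    }

    long userBytes(String user) {
        LongAdder adder = byUser.get(user);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        ToggleableUsage u = new ToggleableUsage();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                u.addBytes("alice", 1L);
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(u.userBytes("alice")); // 2000

        u.setEnabled(false);
        u.addBytes("alice", 5L);                  // ignored while disabled
        System.out.println(u.userBytes("alice")); // 0
    }
}
```

        The sketch does not rebuild totals when tracking is switched back on; that would need a full recount of existing files, which is the expensive step the incremental approach tries to avoid.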

        Tsz Wo Nicholas Sze added a comment -

        > I think calculating in Secondary NN side is OK. But I have a feeling like, just to get these statistics user needs to navigate to SNN page is not a good idea.

        How about adding a link to SNN in the namenode web page?


          People

          • Assignee: Ravi Prakash
          • Reporter: Ravi Prakash
          • Votes: 0
          • Watchers: 10
