Hadoop HDFS
  HDFS-2353

Review our analysis of the HDFS architecture and its refactoring opportunities

    Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.21.0
    • Fix Version/s: None
    • Component/s: documentation
    • Labels:
    • Environment:

      N/A.

      Description

      We are currently capturing (reverse engineering and documenting) a representation of the module structure of the Hadoop Distributed File System (HDFS), version 0.21. We are doing this by analyzing the HDFS source code, specifically the dependencies in the code. By looking for modularity violations using this representation, we are able to suggest a number of improvements to the modularity of HDFS.

      We would like input from you, the contributors to HDFS, on the correctness of the refactoring possibilities that we have identified, and on the feasibility of actually making the proposed changes. In return, you get documentation of the module structure of HDFS, and a possibly useful list of changes that could be made to improve the maintainability of the source code.
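
      As a rough illustration of the kind of dependency analysis described above (not the tooling actually used for this analysis), the sketch below flags one simple kind of modularity violation: pairs of modules that depend on each other. The function name, the input format, and the class and module names are all hypothetical, and "mutual dependency between modules" is only an assumed stand-in for whatever violation criteria the analysis really applies.

```python
from collections import defaultdict


def find_cyclic_pairs(class_deps, module_of):
    """Return sorted pairs of modules that depend on each other.

    class_deps: maps a class name to the set of class names it references.
    module_of:  maps a class name to the module (e.g. package) it lives in.
    Both inputs are hypothetical; a real analysis would extract them
    from the source code or bytecode.
    """
    # Lift class-level dependencies to module-level dependencies,
    # ignoring references that stay inside a single module.
    module_deps = defaultdict(set)
    for src, targets in class_deps.items():
        for dst in targets:
            a, b = module_of[src], module_of[dst]
            if a != b:
                module_deps[a].add(b)

    # A pair (a, b) is reported when a depends on b and b depends on a.
    pairs = set()
    for a, deps in module_deps.items():
        for b in deps:
            if a in module_deps.get(b, ()):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Made-up example: Alpha and Gamma live in module m1, Beta in module m2.
example_deps = {"Alpha": {"Beta"}, "Beta": {"Gamma"}, "Gamma": {"Alpha"}}
example_modules = {"Alpha": "m1", "Beta": "m2", "Gamma": "m1"}
print(find_cyclic_pairs(example_deps, example_modules))
```

      Lifting class-level edges to module-level edges first keeps the check cheap, while the original class-level edges remain available for tracing each reported pair back to the classes that cause it.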

        Activity

        Todd Lipcon added a comment -

        You might look at recent issues assigned to Tsz Wo (Nicholas) Sze - he did a lot of work recently. Some other ones are HDFS-2197, HDFS-2225, HDFS-2180, some of the subtasks of HDFS-1073, etc.

        Florian Uunk added a comment -

        Thanks for your reply. We will look into creating an updated version for the current trunk. Could you (or other contributors) point us to some issues that contain the refactoring you mention? We would like to compare them with the ones we propose in our document to see whether there is any correlation.

        Thanks in advance!

        Todd Lipcon added a comment -

        Might be more relevant to run this on trunk, where a bunch of refactoring has already gone on since 0.21. Additionally, looking at packages is less relevant IMO than looking at particular classes (e.g., FSNamesystem, FSImage, NNStorage, NameNode, and FSEditLog have been a source of messy dependencies in the past).

        My guess is that no one really has time to do "refactoring for the sake of refactoring", but the results would at least be an interesting read.

        Florian Uunk added a comment -

        Our reconstruction with suggested refactoring opportunities


          People

          • Assignee: Unassigned
          • Reporter: Florian Uunk
          • Votes: 0
          • Watchers: 11

              Time Tracking

              • Original Estimate: 2h
              • Remaining Estimate: 2h
              • Time Spent: Not Specified
