  1. Hadoop HDFS
  2. HDFS-10285

Storage Policy Satisfier in HDFS

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HDFS-10285
    • Fix Version/s: HDFS-10285, 3.2.0
    • Component/s: datanode, namenode
    • Labels:
      None
    • Release Note:
      StoragePolicySatisfier (SPS) allows users to track and satisfy the storage policy requirement of a given file or directory in HDFS. Users can specify a file or directory path by invoking the "hdfs storagepolicies -satisfyStoragePolicy -path <path>" command or via the HdfsAdmin#satisfyStoragePolicy(path) API. For blocks that have storage policy mismatches, it moves the replicas to a different storage type in order to fulfill the storage policy requirement. Since the API calls go to the NN for tracking the invoked satisfier paths (iNodes), the administrator needs to enable the 'dfs.storage.policy.satisfier.mode' config at the NN to allow these operations. It can be enabled by setting 'dfs.storage.policy.satisfier.mode' to 'external' in hdfs-site.xml. The config can be disabled dynamically without restarting the Namenode. SPS should be started outside the Namenode using "hdfs --daemon start sps". An administrator who wants to run the Mover tool explicitly should first disable SPS and then run Mover. See the "Storage Policy Satisfier (SPS)" section in the Archival Storage guide for detailed usage.
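
      As a minimal sketch of the configuration step described in the release note, the hdfs-site.xml entry on the Namenode might look like the fragment below. The property name and the 'external' value come from the release note; the surrounding layout is the usual Hadoop property format, and any other SPS tuning properties are deliberately left out.

```xml
<!-- hdfs-site.xml on the Namenode: allow satisfyStoragePolicy requests
     and expect SPS to run as an external process -->
<property>
  <name>dfs.storage.policy.satisfier.mode</name>
  <value>external</value>
</property>
```

      With this in place, the external service is started with "hdfs --daemon start sps", and paths are submitted with "hdfs storagepolicies -satisfyStoragePolicy -path <path>", as stated in the release note.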

      Description

      Heterogeneous storage in HDFS introduced the concept of storage policies. A policy can be set on a directory or file to express the user's preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks take advantage of the policy preference and are stored accordingly.

      If the user sets the storage policy after the file has been written and completed, the blocks will already have been written with the default storage policy (that is, DISK). The user then has to run the 'Mover tool' explicitly, specifying all such file names as a list. In some distributed-system scenarios (e.g. HBase), it is difficult to collect all the files and run the tool, because different nodes can write files separately and the files can have different paths.
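
      To make the idea of a "storage policy mismatch" concrete, here is a small self-contained Java sketch. It is an illustrative model only, not the HDFS or Mover code: the class and method names are hypothetical, and the per-policy storage types (HOT: all DISK; COLD: all ARCHIVE; WARM: one DISK, rest ARCHIVE; ONE_SSD: one SSD, rest DISK; ALL_SSD: all SSD) are taken from my reading of the Archival Storage guide. It compares where the replicas of a block currently sit with what the policy expects and lists the moves that would close the gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; these are not the HDFS classes. */
final class PolicyMismatchSketch {

    enum StorageType { DISK, SSD, ARCHIVE }

    /** Storage types a policy expects for one block with the given replication. */
    static List<StorageType> expectedTypes(String policy, int replication) {
        List<StorageType> expected = new ArrayList<>();
        for (int i = 0; i < replication; i++) {
            switch (policy) {
                case "HOT":     expected.add(StorageType.DISK); break;
                case "COLD":    expected.add(StorageType.ARCHIVE); break;
                case "ALL_SSD": expected.add(StorageType.SSD); break;
                case "WARM":    expected.add(i == 0 ? StorageType.DISK : StorageType.ARCHIVE); break;
                case "ONE_SSD": expected.add(i == 0 ? StorageType.SSD : StorageType.DISK); break;
                default: throw new IllegalArgumentException("Unknown policy: " + policy);
            }
        }
        return expected;
    }

    /** Replica moves ("FROM->TO") needed to turn the existing placement into the expected one. */
    static List<String> plannedMoves(List<StorageType> existing, List<StorageType> expected) {
        List<StorageType> surplus = new ArrayList<>(existing); // replicas on a type the policy does not want
        List<StorageType> missing = new ArrayList<>();         // types the policy wants but no replica has
        for (StorageType wanted : expected) {
            if (!surplus.remove(wanted)) { // remove(Object): drops one matching replica if present
                missing.add(wanted);
            }
        }
        List<String> moves = new ArrayList<>();
        for (int i = 0; i < Math.min(surplus.size(), missing.size()); i++) {
            moves.add(surplus.get(i) + "->" + missing.get(i));
        }
        return moves;
    }

    public static void main(String[] args) {
        // File written under the default policy, then switched to COLD afterwards.
        List<StorageType> onDisk =
            Arrays.asList(StorageType.DISK, StorageType.DISK, StorageType.DISK);
        System.out.println(plannedMoves(onDisk, expectedTypes("COLD", 3)));
    }
}
```

      In this model a file written under the default policy and later switched to COLD needs every replica moved, while switching to WARM leaves one replica on DISK untouched.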

      Another scenario is a rename: when the user renames a file from a directory with one effective storage policy (inherited from the parent directory) into a directory with a different effective storage policy, the inherited policy is not copied from the source. The policy of the destination file/directory's parent takes effect instead. The rename operation is just a metadata change in the Namenode, so the physical blocks still remain placed according to the source storage policy.
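
      The rename case can be illustrated with a toy namespace model in Java. This is a sketch under stated assumptions, not the Namenode's INode logic: the class, its methods and the example paths are hypothetical, and the fallback to HOT when no ancestor has a policy reflects the default mentioned above. The point it shows is that the effective policy is resolved from the current parent chain, so it changes on rename even though no block has moved.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy namespace model; not the HDFS INode implementation. */
final class RenamePolicySketch {

    /** Explicitly set policies, keyed by absolute path such as "/archive". */
    private final Map<String, String> explicitPolicy = new HashMap<>();

    void setStoragePolicy(String path, String policy) {
        explicitPolicy.put(path, policy);
    }

    /**
     * Nearest explicitly set policy on the path or one of its ancestors.
     * Falls back to HOT (all replicas on DISK) when nothing is set.
     * Assumes an absolute path that starts with "/".
     */
    String effectivePolicy(String path) {
        String current = path;
        while (!current.isEmpty()) {
            String policy = explicitPolicy.get(current);
            if (policy != null) {
                return policy;
            }
            // Step to the parent; becomes "" once we are above the top-level entry.
            current = current.substring(0, current.lastIndexOf('/'));
        }
        return explicitPolicy.getOrDefault("/", "HOT");
    }

    public static void main(String[] args) {
        RenamePolicySketch ns = new RenamePolicySketch();
        ns.setStoragePolicy("/archive", "COLD");

        // The file was written under /staging, so its blocks follow this policy.
        String before = ns.effectivePolicy("/staging/part-0001");
        // After a rename into /archive only the metadata changes.
        String after = ns.effectivePolicy("/archive/part-0001");

        System.out.println("policy before rename: " + before);
        System.out.println("policy after rename:  " + after);
        System.out.println("blocks need moving:   " + !before.equals(after));
    }
}
```

      Whenever the two policies differ, the blocks are still placed according to the old one, which is exactly the gap a satisfier request on the new path is meant to close.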

      So, tracking all such file names, which depend on business logic, across distributed nodes (e.g. region servers) and then running the Mover tool could be difficult for admins.

      The proposal here is to provide an API in the Namenode itself to trigger storage policy satisfaction. A daemon thread inside the Namenode should track such calls and send them to the DNs as movement commands.
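
      The shape of this proposal (an API call that only records the request, plus a background thread that turns queued requests into movement commands) can be sketched in plain Java as follows. This is a hypothetical illustration of the pattern, not the SPS implementation: the class name, the string-based "MOVE_BLOCKS" command and the stop sentinel are all invented for the example, and the real feature deals with inodes, block locations and datanode protocols that are omitted here.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of "API enqueues, daemon thread dispatches". */
final class SatisfierDaemonSketch {

    // Unique sentinel object; compared by identity so no real path can collide with it.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final List<String> dispatched = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::drain, "sps-sketch");

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    /** API entry point: records the request and returns immediately. */
    void satisfyStoragePolicy(String path) {
        pending.add(path);
    }

    /** Lets the worker finish everything queued so far, then waits for it to exit. */
    void stopAndWait() {
        pending.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Commands the worker has produced, in the order requests were made. */
    List<String> dispatchedCommands() {
        return dispatched;
    }

    private void drain() {
        try {
            while (true) {
                String path = pending.take();        // blocks until a request arrives
                if (path == STOP) {
                    return;
                }
                dispatched.add("MOVE_BLOCKS " + path); // stand-in for a command sent to a DN
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SatisfierDaemonSketch sps = new SatisfierDaemonSketch();
        sps.start();
        sps.satisfyStoragePolicy("/archive/part-0001");
        sps.satisfyStoragePolicy("/archive/part-0002");
        sps.stopAndWait();
        System.out.println(sps.dispatchedCommands());
    }
}
```

      Keeping the API call to a simple enqueue means callers are not blocked while block movements are analysed and scheduled in the background.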

      Will post the detailed design thoughts document soon.

        Attachments

        1. HDFS-10285-consolidated-merge-patch-00.patch
          351 kB
          Rakesh Radhakrishnan
        2. HDFS-10285-consolidated-merge-patch-01.patch
          351 kB
          Rakesh Radhakrishnan
        3. HDFS-10285-consolidated-merge-patch-02.patch
          453 kB
          Rakesh Radhakrishnan
        4. HDFS-10285-consolidated-merge-patch-03.patch
          458 kB
          Rakesh Radhakrishnan
        5. HDFS-10285-consolidated-merge-patch-04.patch
          540 kB
          Rakesh Radhakrishnan
        6. HDFS-10285-consolidated-merge-patch-05.patch
          584 kB
          Rakesh Radhakrishnan
        7. HDFS-SPS-TestReport-20170708.pdf
          55 kB
          Surendra Singh Lilhore
        8. HDFS SPS Test Report-31July2018-v1.pdf
          516 kB
          Surendra Singh Lilhore
        9. SPS Modularization.pdf
          182 kB
          Uma Maheswara Rao G
        10. Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf
          960 kB
          Rakesh Radhakrishnan
        11. Storage-Policy-Satisfier-in-HDFS-May10.pdf
          276 kB
          Uma Maheswara Rao G
        12. Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
          1.51 MB
          Rakesh Radhakrishnan

          Issue Links

          1. [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work (Sub-task, Resolved, Rakesh Radhakrishnan)
          2. [SPS]: Daemon thread in Namenode to find blocks placed in other storage than what the policy specifies (Sub-task, Resolved, Uma Maheswara Rao G)
          3. [SPS]: Protocol buffer changes for sending storage movement commands from NN to DN (Sub-task, Resolved, Rakesh Radhakrishnan)
          4. [SPS]: Add satisfyStoragePolicy API in HdfsAdmin (Sub-task, Resolved, Yuanbo Liu)
          5. [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN (Sub-task, Resolved, Rakesh Radhakrishnan)
          6. [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier is on (Sub-task, Resolved, Wei Zhou)
          7. [SPS]: Provide mechanism to send blocks movement result back to NN from coordinator DN (Sub-task, Resolved, Rakesh Radhakrishnan)
          8. [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs (Sub-task, Resolved, Uma Maheswara Rao G)
          9. [SPS]: Handling of block movement failure at the coordinator datanode (Sub-task, Resolved, Rakesh Radhakrishnan)
          10. [SPS]: Provide unique trackID to track the block movement sends to coordinator (Sub-task, Resolved, Rakesh Radhakrishnan)
          11. [SPS] Make storage policy satisfier daemon work on/off dynamically (Sub-task, Resolved, Uma Maheswara Rao G)
          12. [SPS]: StoragePolicySatisfier should gracefully handle when there is no target node with the required storage type (Sub-task, Resolved, Rakesh Radhakrishnan)
          13. [SPS]: Handle partial block location movements (Sub-task, Resolved, Rakesh Radhakrishnan)
          14. [SPS]: Erasure coded files should be considered for satisfying storage policy (Sub-task, Resolved, Rakesh Radhakrishnan)
          15. [SPS]: Make SPS movement monitor timeouts configurable (Sub-task, Resolved, Uma Maheswara Rao G)
          16. [SPS]: Local DN should be given preference as source node, when target available in same node (Sub-task, Resolved, Uma Maheswara Rao G)
          17. [SPS]: Provide persistence when satisfying storage policy. (Sub-task, Resolved, Yuanbo Liu)
          18. [SPS]: Daemon thread of SPS should start only in Active NN (Sub-task, Resolved, Wei Zhou)
          19. [SPS]: chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target (Sub-task, Resolved, Uma Maheswara Rao G)
          20. [SPS]: Add a protocol command from NN to DN for dropping the SPS work and queues (Sub-task, Resolved, Uma Maheswara Rao G)
          21. [SPS]: Check Mover file ID lease also to determine whether Mover is running (Sub-task, Resolved, Wei Zhou)
          22. [SPS]: Remove xAttrs when movements done or SPS disabled (Sub-task, Resolved, Yuanbo Liu)
          23. [SPS]: Fix timeout issue in unit tests caused by longger NN down time (Sub-task, Resolved, Rakesh Radhakrishnan)
          24. [SPS]: NN switch and rescheduling movements can lead to have more than one coordinator for same file blocks (Sub-task, Resolved, Rakesh Radhakrishnan)
          25. [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file (Sub-task, Resolved, Uma Maheswara Rao G)
          26. [SPS]: fix issue of moving blocks with satisfier while changing replication factor (Sub-task, Resolved, Yuanbo Liu)
          27. [SPS]: Namenode failed to start while loading SPS xAttrs from the edits log. (Sub-task, Resolved, Surendra Singh Lilhore)
          28. [SPS] : Empty files should be ignored in StoragePolicySatisfier. (Sub-task, Resolved, Surendra Singh Lilhore)
          29. [SPS] : Handle NPE in BlockStorageMovementTracker when dropSPSWork() called (Sub-task, Resolved, Surendra Singh Lilhore)
          30. [SPS] : StoragePolicySatisfier should not select same storage type as source and destination in same datanode. (Sub-task, Resolved, Surendra Singh Lilhore)
          31. [SPS] Correct the log in BlockStorageMovementAttemptedItems#blockStorageMovementResultCheck (Sub-task, Resolved, Surendra Singh Lilhore)
          32. [SPS]: Add CLI command for satisfy storage policy operations (Sub-task, Resolved, Surendra Singh Lilhore)
          33. [SPS]: Should give chance to satisfy the low redundant blocks before removing the xattr (Sub-task, Resolved, Surendra Singh Lilhore)
          34. [SPS]: Double checks to ensure that SPS/Mover are not running together (Sub-task, Resolved, Rakesh Radhakrishnan)
          35. [SPS]: Document the SPS feature (Sub-task, Resolved, Uma Maheswara Rao G)
          36. [SPS] : Fix TestStoragePolicySatisfierWithStripedFile#testSPSWhenFileHasLowRedundancyBlocks (Sub-task, Resolved, Surendra Singh Lilhore)
          37. [SPS]: Fix checkstyle warnings (Sub-task, Resolved, Rakesh Radhakrishnan)
          38. [SPS]: Re-arrange StoragePolicySatisfyWorker stopping sequence to improve thread cleanup time (Sub-task, Resolved, Rakesh Radhakrishnan)
          39. [SPS]: Fix review comments of StoragePolicySatisfier feature (Sub-task, Resolved, Rakesh Radhakrishnan)
          40. [SPS]: Optimize extended attributes for tracking SPS movements (Sub-task, Resolved, Surendra Singh Lilhore)
          41. [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy of all the files under the given dir (Sub-task, Resolved, Surendra Singh Lilhore)
          42. [SPS] : Block movement analysis should be done in read lock. (Sub-task, Resolved, Surendra Singh Lilhore)
          43. [SPS]: Refactor Co-ordinator datanode logic to track the block storage movements (Sub-task, Resolved, Rakesh Radhakrishnan)
          44. [SPS]: Provide an option to track the status of in progress requests (Sub-task, Resolved, Surendra Singh Lilhore)
          45. [SPS]: Improve storage policy satisfier configurations (Sub-task, Resolved, Surendra Singh Lilhore)
          46. [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, HDFS-12599 and HDFS-11968 commits (Sub-task, Resolved, Rakesh Radhakrishnan)
          47. [SPS]: Modularize the SPS code and expose necessary interfaces for external/internal implementations. (Sub-task, Resolved, Uma Maheswara Rao G)
          48. [SPS]: Move SPS classes to a separate package (Sub-task, Resolved, Rakesh Radhakrishnan)
          49. [SPS]: Reduce the locking and cleanup the Namesystem access (Sub-task, Resolved, Rakesh Radhakrishnan)
          50. [SPS]: Implement a mechanism to scan the files for external SPS (Sub-task, Resolved, Uma Maheswara Rao G)
          51. [SPS]: Implement a mechanism to do file block movements for external SPS (Sub-task, Resolved, Rakesh Radhakrishnan)
          52. [SPS] : Create start/stop script to start external SPS process. (Sub-task, Resolved, Surendra Singh Lilhore)
          53. [SPS]: Revisit configurations to make SPS service modes internal/external/none (Sub-task, Resolved, Rakesh Radhakrishnan)
          54. [SPS]: Provide External Context implementation. (Sub-task, Resolved, Uma Maheswara Rao G)
          55. [SPS]: Fix review comments of external storage policy satisfier (Sub-task, Resolved, Rakesh Radhakrishnan)
          56. [SPS]: Fix the branch review comments(Part1) (Sub-task, Resolved, Surendra Singh Lilhore)
          57. [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier (Sub-task, Resolved, Rakesh Radhakrishnan)
          58. [SPS]: Collects successfully moved block details via IBR (Sub-task, Resolved, Rakesh Radhakrishnan)
          59. [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls (Sub-task, Resolved, Rakesh Radhakrishnan)
          60. [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path (Sub-task, Resolved, Rakesh Radhakrishnan)
          61. [SPS]: Cleanup work for HDFS-10285 (Sub-task, Resolved, Rakesh Radhakrishnan)
          62. [SPS]: Fix the branch review comments (Sub-task, Resolved, Rakesh Radhakrishnan)
          63. [SPS] : Merge work for HDFS-10285 branch (Sub-task, Resolved, Rakesh Radhakrishnan)
          64. [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function (Sub-task, Resolved, Rakesh Radhakrishnan)

              People

              • Assignee: Uma Maheswara Rao G (umamaheswararao)
              • Reporter: Uma Maheswara Rao G (umamaheswararao)
              • Votes: 0
              • Watchers: 60
