Storage Policy Satisfier in HDFS

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HDFS-10285
    • Fix Version/s: HDFS-10285, 3.2.0
    • Component/s: datanode, namenode
    • Labels: None
    • Release Note: StoragePolicySatisfier (SPS) allows users to track and satisfy the storage policy requirement of a given file or directory in HDFS. A user can specify a file or directory path by invoking the "hdfs storagepolicies -satisfyStoragePolicy -path <path>" command or via the HdfsAdmin#satisfyStoragePolicy(path) API. For blocks that have storage policy mismatches, SPS moves the replicas to a different storage type in order to fulfill the storage policy requirement. Since the API calls go to the NN for tracking the invoked satisfier paths (iNodes), the administrator needs to enable the "dfs.storage.policy.satisfier.mode" config at the NN to allow these operations. It can be enabled by setting "dfs.storage.policy.satisfier.mode" to "external" in hdfs-site.xml. The config can be disabled dynamically without restarting the Namenode. SPS should be started outside the Namenode using "hdfs --daemon start sps". An administrator who wants to run the Mover tool explicitly should make sure to disable SPS first and then run Mover. See the "Storage Policy Satisfier (SPS)" section in the Archival Storage guide for detailed usage.
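
      The enablement steps described in the release note can be sketched as a small helper that assembles the configuration fragment and the two command lines in order. The property name, the value "external", and the commands are taken from the release note. The class name SpsSetupSketch, its method names, and the example path /data/archive are hypothetical and only illustrate the sequence; this is not part of Hadoop.

```java
import java.util.List;

/** Illustrative only: assembles the SPS setup steps named in the release note. */
public class SpsSetupSketch {

    /** hdfs-site.xml property that enables SPS in external mode at the Namenode. */
    public static String hdfsSiteProperty() {
        return String.join("\n",
            "<property>",
            "  <name>dfs.storage.policy.satisfier.mode</name>",
            "  <value>external</value>",
            "</property>");
    }

    /** Command that starts the SPS service outside the Namenode. */
    public static List<String> startSpsCommand() {
        return List.of("hdfs", "--daemon", "start", "sps");
    }

    /** Command that asks SPS to satisfy the storage policy of one path. */
    public static List<String> satisfyCommand(String path) {
        return List.of("hdfs", "storagepolicies", "-satisfyStoragePolicy", "-path", path);
    }

    public static void main(String[] args) {
        System.out.println("1. Add to hdfs-site.xml on the Namenode:");
        System.out.println(hdfsSiteProperty());
        System.out.println("2. " + String.join(" ", startSpsCommand()));
        // "/data/archive" is a made-up example path.
        System.out.println("3. " + String.join(" ", satisfyCommand("/data/archive")));
    }
}
```

      The argument lists could be handed to java.lang.ProcessBuilder on a host with the Hadoop client installed. Applications that already link against HDFS would more likely call HdfsAdmin#satisfyStoragePolicy(path) instead of shelling out.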

    Description

      Heterogeneous storage in HDFS introduced the concept of storage policies. A policy can be set on a directory or file to express the user's preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks take advantage of the policy and are stored accordingly.

      If the user sets the storage policy after the file has been written and completed, the blocks will already have been written with the default storage policy (that is, DISK). The user then has to run the Mover tool explicitly, specifying all such file names as a list. In some distributed scenarios (for example, HBase) it is difficult to collect all the files and run the tool, because different nodes write files separately and the files can have different paths.
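
      To make the mismatch concrete, here is a minimal sketch of how the storage types a policy expects can be compared with the storage types the replicas actually sit on. The replica layouts for HOT, WARM, COLD, ALL_SSD and ONE_SSD follow the built-in policies described in the Archival Storage guide. The class StorageMismatchSketch and its methods are hypothetical, and the pairing of surplus and missing types is a simplification, not the logic used by Mover or SPS.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of finding replicas whose storage type violates a policy. */
public class StorageMismatchSketch {

    /** Expected storage type per replica for a few built-in HDFS policies. */
    public static List<String> expectedTypes(String policy, int replication) {
        List<String> types = new ArrayList<>();
        for (int i = 0; i < replication; i++) {
            switch (policy) {
                case "HOT":     types.add("DISK"); break;
                case "COLD":    types.add("ARCHIVE"); break;
                case "ALL_SSD": types.add("SSD"); break;
                case "WARM":    types.add(i == 0 ? "DISK" : "ARCHIVE"); break;
                case "ONE_SSD": types.add(i == 0 ? "SSD" : "DISK"); break;
                default: throw new IllegalArgumentException("Unknown policy: " + policy);
            }
        }
        return types;
    }

    /** Returns moves as "FROM->TO" strings; an empty list means the policy is satisfied. */
    public static List<String> requiredMoves(String policy, List<String> actual) {
        List<String> expected = expectedTypes(policy, actual.size());
        List<String> surplus = new ArrayList<>(actual);
        List<String> missing = new ArrayList<>();
        for (String type : expected) {
            // A replica already on an expected type needs no movement.
            if (!surplus.remove(type)) {
                missing.add(type);
            }
        }
        // Both lists have the same length because expected and actual do.
        List<String> moves = new ArrayList<>();
        for (int i = 0; i < missing.size(); i++) {
            moves.add(surplus.get(i) + "->" + missing.get(i));
        }
        return moves;
    }

    public static void main(String[] args) {
        // A file written under the default policy has all replicas on DISK.
        List<String> written = List.of("DISK", "DISK", "DISK");
        System.out.println("HOT:  " + requiredMoves("HOT", written));
        System.out.println("COLD: " + requiredMoves("COLD", written));
        System.out.println("WARM: " + requiredMoves("WARM", written));
    }
}
```

      In this model a file written under the default policy needs no moves while the policy stays HOT, but setting COLD afterwards leaves every replica on the wrong storage type until something moves it.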

      Another scenario is rename. When a user renames a file from a location where one storage policy is in effect (inherited from the parent directory) to a directory where a different storage policy is in effect, the inherited storage policy is not copied from the source. Instead, the storage policy of the destination parent directory takes effect. The rename operation is only a metadata change in the Namenode, so the physical blocks still remain placed according to the source storage policy.
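
      The rename case can be illustrated with a toy namespace in which policies are set only on directories and files inherit from the nearest ancestor. Everything here (the class RenamePolicySketch, its methods, the paths /hot and /cold, and the reduction to just HOT and COLD) is a hypothetical simplification and does not reflect Namenode internals.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy namespace showing that a rename changes the effective policy but not block placement. */
public class RenamePolicySketch {
    private final Map<String, String> dirPolicy = new HashMap<>();    // directory -> policy
    private final Map<String, String> blockStorage = new HashMap<>(); // file -> storage type of its blocks

    public void setPolicy(String dir, String policy) {
        dirPolicy.put(dir, policy);
    }

    /** Writes a file: blocks land on the storage type of the policy in effect at write time. */
    public void writeFile(String file) {
        blockStorage.put(file, storageFor(effectivePolicy(file)));
    }

    /** Metadata-only rename: the recorded block storage is carried over untouched. */
    public void rename(String src, String dst) {
        blockStorage.put(dst, blockStorage.remove(src));
    }

    /** Walks up the parent directories until one with an explicit policy is found. */
    public String effectivePolicy(String path) {
        String current = path;
        int slash;
        while ((slash = current.lastIndexOf('/')) > 0) {
            current = current.substring(0, slash);
            String policy = dirPolicy.get(current);
            if (policy != null) {
                return policy;
            }
        }
        return "HOT"; // default policy when no ancestor sets one
    }

    public String blockStorageOf(String file) {
        return blockStorage.get(file);
    }

    /** True when the blocks are not on the storage type the effective policy asks for. */
    public boolean needsSatisfy(String file) {
        return !storageFor(effectivePolicy(file)).equals(blockStorage.get(file));
    }

    private static String storageFor(String policy) {
        return "COLD".equals(policy) ? "ARCHIVE" : "DISK";
    }

    public static void main(String[] args) {
        RenamePolicySketch ns = new RenamePolicySketch();
        ns.setPolicy("/hot", "HOT");
        ns.setPolicy("/cold", "COLD");
        ns.writeFile("/hot/f");
        ns.rename("/hot/f", "/cold/f");
        System.out.println("effective policy: " + ns.effectivePolicy("/cold/f"));
        System.out.println("block storage:    " + ns.blockStorageOf("/cold/f"));
        System.out.println("needs satisfy:    " + ns.needsSatisfy("/cold/f"));
    }
}
```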

      So tracking all such file names, which depend on business logic spread across distributed nodes (for example, region servers), and then running the Mover tool could be difficult for admins.

      The proposal here is to provide an API in the Namenode itself to trigger storage policy satisfaction. A daemon thread inside the Namenode should track such calls and send them to the DNs as movement commands.
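
      The shape of this proposal, an API call that only records the path plus a background daemon thread that turns recorded paths into movement commands, can be sketched as follows. This is a toy model under stated assumptions: the class SatisfierDaemonSketch, the "MOVE_BLOCKS" command string, and the awaitCommand helper are hypothetical, and a real implementation would compare block locations against the policy before issuing anything to a DN.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy model: an API enqueues paths, a daemon thread turns them into movement commands. */
public class SatisfierDaemonSketch {
    private final BlockingQueue<String> pendingPaths = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> movementCommands = new LinkedBlockingQueue<>();

    public SatisfierDaemonSketch() {
        Thread worker = new Thread(this::processLoop, "sps-sketch");
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive
        worker.start();
    }

    /** Stand-in for the proposed API call; it only records the path and returns. */
    public void satisfyStoragePolicy(String path) {
        pendingPaths.add(path);
    }

    private void processLoop() {
        try {
            while (true) {
                String path = pendingPaths.take(); // blocks until a request arrives
                // Hypothetical command format; real commands would name blocks and targets.
                movementCommands.put("MOVE_BLOCKS " + path);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }

    /** Waits up to timeoutMillis for the next command; returns null if none arrived. */
    public String awaitCommand(long timeoutMillis) {
        try {
            return movementCommands.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        SatisfierDaemonSketch tracker = new SatisfierDaemonSketch();
        tracker.satisfyStoragePolicy("/hbase/archive"); // made-up example path
        System.out.println(tracker.awaitCommand(2000));
    }
}
```

      Decoupling the call from the processing keeps the API call cheap for the caller, which matters when many distributed writers (such as region servers) each submit their own paths.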

      I will post a detailed design document soon.

      Attachments

        1. Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
          1.51 MB
          Rakesh Radhakrishnan
        2. Storage-Policy-Satisfier-in-HDFS-May10.pdf
          276 kB
          Uma Maheswara Rao G
        3. Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf
          960 kB
          Rakesh Radhakrishnan
        4. SPS Modularization.pdf
          182 kB
          Uma Maheswara Rao G
        5. HDFS SPS Test Report-31July2018-v1.pdf
          516 kB
          Surendra Singh Lilhore
        6. HDFS-SPS-TestReport-20170708.pdf
          55 kB
          Surendra Singh Lilhore
        7. HDFS-10285-consolidated-merge-patch-05.patch
          584 kB
          Rakesh Radhakrishnan
        8. HDFS-10285-consolidated-merge-patch-04.patch
          540 kB
          Rakesh Radhakrishnan
        9. HDFS-10285-consolidated-merge-patch-03.patch
          458 kB
          Rakesh Radhakrishnan
        10. HDFS-10285-consolidated-merge-patch-02.patch
          453 kB
          Rakesh Radhakrishnan
        11. HDFS-10285-consolidated-merge-patch-01.patch
          351 kB
          Rakesh Radhakrishnan
        12. HDFS-10285-consolidated-merge-patch-00.patch
          351 kB
          Rakesh Radhakrishnan

        Issue Links

          1. [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work (Sub-task, Resolved, Rakesh Radhakrishnan)
          2. [SPS]: Daemon thread in Namenode to find blocks placed in other storage than what the policy specifies (Sub-task, Resolved, Uma Maheswara Rao G)
          3. [SPS]: Protocol buffer changes for sending storage movement commands from NN to DN (Sub-task, Resolved, Rakesh Radhakrishnan)
          4. [SPS]: Add satisfyStoragePolicy API in HdfsAdmin (Sub-task, Resolved, Yuanbo Liu)
          5. [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN (Sub-task, Resolved, Rakesh Radhakrishnan)
          6. [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier is on (Sub-task, Resolved, Wei Zhou)
          7. [SPS]: Provide mechanism to send blocks movement result back to NN from coordinator DN (Sub-task, Resolved, Rakesh Radhakrishnan)
          8. [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs (Sub-task, Resolved, Uma Maheswara Rao G)
          9. [SPS]: Handling of block movement failure at the coordinator datanode (Sub-task, Resolved, Rakesh Radhakrishnan)
          10. [SPS]: Provide unique trackID to track the block movement sends to coordinator (Sub-task, Resolved, Rakesh Radhakrishnan)
          11. [SPS] Make storage policy satisfier daemon work on/off dynamically (Sub-task, Resolved, Uma Maheswara Rao G)
          12. [SPS]: StoragePolicySatisfier should gracefully handle when there is no target node with the required storage type (Sub-task, Resolved, Rakesh Radhakrishnan)
          13. [SPS]: Handle partial block location movements (Sub-task, Resolved, Rakesh Radhakrishnan)
          14. [SPS]: Erasure coded files should be considered for satisfying storage policy (Sub-task, Resolved, Rakesh Radhakrishnan)
          15. [SPS]: Make SPS movement monitor timeouts configurable (Sub-task, Resolved, Uma Maheswara Rao G)
          16. [SPS]: Local DN should be given preference as source node, when target available in same node (Sub-task, Resolved, Uma Maheswara Rao G)
          17. [SPS]: Provide persistence when satisfying storage policy. (Sub-task, Resolved, Yuanbo Liu)
          18. [SPS]: Daemon thread of SPS should start only in Active NN (Sub-task, Resolved, Wei Zhou)
          19. [SPS]: chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target (Sub-task, Resolved, Uma Maheswara Rao G)
          20. [SPS]: Add a protocol command from NN to DN for dropping the SPS work and queues (Sub-task, Resolved, Uma Maheswara Rao G)
          21. [SPS]: Check Mover file ID lease also to determine whether Mover is running (Sub-task, Resolved, Wei Zhou)
          22. [SPS]: Remove xAttrs when movements done or SPS disabled (Sub-task, Resolved, Yuanbo Liu)
          23. [SPS]: Fix timeout issue in unit tests caused by longger NN down time (Sub-task, Resolved, Rakesh Radhakrishnan)
          24. [SPS]: NN switch and rescheduling movements can lead to have more than one coordinator for same file blocks (Sub-task, Resolved, Rakesh Radhakrishnan)
          25. [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file (Sub-task, Resolved, Uma Maheswara Rao G)
          26. [SPS]: fix issue of moving blocks with satisfier while changing replication factor (Sub-task, Resolved, Yuanbo Liu)
          27. [SPS]: Namenode failed to start while loading SPS xAttrs from the edits log. (Sub-task, Resolved, Surendra Singh Lilhore)
          28. [SPS] : Empty files should be ignored in StoragePolicySatisfier. (Sub-task, Resolved, Surendra Singh Lilhore)
          29. [SPS] : Handle NPE in BlockStorageMovementTracker when dropSPSWork() called (Sub-task, Resolved, Surendra Singh Lilhore)
          30. [SPS] : StoragePolicySatisfier should not select same storage type as source and destination in same datanode. (Sub-task, Resolved, Surendra Singh Lilhore)
          31. [SPS] Correct the log in BlockStorageMovementAttemptedItems#blockStorageMovementResultCheck (Sub-task, Resolved, Surendra Singh Lilhore)
          32. [SPS]: Add CLI command for satisfy storage policy operations (Sub-task, Resolved, Surendra Singh Lilhore)
          33. [SPS]: Should give chance to satisfy the low redundant blocks before removing the xattr (Sub-task, Resolved, Surendra Singh Lilhore)
          34. [SPS]: Double checks to ensure that SPS/Mover are not running together (Sub-task, Resolved, Rakesh Radhakrishnan)
          35. [SPS]: Document the SPS feature (Sub-task, Resolved, Uma Maheswara Rao G)
          36. [SPS] : Fix TestStoragePolicySatisfierWithStripedFile#testSPSWhenFileHasLowRedundancyBlocks (Sub-task, Resolved, Surendra Singh Lilhore)
          37. [SPS]: Fix checkstyle warnings (Sub-task, Resolved, Rakesh Radhakrishnan)
          38. [SPS]: Re-arrange StoragePolicySatisfyWorker stopping sequence to improve thread cleanup time (Sub-task, Resolved, Rakesh Radhakrishnan)
          39. [SPS]: Fix review comments of StoragePolicySatisfier feature (Sub-task, Resolved, Rakesh Radhakrishnan)
          40. [SPS]: Optimize extended attributes for tracking SPS movements (Sub-task, Resolved, Surendra Singh Lilhore)
          41. [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy of all the files under the given dir (Sub-task, Resolved, Surendra Singh Lilhore)
          42. [SPS] : Block movement analysis should be done in read lock. (Sub-task, Resolved, Surendra Singh Lilhore)
          43. [SPS]: Refactor Co-ordinator datanode logic to track the block storage movements (Sub-task, Resolved, Rakesh Radhakrishnan)
          44. [SPS]: Provide an option to track the status of in progress requests (Sub-task, Resolved, Surendra Singh Lilhore)
          45. [SPS]: Improve storage policy satisfier configurations (Sub-task, Resolved, Surendra Singh Lilhore)
          46. [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, HDFS-12599 and HDFS-11968 commits (Sub-task, Resolved, Rakesh Radhakrishnan)
          47. [SPS]: Modularize the SPS code and expose necessary interfaces for external/internal implementations. (Sub-task, Resolved, Uma Maheswara Rao G)
          48. [SPS]: Move SPS classes to a separate package (Sub-task, Resolved, Rakesh Radhakrishnan)
          49. [SPS]: Reduce the locking and cleanup the Namesystem access (Sub-task, Resolved, Rakesh Radhakrishnan)
          50. [SPS]: Implement a mechanism to scan the files for external SPS (Sub-task, Resolved, Uma Maheswara Rao G)
          51. [SPS]: Implement a mechanism to do file block movements for external SPS (Sub-task, Resolved, Rakesh Radhakrishnan)
          52. [SPS] : Create start/stop script to start external SPS process. (Sub-task, Resolved, Surendra Singh Lilhore)
          53. [SPS]: Revisit configurations to make SPS service modes internal/external/none (Sub-task, Resolved, Rakesh Radhakrishnan)
          54. [SPS]: Provide External Context implementation. (Sub-task, Resolved, Uma Maheswara Rao G)
          55. [SPS]: Fix review comments of external storage policy satisfier (Sub-task, Resolved, Rakesh Radhakrishnan)
          56. [SPS]: Fix the branch review comments(Part1) (Sub-task, Resolved, Surendra Singh Lilhore)
          57. [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier (Sub-task, Resolved, Rakesh Radhakrishnan)
          58. [SPS]: Collects successfully moved block details via IBR (Sub-task, Resolved, Rakesh Radhakrishnan)
          59. [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls (Sub-task, Resolved, Rakesh Radhakrishnan)
          60. [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path (Sub-task, Resolved, Rakesh Radhakrishnan)
          61. [SPS]: Cleanup work for HDFS-10285 (Sub-task, Resolved, Rakesh Radhakrishnan)
          62. [SPS]: Fix the branch review comments (Sub-task, Resolved, Rakesh Radhakrishnan)
          63. [SPS] : Merge work for HDFS-10285 branch (Sub-task, Resolved, Rakesh Radhakrishnan)
          64. [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function (Sub-task, Resolved, Rakesh Radhakrishnan)
          65. [SPS]: Fix bug for unit test of reconfiguring SPS mode (Sub-task, Resolved, Tao Li; Original Estimate: Not Specified; Time Spent: 2h)
          66. [SPS]: Handle failure retries for moving tasks (Sub-task, Resolved, Tao Li; Original Estimate: Not Specified; Time Spent: 2h)
          67. [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS (Sub-task, Resolved, Tao Li; Original Estimate: Not Specified; Time Spent: 7h 50m)
          68. [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread (Sub-task, Resolved, qinyuren; Original Estimate: Not Specified; Time Spent: 4h 10m)
          69. [SPS]: allow re-satisfy path after restarting sps process (Sub-task, Open, Unassigned; Original Estimate: Not Specified; Time Spent: 0.5h)
          70. [SPS]: Expose metrics to JMX for external SPS (Sub-task, Resolved, Tao Li; Original Estimate: Not Specified; Time Spent: 5h)
          71. [SPS]: Should not start indefinitely while another SPS process is running (Sub-task, Resolved, Tao Li; Original Estimate: Not Specified; Time Spent: 2h 20m)

            People

              Assignee: Uma Maheswara Rao G (umamaheswararao)
              Reporter: Uma Maheswara Rao G (umamaheswararao)
              Votes: 0
              Watchers: 63

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 23h 50m