Storage Policy Satisfier in HDFS

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HDFS-10285
    • Fix Version/s: HDFS-10285, 3.2.0
    • Component/s: datanode, namenode
    • Labels:
      None
    • Release Note:
      StoragePolicySatisfier (SPS) allows users to track and satisfy the storage policy requirement of a given file or directory in HDFS. Users can specify a file or directory path by invoking the “hdfs storagepolicies -satisfyStoragePolicy -path <path>” command or via the HdfsAdmin#satisfyStoragePolicy(path) API. For blocks that have storage policy mismatches, SPS moves the replicas to a different storage type to fulfill the storage policy requirement. Since the API calls go to the NN, which tracks the invoked satisfier paths (iNodes), administrators need to enable the ‘dfs.storage.policy.satisfier.mode’ config at the NN to allow these operations. It is enabled by setting ‘dfs.storage.policy.satisfier.mode’ to ‘external’ in hdfs-site.xml. The config can be disabled dynamically without restarting the Namenode. SPS should be started outside the Namenode using "hdfs --daemon start sps". An administrator who wants to run the Mover tool explicitly should disable SPS first and then run Mover. See the "Storage Policy Satisfier (SPS)" section in the Archival Storage guide for detailed usage.

      Description

      Heterogeneous storage in HDFS introduced the concept of storage policy. These policies can be set on a directory or file to specify the user's preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks take advantage of the storage policy preference and are stored accordingly.

      If the user sets the storage policy after writing and completing the file, the blocks will already have been written with the default storage policy (in effect, DISK). The user then has to run the Mover tool explicitly, specifying all such file names as a list. In some distributed system scenarios (e.g., HBase), it is difficult to collect all the files and run the tool, because different nodes can write files separately and the files can have different paths.
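
To make the mismatch described above concrete, here is a minimal sketch of how the storage types still needed by a block could be computed. It is an illustrative model only, not the HDFS implementation: the names StorageKind, PolicyMismatchSketch, and missingTypes are hypothetical. The example assumes a file written with three replicas on DISK whose policy is later changed to ONE_SSD (one replica on SSD, the rest on DISK).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HDFS storage types; not the real Hadoop enum.
enum StorageKind { DISK, SSD, ARCHIVE }

class PolicyMismatchSketch {
    /**
     * Returns the storage types that the policy expects but that no
     * existing replica currently covers. Each existing replica cancels
     * at most one expected entry of the same type.
     */
    static List<StorageKind> missingTypes(List<StorageKind> expected,
                                          List<StorageKind> existing) {
        List<StorageKind> missing = new ArrayList<>(expected);
        for (StorageKind kind : existing) {
            missing.remove(kind); // removes one matching occurrence, if present
        }
        return missing;
    }

    public static void main(String[] args) {
        // Replicas were written under the default policy, all on DISK.
        List<StorageKind> existing =
            List.of(StorageKind.DISK, StorageKind.DISK, StorageKind.DISK);
        // Policy changed afterwards to ONE_SSD: one SSD replica, two DISK replicas.
        List<StorageKind> expected =
            List.of(StorageKind.SSD, StorageKind.DISK, StorageKind.DISK);
        System.out.println("Replicas to move to: " + missingTypes(expected, existing));
    }
}
```

An empty result means the block already satisfies the policy and no movement is needed.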

      Another scenario is a rename. When the user renames a file from a directory with one effective storage policy (inherited from the parent directory) into a directory with a different effective storage policy, the inherited policy is not carried over from the source. The file instead takes on the storage policy of the destination parent directory. The rename operation is only a metadata change in the Namenode, so the physical blocks remain placed according to the source storage policy.
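
The rename case can be illustrated with a small model of policy inheritance. This is a sketch under stated assumptions, not Namenode code: RenamePolicySketch and its methods are hypothetical, paths are assumed to be absolute, and the directories /warm and /cold are made up for the example. HOT is used as the fallback because it is the HDFS default policy.

```java
import java.util.HashMap;
import java.util.Map;

class RenamePolicySketch {
    // Policies set explicitly on directories, keyed by absolute path (hypothetical model).
    private final Map<String, String> explicitPolicy = new HashMap<>();

    void setPolicy(String dir, String policy) {
        explicitPolicy.put(dir, policy);
    }

    /** Walks up from the path towards "/" and returns the nearest explicitly set policy. */
    String effectivePolicy(String path) {
        String current = path;
        while (true) {
            String policy = explicitPolicy.get(current);
            if (policy != null) {
                return policy;
            }
            if (current.equals("/")) {
                return "HOT"; // nothing set anywhere on the way up: default policy
            }
            int slash = current.lastIndexOf('/');
            current = (slash <= 0) ? "/" : current.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        RenamePolicySketch ns = new RenamePolicySketch();
        ns.setPolicy("/warm", "WARM");
        ns.setPolicy("/cold", "COLD");

        // The blocks of /warm/f1 were placed according to this policy.
        String placedUnder = ns.effectivePolicy("/warm/f1");
        // After a rename to /cold/f1, only the metadata (the path) changes.
        String effectiveNow = ns.effectivePolicy("/cold/f1");

        System.out.println("placed under " + placedUnder + ", effective now " + effectiveNow);
    }
}
```

The gap between the policy the blocks were placed under and the policy now in effect is exactly what has to be satisfied after the rename.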

      So, tracking all such files, whose names depend on business logic, across distributed nodes (e.g., region servers) and then running the Mover tool on them could be difficult for admins.

      The proposal here is to provide an API in the Namenode itself to trigger storage policy satisfaction. A daemon thread inside the Namenode should track such calls and send them to the DNs as movement commands.
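
The shape of the proposal, an API call that only records a path plus a daemon thread that processes the recorded paths, could be sketched roughly as follows. Everything here is an assumption made for illustration: SatisfierDaemonSketch, the dispatcher callback, and the example paths are hypothetical, and the callback merely stands in for whatever sends movement commands to the DNs.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class SatisfierDaemonSketch {
    // Paths whose storage policy still has to be satisfied, in arrival order.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Thread worker;

    SatisfierDaemonSketch(Consumer<String> dispatcher) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String path = pending.take(); // blocks until a path is queued
                    dispatcher.accept(path);      // stand-in for issuing movement commands
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        }, "sps-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    /** Stand-in for the proposed API: records the path and returns immediately. */
    void satisfyStoragePolicy(String path) {
        pending.add(path);
    }

    void stop() {
        worker.interrupt();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> dispatched = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(2);
        SatisfierDaemonSketch sps = new SatisfierDaemonSketch(path -> {
            dispatched.add(path);
            done.countDown();
        });

        sps.satisfyStoragePolicy("/hbase/archive"); // hypothetical paths
        sps.satisfyStoragePolicy("/logs/2017");

        done.await(5, TimeUnit.SECONDS); // wait for the daemon to drain the queue
        System.out.println("Dispatched: " + dispatched);
        sps.stop();
    }
}
```

Because the caller only enqueues the path, the call returns quickly and the potentially slow block movement happens asynchronously, which matches the intent of tracking calls in a background thread.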

      Will post the detailed design thoughts document soon.

        Attachments

        1. HDFS SPS Test Report-31July2018-v1.pdf
          516 kB
          Surendra Singh Lilhore
        2. HDFS-10285-consolidated-merge-patch-05.patch
          584 kB
          Rakesh R
        3. HDFS-10285-consolidated-merge-patch-04.patch
          540 kB
          Rakesh R
        4. SPS Modularization.pdf
          182 kB
          Uma Maheswara Rao G
        5. HDFS-10285-consolidated-merge-patch-03.patch
          458 kB
          Rakesh R
        6. HDFS-10285-consolidated-merge-patch-02.patch
          453 kB
          Rakesh R
        7. Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
          1.51 MB
          Rakesh R
        8. HDFS-10285-consolidated-merge-patch-01.patch
          351 kB
          Rakesh R
        9. HDFS-10285-consolidated-merge-patch-00.patch
          351 kB
          Rakesh R
        10. HDFS-SPS-TestReport-20170708.pdf
          55 kB
          Surendra Singh Lilhore
        11. Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf
          960 kB
          Rakesh R
        12. Storage-Policy-Satisfier-in-HDFS-May10.pdf
          276 kB
          Uma Maheswara Rao G

          Issue Links

          There are no Sub-Tasks for this issue.

            Activity

              People

              • Assignee:
                umamaheswararao Uma Maheswara Rao G
                Reporter:
                umamaheswararao Uma Maheswara Rao G
              • Votes:
                0
                Watchers:
                60

                Dates

                • Created:
                  Updated:
                  Resolved: