Hadoop HDFS / HDFS-12090 Handling writes from HDFS to Provided storages / HDFS-11828

[PROVIDED Phase 2] Refactor FsDatasetImpl to use the BlockAlias from ClientProtocol for PROVIDED blocks.


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      From HDFS-11639:

      Virajith Jalaparti
      Looking over this patch, one thing that occurred to me is whether it makes sense to unify FileRegionProvider with BlockProvider, since they have very similar functionality.

      I like the use of BlockProvider#resolve(). If we unify FileRegionProvider with BlockProvider, then resolve() can return null if the block map is also accessible from the Datanodes. If it is accessible only from the Namenode, then a non-null value can be propagated to the Datanode.
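To make the resolve() semantics concrete, here is a minimal Java sketch of a unified provider on the Namenode side. ProvidedRegion, UnifiedBlockProvider, register() and the blockMapVisibleToDatanodes flag are hypothetical names chosen for illustration; they are not the actual HDFS FileRegion, BlockAlias or BlockProvider classes or signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified alias: the external file region backing a PROVIDED block.
// Not the real HDFS FileRegion/BlockAlias classes.
final class ProvidedRegion {
    final String path;
    final long offset;
    final long length;

    ProvidedRegion(String path, long offset, long length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical provider merging the FileRegionProvider and BlockProvider roles.
final class UnifiedBlockProvider {
    private final Map<Long, ProvidedRegion> blockMap = new HashMap<>();
    // Assumption: a deployment-level flag saying whether Datanodes can read the block map too.
    private final boolean blockMapVisibleToDatanodes;

    UnifiedBlockProvider(boolean blockMapVisibleToDatanodes) {
        this.blockMapVisibleToDatanodes = blockMapVisibleToDatanodes;
    }

    void register(long blockId, ProvidedRegion region) {
        blockMap.put(blockId, region);
    }

    // Namenode side: decide what, if anything, to send to the Datanode for this block.
    // null     -> the Datanode can look the block up itself (or the block is unknown).
    // non-null -> only the Namenode holds the mapping, so the alias travels with the request.
    ProvidedRegion resolve(long blockId) {
        if (blockMapVisibleToDatanodes) {
            return null;
        }
        return blockMap.get(blockId);
    }
}
```

Note that in this sketch a null result is ambiguous between "Datanode can resolve it" and "unknown block"; a real design would likely need to distinguish the two.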
      One of the motivations for adding the BlockAlias to the client protocol was to have the blocks map only on the Namenode. In this scenario, the ReplicaMap in FsDatasetImpl will not have any replicas a priori. Thus, one way to ensure that the FsDatasetImpl interface continues to function as it does today is to create a FinalizedProvidedReplica in FsDatasetImpl#getBlockInputStream when the BlockAlias is not null.
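A minimal sketch of this on-the-fly approach, assuming the alias arrives with the readBlock request and the replica map holds no PROVIDED replicas. RegionAlias, OnTheFlyProvidedReplica and AliasDrivenDataset are hypothetical stand-ins (not FsDatasetImpl, FinalizedProvidedReplica, or the real getBlockInputStream signature), and the external store is simulated with an in-memory map.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified alias: the byte range of an external file holding the block.
final class RegionAlias {
    final String path;
    final int offset;
    final int length;

    RegionAlias(String path, int offset, int length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical stand-in for FinalizedProvidedReplica, built on demand from an alias.
final class OnTheFlyProvidedReplica {
    final long blockId;
    final RegionAlias alias;

    OnTheFlyProvidedReplica(long blockId, RegionAlias alias) {
        this.blockId = blockId;
        this.alias = alias;
    }
}

// Hypothetical slice of a dataset whose replica map starts empty for PROVIDED blocks.
final class AliasDrivenDataset {
    // Simulated external store: path -> file contents.
    private final Map<String, byte[]> externalStore = new HashMap<>();

    void putExternalFile(String path, byte[] contents) {
        externalStore.put(path, contents);
    }

    // Builds the replica from the alias sent with readBlock, then opens a stream over its region.
    InputStream getBlockInputStream(long blockId, long seekOffset, RegionAlias alias) throws IOException {
        if (alias == null) {
            throw new IOException("No replica known a priori and no alias supplied for block " + blockId);
        }
        OnTheFlyProvidedReplica replica = new OnTheFlyProvidedReplica(blockId, alias);
        byte[] file = externalStore.get(replica.alias.path);
        if (file == null) {
            throw new IOException("External file not found: " + replica.alias.path);
        }
        if (seekOffset < 0 || seekOffset > replica.alias.length) {
            throw new IOException("Seek offset out of range: " + seekOffset);
        }
        int start = replica.alias.offset + (int) seekOffset;
        int len = replica.alias.length - (int) seekOffset;
        // ByteArrayInputStream(buf, offset, length) restricts reads to this window of the file.
        return new ByteArrayInputStream(file, start, len);
    }
}
```

Because the replica is rebuilt per request and never cached, the Datanode needs no block map of its own; the trade-off is that every read depends on the Namenode supplying a valid alias.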

      Ewan Higgs
      With the pending refactoring of FsDatasetImpl, which won't have replicas a priori, I wonder whether it makes sense for the Datanode to have a FileRegionProvider or BlockProvider at all. The Datanode is given the appropriate block ID and block alias in the readBlock or writeBlock message. Maybe I'm overlooking what is still being provided.

      Virajith Jalaparti
      I was trying to reconcile the existing design (FsDatasetImpl knows about PROVIDED blocks a priori) with the new design, where FsDatasetImpl does not know about them beforehand but constructs them on the fly using the BlockAlias from readBlock or writeBlock. Using BlockProvider#resolve() allows both designs to exist in parallel. I was wondering whether we should still retain the earlier design given the latter.
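Running both designs in parallel could look roughly like the following sketch: a null alias (what resolve() returns when the Datanode can read the block map) falls back to replicas loaded a priori, while a non-null alias builds a replica on the fly. HybridAlias, HybridReplica, HybridDataset, loadAPriori() and getReplica() are hypothetical names, not HDFS code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal alias; not Hadoop's BlockAlias interface.
final class HybridAlias {
    final String location;

    HybridAlias(String location) {
        this.location = location;
    }
}

// Hypothetical replica record that remembers which design produced it.
final class HybridReplica {
    final long blockId;
    final String location;
    final boolean knownAPriori;

    HybridReplica(long blockId, String location, boolean knownAPriori) {
        this.blockId = blockId;
        this.location = location;
        this.knownAPriori = knownAPriori;
    }
}

final class HybridDataset {
    // Existing design: filled at startup from a block map the Datanode can read.
    private final Map<Long, HybridReplica> replicaMap = new HashMap<>();

    void loadAPriori(long blockId, String location) {
        replicaMap.put(blockId, new HybridReplica(blockId, location, true));
    }

    // aliasFromRequest == null: the Namenode's resolve() signalled that the Datanode
    // already knows the block, so use the pre-loaded replica map.
    // aliasFromRequest != null: new design, build the replica on the fly.
    HybridReplica getReplica(long blockId, HybridAlias aliasFromRequest) {
        if (aliasFromRequest == null) {
            HybridReplica known = replicaMap.get(blockId);
            if (known == null) {
                throw new IllegalStateException("Block " + blockId + " not known a priori and no alias given");
            }
            return known;
        }
        // Not stored in replicaMap, so the map never holds Namenode-only blocks.
        return new HybridReplica(blockId, aliasFromRequest.location, false);
    }
}
```

Dropping the first branch (and the a priori replica map) would correspond to retaining only the latter design.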


            People

            • Assignee:
              Ewan Higgs (ehiggs)
              Reporter:
              Ewan Higgs (ehiggs)
            • Votes:
              0
              Watchers:
              5
