  1. Hadoop HDFS
  2. HDFS-2139

Fast copy for HDFS.

Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved

    Description

      There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as
      follows:

      1) Query metadata for all blocks of the source file.

      2) For each block 'b' of the file, find out its datanode locations.

      3) For each block of the file, add an empty block to the namesystem for
      the destination file.

      4) For each location of the block, instruct the datanode to make a local
      copy of that block.

      5) Once a datanode has copied its respective blocks, it reports the
      new blocks to the namenode.

      6) Wait for all blocks to be copied and exit.
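
      The six steps above can be sketched as a small in-memory model. This is a
      minimal Python sketch, not Hadoop code: NameNode, DataNode, fast_copy and
      make_demo_cluster are hypothetical names invented for illustration, and the
      copy runs synchronously, whereas real datanodes would presumably copy and
      report asynchronously.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy datanode: maps block id -> block bytes held on local disk."""
    name: str
    blocks: dict = field(default_factory=dict)

    def copy_block_locally(self, src_id, dst_id):
        # Step 4: duplicate the block on this node; no network transfer.
        self.blocks[dst_id] = bytes(self.blocks[src_id])


@dataclass
class NameNode:
    """Toy namenode: path -> block ids, block id -> names of holding nodes."""
    files: dict = field(default_factory=dict)
    locations: dict = field(default_factory=dict)
    next_id: int = 0

    def add_empty_block(self, path):
        # Step 3: allocate a block for `path` that has no replicas yet.
        block_id = self.next_id
        self.next_id += 1
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = set()
        return block_id

    def block_received(self, block_id, node_name):
        # Step 5: a datanode reports that it now holds a replica.
        self.locations[block_id].add(node_name)


def fast_copy(nn, datanodes, src, dst):
    """Copy `src` to `dst` block by block; return True once all replicas exist."""
    pending = {}
    for src_block in list(nn.files[src]):           # step 1: block metadata
        holders = set(nn.locations[src_block])      # step 2: block locations
        dst_block = nn.add_empty_block(dst)         # step 3: empty target block
        pending[dst_block] = holders
        for name in sorted(holders):                # step 4: local copies
            datanodes[name].copy_block_locally(src_block, dst_block)
            nn.block_received(dst_block, name)      # step 5: report to namenode
    # Step 6: in this synchronous model the wait reduces to a single check.
    return all(nn.locations[b] == expected for b, expected in pending.items())


def make_demo_cluster():
    """Build three toy datanodes and a two-block file /src (made-up layout)."""
    nn = NameNode()
    dns = {n: DataNode(n) for n in ("dn1", "dn2", "dn3")}
    layout = [(b"block-0", ("dn1", "dn2")), (b"block-1", ("dn2", "dn3"))]
    for data, holders in layout:
        block_id = nn.add_empty_block("/src")
        for name in holders:
            dns[name].blocks[block_id] = data
            nn.block_received(block_id, name)
    return nn, dns


if __name__ == "__main__":
    nn, dns = make_demo_cluster()
    print(fast_copy(nn, dns, "/src", "/dst"))
```

      With the demo cluster, fast_copy(nn, dns, "/src", "/dst") returns True and
      /dst ends up with two new blocks placed on the same datanodes as the
      corresponding source blocks.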

      This would speed up the copying process considerably, because every block
      is copied locally on its datanode and no block data has to cross the
      top-of-rack switches.
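
      A rough back-of-the-envelope estimate shows where the saving comes from.
      The model below is an assumption made for illustration, not a measurement:
      a regular copy is taken to read every byte once over the network and to
      write it through a pipeline to each replica, while a fast copy moves no
      block data between nodes. It counts all node-to-node traffic (only part of
      which crosses racks) and ignores metadata RPCs and any client locality.

```python
def network_bytes_regular_copy(file_size, replication):
    # Assumed model: one network read by the client, plus one network
    # transfer per replica while writing through the pipeline.
    return file_size * (1 + replication)


def network_bytes_fast_copy(file_size, replication):
    # Assumed model: each replica is duplicated on the datanode that
    # already holds it, so no block data travels over the network.
    return 0


if __name__ == "__main__":
    one_gib = 1024 ** 3
    regular = network_bytes_regular_copy(one_gib, replication=3)
    fast = network_bytes_fast_copy(one_gib, replication=3)
    print(regular // one_gib, "GiB vs", fast, "bytes")
```

      Under these assumptions, copying a 1 GiB file with replication factor 3
      moves about 4 GiB of block data over the network with a regular copy and
      none with fast copy.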

      Note: A further improvement would be to instruct the datanode to create a
      hardlink to the block file, instead of copying it, when the block is being
      copied on the same datanode.
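
      The hardlink idea can be illustrated with the Python standard library. This
      is a sketch only: link_or_copy_block and the block file names are
      hypothetical, and it assumes the block file is never modified in place
      afterwards, since both names of a hardlinked file share the same data on
      disk.

```python
import os
import shutil
import tempfile


def link_or_copy_block(src_path, dst_path):
    """Hardlink dst_path to src_path; fall back to a byte copy on failure."""
    try:
        # A hardlink adds a second name for the same on-disk data,
        # so no block bytes are read or written.
        os.link(src_path, dst_path)
        return "hardlink"
    except OSError:
        # Hardlinks can fail, for example across filesystems or on
        # filesystems that do not support them.
        shutil.copyfile(src_path, dst_path)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage_dir:
        src = os.path.join(storage_dir, "blk_1001")  # made-up block name
        dst = os.path.join(storage_dir, "blk_2001")  # made-up block name
        with open(src, "wb") as f:
            f.write(b"block data")
        print(link_or_copy_block(src, dst))
```

      The fallback keeps the operation correct when the source and destination
      live on different volumes, where a hardlink cannot be created.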

      xuzq_zander provided a design doc: https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing

      Attachments

        1. image-2022-08-11-11-48-17-994.png
          187 kB
          ZanderXu
        2. HDFS-2139-For-2.7.1.patch
          96 kB
          Liu Junhong
        3. HDFS-2139.patch
          55 kB
          yunjiong zhao
        4. HDFS-2139.patch
          50 kB
          yunjiong zhao

            People

              rituraj Rituraj
              blackpearl Pritam Damania
              Votes: 4
              Watchers: 70

                Time Tracking

                  Original Estimate: 168h
                  Remaining Estimate: 168h
                  Time Spent: Not Specified