Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      There are many instances in which the same piece of data resides on multiple HDFS clusters in different data centers, typically because no single data center has the capacity to host the entire data set. In that case, the administrators partition the data across two (or more) HDFS clusters in different data centers and then duplicate some subset of it into both clusters.

      In such a situation, the duplicated data has six physical copies: three in each data center. It would be nice if we could keep fewer than three replicas in each data center and have the ability to repair a replica in the local data center by copying the data from the remote copy in the remote data center.
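
      The proposal would make this automatic, but the manual workaround is already expressible with the standard HDFS client API. Below is a minimal Java sketch of that workaround; the namenode URIs (hdfs://dc1-nn:8020, hdfs://dc2-nn:8020) and the file path are hypothetical. It lowers the local replication factor to two and, if the local copy is lost, restores it from the remote cluster.

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.FileUtil;
      import org.apache.hadoop.fs.Path;

      public class CrossDcRepair {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Hypothetical namenode URIs for the two data centers.
              FileSystem local  = FileSystem.get(URI.create("hdfs://dc1-nn:8020"), conf);
              FileSystem remote = FileSystem.get(URI.create("hdfs://dc2-nn:8020"), conf);

              Path file = new Path("/warehouse/part-00000");  // hypothetical path

              // Keep only two local replicas instead of the default three,
              // treating the remote cluster's copy as the third level of redundancy.
              local.setReplication(file, (short) 2);

              // If the local copy disappears, restore it from the remote
              // data center (deleteSource = false, overwrite = true).
              if (!local.exists(file)) {
                  FileUtil.copy(remote, file, local, file, false, true, conf);
              }
          }
      }

      What this issue asks for is effectively that repair step performed by HDFS itself, per replica, rather than by an external tool such as distcp.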

        Issue Links

          relates to HIVE-1813

          Activity

          Jeff Hammerbacher made changes -
          Link: This issue relates to HIVE-1813
          dhruba borthakur made changes -
          Description: corrected "here are many" to "There are many" in the first sentence; the description is otherwise unchanged.
          dhruba borthakur created issue -

            People

            • Assignee: dhruba borthakur
            • Reporter: dhruba borthakur
            • Votes: 1
            • Watchers: 58

              Dates

              • Created:
              • Updated:
