HIVE-24187 (project: Hive)

Handle _files creation for HA config with same nameservice name on source and destination


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      Currently, HA is supported only when the source and destination use different nameservices. We need to add support for the same nameservice name on source and destination.
      The local nameservice is passed to the repl command unchanged.
      The remote nameservice is given an arbitrary (random) name, with corresponding configs defined for that name.

      Example:
      Both clusters are originally configured with the same HDFS nameservice:
      src: ns1
      target: ns1

      We can denote the remote nameservice with an arbitrary name, for example nsRemote. The repl commands then see the nameservices of source and target as follows:

      Repl Dump: src: ns1, target: nsRemote
      Repl Load: src: nsRemote, target: ns1

      Entries in _files (for managed table data locations) will be written with nsRemote instead of ns1 for source paths.
      Example: hdfs://nsRemote/whLoc/dbName.db/table1:checksum:subDir:hdfs://nsRemote/cmroot

      Similarly, the list of external table data locations will be written with nsRemote instead of ns1 for source paths.
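A minimal Java sketch of the path rewrite described above, assuming each location is a well-formed hdfs:// URI whose authority is the nameservice. The class and method names (NameserviceRewriter, replaceNameservice, rewriteAll) are hypothetical and are not Hive's actual implementation; the sketch only shows how the source nameservice (ns1) can be swapped for the remote alias (nsRemote) in a managed table data location, a cmroot path, or a list of external table locations.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Hive's implementation: rewrites the nameservice of source paths.
final class NameserviceRewriter {

    // Swaps the authority of hdfs://<localNs>/... for remoteNs; any other location is returned as-is.
    static String replaceNameservice(String location, String localNs, String remoteNs) {
        URI uri = URI.create(location);
        if (!"hdfs".equalsIgnoreCase(uri.getScheme()) || !localNs.equals(uri.getAuthority())) {
            return location;
        }
        try {
            // Rebuild the URI with the same path, query and fragment but the remote nameservice.
            return new URI(uri.getScheme(), remoteNs, uri.getPath(), uri.getQuery(), uri.getFragment())
                    .toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rewrite " + location, e);
        }
    }

    // Applies the same rewrite to every entry of a location list (e.g. external table locations).
    static List<String> rewriteAll(List<String> locations, String localNs, String remoteNs) {
        List<String> rewritten = new ArrayList<>();
        for (String location : locations) {
            rewritten.add(replaceNameservice(location, localNs, remoteNs));
        }
        return rewritten;
    }

    public static void main(String[] args) {
        System.out.println(replaceNameservice("hdfs://ns1/whLoc/dbName.db/table1", "ns1", "nsRemote"));
        System.out.println(replaceNameservice("hdfs://ns1/cmroot", "ns1", "nsRemote"));
        System.out.println(rewriteAll(List.of("hdfs://ns1/ext/t1", "s3a://bucket/ext/t2"), "ns1", "nsRemote"));
    }
}
```

Locations on other file systems, or on a nameservice other than the local one, pass through unchanged, so in this sketch the rewrite can be applied to a mixed list without special-casing.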

      Two new configs control this behavior:
      hive.repl.ha.datapath.replace.remote.nameservice = <boolean>
      hive.repl.ha.datapath.replace.remote.nameservice.name = <string>

      When the boolean config is enabled, the source nameservice is replaced with the name given by the string config.
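The two configs could be resolved along these lines. This is a sketch only: a plain Map<String, String> stands in for HiveConf, and both the default of false for the boolean and the error raised when the flag is enabled without a name are assumptions, not documented Hive behavior. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver; a Map stands in for HiveConf in this sketch.
final class RemoteNameserviceConfig {
    static final String REPLACE_KEY = "hive.repl.ha.datapath.replace.remote.nameservice";
    static final String NAME_KEY = "hive.repl.ha.datapath.replace.remote.nameservice.name";

    // Returns the remote nameservice to substitute, or empty when replacement is disabled.
    // Assumptions: the boolean defaults to false when unset, and enabling it without a name is an error.
    static Optional<String> remoteNameservice(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(REPLACE_KEY, "false"));
        if (!enabled) {
            return Optional.empty();
        }
        String name = conf.get(NAME_KEY);
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException(NAME_KEY + " must be set when " + REPLACE_KEY + " is true");
        }
        return Optional.of(name.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(REPLACE_KEY, "true", NAME_KEY, "nsRemote");
        System.out.println(remoteNameservice(conf).orElse("<no replacement>"));
        System.out.println(remoteNameservice(Map.of()).orElse("<no replacement>"));
    }
}
```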

      This also requires that 'hive.repl.rootdir' be passed accordingly during dump and load:

      Staging on source cluster:
        Repl Dump: repl dump dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')
        Repl Load: repl load dbName into dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')

      Staging on target cluster:
        Repl Dump: repl dump dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')
        Repl Load: repl load dbName into dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')
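The rule behind these commands is that each command reaches the staging directory through the local nameservice when the staging directory lives on the cluster where the command runs, and through the remote alias otherwise (repl dump runs on the source, repl load on the target). The hypothetical helper below encodes that rule and reproduces the four commands; the names ReplRootDir, rootDir, dumpCommand and loadCommand are illustrative only.

```java
// Hypothetical helper that derives 'hive.repl.rootdir' for dump and load from the staging location.
final class ReplRootDir {
    enum Cluster { SOURCE, TARGET }

    // Local nameservice if the staging dir is on the cluster running the command, remote alias otherwise.
    static String rootDir(Cluster runsOn, Cluster staging, String localNs, String remoteNs, String stagingPath) {
        String ns = (runsOn == staging) ? localNs : remoteNs;
        return "hdfs://" + ns + stagingPath;
    }

    // Repl dump runs on the source cluster.
    static String dumpCommand(String db, Cluster staging, String localNs, String remoteNs, String stagingPath) {
        return String.format("repl dump %s with('hive.repl.rootdir'='%s')",
                db, rootDir(Cluster.SOURCE, staging, localNs, remoteNs, stagingPath));
    }

    // Repl load runs on the target cluster.
    static String loadCommand(String db, Cluster staging, String localNs, String remoteNs, String stagingPath) {
        return String.format("repl load %s into %s with('hive.repl.rootdir'='%s')",
                db, db, rootDir(Cluster.TARGET, staging, localNs, remoteNs, stagingPath));
    }

    public static void main(String[] args) {
        for (Cluster staging : Cluster.values()) {
            System.out.println("Staging on " + staging + ":");
            System.out.println("  " + dumpCommand("dbName", staging, "ns1", "nsRemote", "/stagingLoc"));
            System.out.println("  " + loadCommand("dbName", staging, "ns1", "nsRemote", "/stagingLoc"));
        }
    }
}
```

Because both clusters call themselves ns1 locally, the same two nameservice names cover all four cases; only the choice of which one to use flips with the staging location.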

    Attachments

      1. HIVE-24187.01.patch (24 kB, Pravin Sinha)
      2. HIVE-24187.02.patch (24 kB, Pravin Sinha)

    People

      Pravin Sinha (pkumarsinha)
      Votes: 0
      Watchers: 3


    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 1.5h