  1. Hive
  2. HIVE-14841 Replication - Phase 2
  3. HIVE-16686

repl invocations of distcp need additional handling

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: repl
    • Release Note:
      This change adds parsing of additional parameters that Hive does not use directly but passes on to distcp when Hive invokes it. Users can now issue "set" commands in Hive to pass CLI arguments along to distcp.

      A parameter set with an empty value, as in "set distcp.options.blah=''", results in an extra "-blah" argument to distcp. A parameter set with a value, as in "set distcp.options.foo='bar'", results in an extra "-foo bar" argument to distcp.

      Until now, Hive always passed "-update" and "-skipcrccheck" to distcp. These are retained as the defaults when no distcp.options.* parameters are found. If any such parameters are found, these options are not added by default, letting the user provide an explicit list instead.

      Note that these properties affect how distcp runs when it is launched by Hive; they are not Hive settings themselves. Hive simply allows them to be set through the "set" command.
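
      The mapping described in this release note can be sketched roughly as follows. This is an illustrative sketch only, not the code from the attached patches: the function name build_distcp_args, the constant names, and the use of a plain dict in place of the Hive session configuration are all assumptions made for the example, and values are assumed to arrive with their surrounding quotes already removed.

```python
# Hypothetical names throughout; this is not Hive's actual implementation.
DISTCP_OPTIONS_PREFIX = "distcp.options."
DEFAULT_DISTCP_ARGS = ["-update", "-skipcrccheck"]


def build_distcp_args(conf):
    """Turn distcp.options.* entries of a config mapping into distcp CLI args.

    `conf` is assumed to be a dict of already-unquoted string values,
    standing in for whatever configuration object Hive really uses.
    """
    args = []
    for key, value in conf.items():
        if not key.startswith(DISTCP_OPTIONS_PREFIX):
            continue  # not a distcp pass-through parameter
        option = key[len(DISTCP_OPTIONS_PREFIX):]
        if not option:
            continue  # ignore a bare "distcp.options." key
        args.append("-" + option)
        if value:
            # A non-empty value becomes a separate argument: "-foo bar"
            args.append(value)
    # Fall back to the defaults only when the user supplied no options at all.
    return args if args else list(DEFAULT_DISTCP_ARGS)


if __name__ == "__main__":
    print(build_distcp_args({}))
    print(build_distcp_args({"distcp.options.blah": "",
                             "distcp.options.foo": "bar"}))
```

      Note that in this sketch, supplying any distcp.options.* parameter drops both defaults, so a user who still wants "-update" would have to set distcp.options.update='' explicitly, in line with the "explicit list" behavior described above.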

    Description

      When REPL LOAD invokes distcp, the user invoking REPL LOAD needs a way to pass arguments on to distcp. In addition, distcp sometimes needs to be invoked from within an impersonated context, such as running as user "hdfs" and asking distcp to preserve the ownership of individual files.

      Attachments

        1. HIVE-16686.1.patch
          12 kB
          Sushanth Sowmyan
        2. HIVE-16686.2.patch
          17 kB
          Sushanth Sowmyan
        3. HIVE-16686.3.patch
          17 kB
          Sushanth Sowmyan

            People

              Assignee: Sushanth Sowmyan (sushanth)
              Reporter: Sushanth Sowmyan (sushanth)
              Votes: 0
              Watchers: 7