IMPALA-7871: Don't load Hive builtin jars for dataload

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Infrastructure
    • Labels:
      None

      Description

      One step in dataload is "Loading Hive Builtins", which copies a large number of jars into HDFS (or whichever storage system is in use). This step takes a couple of minutes during HDFS dataload and 8 minutes on S3. Despite its name, I can't find any indication that Hive or anything else uses these jars. Dataload and core tests run fine without it, and S3 can load data without it. There's no indication that this step is needed.

      Unless we find something using these jars, we should remove this step.

            People

            • Assignee:
              Joe McDonnell
              Reporter:
              Joe McDonnell
            • Votes:
              0
              Watchers:
              2