  1. Hive
  2. HIVE-20330

HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: HCatalog
    • Labels: None

    Description

      While running performance tests on Pig (0.12 and 0.17) we've observed a huge performance drop in a workload that has multiple inputs from HCatLoader.

      The reason is that for a particular MR job with multiple Hive tables as input, Pig calls setLocation on each LoaderFunc (HCatLoader) instance, but only one table's information (an InputJobInfo instance) gets tracked in the JobConf, under the config key HCatConstants.HCAT_KEY_JOB_INFO.

      Each such call overwrites the preexisting value, so only the last table's information is considered when Pig calls getStatistics to estimate the required reducer count.
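
      The overwrite can be illustrated with a minimal, self-contained sketch. It does not use the real Hadoop or HCatalog classes: a plain Map stands in for the JobConf, the key name is a placeholder for HCatConstants.HCAT_KEY_JOB_INFO, and the class, method, and table names are hypothetical. The sizes are the ones from the example in this description.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: a plain Map plays the role of the JobConf,
// and the key name is a placeholder for HCatConstants.HCAT_KEY_JOB_INFO.
class JobInfoOverwriteSketch {
    static final String JOB_INFO_KEY = "hypothetical.hcat.job.info";

    // Actual table sizes in bytes (hypothetical table names).
    static final Map<String, Long> TABLE_SIZES = new LinkedHashMap<>();
    static {
        TABLE_SIZES.put("big_table", 256L * 1024 * 1024 * 1024); // 256GB
        TABLE_SIZES.put("small_table", 1024L * 1024);             // 1MB
    }

    // Mimics setLocation: every loader writes its table under the same key.
    static void setLocation(Map<String, String> conf, String table) {
        conf.put(JOB_INFO_KEY, table);
    }

    // Mimics getStatistics: each loader reads back whatever the shared key holds.
    static long getStatistics(Map<String, String> conf) {
        return TABLE_SIZES.get(conf.get(JOB_INFO_KEY));
    }

    // Total input size as an estimator would see it with one shared key.
    static long observedTotal(String... tablesInPlanOrder) {
        Map<String, String> conf = new HashMap<>();
        for (String table : tablesInPlanOrder) {
            setLocation(conf, table); // later calls overwrite earlier ones
        }
        long total = 0;
        for (int i = 0; i < tablesInPlanOrder.length; i++) {
            total += getStatistics(conf); // always the last table's size
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(observedTotal("big_table", "small_table")); // 2097152 (2MB)
        System.out.println(observedTotal("small_table", "big_table")); // 549755813888 (0.5TB)
    }
}
```

      In this sketch the observed total depends only on which table was registered last, never on the real sum of the two sizes.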

      For example, with 2 input tables of 256GB and 1MB respectively, Pig queries the size information from HCat for both of them, but it sees either 1MB+1MB=2MB or 256GB+256GB=0.5TB, depending on the input order in the execution plan's DAG.
      It should of course see 256.00097GB in total and accordingly use 257 reducers by default.

      In unlucky cases this will be seen as 2MB, and a single reducer will have to struggle with the actual 256.00097GB.
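
      The reducer counts quoted above can be reproduced with a short sketch. It assumes the estimate is simply the total input size divided by a per-reducer byte budget, rounded up, and it assumes a budget of 1GB (2^30 bytes) because that matches the 257 figure; the actual default in a given Pig version may differ, and any upper limit on the reducer count is ignored here. The class and method names are hypothetical.

```java
// Hypothetical sketch of an input-size-based reducer estimate.
class ReducerEstimateSketch {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;

    // Assumption: 1GB of input per reducer, chosen to match the figures above.
    static final long BYTES_PER_REDUCER = GB;

    // Ceiling division, with at least one reducer.
    static long estimateReducers(long totalInputBytes) {
        long reducers = (totalInputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return Math.max(1, reducers);
    }

    public static void main(String[] args) {
        System.out.println(estimateReducers(256 * GB + MB)); // correct total: 257
        System.out.println(estimateReducers(2 * MB));        // 1MB table tracked last: 1
        System.out.println(estimateReducers(512 * GB));      // 256GB table tracked last: 512
    }
}
```

      Under these assumptions the wrong total yields either 1 reducer or 512 reducers instead of 257, depending on which table's information was stored last.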

      Attachments

        1. HIVE-20330.0.patch
          23 kB
          Ádám Szita
        2. HIVE-20330.1.patch
          23 kB
          Ádám Szita
        3. HIVE-20330.2.patch
          23 kB
          Ádám Szita
        4. HIVE-20330.3.patch
          24 kB
          Ádám Szita
        5. HIVE-20330.4.patch
          24 kB
          Ádám Szita
        6. HIVE-20330.5.patch
          24 kB
          Ádám Szita
        7. HIVE-20330.6.patch
          24 kB
          Ádám Szita

            People

              Assignee: Ádám Szita
              Reporter: Ádám Szita
              Votes: 0
              Watchers: 3
