Hive / HIVE-15104

Hive on Spark generates more shuffle data than Hive on MR

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 3.0.0
    • Component/s: Spark
    • Labels: None

      Description

      The same SQL query, run on the Spark engine and on the MR engine, generates different amounts of shuffle data.

      I think this is because Hive on MR serializes only part of the HiveKey, whereas Hive on Spark, which uses Kryo, serializes the full HiveKey object.

      What is your opinion?
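
      To make the hypothesis above concrete, here is a minimal Java sketch of how the two serialization strategies could differ in size. It is an illustration under stated assumptions, not Hive or Kryo code: `SketchKey`, `writeValidBytes`, and `writeAllFields` are hypothetical names, and the field layout (a reusable backing buffer, a valid length, and a cached hash) is assumed for illustration. Real Kryo output uses its own encoding, so actual byte counts would differ.

```java
import java.nio.ByteBuffer;

final class ShuffleSizeSketch {

    // Hypothetical stand-in for a HiveKey-like object (not the real class).
    static final class SketchKey {
        final byte[] buffer; // reusable backing array, may be larger than the data
        final int length;    // number of valid bytes at the start of buffer
        final int hash;      // cached hash code

        SketchKey(byte[] buffer, int length, int hash) {
            this.buffer = buffer;
            this.length = length;
            this.hash = hash;
        }
    }

    // Writable-style output: a length prefix followed by only the valid bytes.
    static byte[] writeValidBytes(SketchKey key) {
        return ByteBuffer.allocate(Integer.BYTES + key.length)
                .putInt(key.length)
                .put(key.buffer, 0, key.length)
                .array();
    }

    // Field-by-field output (assumed behavior of a generic object serializer):
    // the whole backing array plus every other field.
    static byte[] writeAllFields(SketchKey key) {
        return ByteBuffer.allocate(3 * Integer.BYTES + key.buffer.length)
                .putInt(key.buffer.length)
                .put(key.buffer)
                .putInt(key.length)
                .putInt(key.hash)
                .array();
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[64];        // oversized, reused buffer
        byte[] payload = {1, 2, 3, 4, 5};    // only 5 bytes are real data
        System.arraycopy(payload, 0, buffer, 0, payload.length);
        SketchKey key = new SketchKey(buffer, payload.length, 42);

        System.out.println("valid bytes only: " + writeValidBytes(key).length);
        System.out.println("all fields:       " + writeAllFields(key).length);
    }
}
```

      With a 64-byte buffer holding 5 bytes of data, the first method emits 9 bytes and the second 76 bytes. If the hypothesis holds, a gap of this kind repeated for every shuffled record would account for the larger shuffle size.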

        Attachments

        1. HIVE-15104.1.patch (8 kB, Rui Li)
        2. HIVE-15104.10.patch (20 kB, Rui Li)
        3. HIVE-15104.2.patch (8 kB, Rui Li)
        4. HIVE-15104.3.patch (23 kB, Rui Li)
        5. HIVE-15104.4.patch (21 kB, Rui Li)
        6. HIVE-15104.5.patch (21 kB, Rui Li)
        7. HIVE-15104.6.patch (20 kB, Rui Li)
        8. HIVE-15104.7.patch (20 kB, Rui Li)
        9. HIVE-15104.8.patch (20 kB, Rui Li)
        10. HIVE-15104.9.patch (20 kB, Rui Li)
        11. TPC-H 100G.xlsx (30 kB, Rui Li)

              People

              • Assignee: Rui Li (lirui)
              • Reporter: wangwenli (wenli)
              • Votes: 0
              • Watchers: 13

                Dates

                • Created:
                  Updated:
                  Resolved: