Hive / HIVE-15104

Hive on Spark generates more shuffle data than Hive on MR

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 3.0.0
    • Component/s: Spark
    • Labels: None

    Description

      The same SQL query, run on the Spark engine and on the MR engine, generates different amounts of shuffle data.

      I think this is because Hive on MR serializes only part of the HiveKey, whereas Hive on Spark, which uses Kryo, serializes the full HiveKey object.

      What is your opinion?
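
To make the suspected cause concrete, here is a minimal sketch of how serializing a whole key object can produce more bytes than writing only its valid payload. It is not Hive code. `SketchKey` is a hypothetical stand-in for a key with an over-allocated backing buffer, and `java.io.ObjectOutputStream` stands in for a generic whole-object serializer such as Kryo. Real Kryo output sizes will differ, so only the direction of the gap is illustrated.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ShuffleSizeSketch {

    /** Hypothetical key: a reusable buffer that may be larger than the payload. */
    static class SketchKey implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte[] buffer;   // backing array, capacity >= length
        final int length;      // number of valid bytes in buffer
        final int hashCode;    // extra field carried by the full object

        SketchKey(byte[] payload, int capacity) {
            this.buffer = new byte[capacity];
            System.arraycopy(payload, 0, buffer, 0, payload.length);
            this.length = payload.length;
            this.hashCode = Arrays.hashCode(payload);
        }
    }

    /** Partial serialization: a length prefix plus only the valid bytes. */
    static int partialSize(SketchKey key) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key.length);
            out.write(key.buffer, 0, key.length);
        }
        return bytes.size();
    }

    /** Full serialization: every field, including unused buffer capacity. */
    static int fullSize(SketchKey key) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(key);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "l_orderkey=42".getBytes(StandardCharsets.UTF_8);
        SketchKey key = new SketchKey(payload, 64);
        System.out.println("partial: " + partialSize(key) + " bytes");
        System.out.println("full:    " + fullSize(key) + " bytes");
    }
}
```

If the real difference has this shape, a serializer that writes only the valid bytes of the key should shrink the shuffle size. Whether that accounts for the whole gap seen here would still need to be measured.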

      Attachments

        1. TPC-H 100G.xlsx
          30 kB
          Rui Li
        2. HIVE-15104.9.patch
          20 kB
          Rui Li
        3. HIVE-15104.8.patch
          20 kB
          Rui Li
        4. HIVE-15104.7.patch
          20 kB
          Rui Li
        5. HIVE-15104.6.patch
          20 kB
          Rui Li
        6. HIVE-15104.5.patch
          21 kB
          Rui Li
        7. HIVE-15104.4.patch
          21 kB
          Rui Li
        8. HIVE-15104.3.patch
          23 kB
          Rui Li
        9. HIVE-15104.2.patch
          8 kB
          Rui Li
        10. HIVE-15104.10.patch
          20 kB
          Rui Li
        11. HIVE-15104.1.patch
          8 kB
          Rui Li

            People

              Assignee: Rui Li (lirui)
              Reporter: wangwenli (wenli)
              Votes: 0
              Watchers: 13
