  1. Hive
  2. HIVE-7292 Hive on Spark
  3. HIVE-8017

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark

    Description

      HiveKey should be used as the key type of the pair RDD because it carries the hash code used for partitioning. BytesWritable partitions correctly in simple cases, but more complicated ones, such as joins and bucketed tables, need HiveKey.hashCode.
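
      The difference can be illustrated with a minimal, self-contained sketch. It does not use Hive or Spark classes: BytesKey and KeyWithHash are hypothetical stand-ins for a bytes-only key and a key that carries an assigned hash code, and partitionFor is a generic hash-partitioning rule, not the partitioner Hive on Spark actually uses. The sketch also assumes that the serialized key holds more than the value that should decide the partition (here, an extra sort byte), which is one way a bytes-derived hash and the intended partitioning hash can disagree.

```java
import java.util.Arrays;

public class KeyPartitioningSketch {

    /** Hypothetical stand-in for a plain bytes key: the hash comes from the bytes alone. */
    public static class BytesKey {
        final byte[] bytes;

        public BytesKey(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    /** Hypothetical stand-in for a key that carries a hash code assigned by its producer. */
    public static class KeyWithHash extends BytesKey {
        final int assignedHash;

        public KeyWithHash(byte[] bytes, int assignedHash) {
            super(bytes);
            this.assignedHash = assignedHash;
        }

        @Override
        public int hashCode() {
            return assignedHash;
        }
    }

    /** Generic hash partitioning: map a key's hash code to a non-negative partition index. */
    public static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;

        // Assumption: both rows share the partitioning value 7; the second byte
        // is an extra sort field that should not influence the partition.
        byte[] row1 = {7, 1};
        byte[] row2 = {7, 2};
        int partitionHash = Integer.hashCode(7);

        // Hash derived from all bytes: the extra field leaks into the partition choice.
        System.out.println("bytes-derived hash: "
            + partitionFor(new BytesKey(row1), numPartitions) + " vs "
            + partitionFor(new BytesKey(row2), numPartitions));

        // Hash assigned from the partitioning value only: both rows land together.
        System.out.println("assigned hash:      "
            + partitionFor(new KeyWithHash(row1, partitionHash), numPartitions) + " vs "
            + partitionFor(new KeyWithHash(row2, partitionHash), numPartitions));
    }
}
```

      With the bytes-derived hash, the two rows can be sent to different partitions even though they share the same partitioning value. With the assigned hash they always land in the same partition, which is the property joins and bucketed tables depend on.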

      Attachments

        1. HIVE-8017.5-spark.patch
          76 kB
          Rui Li
        2. HIVE-8017.4-spark.patch
          75 kB
          Rui Li
        3. HIVE-8017.3-spark.patch
          97 kB
          Rui Li
        4. HIVE-8017.2-spark.patch
          76 kB
          Rui Li
        5. HIVE-8017-spark.patch
          31 kB
          Rui Li

            People

              Assignee: Rui Li (lirui)
              Reporter: Rui Li (lirui)
              Votes: 0
              Watchers: 3
