  1. Hive
  2. HIVE-7292 Hive on Spark
  3. HIVE-8844

Choose a persistence policy for RDD caching [Spark Branch]

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      RDD caching is used for performance reasons in some multi-insert queries. Currently, we call RDD.cache(), which persists the RDD with the memory-only storage level (MEMORY_ONLY). We should choose a better policy. I think memory+disk (MEMORY_AND_DISK) will be good enough. Refer to RDD.persist() for more information.
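
      To illustrate the difference between the two calls, here is a minimal sketch using Spark's Java API. It is not taken from the attached patches: the class name `PersistPolicySketch` and the helper `persistForReuse` are hypothetical, and the two filters only loosely stand in for the branches of a multi-insert query. It assumes Spark is on the classpath and runs in local mode.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class PersistPolicySketch {

    // Hypothetical helper (not from the patch): mark an RDD for reuse with
    // memory+disk persistence instead of the memory-only level used by cache().
    static <T> JavaRDD<T> persistForReuse(JavaRDD<T> rdd) {
        // rdd.cache() is shorthand for rdd.persist(StorageLevel.MEMORY_ONLY()).
        // With MEMORY_AND_DISK, partitions that do not fit in memory are
        // written to disk rather than recomputed on the next access.
        return rdd.persist(StorageLevel.MEMORY_AND_DISK());
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("persist-policy-sketch")
                .setMaster("local[1]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> shared =
                    persistForReuse(sc.parallelize(Arrays.asList(1, 2, 3, 4)));

            // Two downstream consumers of the same persisted RDD.
            long evens = shared.filter(x -> x % 2 == 0).count();
            long odds = shared.filter(x -> x % 2 != 0).count();
            System.out.println(evens + " even, " + odds + " odd");

            // Release the persisted partitions once they are no longer needed.
            shared.unpersist();
        }
    }
}
```

      Whether memory+disk is the right level for a given workload depends on how expensive the RDD is to recompute compared with reading it back from disk, so treat the choice above as one reasonable option rather than a rule.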

      Attachments

        1. HIVE-8844.3-spark.patch
          2 kB
          Jimmy Xiang
        2. HIVE-8844.2-spark.patch
          7 kB
          Jimmy Xiang
        3. HIVE-8844.1-spark.patch
          7 kB
          Jimmy Xiang

          People

            Assignee: jxiang Jimmy Xiang
            Reporter: xuefuz Xuefu Zhang
            Votes: 0
            Watchers: 3