Hive / HIVE-18111

Fix temp path for Spark DPP sink

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Spark
    • Labels: None

    Description

      Before HIVE-17877, each DPP (dynamic partition pruning) sink had only one target work. The output path of a DPP work was TMP_PATH/targetWorkId/dppWorkId. During pruning, each map work read the DPP outputs under TMP_PATH/targetWorkId.

      After HIVE-17877, each DPP sink can have multiple target works, so a map work may need to read DPP outputs from multiple TMP_PATH/targetWorkId directories. To solve this, I think we can have a DPP output path specific to each query, e.g. QUERY_TMP_PATH/dpp_output. Each DPP work outputs to QUERY_TMP_PATH/dpp_output/dppWorkId, and each map work reads from QUERY_TMP_PATH/dpp_output.
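
      The two layouts can be compared with a minimal sketch. It is not the code from the attached patches: the class and method names are hypothetical, and it uses java.nio.file.Path in place of Hadoop's Path only to stay self-contained. The directory name dpp_output comes from the example above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: names below are assumptions, not Hive's actual API.
public class DppPathSketch {
    // Directory name taken from the example in the description.
    static final String DPP_OUTPUT_DIR = "dpp_output";

    // Old layout: TMP_PATH/targetWorkId/dppWorkId.
    // The path depends on a single target work, which no longer holds
    // once a DPP sink can have several targets.
    static Path oldDppOutputPath(Path tmpPath, String targetWorkId, String dppWorkId) {
        return tmpPath.resolve(targetWorkId).resolve(dppWorkId);
    }

    // Proposed layout: QUERY_TMP_PATH/dpp_output/dppWorkId.
    // The path depends only on the query and the DPP work itself.
    static Path dppOutputPath(Path queryTmpPath, String dppWorkId) {
        return queryTmpPath.resolve(DPP_OUTPUT_DIR).resolve(dppWorkId);
    }

    // Every map work in the query reads from this one directory.
    static Path dppReadRoot(Path queryTmpPath) {
        return queryTmpPath.resolve(DPP_OUTPUT_DIR);
    }

    public static void main(String[] args) {
        // Hypothetical per-query temp directory and work ids.
        Path queryTmp = Paths.get("/tmp/hive/query_1");
        System.out.println(dppOutputPath(queryTmp, "dpp_work_7"));
        System.out.println(dppReadRoot(queryTmp));
    }
}
```

      With this layout, the output directory of each DPP work is a direct child of the single directory that map works read from, regardless of how many target works the sink has.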

      Attachments

        1. HIVE-18111.5.patch (27 kB, Rui Li)
        2. HIVE-18111.5.patch (27 kB, Rui Li)
        3. HIVE-18111.4.patch (26 kB, Rui Li)
        4. HIVE-18111.3.patch (8 kB, Rui Li)
        5. HIVE-18111.2.patch (6 kB, Rui Li)
        6. HIVE-18111.1.patch (1 kB, Rui Li)

            People

              Assignee: Rui Li (lirui)
              Reporter: Rui Li (lirui)