Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Spark
    • Labels: None

    Description

      The SparkPlan class logs the mapping between the different SparkTran objects, the shuffle types used, and which trans are cached. However, there is room for improvement.

      When debug logging is enabled, the RDD graph is logged as well, but little information is printed about each RDD.

      We should combine both of the graphs and improve them. We could even make the Spark Plan graph part of the EXPLAIN EXTENDED output.

      Ideally, the final graph shows a clear relationship between Tran objects, RDDs, and BaseWorks. Edges should include information such as the number of partitions, the shuffle type, and the Spark operations used.
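
      As a rough illustration of the kind of combined graph described above, here is a minimal, self-contained Java sketch. It is not taken from any of the attached patches: the PlanGraphSketch, Node, and Edge types, the output format, and the labels used in main ("MapTran 1", "group-by shuffle", and so on) are all hypothetical and only show how each edge could carry a shuffle type and a partition count while each node ties a Tran to its work and RDD.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Hive's real classes.
public class PlanGraphSketch {

    // One vertex: a Tran together with the work and RDD it corresponds to.
    static final class Node {
        final String tran;
        final String work;
        final String rdd;
        final boolean cached;

        Node(String tran, String work, String rdd, boolean cached) {
            this.tran = tran;
            this.work = work;
            this.rdd = rdd;
            this.cached = cached;
        }

        String describe() {
            return tran + " (" + work + ", rdd=" + rdd + (cached ? ", cached" : "") + ")";
        }
    }

    // One edge, annotated with the shuffle type and the partition count.
    static final class Edge {
        final Node from;
        final Node to;
        final String shuffleType;
        final int numPartitions;

        Edge(Node from, Node to, String shuffleType, int numPartitions) {
            this.from = from;
            this.to = to;
            this.shuffleType = shuffleType;
            this.numPartitions = numPartitions;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void connect(Node from, Node to, String shuffleType, int numPartitions) {
        edges.add(new Edge(from, to, shuffleType, numPartitions));
    }

    // Renders one line per edge, in insertion order.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Edge e : edges) {
            sb.append(e.from.describe())
              .append(" --[").append(e.shuffleType)
              .append(", partitions=").append(e.numPartitions)
              .append("]--> ")
              .append(e.to.describe())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PlanGraphSketch graph = new PlanGraphSketch();
        // Illustrative labels only; real names would come from the plan.
        Node map = new Node("MapTran 1", "Map 1", "MapPartitionsRDD", false);
        Node reduce = new Node("ReduceTran 2", "Reducer 2", "ShuffledRDD", true);
        graph.connect(map, reduce, "group-by shuffle", 4);
        System.out.print(graph.render());
    }
}
```

      A single rendered line per edge keeps the output readable in a log file and would also fit into text-based EXPLAIN output, although the actual format chosen for this issue may differ.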

      Attachments

        1. Completed Stages.png
          72 kB
          Sahil Takiar
        2. HIVE-18368.1.patch
          20 kB
          Sahil Takiar
        3. HIVE-18368.2.patch
          20 kB
          Sahil Takiar
        4. HIVE-18368.3.patch
          34 kB
          Sahil Takiar
        5. HIVE-18368.4.patch
          24 kB
          Sahil Takiar
        6. Job Ids.png
          41 kB
          Sahil Takiar
        7. Stage DAG 1.png
          41 kB
          Sahil Takiar
        8. Stage DAG 2.png
          25 kB
          Sahil Takiar

            People

              Assignee: stakiar Sahil Takiar
              Reporter: stakiar Sahil Takiar
              Votes: 0
              Watchers: 3
