Spark / SPARK-20487

`HiveTableScan` node is quite verbose in explained plan

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

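      For Hive tables, `explain()` prints the table's full catalog metadata (owner, location, SerDe, schema, and so on) at every plan node that references the table. This makes the plan hard to read, especially for large SQL queries that involve many tables.

      For example: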

      scala> hc.sql(" SELECT * FROM my_table WHERE name = 'foo' ").explain(true)
      == Parsed Logical Plan ==
      'Project [*]
      +- 'Filter ('name = foo)
         +- 'UnresolvedRelation `my_table`
      
      == Analyzed Logical Plan ==
      user_id: bigint, name: string, ds: string
      Project [user_id#13L, name#14, ds#15]
      +- Filter (name#14 = foo)
         +- SubqueryAlias my_table
            +- CatalogRelation CatalogTable(
      Database: default
      Table: my_table
      Owner: tejasp
      Created: Fri Apr 14 17:05:50 PDT 2017
      Last Access: Wed Dec 31 16:00:00 PST 1969
      Type: MANAGED
      Provider: hive
      Properties: [serialization.format=1]
      Statistics: 9223372036854775807 bytes
      Location: file:/tmp/warehouse/my_table
      Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      InputFormat: org.apache.hadoop.mapred.TextInputFormat
      OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      Partition Provider: Catalog
      Partition Columns: [`ds`]
      Schema: root
      -- user_id: long (nullable = true)
      -- name: string (nullable = true)
      -- ds: string (nullable = true)
      ), [user_id#13L, name#14], [ds#15]
      
      == Optimized Logical Plan ==
      Filter (isnotnull(name#14) && (name#14 = foo))
      +- CatalogRelation CatalogTable(
      Database: default
      Table: my_table
      Owner: tejasp
      Created: Fri Apr 14 17:05:50 PDT 2017
      Last Access: Wed Dec 31 16:00:00 PST 1969
      Type: MANAGED
      Provider: hive
      Properties: [serialization.format=1]
      Statistics: 9223372036854775807 bytes
      Location: file:/tmp/warehouse/my_table
      Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      InputFormat: org.apache.hadoop.mapred.TextInputFormat
      OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      Partition Provider: Catalog
      Partition Columns: [`ds`]
      Schema: root
      -- user_id: long (nullable = true)
      -- name: string (nullable = true)
      -- ds: string (nullable = true)
      ), [user_id#13L, name#14], [ds#15]
      
      == Physical Plan ==
      *Filter (isnotnull(name#14) && (name#14 = foo))
      +- HiveTableScan [user_id#13L, name#14, ds#15], CatalogRelation CatalogTable(
      Database: default
      Table: my_table
      Owner: tejasp
      Created: Fri Apr 14 17:05:50 PDT 2017
      Last Access: Wed Dec 31 16:00:00 PST 1969
      Type: MANAGED
      Provider: hive
      Properties: [serialization.format=1]
      Statistics: 9223372036854775807 bytes
      Location: file:/tmp/warehouse/my_table
      Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      InputFormat: org.apache.hadoop.mapred.TextInputFormat
      OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      Partition Provider: Catalog
      Partition Columns: [`ds`]
      Schema: root
      -- user_id: long (nullable = true)
      -- name: string (nullable = true)
      -- ds: string (nullable = true)
      ), [user_id#13L, name#14], [ds#15]
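
      The repeated `CatalogTable( ... )` dump in the output above is what makes the plan hard to scan. As an illustration only, the sketch below post-processes captured explain text and collapses each dump to `CatalogTable(database.table)`. It is not the change made in Spark for this issue; `CompactExplain` and `collapse` are hypothetical names, and the code assumes the dump has the shape shown above (a line ending in `CatalogTable(`, then `Database:` and `Table:` lines, then a closing line starting with `)`).

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of Spark: shortens explain text after the fact.
object CompactExplain {
  // Full-line matches for the two metadata fields we want to keep.
  private val DatabaseLine = """\s*Database:\s*(\S+)\s*""".r
  private val TableLine    = """\s*Table:\s*(\S+)\s*""".r

  /** Collapses every multi-line `CatalogTable( ... )` dump to `CatalogTable(db.table)`. */
  def collapse(plan: String): String = {
    val out = ArrayBuffer.empty[String]
    var open: Option[String] = None // the line that opened the dump, while inside one
    var db = "?"
    var table = "?"

    for (line <- plan.split("\n")) {
      open match {
        case None if line.trim.endsWith("CatalogTable(") =>
          // Start of a dump: remember the opening line and reset the fields.
          open = Some(line.replaceAll("\\s+$", ""))
          db = "?"
          table = "?"
        case None =>
          out += line // ordinary plan line, keep unchanged
        case Some(head) =>
          line match {
            case DatabaseLine(d) => db = d
            case TableLine(t)    => table = t
            case _ if line.trim.startsWith(")") =>
              // End of the dump: emit one line with only database and table.
              out += s"${head}${db}.${table}${line.trim}"
              open = None
            case _ => // drop all other metadata lines
          }
      }
    }
    open.foreach(out += _) // unterminated dump: keep its opening line as-is
    out.mkString("\n")
  }
}
```

      Applied to the physical plan above, the scan node would read `+- HiveTableScan [user_id#13L, name#14, ds#15], CatalogRelation CatalogTable(default.my_table), [user_id#13L, name#14], [ds#15]`, while lines outside a dump pass through unchanged.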
      

          People

            Assignee: Tejas Patil (tejasp)
            Reporter: Tejas Patil (tejasp)