Spark / SPARK-11012

Canonicalize view definitions


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      In SPARK-10337, we added the first step of supporting views natively, which basically wraps the original view definition SQL text in an extra SELECT and then stores the wrapped SQL text in the metastore. This approach suffers from at least two issues:

      1. Switching the current database may break view queries
      2. HiveQL doesn't allow a CTE as a subquery, so CTEs can't be used in view definitions

      To fix these issues, we need to canonicalize the view definition. For example, for a SQL string

      SELECT a, b FROM table
      

      we will save this text to the Hive metastore as

      SELECT `table`.`a`, `table`.`b` FROM `currentDB`.`table`
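
To make the rewrite above concrete, here is a minimal Python sketch of the qualification step. It is not Spark's implementation: `quote` and `canonicalize` are hypothetical helpers, the input is assumed to be an already resolved column list and table name, and escaping an embedded backquote by doubling it is an assumption of this sketch.

```python
from typing import List


def quote(name: str) -> str:
    # Backquote an identifier; doubling embedded backquotes is an
    # assumption of this sketch.
    return "`" + name.replace("`", "``") + "`"


def canonicalize(columns: List[str], table: str, current_db: str) -> str:
    # Qualify every column with its table, and the table with the database
    # that was current when the view was defined.
    cols = ", ".join(f"{quote(table)}.{quote(c)}" for c in columns)
    return f"SELECT {cols} FROM {quote(current_db)}.{quote(table)}"


print(canonicalize(["a", "b"], "table", "currentDB"))
# prints: SELECT `table`.`a`, `table`.`b` FROM `currentDB`.`table`
```

Because the database name is baked into the stored text, the view no longer depends on whichever database is current when it is queried.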
      

      The core infrastructure of this work is SQL query string generation (SPARK-12593), namely converting resolved logical query plans back to canonicalized SQL query strings. PR #10541 set up the basic infrastructure for SQL generation, but more language structures need to be supported.

      PR #10541 added round-trip testing infrastructure for SQL generation. All queries tested by test suites extending HiveComparisonTest are executed in the following order:

      1. Parsing query string to logical plan
      2. Converting resolved logical plan back to canonicalized SQL query string
      3. Executing generated SQL query string
      4. Comparing query results with golden answers

      Note that not all resolved logical query plans can be converted back to SQL query strings, either because a plan contains some language structure that is not supported yet, or because it inherently has no SQL representation (e.g. query plans built on top of local Scala collections).

      If a logical plan is inconvertible, HiveComparisonTest falls back to its original behavior, namely executing the original SQL query string and comparing the results with golden answers.
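
The round-trip flow and its fallback can be sketched as follows. This is a toy Python illustration, not the code of HiveComparisonTest: `round_trip_check` and its `parse`, `to_sql`, and `execute` parameters are hypothetical stand-ins, and an inconvertible plan is assumed to be signalled by `to_sql` returning `None`.

```python
from typing import Any, Callable, List, Optional


def round_trip_check(
    query: str,
    parse: Callable[[str], Any],
    to_sql: Callable[[Any], Optional[str]],
    execute: Callable[[str], Any],
    golden: Any,
) -> bool:
    plan = parse(query)        # 1. parse the query string to a logical plan
    generated = to_sql(plan)   # 2. convert the plan back to SQL (None if inconvertible)
    # Fall back to the original query text when no SQL could be generated.
    sql_to_run = query if generated is None else generated
    result = execute(sql_to_run)   # 3. execute the chosen SQL string
    return result == golden        # 4. compare with the golden answer


# Toy stand-ins that record which SQL text actually gets executed.
executed: List[str] = []


def fake_execute(sql: str) -> list:
    executed.append(sql)
    return [(1, 2)]


# Inconvertible plan: the original query string is executed.
ok = round_trip_check("SELECT a, b FROM t", lambda q: q, lambda p: None,
                      fake_execute, [(1, 2)])
```

Either way the same golden answer is used for the comparison, so an unsupported plan only loses the SQL generation coverage and not the existing result check.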

      SQL generation details are logged and can be found in sql/hive/target/unit-tests.log (log level should be at least DEBUG).


          People

            Assignee: Yin Huai (yhuai)
            Reporter: Yin Huai (yhuai)
            Votes: 0
            Watchers: 10
