IMPALA-10732

Use consistent DDL for specifying Iceberg partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Impala 4.1.0
    • None

    Description

      Currently we have a DDL syntax for defining Iceberg partitions that differs from SparkSQL:
      https://iceberg.apache.org/spark-ddl/#partitioned-by
       
      For example, Impala currently uses the following syntax:
       
      CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
      PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)
      STORED AS ICEBERG;
       
      The equivalent statement in Spark is:
       
      CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
      USING ICEBERG
      PARTITIONED BY (bucket(5, i), months(ts), years(d))
       
      Impala's syntax is older but has not been released yet, so it can still be changed. Spark's syntax is already released, so it cannot be changed.
       
      Hive is also working on DDL support for Iceberg partitions, and they favor the released SparkSQL syntax. See HIVE-25179.
       
      After discussing it on dev@impala, we decided to use SparkSQL's syntax.
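       
      For illustration, the ice_t example could look roughly like this once Impala adopts the SparkSQL-style transform calls. This is only a sketch: it assumes Impala keeps its own STORED AS ICEBERG clause and a SPEC keyword, and that the transforms are spelled BUCKET, MONTH and YEAR. None of these details are stated in this issue, so the exact keywords may differ in the final patch.

```sql
-- Sketch only: the clause and transform names below are assumptions,
-- not confirmed by this issue.
CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
PARTITIONED BY SPEC (BUCKET(5, i), MONTH(ts), YEAR(d))
STORED AS ICEBERG;
```

      The main change from the old form is that each partition field is written as a transform applied to a column, e.g. BUCKET(5, i) instead of i BUCKET 5, which matches how Spark writes bucket(5, i).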


          People

            Assignee: Zoltán Borók-Nagy (boroknagyz)
            Reporter: Zoltán Borók-Nagy (boroknagyz)