SPARK-27592 (project: Spark)

Set the bucketed data source table SerDe correctly


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      We point Hive at an incorrect InputFormat (org.apache.hadoop.mapred.SequenceFileInputFormat) for reading Spark's bucketed Parquet data source table:

      spark-sql> CREATE TABLE t (c1 INT, c2 INT) USING parquet CLUSTERED BY (c1) SORTED BY (c1) INTO 2 BUCKETS;
       2019-04-29 17:52:05 WARN HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
       spark-sql> DESC EXTENDED t;
       c1 int NULL
       c2 int NULL
       # Detailed Table Information
       Database default
       Table t
       Owner yumwang
       Created Time Mon Apr 29 17:52:05 CST 2019
       Last Access Thu Jan 01 08:00:00 CST 1970
       Created By Spark 2.4.0
       Type MANAGED
       Provider parquet
       Num Buckets 2
       Bucket Columns [`c1`]
       Sort Columns [`c1`]
       Table Properties [transient_lastDdlTime=1556531525]
       Location file:/user/hive/warehouse/t
       Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
       InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
       OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
       Storage Properties [serialization.format=1]
      

      Spark does log the incompatibility when the table is created:

      WARN HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
      

      However, downstream engines have no way of knowing about this incompatibility. I'd like to write the correct storage information (SerDe, InputFormat, OutputFormat) for this table to the metastore so that each engine can decide compatibility for itself.
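      A minimal Python sketch of the idea, assuming a simple lookup from the data source provider to the Hive storage classes the metastore entry would record. The names `HiveStorage`, `PROVIDER_TO_STORAGE` and `storage_for_provider` are hypothetical and do not represent Spark's actual implementation. The class names are Hive's standard Parquet and ORC SerDe/InputFormat/OutputFormat classes. Unknown providers fall back to the SequenceFile defaults that appear in the DESC EXTENDED output above.

```python
from typing import NamedTuple


class HiveStorage(NamedTuple):
    """Hypothetical container for the three storage fields Hive reads."""
    serde: str
    input_format: str
    output_format: str


# Hypothetical mapping for illustration: provider name -> Hive storage classes.
PROVIDER_TO_STORAGE = {
    "parquet": HiveStorage(
        serde="org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        input_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        output_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    ),
    "orc": HiveStorage(
        serde="org.apache.hadoop.hive.ql.io.orc.OrcSerde",
        input_format="org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        output_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    ),
}

# The defaults recorded today for table `t`, per the DESC EXTENDED output.
SEQUENCE_FILE_DEFAULT = HiveStorage(
    serde="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    input_format="org.apache.hadoop.mapred.SequenceFileInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
)


def storage_for_provider(provider: str) -> HiveStorage:
    """Return the Hive storage classes for a provider (case-insensitive),
    falling back to the SequenceFile defaults for unknown providers."""
    return PROVIDER_TO_STORAGE.get(provider.lower(), SEQUENCE_FILE_DEFAULT)


if __name__ == "__main__":
    # For the table in this issue (USING parquet), print what would be recorded.
    print(storage_for_provider("parquet").input_format)
```

      With such a lookup, a `USING parquet` bucketed table would record MapredParquetInputFormat instead of SequenceFileInputFormat, and another engine could inspect these fields (together with the bucketing metadata) to decide whether it can read the table.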


              People

              • Assignee: Yuming Wang (yumwang)
              • Reporter: Yuming Wang (yumwang)