SPARK-27592

Set the bucketed data source table SerDe correctly

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark tells Hive to use an incorrect InputFormat (org.apache.hadoop.mapred.SequenceFileInputFormat) to read Spark's bucketed Parquet data source table:

      spark-sql> CREATE TABLE t (c1 INT, c2 INT) USING parquet CLUSTERED BY (c1) SORTED BY (c1) INTO 2 BUCKETS;
       2019-04-29 17:52:05 WARN HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
       spark-sql> DESC EXTENDED t;
       c1 int NULL
       c2 int NULL
       # Detailed Table Information
       Database default
       Table t
       Owner yumwang
       Created Time Mon Apr 29 17:52:05 CST 2019
       Last Access Thu Jan 01 08:00:00 CST 1970
       Created By Spark 2.4.0
       Type MANAGED
       Provider parquet
       Num Buckets 2
       Bucket Columns [`c1`]
       Sort Columns [`c1`]
       Table Properties [transient_lastDdlTime=1556531525]
       Location file:/user/hive/warehouse/t
       Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
       InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
       OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
       Storage Properties [serialization.format=1]
      

      Spark reports the incompatibility only as a warning when the table is created:

      WARN HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
      

      But downstream engines never see this warning, so they do not know the table is incompatible. I'd like to write the correct SerDe information for this table to the metastore so that each engine can decide compatibility for itself.
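
      As a rough illustration of the check a downstream engine could make, the sketch below parses the storage rows of the DESC EXTENDED output shown above and compares them with the Hive storage classes for the table's provider. The helper names (parse_desc_extended, storage_mismatches) are hypothetical and not part of Spark or Hive, and treating the Hive Parquet classes as the "correct" values is an assumption; this issue does not state exactly what the fix records.

```python
# Hypothetical helper, not part of Spark or Hive: checks whether the storage
# classes recorded in the metastore match the table's data source provider.

# Assumption: for a Parquet table, the "correct" values are Hive's own
# Parquet SerDe, InputFormat, and OutputFormat classes.
EXPECTED_STORAGE = {
    "parquet": {
        "Serde Library": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    },
}

STORAGE_KEYS = ("Provider", "Serde Library", "InputFormat", "OutputFormat")


def parse_desc_extended(text):
    """Collect the storage-related rows of DESC EXTENDED output into a dict."""
    info = {}
    for line in text.splitlines():
        stripped = line.strip()
        for key in STORAGE_KEYS:
            rest = stripped[len(key):]
            # Match "<key><whitespace><value>" (space- or tab-separated).
            if stripped.startswith(key) and rest[:1].isspace():
                info[key] = rest.strip()
    return info


def storage_mismatches(info):
    """Return {field: (actual, expected)} for fields that disagree with the provider."""
    expected = EXPECTED_STORAGE.get(info.get("Provider", "").lower())
    if expected is None:
        return {}  # unknown provider: nothing to compare against
    return {
        field: (info.get(field), want)
        for field, want in expected.items()
        if info.get(field) != want
    }


# Storage rows copied from the DESC EXTENDED output in this issue.
DESC_OUTPUT = """
Provider parquet
Num Buckets 2
Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
"""

if __name__ == "__main__":
    mismatches = storage_mismatches(parse_desc_extended(DESC_OUTPUT))
    for field, (actual, expected) in mismatches.items():
        print(f"{field}: found {actual}, expected {expected}")
```

      Run against the output above, all three storage fields are reported as mismatched, which is the information a downstream engine cannot currently get from the metastore alone.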

            People

              Assignee: Yuming Wang (yumwang)
              Reporter: Yuming Wang (yumwang)