SPARK-12682

Hive will fail if a parquet table has a very wide schema

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: SQL
    • Labels: None

      Description

      To reproduce it, create a table with a very large number of columns, making sure that the combined length of all the data type strings exceeds 4000 characters (these strings are generated by HiveMetastoreTypes.toMetastoreType). Then save the table as parquet. Because we try to store the metadata in a Hive-compatible way, we set the serde to the parquet serde. When you then load the table, a java.lang.IllegalArgumentException is thrown from Hive's TypeInfoUtils. I believe the cause is the same as in SPARK-6024: Hive's parquet support does not handle wide schemas well, and the data type string gets truncated.
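      A minimal reproduction sketch (assuming a Spark 1.6-era HiveContext; the column count and table name are illustrative, not from the report):

          import org.apache.spark.{SparkConf, SparkContext}
          import org.apache.spark.sql.Row
          import org.apache.spark.sql.hive.HiveContext
          import org.apache.spark.sql.types.{StringType, StructField, StructType}

          val sc = new SparkContext(new SparkConf().setAppName("wide-schema-repro"))
          val sqlContext = new HiveContext(sc)

          // Enough columns that the combined data type string produced by
          // HiveMetastoreTypes.toMetastoreType exceeds 4000 characters.
          val numCols = 1000
          val schema = StructType((1 to numCols).map(i => StructField(s"col_$i", StringType)))
          val rows = sc.parallelize(Seq(Row.fromSeq((1 to numCols).map(_.toString))))
          val df = sqlContext.createDataFrame(rows, schema)

          // Saving as parquet stores the metadata in a Hive-compatible way,
          // with the serde set to the parquet serde.
          df.write.format("parquet").saveAsTable("wide_table")

          // Loading the table back throws java.lang.IllegalArgumentException
          // from Hive's TypeInfoUtils when it parses the truncated type string.
          sqlContext.table("wide_table").show()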

      Once you hit this problem, you can no longer drop the table, because Hive fails to evaluate the drop table command. To at least provide a better workaround, we should see whether we should add a native drop table call to the metastore, and whether we should add a flag that disables saving a data source table's metadata in a Hive-compatible way.
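      A native drop table call would bypass Hive's command evaluation and talk to the metastore client directly, roughly along these lines (a sketch of the idea only, not the eventual fix; HiveMetaStoreClient is Hive's metastore client API, and the database/table names are illustrative):

          import org.apache.hadoop.hive.conf.HiveConf
          import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

          // Dropping through the metastore client skips the table-schema
          // deserialization that makes DROP TABLE fail on the truncated
          // type string.
          val client = new HiveMetaStoreClient(new HiveConf())
          client.dropTable("default", "wide_table", /* deleteData = */ true, /* ignoreUnknownTab = */ false)
          client.close()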


    People

    • Assignee: Sameer Agarwal (sameerag)
    • Reporter: Yin Huai (yhuai)
    • Votes: 0
    • Watchers: 7
