SPARK-30709

Spark 2.3 to Spark 2.4 Upgrade. Problems reading HIVE partitioned tables.


Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: Pre-production

    Description

      Hello

      We recently upgraded our pre-production environment from Spark 2.3 to Spark 2.4.0.

      Over time we have created a large number of tables in the Hive Metastore, partitioned by two fields: one of type STRING and the other of type BIGINT.
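
      For illustration, a table with the layout described above might be created like this (a minimal sketch assuming a Hive-enabled SparkSession named spark; the table and column names are hypothetical):

      <code>
      // Minimal sketch of the table layout described above (hypothetical names):
      // a Hive table partitioned by a STRING column and a BIGINT column.
      spark.sql("""
        CREATE TABLE IF NOT EXISTS events (payload STRING)
        PARTITIONED BY (country STRING, load_ts BIGINT)
        STORED AS PARQUET
      """)
      </code>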

      We were reading these tables with Spark 2.3 without any problem, but after upgrading to Spark 2.4 we get the following log entries every time we run our software:

      <log>

      log_filterBIGINT.out:

       Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
       Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
       Caused by: MetaException(message:Filtering is supported only on partition keys of type string)

       

      hadoop-cmf-hive-HIVEMETASTORE-isblcsmsttc0001.scisb.isban.corp.log.out.1:

       

      2020-01-10 09:36:05,781 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-138]: MetaException(message:Filtering is supported only on partition keys of type string)

      2020-01-10 11:19:19,208 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-187]: MetaException(message:Filtering is supported only on partition keys of type string)

      2020-01-10 11:19:54,780 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-167]: MetaException(message:Filtering is supported only on partition keys of type string)

       </log>
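
      To be clear about the access pattern, the failure shows up on reads that filter on the BIGINT partition column, roughly like this (a sketch with the same hypothetical names as above):

      <code>
      // Filtering on the BIGINT partition column (load_ts) appears to be what
      // triggers the MetaException above, since Spark pushes the partition
      // predicate down to the Hive metastore.
      import org.apache.spark.sql.functions.col

      spark.table("events")
        .where(col("load_ts") === 20200110L)
        .show()
      </code>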

       

      We know that the best practice from Spark's point of view is to use the STRING type for partition columns, but given the large number of existing tables partitioned by a BIGINT column, we need a solution we can deploy with ease.

       

      As a first attempt we set the spark.sql.hive.manageFilesourcePartitions parameter to false in the spark-submit command, but after rerunning the software the error persisted.
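
      For reference, this is roughly how the parameter was set (a minimal sketch; the SparkSession-based form below is equivalent to passing --conf to spark-submit, and the application name is a placeholder):

      <code>
      // What we tried: disabling file-source partition management. Note (our
      // assumption, untested): this setting governs datasource/file-based
      // tables, while the MetaException above comes from the Hive metastore's
      // own partition filtering, which may be why the error persisted.
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .appName("bigint-partition-read")  // placeholder name
        .config("spark.sql.hive.manageFilesourcePartitions", "false")
        .enableHiveSupport()
        .getOrCreate()
      </code>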

       

      Has anyone in the community experienced the same problem? If so, what was the solution?

       

      Kind regards, and thanks in advance.


People

    Assignee: Unassigned
    Reporter: Carlos Mario (cbragaor)
