[SPARK-30262] Avoid NumberFormatException when totalSize is empty


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.3
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      We can get partition statistics info, but in some special cases values such as totalSize, rawDataSize, and rowCount may be empty. When we run DDL statements such as

      desc formatted partition

      a NumberFormatException is thrown, as shown below:

      spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', hour='23');
      19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 partition(year='2019', month='10', day='17', hour='23')]
      java.lang.NumberFormatException: Zero length BigInteger
      at java.math.BigInteger.<init>(BigInteger.java:411)
      at java.math.BigInteger.<init>(BigInteger.java:597)
      at scala.math.BigInt$.apply(BigInt.scala:77)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
      at scala.Option.map(Option.scala:146)
      at org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
      at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
      at scala.Option.map(Option.scala:146)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
      at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
      at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
      at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
      at org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
      at org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
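
The top frames of the trace (Option.map followed by BigInt.apply) suggest that an empty string reached the BigInt constructor. A minimal sketch that reproduces the same exception outside Spark is shown here; the object name, the properties map, and the key used are illustrative assumptions, not the actual HiveClientImpl code.

```scala
import scala.util.Try

object EmptyStatRepro {
  // Hypothetical stand-in for the partition parameters returned by the metastore.
  val properties: Map[String, String] = Map("totalSize" -> "")

  // Same Option.map(BigInt(_)) shape as in the stack trace: the key is present,
  // so map runs and BigInt("") is evaluated.
  def naiveTotalSize: Option[BigInt] = properties.get("totalSize").map(BigInt(_))

  def main(args: Array[String]): Unit = {
    // Wrapped in Try so the failure is printed rather than aborting the program.
    println(Try(naiveTotalSize))
  }
}
```

A missing key would be harmless here, because Option.map is skipped for None; only a key that is present with an empty value triggers the failure.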
      

      Although we can use 'Analyze table partition' to update totalSize, rawDataSize, or rowCount, it is unreasonable for an ordinary SQL statement to throw a NumberFormatException because totalSize is empty. We should handle the empty case in readHiveStats.
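
One way to handle the empty case can be sketched as follows: drop empty values before converting them, so that an empty statistic is treated the same as a missing one. This is a sketch under stated assumptions, not the merged patch; SafeHiveStats, readStat, and the map keys are hypothetical names chosen for illustration.

```scala
object SafeHiveStats {
  // Hypothetical helper: a missing, empty, or blank value yields None
  // instead of being passed to BigInt. Non-numeric text would still throw.
  def readStat(properties: Map[String, String], key: String): Option[BigInt] =
    properties.get(key).map(_.trim).filter(_.nonEmpty).map(BigInt(_))

  def main(args: Array[String]): Unit = {
    // Illustrative partition parameters: one empty value, one valid value.
    val properties = Map("totalSize" -> "", "rawDataSize" -> "1024")
    println(readStat(properties, "totalSize"))   // empty value
    println(readStat(properties, "rawDataSize")) // valid value
    println(readStat(properties, "rowCount"))    // key absent
  }
}
```

Returning None for an empty value lets callers fall back to whatever they already do when a statistic is absent, which keeps statements such as desc formatted from failing on incomplete metadata.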

      The empty case is shown in the attached screenshot (screenshot-1.png).

      Attachments

        1. screenshot-1.png
          906 kB
          chenliang

            People

              Assignee: southernriver chenliang
              Reporter: southernriver chenliang