Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-30262

Avoid NumberFormatException when totalSize is empty

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.3
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      We could get the Partitions Statistics Info.But in some specail case, The Info  like  totalSize,rawDataSize,rowCount maybe empty. When we do some ddls like   

      desc formatted partition

       ,the NumberFormatException is showed as below:

      spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', hour='23');
      19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 partition(year='2019', month='10', day='17', hour='23')]
      java.lang.NumberFormatException: Zero length BigInteger
      at java.math.BigInteger.(BigInteger.java:411)
      at java.math.BigInteger.(BigInteger.java:597)
      at scala.math.BigInt$.apply(BigInt.scala:77)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
      at scala.Option.map(Option.scala:146)
      at org.apache.spark.sql.hive.client.HiveClientImpl$.org$apache$spark$sql$hive$client$HiveClientImpl$$readHiveStats(HiveClientImpl.scala:1056)
      at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHivePartition(HiveClientImpl.scala:1048)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1$$anonfun$apply$16.apply(HiveClientImpl.scala:659)
      at scala.Option.map(Option.scala:146)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:659)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionOption$1.apply(HiveClientImpl.scala:656)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:281)
      at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:219)
      at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:218)
      at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:264)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:656)
      at org.apache.spark.sql.hive.client.HiveClient$class.getPartitionOption(HiveClient.scala:194)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionOption(HiveClientImpl.scala:84)
      at org.apache.spark.sql.hive.client.HiveClient$class.getPartition(HiveClient.scala:174)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getPartition(HiveClientImpl.scala:84)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1125)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getPartition$1.apply(HiveExternalCatalog.scala:1124)
      

      Although we can use 'Analyze table partition ' to update the totalSize,rawDataSize or rowCount, it's unresonable for normal SQL to throw NumberFormatException for Empty totalSize.We should fix the empty case when readHiveStats.

      Here is the empty case:

        Attachments

        1. screenshot-1.png
          906 kB
          chenliang

          Issue Links

            Activity

              People

              • Assignee:
                southernriver chenliang
                Reporter:
                southernriver chenliang
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: