SPARK-18856

Newly created catalog table assumed to have 0 rows and 0 bytes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      scala> spark.range(100).selectExpr("id % 10 p", "id").write.partitionBy("p").format("json").saveAsTable("testjson")
      
      scala> spark.table("testjson").queryExecution.optimizedPlan.statistics
      res6: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(sizeInBytes=0, isBroadcastable=false)
      

      sizeInBytes shouldn't be 0 for a table that was just written with 100 rows. The issue is that in DataSource.scala, we do:

              val fileCatalog = if (sparkSession.sqlContext.conf.manageFilesourcePartitions &&
                  catalogTable.isDefined && catalogTable.get.tracksPartitionsInCatalog) {
                new CatalogFileIndex(
                  sparkSession,
                  catalogTable.get,
                  catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(0L))
              } else {
                new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
              }
      

      We shouldn't use 0L as the fallback when the catalog has no statistics for the table.
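
      A minimal, Spark-free sketch of the fallback behavior argued for above. The names TableStats, SizeFallback, estimatedSize and canBroadcast are hypothetical and exist only for illustration; they are not Spark classes. The sketch assumes that an unknown size is represented by a very large value (Long.MaxValue) rather than 0, so that a simplified broadcast check does not treat a table without statistics as tiny. Spark has a spark.sql.defaultSizeInBytes setting that serves a similar purpose; whether the actual fix uses it is not stated in this report.

```scala
// Hypothetical stand-in for catalog statistics; not Spark's real class.
final case class TableStats(sizeInBytes: BigInt)

object SizeFallback {
  // Assumed conservative default for "size unknown".
  // A large value keeps a size-based planner from treating the table as tiny.
  val DefaultSizeInBytes: Long = Long.MaxValue

  // Use the catalog statistics when present, otherwise the conservative default
  // (instead of 0L, which would claim the table is empty).
  def estimatedSize(stats: Option[TableStats]): Long =
    stats.map(_.sizeInBytes.toLong).getOrElse(DefaultSizeInBytes)

  // Simplified broadcast check: a relation qualifies when its estimated size
  // is at or below the threshold. This is an illustration, not Spark's planner.
  def canBroadcast(sizeInBytes: Long, thresholdBytes: Long): Boolean =
    sizeInBytes >= 0 && sizeInBytes <= thresholdBytes
}
```

      With a 0L fallback, canBroadcast(0L, threshold) is true for any non-negative threshold, so a freshly created table with no statistics would look small enough to broadcast regardless of its real size. With the conservative default it does not.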

          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Reynold Xin (rxin)