SPARK-18108: Partition discovery fails with explicitly written long partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      We have Parquet data written from Spark 1.6 that, when read with Spark 2.0.1, produces errors.

      case class A(a: Long, b: Int)
      val as = Seq(A(1, 2))
      // partition directory written explicitly in the path, not via partitionBy
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      // reading the parent directory triggers partition discovery
      spark.read.parquet("/data/").collect
      

      The above code fails; stack trace attached.

      If an Int is used for the partition column instead, the explicitly written partition is discovered successfully.

      case class A(a: Int, b: Int)
      val as = Seq(A(1, 2))
      // same explicit partition path, but 'a' is now an Int
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      Here the collect succeeds. Additionally, if partitionBy is used instead of writing the partition path explicitly, partition discovery also succeeds; a sketch follows below.
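
      For comparison, a sketch of the working partitionBy variant (the output path /data2/ is illustrative, not from this report):

      case class A(a: Long, b: Int)
      // Spark creates the partition directory (/data2/a=1) itself
      spark.createDataFrame(Seq(A(1, 2))).write.partitionBy("a").parquet("/data2/")
      // discovery succeeds: 'a' exists only in the directory name, so there is
      // no type conflict with the schema stored in the Parquet data files
      spark.read.parquet("/data2/").collect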

      Question: is the first example a reasonable use case? PartitioningUtils seems to default to IntegerType unless the partition value exceeds the Int range, which presumably conflicts with the LongType recorded in the Parquet files themselves.
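
      If so, one possible workaround (an untested sketch, not from this report) is to bypass inference by supplying the full schema on read:

      import org.apache.spark.sql.types.{IntegerType, LongType, StructField, StructType}
      // declare 'a' as LongType up front so discovery does not have to guess
      val schema = StructType(Seq(
        StructField("a", LongType),
        StructField("b", IntegerType)))
      spark.read.schema(schema).parquet("/data/").collect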

      Attachments

        1. stacktrace.out (9 kB), uploaded by Richard Moorhead


          People

            Assignee: Takeshi Yamamuro (maropu)
            Reporter: Richard Moorhead (richard.moorhead)
            Votes: 4
            Watchers: 5
