Spark / SPARK-18108

Partition discovery fails with explicitly written long partitions


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: SQL
    • Labels:
      None

      Description

      We have Parquet data written with Spark 1.6 that, when read with 2.0.1, produces errors.

      case class A(a: Long, b: Int)
      val as = Seq(A(1,2))
      // partition directory written explicitly instead of via partitionBy
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      The above code fails; stack trace attached.

      If an Int is used instead, partition discovery over the explicitly written directory succeeds.

      case class A(a: Int, b: Int)
      val as = Seq(A(1,2))
      // partition directory written explicitly instead of via partitionBy
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      The action succeeds. Additionally, if partitionBy is used instead of an explicitly written partition directory, partition discovery succeeds.

      Question: Is the first example a reasonable use case? PartitioningUtils seems to infer an Integer type for a partition column unless the partition value in the path exceeds the Integer range.
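      The inference behavior described above can be sketched in plain Scala (this is a hypothetical illustration, not the actual PartitioningUtils code): the partition value is only available as the string after `a=` in the path, so a narrowest-type-first parse infers IntegerType for "1" even though the data was written as Long, producing the schema mismatch.

      ```scala
      import scala.util.Try

      // Hypothetical sketch of narrowest-type-first partition value inference:
      // try Int first, widen to Long only when the literal does not fit in an
      // Int, and fall back to String otherwise.
      object InferPartitionType {
        def infer(raw: String): String =
          Try(raw.toInt).map(_ => "IntegerType")
            .orElse(Try(raw.toLong).map(_ => "LongType"))
            .getOrElse("StringType")

        def main(args: Array[String]): Unit = {
          println(infer("1"))          // fits in Int  -> IntegerType
          println(infer("3000000000")) // exceeds Int  -> LongType
          println(infer("abc"))        // neither      -> StringType
        }
      }
      ```

      Under this sketch, the path /data/a=1/ yields IntegerType for column a, which then conflicts with the LongType recorded in the Parquet file footer.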

        Attachments

        1. stacktrace.out
          9 kB
          Richard Moorhead

          Activity

            People

            • Assignee:
              maropu Takeshi Yamamuro
              Reporter:
              richard.moorhead Richard Moorhead
            • Votes:
              4
              Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved: