[SPARK-18108] Partition discovery fails with explicitly written long partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version: 2.0.1
    • Fix Versions: 2.1.1, 2.2.0
    • Component: SQL
    • Labels: None

    Description

      We have Parquet data written from Spark 1.6 that, when read with Spark 2.0.1, produces errors.

      case class A(a: Long, b: Int)
      val as = Seq(A(1,2))
      // partition explicitly written
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      The above code fails; stack trace attached.

      If an Int is used instead, partition discovery over the explicitly written partition succeeds.

      case class A(a: Int, b: Int)
      val as = Seq(A(1,2))
      // partition explicitly written
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      This succeeds. Likewise, if 'partitionBy' is used instead of writing the partition directory explicitly, partition discovery succeeds.

      Question: Is the first example a reasonable use case? PartitioningUtils seems to default to IntegerType unless the partition value exceeds the integer range, so the explicitly written a=1 directory is presumably inferred as an Int and conflicts with the Long column recorded in the Parquet schema.
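      The inference behavior in question can be sketched as a fallback chain that tries the narrowest numeric type first and widens only when parsing fails. This is a simplified illustration, not Spark's actual PartitioningUtils code:

```scala
import scala.util.Try

// Simplified sketch of partition-value type inference: attempt Int,
// then Long, then Double, falling back to String. A partition value
// like "1" infers as Int even if the column was written as Long,
// which is the mismatch described above.
def inferPartitionType(raw: String): String =
  Try { raw.toInt; "IntegerType" }
    .orElse(Try { raw.toLong; "LongType" })
    .orElse(Try { raw.toDouble; "DoubleType" })
    .getOrElse("StringType")
```

      Under this sketch, inferPartitionType("1") yields "IntegerType", while a value outside the Int range such as "3000000000" yields "LongType", consistent with the failure appearing only when the Long partition value happens to fit in an Int.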

      Attachments

        Activity


          People

            Assignee: maropu (Takeshi Yamamuro)
            Reporter: richard.moorhead (Richard Moorhead)
            Votes: 4
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
