  1. Spark
  2. SPARK-7939

Make URL partition recognition return String by default for all partition column types and values


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: SQL

    Description

      Imagine the following HDFS paths:

      /data/split=00
      /data/split=01
      ...
      /data/split=FF

      If I have 10 or fewer partitions (00, 01, ... 09), partition recognition currently treats column 'split' as an integer column.

      If I have more than 10 partitions, column 'split' is recognized as a String column instead.

      This is very confusing. I suggest treating partition columns as String by default and allowing users to specify types if needed.

      Another example is date:
      /data/date=2015-04-01 => 'date' is String
      /data/date=20150401 => 'date' is Int
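
      The behaviour described above can be reproduced with a small, self-contained sketch. This is only a simplified model of value-based inference, not Spark's actual implementation: the function names (partition_value, infer_value_type, infer_column_type) are hypothetical, and the rule "Int only if every value is all digits, otherwise String" is an assumption chosen to match the examples in this report.

```python
import re
from typing import List

# Hypothetical helpers; a simplified model, NOT Spark's real inference code.

def partition_value(path: str) -> str:
    """Extract the raw value from a path segment such as '/data/split=0A'."""
    return path.rsplit("=", 1)[1]

def infer_value_type(raw: str) -> str:
    """Assumed rule: a value made only of ASCII digits is 'Int', anything else 'String'."""
    return "Int" if re.fullmatch(r"-?[0-9]+", raw) else "String"

def infer_column_type(paths: List[str]) -> str:
    """Assumed rule: the column is 'Int' only if every partition value is 'Int'."""
    values = [partition_value(p) for p in paths]
    return "Int" if all(infer_value_type(v) == "Int" for v in values) else "String"

# Partitions 00..09: every value happens to be all digits.
small = ["/data/split=%02d" % i for i in range(10)]
# Partitions 00..FF: values such as 0A are not integers.
full = ["/data/split=%02X" % i for i in range(256)]

print(infer_column_type(small))                    # Int
print(infer_column_type(full))                     # String
print(infer_column_type(["/data/date=2015-04-01"]))  # String
print(infer_column_type(["/data/date=20150401"]))    # Int
```

      Under this model the inferred type depends on which partition directories happen to exist, so adding one more directory can change the column type. Reading '00' as an integer also drops the leading zero, which is another reason a String default is less surprising.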

      Jianshi

          People

            Assignee: viirya L. C. Hsieh
            Reporter: huangjs Jianshi Huang
            Votes: 0
            Watchers: 4
